Singing Synthesis with Neural Networks

Neural networks are the state of the art in modern speech synthesis, and the very high quality they achieve there motivates this study of using neural networks to improve the quality of singing synthesis.
This work is a first step towards integrating such networks into Ircam's singing synthesis system ISiS.

In the presentation we will discuss two approaches to using neural networks in ISiS. Compared to Google's Tacotron 2 and WaveNet, the objective is to achieve increased control over F0 and loudness contours with models that can be trained on significantly smaller databases.

First, we investigate using deep neural networks to synthesize spectral envelopes (formant filters) from melody, text, F0 and loudness control parameters, with the aim of replacing the concatenative envelope synthesis in ISiS.
Second, we study a WaveNet-style speech excitation synthesizer with the aim of replacing the Pulse and Noise (PaN) source model in ISiS. In combination, these two components are expected to replace the complete signal processing framework used in ISiS.

The presentation will cover preliminary results as well as insights into the technical details and the problems we have encountered along the way, which need to be addressed when using neural networks for singing synthesis.

speakers

Frederik Bous

information

Type: Seminar / Conference
Performance location: Ircam, Salle Igor-Stravinsky (Paris)
Duration: 36 min
Date: January 29, 2019

Frederik Bous : Singing Synthesis with Neural Networks

Frederik Bous, a student at the Technical University of Darmstadt, will present his work following his Master of Science (MS, Computational Engineering) internship in the Analyse et synthèse des sons team of the STMS laboratory (Ircam/CNRS/Sorbonne Université/Ministère de la Culture):

"Singing Synthesis with Neural Networks"
