Singing Synthesis with Neural Networks

Neural networks are the state of the art in modern speech synthesis, and the very high quality they achieve there motivates this study into using them to improve the quality of singing synthesis.
This work is a first step towards integrating these neural networks into Ircam's singing synthesis system ISiS.

In the presentation we will discuss two approaches to using neural networks in ISiS. Compared to Google's Tacotron 2 and WaveNet, the objective is to achieve increased control over the F0 and loudness contours with models that can be trained on significantly smaller databases.

First, we investigate using deep neural networks to synthesize spectral envelopes (formant filters) from melody, text, F0, and loudness control parameters, with the aim of replacing the concatenative envelope synthesis in ISiS.
Second, we study a WaveNet-style speech excitation synthesizer with the aim of replacing the Pulse and Noise (PaN) source model in ISiS. In combination, these two components are expected to replace the complete signal processing framework used in ISiS.
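
The abstract does not describe the architecture of the excitation synthesizer. As a generic illustration of what "WaveNet-style" usually refers to, the following minimal sketch stacks dilated causal convolutions, so that each output sample depends only on current and past input samples. The function names, the kernel weights, and the dilation schedule are hypothetical choices made for this example, and the sketch omits the gated activations, residual connections, and conditioning inputs (such as F0 or loudness) that a real model would need.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zero, so the
    output at time t never depends on inputs later than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def stack_dilated_convs(x, weights, dilations):
    """Apply one causal convolution per dilation, feeding each into the next."""
    for d in dilations:
        x = causal_dilated_conv(x, weights, d)
    return x


def receptive_field(kernel_size, dilations):
    """Number of past-and-present input samples that can influence one output."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical configuration: kernel size 2, dilations doubling per layer.
DILATIONS = [1, 2, 4, 8]
WEIGHTS = [1.0, 1.0]

if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 19
    print(receptive_field(len(WEIGHTS), DILATIONS))
    print(stack_dilated_convs(impulse, WEIGHTS, DILATIONS))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the number of weights grows only linearly, which is one reason this structure is attractive for sample-level audio models.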

The presentation will cover preliminary results as well as insights into the technical details and the problems we have encountered along the way, which need to be addressed when using neural networks for singing synthesis.

Frederik Bous : Singing Synthesis with Neural Networks

Following his Master of Science (Computational Engineering) internship in the Analyse et synthèse des sons (sound analysis and synthesis) team of the STMS laboratory (Ircam/CNRS/Sorbonne Université/Ministère de la Culture), Frederik Bous, a student at the Technical University of Darmstadt, will give a presentation of his work:

"Singing Synthesis with Neural Networks"

