Antoine Caillon, a doctoral student at Sorbonne Université, defends his thesis "Apprentissage temporel hiérarchique pour la synthèse audio neuronale de la musique" (Hierarchical temporal learning for neural audio synthesis of music), carried out in the Représentations Musicales team of the Ircam STMS laboratory under the supervision of Jean Bresson and Philippe Esling.
His jury will be composed of:
Simon Colton, Reviewer - Queen Mary University of London (United Kingdom)
Bob Sturm, Reviewer - Royal Institute of Technology (Sweden)
Michèle Sebag, Examiner - Université Paris Saclay
Patrick Gallinari, Examiner - Sorbonne Université
Mark Sandler, Examiner - Queen Mary University of London (United Kingdom)
Jean Bresson, Thesis supervisor - Sorbonne Université
Philippe Esling, Thesis co-supervisor and advisor - Sorbonne Université
Abstract
Recent advances in deep learning have offered new ways to build models that address a wide variety of tasks by optimizing a set of parameters to minimize a cost function. Among these techniques, probabilistic generative models have yielded impressive advances in text, image and sound generation. However, musical audio signal generation remains a challenging problem. In this thesis, we study how a hierarchical approach to audio modeling can address the musical signal modeling task while offering different levels of control to the user. Our main hypothesis is that extracting different representation levels of an audio signal makes it possible to abstract away the complexity of lower levels at each modeling stage. This would eventually allow the use of lightweight architectures, each modeling a single audio scale. We start by addressing raw audio modeling, proposing an audio model that combines Variational Autoencoders and Generative Adversarial Networks and yields high-quality 48 kHz neural audio synthesis while running 20 times faster than real time on CPU. Then, we study how autoregressive models can be used to capture the temporal behavior of the representation produced by this low-level audio model, optionally using additional conditioning signals such as acoustic descriptors or tempo. Finally, we propose a method for using all the proposed models directly on audio streams, allowing their use in the real-time applications that we developed during this thesis.
February 21, 2023, 01 h 06 min