information

Type
PhD thesis/HDR defense
performance location
Ircam, Salle Igor-Stravinsky (Paris)
duration
01 h 59 min
date
March 10, 2023

PhD thesis defense of Constance Douwes

Constance Douwes wrote her thesis, « On the environmental impact of deep generative models for audio », in the Musical Representations team of the STMS laboratory (Ircam-Sorbonne University-CNRS-Ministry of Culture) and in the Doctoral School of Computer Science, Telecommunications and Electronics of Paris.

The jury is composed of:
Nick Bryan-Kinns - Reporter - Queen Mary, University of London (UK)
Sebastien Loustau - Reporter - LMAP, Université de Pau et des Pays de l’Adour (FR)
Evripidis Bampis - Reviewer - LIP6, Sorbonne Université, CNRS (FR)
Peter Bryzgalov - Reviewer - Chiba Institute of Technology (JP)
Emma Strubell - Reviewer - Carnegie Mellon University (US)
Geoffroy Peeters - Reviewer - LTCI, Telecom Paris (FR)
Jean-Pierre Briot - Director - LIP6, Sorbonne Université, CNRS (FR)
Philippe Esling - Supervisor - STMS, IRCAM, Sorbonne Université, CNRS (FR)

Abstract: In this thesis, we investigate the environmental impact of deep learning models for audio generation and we aim to put computational cost at the core of the evaluation process. In particular, we focus on different types of deep learning models specialized in raw waveform audio synthesis. These models are now a key component of modern audio systems, and their use has increased significantly in recent years. Their flexibility and generalization capabilities make them powerful tools in many contexts, from text-to-speech synthesis to unconditional audio generation. However, these benefits come at the cost of expensive training sessions on large amounts of data, run on energy-intensive dedicated hardware, which incurs large greenhouse gas emissions. The measures we use as a scientific community to evaluate our work are at the heart of this problem. Currently, deep learning researchers evaluate their work primarily on improvements in accuracy, log-likelihood, reconstruction, or opinion scores, all of which overshadow the computational cost of generative models.
Therefore, we propose a new methodology based on Pareto optimality to help the community better evaluate the significance of its work while bringing the energy footprint, and ultimately carbon emissions, to the same level of interest as sound quality.
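
As an illustration of what a Pareto-based evaluation can look like, here is a minimal sketch; it is not the method from the thesis. It assumes each model is summarized by two numbers, a training energy cost (lower is better) and a sound-quality score (higher is better). The `Candidate` type, the model names, and all values are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str          # hypothetical model label
    energy_kwh: float  # hypothetical training energy, lower is better
    quality: float     # hypothetical sound-quality score, higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.energy_kwh <= b.energy_kwh and a.quality >= b.quality
    strictly_better = a.energy_kwh < b.energy_kwh or a.quality > b.quality
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up numbers, for illustration only.
models = [
    Candidate("A", energy_kwh=10.0, quality=3.1),
    Candidate("B", energy_kwh=50.0, quality=4.0),
    Candidate("C", energy_kwh=60.0, quality=3.9),  # costs more than B and scores lower
    Candidate("D", energy_kwh=200.0, quality=4.2),
]
front = pareto_front(models)
print([c.name for c in front])
```

Under these assumptions, a model stays on the front only if no other model is both cheaper and at least as good, so a small quality gain bought with a large energy increase is shown as a trade-off and not as a clear improvement.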

