Constance DOUWES wrote her thesis « On the environmental impact of deep generative models for audio » in the Musical Representations team of the STMS laboratory (Ircam-Sorbonne University-CNRS-Ministry of Culture), within the Doctoral School of Computer Science, Telecommunications and Electronics of Paris.
The Jury is composed of:
Nick Bryan-Kinns - Reporter - Queen Mary, University of London (UK)
Sebastien Loustau - Reporter - LMAP, Université de Pau et des Pays de l’Adour (FR)
Evripidis Bampis - Reviewer - LIP6, Sorbonne Université, CNRS (FR)
Peter Bryzgalov - Reviewer - Chiba Institute of Technology (JP)
Emma Strubell - Reviewer - Carnegie Mellon University (US)
Geoffroy Peeters - Reviewer - LTCI, Telecom Paris (FR)
Jean-Pierre Briot - Director - LIP6, Sorbonne Université, CNRS (FR)
Philippe Esling - Supervisor - STMS, IRCAM, Sorbonne Université, CNRS (FR)
Abstract: In this thesis, we investigate the environmental impact of deep learning models for audio generation, and we aim to put computational cost at the core of the evaluation process. In particular, we focus on different types of deep learning models specialized in raw waveform audio synthesis. These models are now a key component of modern audio systems, and their use has increased significantly in recent years. Their flexibility and generalization capabilities make them powerful tools in many contexts, from text-to-speech synthesis to unconditional audio generation. However, these benefits come at the cost of expensive training sessions on large amounts of data, run on energy-intensive dedicated hardware, which incurs large greenhouse gas emissions. The measures we use as a scientific community to evaluate our work are at the heart of this problem. Currently, deep learning researchers evaluate their work primarily on improvements in accuracy, log-likelihood, reconstruction, or opinion scores, all of which overshadow the computational cost of generative models. Therefore, we propose a new methodology based on Pareto optimality to help the community better evaluate the significance of their work while bringing the energy footprint, and ultimately carbon emissions, to the same level of interest as sound quality.
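To illustrate the idea of Pareto optimality mentioned in the abstract, here is a minimal sketch in Python. It is not the methodology of the thesis itself: the model names, the energy and quality figures, and the helper functions `dominates` and `pareto_front` are all hypothetical, chosen only to show how energy cost and sound quality can be weighed together. Both criteria are treated as "lower is better" (energy consumed, and an error-like quality score).

```python
from typing import List, Tuple

# (name, energy_kwh, quality_error); both values are lower-is-better.
# All entries below are made-up numbers for illustration only.
Model = Tuple[str, float, float]


def dominates(a: Model, b: Model) -> bool:
    """Return True if a is no worse than b on both criteria
    and strictly better on at least one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


models: List[Model] = [
    ("model_a", 120.0, 0.10),  # best quality, highest energy
    ("model_b", 40.0, 0.12),   # slightly worse quality, far less energy
    ("model_c", 60.0, 0.20),   # worse than model_b on both criteria
    ("model_d", 10.0, 0.30),   # lowest energy, worst quality
]

front = pareto_front(models)
print([name for name, _, _ in front])
```

In this toy example, `model_c` is left out of the front because `model_b` uses less energy and also has a lower error. The remaining models each represent a different trade-off between energy and quality, so none of them can be discarded on these two criteria alone.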