Thesis defence of Théis Bazin
Théis BAZIN wrote her thesis « Conception de nouvelles échelles de la création musicale à l'aide de l'apprentissage statistique hiérarchique » ("Designing new scales of musical creation using hierarchical statistical learning") in the Musical Representations team of the STMS laboratory (IRCAM - Sorbonne University - CNRS - Ministry of Culture).
The Jury is composed of:
Prof. Wendy MACKAY – Reviewer – Research Director, INRIA-Saclay, ExSitu research group
Prof. Geoffroy PEETERS – Reviewer – Professor, Image-Data-Signal (IDS) department, LTCI, Telecom Paris, Institut Polytechnique de Paris
Prof. Cheng-Zhi Anna HUANG – Examiner – Adjunct Professor, MILA, Université de Montréal – Researcher, Google Magenta – Canada-CIFAR AI Chair
Dr. Jean BRESSON – Examiner – Research Director, RepMus, STMS, IRCAM, Sorbonne Université, CNRS – Team Product Owner, Ableton
Dr. Mikhail MALT – Thesis director – Researcher, RepMus, STMS, IRCAM, Sorbonne Université, CNRS
Dr. Gaëtan HADJERES – Thesis co-director – Senior Research Scientist, SonyAI
Abstract: Modern musical creation unfolds on many different time scales: from the millisecond-scale vibration of a string or resonance of an electronic instrument, through the few seconds of a typical instrument note, to the tens of minutes of an opera or a DJ set. The intermingling of these scales has led to the development of numerous technical and theoretical tools that make this manipulation of time effective. These abstractions, such as musical scales, rhythmic notation or common models of audio synthesis, pervade the current tools of musical creation, both software and hardware. However, these abstractions, most of which emerged in the West during the 20th century on the basis of the classical theory of written music, are not free of cultural presuppositions. They embody principles that erase certain aspects of the music (e.g. micro-deviations from a metronomic beat, or deviations in frequency from an idealised pitch) whose high physical variability makes them inconvenient to write down. These compromises are reasonable when the written music is intended to be performed by musicians, who reintroduce variation and physical and musical richness; they prove limiting in computer-assisted music creation, where the machine renders these abstractions coldly and thereby restricts the diversity of the music that can be produced. Through a review of several typical interfaces for music creation, I show that an essential factor is the scale of human-machine interaction that these abstractions afford. At their most flexible, such as audio representations or piano rolls over unquantised time, they are difficult to manipulate because they demand a high degree of precision, which is particularly ill-suited to modern mobile and touch devices. Conversely, many commonly used abstractions over discretised time, such as scores or sequencers, are constraining for the creation of culturally diverse music. In this thesis, I argue that artificial intelligence, through its ability to construct high-level representations of complex objects, makes it possible to build new scales of musical creation designed for interaction, and thus opens up radically new approaches to musical creation. I present and illustrate this idea through the design and development of three AI-assisted web prototypes for music creation, one of which is based on a novel neural model for the inpainting of musical instrument sounds, also designed as part of this thesis. These high-level representations, for scores, piano rolls and spectrograms, operate at a coarser time-frequency scale than the original data, but one better suited to interaction. By allowing localised transformations of this representation while also capturing, through statistical modelling, the aesthetic specificities and micro-variations of the musical training data, these tools yield musically rich results in an easy and controllable way. Through the evaluation of these three prototypes in real conditions by several artists, I show that these new scales of interactive creation are useful to experts and novices alike. Because the AI assists with the technical aspects that normally require precision and expertise, these tools are also well suited to touch screens and mobile devices.
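To make the kind of interaction at stake more concrete, the sketch below is a minimal, purely illustrative Python/NumPy example of the inpainting principle described in the abstract: a user-selected block of a coarse time-frequency grid is regenerated by a model while the surrounding context is left untouched. All names (inpaint, noise_model, the grid dimensions) are hypothetical and do not come from the thesis or its prototypes.

import numpy as np

# Purely illustrative: "inpainting" on a toy coarse time-frequency grid.
# The user selects a block of the high-level representation; the surrounding
# context is kept untouched and a generative model regenerates only the
# selected cells.

rng = np.random.default_rng(seed=0)

# Toy high-level representation: 32 time steps x 16 frequency bands.
representation = rng.random((32, 16))

def inpaint(grid, t_slice, f_slice, sample_fn):
    """Regenerate only the selected (time, frequency) block of `grid`.

    `sample_fn(context, mask)` stands in for a trained conditional model:
    it receives the grid with the selection zeroed out and a boolean mask of
    the cells to fill, and returns proposed values for the whole grid.
    """
    mask = np.zeros_like(grid, dtype=bool)
    mask[t_slice, f_slice] = True

    context = np.where(mask, 0.0, grid)    # hide the user-selected region
    proposal = sample_fn(context, mask)    # model proposes new content
    return np.where(mask, proposal, grid)  # splice the proposal back in

# Placeholder "model": plain noise here; a real system would use a neural
# network conditioned on the surrounding musical context.
noise_model = lambda context, mask: rng.random(context.shape)

edited = inpaint(representation, slice(8, 16), slice(4, 12), noise_model)
assert edited.shape == representation.shape

In an actual tool of the kind described above, the noise placeholder would be replaced by a trained neural model conditioned on the surrounding musical context, and the toy grid by a score, piano-roll or spectrogram representation.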