Among the diverse research fields within computer music, the synthesis and generation of audio signals epitomizes the cross-disciplinarity of the domain, having nourished both scientific and artistic practices since its inception. Some processes naturally handle both analysis and synthesis, thereby providing invertible representations of given sounds. In addition, recent trends in machine learning have given rise to powerful data-centered methods, raising several epistemological questions among researchers about their possible uses and concrete significance. In particular, generative models focus on generating original content from automatically extracted features, calling into question not only previous approaches to generation but also how these processes could be exploited for artistic purposes. A specific family of generative models, called variational methods, is based on both unsupervised inference of features and direct generation. The interest of such methods is twofold: first, they rely on Bayesian inference to extract continuous low-dimensional representations that aim to reflect the underlying structure of a data corpus. Second, these continuous representations, called latent spaces, can be inverted to generate the data back, providing powerful synthesis and in-domain interpolation capabilities.
Hence, such bijective systems are well suited to sound synthesis, providing data-centered generation methods whose controls are automatically extracted from the data. Furthermore, the flexibility of such systems allows numerous ways of influencing the construction of these representations, for example with external information or perceptual constraints, such that the training process can itself be embedded in their creative use. We will review the generative abilities of these methods when applied to the audio domain, and how the extracted spaces can be used as high-level features for audio analysis. We will also introduce diverse ways of using them for creative purposes, and show how these methods can be extended to integrate the temporal dimension of audio information. Finally, we will present how these generative processes can be embedded in musical and compositional tools, and how they can be used to envision a novel use of synthesis algorithms.
Axel Chemla--Romeu-Santos : Manifold-based representations of musical signals and generative spaces