Seminar / Conference
Participants
  • Laurent Benaroya (speaker)
  • Marco Liuni (speaker)
  • Axel Roebel (speaker)
  • Geoffroy Peeters (speaker)

This seminar presents research undertaken by the Analysis/Synthesis team in the European project 3DTVS (3D TV Content Search). The project deals with multimodal search and indexing of 3D TV content, and IRCAM contributes algorithms that describe the multichannel audio scene. This rather ambitious objective is made tractable by focusing only on the detection of specific audio events.

Two complementary techniques are investigated in the project. The first is based on audio event detection using classification methods; the events considered are speech and music. We introduce a multichannel extension of the existing classification system, “ircamclass”, and propose several information fusion strategies for the extended system. These are evaluated on a dataset of 4 films, and we show that they give better results than the baseline classification system applied to a mono down-mix of all channels.
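
To make the idea of fusing per-channel decisions concrete, here is a minimal sketch. It assumes that a detector has already produced, for one audio frame, a speech probability for each channel; the two fusion rules shown (mean and max) and the names `fuse_scores` and `detect` are illustrative assumptions, not the strategies or the API of the system presented in the seminar.

```python
from statistics import mean


def fuse_scores(channel_scores, strategy="mean"):
    """Combine per-channel detection probabilities into one score.

    channel_scores: dict mapping a channel name to the probability
    (0..1) that the event is present in that channel for one frame.
    The strategies here are hypothetical examples of late fusion.
    """
    values = list(channel_scores.values())
    if strategy == "mean":
        return mean(values)   # every channel gets an equal vote
    if strategy == "max":
        return max(values)    # one confident channel is enough
    raise ValueError(f"unknown strategy: {strategy}")


def detect(channel_scores, strategy="mean", threshold=0.5):
    """Return True if the fused score reaches the decision threshold."""
    return fuse_scores(channel_scores, strategy) >= threshold


# Hypothetical 5-channel frame: dialogue sits in the centre channel only.
frame = {"L": 0.2, "R": 0.2, "C": 0.9, "Ls": 0.1, "Rs": 0.1}
print(detect(frame, "mean"), detect(frame, "max"))
```

In this made-up frame the two rules disagree: averaging dilutes the evidence from the centre channel, while the max rule keeps it. This is one reason why the choice of fusion strategy is worth evaluating.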

The second approach is based on extensions of nonnegative matrix factorization (NMF) algorithms to multichannel audio, resulting in nonnegative tensor factorization (NTF) and nonnegative tensor deconvolution (NTD). The NTD algorithm will be used in the project to detect, localize, and eventually separate the sources of selected audio events.
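
As a rough illustration of how NMF extends to multichannel data, the sketch below factorizes a nonnegative tensor V[f, t, c] (frequency, time, channel) as a sum over k of W[f, k] H[k, t] Q[c, k], using multiplicative updates. It covers plain NTF only, not NTD. The Euclidean cost, the random initialization, and the function names `ntf` and `reconstruct` are assumptions made for this example and are not taken from the project.

```python
import numpy as np


def reconstruct(W, H, Q):
    """Model tensor: V_hat[f, t, c] = sum_k W[f, k] * H[k, t] * Q[c, k]."""
    return np.einsum("fk,kt,ck->ftc", W, H, Q)


def ntf(V, n_components, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative tensor factorization with multiplicative updates.

    Minimizes the Euclidean distance between V and the model
    (an assumed cost, chosen for simplicity).
    Returns W (spectral templates), H (activations),
    Q (per-channel gains) and the error after each iteration.
    """
    rng = np.random.default_rng(seed)
    F, T, C = V.shape
    W = rng.random((F, n_components)) + eps
    H = rng.random((n_components, T)) + eps
    Q = rng.random((C, n_components)) + eps
    errors = []
    for _ in range(n_iter):
        # Each factor is scaled by a ratio of nonnegative terms,
        # so all factors stay nonnegative throughout.
        V_hat = reconstruct(W, H, Q)
        W *= np.einsum("ftc,kt,ck->fk", V, H, Q) / (
            np.einsum("ftc,kt,ck->fk", V_hat, H, Q) + eps)
        V_hat = reconstruct(W, H, Q)
        H *= np.einsum("ftc,fk,ck->kt", V, W, Q) / (
            np.einsum("ftc,fk,ck->kt", V_hat, W, Q) + eps)
        V_hat = reconstruct(W, H, Q)
        Q *= np.einsum("ftc,fk,kt->ck", V, W, H) / (
            np.einsum("ftc,fk,kt->ck", V_hat, W, H) + eps)
        errors.append(float(np.linalg.norm(V - reconstruct(W, H, Q))))
    return W, H, Q, errors
```

In such a model, each row of Q holds the gains of the K components in one channel, which is what makes it possible to reason about where a component sits in the multichannel scene.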

The presentation will describe the research objectives of the project, the results obtained so far, and the results expected by the end of the project.