This seminar presents research undertaken by the Analysis/Synthesis team in the European project 3DTVS (3D TV Content Search). The project deals with multimodal search and indexing of 3D TV content, and IRCAM contributes algorithms that describe the multichannel audio scene. This rather ambitious objective is made tractable by focusing only on the detection of specific audio events.
Two complementary techniques are investigated in the project. The first approach is based on audio event detection using classification methods. The audio events considered are speech and music. We introduce a multichannel extension of the present classification system, “ircamclass”, and propose several information fusion strategies for the extended system. These are evaluated on a dataset of 4 films, and we show that they give better results than the baseline classification system applied to a mono down-mix of all channels.
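The abstract does not say which fusion strategies were evaluated, so the following is only a generic sketch of late fusion, not the ircamclass method. It assumes that each channel of a 5-channel mix has already been classified independently and that each classifier returns posterior probabilities for the labels "speech" and "music". The function names, the channel posteriors, and the three rules (mean, product, majority vote) are illustrative assumptions.

```python
from collections import Counter

def mean_fusion(posteriors):
    """Sum rule: average the per-channel posteriors for each class."""
    n = len(posteriors)
    return {c: sum(p[c] for p in posteriors) / n for c in posteriors[0]}

def product_fusion(posteriors):
    """Product rule: multiply per-channel posteriors, then renormalize."""
    fused = {c: 1.0 for c in posteriors[0]}
    for p in posteriors:
        for c in fused:
            fused[c] *= p[c]
    total = sum(fused.values())
    if total == 0.0:  # every class was vetoed by some channel
        return {c: 1.0 / len(fused) for c in fused}
    return {c: v / total for c, v in fused.items()}

def majority_vote(posteriors):
    """Each channel votes for its most likely class; the most common wins."""
    votes = Counter(max(p, key=p.get) for p in posteriors)
    return votes.most_common(1)[0][0]

def decide(fused):
    """Pick the class with the highest fused score."""
    return max(fused, key=fused.get)

# Hypothetical posteriors for one analysis frame (L, R, C, Ls, Rs).
# The center channel carries dialogue, the other channels mostly music.
channels = [
    {"speech": 0.20, "music": 0.80},  # L
    {"speech": 0.30, "music": 0.70},  # R
    {"speech": 0.95, "music": 0.05},  # C
    {"speech": 0.40, "music": 0.60},  # Ls
    {"speech": 0.45, "music": 0.55},  # Rs
]

print(decide(mean_fusion(channels)))
print(decide(product_fusion(channels)))
print(majority_vote(channels))
```

With these made-up numbers the rules disagree. The product rule lets the single confident center channel dominate, while the mean and the majority vote follow the four music-leaning channels. This is the kind of trade-off that choosing a fusion strategy involves.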
The second approach is based on extensions of nonnegative matrix factorization (NMF) algorithms to multichannel audio, resulting in nonnegative tensor factorization (NTF) and nonnegative tensor deconvolution (NTD). The NTD algorithm will be used in the project to detect, localize, and eventually separate the sources of selected audio events.
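To make the step from NMF to NTF concrete, here is a minimal sketch of one common NTF formulation, not necessarily the one used in the project. It assumes a channel × frequency × time tensor of nonnegative spectrogram magnitudes, modeled as V[c][f][t] ≈ Σ_k Q[c][k]·W[f][k]·H[k][t], and fitted with multiplicative updates for the squared Euclidean error. The names `Q` (channel gains), `W` (spectral templates), `H` (activations), and `ntf` are hypothetical, and the choice of cost function is an assumption. The convolutive NTD variant is not shown.

```python
import random

def reconstruct(Q, W, H):
    """Model tensor: V_hat[c][f][t] = sum_k Q[c][k] * W[f][k] * H[k][t]."""
    K = len(H)
    return [[[sum(Q[c][k] * W[f][k] * H[k][t] for k in range(K))
              for t in range(len(H[0]))]
             for f in range(len(W))]
            for c in range(len(Q))]

def squared_error(V, V_hat):
    """Sum of squared differences over all tensor entries."""
    return sum((v - vh) ** 2
               for Vc, Vhc in zip(V, V_hat)
               for Vf, Vhf in zip(Vc, Vhc)
               for v, vh in zip(Vf, Vhf))

def ntf(V, K, n_iter=100, seed=0, eps=1e-12):
    """Fit K nonnegative components with multiplicative updates."""
    rng = random.Random(seed)
    C, F, T = len(V), len(V[0]), len(V[0][0])
    # Strictly positive random initialization keeps all factors nonnegative.
    Q = [[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(C)]
    W = [[rng.uniform(0.1, 1.0) for _ in range(K)] for _ in range(F)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(T)] for _ in range(K)]
    errors = [squared_error(V, reconstruct(Q, W, H))]
    for _ in range(n_iter):
        # Update channel gains Q with W and H held fixed.
        Vh = reconstruct(Q, W, H)
        for c in range(C):
            for k in range(K):
                num = sum(V[c][f][t] * W[f][k] * H[k][t]
                          for f in range(F) for t in range(T))
                den = sum(Vh[c][f][t] * W[f][k] * H[k][t]
                          for f in range(F) for t in range(T))
                Q[c][k] *= num / (den + eps)
        # Update spectral templates W with Q and H held fixed.
        Vh = reconstruct(Q, W, H)
        for f in range(F):
            for k in range(K):
                num = sum(V[c][f][t] * Q[c][k] * H[k][t]
                          for c in range(C) for t in range(T))
                den = sum(Vh[c][f][t] * Q[c][k] * H[k][t]
                          for c in range(C) for t in range(T))
                W[f][k] *= num / (den + eps)
        # Update temporal activations H with Q and W held fixed.
        Vh = reconstruct(Q, W, H)
        for k in range(K):
            for t in range(T):
                num = sum(V[c][f][t] * Q[c][k] * W[f][k]
                          for c in range(C) for f in range(F))
                den = sum(Vh[c][f][t] * Q[c][k] * W[f][k]
                          for c in range(C) for f in range(F))
                H[k][t] *= num / (den + eps)
        errors.append(squared_error(V, reconstruct(Q, W, H)))
    return Q, W, H, errors

# Toy 2-channel, 3-bin, 4-frame tensor built from two known components.
Q_true = [[1.0, 0.2], [0.3, 1.0]]
W_true = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
H_true = [[1.0, 0.0, 1.0, 0.5], [0.0, 1.0, 1.0, 0.5]]
V = reconstruct(Q_true, W_true, H_true)
Q, W, H, errors = ntf(V, K=2)
print(errors[0], errors[-1])
```

In this model the rows of `H` indicate when a component is active, which is what detection would use, and the columns of `Q` indicate how it is distributed across channels, which relates to localization. Pure Python lists are used only to keep the sketch dependency-free. A practical implementation would use an array library.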
The presentation will describe the research objectives of the project, the results obtained so far, and an outlook on the results expected by the end of the project.