Authors
Danny Wyatt, Tanzeem Choudhury, Jeff A Bilmes, Henry A Kautz
Publication date
2007/1/6
Journal
IJCAI
Volume
7
Pages
1769-1775
Description
In this paper we introduce a new dynamic Bayesian network that separates the speakers and their speaking turns in a multi-person conversation. We protect the speakers’ privacy by using only features from which intelligible speech cannot be reconstructed. The model we present combines data from multiple audio streams, segments the streams into speech and silence, separates the different speakers, and detects when other nearby individuals who are not wearing microphones are speaking. No pre-trained speaker-specific models are used, so the system can be easily applied in new and different environments. We show promising results on two very different datasets that vary in background noise, microphone placement and quality, and conversational dynamics.
Total citations
[Citations-per-year chart, 2007 to 2024]