Authors
Ivan Magrin-Chagnolleau, Aaron E Rosenberg, Sarangarajan Parthasarathy
Publication date
1999/3/15
Conference
1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No. 99CH36258)
Volume
2
Pages
821-824
Publisher
IEEE
Description
The problem of speaker detection in audio databases is addressed in this paper. Gaussian mixture modeling is used to build target speaker and background models. A detection algorithm based on a likelihood ratio calculation is applied to estimate target speaker segments. Evaluation procedures are defined in detail for this task. Results are given for different subsets of the HUB4 broadcast news database. For one target speaker, with the data restricted to high quality speech segments, the segment miss rate is approximately 7%. For unrestricted data, the segment miss rate is approximately 27%. In both cases the segment false alarm rate is 4 or 5 per hour. For two target speakers with unrestricted data, the segment miss rate is approximately 63% with about 27 segment false alarms per hour. The decrease in performance for two target speakers is largely associated with short speech segments in the two target …
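The likelihood-ratio detection step mentioned in the description can be illustrated with a minimal sketch. The abstract does not give the paper's actual algorithm, so everything here is an assumption made for illustration: the diagonal-covariance mixtures, the moving-average smoothing, the window length, the zero threshold, and the function names `gmm_loglik` and `detect_frames`.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means, variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, K, D)
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                  # Mahalanobis term
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)      # normalizer
    )                                                          # (T, K)
    log_weighted = np.log(weights) + log_comp
    # log-sum-exp over mixture components, for numerical stability
    peak = log_weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(log_weighted - peak).sum(axis=1, keepdims=True)))[:, 0]

def detect_frames(frames, target, background, threshold=0.0, window=5):
    """Flag frames whose smoothed log-likelihood ratio exceeds a threshold.

    target and background are (weights, means, variances) tuples.
    The smoothing window and threshold are hypothetical choices.
    """
    llr = gmm_loglik(frames, *target) - gmm_loglik(frames, *background)
    kernel = np.ones(window) / window
    smoothed = np.convolve(llr, kernel, mode="same")           # moving average
    return smoothed > threshold

# Toy one-dimensional example: 20 background-like frames, then 20 target-like frames.
target = (np.array([1.0]), np.array([[2.0]]), np.array([[1.0]]))
background = (np.array([1.0]), np.array([[-2.0]]), np.array([[1.0]]))
frames = np.concatenate([np.full((20, 1), -2.0), np.full((20, 1), 2.0)])
decisions = detect_frames(frames, target, background)
```

Runs of consecutive flagged frames would then be read as candidate target speaker segments. Comparing those segments with reference segments is what would yield figures such as a segment miss rate or false alarms per hour, although the evaluation procedure itself is defined in the paper and not reproduced here.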