Authors
Yun Li, Zhang Liu, Yueyue Na, Ziteng Wang, Biao Tian, Qiang Fu
Publication date
2020/4/21
Conference
ICASSP 2020 - 45th IEEE International Conference on Acoustics, Speech and Signal Processing
Pages
4442-4446
Description
Separating the target speech in a multi-talker noisy environment is a challenging problem for audio-only source separation algorithms. The major problem is that the separated speech from the same talker can switch among the outputs across consecutive segments, causing the talker permutation issue. In this paper, we deploy face tracking and propose low-dimension hand-crafted visual features and low-cost deep fusion architectures to separate unseen but visible target sources in a multi-talker noisy environment. We show that our approach not only addresses the talker permutation issue but also yields additional separation improvement on challenging mixtures, such as same-gender overlapping speech, on the public dataset. We also show a significant improvement in target speech recognition on the simulated real-world dataset. Our training is …
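The talker permutation issue described above can be illustrated with a minimal permutation-invariant loss sketch. This is a generic PIT-style example for intuition only, not the paper's actual method (which uses visual cues rather than a permutation search); the function name and toy signals are hypothetical:

```python
import itertools
import numpy as np

def pit_mse(est, ref):
    """Permutation-invariant MSE: try every output-to-talker assignment
    and keep the lowest error (illustrative sketch, not the paper's loss)."""
    n = est.shape[0]
    return min(
        np.mean((est[list(p)] - ref) ** 2)
        for p in itertools.permutations(range(n))
    )

# Two talkers; the separator's outputs come back swapped for this segment,
# which is exactly the talker permutation ambiguity.
ref = np.array([[1.0, 1.0, 1.0],
                [-1.0, -1.0, -1.0]])
swapped = ref[::-1]

naive = np.mean((swapped - ref) ** 2)  # penalizes the swap heavily
pit = pit_mse(swapped, ref)            # re-aligns outputs before scoring
print(naive, pit)
```

A naive per-output loss punishes a mere relabeling of talkers, while the permutation-invariant version scores the best assignment; the paper's contribution is to resolve this ambiguity with visual (face-tracking) information instead of an output-side permutation search.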
Total citations
2021: 1, 2022: 1, 2023: 1, 2024: 2
Scholar articles
Y Li, Z Liu, Y Na, Z Wang, B Tian, Q Fu - ICASSP 2020-2020 IEEE International Conference on …, 2020