Authors
Yun Li, Zhang Liu, Yueyue Na, Ziteng Wang, Biao Tian, Qiang Fu
Publication date
2020/4/21
Conference
ICASSP 2020 - 45th IEEE International Conference on Acoustics, Speech and Signal Processing
Pages
4442-4446
Description
Separating the target speech in a multi-talker noisy environment is a challenging problem for audio-only source separation algorithms. The major problem is that the separated speech from the same talker can switch among the outputs across consecutive segments, causing the talker permutation issue. In this paper, we deploy face tracking and propose low-dimension hand-crafted visual features and low-cost deep fusion architectures to separate unseen but visible target sources in a multi-talker noisy environment. We show that our approach not only addresses the talker permutation issue but also yields additional separation improvement on challenging mixtures, such as same-gender overlapping speech, on the public dataset. We also show a significant improvement in target speech recognition on the simulated real-world dataset. Our training is …
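The talker permutation issue described above can be illustrated with a minimal permutation-invariant loss sketch. This is a generic PIT-style example for intuition only, not the paper's actual method (which uses visual cues rather than a permutation search); the function name and toy signals are hypothetical:

```python
import itertools
import numpy as np

def pit_mse(est, ref):
    """Permutation-invariant MSE: try every output-to-talker assignment
    and keep the lowest error (illustrative sketch, not the paper's loss)."""
    n = est.shape[0]
    return min(
        np.mean((est[list(p)] - ref) ** 2)
        for p in itertools.permutations(range(n))
    )

# Two talkers; the separator's outputs come back swapped for this segment,
# which is exactly the talker permutation ambiguity.
ref = np.array([[1.0, 1.0, 1.0],
                [-1.0, -1.0, -1.0]])
swapped = ref[::-1]

naive = np.mean((swapped - ref) ** 2)  # penalizes the swap heavily
pit = pit_mse(swapped, ref)            # re-aligns outputs before scoring
print(naive, pit)
```

A naive per-output loss punishes a mere relabeling of talkers, while the permutation-invariant version scores the best assignment; the paper's contribution is to resolve this ambiguity with visual (face-tracking) information instead of an output-side permutation search.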
Total citations
2021: 1, 2022: 1, 2023: 1, 2024: 2
Scholar articles
Y Li, Z Liu, Y Na, Z Wang, B Tian, Q Fu - ICASSP 2020-2020 IEEE International Conference on …, 2020