Authors
Tiantian Liu, Chao Wang, Zhengxiong Li, Ming-Chun Huang, Wenyao Xu, Feng Lin
Publication date
2024/5/11
Journal
ACM Transactions on Sensor Networks
Volume
20
Issue
4
Pages
1-29
Publisher
ACM
Description
As automatic speech recognition evolves, deployment of the voice user interface (VUI) has expanded rapidly. Since the COVID-19 pandemic in particular, the VUI has gained more attention in online communication owing to its contactless nature. However, the VUI is difficult to apply in public settings because ambient noise degrades the received audio signals. In this article, we propose Wavoice, the first noise-resistant multi-modal speech recognition system that fuses two distinct voice-sensing modalities (i.e., millimeter-wave signals and audio signals from a microphone). One key contribution is a model of the inherent correlation between millimeter-wave and audio signals. Based on this correlation, Wavoice enables real-time, noise-resistant voice activity detection and user targeting among multiple speakers. Additionally, we elaborate on two novel modules for multi-modal fusion …
Total citations
Per-year citation chart (years 2023, 2024)