Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset

Authors

Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li

Publication date

2020/10/22

Journal

ICASSP 2021

Description

Recently, Transformer based end-to-end models have achieved great success in many areas including speech recognition. However, compared to LSTM models, the heavy computational cost of the Transformer during inference is a key issue to prevent their applications. In this work, we explored the potential of Transformer Transducer (T-T) models for the first pass decoding with low latency and fast speed on a large-scale dataset. We combine the idea of Transformer-XL and chunk-wise streaming processing to design a streamable Transformer Transducer model. We demonstrate that T-T outperforms the hybrid model, RNN Transducer (RNN-T), and streamable Transformer attention-based encoder-decoder model in the streaming scenario. Furthermore, the runtime cost and latency can be optimized with a relatively small look-ahead.

Total citations

Cited by 186

2020: 1, 2021: 21, 2022: 51, 2023: 80, 2024: 33

Scholar articles

Developing real-time streaming transformer transducer for speech recognition on large-scale dataset

X Chen, Y Wu, Z Wang, S Liu, J Li - ICASSP 2021-2021 IEEE International Conference on …, 2021