Importance of high-order n-gram models in morph-based speech recognition

Authors

Teemu Hirsimäki, Janne Pylkkönen, Mikko Kurimo

Publication date

2009/3/27

Journal

IEEE Transactions on Audio, Speech, and Language Processing

Pages

724-732

Publisher

IEEE

Description

Speech recognition systems trained for morphologically rich languages face the problem of vocabulary growth caused by prefixes, suffixes, inflections, and compound words. Solutions proposed in the literature include increasing the size of the vocabulary and segmenting words into morphs. However, in many cases, the methods have only been experimented with low-order n-gram models or compared to word-based models that do not have very large vocabularies. In this paper, we study the importance of using high-order variable-length n-gram models when the language models are trained over morphs instead of whole words. Language models trained on a very large vocabulary are compared with models based on different morph segmentations. Speech recognition experiments are carried out on two highly inflecting and agglutinative languages, Finnish and Estonian. The results suggest that high-order …

Total citations

Cited by 141

Citations per year: 2008: 1, 2009: 8, 2010: 14, 2011: 6, 2012: 17, 2013: 11, 2014: 11, 2015: 15, 2016: 8, 2017: 15, 2018: 4, 2019: 4, 2020: 7, 2021: 7, 2022: 8, 2023: 2, 2024: 1
