Authors
Noah B Coccaro
Publication date
2005
Institution
University of Colorado at Boulder
Description
This thesis explores the use of Latent Semantic Analysis (LSA) to augment an N-gram language model and improve the accuracy of a large-vocabulary speech recognition system. It discusses possible solutions to three problems that arise when integrating LSA with an N-gram model. First, two approaches to deriving a probability from a semantic distance are examined. Numerous parameters are introduced and optimal values found. Second, because the N-gram and LSA models have different strengths, it is necessary to develop confidence metrics that indicate when to rely more strongly on a particular model. Several confidence metrics are developed and used. Lastly, the problem of combining the two probability models is explored. Several approaches to combining the models, including a geometric mean and a decision tree, were evaluated. Experimental results compared to a standard trigram model …
Total citations
[Chart: citations per year, 2009 to 2018]