Authors
Danish Contractor, Tanveer A Faruquie, L Venkata Subramaniam
Publication date
2010/8
Conference
Coling 2010: Posters
Pages
189-196
Description
In this paper we look at the problem of cleansing noisy text using a statistical machine translation model. Noisy text is produced in informal communication such as Short Message Service (SMS), Twitter and chat. A typical statistical machine translation (SMT) system is trained on parallel text made up of pairs of noisy and clean sentences. We instead propose an unsupervised method for translating noisy text into clean text, one that does not require such parallel data. Our method has two steps. First, for a given noisy sentence, a weighted list of possible clean tokens is obtained for each noisy token. Second, the clean sentence is obtained by maximizing the product of the weights in these lists and the language model scores.
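The two-step procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy vocabulary, the bigram probabilities, the use of `difflib.SequenceMatcher.ratio()` as the token-similarity weight, the top-k cutoff, and the helper names `candidate_list` and `clean_sentence` are all hypothetical. Viterbi search over a bigram lattice is one standard way to maximize the product of per-token weights and language model scores. The product is computed as a sum of logarithms to avoid numerical underflow.

```python
import math
from difflib import SequenceMatcher

# Hypothetical toy vocabulary of clean words (an assumption for illustration).
VOCAB = ["how", "are", "you", "today", "day", "hair", "yew", "ear"]

# Hypothetical toy bigram language model: P(word | previous word).
# Unseen bigrams fall back to a small floor probability.
BIGRAM_PROB = {
    ("<s>", "how"): 0.5,
    ("how", "are"): 0.6,
    ("are", "you"): 0.7,
    ("you", "today"): 0.4,
    ("you", "</s>"): 0.3,
    ("today", "</s>"): 0.6,
}
UNSEEN_PROB = 1e-4


def candidate_list(noisy_token, vocab, k=3):
    """Step 1: weighted list of up to k clean candidates for one noisy token.

    The weight is difflib's similarity ratio, an assumed stand-in for
    whatever token-similarity measure a real system would use.
    """
    scored = [(w, SequenceMatcher(None, noisy_token, w).ratio()) for w in vocab]
    scored = [(w, s) for w, s in scored if s > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep the noisy token itself if nothing in the vocabulary resembles it.
    return scored[:k] or [(noisy_token, 1.0)]


def bigram_logprob(prev, word):
    """Log P(word | prev) under the toy bigram model."""
    return math.log(BIGRAM_PROB.get((prev, word), UNSEEN_PROB))


def clean_sentence(noisy_tokens, vocab):
    """Step 2: Viterbi search for the clean sentence that maximizes the
    product of candidate weights and bigram LM probabilities (as a sum of logs)."""
    # best maps the last chosen word to (log score, path so far).
    best = {"<s>": (0.0, [])}
    for token in noisy_tokens:
        new_best = {}
        for word, weight in candidate_list(token, vocab):
            # Extend every surviving path by this candidate; keep the best one.
            new_best[word] = max(
                (score + math.log(weight) + bigram_logprob(prev, word), path + [word])
                for prev, (score, path) in best.items()
            )
        best = new_best
    # Close the sentence with an end-of-sentence transition.
    _, best_path = max(
        (score + bigram_logprob(prev, "</s>"), path)
        for prev, (score, path) in best.items()
    )
    return " ".join(best_path)


print(clean_sentence(["hw", "r", "u", "2day"], VOCAB))  # how are you today
```

On the toy input `hw r u 2day`, the similarity weight alone prefers `day` for `2day` (0.86 versus 0.67 for `today`), but the bigram score for `you today` makes `today` win. This interplay between the weighted candidate lists and the language model is what the second step relies on.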
Total citations
[Chart: citations per year, 2010 to 2023]