Unsupervised discovery of morphemes

Authors

Mathias Creutz, Krista Lagus

Publication date

2002/5/21

Journal

arXiv preprint cs/0205057

Description

We present two methods for unsupervised segmentation of words into morpheme-like units. The model utilized is especially suited for languages with a rich morphology, such as Finnish. The first method is based on the Minimum Description Length (MDL) principle and works online. In the second method, Maximum Likelihood (ML) optimization is used. The quality of the segmentations is measured using an evaluation method that compares the segmentations produced to an existing morphological analysis. Experiments on both Finnish and English corpora show that the presented methods perform well compared to a current state-of-the-art system.
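To make the MDL idea in the description concrete, here is a minimal sketch of a two-part code length: the cost of spelling out the morph lexicon plus the cost of encoding the corpus as a sequence of morphs. It is an illustration under stated assumptions, not the authors' exact cost function. The function name `description_length`, the flat `bits_per_letter` value, and the toy Finnish word list are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def description_length(segmented_words, bits_per_letter=5.0):
    """Return a two-part code length in bits for a segmented corpus.

    Assumption: letters cost a flat `bits_per_letter` each, and every
    lexicon entry pays for one extra end-of-morph symbol.
    """
    tokens = [morph for word in segmented_words for morph in word]
    counts = Counter(tokens)
    total = len(tokens)
    # Corpus cost: each morph token is coded with -log2 of its relative frequency.
    corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
    # Lexicon cost: each distinct morph is spelled out once, letter by letter.
    lexicon_cost = sum((len(morph) + 1) * bits_per_letter for morph in counts)
    return lexicon_cost + corpus_cost


# Toy corpus (hypothetical): the same six word forms, unsplit and split.
unsplit = [["talo"], ["talossa"], ["taloissa"],
           ["auto"], ["autossa"], ["autoissa"]]
split = [["talo"], ["talo", "ssa"], ["talo", "i", "ssa"],
         ["auto"], ["auto", "ssa"], ["auto", "i", "ssa"]]

print(f"unsplit: {description_length(unsplit):.1f} bits")
print(f"split:   {description_length(split):.1f} bits")
```

On this toy data the split version is cheaper, because the shared morphs are stored once in the lexicon instead of being spelled out inside every word form. A search procedure would compare candidate splits by this kind of cost and keep the cheaper one.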

Total citations

Cited by 484

Citations per year: 2003: 8, 2004: 14, 2005: 18, 2006: 22, 2007: 22, 2008: 13, 2009: 27, 2010: 25, 2011: 15, 2012: 16, 2013: 20, 2014: 24, 2015: 16, 2016: 35, 2017: 26, 2018: 34, 2019: 20, 2020: 36, 2021: 39, 2022: 24, 2023: 16, 2024: 9
