Induction of a simple morphology for highly-inflecting languages

Authors

Mathias Johan Philip Creutz, Krista Hannele Lagus

Publication date

2004

Conference

7th Meeting of the ACL Special Interest Group in Computational Phonology (SIGPHON)

Pages

43-51

Description

This paper presents an algorithm for the unsupervised learning of a simple morphology of a natural language from raw text. A generative probabilistic model is applied to segment word forms into morphs. Each morph is assumed to be generated by one of three categories, namely prefix, suffix, or stem, and we make use of some observed asymmetries between these categories. The model learns a word structure in which words may consist of long sequences of alternating stems and affixes, which makes the model suitable for highly-inflecting languages. The ability of the algorithm to find real morpheme boundaries is evaluated against a gold standard for both Finnish and English. Compared with a state-of-the-art algorithm, the new algorithm performs better on the Finnish data and at a roughly equal level on the English data.
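The word structure described above can be illustrated with a minimal sketch. It assumes one plausible reading of "alternating stems and affixes": a word is one or more groups, each made of optional prefixes, exactly one stem, and optional suffixes. The one-letter codes, the pattern, and the function name `is_valid_structure` are hypothetical and are not taken from the paper; the sketch only checks whether a sequence of category labels is well formed and does not implement the probabilistic model.

```python
import re

# Hypothetical one-letter codes for the three morph categories in the abstract.
CODES = {"prefix": "P", "stem": "S", "suffix": "X"}

# Assumed word grammar: one or more groups of (prefixes, one stem, suffixes).
WORD_PATTERN = re.compile(r"(?:P*SX*)+")


def is_valid_structure(categories):
    """Return True if the category sequence fits the assumed word grammar."""
    encoded = "".join(CODES[c] for c in categories)
    return WORD_PATTERN.fullmatch(encoded) is not None
```

Under this assumption, a compound-like sequence such as `["stem", "suffix", "stem", "suffix"]` is accepted, while a sequence that begins with a suffix or contains no stem is rejected.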

Total citations

Cited by 97

Citations per year

2004: 1, 2005: 11, 2006: 12, 2007: 6, 2008: 5, 2009: 3, 2010: 7, 2011: 7, 2012: 1, 2013: 13, 2014: 3, 2015: 3, 2016: 7, 2017: 4, 2018: 3, 2019: 2, 2020: 2, 2021: 1, 2022: 4, 2023: 1, 2024: 1
