Inducing the morphological lexicon of a natural language from unannotated text

Authors

Mathias Johan Philip Creutz, Krista Hannele Lagus

Publication date

2005/6

Conference

International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR'05)

Pages

106-113

Description

This work presents an algorithm for the unsupervised learning, or induction, of a simple morphology of a natural language. It uses a probabilistic maximum a posteriori model that builds hierarchical representations for a set of morphs, the morpheme-like units discovered from unannotated text corpora. The induced morph lexicon stores parameters related to both the “meaning” and the “form” of the morphs it contains, and these parameters affect the role of the morphs in words. The model is applied to the task of unsupervised morpheme segmentation of Finnish and English words. Very good results are obtained for Finnish, and almost as good results for English.

Total citations

Cited by 250

Citations per year: 2005: 4, 2006: 7, 2007: 14, 2008: 8, 2009: 22, 2010: 25, 2011: 16, 2012: 23, 2013: 21, 2014: 18, 2015: 19, 2016: 17, 2017: 13, 2018: 10, 2019: 8, 2020: 8, 2021: 5, 2022: 4, 2023: 3, 2024: 2
