Authors
Mathias Creutz, Krista Lagus
Publication date
2005
Book
Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0
Publisher
Helsinki University of Technology
Description
In this work, we describe the first public version of the Morfessor software, which is a program that takes as input a corpus of unannotated text and produces a segmentation of the word forms observed in the text. The segmentation obtained often resembles a linguistic morpheme segmentation. Morfessor is not language-dependent. The number of segments per word is not restricted to two or three as in some other existing morphology learning models. The current version of the software essentially implements two morpheme segmentation models presented earlier by us (Creutz and Lagus, 2002; Creutz, 2003).
The document contains user’s instructions, as well as the mathematical formulation of the model and a description of the search algorithm used. Additionally, a few experiments on Finnish and English text corpora are reported in order to give the user some ideas of how to apply the program to his own data sets and how to evaluate the results.
Total citations
[Chart: citations per year, 2005 to 2024]