Automatic extraction of Arabic multi-word terms

Authors

Khalid Al Khatib, Amer Badarneh

Publication date

2010/10/18

Conference

Proceedings of the International Multiconference on Computer Science and Information Technology

Pages

411-418

Publisher

IEEE

Description

Whereas a wide range of methods has been applied to English multi-word term (MWT) extraction, relatively few studies have addressed Arabic MWT extraction. In this paper, we present an efficient approach for the automatic extraction of Arabic MWTs. The approach relies on two main filtering steps: a linguistic filter, in which a simple part-of-speech (POS) tagger is used to extract candidate MWTs matching given syntactic patterns, and a statistical filter, in which two statistical methods (log-likelihood ratio and C-value) are used to rank the candidate MWTs. Many types of variation (e.g. inflectional variants) are taken into consideration to improve the quality of the extracted MWTs. In our experiments on an environment-domain corpus, we obtained promising results in both coverage and precision of MWT extraction.
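
The two ranking measures named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the textbook forms of Dunning's log-likelihood ratio (for a 2x2 bigram contingency table) and the C-value measure, it assumes candidate terms have already passed a POS-pattern filter, and it uses hypothetical English placeholder tokens and counts in place of Arabic data. The function names `log_likelihood_ratio` and `c_value` are illustrative only.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: count of (w1, w2) together; k12: w1 without w2;
    k21: w2 without w1;             k22: neither word.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def _contains(longer, shorter):
    """True if `shorter` occurs as a contiguous token run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Score candidates given a dict {token tuple: corpus frequency}.

    Assumed simplification: the set of longer terms nesting a candidate is
    found by direct containment, using raw frequencies.
    """
    scores = {}
    for a, f_a in freq.items():
        nesting = [f_b for b, f_b in freq.items()
                   if len(b) > len(a) and _contains(b, a)]
        if nesting:
            # Discount occurrences that are explained by longer terms.
            f_a = f_a - sum(nesting) / len(nesting)
        scores[a] = math.log2(len(a)) * f_a
    return scores


# Hypothetical candidate frequencies (placeholders, not data from the paper).
candidates = {("air", "pollution"): 5, ("air", "pollution", "control"): 2}
ranked = sorted(c_value(candidates).items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch a candidate that mostly appears inside longer candidates is penalised by C-value, while the log-likelihood ratio measures how strongly two words co-occur beyond chance; how the paper combines or thresholds the two scores is not stated in the description.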

Total citations

Cited by 33

