Authors
Rehab Duwairi, Mohammad Al-Refai, Natheer Khasawneh
Publication date
2007/11/18
Conference
2007 Innovations in Information Technologies (IIT)
Pages
446-450
Publisher
IEEE
Description
This paper compares and contrasts two feature selection techniques applied to an Arabic corpus: stemming and light stemming. With stemming, words are reduced to their stems. With light stemming, words are reduced to their light stems. Stemming is aggressive in the sense that it reduces words to their three-letter roots. This affects the semantics, as several words with different meanings might share the same root. Light stemming, by comparison, removes frequently used prefixes and suffixes from Arabic words. It does not produce the root and therefore does not affect the semantics of words; it maps several words that have the same meaning to a common syntactical form. The effectiveness of these two feature selection techniques was assessed in a text categorization exercise on an Arabic corpus. This corpus consists of 15000 documents that fall into three categories. The K …
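To make the idea of light stemming concrete, here is a minimal sketch in Python. It is not the authors' implementation: the affix lists, the `MIN_STEM_LEN` threshold, and the function name `light_stem` are all assumptions chosen for illustration, and the description above does not say which prefixes and suffixes the paper actually removes.

```python
# Illustrative affix lists (assumptions, NOT taken from the paper).
# Longer prefixes come first so that "وال" is tried before "ال" or "و".
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]

# Assumed lower bound: never strip an affix if fewer than 3 letters remain.
MIN_STEM_LEN = 3


def light_stem(word: str) -> str:
    """Strip at most one listed prefix and one listed suffix from a word.

    Unlike root-based stemming, this never tries to recover the
    three-letter root; it only trims common affixes.
    """
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break  # remove only one prefix

    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break  # remove only one suffix

    return word
```

Under these assumed lists, `light_stem("الكتاب")` and `light_stem("والكتاب")` both return `"كتاب"`, so two surface forms of the same word collapse to one feature, while a short word such as `"كتب"` is left unchanged. A root-based stemmer would go further and could map words with different meanings to the same three-letter root, which is the loss of semantics the description refers to.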
Total citations
[Citations-per-year chart, 2010 to 2024; individual yearly counts are not recoverable from the extracted text]
Scholar articles
R Duwairi, M Al-Refai, N Khasawneh - 2007 Innovations in Information Technologies (IIT), 2007