Authors
Thabit Sabbah, Ali Selamat, Md Hafiz Selamat, Fawaz S Al-Anzi, Enrique Herrera Viedma, Ondrej Krejcar, Hamido Fujita
Publication date
2017/9/1
Journal
Applied Soft Computing
Volume
58
Pages
193-206
Publisher
Elsevier
Description
With the rapid growth of textual content on the Internet, automatic text categorization has become a comparatively more effective solution for information organization and knowledge management. Feature selection, one of the basic phases in statistical text categorization, depends crucially on the term weighting method. To improve text categorization performance, this paper proposes four modified frequency-based term weighting schemes, namely mTF, mTFIDF, TFmIDF, and mTFmIDF. When calculating the weight of the terms present in a document, the proposed schemes also take into account the number of missing terms. The proposed schemes achieve the highest performance with an SVM classifier, reaching a micro-averaged F1 of 97%. Moreover, benchmarking results on the Reuters-21578, 20Newsgroups, and WebKB text classification datasets, using different classification algorithms such as …
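For reference, the classic TF and TF-IDF baselines that the proposed schemes modify can be sketched as follows. This is a minimal illustration that assumes raw counts for TF and the unsmoothed IDF variant log(N / df). It does not implement mTF, mTFIDF, TFmIDF, or mTFmIDF, whose formulas are not given in this description. The function names and toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf(doc_tokens):
    """Raw term frequency: how often each term occurs in one document."""
    return Counter(doc_tokens)

def idf(corpus):
    """Unsmoothed inverse document frequency, log(N / df(t)) (an assumed variant)."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tfidf(corpus):
    """Weight each term in each document by tf(t, d) * idf(t)."""
    idf_scores = idf(corpus)
    return [{t: f * idf_scores[t] for t, f in tf(doc).items()} for doc in corpus]

# Hypothetical toy corpus of pre-tokenized documents.
corpus = [
    ["grain", "wheat", "export", "wheat"],
    ["oil", "crude", "export"],
    ["wheat", "oil", "price"],
]
weights = tfidf(corpus)
print(round(weights[0]["wheat"], 4))  # 2 * log(3 / 2) -> 0.8109
print(round(weights[0]["grain"], 4))  # 1 * log(3 / 1) -> 1.0986
```

Note that only terms present in a document receive a weight here. Per the description above, the modified schemes differ in that they also factor in the number of missing terms.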
Total citations
[Chart: citations per year, 2017 to 2024]