A Benchmark for Text Classification in News Recommendations.

Authors

Xinyi Li, Edward C Malthouse

Publication date

2022

Conference

INRA/IWILDS@SIGIR

Pages

28-37

Description

Text classification is an important task in natural language processing. People now obtain most of their information from online news sources, so an automatic and accurate news classifier is needed to categorize each day's news stories and help readers find articles of interest more easily. We use news story data from the McClatchy organization to establish benchmarks on how accurately stories can be classified by multiple existing deep learning classifiers. Among the models we evaluated, Bidirectional Encoder Representations from Transformers (BERT) provides the best accuracy, macro-averaged precision, micro-averaged precision, macro-averaged recall, and micro-averaged recall. Unlike many other benchmark news data sets, McClatchy provides both the headline and the full text of each news story. We compare the performance of every deep learning-based classifier using headlines versus full texts: the top three predicted categories include the labeled value 95% of the time when training on full texts and 92% when training on headlines only. Furthermore, the defined topics in McClatchy are not mutually exclusive, and some predictions identified as inaccurate are in fact classified into reasonable topics. We further provide a visualization of stories from various defined topics. The predicted results and the visualization of news stories illustrate the untrustworthiness of the labeled classes and the intrinsic difficulty of categorizing news stories.
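
The metrics named in the description can be illustrated with a minimal sketch. It is not taken from the paper: the function names `precision_recall` and `top_k_accuracy` are hypothetical, the labels and predictions are invented toy data, and the standard single-label definitions of macro averaging (mean of per-class scores), micro averaging (pooled counts), and top-k accuracy are assumed.

```python
from collections import Counter


def precision_recall(y_true, y_pred):
    """Macro- and micro-averaged precision and recall for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def ratio(num, den):
        # Assumption: a class with no predictions (or no examples) scores 0.
        return num / den if den else 0.0

    # Macro: average the per-class scores, so every class counts equally.
    macro_p = sum(ratio(tp[c], tp[c] + fp[c]) for c in labels) / len(labels)
    macro_r = sum(ratio(tp[c], tp[c] + fn[c]) for c in labels) / len(labels)

    # Micro: pool the counts over all classes, so every story counts equally.
    total_tp = sum(tp.values())
    micro_p = ratio(total_tp, total_tp + sum(fp.values()))
    micro_r = ratio(total_tp, total_tp + sum(fn.values()))
    return {"macro_p": macro_p, "macro_r": macro_r,
            "micro_p": micro_p, "micro_r": micro_r}


def top_k_accuracy(y_true, ranked_preds, k=3):
    """Fraction of stories whose label appears among the k highest-ranked predictions."""
    hits = sum(1 for truth, ranked in zip(y_true, ranked_preds) if truth in ranked[:k])
    return hits / len(y_true)


# Invented toy data: four stories, three hypothetical topics.
y_true = ["sports", "sports", "politics", "business"]
y_pred = ["sports", "politics", "politics", "sports"]
ranked = [
    ["sports", "politics", "business"],
    ["politics", "business", "sports"],
    ["politics", "sports", "business"],
    ["sports", "politics", "business"],
]

scores = precision_recall(y_true, y_pred)
top2 = top_k_accuracy(y_true, ranked, k=2)
top3 = top_k_accuracy(y_true, ranked, k=3)
print(scores, top2, top3)
```

In this toy set the macro-averaged precision is lower than the micro-averaged precision because the never-predicted "business" class contributes a per-class score of zero, which shows why the two averages can diverge when topics are imbalanced.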
