Ontology-based text document clustering

Authors

Andreas Hotho, Alexander Maedche, Steffen Staab

Publication date

2002/4

Journal

KI

Pages

48-54

Description

Text clustering typically involves clustering in a high-dimensional space, which is difficult in virtually all practical settings. In addition, given a particular clustering result, it is typically very hard to explain why the text clusters have been constructed the way they are. In this paper, we propose a new approach for applying background knowledge during preprocessing in order to improve clustering results and allow for selection between results. We preprocess our input data by applying ontology-based heuristics for feature selection and feature aggregation. Thus, we construct a number of alternative text representations. Based on these representations, we compute multiple clustering results using K-Means. The results may be distinguished and explained by the corresponding selection of concepts in the ontology. Our results compare favourably with a sophisticated baseline preprocessing strategy.
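
The general idea in the description (aggregate term features into ontology concepts, then run K-Means on each alternative representation) can be illustrated with a minimal Python sketch. This is not the authors' heuristic: the toy ontology, the four documents, and the helper names `term_vector`, `concept_vector`, `to_dense` and `kmeans` are all hypothetical, and the K-Means shown is a plain Lloyd's iteration with a deterministic initialisation chosen only to keep the example reproducible.

```python
from collections import Counter
import math

# Hypothetical toy ontology: each term maps to one more general concept.
TOY_ONTOLOGY = {
    "dog": "animal", "cat": "animal", "horse": "animal",
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
}

def term_vector(text):
    """Bag-of-words counts over lowercase, whitespace-separated tokens."""
    return Counter(text.lower().split())

def concept_vector(text, ontology):
    """Aggregate term counts into concept counts; terms outside the ontology are dropped."""
    counts = Counter()
    for term, n in term_vector(text).items():
        if term in ontology:
            counts[ontology[term]] += n
    return counts

def to_dense(vectors):
    """Convert sparse Counters into equal-length lists over a sorted shared vocabulary."""
    vocab = sorted(set().union(*vectors))
    return [[float(v[f]) for f in vocab] for v in vectors], vocab

def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-Means; the first k points serve as initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members, if it has any.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical documents, used only for illustration.
docs = ["dog cat dog", "car truck", "horse cat", "bus car bus"]

# Representation 1: raw terms. Representation 2: ontology concepts.
term_points, term_vocab = to_dense([term_vector(d) for d in docs])
concept_points, concept_vocab = to_dense([concept_vector(d, TOY_ONTOLOGY) for d in docs])

term_labels = kmeans(term_points, k=2)
concept_labels = kmeans(concept_points, k=2)

print("term features:   ", term_vocab, term_labels)
print("concept features:", concept_vocab, concept_labels)
```

On this toy input the concept-level representation groups the two animal documents together and the two vehicle documents together (labels `[0, 1, 0, 1]`), while the term-level representation yields `[0, 1, 1, 1]`. Each result can be described by the features it was built on, which is the kind of explanation the description refers to, although the outcome here depends entirely on the hand-picked data and initialisation.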

Total citations

Cited by 451

Citations per year: 2001: 9, 2002: 12, 2003: 11, 2004: 15, 2005: 20, 2006: 18, 2007: 21, 2008: 30, 2009: 18, 2010: 34, 2011: 30, 2012: 36, 2013: 35, 2014: 25, 2015: 26, 2016: 27, 2017: 13, 2018: 14, 2019: 13, 2020: 16, 2021: 12, 2022: 5, 2023: 7, 2024: 2

Scholar articles

Ontology-based text document clustering

A Hotho, A Maedche, S Staab - KI, 2002