Authors
Derek Greene, Derek O’Callaghan, Pádraig Cunningham
Publication date
2014
Conference
Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2014, Nancy, France, September 15-19, 2014. Proceedings, Part I 14
Pages
498-513
Publisher
Springer Berlin Heidelberg
Description
Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the “over-clustering” of a corpus into many small, highly similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can …
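The stability idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each topic is given as a ranked list of terms, compares a reference model with models fitted on perturbed data using plain Jaccard overlap of the top terms, and matches topics by brute force. The function names (`jaccard`, `agreement`, `stability`) and the `top_n` parameter are hypothetical, chosen only for this illustration.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement(topics_ref, topics_sample, top_n=10):
    """Best average top-term overlap over all one-to-one topic matchings.

    Brute force over permutations, so only practical for a handful of topics.
    """
    k = len(topics_ref)
    if k == 0 or k != len(topics_sample):
        raise ValueError("both models need the same non-zero number of topics")
    ref = [t[:top_n] for t in topics_ref]
    sam = [t[:top_n] for t in topics_sample]
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(ref[i], sam[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


def stability(topics_ref, sampled_runs, top_n=10):
    """Mean agreement between a reference model and models from perturbed data."""
    if not sampled_runs:
        raise ValueError("need at least one sampled run")
    total = sum(agreement(topics_ref, run, top_n) for run in sampled_runs)
    return total / len(sampled_runs)


if __name__ == "__main__":
    # Toy, hand-written term lists (hypothetical, not results from the paper).
    reference = [["goal", "match", "team"], ["vote", "party", "election"]]
    runs = [
        [["vote", "election", "party"], ["team", "goal", "match"]],
        [["goal", "team", "coach"], ["vote", "tax", "budget"]],
    ]
    print(stability(reference, runs, top_n=3))
```

Under these assumptions, one would compute `stability` for each candidate number of topics and prefer values where the score is high. For more than a few topics, the permutation search would need to be replaced by an assignment solver.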
Total citations
[Bar chart: citations per year, 2015–2024]
Scholar articles
D Greene, D O'Callaghan, P Cunningham - Machine Learning and Knowledge Discovery in …, 2014