Authors
Hemant Misra, François Yvon, Joemon M Jose, Olivier Cappé
Publication date
2009/11/2
Book
Proceedings of the 18th ACM conference on Information and knowledge management
Pages
1553-1556
Description
In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of the latent Dirichlet allocation (LDA) topic model to segment a text into semantically coherent segments. A major benefit of the proposed approach is that, along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications such as segment retrieval and discourse analysis. The new approach outperforms a standard baseline method and yields significantly better performance than most of the available unsupervised methods on a benchmark dataset.
Total citations
[Bar chart: citations per year, 2010 to 2024; per-year counts not recoverable from the extracted text]
Scholar articles
H Misra, F Yvon, JM Jose, O Cappé - Proceedings of the 18th ACM conference on …, 2009