Authors
Jialu Liu, Xiang Ren, Jingbo Shang, Taylor Cassidy, Clare R Voss, Jiawei Han
Publication date
2016/4/11
Book
Proceedings of the 25th International Conference on World Wide Web
Pages
1057-1067
Description
Many text mining approaches adopt bag-of-words or n-gram models to represent documents. Looking beyond just the words, i.e., the explicit surface forms, in a document can improve a computer's understanding of text. Being aware of this, researchers have proposed concept-based models that rely on a human-curated knowledge base to incorporate other related concepts in the document representation. But these methods are not desirable when applied to vertical domains (e.g., literature, enterprise, etc.) due to low coverage of in-domain concepts in the general knowledge base and interference from out-of-domain concepts. In this paper, we propose a data-driven model named Latent Keyphrase Inference (LAKI) that represents documents with a vector of closely related domain keyphrases instead of single words or existing concepts in the knowledge base. We show that given a corpus of in-domain documents …
Total citations
[Chart: citations per year, 2016-2024]
Scholar articles
J Liu, X Ren, J Shang, T Cassidy, CR Voss, J Han - Proceedings of the 25th international conference on …, 2016