Cross lingual text classification by mining multilingual topics from wikipedia

Authors

Xiaochuan Ni, Jian-Tao Sun, Jian Hu, Zheng Chen

Publication date

2011/2/9

Conference

Proceedings of the fourth ACM international conference on Web search and data mining

Pages

375-384

Publisher

ACM

Description

This paper investigates how to perform cross-lingual text classification effectively by leveraging Wikipedia, a large-scale, multilingual knowledge base. Based on the observation that each Wikipedia concept is described by documents in different languages, we adapt existing topic modeling algorithms to mine multilingual topics from this knowledge base. Each extracted topic has multiple representations, one per language. We regard such topics extracted from Wikipedia documents as universal-topics, since each topic carries the same semantic information across languages. Thus, new documents in different languages can be represented in a common space spanned by a group of universal-topics. We use these universal-topics for cross-lingual text classification. Given training data labeled in one language, we can train a text classifier to classify the documents of …
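The description outlines the core idea: documents in any language are mapped into the same universal-topic space, so a classifier trained on labeled data in one language can be applied to another. Below is a minimal Python sketch of that pipeline. It assumes the multilingual topics have already been learned. The topic-word tables, vocabulary, and helper names (`topic_vector`, `train_centroids`, `classify`) are hypothetical. The one-pass topic estimate and the nearest-centroid classifier are simplifications swapped in for illustration. They are not the paper's inference procedure or classifier.

```python
"""Toy sketch of cross-lingual classification through shared (universal) topics.

ASSUMPTION: the per-language topic-word tables are invented toy values standing
in for the output of a multilingual topic model fitted on Wikipedia concepts.
This is not the paper's algorithm.
"""
import math
from collections import Counter

# Hypothetical universal-topics: topic k has one word distribution per language.
TOPICS = {
    "en": [
        {"match": 0.4, "goal": 0.4, "team": 0.2},     # topic 0 (sport)
        {"stock": 0.4, "market": 0.4, "bank": 0.2},   # topic 1 (finance)
    ],
    "fr": [
        {"match": 0.3, "but": 0.4, "équipe": 0.3},    # topic 0 (sport)
        {"bourse": 0.4, "marché": 0.4, "banque": 0.2},  # topic 1 (finance)
    ],
}


def topic_vector(tokens, lang):
    """Crude one-pass estimate of a document's topic mixture.

    Each known word votes for topics in proportion to p(word | topic), assuming
    a uniform topic prior; words absent from every topic are ignored.
    """
    topics = TOPICS[lang]
    k = len(topics)
    weights = [0.0] * k
    for word, count in Counter(tokens).items():
        raw = [t.get(word, 0.0) for t in topics]
        total = sum(raw)
        if total == 0.0:
            continue  # word not covered by this language's topics
        for i, p in enumerate(raw):
            weights[i] += count * p / total
    norm = sum(weights)
    return [w / norm for w in weights] if norm else [1.0 / k] * k


def train_centroids(docs):
    """docs: list of (topic_vector, label) pairs; returns label -> mean vector."""
    sums, counts = {}, Counter()
    for vec, label in docs:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(vec, centroids):
    """Return the label whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Train only on labeled English documents.
english_train = [
    (["match", "goal", "team", "goal"], "sport"),
    (["stock", "market", "bank"], "finance"),
]
centroids = train_centroids([(topic_vector(t, "en"), y) for t, y in english_train])

# Classify French documents without any French training labels.
print(classify(topic_vector(["but", "équipe", "banque"], "fr"), centroids))  # sport
print(classify(topic_vector(["bourse", "banque"], "fr"), centroids))  # finance
```

The English and French documents end up in the same two-dimensional topic space. Because of that, the classifier needs neither French labels nor a translation step. A realistic version would replace the toy tables with topics learned from Wikipedia concepts that have articles in several languages, and would use a proper inference method and classifier.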

Total citations

Cited by 83

Citations per year: 2011: 1, 2012: 10, 2013: 13, 2014: 9, 2015: 10, 2016: 11, 2017: 7, 2018: 6, 2019: 6, 2020: 1, 2021: 3, 2022: 4, 2023: 2