Authors
Stephen Dill, Nadav Eiron, David Gibson, Daniel Gruhl, Ramanathan Guha, Anant Jhingran, Tapas Kanungo, Sridhar Rajagopalan, Andrew Tomkins, John A Tomlin, Jason Y Zien
Publication date
2003/5/20
Book
Proceedings of the 12th international conference on World Wide Web
Pages
178-186
Description
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large scale semantic tagging of ambiguous content can bootstrap …
Total citations
[Citations-per-year chart, 2003 to 2024]
Scholar articles
S Dill, N Eiron, D Gibson, D Gruhl, R Guha, A Jhingran… - Proceedings of the 12th international conference on …, 2003