Authors
Saikat Mukherjee, Guizhen Yang, IV Ramakrishnan
Publication date
2003
Conference
The Semantic Web-ISWC 2003: Second International Semantic Web Conference, Sanibel Island, FL, USA, October 20-23, 2003. Proceedings 2
Pages
533-549
Publisher
Springer Berlin Heidelberg
Description
Although RDF/XML has been widely recognized as the standard vehicle for representing semantic information on the Web, an enormous amount of semantic data is still being encoded in HTML documents that are designed primarily for human consumption and not directly amenable to machine processing. This paper seeks to bridge this semantic gap by addressing the fundamental problem of automatically annotating HTML documents with semantic labels. Exploiting a key observation that semantically related items exhibit consistency in presentation style as well as spatial locality in template-based content-rich HTML documents, we have developed a novel framework for automatically partitioning such documents into semantic structures. Our framework tightly couples structural analysis of documents with semantic analysis incorporating domain ontologies and lexical databases such as WordNet. We …
Total citations
Citations per year, 2003-2022 (bar chart; per-year counts not recoverable)