Authors
Hui Guo, Jalal Mahmud, Yevgen Borodin, Amanda Stent, I Ramakrishnan
Publication date
2007/9/23
Conference
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)
Volume
2
Pages
929-933
Publisher
IEEE
Description
In this paper, we describe a general-purpose approach for partitioning Web page content. The novelty of our approach lies in the use of detailed layout information from a Web page renderer to determine spatial locality and identify visual separators, and the use of relaxed matching over presentation style information to determine presentation style similarity. We present several examples to illustrate the generality of our approach.
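The two signals named in the description, spatial locality and relaxed presentation-style matching, can be illustrated with a minimal sketch. This is not the authors' algorithm: the `Block` type, the `max_gap` and `min_similarity` thresholds, and the attribute-overlap similarity measure are all assumptions made for illustration, and large vertical whitespace stands in for a visual separator.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical rendered box: vertical position plus presentation style."""
    text: str
    top: float
    height: float
    style: dict = field(default_factory=dict)

    @property
    def bottom(self):
        return self.top + self.height


def style_similarity(a, b):
    """Relaxed match (assumed measure): fraction of style attributes that agree."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    same = sum(1 for k in keys if a.get(k) == b.get(k))
    return same / len(keys)


def partition(blocks, max_gap=20.0, min_similarity=0.5):
    """Group blocks top to bottom; start a new segment when the vertical gap
    is large (treated as a separator) or the styles differ too much."""
    segments = []
    for block in sorted(blocks, key=lambda b: b.top):
        if segments:
            prev = segments[-1][-1]
            close = block.top - prev.bottom <= max_gap
            alike = style_similarity(prev.style, block.style) >= min_similarity
            if close and alike:
                segments[-1].append(block)
                continue
        segments.append([block])
    return segments


# Illustrative data only; coordinates and styles are made up.
page = [
    Block("Headline", 0, 30, {"font": "serif", "size": 24, "weight": "bold"}),
    Block("Story A", 100, 15, {"font": "sans", "size": 12, "weight": "normal"}),
    Block("Story B", 120, 15, {"font": "sans", "size": 12, "weight": "bold"}),
    Block("Footer", 300, 10, {"font": "sans", "size": 12, "weight": "normal"}),
]
result = [[b.text for b in seg] for seg in partition(page)]
print(result)
```

In this sketch, "Story A" and "Story B" end up in the same segment even though their font weights differ, because two of their three style attributes agree; requiring exact style equality would split them.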
Total citations
[Bar chart: citations per year, 2006-2021]