Authors
Tim Furche, Georg Gottlob, Giovanni Grasso, Xiaonan Guo, Giorgio Orsi, Christian Schallhart
Publication date
2012/4/16
Conference
Proceedings of the 21st International Conference on World Wide Web
Pages
829-838
Publisher
ACM
Description
Forms are our gates to the web. They enable us to access the deep content of web sites. Automatic form understanding unlocks this content for applications ranging from crawlers to meta-search engines and is essential for improving the usability and accessibility of the web. Form understanding has received surprisingly little attention other than as a component in specific applications such as crawlers. No comprehensive approach to form understanding exists, and previous work disagrees even on the definition of the problem. In this paper, we present OPAL, the first comprehensive approach to form understanding. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems, OPAL advances the state of the art: for form labeling, it combines signals from the text, structure, and visual rendering of a web page, yielding robust characterisations of common design …
Total citations
[Chart: citations per year, 2012 to 2022]