Authors
Georg Gottlob, Christoph Koch
Publication date
2004/1/1
Journal
Journal of the ACM (JACM)
Volume
51
Issue
1
Pages
74-113
Publisher
ACM
Description
Research on information extraction from Web pages (wrapping) has seen much activity recently (particularly systems implementations), but little work has been done on formally studying the expressiveness of the formalisms proposed or on the theoretical foundations of wrapping. In this paper, we first study monadic datalog over trees as a wrapping language. We show that this simple language is equivalent to monadic second order logic (MSO) in its ability to specify wrappers. We believe that MSO has the right expressiveness required for Web information extraction and propose MSO as a yardstick for evaluating and comparing wrappers. Along the way, several other results on the complexity of query evaluation and query containment for monadic datalog over trees are established, and a simple normal form for this language is presented. Using the above results, we subsequently study the kernel fragment Elog of …
Total citations
2003200420052006200720082009201020112012201320142015201620172018201920202021202220232024152134281813202612712185564133342