Authors
Giovanni Grasso, Tim Furche, Christian Schallhart
Publication date
2013
Journal
International Conference on World Wide Web (developer track)
Publisher
ACM
Description
Even in the third decade of the Web, scraping web sites remains a challenging task: Most scraping programs are still developed as ad-hoc solutions using a complex stack of languages and tools. Where comprehensive extraction solutions exist, they are expensive, heavyweight, and proprietary.
OXPath is a minimalistic wrapping language that is nevertheless expressive and versatile enough for a wide range of scraping tasks. In this presentation, we introduce a new paradigm of scraping: declarative navigation. Instead of complex scripting or heavyweight, limited visual tools, OXPath turns scraping into a simple two-step process: pick the relevant nodes through an XPath expression, then specify which action to apply to those nodes. OXPath takes care of browser synchronisation and of page and state management, making scraping as easy as node selection with XPath. To achieve this, OXPath does not …
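To make the two-step idea concrete, here is a toy Python sketch of "select nodes, then apply an action". It is not OXPath and does not drive a browser. The page is a hypothetical in-memory XHTML snippet. Python's standard `xml.etree.ElementTree` stands in for XPath and supports only a limited XPath subset. The helpers `fill`, `click`, and `apply_action` are hypothetical: they simulate actions on the parsed tree rather than reproducing OXPath's own action syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy page: well-formed XHTML-like markup held in memory.
PAGE = """
<html><body>
  <form>
    <input name="q" value=""/>
    <input name="go" type="submit" value="Search"/>
  </form>
  <a href="/page2">Next</a>
</body></html>
""".strip()


def fill(node, text):
    """Hypothetical action: set a form field's value attribute."""
    node.set("value", text)


def click(node, log):
    """Hypothetical action: record the link target instead of navigating."""
    log.append(node.get("href"))


def apply_action(root, xpath, action, *args):
    """Step 1: select nodes with an (ElementTree-subset) XPath expression.
    Step 2: apply the given action to every selected node."""
    nodes = root.findall(xpath)
    for node in nodes:
        action(node, *args)
    return nodes


root = ET.fromstring(PAGE)
visited = []
apply_action(root, ".//input[@name='q']", fill, "Oxford")
apply_action(root, ".//a", click, visited)

print(root.find(".//input[@name='q']").get("value"))  # Oxford
print(visited)  # ['/page2']
```

The sketch mirrors only the select-then-act split. The browser synchronisation and page and state management that the description attributes to OXPath are outside its scope.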
Scholar articles
G Grasso, T Furche, C Schallhart - Proceedings of the 22nd international conference on …, 2013