Authors
Alexander Schätzle, Martin Przyjaciel-Zablocki, Georg Lausen
Publication date
2011/6/12
Book
Proceedings of the International Workshop on Semantic Web Information Management
Pages
1-8
Description
In this paper, we investigate the scalable processing of complex SPARQL queries on very large RDF datasets. As the underlying platform, we use Apache Hadoop, an open-source implementation of Google's MapReduce for massively parallel computation on a computer cluster. We introduce PigSPARQL, a system that processes complex SPARQL queries on a MapReduce cluster. To this end, SPARQL queries are translated into Pig Latin, a data analysis language developed by Yahoo! Research. Pig Latin programs are executed as a series of MapReduce jobs on a Hadoop cluster. We evaluate PigSPARQL using SP2Bench, a SPARQL-specific performance benchmark, and demonstrate that it enables scalable execution of SPARQL queries on Hadoop without any additional programming effort.
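The translation idea can be illustrated with a minimal sketch. It is not PigSPARQL's actual translation or output: it is a small in-memory Python stand-in, under the assumption that each triple pattern of a SPARQL basic graph pattern becomes a filter over the triple set and that patterns sharing a variable are combined by a join, which are the kinds of relational operations Pig Latin provides. The function names (`match`, `join`, `evaluate_bgp`) and the sample data are hypothetical.

```python
# Illustrative only: mimics the filter and join steps that a SPARQL basic
# graph pattern could be mapped to. All names and data here are hypothetical.

def match(triples, pattern):
    """Filter step: return variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms are constants.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable would be bound to two values
                binding[term] = value
            elif term != value:
                break  # constant does not match this triple
        else:
            results.append(binding)
    return results


def join(left, right):
    """Join step: natural join of two binding lists on shared variables."""
    joined = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined


def evaluate_bgp(triples, patterns):
    """Evaluate the patterns one by one, joining each into the result."""
    solutions = [{}]
    for pattern in patterns:
        solutions = join(solutions, match(triples, pattern))
    return solutions


triples = [
    ("article1", "dc:creator", "alice"),
    ("article1", "dc:title", "Scalable SPARQL"),
    ("article2", "dc:creator", "bob"),
]
# Roughly: SELECT ?a ?t WHERE { ?a dc:creator "alice" . ?a dc:title ?t }
patterns = [("?a", "dc:creator", "alice"), ("?a", "dc:title", "?t")]
solutions = evaluate_bgp(triples, patterns)
print(solutions)
```

On a cluster, each filter and join would run as distributed MapReduce work rather than as nested Python loops; the sketch only shows the logical shape of the evaluation.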
Total citations
[Chart: citations per year, 2011-2023]