Authors
Alexander Schätzle
Publication date
2016/9/23
Institution
PhD thesis, University of Freiburg
Description
In 2001, Tim Berners-Lee et al. postulated the notion of a so-called Semantic Web, primarily designed for automated consumption by machines, as an extension to the primarily human-oriented World Wide Web. In the meantime, numerous standards under the aegis of the W3C (World Wide Web Consortium) have turned this vision into real specifications. Among those standards, RDF, a graph-based framework for describing semantic information in a machine-readable format, and its associated query language SPARQL constitute core layers of the Semantic Web stack. Promoted by leading search engine providers and driven by an increasing interest in Open Data publishing and initiatives like Schema.org, Semantic Web technologies have gained momentum, although their potential has yet to be fully realized. Thus, we expect that the amount of semantic data in general, and RDF data in particular, will continue to increase on a web scale, requiring distributed solutions for storage and querying.
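To make the RDF and SPARQL layers concrete, here is a minimal, illustrative sketch using Python's rdflib library; the ex: namespace and the two triples are invented example data, not taken from the thesis.

```python
from rdflib import Graph

# A tiny RDF graph in Turtle syntax: two statements (triples).
TURTLE_DATA = """
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob .
ex:bob   ex:name  "Bob" .
"""

g = Graph()
g.parse(data=TURTLE_DATA, format="turtle")

# SPARQL query: whom does ex:alice know, and what is their name?
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?person ?name WHERE {
    ex:alice ex:knows ?person .
    ?person  ex:name  ?name .
}
"""

for person, name in g.query(QUERY):
    print(person, name)  # -> http://example.org/bob Bob
```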
However, current platforms for Big Data processing like Hadoop lack native support for RDF and SPARQL. The main advantage of a Hadoop-based approach over a dedicated, closed RDF-only solution lies in the highly diverse ecosystem and interoperability of Hadoop: it enables an application to combine different data sources and to leverage various frameworks, all within an integrated, modular environment. Thus, we think that extending Hadoop to deal with RDF data offers a potential for synergies beyond the capabilities of RDF-only systems. Following this intuition, we have investigated the adaptation of different Hadoop-based technologies …
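To illustrate the general idea behind such an adaptation (a sketch only, not the specific systems developed in the thesis), the following evaluates the same two-pattern SPARQL query on Spark, a Hadoop-ecosystem engine: assuming triples are stored as a relational (s, p, o) table, each triple pattern becomes a selection, and shared query variables become join keys. All table and column names here are assumptions made for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparql-on-spark-sketch").getOrCreate()

# Hypothetical triples table: one row per RDF statement (s, p, o).
triples = spark.createDataFrame(
    [("ex:alice", "ex:knows", "ex:bob"),
     ("ex:bob",   "ex:name",  "Bob")],
    ["s", "p", "o"],
)

# SPARQL basic graph pattern:
#   ex:alice ex:knows ?person .  ?person ex:name ?name .
# Each triple pattern is a selection; the shared variable ?person is a join key.
knows = triples.where("p = 'ex:knows' AND s = 'ex:alice'").selectExpr("o AS person")
names = triples.where("p = 'ex:name'").selectExpr("s AS person", "o AS name")

knows.join(names, "person").show()  # -> ex:bob | Bob
```

Because selections and joins are exactly the operations that SQL-on-Hadoop engines distribute and optimize, this translation lets existing cluster infrastructure evaluate SPARQL without a dedicated RDF store.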
Total citations
[Citations-per-year chart: 2017–2019]