‘Teach me to fish’ Querying Semantic Data Lakes

Authors

Mohamed Nadjib Mami, Hajira Jabeen, Sören Auer

Description

We have recently made a huge leap in terms of data formats, data modalities, and storage capabilities. Dozens of data storage techniques have been created as a result. Today, we are able to store cluster-wide data and to choose a storage technique that suits our application needs, rather than the other way around. When different data stores are interlinked and integrated, their data can generate valuable knowledge and insights. In this article, we present an approach that uses semantic technologies to query heterogeneous Big Data stored in a Data Lake in a unified manner. Our approach equips the original data stored in the Data Lake with mappings and adds transformations to the SPARQL query syntax to make heterogeneous data joinable across the Data Lake. We devise an implementation, named Sparkall, that uses Apache Spark as the underlying query engine. Our evaluation demonstrates the feasibility and efficiency of Sparkall in querying five popular data sources.
