Authors
Donatella Firmani, Massimo Mecella, Monica Scannapieco, Carlo Batini
Publication date
2016/3
Journal
Data Science and Engineering
Volume
1
Pages
6-20
Publisher
Springer Berlin Heidelberg
Description
In this paper, we discuss the application of the concept of data quality to big data by highlighting how complex it is to define it in a general way. Data quality is already a multidimensional concept, difficult to characterize in precise definitions even in the case of well-structured data. Big data add two further dimensions of complexity: (i) being "very" source specific, and for this we adopt the interesting UNECE classification, and (ii) being highly unstructured and schema-less, often without golden standards to refer to, or with golden standards that are very difficult to access. After providing a tutorial on data quality in traditional contexts, we analyze big data by providing insights into the UNECE classification, and then, for each type of data source, we choose a specific instance of such a type (notably deep Web data, sensor-generated data, and Twitter/short texts) and discuss how quality dimensions can be defined in these cases. The overall …
Total citations
[Chart: citations per year, 2016-2024]
Scholar articles
D Firmani, M Mecella, M Scannapieco, C Batini - Data Science and Engineering, 2016