Authors
Samir Al-janabi, Ryszard Janicki
Publication date
2016/7/13
Conference
2016 SAI Computing Conference (SAI)
Pages
492-501
Publisher
IEEE
Description
Data cleaning is a critical part of the data transformation stage in data warehousing, where data extracted from relational databases are usually unclean. Unclean data may affect critical organizational tasks such as data analysis and decision making. Current data cleaning techniques generally deal with only one or two quality aspects. They also assume that master data are available, or that users take part in the cleaning, for example by manually assigning confidence scores that represent the correctness of data values. In this paper, we present a uniform framework and algorithms that integrate data deduplication with the repair of inconsistent data and the discovery of accurate values. We use the density information embedded in the data to fix errors, where tuples that are close to each other are packed together. We present a weight model to assign confidence scores that are …
Total citations
[Citations-per-year chart, 2016 to 2024]