Authors
Ridha Khedri, Fei Chiang, Khair Eddin Sabri
Publication date
2013/1/1
Journal
Procedia Computer Science
Volume
21
Pages
50-59
Publisher
Elsevier
Description
The amount of data being generated and collected has proliferated over the past several years. One of the leading factors contributing to this increased data scale is cheaper commodity storage, which makes it easier for organisations to house large data stores containing massive amounts of historical data. To analyse these data sets effectively, a preprocessing step is often required, as most real data sets are inherently dirty and inconsistent. Existing data cleaning tools have focused on cleaning the errors at hand. In this paper, we take a more formal approach and propose the use of information algebra as a general theory to describe structured data sets and data cleaning. We formally define the notions of association rule and association function, and we present results relating these concepts. We also propose an algorithm for generating association rules from a given structured data set.
Total citations
[Chart: citations per year, 2014 to 2022]
Scholar articles
R Khedri, F Chiang, KE Sabri - Procedia Computer Science, 2013