Authors
Matthew Eric Otey, Adriano Veloso, Chao Wang, Srinivasan Parthasarathy, Wagner Meira Jr
Publication date
2003/11
Journal
The Third IEEE International Conference on Data Mining (ICDM’03), Melbourne, FL
Description
Traditional methods for data mining typically assume that the data is centralized and static. These assumptions are no longer tenable. Such methods impose excessive communication overhead when data is distributed, and they waste computational and I/O resources when data is dynamic. In this paper we present what we believe to be the first data mining approach that relaxes both assumptions. We consider a broader scenario in which the data is continuously updated and stored at geographically distributed locations. This scenario poses several challenges to data mining, especially concerning performance and interactivity. Our approach uses parallel and incremental techniques to generate frequent itemsets, even in the presence of data updates, without examining the entire database. It also imposes minimal communication overhead when mining distributed databases. Further, our approach can generate both local models of frequent itemsets (in which each site has a summary of its own database) and the global model (in which all sites have a summary of the entire database). This ability allows our approach to generate not only frequent itemsets but also high-contrast frequent itemsets, from which users can identify itemsets whose support is unevenly distributed across the distributed databases.
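The abstract describes the approach only at a high level. As a rough illustration of the ideas it names (incremental counting, local versus global models, and high-contrast itemsets), here is a minimal Python sketch. It is not the authors' algorithm. The naive itemset enumeration, the hypothetical `Site` class, and in particular the contrast criterion (largest minus smallest local relative support at least `min_gap`) are assumptions made for illustration; the paper's own definitions and bookkeeping may differ.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Naively count every itemset of size 1..max_size (illustration only, not Apriori)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return counts


class Site:
    """Hypothetical site holding one partition of the distributed database."""

    def __init__(self, name, max_size=2):
        self.name = name
        self.max_size = max_size
        self.counts = Counter()  # itemset -> absolute count at this site
        self.n = 0               # transactions seen so far at this site

    def add_transactions(self, delta):
        # Incremental step: only the new transactions are scanned, and their
        # counts are added to the stored ones (no rescan of old data).
        self.counts.update(count_itemsets(delta, self.max_size))
        self.n += len(delta)

    def local_model(self, minsup):
        """Itemsets frequent at this site alone (relative support >= minsup)."""
        if self.n == 0:
            return {}
        return {i: c / self.n for i, c in self.counts.items() if c / self.n >= minsup}


def global_model(sites, minsup):
    """Itemsets frequent over the union of all sites; only counts are exchanged."""
    total = sum(s.n for s in sites)
    if total == 0:
        return {}
    merged = Counter()
    for s in sites:
        merged.update(s.counts)
    return {i: c / total for i, c in merged.items() if c / total >= minsup}


def high_contrast(sites, minsup, min_gap):
    """Globally frequent itemsets whose local supports differ by >= min_gap (assumed criterion)."""
    result = {}
    for itemset in global_model(sites, minsup):
        local = [s.counts[itemset] / s.n for s in sites if s.n]
        if max(local) - min(local) >= min_gap:
            result[itemset] = local
    return result


a, b = Site("A"), Site("B")
a.add_transactions([{"bread", "milk"}, {"bread"}, {"bread", "milk"}])
b.add_transactions([{"beer"}, {"beer", "milk"}, {"bread"}])
b.add_transactions([{"beer"}])  # an update at site B: only this transaction is scanned
print(high_contrast([a, b], minsup=0.3, min_gap=0.5))
# → {('bread',): [1.0, 0.25], ('beer',): [0.0, 0.75]}
```

In this sketch the sites exchange only itemset counts, never raw transactions, which loosely mirrors the low-communication goal stated in the abstract. Keeping counts for every itemset up to `max_size` makes incremental updates exact but is memory-hungry; a practical system would track a much smaller set of candidates.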