Authors
Gregory Buehrer, Srinivasan Parthasarathy, Amol Ghoting
Publication date
2006/8/20
Book
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Pages
86-95
Description
In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After presenting a characterization of existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce our efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advancements in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on truly large data sets, up to 75GB, and show they yield greater than a 400-fold execution time improvement. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.