Authors
Gregory Buehrer, Srinivasan Parthasarathy, Amol Ghoting
Publication date
2006/8/20
Book
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Pages
86-95
Description
In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After presenting a characterization of existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce our efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advancements in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on truly large data sets, up to 75GB, and show they yield greater than a 400-fold execution time improvement. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.
Total citations
[Citations-per-year chart, 2006 to 2024; per-year counts not recoverable]
Scholar articles
G Buehrer, S Parthasarathy, A Ghoting - Proceedings of the 12th ACM SIGKDD international …, 2006