Out-of-core frequent pattern mining on a commodity pc

Authors

Gregory Buehrer, Srinivasan Parthasarathy, Amol Ghoting

Publication date

2006/8/20

Book

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages

86-95

Description

In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After presenting a characterization of existing out-of-core frequent itemset mining algorithms and their drawbacks, we introduce our efficient, highly scalable solution. Presented in the context of the FPGrowth algorithm, our technique involves several novel I/O-conscious optimizations, such as approximate hash-based sorting and blocking, and leverages recent architectural advancements in commodity computers, such as 64-bit processing. We evaluate the proposed optimizations on truly large data sets, up to 75GB, and show they yield greater than a 400-fold execution time improvement. Finally, we discuss the impact of this research in the context of other pattern mining challenges, such as sequence mining and graph mining.

Total citations

Cited by 69

[Chart: citations per year, 2006-2024]
