Max-cover in map-reduce

Authors

Flavio Chierichetti, Ravi Kumar, Andrew Tomkins

Publication date

2010/4/26

Conference

Proceedings of the 19th international conference on World wide web

Pages

231-240

Publisher

ACM

Description

The NP-hard Max-k-cover problem requires selecting k sets from a collection so as to maximize the size of their union. This classic problem occurs commonly in many settings in web search and advertising. For moderately sized instances, a greedy algorithm gives a (1 - 1/e)-approximation. However, the greedy algorithm requires updating the scores of arbitrary elements after each step, and hence becomes intractable for large datasets.
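To make the sequential baseline concrete, here is a minimal sketch of the classic greedy heuristic for Max-k-cover, not the paper's MapReduce algorithm. The function name `greedy_max_cover` and the dictionary-of-sets input format are assumptions chosen for illustration. The rescoring of every remaining set in each round is the step the description identifies as the bottleneck on large datasets.

```python
from typing import Dict, Hashable, List, Set, Tuple


def greedy_max_cover(
    sets: Dict[Hashable, Set[int]], k: int
) -> Tuple[List[Hashable], Set[int]]:
    """Pick up to k sets, each time taking the one that covers the most new elements.

    Illustrative sketch only: the name and input format are hypothetical.
    """
    covered: Set[int] = set()
    chosen: List[Hashable] = []
    remaining = dict(sets)  # shallow copy so the caller's dict is not modified

    for _ in range(k):
        if not remaining:
            break
        # Rescore every remaining set against the current cover.
        # This full pass per round is what makes greedy costly at scale.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds a new element
        chosen.append(best)
        covered |= remaining.pop(best)

    return chosen, covered


if __name__ == "__main__":
    example = {"a": {1, 2, 3, 4}, "b": {3, 4, 5}, "c": {5, 6}, "d": {1}}
    print(greedy_max_cover(example, 2))
```

In the example, the first round picks `a` (four new elements) and the second picks `c` (two new elements, versus one for `b`), so two sets cover all six elements.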

We give the first max cover algorithm designed for today's large-scale commodity clusters. Our algorithm has provably almost the same approximation as greedy, but runs much faster. Furthermore, it can be easily expressed in the MapReduce programming paradigm, and requires only polylogarithmically many passes over the data. Our experiments on five large problem instances show that our algorithm is practical and can achieve good speedups compared to the sequential greedy …

Total citations

Cited by 140

Citations per year: 2010: 1; 2011: 9; 2012: 12; 2013: 9; 2014: 17; 2015: 24; 2016: 14; 2017: 20; 2018: 12; 2019: 8; 2020: 5; 2021: 1; 2022: 2; 2023: 2; 2024: 1
