Large-scale distributed non-negative sparse coding and sparse dictionary learning

Authors

Vikas Sindhwani, Amol Ghoting

Publication date

2012/8/12

Book

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages

489-497

Description

We consider the problem of building compact, unsupervised representations of large, high-dimensional, non-negative data using sparse coding and dictionary learning schemes, with an emphasis on executing the algorithm in a Map-Reduce environment. The proposed algorithms may be seen as parallel optimization procedures for constructing sparse non-negative factorizations of large, sparse matrices. Our approach alternates between a parallel sparse coding phase implemented using greedy or convex (l₁) regularized risk minimization procedures, and a sequential dictionary learning phase where we solve a set of l₀ optimization problems exactly. These two-fold sparsity constraints lead to better statistical performance on text analysis tasks and at the same time make it possible to implement each iteration in a single Map-Reduce job. We detail our implementations and optimizations that lead to the ability to …

Total citations

Cited by 34

Citations per year, 2013 to 2021 (bar chart)
