Authors
Edo Liberty, Ram Sriharsha, Maxim Sviridenko
Publication date
2016
Book
2016 Proceedings of the eighteenth workshop on algorithm engineering and experiments (ALENEX)
Pages
81-89
Publisher
Society for Industrial and Applied Mathematics
Description
This paper shows that one can be competitive with the k-means objective while operating online. In this model, the algorithm receives vectors v1, …, vn one by one in an arbitrary order. For each vector vt the algorithm outputs a cluster identifier before receiving vt+1. Our online algorithm generates O(k log n log γn) clusters whose expected k-means cost is O(W* log n). Here, W* is the optimal k-means cost using k clusters and γ is the aspect ratio of the data. The dependence on γ is shown to be unavoidable and tight. We also show that, experimentally, it is not much worse than k-means++ while operating in a strictly more constrained computational model.
Total citations
20152016201720182019202020212022202320241961018182012168
Scholar articles
E Liberty, R Sriharsha, M Sviridenko - 2016 Proceedings of the eighteenth workshop on …, 2016