Authors
Shalmoli Gupta, Ravi Kumar, Kefu Lu, Benjamin Moseley, Sergei Vassilvitskii
Publication date
2017/3/1
Journal
Proceedings of the VLDB Endowment
Volume
10
Issue
7
Pages
757-768
Publisher
VLDB Endowment
Description
We study the problem of k-means clustering in the presence of outliers. The goal is to cluster a set of data points to minimize the variance of the points assigned to the same cluster, with the freedom of ignoring a small set of data points that can be labeled as outliers. Clustering with outliers has received a lot of attention in the data processing community, but practical, efficient, and provably good algorithms remain unknown for the most popular k-means objective.
Our work proposes a simple local search-based algorithm for k-means clustering with outliers. We prove that this algorithm achieves constant-factor approximate solutions and can be combined with known sketching techniques to scale to large data sets. Using empirical evaluation on both synthetic and large-scale real-world data, we demonstrate that the algorithm dominates recently proposed heuristic approaches for the problem.
Total citations
201820192020202120222023202418201818153114
Scholar articles
S Gupta, R Kumar, K Lu, B Moseley, S Vassilvitskii - Proceedings of the VLDB Endowment, 2017