Authors
Joonas Hämäläinen, Tommi Kärkkäinen
Publication date
2016
Conference
ESANN
Description
The use of distributionally balanced folding to speed up the initialization phase of the K-means++ clustering method, targeting big data applications, is proposed and tested. The approach is first described and then evaluated experimentally, focusing on the effects of the sampling method as the number of folds is varied. In the tests, the quality of the final clustering results was assessed and the scalability of a distributed implementation was demonstrated. The experiments support the viability of the proposed approach.