Authors
Marwan Hassani, Philipp Kranen, Rajveer Saini, Thomas Seidl
Publication date
2014/6/30
Book
Proceedings of the 26th International Conference on Scientific and Statistical Database Management
Pages
1-4
Description
Clustering of high-dimensional streaming data is an emerging field of research. A real-life data stream imposes many challenges on the clustering task, as an endless amount of data arrives constantly. Much research has been done on full-space stream clustering. To handle the varying speeds of data streams, "anytime" algorithms have been proposed, but so far only for full-space stream clustering. However, data streams from many application domains contain an abundance of dimensions; the clusters often exist only in specific subspaces (subsets of dimensions) and do not show up in the full feature space. In this paper, the first algorithm that considers both the high dimensionality and the varying speeds of streaming data is proposed. The algorithm, called SubClusTree, can flexibly adapt to different stream speeds and makes the best use of available time to provide a high-quality subspace clustering. The …