Clustering gene expression data in SQL using locally adaptive metrics

Authors

Dimitris Papadopoulos, Carlotta Domeniconi, Dimitrios Gunopulos, Sheng Ma

Publication date

2003/6/13

Book

Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery

Pages

35-41

Description

The clustering problem concerns the discovery of homogeneous groups of data according to a given similarity measure. Clustering suffers from the curse of dimensionality: it is not meaningful to look for clusters in high-dimensional spaces, because the average density of points anywhere in the input space is likely to be low. As a consequence, distance functions that use all input features equally may be ineffective. We introduce an algorithm that discovers clusters in subspaces spanned by different combinations of dimensions via local weightings of features. This approach avoids the risk of information loss encountered in global dimensionality reduction techniques. Our method associates a weight vector with each cluster, whose values capture the relevance of features within that cluster. In this paper we present an efficient SQL implementation of our algorithm, which enables the discovery of clusters on data …
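The idea of a per-cluster weight vector can be illustrated with a minimal Python sketch. This is not the authors' SQL implementation, and the abstract does not give the weighting formula. The exponential weighting, the bandwidth parameter `h`, and the names `weighted_sq_dist` and `local_weight_clustering` are all assumptions made here for illustration: a feature along which a cluster's members are tightly packed receives a larger weight in that cluster's distance function.

```python
import math

def weighted_sq_dist(x, center, weights):
    """Squared Euclidean distance with one weight per feature."""
    return sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, center))

def local_weight_clustering(points, centers, h=1.0, iters=10):
    """k-means-style loop in which every cluster keeps its own feature weights.

    Assumed weighting scheme (hypothetical, for illustration only):
    weight of feature i in cluster j is proportional to exp(-spread_ji / h),
    where spread_ji is the mean squared deviation of the cluster's members
    from the cluster center along feature i.
    """
    k = len(centers)
    dim = len(points[0])
    centers = [list(c) for c in centers]
    weights = [[1.0 / dim] * dim for _ in range(k)]  # start with equal weights
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to the cluster that is nearest under that
        # cluster's own weighted distance.
        labels = [
            min(range(k), key=lambda j: weighted_sq_dist(p, centers[j], weights[j]))
            for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # leave an empty cluster unchanged
            n = len(members)
            centers[j] = [sum(p[i] for p in members) / n for i in range(dim)]
            spread = [
                sum((p[i] - centers[j][i]) ** 2 for p in members) / n
                for i in range(dim)
            ]
            raw = [math.exp(-s / h) for s in spread]
            total = sum(raw)
            weights[j] = [r / total for r in raw]  # weights sum to 1 per cluster
    return labels, centers, weights

# Toy data: the first group is tight along feature 0, the second along feature 1.
points = [(0.0, -2.0), (0.1, 0.0), (-0.1, 2.0), (0.0, 1.0),
          (3.0, 5.0), (5.0, 5.1), (7.0, 4.9), (6.0, 5.0)]
labels, centers, weights = local_weight_clustering(points, [(0.0, 0.0), (5.0, 5.0)])
```

On this toy data each cluster ends up with most of its weight on the feature along which it is compact, which is the behaviour the description attributes to local feature weighting.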

Total citations

Cited by 14

[Chart: citations per year, 2004-2019]
