Computing clusters of correlation connected objects

Christian Böhm*, Karin Kailing, Peer Kröger, Arthur Zimek

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Article in proceedings, Research, peer-review

Abstract

The detection of correlations between different features in a set of feature vectors is a very important data mining task because correlation indicates a dependency between the features or some association of cause and effect between them. This association can be arbitrarily complex, i.e., one or more features might depend on a combination of several other features. Well-known methods like principal component analysis (PCA) can perfectly find correlations which are global, linear, not hidden in a set of noise vectors, and uniform, i.e., the same type of correlation is exhibited in all feature vectors. In many applications such as medical diagnosis, molecular biology, time sequences, or electronic commerce, however, correlations are not global, since the dependency between features can differ between subgroups of the set. In this paper, we propose a method called 4C (Computing Correlation Connected Clusters) to identify local subgroups of the data objects sharing a uniform but arbitrarily complex correlation. Our algorithm is based on a combination of PCA and density-based clustering (DBSCAN). Our method has a determinate result and is robust against noise. A broad comparative evaluation demonstrates the superior performance of 4C over competing methods such as DBSCAN, CLIQUE and ORCLUS.
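
The limitation of global PCA described above can be illustrated with a small sketch. This is not the 4C algorithm; it only assumes NumPy and a synthetic two-dimensional data set (the helper name `explained_variance_ratio`, the group sizes, and the noise level are all hypothetical choices). Two subgroups each follow a strong but different linear correlation. PCA on each subgroup alone finds one dominant axis, whereas PCA on the mixed set sees almost no preferred direction.

```python
import numpy as np


def explained_variance_ratio(points):
    """Return the share of total variance along each principal axis, largest first."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / (len(points) - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    return eigvals / eigvals.sum()


rng = np.random.default_rng(0)
n, noise = 500, 0.01  # illustrative values, not taken from the paper

# Subgroup A follows y = x, subgroup B follows y = -x (plus small Gaussian noise).
t_a = rng.uniform(-1.0, 1.0, size=n)
t_b = rng.uniform(-1.0, 1.0, size=n)
group_a = np.column_stack([t_a, t_a]) + rng.normal(0.0, noise, (n, 2))
group_b = np.column_stack([t_b, -t_b]) + rng.normal(0.0, noise, (n, 2))
mixed = np.vstack([group_a, group_b])

ratio_a = explained_variance_ratio(group_a)
ratio_b = explained_variance_ratio(group_b)
ratio_mixed = explained_variance_ratio(mixed)

print("subgroup A:", ratio_a)
print("subgroup B:", ratio_b)
print("mixed set: ", ratio_mixed)
```

In each subgroup the first principal axis carries nearly all of the variance, while in the mixed set the two axes share it roughly equally. A method that looks for correlations locally, as the abstract proposes, would need to separate such subgroups before or while analysing them.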

Original language: English
Title of host publication: SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Editors: G. Weikum, A. C. Konig, S. Dessloch
Publisher: Association for Computing Machinery
Publication date: 13. Jun 2004
Pages: 455-466
ISBN (Electronic): 1-58113-859-8
DOI: https://doi.org/10.1145/1007568.1007620
Publication status: Published - 13. Jun 2004
Externally published: Yes
Event: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2004 - Paris, France
Duration: 13. Jun 2004 - 18. Jun 2004

Conference

Conference: Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2004
Country: France
City: Paris
Period: 13/06/2004 - 18/06/2004
Sponsor: ACM SIGAI

Fingerprint

Cluster computing
Principal component analysis
Molecular biology
Electronic commerce
Data mining

Cite this

Böhm, C., Kailing, K., Kröger, P., & Zimek, A. (2004). Computing clusters of correlation connected objects. In G. Weikum, A. C. Konig, & S. Dessloch (Eds.), SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data (pp. 455-466). Association for Computing Machinery. https://doi.org/10.1145/1007568.1007620
@inproceedings{d033d913ab09459183a2d7e39627329f,
title = "Computing clusters of correlation connected objects",
author = "Christian B{\"o}hm and Karin Kailing and Peer Kr{\"o}ger and Arthur Zimek",
year = "2004",
month = "6",
day = "13",
doi = "10.1145/1007568.1007620",
language = "English",
pages = "455--466",
editor = "G. Weikum and Konig, {A. C.} and S. Dessloch",
booktitle = "SIGMOD '04",
publisher = "Association for Computing Machinery",
address = "United States",

}

TY - GEN

T1 - Computing clusters of correlation connected objects

AU - Böhm, Christian

AU - Kailing, Karin

AU - Kröger, Peer

AU - Zimek, Arthur

PY - 2004/6/13

Y1 - 2004/6/13

U2 - 10.1145/1007568.1007620

DO - 10.1145/1007568.1007620

M3 - Article in proceedings

AN - SCOPUS:14544300820

SP - 455

EP - 466

BT - SIGMOD '04

A2 - Weikum, G.

A2 - Konig, A. C.

A2 - Dessloch, S.

PB - Association for Computing Machinery

ER -
