Mining hierarchies of correlation clusters

Elke Achtert*, Christian Böhm, Peer Kröger, Arthur Zimek

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

Abstract

The detection of correlations between different features in high dimensional data sets is a very important data mining task. These correlations can be arbitrarily complex: One or more features might be correlated with several other features, and both noise features as well as the actual dependencies may be different for different clusters. Therefore, each cluster contains points that are located on a common hyperplane of arbitrary dimensionality in the data space and thus generates a separate, arbitrarily oriented subspace of the original data space. The few recently proposed algorithms designed to uncover these correlation clusters have several disadvantages. In particular, these methods cannot detect correlation clusters of different dimensionality which are nested into each other. The complete hierarchical structure of correlation clusters of varying dimensionality can only be detected by a hierarchical clustering approach. Therefore, we propose the algorithm HiCO (Hierarchical Correlation Ordering), the first hierarchical approach to correlation clustering. The algorithm determines the cluster hierarchy, and visualizes it using correlation diagrams. Several comparative experiments using synthetic and real data sets show the performance and the effectivity of HiCO.

Original language: English
Title of host publication: Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006
Publisher: IEEE
Publication date: Dec 2006
Pages: 119-128
ISBN (Print): 0-7695-2590-3
DOI: https://doi.org/10.1109/SSDBM.2006.35
Publication status: Published - Dec 2006
Externally published: Yes
Event: 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006 - Vienna, Austria
Duration: 3 Jul 2006 to 5 Jul 2006

Conference

Conference: 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006
Country: Austria
City: Vienna
Period: 03/07/2006 to 05/07/2006

Fingerprint

Data mining
Experiments

Cite this

Achtert, E., Böhm, C., Kröger, P., & Zimek, A. (2006). Mining hierarchies of correlation clusters. In Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006 (pp. 119-128). IEEE. https://doi.org/10.1109/SSDBM.2006.35
Achtert, Elke ; Böhm, Christian ; Kröger, Peer ; Zimek, Arthur. / Mining hierarchies of correlation clusters. Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006. IEEE, 2006. pp. 119-128
@inproceedings{80c5783dd2b2402fb60bfe27a5a1f536,
title = "Mining hierarchies of correlation clusters",
abstract = "The detection of correlations between different features in high dimensional data sets is a very important data mining task. These correlations can be arbitrarily complex: One or more features might be correlated with several other features, and both noise features as well as the actual dependencies may be different for different clusters. Therefore, each cluster contains points that are located on a common hyperplane of arbitrary dimensionality in the data space and thus generates a separate, arbitrarily oriented subspace of the original data space. The few recently proposed algorithms designed to uncover these correlation clusters have several disadvantages. In particular, these methods cannot detect correlation clusters of different dimensionality which are nested into each other. The complete hierarchical structure of correlation clusters of varying dimensionality can only be detected by a hierarchical clustering approach. Therefore, we propose the algorithm HiCO (Hierarchical Correlation Ordering), the first hierarchical approach to correlation clustering. The algorithm determines the cluster hierarchy, and visualizes it using correlation diagrams. Several comparative experiments using synthetic and real data sets show the performance and the effectivity of HiCO.",
author = "Elke Achtert and Christian B{\"o}hm and Peer Kr{\"o}ger and Arthur Zimek",
year = "2006",
month = "12",
doi = "10.1109/SSDBM.2006.35",
language = "English",
isbn = "0-7695-2590-3",
pages = "119--128",
booktitle = "Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006",
publisher = "IEEE",
address = "United States",

}

Achtert, E, Böhm, C, Kröger, P & Zimek, A 2006, Mining hierarchies of correlation clusters. in Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006. IEEE, pp. 119-128, 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006, Vienna, Austria, 03/07/2006. https://doi.org/10.1109/SSDBM.2006.35

Mining hierarchies of correlation clusters. / Achtert, Elke; Böhm, Christian; Kröger, Peer; Zimek, Arthur.

Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006. IEEE, 2006. p. 119-128.


TY - GEN

T1 - Mining hierarchies of correlation clusters

AU - Achtert, Elke

AU - Böhm, Christian

AU - Kröger, Peer

AU - Zimek, Arthur

PY - 2006/12

Y1 - 2006/12

N2 - The detection of correlations between different features in high dimensional data sets is a very important data mining task. These correlations can be arbitrarily complex: One or more features might be correlated with several other features, and both noise features as well as the actual dependencies may be different for different clusters. Therefore, each cluster contains points that are located on a common hyperplane of arbitrary dimensionality in the data space and thus generates a separate, arbitrarily oriented subspace of the original data space. The few recently proposed algorithms designed to uncover these correlation clusters have several disadvantages. In particular, these methods cannot detect correlation clusters of different dimensionality which are nested into each other. The complete hierarchical structure of correlation clusters of varying dimensionality can only be detected by a hierarchical clustering approach. Therefore, we propose the algorithm HiCO (Hierarchical Correlation Ordering), the first hierarchical approach to correlation clustering. The algorithm determines the cluster hierarchy, and visualizes it using correlation diagrams. Several comparative experiments using synthetic and real data sets show the performance and the effectivity of HiCO.

AB - The detection of correlations between different features in high dimensional data sets is a very important data mining task. These correlations can be arbitrarily complex: One or more features might be correlated with several other features, and both noise features as well as the actual dependencies may be different for different clusters. Therefore, each cluster contains points that are located on a common hyperplane of arbitrary dimensionality in the data space and thus generates a separate, arbitrarily oriented subspace of the original data space. The few recently proposed algorithms designed to uncover these correlation clusters have several disadvantages. In particular, these methods cannot detect correlation clusters of different dimensionality which are nested into each other. The complete hierarchical structure of correlation clusters of varying dimensionality can only be detected by a hierarchical clustering approach. Therefore, we propose the algorithm HiCO (Hierarchical Correlation Ordering), the first hierarchical approach to correlation clustering. The algorithm determines the cluster hierarchy, and visualizes it using correlation diagrams. Several comparative experiments using synthetic and real data sets show the performance and the effectivity of HiCO.

U2 - 10.1109/SSDBM.2006.35

DO - 10.1109/SSDBM.2006.35

M3 - Article in proceedings

AN - SCOPUS:45149090482

SN - 0-7695-2590-3

SP - 119

EP - 128

BT - Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006

PB - IEEE

ER -

Achtert E, Böhm C, Kröger P, Zimek A. Mining hierarchies of correlation clusters. In Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006. IEEE. 2006. p. 119-128 https://doi.org/10.1109/SSDBM.2006.35