Mining hierarchies of correlation clusters

Elke Achtert*, Christian Böhm, Peer Kröger, Arthur Zimek

*Corresponding author for this work

Publication: Contribution to book/anthology/report/conference proceeding; Conference contribution in proceedings; Research; peer review

Abstract

The detection of correlations between different features in high dimensional data sets is a very important data mining task. These correlations can be arbitrarily complex: One or more features might be correlated with several other features, and both noise features as well as the actual dependencies may be different for different clusters. Therefore, each cluster contains points that are located on a common hyperplane of arbitrary dimensionality in the data space and thus generates a separate, arbitrarily oriented subspace of the original data space. The few recently proposed algorithms designed to uncover these correlation clusters have several disadvantages. In particular, these methods cannot detect correlation clusters of different dimensionality which are nested into each other. The complete hierarchical structure of correlation clusters of varying dimensionality can only be detected by a hierarchical clustering approach. Therefore, we propose the algorithm HiCO (Hierarchical Correlation Ordering), the first hierarchical approach to correlation clustering. The algorithm determines the cluster hierarchy, and visualizes it using correlation diagrams. Several comparative experiments using synthetic and real data sets show the performance and the effectivity of HiCO.
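To make the notion of points "located on a common hyperplane of arbitrary dimensionality" concrete, here is a minimal Python sketch that estimates the dimensionality of the subspace spanned by a small 2D point set. It is not the HiCO algorithm itself. It assumes a PCA-style criterion (count the leading eigenvalues of the covariance matrix needed to explain a fraction `alpha` of the variance). The function names, the restriction to two dimensions, and the threshold `alpha=0.85` are all assumptions made for illustration.

```python
import math


def covariance_2d(points):
    """Return the entries (sxx, sxy, syy) of the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return sxx, sxy, syy


def eigenvalues_2x2(sxx, sxy, syy):
    """Closed-form eigenvalues of a symmetric 2x2 matrix, largest first."""
    center = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    return center + radius, center - radius


def correlation_dimensionality(points, alpha=0.85):
    """Number of principal directions needed to explain `alpha` of the variance.

    alpha is a hypothetical threshold chosen for this sketch.
    """
    eigenvalues = eigenvalues_2x2(*covariance_2d(points))
    total = sum(eigenvalues)
    if total == 0:
        return 0  # all points coincide: no direction carries variance
    explained = 0.0
    for count, value in enumerate(eigenvalues, start=1):
        explained += value
        if explained / total >= alpha:
            return count
    return len(eigenvalues)


# Points on the line y = 2x + 1 form a 1-dimensional correlation cluster.
line_points = [(x, 2 * x + 1) for x in range(10)]
# The corners of a square spread equally in both directions.
square_points = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(correlation_dimensionality(line_points))
print(correlation_dimensionality(square_points))
```

In this sketch the line-shaped set is reported as lower-dimensional than the square-shaped one. Nested clusters of this kind, with different dimensionality, are what the abstract says a hierarchical approach is needed to separate.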

Original language: English
Title: Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006
Publisher: IEEE
Publication date: Dec. 2006
Pages: 119-128
ISBN (Print): 0-7695-2590-3
DOI: https://doi.org/10.1109/SSDBM.2006.35
Status: Published - Dec. 2006
Published externally: Yes
Event: 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006 - Vienna, Austria
Duration: 3 Jul. 2006 to 5 Jul. 2006

Conference

Conference: 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006
Country: Austria
City: Vienna
Period: 03/07/2006 to 05/07/2006

Fingerprint

Data mining
Experiments

Cite this

Achtert, E., Böhm, C., Kröger, P., & Zimek, A. (2006). Mining hierarchies of correlation clusters. In Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006 (pp. 119-128). IEEE. https://doi.org/10.1109/SSDBM.2006.35
Achtert, Elke ; Böhm, Christian ; Kröger, Peer ; Zimek, Arthur. / Mining hierarchies of correlation clusters. Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006. IEEE, 2006. pp. 119-128.
@inproceedings{80c5783dd2b2402fb60bfe27a5a1f536,
title = "Mining hierarchies of correlation clusters",
abstract = "The detection of correlations between different features in high dimensional data sets is a very important data mining task. These correlations can be arbitrarily complex: One or more features might be correlated with several other features, and both noise features as well as the actual dependencies may be different for different clusters. Therefore, each cluster contains points that are located on a common hyperplane of arbitrary dimensionality in the data space and thus generates a separate, arbitrarily oriented subspace of the original data space. The few recently proposed algorithms designed to uncover these correlation clusters have several disadvantages. In particular, these methods cannot detect correlation clusters of different dimensionality which are nested into each other. The complete hierarchical structure of correlation clusters of varying dimensionality can only be detected by a hierarchical clustering approach. Therefore, we propose the algorithm HiCO (Hierarchical Correlation Ordering), the first hierarchical approach to correlation clustering. The algorithm determines the cluster hierarchy, and visualizes it using correlation diagrams. Several comparative experiments using synthetic and real data sets show the performance and the effectivity of HiCO.",
author = "Elke Achtert and Christian B{\"o}hm and Peer Kr{\"o}ger and Arthur Zimek",
year = "2006",
month = "12",
doi = "10.1109/SSDBM.2006.35",
language = "English",
isbn = "0-7695-2590-3",
pages = "119--128",
booktitle = "Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006",
publisher = "IEEE",
address = "United States",

}

Achtert, E, Böhm, C, Kröger, P & Zimek, A 2006, Mining hierarchies of correlation clusters. in Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006. IEEE, pp. 119-128, 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006, Vienna, Austria, 03/07/2006. https://doi.org/10.1109/SSDBM.2006.35

Mining hierarchies of correlation clusters. / Achtert, Elke; Böhm, Christian; Kröger, Peer; Zimek, Arthur.

Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006. IEEE, 2006. pp. 119-128.

TY - GEN

T1 - Mining hierarchies of correlation clusters

AU - Achtert, Elke

AU - Böhm, Christian

AU - Kröger, Peer

AU - Zimek, Arthur

PY - 2006/12

Y1 - 2006/12

N2 - The detection of correlations between different features in high dimensional data sets is a very important data mining task. These correlations can be arbitrarily complex: One or more features might be correlated with several other features, and both noise features as well as the actual dependencies may be different for different clusters. Therefore, each cluster contains points that are located on a common hyperplane of arbitrary dimensionality in the data space and thus generates a separate, arbitrarily oriented subspace of the original data space. The few recently proposed algorithms designed to uncover these correlation clusters have several disadvantages. In particular, these methods cannot detect correlation clusters of different dimensionality which are nested into each other. The complete hierarchical structure of correlation clusters of varying dimensionality can only be detected by a hierarchical clustering approach. Therefore, we propose the algorithm HiCO (Hierarchical Correlation Ordering), the first hierarchical approach to correlation clustering. The algorithm determines the cluster hierarchy, and visualizes it using correlation diagrams. Several comparative experiments using synthetic and real data sets show the performance and the effectivity of HiCO.

AB - The detection of correlations between different features in high dimensional data sets is a very important data mining task. These correlations can be arbitrarily complex: One or more features might be correlated with several other features, and both noise features as well as the actual dependencies may be different for different clusters. Therefore, each cluster contains points that are located on a common hyperplane of arbitrary dimensionality in the data space and thus generates a separate, arbitrarily oriented subspace of the original data space. The few recently proposed algorithms designed to uncover these correlation clusters have several disadvantages. In particular, these methods cannot detect correlation clusters of different dimensionality which are nested into each other. The complete hierarchical structure of correlation clusters of varying dimensionality can only be detected by a hierarchical clustering approach. Therefore, we propose the algorithm HiCO (Hierarchical Correlation Ordering), the first hierarchical approach to correlation clustering. The algorithm determines the cluster hierarchy, and visualizes it using correlation diagrams. Several comparative experiments using synthetic and real data sets show the performance and the effectivity of HiCO.

U2 - 10.1109/SSDBM.2006.35

DO - 10.1109/SSDBM.2006.35

M3 - Article in proceedings

AN - SCOPUS:45149090482

SN - 0-7695-2590-3

SP - 119

EP - 128

BT - Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006

PB - IEEE

ER -

Achtert E, Böhm C, Kröger P, Zimek A. Mining hierarchies of correlation clusters. In Proceedings - 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006. IEEE. 2006. pp. 119-128. https://doi.org/10.1109/SSDBM.2006.35