A general framework for increasing the robustness of PCA-based correlation clustering algorithms

Hans Peter Kriegel*, Peer Kröger, Erich Schubert, Arthur Zimek

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Most correlation clustering algorithms rely on principal component analysis (PCA) as a correlation analysis tool. The correlation of each cluster is learned by applying PCA to a set of sample points. Since PCA is rather sensitive to outliers, if a small fraction of these points does not correspond to the correct correlation of the cluster, the algorithms are usually misled or even fail to detect the correct results. In this paper, we evaluate the influence of outliers on PCA and propose a general framework for increasing the robustness of PCA in order to determine the correct correlation of each cluster. We further show how our framework can be applied to PCA-based correlation clustering algorithms. A thorough experimental evaluation shows the benefit of our framework on several synthetic and real-world data sets.
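The abstract's central observation is that a handful of outliers can redirect the principal axes PCA learns for a cluster. A toy example makes this concrete. The sketch below uses Python with NumPy, which is our own choice. It computes PCA from a weighted covariance matrix and compares uniform weights (plain PCA) with a hypothetical Gaussian weighting around the coordinate-wise median. The weighting function, the median-distance bandwidth, and the names `weighted_pca` and `gaussian_median_weights` are illustrative assumptions. They are not a reproduction of the framework proposed in the paper.

```python
import numpy as np


def weighted_pca(points, weights=None):
    """Eigenvalues (descending) and eigenvectors (as columns) of the weighted
    covariance matrix of `points`; weights=None gives plain, unweighted PCA."""
    X = np.asarray(points, dtype=float)
    w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise so the weights sum to 1
    mean = w @ X                         # weighted centroid
    C = X - mean
    cov = (C * w[:, None]).T @ C         # weighted covariance matrix
    vals, vecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]       # reorder to descending
    return vals[order], vecs[:, order]


def gaussian_median_weights(points):
    """Hypothetical weighting (an assumption, not the paper's): Gaussian in the
    distance to the coordinate-wise median, with the median distance as
    bandwidth, so points far from the bulk of the cluster count less."""
    X = np.asarray(points, dtype=float)
    d = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    sigma = np.median(d)
    return np.exp(-0.5 * (d / sigma) ** 2)


# Synthetic cluster: 100 points correlated along (1, 1), plus 5 outliers at (5, -5).
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=100)
inliers = np.column_stack([t, t]) + rng.normal(scale=0.01, size=(100, 2))
outliers = np.tile([5.0, -5.0], (5, 1))
data = np.vstack([inliers, outliers])

true_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)
_, plain_vecs = weighted_pca(data)
_, robust_vecs = weighted_pca(data, gaussian_median_weights(data))

# |cosine| between the first principal axis and the true correlation direction
plain_cos = abs(plain_vecs[:, 0] @ true_dir)
robust_cos = abs(robust_vecs[:, 0] @ true_dir)
print(f"plain PCA:    |cos| = {plain_cos:.3f}")
print(f"weighted PCA: |cos| = {robust_cos:.3f}")
```

With uniform weights, the five outliers contribute more variance than the 100 correlated points, so the first principal axis swings toward the outlier direction. The down-weighted variant recovers the (1, 1) correlation. The exact cosines depend on the random seed.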

Original language: English
Title of host publication: Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings
Editors: B. Ludäscher, N. Mamoulis
Publisher: Springer
Publication date: 14 Aug 2008
Pages: 418-435
ISBN (Print): 978-3-540-69476-2
ISBN (Electronic): 978-3-540-69497-7
DOIs: https://doi.org/10.1007/978-3-540-69497-7_27
Publication status: Published - 14 Aug 2008
Externally published: Yes
Event: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008 - Hong Kong, China
Duration: 9 Jul 2008 to 11 Jul 2008

Conference

Conference: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008
Country: China
City: Hong Kong
Period: 09/07/2008 to 11/07/2008
Series: Lecture Notes in Computer Science
Volume: 5069
ISSN: 0302-9743

Fingerprint

Clustering algorithms
Principal component analysis

Cite this

Kriegel, H. P., Kröger, P., Schubert, E., & Zimek, A. (2008). A general framework for increasing the robustness of PCA-based correlation clustering algorithms. In B. Ludäscher, & N. Mamoulis (Eds.), Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings (pp. 418-435). Springer. Lecture Notes in Computer Science, Vol. 5069. https://doi.org/10.1007/978-3-540-69497-7_27
Kriegel, Hans Peter ; Kröger, Peer ; Schubert, Erich ; Zimek, Arthur. / A general framework for increasing the robustness of PCA-based correlation clustering algorithms. Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings. editor / B. Ludäscher ; N. Mamoulis. Springer, 2008. pp. 418-435 (Lecture Notes in Computer Science, Vol. 5069).
@inproceedings{897918ad5ac9496bab8dd6e239a3bde4,
title = "A general framework for increasing the robustness of PCA-based correlation clustering algorithms",
abstract = "Most correlation clustering algorithms rely on principal component analysis (PCA) as a correlation analysis tool. The correlation of each cluster is learned by applying PCA to a set of sample points. Since PCA is rather sensitive to outliers, if a small fraction of these points does not correspond to the correct correlation of the cluster, the algorithms are usually misled or even fail to detect the correct results. In this paper, we evaluate the influence of outliers on PCA and propose a general framework for increasing the robustness of PCA in order to determine the correct correlation of each cluster. We further show how our framework can be applied to PCA-based correlation clustering algorithms. A thorough experimental evaluation shows the benefit of our framework on several synthetic and real-world data sets.",
author = "Kriegel, {Hans Peter} and Peer Kr{\"o}ger and Erich Schubert and Arthur Zimek",
year = "2008",
month = "8",
day = "14",
doi = "10.1007/978-3-540-69497-7_27",
language = "English",
isbn = "978-3-540-69476-2",
series = "Lecture Notes in Computer Science",
volume = "5069",
publisher = "Springer",
pages = "418--435",
editor = "B. Lud{\"a}scher and N. Mamoulis",
booktitle = "Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings",
address = "Germany",

}

Kriegel, HP, Kröger, P, Schubert, E & Zimek, A 2008, A general framework for increasing the robustness of PCA-based correlation clustering algorithms. in B Ludäscher & N Mamoulis (eds), Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings. Springer, Lecture Notes in Computer Science, vol. 5069, pp. 418-435, 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008, Hong Kong, China, 09/07/2008. https://doi.org/10.1007/978-3-540-69497-7_27

A general framework for increasing the robustness of PCA-based correlation clustering algorithms. / Kriegel, Hans Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur.

Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings. eds. / B. Ludäscher; N. Mamoulis. Springer, 2008. p. 418-435 (Lecture Notes in Computer Science, Vol. 5069).


TY - GEN

T1 - A general framework for increasing the robustness of PCA-based correlation clustering algorithms

AU - Kriegel, Hans Peter

AU - Kröger, Peer

AU - Schubert, Erich

AU - Zimek, Arthur

PY - 2008/8/14

Y1 - 2008/8/14

N2 - Most correlation clustering algorithms rely on principal component analysis (PCA) as a correlation analysis tool. The correlation of each cluster is learned by applying PCA to a set of sample points. Since PCA is rather sensitive to outliers, if a small fraction of these points does not correspond to the correct correlation of the cluster, the algorithms are usually misled or even fail to detect the correct results. In this paper, we evaluate the influence of outliers on PCA and propose a general framework for increasing the robustness of PCA in order to determine the correct correlation of each cluster. We further show how our framework can be applied to PCA-based correlation clustering algorithms. A thorough experimental evaluation shows the benefit of our framework on several synthetic and real-world data sets.

AB - Most correlation clustering algorithms rely on principal component analysis (PCA) as a correlation analysis tool. The correlation of each cluster is learned by applying PCA to a set of sample points. Since PCA is rather sensitive to outliers, if a small fraction of these points does not correspond to the correct correlation of the cluster, the algorithms are usually misled or even fail to detect the correct results. In this paper, we evaluate the influence of outliers on PCA and propose a general framework for increasing the robustness of PCA in order to determine the correct correlation of each cluster. We further show how our framework can be applied to PCA-based correlation clustering algorithms. A thorough experimental evaluation shows the benefit of our framework on several synthetic and real-world data sets.

U2 - 10.1007/978-3-540-69497-7_27

DO - 10.1007/978-3-540-69497-7_27

M3 - Article in proceedings

AN - SCOPUS:49049119729

SN - 978-3-540-69476-2

T3 - Lecture Notes in Computer Science

SP - 418

EP - 435

BT - Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings

A2 - Ludäscher, B.

A2 - Mamoulis, N.

PB - Springer

ER -

Kriegel HP, Kröger P, Schubert E, Zimek A. A general framework for increasing the robustness of PCA-based correlation clustering algorithms. In Ludäscher B, Mamoulis N, editors, Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings. Springer. 2008. p. 418-435. (Lecture Notes in Computer Science, Vol. 5069). https://doi.org/10.1007/978-3-540-69497-7_27