Deriving quantitative models for correlation clusters

Elke Achtert*, Christian Böhm, Hans Peter Kriegel, Peer Kröger, Arthur Zimek

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Correlation clustering aims at grouping the data set into correlation clusters such that the objects in the same cluster exhibit a certain density and are all associated with a common arbitrarily oriented hyperplane of arbitrary dimensionality. Several algorithms for this task have been proposed recently. However, all algorithms only compute the partitioning of the data into clusters. This is only a first step in the pipeline of advanced data analysis and system modelling. The second (post-clustering) step of deriving a quantitative model for each correlation cluster has not been addressed so far. In this paper, we describe an original approach to handle this second step. We introduce a general method that can extract quantitative information on the linear dependencies within a correlation clustering. Our concepts are independent of the clustering model and can thus be applied as a post-processing step to any correlation clustering algorithm. Furthermore, we show how these quantitative models can be used to predict the probability distribution that an object is created by these models. Our broad experimental evaluation demonstrates the beneficial impact of our method on several applications of significant practical importance.
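As an illustration of what a quantitative model for a correlation cluster can look like, the following minimal sketch fits a single linear dependency to one 2-D cluster by principal component analysis (total least squares). It is not the authors' algorithm: the restriction to two dimensions, the function names `fit_line_model` and `deviation`, and the sample points are all assumptions made for this example. The eigenvector of the smallest covariance eigenvalue is taken as the normal of the hyperplane (here a line), which gives an equation of the form a*x + b*y = c.

```python
import math


def fit_line_model(points):
    """Fit a*x + b*y = c to 2-D points via PCA (hypothetical helper)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Smallest eigenvalue from the closed form for a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2.0
    det = sxx * syy - sxy * sxy
    lam = half_trace - math.sqrt(max(half_trace * half_trace - det, 0.0))
    # Eigenvector for lam; it is the normal of the fitted line.
    a, b = sxy, lam - sxx
    if math.hypot(a, b) < 1e-12:
        a, b = lam - syy, sxy
    norm = math.hypot(a, b)
    if norm < 1e-12:
        raise ValueError("cluster has no dominant direction")
    a, b = a / norm, b / norm
    # The line passes through the cluster mean.
    c = a * mx + b * my
    return a, b, c, lam


def deviation(model, point):
    """Orthogonal distance of a point from the fitted line."""
    a, b, c, _ = model
    return abs(a * point[0] + b * point[1] - c)


# Assumed sample cluster: points lying exactly on y = 2x + 1.
cluster = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
model = fit_line_model(cluster)
a, b, c, lam = model
slope, intercept = -a / b, c / b
```

For the assumed sample cluster the recovered equation corresponds to y = 2x + 1, and the smallest eigenvalue is zero because the points are perfectly collinear. The orthogonal deviation of a new object from the fitted line is one plausible ingredient for judging how well that object matches the cluster model; how the paper turns such models into probabilities is not stated in the abstract.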

Original language: English
Title of host publication: KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Publication date: 20 Aug 2006
Pages: 4-13
ISBN (Print): 1-59593-339-5
DOIs: https://doi.org/10.1145/1150402.1150408
Publication status: Published - 20 Aug 2006
Externally published: Yes
Event: KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Philadelphia, PA, United States
Duration: 20 Aug 2006 to 23 Aug 2006

Conference

Conference: KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: United States
City: Philadelphia, PA
Period: 20/08/2006 to 23/08/2006
Sponsor: ACM SIGKDD, ACM SIGMOD

Fingerprint

Clustering algorithms
Probability distributions
Pipelines
Processing

Keywords

  • Cluster description
  • Cluster model
  • Clustering
  • Correlation clustering
  • Data mining

Cite this

Achtert, E., Böhm, C., Kriegel, H. P., Kröger, P., & Zimek, A. (2006). Deriving quantitative models for correlation clusters. In KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 4-13). Association for Computing Machinery. https://doi.org/10.1145/1150402.1150408
Achtert, Elke ; Böhm, Christian ; Kriegel, Hans Peter ; Kröger, Peer ; Zimek, Arthur. / Deriving quantitative models for correlation clusters. KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2006. pp. 4-13
@inproceedings{ef553b6613404798a9ec0a9fcbbb0e3a,
title = "Deriving quantitative models for correlation clusters",
abstract = "Correlation clustering aims at grouping the data set into correlation clusters such that the objects in the same cluster exhibit a certain density and are all associated with a common arbitrarily oriented hyperplane of arbitrary dimensionality. Several algorithms for this task have been proposed recently. However, all algorithms only compute the partitioning of the data into clusters. This is only a first step in the pipeline of advanced data analysis and system modelling. The second (post-clustering) step of deriving a quantitative model for each correlation cluster has not been addressed so far. In this paper, we describe an original approach to handle this second step. We introduce a general method that can extract quantitative information on the linear dependencies within a correlation clustering. Our concepts are independent of the clustering model and can thus be applied as a post-processing step to any correlation clustering algorithm. Furthermore, we show how these quantitative models can be used to predict the probability distribution that an object is created by these models. Our broad experimental evaluation demonstrates the beneficial impact of our method on several applications of significant practical importance.",
keywords = "Cluster description, Cluster model, Clustering, Correlation clustering, Data mining",
author = "Elke Achtert and Christian B{\"o}hm and Kriegel, {Hans Peter} and Peer Kr{\"o}ger and Arthur Zimek",
year = "2006",
month = "8",
day = "20",
doi = "10.1145/1150402.1150408",
language = "English",
isbn = "1-59593-339-5",
pages = "4--13",
booktitle = "KDD 2006",
publisher = "Association for Computing Machinery",
address = "United States",

}

Achtert, E, Böhm, C, Kriegel, HP, Kröger, P & Zimek, A 2006, Deriving quantitative models for correlation clusters. in KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, pp. 4-13, KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, United States, 20/08/2006. https://doi.org/10.1145/1150402.1150408

TY - GEN
T1 - Deriving quantitative models for correlation clusters
AU - Achtert, Elke
AU - Böhm, Christian
AU - Kriegel, Hans Peter
AU - Kröger, Peer
AU - Zimek, Arthur
PY - 2006/8/20
Y1 - 2006/8/20
AB - Correlation clustering aims at grouping the data set into correlation clusters such that the objects in the same cluster exhibit a certain density and are all associated with a common arbitrarily oriented hyperplane of arbitrary dimensionality. Several algorithms for this task have been proposed recently. However, all algorithms only compute the partitioning of the data into clusters. This is only a first step in the pipeline of advanced data analysis and system modelling. The second (post-clustering) step of deriving a quantitative model for each correlation cluster has not been addressed so far. In this paper, we describe an original approach to handle this second step. We introduce a general method that can extract quantitative information on the linear dependencies within a correlation clustering. Our concepts are independent of the clustering model and can thus be applied as a post-processing step to any correlation clustering algorithm. Furthermore, we show how these quantitative models can be used to predict the probability distribution that an object is created by these models. Our broad experimental evaluation demonstrates the beneficial impact of our method on several applications of significant practical importance.
KW - Cluster description
KW - Cluster model
KW - Clustering
KW - Correlation clustering
KW - Data mining
U2 - 10.1145/1150402.1150408
DO - 10.1145/1150402.1150408
M3 - Article in proceedings
AN - SCOPUS:33749545528
SN - 1-59593-339-5
SP - 4
EP - 13
BT - KDD 2006
PB - Association for Computing Machinery
ER -

Achtert E, Böhm C, Kriegel HP, Kröger P, Zimek A. Deriving quantitative models for correlation clusters. In KDD 2006: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2006. p. 4-13 https://doi.org/10.1145/1150402.1150408