Representative clustering of uncertain data

Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, Nikos Mamoulis, Arthur Zimek, Matthias Renz

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Abstract

This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.

Original languageEnglish
Title of host publicationProceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Publication date24. Aug 2014
Pages243-252
ISBN (Print)978-1-4503-2956-9
DOIs
Publication statusPublished - 24. Aug 2014
Externally publishedYes
Event20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - New York, United States
Duration: 24. Aug 201427. Aug 2014

Conference

Conference20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
CountryUnited States
CityNew York
Period24/08/201427/08/2014
SponsorACM SIGKDD, ACM SIGMOD

Fingerprint

Clustering algorithms
Semantics
Uncertainty

Keywords

  • clustering
  • possible worlds
  • uncertain data

Cite this

Züfle, A., Emrich, T., Schmid, K. A., Mamoulis, N., Zimek, A., & Renz, M. (2014). Representative clustering of uncertain data. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 243-252). Association for Computing Machinery. https://doi.org/10.1145/2623330.2623725
Züfle, Andreas ; Emrich, Tobias ; Schmid, Klaus Arthur ; Mamoulis, Nikos ; Zimek, Arthur ; Renz, Matthias. / Representative clustering of uncertain data. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2014. pp. 243-252
@inproceedings{f1e596b4cc62447bb5bf893530a045b8,
title = "Representative clustering of uncertain data",
abstract = "This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.",
keywords = "clustering, possible worlds, uncertain data",
author = "Andreas Z{\"u}fle and Tobias Emrich and Schmid, {Klaus Arthur} and Nikos Mamoulis and Arthur Zimek and Matthias Renz",
year = "2014",
month = "8",
day = "24",
doi = "10.1145/2623330.2623725",
language = "English",
isbn = "978-1-4503-2956-9",
pages = "243--252",
booktitle = "Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
address = "United States",

}

Züfle, A, Emrich, T, Schmid, KA, Mamoulis, N, Zimek, A & Renz, M 2014, Representative clustering of uncertain data. in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, pp. 243-252, 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, United States, 24/08/2014. https://doi.org/10.1145/2623330.2623725

Representative clustering of uncertain data. / Züfle, Andreas; Emrich, Tobias; Schmid, Klaus Arthur; Mamoulis, Nikos; Zimek, Arthur; Renz, Matthias.

Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2014. p. 243-252.

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

TY - GEN

T1 - Representative clustering of uncertain data

AU - Züfle, Andreas

AU - Emrich, Tobias

AU - Schmid, Klaus Arthur

AU - Mamoulis, Nikos

AU - Zimek, Arthur

AU - Renz, Matthias

PY - 2014/8/24

Y1 - 2014/8/24

N2 - This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.

AB - This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.

KW - clustering

KW - possible worlds

KW - uncertain data

U2 - 10.1145/2623330.2623725

DO - 10.1145/2623330.2623725

M3 - Article in proceedings

AN - SCOPUS:84907033673

SN - 978-1-4503-2956-9

SP - 243

EP - 252

BT - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

ER -

Züfle A, Emrich T, Schmid KA, Mamoulis N, Zimek A, Renz M. Representative clustering of uncertain data. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2014. p. 243-252 https://doi.org/10.1145/2623330.2623725