Density-based clustering validation

Davoud Moulavi, Pablo A. Jaskowiak, Ricardo J.G.B. Campello, Arthur Zimek, Jorg Sander

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Abstract

One of the most challenging aspects of clustering is validation, which is the objective and quantitative assessment of clustering results. A number of different relative validity criteria have been proposed for the validation of globular clusters. Not all data, however, are composed of globular clusters. Density-based clustering algorithms seek partitions with high-density areas of points (clusters, not necessarily globular) separated by low-density areas, possibly containing noise objects. In these cases, relative validity indices proposed for globular cluster validation may fail. In this paper we propose a relative validation index for density-based, arbitrarily shaped clusters. The index assesses clustering quality based on the relative density connection between pairs of objects. Our index is formulated on the basis of a new kernel density function, which is used to compute the density of objects and to evaluate the within- and between-cluster density connectedness of clustering results. Experiments on synthetic and real-world data show the effectiveness of our approach for the evaluation and selection of clustering algorithms and their respective appropriate parameters.
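
As a rough illustration of the failure mode described in the abstract, the sketch below computes the mean silhouette width, a widely used relative validity index that favors compact clusters, for two labelings of a toy data set. The silhouette is chosen here only as a familiar example and is not named in the abstract. The function name mean_silhouette, the toy data, and both labelings are assumptions made for this illustration. The code does not implement the index proposed in the paper.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette width over all points (Euclidean distance, O(n^2))."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, (p, own) in enumerate(zip(points, labels)):
        # Accumulate distance sums and counts from p to every cluster,
        # excluding p itself.
        dist_sum = {c: 0.0 for c in clusters}
        count = {c: 0 for c in clusters}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                dist_sum[lab] += math.dist(p, q)
                count[lab] += 1
        if count[own] == 0:
            continue  # a singleton cluster contributes a silhouette of 0
        a = dist_sum[own] / count[own]  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            dist_sum[c] / count[c]
            for c in clusters
            if c != own and count[c] > 0
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (hypothetical): two long parallel lines of points. Neighbors on
# a line are 1 unit apart, while the lines are separated by a gap of 5.
points = [(float(x), y) for y in (0.0, 5.0) for x in range(40)]

# Labeling A: one cluster per line (the density-separated groups).
line_labels = [0 if y == 0.0 else 1 for _, y in points]

# Labeling B: cut both lines in half, giving two more compact blobs.
half_labels = [0 if x < 20 else 1 for x, _ in points]

score_lines = mean_silhouette(points, line_labels)
score_halves = mean_silhouette(points, half_labels)
print("silhouette, one cluster per line:", round(score_lines, 3))
print("silhouette, left/right halves:  ", round(score_halves, 3))
```

On this toy data the labeling that cuts both elongated clusters in half receives the higher silhouette score, although the two lines are the groups separated by a low-density gap.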

Original language: English
Title of host publication: Proceedings of the 2014 SIAM International Conference on Data Mining
Editors: Mohammed Zaki, Zoran Obradovic, Pang Ning-Tan, Arindam Banerjee, Chandrika Kamath, Srinivasan Parthasarathy
Publisher: Society for Industrial and Applied Mathematics Publications
Publication date: 2014
Pages: 839-847
ISBN (Electronic): 978-1-61197-344-0
DOIs: https://doi.org/10.1137/1.9781611973440.96
Publication status: Published - 2014
Externally published: Yes
Event: 14th SIAM International Conference on Data Mining, Philadelphia, United States
Duration: 24 Apr 2014 to 26 Apr 2014

Conference

Conference: 14th SIAM International Conference on Data Mining
Country: United States
City: Philadelphia
Period: 24/04/2014 to 26/04/2014
Sponsor: American Statistical Association

Fingerprint

Clustering algorithms
Probability density function
Experiments

Cite this

Moulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A., & Sander, J. (2014). Density-based clustering validation. In M. Zaki, Z. Obradovic, P. Ning-Tan, A. Banerjee, C. Kamath, & S. Parthasarathy (Eds.), Proceedings of the 2014 SIAM International Conference on Data Mining (pp. 839-847). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611973440.96
Moulavi, Davoud ; Jaskowiak, Pablo A. ; Campello, Ricardo J.G.B. ; Zimek, Arthur ; Sander, Jorg. / Density-based clustering validation. Proceedings of the 2014 SIAM International Conference on Data Mining. editor / Mohammed Zaki ; Zoran Obradovic ; Pang Ning-Tan ; Arindam Banerjee ; Chandrika Kamath ; Srinivasan Parthasarathy. Society for Industrial and Applied Mathematics Publications, 2014. pp. 839-847
@inproceedings{e2fa94ea01444985b4b2d5d1d3143da1,
title = "Density-based clustering validation",
abstract = "One of the most challenging aspects of clustering is validation, which is the objective and quantitative assessment of clustering results. A number of different relative validity criteria have been proposed for the validation of globular clusters. Not all data, however, are composed of globular clusters. Density-based clustering algorithms seek partitions with high-density areas of points (clusters, not necessarily globular) separated by low-density areas, possibly containing noise objects. In these cases, relative validity indices proposed for globular cluster validation may fail. In this paper we propose a relative validation index for density-based, arbitrarily shaped clusters. The index assesses clustering quality based on the relative density connection between pairs of objects. Our index is formulated on the basis of a new kernel density function, which is used to compute the density of objects and to evaluate the within- and between-cluster density connectedness of clustering results. Experiments on synthetic and real-world data show the effectiveness of our approach for the evaluation and selection of clustering algorithms and their respective appropriate parameters.",
author = "Davoud Moulavi and Jaskowiak, {Pablo A.} and Campello, {Ricardo J.G.B.} and Arthur Zimek and Jorg Sander",
year = "2014",
doi = "10.1137/1.9781611973440.96",
language = "English",
pages = "839--847",
editor = "Mohammed Zaki and Zoran Obradovic and Pang Ning-Tan and Arindam Banerjee and Chandrika Kamath and Srinivasan Parthasarathy",
booktitle = "Proceedings of the 2014 SIAM International Conference on Data Mining",
publisher = "Society for Industrial and Applied Mathematics Publications",
address = "United States"
}

Moulavi, D, Jaskowiak, PA, Campello, RJGB, Zimek, A & Sander, J 2014, Density-based clustering validation. in M Zaki, Z Obradovic, P Ning-Tan, A Banerjee, C Kamath & S Parthasarathy (eds), Proceedings of the 2014 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics Publications, pp. 839-847, 14th SIAM International Conference on Data Mining, Philadelphia, United States, 24/04/2014. https://doi.org/10.1137/1.9781611973440.96

Density-based clustering validation. / Moulavi, Davoud; Jaskowiak, Pablo A.; Campello, Ricardo J.G.B.; Zimek, Arthur; Sander, Jorg.

Proceedings of the 2014 SIAM International Conference on Data Mining. ed. / Mohammed Zaki; Zoran Obradovic; Pang Ning-Tan; Arindam Banerjee; Chandrika Kamath; Srinivasan Parthasarathy. Society for Industrial and Applied Mathematics Publications, 2014. p. 839-847.

TY - GEN

T1 - Density-based clustering validation

AU - Moulavi, Davoud

AU - Jaskowiak, Pablo A.

AU - Campello, Ricardo J.G.B.

AU - Zimek, Arthur

AU - Sander, Jorg

PY - 2014

Y1 - 2014

AB - One of the most challenging aspects of clustering is validation, which is the objective and quantitative assessment of clustering results. A number of different relative validity criteria have been proposed for the validation of globular clusters. Not all data, however, are composed of globular clusters. Density-based clustering algorithms seek partitions with high-density areas of points (clusters, not necessarily globular) separated by low-density areas, possibly containing noise objects. In these cases, relative validity indices proposed for globular cluster validation may fail. In this paper we propose a relative validation index for density-based, arbitrarily shaped clusters. The index assesses clustering quality based on the relative density connection between pairs of objects. Our index is formulated on the basis of a new kernel density function, which is used to compute the density of objects and to evaluate the within- and between-cluster density connectedness of clustering results. Experiments on synthetic and real-world data show the effectiveness of our approach for the evaluation and selection of clustering algorithms and their respective appropriate parameters.

U2 - 10.1137/1.9781611973440.96

DO - 10.1137/1.9781611973440.96

M3 - Article in proceedings

AN - SCOPUS:84944455763

SP - 839

EP - 847

BT - Proceedings of the 2014 SIAM International Conference on Data Mining

A2 - Zaki, Mohammed

A2 - Obradovic, Zoran

A2 - Ning-Tan, Pang

A2 - Banerjee, Arindam

A2 - Kamath, Chandrika

A2 - Parthasarathy, Srinivasan

PB - Society for Industrial and Applied Mathematics Publications

ER -

Moulavi D, Jaskowiak PA, Campello RJGB, Zimek A, Sander J. Density-based clustering validation. In Zaki M, Obradovic Z, Ning-Tan P, Banerjee A, Kamath C, Parthasarathy S, editors, Proceedings of the 2014 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics Publications. 2014. p. 839-847 https://doi.org/10.1137/1.9781611973440.96