Efficient computation of multiple density-based clustering hierarchies

Antonio Cavalcante Araujo Neto, Joerg Sander, Ricardo J.G.B. Campello, Mario A. Nascimento

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

HDBSCAN∗, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN∗ is robust w.r.t. mpts, choosing a 'good' value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may reveal themselves at different values of mpts. To explore results for a range of mpts, one has to run HDBSCAN∗ for each value in the range independently, which is computationally inefficient. In this paper we propose an efficient approach to compute all HDBSCAN∗ hierarchies for a range of mpts by replacing the graph used by HDBSCAN∗ with a much smaller graph that is guaranteed to contain the required information. Our experiments show that our approach can obtain, for example, over one hundred hierarchies for a cost equivalent to running HDBSCAN∗ about 2 times. In fact, this speedup tends to increase with the number of hierarchies to be computed.
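To make the cost described in the abstract concrete, the following is a minimal sketch of the naive baseline: for every mpts value in a range, recompute core distances, build the complete mutual reachability graph, and extract its minimum spanning tree, from which an HDBSCAN∗ hierarchy can be derived. It is not the reduced-graph method the paper proposes. All function names are hypothetical, and the convention that a point counts as its own first neighbor when computing core distances is an assumption.

```python
import math


def pairwise_distances(points):
    """Full Euclidean distance matrix (O(n^2) memory; illustration only)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def core_distances(dist, mpts):
    """Distance from each point to its mpts-th nearest neighbor.

    Assumption: the point itself counts as its first neighbor,
    so mpts = 1 gives a core distance of 0.
    """
    return [sorted(row)[mpts - 1] for row in dist]


def mutual_reachability(dist, core):
    """Edge weight max(core[i], core[j], d(i, j)) on the complete graph."""
    n = len(dist)
    return [
        [0.0 if i == j else max(core[i], core[j], dist[i][j]) for j in range(n)]
        for i in range(n)
    ]


def mst_edges(weights):
    """Prim's algorithm on a dense weight matrix; returns (u, v, w) edges."""
    n = len(weights)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u
    return edges


def msts_for_range(points, mpts_values):
    """Naive baseline: one complete-graph MST per mpts value."""
    dist = pairwise_distances(points)
    result = {}
    for mpts in mpts_values:
        core = core_distances(dist, mpts)
        mrd = mutual_reachability(dist, core)
        result[mpts] = sorted(mst_edges(mrd), key=lambda e: e[2])
    return result


if __name__ == "__main__":
    toy = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (10.0, 0.0)]
    for mpts, edges in msts_for_range(toy, range(1, 4)).items():
        print(mpts, [round(w, 3) for _, _, w in edges])
```

Each pass over the loop touches all n(n-1)/2 edges of the complete graph, which is the per-mpts cost the abstract calls inefficient; the paper's contribution is to replace that graph with a much smaller one shared across the whole range.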

Original language: English
Title of host publication: Proceedings - 17th IEEE International Conference on Data Mining, ICDM 2017
Editors: George Karypis, Srinivas Alu, Vijay Raghavan, Xindong Wu, Lucio Miele
Publisher: IEEE
Publication date: 15. Dec 2017
Pages: 991-996
ISBN (Electronic): 9781538638347
Publication status: Published - 15. Dec 2017
Externally published: Yes
Event: 17th IEEE International Conference on Data Mining, ICDM 2017 - New Orleans, United States
Duration: 18. Nov 2017 → 21. Nov 2017

Conference

Conference: 17th IEEE International Conference on Data Mining, ICDM 2017
Country/Territory: United States
City: New Orleans
Period: 18/11/2017 → 21/11/2017
Sponsor: Cisco Systems, Citigroup Inc., IEEE, IEEE Computer Society, National Science Foundation (NSF)
Series: Proceedings of the IEEE International Conference on Data Mining, ICDM
Volume: 2017-November
ISSN: 1550-4786

Keywords

  • Clustering
  • HDBSCAN
  • Hierarchical Clustering
  • Relative Neighborhood Graph
