Efficient computation of multiple density-based clustering hierarchies

Antonio Cavalcante Araujo Neto, Joerg Sander, Ricardo J.G.B. Campello, Mario A. Nascimento

Publication: Chapter in book/report/conference proceeding › Conference contribution in proceedings › Research › peer review


HDBSCAN∗, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN∗ is robust w.r.t. mpts, choosing a 'good' value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may reveal themselves at different values of mpts. To explore results for a range of mpts, one has to run HDBSCAN∗ for each value in the range independently, which is computationally inefficient. In this paper we propose an efficient approach to compute all HDBSCAN∗ hierarchies for a range of mpts by replacing the graph used by HDBSCAN∗ with a much smaller graph that is guaranteed to contain the required information. Our experiments show that our approach can obtain, for example, over one hundred hierarchies for a cost equivalent to running HDBSCAN∗ about 2 times. In fact, this speedup tends to increase with the number of hierarchies to be computed.
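To make the cost of the baseline concrete, here is a minimal Python sketch of the naive strategy the abstract argues against: one independent pass per mpts value. It is not the approach proposed in the paper, whose smaller replacement graph is not specified in this abstract. The sketch assumes the commonly used HDBSCAN∗ definitions: the core distance of a point is the distance to its mpts-th nearest neighbour (counting the point itself), the mutual reachability distance is max(core(a), core(b), d(a, b)), and each hierarchy is derived from a minimum spanning tree (MST) of the complete mutual reachability graph. All function names are hypothetical and chosen for illustration only.

```python
import math


def pairwise_distances(points):
    # Full Euclidean distance matrix; O(n^2) entries.
    return [[math.dist(p, q) for q in points] for p in points]


def core_distances(dist, m_pts):
    # Assumed convention: distance to the m_pts-th nearest neighbour,
    # counting the point itself as its own first neighbour.
    return [sorted(row)[m_pts - 1] for row in dist]


def mutual_reachability(dist, core):
    # d_mreach(a, b) = max(core(a), core(b), d(a, b)), zero on the diagonal.
    n = len(dist)
    return [
        [0.0 if i == j else max(core[i], core[j], dist[i][j]) for j in range(n)]
        for i in range(n)
    ]


def mst_edges(weights):
    # Prim's algorithm on a dense complete graph: O(n^2) per call.
    n = len(weights)
    in_tree = [False] * n
    in_tree[0] = True
    best = list(weights[0])   # cheapest known edge into the tree, per vertex
    parent = [0] * n          # tree endpoint of that cheapest edge
    edges = []
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        edges.append((parent[j], j, best[j]))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k] and weights[j][k] < best[k]:
                best[k] = weights[j][k]
                parent[k] = j
    return edges


def naive_msts(points, m_pts_values):
    # Baseline: rebuild the complete graph and its MST for every m_pts value.
    dist = pairwise_distances(points)
    result = {}
    for m in m_pts_values:
        core = core_distances(dist, m)
        result[m] = mst_edges(mutual_reachability(dist, core))
    return result
```

Calling `naive_msts(points, range(1, 101))` on a list of coordinate tuples would repeat the quadratic MST step one hundred times. That repeated work over the complete graph is the cost the paper aims to avoid by working on a much smaller graph.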

Title: Proceedings - 17th IEEE International Conference on Data Mining, ICDM 2017
Editors: George Karypis, Srinivas Alu, Vijay Raghavan, Xindong Wu, Lucio Miele
Publication date: 15 Dec 2017
ISBN (Electronic): 9781538638347
Status: Published - 15 Dec 2017
Published externally: Yes
Event: 17th IEEE International Conference on Data Mining, ICDM 2017 - New Orleans, USA
Duration: 18 Nov 2017 to 21 Nov 2017


Conference: 17th IEEE International Conference on Data Mining, ICDM 2017
City: New Orleans
Sponsor: Cisco Systems, Citigroup Inc., IEEE, IEEE Computer Society, National Science Foundation (NSF)
Name: Proceedings - IEEE International Conference on Data Mining, ICDM

Bibliographical note

Funding Information:
In our future work we intend to investigate strategies to simultaneously explore, visualize and possibly combine the whole spectrum of clustering solutions that are available both across multiple hierarchies and across different hierarchical/density levels, taking into account the quality of these solutions according to different unsupervised criteria.

Acknowledgment: Research partially supported by NSERC, Canada, and by CNPq, Brazil (Science without Borders Program).

Publisher Copyright:
© 2017 IEEE.

