Efficient computation of multiple density-based clustering hierarchies

Antonio Cavalcante Araujo Neto, Joerg Sander, Ricardo J.G.B. Campello, Mario A. Nascimento

Publication: Chapter in book/report/conference proceeding; Conference contribution in proceedings; Research; Peer-reviewed

Abstract

HDBSCAN∗, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN∗ is robust w.r.t. mpts, choosing a 'good' value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may reveal themselves at different values of mpts. To explore results for a range of mpts, one has to run HDBSCAN∗ for each value in the range independently, which is computationally inefficient. In this paper we propose an efficient approach to compute all HDBSCAN∗ hierarchies for a range of mpts by replacing the graph used by HDBSCAN∗ with a much smaller graph that is guaranteed to contain the required information. Our experiments show that our approach can obtain, for example, over one hundred hierarchies for a cost equivalent to running HDBSCAN∗ about 2 times. In fact, this speedup tends to increase with the number of hierarchies to be computed.
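To make the cost described in the abstract concrete, here is a minimal sketch of the naive baseline: one minimum spanning tree over the complete mutual reachability graph per value of mpts. This is not the paper's method, which replaces that complete graph with a much smaller one. The function names are hypothetical. The sketch assumes Euclidean distance and the convention that a point counts as its own first neighbor, so the core distance for mpts = 1 is zero.

```python
import math


def core_distances(dist, k_max):
    """Core distance of every point for each mpts in 1..k_max.

    Assumption: the point itself counts as its first neighbor, so the
    core distance for mpts = k is the k-th smallest entry of its row.
    """
    sorted_rows = [sorted(row) for row in dist]
    return {k: [row[k - 1] for row in sorted_rows] for k in range(1, k_max + 1)}


def mst_weight(n, weight):
    """Total weight of an MST of the complete graph on n nodes (Prim, O(n^2))."""
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge connecting each node to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Pick the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                w = weight(u, v)
                if w < best[v]:
                    best[v] = w
    return total


def mst_weights_for_range(points, k_max):
    """Naive baseline: rebuild the MST independently for each mpts in 1..k_max."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    core = core_distances(dist, k_max)
    result = {}
    for k in range(1, k_max + 1):
        c = core[k]
        # Mutual reachability distance: max of both core distances and d(a, b).
        result[k] = mst_weight(n, lambda a, b: max(c[a], c[b], dist[a][b]))
    return result


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    print(mst_weights_for_range(points, 3))
```

Sorting each distance row once yields the core distances for every mpts up to `k_max`, but the spanning tree is still rebuilt over all pairs of points for each value. That repeated work is the cost the abstract calls computationally inefficient.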

Original language: English
Title: Proceedings - 17th IEEE International Conference on Data Mining, ICDM 2017
Editors: George Karypis, Srinivas Alu, Vijay Raghavan, Xindong Wu, Lucio Miele
Publisher: IEEE
Publication date: 15 Dec 2017
Pages: 991-996
ISBN (Electronic): 9781538638347
DOI
Status: Published - 15 Dec 2017
Published externally: Yes
Event: 17th IEEE International Conference on Data Mining, ICDM 2017 - New Orleans, USA
Duration: 18 Nov 2017 to 21 Nov 2017

Conference

Conference: 17th IEEE International Conference on Data Mining, ICDM 2017
Country/Territory: USA
City: New Orleans
Period: 18/11/2017 to 21/11/2017
Sponsor: Cisco Systems, Citigroup Inc., IEEE, IEEE Computer Society, National Science Foundation (NSF)
Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2017-November
ISSN: 1550-4786

Bibliographical note

Funding Information:
In our future work we intend to investigate strategies to simultaneously explore, visualize and possibly combine the whole spectrum of clustering solutions that are available both across multiple hierarchies as well as across different hierarchical/density levels, taking into account the quality of these solutions according to different unsupervised criteria.

Acknowledgment: Research partially supported by NSERC, Canada, and by CNPq, Brazil (Science without Borders Program).

Publisher Copyright:
© 2017 IEEE.
