Efficient Computation and Visualization of Multiple Density-Based Clustering Hierarchies

Antonio Cavalcante Araujo Neto*, Jorg Sander, Ricardo J.G.B. Campello, Mario A. Nascimento

*Corresponding author

Publication: Contribution to journal > Journal article > Research > peer-reviewed

Abstract

HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter m_pts. While a small change in m_pts typically leads to a small change in the clustering structure, choosing a 'good' m_pts value can be challenging: depending on the data distribution, a high or low m_pts value may be more appropriate, and certain clusters may reveal themselves at different values. To explore results for a range of m_pts values, one has to run HDBSCAN* for each value independently, which can be computationally impractical. In this paper, we propose an approach to efficiently compute all HDBSCAN* hierarchies for a range of m_pts values by building upon results from computational geometry to replace HDBSCAN*'s complete graph with a smaller equivalent graph. An experimental evaluation shows that our approach can obtain over one hundred hierarchies for the computational cost equivalent to running HDBSCAN* about twice, which corresponds to a speedup of more than 60 times, compared to running HDBSCAN* independently that many times. We also propose a series of visualizations that allow users to analyze a collection of hierarchies for a range of m_pts values, along with case studies that illustrate how these analyses are performed.
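To make the cost of the independent-runs baseline concrete, the following is a minimal sketch of that baseline only, not of the approach proposed in the paper. It assumes the usual HDBSCAN* definitions: the core distance of a point is the distance to its m_pts-th nearest neighbor (counting the point itself, an assumed convention), and the hierarchy for one m_pts value is derived from a minimum spanning tree of the complete mutual reachability graph. All function names are hypothetical and chosen for illustration.

```python
import math

def pairwise_distances(points):
    """Full Euclidean distance matrix (quadratic in the number of points)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def core_distances(dist, mpts):
    """Distance from each point to its mpts-th nearest neighbor.

    Assumption: the point itself counts as its own first neighbor,
    so mpts = 1 gives a core distance of 0 for every point.
    """
    return [sorted(row)[mpts - 1] for row in dist]

def mutual_reachability_mst(dist, core):
    """Prim's algorithm on the complete mutual reachability graph.

    The edge weight between u and v is max(core[u], core[v], dist[u][v]).
    Returns a list of (parent, child, weight) edges.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax every remaining vertex against the newly added one.
        for v in range(n):
            if not in_tree[v]:
                w = max(core[u], core[v], dist[u][v])
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges

def msts_for_range(points, mpts_values):
    """Naive baseline: one full MST computation per mpts value."""
    dist = pairwise_distances(points)
    return {m: mutual_reachability_mst(dist, core_distances(dist, m))
            for m in mpts_values}

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
    for m, edges in msts_for_range(pts, [1, 2, 3]).items():
        print(m, edges)
```

Each m_pts value triggers a separate pass over all pairs of points, which is the repeated work the abstract describes as impractical for large ranges. The sketch stops at the spanning tree; extracting the cluster hierarchy from it is omitted.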

Original language: English
Article number: 8943293
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 33
Issue number: 8
Pages (from-to): 3075-3089
ISSN: 1041-4347
Status: Published - 1 Aug 2021
Published externally: Yes

Bibliographical note

Funding Information:
Research partially supported by NSERC, Canada, and by CNPq, under the program Science without Borders, Brazil.

Publisher Copyright:
© 1989-2012 IEEE.
