CORE-SG: Efficient Computation of Multiple MSTs for Density-Based Methods

Antonio Cavalcante Araujo Neto, Murilo Coelho Naldi, Ricardo J.G.B. Campello, Jorg Sander

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Abstract

Several popular density-based methods for unsuper-vised and semi-supervised learning tasks, including clustering and classification, can be formulated as instances of a framework that is based on the processing of a minimum spanning tree of the data, where the edge weights correspond to a form of (unnormalized) density estimate w.r.t. a smoothing parameter mpts. While density-based methods are considered to be robust w.r.t. mpts in the sense that small changes in its value usually lead to slight or no changes in the resulting structure, wider ranges of mpts values may lead to different results that a user would like to analyze before choosing the most suitable value for a given data set or application. However, to explore multiple results for a range of mpts values, until recently, one had to re-run the density-based method for each value in the range independently, which is computationally inefficient. This paper proposes a new computationally efficient approach to compute multiple density-based minimum spanning trees w.r.t. a set of mpts values by leveraging a graph obtained from a single run of the density-based algorithm, without the need for re-runs of the original algorithm. We present theoretical and experimental results that show that our approach overcomes the drawbacks of the previous state-of-the-art, and it is considerably superior in runtime and graph size while being easier to implement. Our experimental evaluation using synthetic and real data shows that our strategy can lead to speed-up factors of hundreds to thousands of times on the computation of density-based minimum spanning trees.

Original languageEnglish
Title of host publicationProceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
PublisherIEEE
Publication date2022
Pages951-964
ISBN (Electronic)9781665408837
DOIs
Publication statusPublished - 2022
Externally publishedYes
Event38th IEEE International Conference on Data Engineering, ICDE 2022 - Virtual, Online, Malaysia
Duration: 9. May 202212. May 2022

Conference

Conference38th IEEE International Conference on Data Engineering, ICDE 2022
Country/TerritoryMalaysia
CityVirtual, Online
Period09/05/202212/05/2022
SeriesProceedings of the International Conference on Data Engineering
Volume2022-May
ISSN1063-6382

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Keywords

  • clustering
  • density-based methods
  • k-nearest neighbors
  • mini-mum spanning tree
  • semi-supervised clustering

Fingerprint

Dive into the research topics of 'CORE-SG: Efficient Computation of Multiple MSTs for Density-Based Methods'. Together they form a unique fingerprint.

Cite this