Density-based clustering

Ricardo J.G.B. Campello, Peer Kröger, Jörg Sander, Arthur Zimek*

*Corresponding author for this work

Research output: Contribution to journal > Journal article > Research > peer-review

Abstract

Clustering refers to the task of identifying groups or clusters in a data set. In density-based clustering, a cluster is a set of data objects spread in the data space over a contiguous region of high density of objects. Density-based clusters are separated from each other by contiguous regions of low density of objects. Data objects located in low-density regions are typically considered noise or outliers. In this review article, we discuss the statistical notion of density-based clusters, classic algorithms for deriving a flat partitioning of density-based clusters, methods for hierarchical density-based clustering, and methods for semi-supervised clustering. We conclude with some open challenges related to density-based clustering. This article is categorized under: Technologies > Data Preprocessing; Ensemble Methods > Structure Discovery; Algorithmic Development > Hierarchies and Trees.
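
As a rough illustration of the notion described in the abstract (clusters as contiguous high-density regions, with objects in low-density regions treated as noise), the following minimal Python sketch grows clusters in the spirit of a DBSCAN-style flat partitioning. It is not taken from the article: the function name `density_based_clusters`, the parameters `eps` (neighborhood radius) and `min_pts` (density threshold), and the toy data are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points in low-density regions (illustrative convention)


def density_based_clusters(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    Hypothetical helper: eps and min_pts are assumed parameters,
    not values prescribed by the article.
    """
    n = len(points)
    # eps-neighborhood of every point (the point itself is included).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A point counts as "dense" (core) if its neighborhood holds >= min_pts points.
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [None] * n
    cluster_id = 0
    for start in range(n):
        if labels[start] is not None or not is_core[start]:
            continue
        # Grow one cluster by following chains of neighboring core points.
        labels[start] = cluster_id
        frontier = [start]
        while frontier:
            i = frontier.pop()
            if not is_core[i]:
                continue  # border point: joins the cluster but does not extend it
            for j in neighbors[i]:
                if labels[j] is None:
                    labels[j] = cluster_id
                    frontier.append(j)
        cluster_id += 1

    # Points never reached from any core point are treated as noise.
    return [NOISE if lab is None else lab for lab in labels]


# Toy data: two tight groups and one isolated point.
points = [
    (0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
    (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1),
    (2.5, 2.5),
]
labels = density_based_clusters(points, eps=0.2, min_pts=3)
print(labels)
```

On this toy input the two tight groups come out as separate clusters and the isolated point is labeled as noise. The quadratic neighborhood search keeps the sketch short; a practical implementation would typically use a spatial index.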

Original language: English
Article number: e1343
Journal: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Volume: 10
Issue number: 2
Number of pages: 15
ISSN: 1942-4787
DOIs
Publication status: Published - Apr 2020

Keywords

  • flat clustering
  • hierarchical clustering
  • nonparametric clustering
  • semi-supervised clustering
  • unsupervised clustering
