A comparative evaluation of clustering-based outlier detection

Braulio V. Sánchez Vinces, Erich Schubert*, Arthur Zimek, Robson L.F. Cordeiro

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

We perform an extensive experimental evaluation of clustering-based outlier detection methods. These methods offer benefits such as efficiency, the possibility to capitalize on more mature evaluation measures, more developed subspace analysis for high-dimensional data and better explainability, and yet they have so far been neglected in the literature. To our knowledge, our work is the first effort to analytically and empirically study their advantages and disadvantages. Our main goal is to evaluate whether or not clustering-based techniques can compete in efficiency and effectiveness against the most studied state-of-the-art algorithms in the literature. We consider the quality of the results, the resilience against different types of data and variations in parameter configuration, the scalability, and the ability to filter out inappropriate parameter values automatically based on internal measures of clustering quality. It has been recently shown that several classic, simple, unsupervised methods surpass many deep learning approaches and, hence, remain at the state-of-the-art of outlier detection. We therefore study 14 of the best classic unsupervised methods, in particular 11 clustering-based methods and 3 non-clustering-based ones, using a consistent parameterization heuristic to identify the pros and cons of each approach. We consider 46 real and synthetic datasets with up to 125k points and 1.5k dimensions, aiming to achieve plausibility with the broadest possible diversity of real-world use cases. Our results indicate that the clustering-based methods are on par with, if not better than, the non-clustering-based ones, and we argue that clustering-based methods like KMeans−− should be included as baselines in future benchmarking studies, as they often offer competitive quality at a relatively low run time, besides several other benefits.
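For readers unfamiliar with the KMeans−− baseline named in the abstract, the following is a minimal sketch of the general idea as it is commonly described: a k-means loop that, in every iteration, sets aside the points farthest from their nearest center and excludes them from the centroid update. It is an illustration only, not the implementation or parameterization evaluated in the article. The function names, the `init` argument, and the use of squared Euclidean distance as the outlier score are assumptions made for this example.

```python
import numpy as np


def assign(X, centers):
    """Return the nearest-center index and squared distance for every point."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return nearest, d2[np.arange(len(X)), nearest]


def kmeans_minus_minus(X, k, n_outliers, init=None, max_iter=100, seed=0):
    """Sketch of a KMeans-- style loop (hypothetical helper, not the paper's code)."""
    X = np.asarray(X, dtype=float)
    if init is None:
        # Assumption: random data points as initial centers.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
    else:
        centers = np.asarray(init, dtype=float).copy()

    for _ in range(max_iter):
        nearest, dist = assign(X, centers)
        # The n_outliers points farthest from their center are left out
        # of the centroid update in this iteration.
        inlier = np.ones(len(X), dtype=bool)
        inlier[np.argsort(dist)[len(X) - n_outliers:]] = False

        new_centers = centers.copy()
        for j in range(k):
            members = X[inlier & (nearest == j)]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)

        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break

    # Final assignment against the final centers; dist serves as outlier score.
    nearest, dist = assign(X, centers)
    outliers = np.argsort(dist)[len(X) - n_outliers:]
    return centers, nearest, dist, outliers
```

In this sketch the returned `dist` array can be read as an outlier score (larger means more anomalous), and `outliers` holds the indices of the `n_outliers` highest-scoring points. As with ordinary k-means, the result depends on the initial centers, so several restarts may be advisable in practice.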

Original language: English
Article number: 13
Journal: Data Mining and Knowledge Discovery
Volume: 39
Issue number: 2
ISSN: 1384-5810
Publication status: Published - Mar 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025.

Keywords

  • Clustering-based outlier detection
  • Evaluation
  • Experimental analysis and comparison
