Angle-based outlier detection in high-dimensional data

Hans Peter Kriegel*, Matthias Schubert, Arthur Zimek

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

Abstract

Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. All existing approaches, however, are based on an assessment of distances (sometimes indirectly by assuming certain distributions) in the full-dimensional Euclidean data space. In high-dimensional data, these approaches are bound to deteriorate due to the notorious "curse of dimensionality". In this paper, we propose a novel approach named ABOD (Angle-Based Outlier Detection) and some variants assessing the variance in the angles between the difference vectors of a point to the other points. This way, the effects of the "curse of dimensionality" are alleviated compared to purely distance-based approaches. A main advantage of our new approach is that our method does not rely on any parameter selection influencing the quality of the achieved ranking. In a thorough experimental evaluation, we compare ABOD to the well-established distance-based method LOF on various artificial data sets and a real-world data set and show ABOD to perform especially well on high-dimensional data.
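The core idea named in the abstract, scoring a point by the variance of the angles between its difference vectors to the other points, can be illustrated with a minimal sketch. The abstract does not spell out the exact score, so this sketch makes its own assumptions: it uses the plain, unweighted variance of the cosines of those angles, and the function names `angle_variance_score` and `rank_outliers` are hypothetical. It should not be read as the authors' ABOD implementation or any of its variants.

```python
import itertools
import math
import statistics


def angle_variance_score(points, index):
    """Variance of the cosines of the angles seen from points[index].

    Simplified, unweighted illustration (an assumption, not the paper's
    exact score). For every pair of other points (b, c), form the
    difference vectors b - a and c - a and record the cosine of the angle
    between them. A small variance means the other points lie in similar
    directions, which suggests that a is an outlier.
    Requires at least three distinct points.
    """
    a = points[index]
    # Unit-length difference vectors from a to every other point.
    directions = []
    for j, p in enumerate(points):
        if j == index:
            continue
        diff = [pj - aj for pj, aj in zip(p, a)]
        norm = math.sqrt(sum(x * x for x in diff))
        if norm > 0.0:  # skip exact duplicates of a
            directions.append([x / norm for x in diff])
    # Cosine of the angle for every unordered pair of directions.
    cosines = [
        sum(x * y for x, y in zip(u, v))
        for u, v in itertools.combinations(directions, 2)
    ]
    return statistics.pvariance(cosines)


def rank_outliers(points):
    """Return point indices ordered from lowest to highest angle variance."""
    scores = [angle_variance_score(points, i) for i in range(len(points))]
    return sorted(range(len(points)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Four corners of a unit square, its centre, and one far-away point.
    data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
            [0.5, 0.5], [50.0, 50.0]]
    print(rank_outliers(data))
```

In this toy data set the far-away point sees all other points in nearly the same direction, so its angle variance is the smallest and it should appear first in the ranking. The sketch enumerates all pairs for every point, so its cost grows cubically with the number of points and it is only meant for small examples.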

Original language: English
Title of host publication: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Publication date: Dec 2008
Pages: 444-452
ISBN (Print): 978-1-60558-193-4
DOIs: https://doi.org/10.1145/1401890.1401946
Publication status: Published - Dec 2008
Externally published: Yes
Event: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Las Vegas, United States
Duration: 24 Aug 2008 to 27 Aug 2008

Conference

Conference: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country: United States
City: Las Vegas
Period: 24/08/2008 to 27/08/2008

Fingerprint

Data mining

Keywords

  • Angle-based
  • High-dimensional
  • Outlier detection

Cite this

Kriegel, H. P., Schubert, M., & Zimek, A. (2008). Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 444-452). Association for Computing Machinery. https://doi.org/10.1145/1401890.1401946
@inproceedings{80a7b0f580674ffca8a7c40bcc2d3450,
title = "Angle-based outlier detection in high-dimensional data",
keywords = "Angle-based, High-dimensional, Outlier detection",
author = "Kriegel, {Hans Peter} and Matthias Schubert and Arthur Zimek",
year = "2008",
month = "12",
doi = "10.1145/1401890.1401946",
language = "English",
isbn = "978-1-60558-193-4",
pages = "444--452",
booktitle = "Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
address = "United States",
}


TY  - GEN
T1  - Angle-based outlier detection in high-dimensional data
AU  - Kriegel, Hans Peter
AU  - Schubert, Matthias
AU  - Zimek, Arthur
PY  - 2008/12
Y1  - 2008/12
KW  - Angle-based
KW  - High-dimensional
KW  - Outlier detection
U2  - 10.1145/1401890.1401946
DO  - 10.1145/1401890.1401946
M3  - Article in proceedings
AN  - SCOPUS:65449145220
SN  - 978-1-60558-193-4
SP  - 444
EP  - 452
BT  - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB  - Association for Computing Machinery
ER  -
