Intrinsic dimensional outlier detection in high-dimensional data

Jonathan Von Brünken, Michael E. Houle, Arthur Zimek

Research output: Contribution to journal > Journal article > Research

Abstract

We introduce a new method for scoring local outliers that uses a measure of the intrinsic dimensionality in the vicinity of a test point: the continuous intrinsic dimension (ID), which has been shown to be equivalent to a measure of the discriminative power of similarity functions. Continuous ID can be regarded as an extension of Karger and Ruhl's expansion dimension to a statistical setting in which the distribution of distances to a query point is modeled as a continuous random variable. The proposed local outlier score, IDOS, uses ID in place of the density estimation used in classical outlier detection methods such as LOF. An experimental analysis shows that the precision of IDOS substantially improves on that of state-of-the-art outlier scoring methods, especially when the data sets are large and high-dimensional.
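The idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the ID estimator or the IDOS formula, so everything below is an assumption made for illustration: local ID is estimated with a Hill-type maximum-likelihood estimator over k-nearest-neighbor distances, and the score is an LOF-style ratio in which a point's ID replaces its density and is compared against the harmonic mean of its neighbors' IDs. The function names `knn_distances`, `local_id_mle`, and `id_outlier_score`, and the parameters `k_id` and `k_ref`, are hypothetical.

```python
import numpy as np


def knn_distances(X, k):
    """Brute-force k nearest neighbors of every row of X (self excluded).

    Returns (idx, dist), both of shape (n, k), sorted by ascending distance.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]
    dist = np.take_along_axis(D, idx, axis=1)
    return idx, dist


def local_id_mle(dist, eps=1e-12):
    """Hill-type MLE of local intrinsic dimension (assumed estimator).

    dist: (n, k) ascending neighbor distances; the last column is the
    neighborhood radius w. Estimate: -1 / mean(log(x_i / w)).
    """
    dist = np.maximum(dist, eps)             # guard against duplicate points
    logs = np.log(dist / dist[:, -1:])       # all values are <= 0
    mean_log = np.minimum(logs.mean(axis=1), -eps)  # avoid division by zero
    return -1.0 / mean_log


def id_outlier_score(X, k_id=20, k_ref=10):
    """LOF-style score with ID in place of density (assumed form).

    Score = ID(q) * mean over reference neighbors p of 1 / ID(p).
    Values well above 1 mean q has a higher local ID than its neighbors.
    """
    idx, dist = knn_distances(X, max(k_id, k_ref))
    ids = local_id_mle(dist[:, :k_id])
    ref = idx[:, :k_ref]
    return ids * (1.0 / ids[ref]).mean(axis=1)
```

The brute-force neighbor search takes quadratic time and memory, so this sketch suits only small data sets; an index structure would be needed at the scale the abstract targets.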

Original language: English
Journal: NII Technical Reports
Volume: 2015
Issue number: 3
Pages (from-to): 1-12
ISSN: 1346-5597
Publication status: Published - Mar 2015
Externally published: Yes

Fingerprint

Outlier Detection
High-dimensional Data
Outlier
Continuous random variable
Density Estimation
Experimental Analysis
Substitute
Scoring
Dimensionality
High-dimensional
Query

Cite this

Von Brünken, Jonathan ; Houle, Michael E. ; Zimek, Arthur. / Intrinsic dimensional outlier detection in high-dimensional data. In: NII Technical Reports. 2015 ; Vol. 2015, No. 3. pp. 1-12.
@article{8a9db603c572409a86f8bfa9efba2fc6,
title = "Intrinsic dimensional outlier detection in high-dimensional data",
author = "{Von Br{\"u}nken}, Jonathan and Houle, {Michael E.} and Arthur Zimek",
year = "2015",
month = "3",
language = "English",
volume = "2015",
pages = "1--12",
journal = "NII Technical Reports",
issn = "1346-5597",
number = "3",

}

TY - JOUR

T1 - Intrinsic dimensional outlier detection in high-dimensional data

AU - Von Brünken, Jonathan

AU - Houle, Michael E.

AU - Zimek, Arthur

PY - 2015/3

Y1 - 2015/3

AB - We introduce a new method for evaluating local outliers, by utilizing a measure of the intrinsic dimensionality in the vicinity of a test point, the continuous intrinsic dimension (ID), which has been shown to be equivalent to a measure of the discriminative power of similarity functions. Continuous ID can be regarded as an extension of Karger and Ruhl's expansion dimension to a statistical setting in which the distribution of distances to a query point is modeled in terms of a continuous random variable. The proposed local outlier score, IDOS, uses ID as a substitute for the density estimation used in classical outlier detection methods such as LOF. An experimental analysis is provided showing that the precision of IDOS substantially improves over that of state-of-the-art outlier detection scoring methods, especially when the data sets are large and high-dimensional.

M3 - Journal article

AN - SCOPUS:84927915212

VL - 2015

SP - 1

EP - 12

JO - NII Technical Reports

JF - NII Technical Reports

SN - 1346-5597

IS - 3

ER -