Intrinsic dimensional outlier detection in high-dimensional data

Jonathan Von Brünken, Michael E. Houle, Arthur Zimek

Publication: Contribution to journal, Journal article, Research

Abstract

We introduce a new method for evaluating local outliers that uses a measure of the intrinsic dimensionality in the vicinity of a test point: the continuous intrinsic dimension (ID), which has been shown to be equivalent to a measure of the discriminative power of similarity functions. Continuous ID can be regarded as an extension of Karger and Ruhl's expansion dimension to a statistical setting in which the distribution of distances to a query point is modeled in terms of a continuous random variable. The proposed local outlier score, IDOS, uses ID as a substitute for the density estimation used in classical outlier detection methods such as LOF. An experimental analysis shows that the precision of IDOS substantially improves over that of state-of-the-art outlier detection scoring methods, especially when the data sets are large and high-dimensional.
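
As a rough illustration of the idea in the abstract, the sketch below estimates a local ID from nearest-neighbor distances and compares a point's ID with that of its neighbors. It is not taken from the paper: the Hill-type maximum-likelihood estimator, the LOF-style ratio, and all function names (`knn`, `mle_id`, `id_ratio_score`) are assumptions chosen for illustration, and the paper's own estimator and score definition may differ.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbors of points[i] as sorted (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return dists[:k]


def mle_id(radii):
    """Hill-type maximum-likelihood estimate of local ID (an assumed estimator).

    radii: neighbor distances; zero distances (duplicate points) are skipped.
    """
    w = max(radii)  # distance to the farthest of the neighbors
    log_ratios = [math.log(r / w) for r in radii if r > 0]
    mean = sum(log_ratios) / len(log_ratios)
    if mean == 0.0:
        # All neighbors are equidistant, so the estimate is undefined.
        raise ValueError("cannot estimate ID from identical distances")
    return -1.0 / mean


def id_ratio_score(points, i, k):
    """LOF-style score (assumed form): ID at points[i] times the mean inverse ID of its neighbors."""
    neighbors = knn(points, i, k)
    id_query = mle_id([d for d, _ in neighbors])
    inverse_ids = [
        1.0 / mle_id([d for d, _ in knn(points, j, k)]) for _, j in neighbors
    ]
    return id_query * sum(inverse_ids) / len(inverse_ids)
```

Under these assumptions, a score well above 1 would suggest that a point lies in a region of higher local ID than its neighbors, in the same way that LOF compares a point's density with that of its neighborhood. The brute-force neighbor search is quadratic in the number of points and is meant only for small examples.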

Original language: English
Journal: NII Technical Reports
Volume: 2015
Issue number: 3
Pages (from-to): 1-12
ISSN: 1346-5597
Status: Published - Mar. 2015
Published externally: Yes
