Intrinsic dimensional outlier detection in high-dimensional data

Jonathan Von Brünken, Michael E. Houle, Arthur Zimek

Research output: Contribution to journal › Journal article › Research


We introduce a new method for evaluating local outliers, by utilizing a measure of the intrinsic dimensionality in the vicinity of a test point, the continuous intrinsic dimension (ID), which has been shown to be equivalent to a measure of the discriminative power of similarity functions. Continuous ID can be regarded as an extension of Karger and Ruhl's expansion dimension to a statistical setting in which the distribution of distances to a query point is modeled in terms of a continuous random variable. The proposed local outlier score, IDOS, uses ID as a substitute for the density estimation used in classical outlier detection methods such as LOF. An experimental analysis is provided showing that the precision of IDOS substantially improves over that of state-of-the-art outlier detection scoring methods, especially when the data sets are large and high-dimensional.
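
The idea in the abstract, scoring a point by comparing its local intrinsic dimensionality with that of its neighbors, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`knn_distances`, `mle_local_id`, `id_ratio_score`) are hypothetical, the ID estimate is the common maximum-likelihood (Hill-type) estimator computed from k-nearest-neighbor distances, and the score is formed by analogy with LOF, with the ID estimate standing in for the density estimate. The sketch uses brute-force Euclidean distances and assumes the data contain no duplicate points.

```python
import numpy as np


def knn_distances(X, k):
    """Brute-force k nearest neighbors under Euclidean distance.

    Returns (idx, dist), both of shape (n, k), sorted by increasing distance.
    Each point is excluded from its own neighbor list.
    """
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    dist = np.take_along_axis(d, idx, axis=1)
    return idx, dist


def mle_local_id(dist):
    """Maximum-likelihood (Hill-type) estimate of local intrinsic dimension.

    dist has shape (n, k) with each row sorted ascending and strictly positive.
    Assumed estimator: ID = -1 / mean_i( ln(r_i / r_k) ).
    """
    r_k = dist[:, -1:]               # distance to the k-th neighbor
    log_ratio = np.log(dist / r_k)   # all entries <= 0, last column is 0
    return -1.0 / log_ratio.mean(axis=1)


def id_ratio_score(X, k=20):
    """LOF-style score with ID in place of density (illustrative assumption).

    score(q) = ID(q) * mean over neighbors p of 1 / ID(p).
    Values well above 1 mean q has a higher estimated ID than its neighbors.
    """
    idx, dist = knn_distances(X, k)
    ids = mle_local_id(dist)
    return ids * (1.0 / ids[idx]).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A 2-D Gaussian cluster plus one distant point appended as the last row.
    X = np.vstack([rng.normal(size=(200, 2)), [[50.0, 50.0]]])
    scores = id_ratio_score(X, k=20)
    print("index of highest score:", int(np.argmax(scores)))
```

A point far from the rest of the data sees all of its neighbors at nearly the same distance, so the log ratios are close to zero and its estimated ID is large compared with the estimates inside the cluster. That is why this kind of ratio can separate such a point without an explicit density estimate. The brute-force distance matrix is quadratic in the number of points, so the large data sets mentioned in the abstract would call for an index structure instead.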

Journal: NII Technical Reports
Issue number: 3
Pages (from-to): 1-12
Status: Published - Mar. 2015
Published externally: Yes