Outlier detection based on low density models

Felix Iglesias Vazquez, Tanja Zseby, Arthur Zimek

Publication: Contribution to book/anthology/report/conference proceeding | Conference contribution in proceedings | Research | peer review

Abstract

Most outlier detection algorithms are based on lazy learning or imply quadratic complexity. Both characteristics make them unsuitable for big data and stream data applications and preclude their applicability in systems that must operate autonomously. In this paper we propose a new algorithm, called SDO (Sparse Data Observers), to estimate outlierness based on low density models of data. SDO is an eager learner; therefore, computational costs in application phases are severely reduced. We perform tests with a wide variation of synthetic datasets as well as the main datasets published in the literature for anomaly detection testing. Results show that SDO satisfactorily competes with the best ranked outlier detection alternatives. The good detection performance coupled with a low complexity makes SDO highly flexible and adaptable to stand-alone frameworks that must detect outliers fast with accuracy rates equivalent to lazy learning algorithms.
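
The eager-learning idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the random sampling of observers, the parameters `k`, `x` and `q`, the vote-based pruning, and the median-distance score are all assumptions made here for illustration. The point it shows is that the costly pass over the training data happens once, after which scoring a new point only needs distances to a small, fixed set of observers.

```python
import math
import random
import statistics


def fit_observers(data, k=20, x=3, q=0.3, seed=0):
    """Build a small model of 'observers' sampled from the data.

    Hypothetical sketch: k (observers), x (nearest observers per point),
    q (fraction of least-voted observers to drop) and the random sampling
    are assumptions, not taken from the abstract. Requires len(data) >= k.
    """
    rng = random.Random(seed)
    observers = rng.sample(data, k)
    counts = [0] * k
    # Each training point votes for its x closest observers.
    for p in data:
        nearest = sorted(range(k), key=lambda i: math.dist(p, observers[i]))[:x]
        for i in nearest:
            counts[i] += 1
    # Keep only observers whose vote count reaches the q-quantile threshold.
    threshold = sorted(counts)[int(q * k)]
    return [o for o, c in zip(observers, counts) if c >= threshold]


def outlier_score(point, observers, x=3):
    """Median distance to the x nearest observers; higher means more outlying."""
    dists = sorted(math.dist(point, o) for o in observers)
    return statistics.median(dists[:x])


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic 2-D cluster around the origin (illustrative data only).
    train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    model = fit_observers(train)
    print("central point:", outlier_score((0.0, 0.0), model))
    print("distant point:", outlier_score((8.0, 8.0), model))
```

Under these assumptions, fitting costs roughly one distance computation per training point per observer, and each later score costs only `k` distance computations or fewer, independent of the training set size.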

Original language: English
Title: Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
Editors: Jeffrey Yu, Zhenhui Li, Hanghang Tong, Feida Zhu
Publisher: IEEE Press
Publication date: 2018
Pages: 970-979
Article number: 8637447
ISBN (Electronic): 9781538692882
DOI: https://doi.org/10.1109/ICDMW.2018.00140
Status: Published - 2018
Event: 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 - Singapore, Singapore
Duration: 17 Nov 2018 to 20 Nov 2018

Conference

Conference: 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
Country: Singapore
City: Singapore
Period: 17/11/2018 to 20/11/2018
Sponsor: IEEE, IEEE Computer Society, Singapore Management University, National Science Foundation (NSF), Shanghai Yixue Educational Technology, X-Order UCommune Singapore
Name: IEEE International Conference on Data Mining Workshops, ICDMW
Volume: 2018-November
ISSN: 2375-9232

Fingerprint

Learning algorithms
Testing
Costs
Big data

Cite this

Iglesias Vazquez, F., Zseby, T., & Zimek, A. (2018). Outlier detection based on low density models. In J. Yu, Z. Li, H. Tong, & F. Zhu (Eds.), Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 (pp. 970-979). [8637447] IEEE Press. IEEE International Conference on Data Mining Workshops, ICDMW, Vol. 2018-November. https://doi.org/10.1109/ICDMW.2018.00140
Iglesias Vazquez, Felix ; Zseby, Tanja ; Zimek, Arthur. / Outlier detection based on low density models. Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018. ed. / Jeffrey Yu ; Zhenhui Li ; Hanghang Tong ; Feida Zhu. IEEE Press, 2018. pp. 970-979 (IEEE International Conference on Data Mining Workshops, ICDMW, Vol. 2018-November).
@inproceedings{042f7b6ed06a421486e18c40e5086f3f,
title = "Outlier detection based on low density models",
abstract = "Most outlier detection algorithms are based on lazy learning or imply quadratic complexity. Both characteristics make them unsuitable for big data and stream data applications and preclude their applicability in systems that must operate autonomously. In this paper we propose a new algorithm, called SDO (Sparse Data Observers), to estimate outlierness based on low density models of data. SDO is an eager learner; therefore, computational costs in application phases are severely reduced. We perform tests with a wide variation of synthetic datasets as well as the main datasets published in the literature for anomaly detection testing. Results show that SDO satisfactorily competes with the best ranked outlier detection alternatives. The good detection performance coupled with a low complexity makes SDO highly flexible and adaptable to stand-alone frameworks that must detect outliers fast with accuracy rates equivalent to lazy learning algorithms.",
keywords = "eager learning, machine learning model, outlier analysis",
author = "{Iglesias Vazquez}, Felix and Tanja Zseby and Arthur Zimek",
year = "2018",
doi = "10.1109/ICDMW.2018.00140",
language = "English",
pages = "970--979",
editor = "Jeffrey Yu and Zhenhui Li and Hanghang Tong and Feida Zhu",
booktitle = "Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018",
publisher = "IEEE Press",

}

Iglesias Vazquez, F, Zseby, T & Zimek, A 2018, Outlier detection based on low density models. in J Yu, Z Li, H Tong & F Zhu (eds), Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018, 8637447, IEEE Press, IEEE International Conference on Data Mining Workshops, ICDMW, vol. 2018-November, pp. 970-979, 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018, Singapore, Singapore, 17/11/2018. https://doi.org/10.1109/ICDMW.2018.00140

Outlier detection based on low density models. / Iglesias Vazquez, Felix; Zseby, Tanja; Zimek, Arthur.

Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018. ed. / Jeffrey Yu; Zhenhui Li; Hanghang Tong; Feida Zhu. IEEE Press, 2018. pp. 970-979, 8637447 (IEEE International Conference on Data Mining Workshops, ICDMW, Vol. 2018-November).

TY - GEN

T1 - Outlier detection based on low density models

AU - Iglesias Vazquez, Felix

AU - Zseby, Tanja

AU - Zimek, Arthur

PY - 2018

Y1 - 2018

N2 - Most outlier detection algorithms are based on lazy learning or imply quadratic complexity. Both characteristics make them unsuitable for big data and stream data applications and preclude their applicability in systems that must operate autonomously. In this paper we propose a new algorithm, called SDO (Sparse Data Observers), to estimate outlierness based on low density models of data. SDO is an eager learner; therefore, computational costs in application phases are severely reduced. We perform tests with a wide variation of synthetic datasets as well as the main datasets published in the literature for anomaly detection testing. Results show that SDO satisfactorily competes with the best ranked outlier detection alternatives. The good detection performance coupled with a low complexity makes SDO highly flexible and adaptable to stand-alone frameworks that must detect outliers fast with accuracy rates equivalent to lazy learning algorithms.

AB - Most outlier detection algorithms are based on lazy learning or imply quadratic complexity. Both characteristics make them unsuitable for big data and stream data applications and preclude their applicability in systems that must operate autonomously. In this paper we propose a new algorithm, called SDO (Sparse Data Observers), to estimate outlierness based on low density models of data. SDO is an eager learner; therefore, computational costs in application phases are severely reduced. We perform tests with a wide variation of synthetic datasets as well as the main datasets published in the literature for anomaly detection testing. Results show that SDO satisfactorily competes with the best ranked outlier detection alternatives. The good detection performance coupled with a low complexity makes SDO highly flexible and adaptable to stand-alone frameworks that must detect outliers fast with accuracy rates equivalent to lazy learning algorithms.

KW - eager learning

KW - machine learning model

KW - outlier analysis

U2 - 10.1109/ICDMW.2018.00140

DO - 10.1109/ICDMW.2018.00140

M3 - Article in proceedings

SP - 970

EP - 979

BT - Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018

A2 - Yu, Jeffrey

A2 - Li, Zhenhui

A2 - Tong, Hanghang

A2 - Zhu, Feida

PB - IEEE Press

ER -

Iglesias Vazquez F, Zseby T, Zimek A. Outlier detection based on low density models. In Yu J, Li Z, Tong H, Zhu F, editors, Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018. IEEE Press. 2018. p. 970-979. 8637447. (IEEE International Conference on Data Mining Workshops, ICDMW, Vol. 2018-November). https://doi.org/10.1109/ICDMW.2018.00140