Outlier detection based on low density models

Felix Iglesias Vazquez, Tanja Zseby, Arthur Zimek

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Abstract

Most outlier detection algorithms are based on lazy learning or imply quadratic complexity. Both characteristics make them unsuitable for big data and stream data applications and preclude their use in systems that must operate autonomously. In this paper we propose a new algorithm, called SDO (Sparse Data Observers), to estimate outlierness based on low density models of data. SDO is an eager learner; therefore, computational costs in application phases are substantially reduced. We perform tests with a wide variety of synthetic datasets as well as the main datasets published in the literature for anomaly detection testing. Results show that SDO competes satisfactorily with the best-ranked outlier detection alternatives. The good detection performance coupled with low complexity makes SDO highly flexible and adaptable to stand-alone frameworks that must detect outliers fast with accuracy rates equivalent to those of lazy learning algorithms.
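The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea of an eager, observer-based low density model, not the authors' implementation. It assumes that a small set of observers is sampled from the training data, that rarely used observers are discarded, and that a new point is scored by its median distance to its closest remaining observers. The function names `fit_observers` and `outlier_score` and the parameters `k`, `x`, and `q` are hypothetical and chosen for illustration.

```python
import numpy as np

def fit_observers(X, k=50, x=5, q=0.3, seed=0):
    """Eager phase (assumed): sample k observers and keep the frequently used ones."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X))
    observers = X[rng.choice(len(X), size=k, replace=False)]
    # Distance from every training point to every observer, shape (n, k).
    d = np.linalg.norm(X[:, None, :] - observers[None, :, :], axis=2)
    x = min(x, k)
    # For each training point, the indices of its x closest observers.
    nearest = np.argsort(d, axis=1)[:, :x]
    # How often each observer appears among the closest observers.
    counts = np.bincount(nearest.ravel(), minlength=k)
    # Discard observers below the q-quantile of usage (assumed to sit in sparse regions).
    keep = counts >= np.quantile(counts, q)
    return observers[keep]

def outlier_score(X_new, observers, x=5):
    """Application phase: median distance to the x closest observers."""
    d = np.linalg.norm(X_new[:, None, :] - observers[None, :, :], axis=2)
    x = min(x, observers.shape[0])
    closest = np.sort(d, axis=1)[:, :x]
    return np.median(closest, axis=1)

# Toy data (assumption): one Gaussian cluster for training, two query points.
rng = np.random.default_rng(42)
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
observers = fit_observers(X_train)
queries = np.array([[0.0, 0.0], [10.0, 10.0]])
scores = outlier_score(queries, observers)
print(scores)  # the far-away query should receive the larger score
```

In this sketch the application phase compares each new point only against the retained observers, so its cost depends on the number of observers rather than on the size of the training set. That is the kind of saving the abstract attributes to eager learning.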

Original language: English
Title of host publication: Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
Editors: Jeffrey Yu, Zhenhui Li, Hanghang Tong, Feida Zhu
Publisher: IEEE Press
Publication date: 2018
Pages: 970-979
Article number: 8637447
ISBN (Electronic): 9781538692882
DOI: https://doi.org/10.1109/ICDMW.2018.00140
Publication status: Published - 2018
Event: 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 - Singapore, Singapore
Duration: 17 Nov 2018 to 20 Nov 2018

Conference

Conference: 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018
Country: Singapore
City: Singapore
Period: 17/11/2018 to 20/11/2018
Sponsor: IEEE, IEEE Computer Society, Singapore Management University, National Science Foundation (NSF), Shanghai Yixue Educational Technology, X-Order UCommune Singapore
Series: IEEE International Conference on Data Mining Workshops, ICDMW
Volume: 2018-November
ISSN: 2375-9232

Fingerprint

Learning algorithms
Testing
Costs
Big data

Keywords

  • eager learning
  • machine learning model
  • outlier analysis

Cite this

Iglesias Vazquez, F., Zseby, T., & Zimek, A. (2018). Outlier detection based on low density models. In J. Yu, Z. Li, H. Tong, & F. Zhu (Eds.), Proceedings of the 18th IEEE International Conference on Data Mining Workshops, ICDMW 2018 (pp. 970-979). [8637447] IEEE Press. IEEE International Conference on Data Mining Workshops, ICDMW, Vol. 2018-November. https://doi.org/10.1109/ICDMW.2018.00140

