Data perturbation for outlier detection ensembles

Arthur Zimek, Ricardo J.G.B. Campello, Jörg Sander

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceedings › Research › peer review

Abstract

Outlier detection and ensemble learning are well-established research directions in data mining, yet the application of ensemble techniques to outlier detection has rarely been studied. Building an ensemble requires learning diverse models and combining them in an appropriate way. We propose data perturbation as a new technique to induce diversity in individual outlier detectors, as well as a rank accumulation method for combining the individual outlier rankings in order to construct an outlier detection ensemble. In an extensive evaluation, we study the impact, potential, and shortcomings of this new approach for outlier detection ensembles. We show that this ensemble can significantly improve over weakly performing base methods.
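
The abstract names two ingredients: perturbing the data to obtain diverse detectors, and accumulating the resulting rankings. The sketch below illustrates that general idea only. This record does not describe the paper's actual perturbation scheme, base detectors, or rank accumulation method, so everything concrete here is an assumption made for illustration: Gaussian noise as the perturbation, the distance to the k-th nearest neighbour as the base outlier score, the mean rank as the combination rule, and all function and parameter names.

```python
import math
import random


def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbour (larger = more outlying)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def perturb(points, noise_scale, rng):
    """Return a copy of the data with independent Gaussian noise added to every coordinate.

    Assumption: Gaussian noise stands in for whatever perturbation the paper uses.
    """
    return [tuple(x + rng.gauss(0.0, noise_scale) for x in p) for p in points]


def ranks_from_scores(scores):
    """Convert scores to ranks; rank 1 is the most outlying. Ties are broken by index order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def perturbation_ensemble(points, k=3, members=25, noise_scale=0.1, seed=0):
    """Run one detector per perturbed copy and average the ranks (lower mean rank = more outlying).

    Assumption: mean rank is a simple stand-in for the paper's rank accumulation method.
    """
    rng = random.Random(seed)
    totals = [0] * len(points)
    for _ in range(members):
        noisy = perturb(points, noise_scale, rng)
        member_ranks = ranks_from_scores(knn_outlier_scores(noisy, k))
        totals = [t + r for t, r in zip(totals, member_ranks)]
    return [t / members for t in totals]


# Toy data: 30 points from a standard normal cluster plus one far-away point at index 30.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(30)]
data.append((10.0, 10.0))

mean_ranks = perturbation_ensemble(data)
most_outlying = min(range(len(data)), key=lambda i: mean_ranks[i])
print("index of the most outlying point:", most_outlying)
```

Combining ranks rather than raw scores sidesteps the problem that scores from different ensemble members need not be on a comparable scale; whether the paper makes the same choice for the same reason cannot be determined from this record.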

Original language: English
Title of host publication: Proceedings of the 26th International Conference on Scientific and Statistical Database Management
Number of pages: 12
Publisher: Association for Computing Machinery
Publication date: 2014
Article number: 13
ISBN (Print): 978-1-4503-2722-0
DOI: https://doi.org/10.1145/2618243.2618257
Status: Published - 2014
Published externally: Yes
Event: 26th International Conference on Scientific and Statistical Database Management - Aalborg, Denmark
Duration: 30 Jun 2014 to 2 Jul 2014

Conference

Conference: 26th International Conference on Scientific and Statistical Database Management
Country: Denmark
City: Aalborg
Period: 30/06/2014 to 02/07/2014
Sponsor: Danish Otto Monsted Foundation, TARGIT

Fingerprint

outlier
perturbation
learning
data mining
ranking
detection

Cite this

Zimek, A., Campello, R. J. G. B., & Sander, J. (2014). Data perturbation for outlier detection ensembles. In Proceedings of the 26th International Conference on Scientific and Statistical Database Management [13] Association for Computing Machinery. https://doi.org/10.1145/2618243.2618257
Zimek, Arthur ; Campello, Ricardo J.G.B. ; Sander, Jörg. / Data perturbation for outlier detection ensembles. Proceedings of the 26th International Conference on Scientific and Statistical Database Management. Association for Computing Machinery, 2014.
@inproceedings{356909689e14441da2ee72ac66cb012d,
title = "Data perturbation for outlier detection ensembles",
abstract = "Outlier detection and ensemble learning are well established research directions in data mining yet the application of ensemble techniques to outlier detection has been rarely studied. Building an ensemble requires learning of diverse models and combining these diverse models in an appropriate way. We propose data perturbation as a new technique to induce diversity in individual outlier detectors as well as a rank accumulation method for the combination of the individual outlier rankings in order to construct an outlier detection ensemble. In an extensive evaluation, we study the impact, potential, and shortcomings of this new approach for outlier detection ensembles. We show that this ensemble can significantly improve over weak performing base methods.",
keywords = "Ensemble, Outlier detection",
author = "Arthur Zimek and Campello, {Ricardo J.G.B.} and J{\"o}rg Sander",
year = "2014",
doi = "10.1145/2618243.2618257",
language = "English",
isbn = "978-1-4503-2722-0",
booktitle = "Proceedings of the 26th International Conference on Scientific and Statistical Database Management",
publisher = "Association for Computing Machinery",
address = "United States",
}

Zimek, A, Campello, RJGB & Sander, J 2014, Data perturbation for outlier detection ensembles. in Proceedings of the 26th International Conference on Scientific and Statistical Database Management., 13, Association for Computing Machinery, 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, 30/06/2014. https://doi.org/10.1145/2618243.2618257

Data perturbation for outlier detection ensembles. / Zimek, Arthur; Campello, Ricardo J.G.B.; Sander, Jörg.

Proceedings of the 26th International Conference on Scientific and Statistical Database Management. Association for Computing Machinery, 2014. 13.


TY - GEN
T1 - Data perturbation for outlier detection ensembles
AU - Zimek, Arthur
AU - Campello, Ricardo J.G.B.
AU - Sander, Jörg
PY - 2014
Y1 - 2014
AB - Outlier detection and ensemble learning are well established research directions in data mining yet the application of ensemble techniques to outlier detection has been rarely studied. Building an ensemble requires learning of diverse models and combining these diverse models in an appropriate way. We propose data perturbation as a new technique to induce diversity in individual outlier detectors as well as a rank accumulation method for the combination of the individual outlier rankings in order to construct an outlier detection ensemble. In an extensive evaluation, we study the impact, potential, and shortcomings of this new approach for outlier detection ensembles. We show that this ensemble can significantly improve over weak performing base methods.
KW - Ensemble
KW - Outlier detection
U2 - 10.1145/2618243.2618257
DO - 10.1145/2618243.2618257
M3 - Article in proceedings
AN - SCOPUS:84904429012
SN - 978-1-4503-2722-0
BT - Proceedings of the 26th International Conference on Scientific and Statistical Database Management
PB - Association for Computing Machinery
ER -

Zimek A, Campello RJGB, Sander J. Data perturbation for outlier detection ensembles. In Proceedings of the 26th International Conference on Scientific and Statistical Database Management. Association for Computing Machinery. 2014. 13 https://doi.org/10.1145/2618243.2618257