TY - GEN
T1 - Similarity-Based Unsupervised Evaluation of Outlier Detection
AU - Marques, Henrique O.
AU - Zimek, Arthur
AU - Campello, Ricardo J.G.B.
AU - Sander, Jörg
N1 - Funding Information:
Acknowledgement. This work has partly been funded by NSERC Canada, and the Independent Research Fund Denmark in the project “Reliable Outlier Detection”.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - The evaluation of unsupervised algorithm results is one of the most challenging tasks in data mining research. Where labeled data are not available, one has to use in practice the so-called internal evaluation, which is based solely on the data and the assessed solutions themselves. In unsupervised cluster analysis, indices for internal evaluation of clustering solutions have been studied for decades, with a multitude of indices available, based on different criteria. In unsupervised outlier detection, however, this problem has only recently received some attention, and still very few indices are available. In this paper, we provide a new internal index based on criteria different from the ones available in the literature. The index is based on a (generic) similarity measure to efficiently evaluate candidate outlier detection solutions in a completely unsupervised way. We evaluate and compare this index against existing indices in terms of quality and run time performance using collections of both real and synthetic datasets.
KW - Model selection
KW - Outlier detection
KW - Unsupervised evaluation
KW - Validation
U2 - 10.1007/978-3-031-17849-8_19
DO - 10.1007/978-3-031-17849-8_19
M3 - Article in proceedings
AN - SCOPUS:85140474716
SN - 9783031178481
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 234
EP - 248
BT - Similarity Search and Applications - 15th International Conference, SISAP 2022, Proceedings
A2 - Skopal, Tomáš
A2 - Lokoč, Jakub
A2 - Falchi, Fabrizio
A2 - Sapino, Maria Luisa
A2 - Bartolini, Ilaria
A2 - Patella, Marco
PB - Springer Science+Business Media
T2 - 15th International Conference on Similarity Search and Applications, SISAP 2022
Y2 - 5 October 2022 through 7 October 2022
ER -