On evaluation of outlier rankings and outlier scores

Erich Schubert, Remigius Wojdanowski, Arthur Zimek, Hans Peter Kriegel

Publication: Contribution to book/anthology/report/conference proceeding › Conference contribution in proceedings › Research › Peer-reviewed


Outlier detection research currently focuses on developing new methods and on improving their computation time. Evaluation, however, is rather heuristic, often considering just precision in the top k results or the area under the ROC curve. These evaluation procedures do not allow for assessing the similarity between methods. Judging the similarity of, or correlation between, two rankings of outlier scores is an important question in itself, but it is also an essential step towards meaningfully building outlier detection ensembles, where this aspect has been completely ignored so far. In this study, our generalized view of evaluation methods allows us both to evaluate the performance of existing methods and to compare different methods w.r.t. their detection performance. Our new evaluation framework takes the class imbalance problem into consideration and offers new insights into the similarity and redundancy of existing outlier detection methods. As a result, the design of effective ensemble methods for outlier detection is considerably enhanced.
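For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of precision in the top k results, the area under the ROC curve, and a plain rank correlation between two outlier scorings. It is not the evaluation framework proposed in the paper: Spearman correlation is used here only as a generic stand-in for "similarity between rankings", and the function names, the toy scores, and the labels are all made up for illustration.

```python
def precision_at_k(scores, labels, k):
    """Fraction of true outliers (label 1) among the k highest-scoring objects."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:k]) / k


def roc_auc(scores, labels):
    """Probability that a random outlier outscores a random inlier (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ranks(values):
    """Ascending ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Pearson correlation of the rank vectors of a and b."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical toy data: 2 outliers among 8 objects, scored by two made-up methods.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.1, 0.4, 0.8, 0.3, 0.2, 0.5, 0.05]
scores_b = [0.7, 0.2, 0.9, 0.6, 0.1, 0.3, 0.4, 0.15]

if __name__ == "__main__":
    for name, scores in (("A", scores_a), ("B", scores_b)):
        print(name, precision_at_k(scores, labels, 2), roc_auc(scores, labels))
    print("rank correlation A vs. B:", spearman(scores_a, scores_b))
```

The first two functions each judge one method against the ground truth, whereas the last compares two methods with each other. The abstract argues that this second kind of comparison matters when building ensembles.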

Title: Proceedings of the 12th SIAM International Conference on Data Mining
Editors: Joydeep Ghosh, Huan Liu, Ian Davidson, Carlotta Domeniconi, Chandrika Kamath
Publication date: Dec. 2012
ISBN (Print): 9781611972320
ISBN (Electronic): 978-1-61197-282-5
Status: Published - Dec. 2012
Published externally: Yes
Event: 12th SIAM International Conference on Data Mining - Anaheim, USA
Duration: 26 Apr. 2012 to 28 Apr. 2012


Conference: 12th SIAM International Conference on Data Mining
Sponsor: American Statistical Association



Schubert, E., Wojdanowski, R., Zimek, A., & Kriegel, H. P. (2012). On evaluation of outlier rankings and outlier scores. In J. Ghosh, H. Liu, I. Davidson, C. Domeniconi, & C. Kamath (Eds.), Proceedings of the 12th SIAM International Conference on Data Mining (pp. 1047-1058). https://doi.org/10.1137/1.9781611972825.90