TY - GEN
T1 - Applying Explainable Outlier Detection
T2 - An Empirical and Theoretical Study of the Application of Explainable Outlier Detection
AU - Sejr, Jonas Herskind
PY - 2022/4/11
Y1 - 2022/4/11
N2 - Unsupervised outlier detection has advantages over supervised algorithms as there is no need for labels. Because learning is not based on labels, the algorithms can detect outliers that follow patterns not yet seen. This is advantageous, for example, in cybersecurity, where the adversary continuously changes their approach. The first approaches to unsupervised outlier detection were relatively simple, and the users were typically statisticians who, with their insight, could match the appropriate algorithms to a specific application. With increased amounts of data and data complexity, outlier algorithms have become much more complex, and the latest development is towards highly complex algorithms. To the user, these algorithms are often uninterpretable. This is a challenge both when a data scientist applies the outlier algorithms and when end users are presented with the detected outliers. This development has led to a new field of research: explaining previously uninterpretable unsupervised outlier algorithms. This thesis investigates explainable outlier algorithms applied in three domains: HTTP intrusion detection, consumer behaviour event detection, and image outlier object detection. We have developed a new algorithm in each domain and tested and evaluated it empirically, as close to the users as possible. Finally, in a position paper, we discuss the what, the who, and the why of explainable outlier detection and introduce a new perspective on outlier detection interpretation and explanation.
AB - Unsupervised outlier detection has advantages over supervised algorithms as there is no need for labels. Because learning is not based on labels, the algorithms can detect outliers that follow patterns not yet seen. This is advantageous, for example, in cybersecurity, where the adversary continuously changes their approach. The first approaches to unsupervised outlier detection were relatively simple, and the users were typically statisticians who, with their insight, could match the appropriate algorithms to a specific application. With increased amounts of data and data complexity, outlier algorithms have become much more complex, and the latest development is towards highly complex algorithms. To the user, these algorithms are often uninterpretable. This is a challenge both when a data scientist applies the outlier algorithms and when end users are presented with the detected outliers. This development has led to a new field of research: explaining previously uninterpretable unsupervised outlier algorithms. This thesis investigates explainable outlier algorithms applied in three domains: HTTP intrusion detection, consumer behaviour event detection, and image outlier object detection. We have developed a new algorithm in each domain and tested and evaluated it empirically, as close to the users as possible. Finally, in a position paper, we discuss the what, the who, and the why of explainable outlier detection and introduce a new perspective on outlier detection interpretation and explanation.
U2 - 10.21996/avqe-4e63
DO - 10.21996/avqe-4e63
M3 - Ph.D. thesis
PB - Syddansk Universitet. Det Naturvidenskabelige Fakultet
ER -