Density-based projected clustering over high dimensional data streams

Irene Ntoutsi, Arthur Zimek, Themis Palpanas, Peer Kröger, Hans Peter Kriegel

Publication: Chapter in book/report/conference proceeding › Conference contribution in proceedings › Research › Peer-reviewed

Abstract

Clustering of high dimensional data streams is an important problem in many application domains, a prominent example being network monitoring. Several approaches have been lately proposed for solving independently the different aspects of the problem. There exist methods for clustering over full dimensional streams and methods for finding clusters in subspaces of high dimensional static data. Yet only a few approaches have been proposed so far which tackle both the stream and the high dimensionality aspects of the problem simultaneously. In this work, we propose a new density-based projected clustering algorithm, HDDStream, for high dimensional data streams. Our algorithm summarizes both the data points and the dimensions where these points are grouped together and maintains these summaries online, as new points arrive over time and old points expire due to ageing. Our experimental results illustrate the effectiveness and the efficiency of HDDStream and also demonstrate that it could serve as a trigger for detecting drastic changes in the underlying stream population, like bursts of network attacks.

Original language: English
Title: Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
Editors: Joydeep Ghosh, Huan Liu, Ian Davidson, Charlotta Domeniconi, Chandrika Kamath
Publisher: Society for Industrial and Applied Mathematics
Publication date: Dec. 2012
Pages: 987-998
ISBN (Print): 978-1-61197-232-0
ISBN (Electronic): 978-1-61197-282-5
DOI
Status: Published - Dec. 2012
Published externally: Yes
Event: 12th SIAM International Conference on Data Mining - Anaheim, USA
Duration: 26 Apr. 2012 to 28 Apr. 2012

Conference

Conference: 12th SIAM International Conference on Data Mining
Country: USA
City: Anaheim
Period: 26/04/2012 to 28/04/2012
Sponsor: American Statistical Association
