Density-based projected clustering over high dimensional data streams

Irene Ntoutsi, Arthur Zimek, Themis Palpanas, Peer Kröger, Hans Peter Kriegel

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Clustering of high dimensional data streams is an important problem in many application domains, a prominent example being network monitoring. Several approaches have recently been proposed that address the different aspects of the problem independently. There exist methods for clustering over full dimensional streams and methods for finding clusters in subspaces of high dimensional static data. Yet only a few of the approaches proposed so far tackle both the stream and the high dimensionality aspects of the problem simultaneously. In this work, we propose a new density-based projected clustering algorithm, HDDStream, for high dimensional data streams. Our algorithm summarizes both the data points and the dimensions where these points are grouped together and maintains these summaries online, as new points arrive over time and old points expire due to ageing. Our experimental results illustrate the effectiveness and the efficiency of HDDStream and also demonstrate that it could serve as a trigger for detecting drastic changes in the underlying stream population, like bursts of network attacks.
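As a rough illustration of what an online summary of "points plus the dimensions where they group together" might look like, the sketch below keeps exponentially faded per-dimension sums and reports the dimensions along which the summarized points have low variance. It is not taken from the paper: the class name `FadingSummary`, the `decay_rate` and `max_variance` parameters, and the base-2 fading function are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class FadingSummary:
    """Illustrative decayed summary of a group of stream points (hypothetical, not HDDStream itself)."""
    dims: int
    decay_rate: float = 0.1  # assumed ageing parameter: larger means faster forgetting
    weight: float = field(init=False, default=0.0)  # decayed point count
    last_update: float = field(init=False, default=0.0)
    linear_sum: List[float] = field(init=False)
    squared_sum: List[float] = field(init=False)

    def __post_init__(self) -> None:
        self.linear_sum = [0.0] * self.dims
        self.squared_sum = [0.0] * self.dims

    def _age(self, now: float) -> None:
        # Fade all statistics by 2^(-decay_rate * elapsed time).
        elapsed = max(now - self.last_update, 0.0)
        factor = 2.0 ** (-self.decay_rate * elapsed)
        self.weight *= factor
        self.linear_sum = [v * factor for v in self.linear_sum]
        self.squared_sum = [v * factor for v in self.squared_sum]
        self.last_update = now

    def insert(self, point: Sequence[float], now: float) -> None:
        # Age the old content first, then absorb the new point with weight 1.
        self._age(now)
        self.weight += 1.0
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x

    def variances(self) -> List[float]:
        # Weighted per-dimension variance derived from the faded sums.
        if self.weight == 0.0:
            return [0.0] * self.dims
        w = self.weight
        return [max(sq / w - (ls / w) ** 2, 0.0)
                for ls, sq in zip(self.linear_sum, self.squared_sum)]

    def preferred_dimensions(self, max_variance: float) -> List[int]:
        # Dimensions along which the summarized points are tightly grouped.
        return [i for i, v in enumerate(self.variances()) if v <= max_variance]
```

Inserting points that agree on the first coordinate but spread widely on the second would leave only dimension 0 in `preferred_dimensions`, and the stored weight shrinks as time passes without new points, which mirrors the idea of old points expiring due to ageing.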

Original language: English
Title of host publication: Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
Editors: Joydeep Ghosh, Huan Liu, Ian Davidson, Charlotta Domeniconi, Chandrika Kamath
Publisher: Society for Industrial and Applied Mathematics
Publication date: Dec 2012
Pages: 987-998
ISBN (Print): 978-1-61197-232-0
ISBN (Electronic): 978-1-61197-282-5
DOIs: https://doi.org/10.1137/1.9781611972825.85
Publication status: Published - Dec 2012
Externally published: Yes
Event: 12th SIAM International Conference on Data Mining - Anaheim, United States
Duration: 26 Apr 2012 to 28 Apr 2012

Conference

Conference: 12th SIAM International Conference on Data Mining
Country: United States
City: Anaheim
Period: 26/04/2012 to 28/04/2012
Sponsor: American Statistical Association

Fingerprint

Clustering algorithms
Aging of materials
Monitoring

Cite this

Ntoutsi, I., Zimek, A., Palpanas, T., Kröger, P., & Kriegel, H. P. (2012). Density-based projected clustering over high dimensional data streams. In J. Ghosh, H. Liu, I. Davidson, C. Domeniconi, & C. Kamath (Eds.), Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012 (pp. 987-998). Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611972825.85
Ntoutsi, Irene ; Zimek, Arthur ; Palpanas, Themis ; Kröger, Peer ; Kriegel, Hans Peter. / Density-based projected clustering over high dimensional data streams. Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012. editor / Joydeep Ghosh ; Huan Liu ; Ian Davidson ; Charlotta Domeniconi ; Chandrika Kamath. Society for Industrial and Applied Mathematics, 2012. pp. 987-998
@inproceedings{e64ae9ad4d2f44c8997ee1a5c5c3e2e0,
title = "Density-based projected clustering over high dimensional data streams",
abstract = "Clustering of high dimensional data streams is an important problem in many application domains, a prominent example being network monitoring. Several approaches have been lately proposed for solving independently the different aspects of the problem. There exist methods for clustering over full dimensional streams and methods for finding clusters in subspaces of high dimensional static data. Yet only a few approaches have been proposed so far which tackle both the stream and the high dimensionality aspects of the problem simultaneously. In this work, we propose a new density-based projected clustering algorithm, HDDStream, for high dimensional data streams. Our algorithm summarizes both the data points and the dimensions where these points are grouped together and maintains these summaries online, as new points arrive over time and old points expire due to ageing. Our experimental results illustrate the effectiveness and the efficiency of HDDStream and also demonstrate that it could serve as a trigger for detecting drastic changes in the underlying stream population, like bursts of network attacks.",
author = "Irene Ntoutsi and Arthur Zimek and Themis Palpanas and Peer Kr{\"o}ger and Kriegel, {Hans Peter}",
year = "2012",
month = "12",
doi = "10.1137/1.9781611972825.85",
language = "English",
isbn = "978-1-61197-232-0",
pages = "987--998",
editor = "Joydeep Ghosh and Huan Liu and Ian Davidson and Charlotta Domeniconi and Chandrika Kamath",
booktitle = "Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012",
publisher = "Society for Industrial and Applied Mathematics",
address = "United States",

}

Ntoutsi, I, Zimek, A, Palpanas, T, Kröger, P & Kriegel, HP 2012, Density-based projected clustering over high dimensional data streams. in J Ghosh, H Liu, I Davidson, C Domeniconi & C Kamath (eds), Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012. Society for Industrial and Applied Mathematics, pp. 987-998, 12th SIAM International Conference on Data Mining, Anaheim, United States, 26/04/2012. https://doi.org/10.1137/1.9781611972825.85

Density-based projected clustering over high dimensional data streams. / Ntoutsi, Irene; Zimek, Arthur; Palpanas, Themis; Kröger, Peer; Kriegel, Hans Peter.

Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012. ed. / Joydeep Ghosh; Huan Liu; Ian Davidson; Charlotta Domeniconi; Chandrika Kamath. Society for Industrial and Applied Mathematics, 2012. p. 987-998.


TY - GEN

T1 - Density-based projected clustering over high dimensional data streams

AU - Ntoutsi, Irene

AU - Zimek, Arthur

AU - Palpanas, Themis

AU - Kröger, Peer

AU - Kriegel, Hans Peter

PY - 2012/12

Y1 - 2012/12

N2 - Clustering of high dimensional data streams is an important problem in many application domains, a prominent example being network monitoring. Several approaches have been lately proposed for solving independently the different aspects of the problem. There exist methods for clustering over full dimensional streams and methods for finding clusters in subspaces of high dimensional static data. Yet only a few approaches have been proposed so far which tackle both the stream and the high dimensionality aspects of the problem simultaneously. In this work, we propose a new density-based projected clustering algorithm, HDDStream, for high dimensional data streams. Our algorithm summarizes both the data points and the dimensions where these points are grouped together and maintains these summaries online, as new points arrive over time and old points expire due to ageing. Our experimental results illustrate the effectiveness and the efficiency of HDDStream and also demonstrate that it could serve as a trigger for detecting drastic changes in the underlying stream population, like bursts of network attacks.

AB - Clustering of high dimensional data streams is an important problem in many application domains, a prominent example being network monitoring. Several approaches have been lately proposed for solving independently the different aspects of the problem. There exist methods for clustering over full dimensional streams and methods for finding clusters in subspaces of high dimensional static data. Yet only a few approaches have been proposed so far which tackle both the stream and the high dimensionality aspects of the problem simultaneously. In this work, we propose a new density-based projected clustering algorithm, HDDStream, for high dimensional data streams. Our algorithm summarizes both the data points and the dimensions where these points are grouped together and maintains these summaries online, as new points arrive over time and old points expire due to ageing. Our experimental results illustrate the effectiveness and the efficiency of HDDStream and also demonstrate that it could serve as a trigger for detecting drastic changes in the underlying stream population, like bursts of network attacks.

U2 - 10.1137/1.9781611972825.85

DO - 10.1137/1.9781611972825.85

M3 - Article in proceedings

AN - SCOPUS:84868121916

SN - 978-1-61197-232-0

SP - 987

EP - 998

BT - Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012

A2 - Ghosh, Joydeep

A2 - Liu, Huan

A2 - Davidson, Ian

A2 - Domeniconi, Charlotta

A2 - Kamath, Chandrika

PB - Society for Industrial and Applied Mathematics

ER -

Ntoutsi I, Zimek A, Palpanas T, Kröger P, Kriegel HP. Density-based projected clustering over high dimensional data streams. In Ghosh J, Liu H, Davidson I, Domeniconi C, Kamath C, editors, Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012. Society for Industrial and Applied Mathematics. 2012. p. 987-998 https://doi.org/10.1137/1.9781611972825.85