Can shared-neighbor distances defeat the curse of dimensionality?

Michael E. Houle, Hans Peter Kriegel, Peer Kröger, Erich Schubert, Arthur Zimek

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Abstract

The performance of similarity measures for search, indexing, and data mining applications tends to degrade rapidly as the dimensionality of the data increases. The effects of the so-called 'curse of dimensionality' have been studied by researchers for data sets generated according to a single data distribution. In this paper, we study the effects of this phenomenon on different similarity measures for multiply-distributed data. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. We find that rank-based similarity measures can result in more stable performance than their associated primary distance measures.

Original languageEnglish
Title of host publicationScientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings
EditorsM. Gertz, B. Ludäscher
PublisherSpringer
Publication date3. Aug 2010
Pages482-500
ISBN (Print)978-3-642-13817-1
ISBN (Electronic)978-3-642-13818-8
DOIs
Publication statusPublished - 3. Aug 2010
Externally publishedYes
Event22nd International Conference on Scientific and Statistical Database Management - Heidelberg, Germany
Duration: 30. Jun 20102. Jul 2010

Conference

Conference22nd International Conference on Scientific and Statistical Database Management
CountryGermany
CityHeidelberg
Period30/06/201002/07/2010
SponsorHeidelberg University , Heidelberg Institute for Theoretical Studies (HITS)
SeriesLecture Notes in Computer Science
Volume6187
ISSN0302-9743

Fingerprint

Data mining

Cite this

Houle, M. E., Kriegel, H. P., Kröger, P., Schubert, E., & Zimek, A. (2010). Can shared-neighbor distances defeat the curse of dimensionality? In M. Gertz, & B. Ludäscher (Eds.), Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings (pp. 482-500). Springer. Lecture Notes in Computer Science, Vol.. 6187 https://doi.org/10.1007/978-3-642-13818-8_34
Houle, Michael E. ; Kriegel, Hans Peter ; Kröger, Peer ; Schubert, Erich ; Zimek, Arthur. / Can shared-neighbor distances defeat the curse of dimensionality?. Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings. editor / M. Gertz ; B. Ludäscher. Springer, 2010. pp. 482-500 (Lecture Notes in Computer Science, Vol. 6187).
@inproceedings{633daa70c88e4654972954d4509326d2,
title = "Can shared-neighbor distances defeat the curse of dimensionality?",
abstract = "The performance of similarity measures for search, indexing, and data mining applications tends to degrade rapidly as the dimensionality of the data increases. The effects of the so-called 'curse of dimensionality' have been studied by researchers for data sets generated according to a single data distribution. In this paper, we study the effects of this phenomenon on different similarity measures for multiply-distributed data. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. We find that rank-based similarity measures can result in more stable performance than their associated primary distance measures.",
author = "Houle, {Michael E.} and Kriegel, {Hans Peter} and Peer Kr{\"o}ger and Erich Schubert and Arthur Zimek",
year = "2010",
month = "8",
day = "3",
doi = "10.1007/978-3-642-13818-8_34",
language = "English",
isbn = "978-3-642-13817-1",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "482--500",
editor = "M. Gertz and B. Lud{\"a}scher",
booktitle = "Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings",
address = "Germany",

}

Houle, ME, Kriegel, HP, Kröger, P, Schubert, E & Zimek, A 2010, Can shared-neighbor distances defeat the curse of dimensionality? in M Gertz & B Ludäscher (eds), Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings. Springer, Lecture Notes in Computer Science, vol. 6187, pp. 482-500, 22nd International Conference on Scientific and Statistical Database Management, Heidelberg, Germany, 30/06/2010. https://doi.org/10.1007/978-3-642-13818-8_34

Can shared-neighbor distances defeat the curse of dimensionality? / Houle, Michael E.; Kriegel, Hans Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur.

Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings. ed. / M. Gertz; B. Ludäscher. Springer, 2010. p. 482-500 (Lecture Notes in Computer Science, Vol. 6187).

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

TY - GEN

T1 - Can shared-neighbor distances defeat the curse of dimensionality?

AU - Houle, Michael E.

AU - Kriegel, Hans Peter

AU - Kröger, Peer

AU - Schubert, Erich

AU - Zimek, Arthur

PY - 2010/8/3

Y1 - 2010/8/3

N2 - The performance of similarity measures for search, indexing, and data mining applications tends to degrade rapidly as the dimensionality of the data increases. The effects of the so-called 'curse of dimensionality' have been studied by researchers for data sets generated according to a single data distribution. In this paper, we study the effects of this phenomenon on different similarity measures for multiply-distributed data. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. We find that rank-based similarity measures can result in more stable performance than their associated primary distance measures.

AB - The performance of similarity measures for search, indexing, and data mining applications tends to degrade rapidly as the dimensionality of the data increases. The effects of the so-called 'curse of dimensionality' have been studied by researchers for data sets generated according to a single data distribution. In this paper, we study the effects of this phenomenon on different similarity measures for multiply-distributed data. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. We find that rank-based similarity measures can result in more stable performance than their associated primary distance measures.

U2 - 10.1007/978-3-642-13818-8_34

DO - 10.1007/978-3-642-13818-8_34

M3 - Article in proceedings

AN - SCOPUS:77955045250

SN - 978-3-642-13817-1

T3 - Lecture Notes in Computer Science

SP - 482

EP - 500

BT - Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings

A2 - Gertz, M.

A2 - Ludäscher, B.

PB - Springer

ER -

Houle ME, Kriegel HP, Kröger P, Schubert E, Zimek A. Can shared-neighbor distances defeat the curse of dimensionality? In Gertz M, Ludäscher B, editors, Scientific and Statistical Database Management - 22nd International Conference, SSDBM 2010, Proceedings. Springer. 2010. p. 482-500. (Lecture Notes in Computer Science, Vol. 6187). https://doi.org/10.1007/978-3-642-13818-8_34