Active learning strategies for semi-supervised DBSCAN

Jundong Li, Jörg Sander, Ricardo Campello, Arthur Zimek

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

The semi-supervised, density-based clustering algorithm SSDBSCAN extracts clusters of a given dataset from different density levels by using a small set of labeled objects. A critical assumption of SSDBSCAN is, however, that at least one labeled object for each natural cluster in the dataset is provided. This assumption may be unrealistic when only a very few labeled objects can be provided, for instance due to the cost associated with determining the class label of an object. In this paper, we introduce a novel active learning strategy to select "most representative" objects whose class label should be determined as input for SSDBSCAN. By incorporating a Laplacian Graph Regularizer into a Local Linear Reconstruction method, our proposed algorithm selects objects that can represent the whole data space well. Experiments on synthetic and real datasets show that using the proposed active learning strategy, SSDBSCAN is able to extract more meaningful clusters even when only very few labeled objects are provided.
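The coverage assumption stated in the abstract (at least one labeled object for each natural cluster) can be illustrated with a minimal sketch. This is not the paper's selection algorithm; the function name `uncovered_clusters` and the toy cluster assignment are hypothetical, introduced only to show what it means for a small labeled set to miss a cluster.

```python
from typing import Hashable, Iterable, Sequence, Set


def uncovered_clusters(cluster_of: Sequence[Hashable],
                       labeled: Iterable[int]) -> Set[Hashable]:
    """Return the clusters that contain no labeled object.

    cluster_of[i] is the (hypothetical, ground-truth) cluster of object i;
    labeled holds the indices of the objects whose class label is known.
    """
    covered = {cluster_of[i] for i in labeled}
    return set(cluster_of) - covered


# Toy data (assumption): six objects spread over three clusters.
clusters = ["a", "a", "b", "b", "c", "c"]

# Two labels miss cluster "c", so the assumption is violated.
print(uncovered_clusters(clusters, [0, 2]))

# One more well-chosen label covers every cluster.
print(uncovered_clusters(clusters, [0, 2, 5]))
```

An empty result means every cluster has at least one labeled object. With a small labeling budget, which objects are chosen decides whether that holds, and this is the situation the proposed active learning strategy is meant to address.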

Original language: English
Title of host publication: Advances in Artificial Intelligence: Proceedings of the 27th Canadian Conference on Artificial Intelligence
Editors: M. Sokolova, P. van Beek
Publisher: Springer VS
Publication date: 2014
Pages: 179-190
ISBN (Print): 978-3-319-06482-6
ISBN (Electronic): 978-3-319-06483-3
DOIs: https://doi.org/10.1007/978-3-319-06483-3_16
Publication status: Published - 2014
Externally published: Yes
Event: 27th Canadian Conference on Artificial Intelligence - Montreal, Canada
Duration: 6 May 2014 to 9 May 2014

Conference

Conference: 27th Canadian Conference on Artificial Intelligence
Country: Canada
City: Montreal
Period: 06/05/2014 to 09/05/2014
Sponsor: 'Nana Traiteur par l'Assommoir', Canadian Artificial Intelligence Association (CAIAC), et al., GRAND (Graphics, Animation and New Media) Research Network, Grevin, Polytechnique Montreal
Series: Lecture Notes in Computer Science
Volume: 8436
ISSN: 0302-9743

Fingerprint

Labels
Clustering algorithms
Costs
Experiments
Problem-Based Learning

Keywords

  • Active learning
  • Density-based clustering
  • Semi-supervised clustering

Cite this

Li, J., Sander, J., Campello, R., & Zimek, A. (2014). Active learning strategies for semi-supervised DBSCAN. In M. Sokolova & P. van Beek (Eds.), Advances in Artificial Intelligence: Proceedings of the 27th Canadian Conference on Artificial Intelligence (pp. 179-190). Springer VS. Lecture Notes in Computer Science, Vol. 8436. https://doi.org/10.1007/978-3-319-06483-3_16
Li, Jundong ; Sander, Jörg ; Campello, Ricardo ; Zimek, Arthur. / Active learning strategies for semi-supervised DBSCAN. Advances in Artificial Intelligence: Proceedings of the 27th Canadian Conference on Artificial Intelligence. editor / M. Sokolova ; P. van Beek. Springer VS, 2014. pp. 179-190 (Lecture Notes in Computer Science, Vol. 8436).
@inproceedings{d0c73b5a887240f197e0c69ce5d99c52,
title = "Active learning strategies for semi-supervised DBSCAN",
abstract = "The semi-supervised, density-based clustering algorithm SSDBSCAN extracts clusters of a given dataset from different density levels by using a small set of labeled objects. A critical assumption of SSDBSCAN is, however, that at least one labeled object for each natural cluster in the dataset is provided. This assumption may be unrealistic when only a very few labeled objects can be provided, for instance due to the cost associated with determining the class label of an object. In this paper, we introduce a novel active learning strategy to select {"}most representative{"} objects whose class label should be determined as input for SSDBSCAN. By incorporating a Laplacian Graph Regularizer into a Local Linear Reconstruction method, our proposed algorithm selects objects that can represent the whole data space well. Experiments on synthetic and real datasets show that using the proposed active learning strategy, SSDBSCAN is able to extract more meaningful clusters even when only very few labeled objects are provided.",
keywords = "Active learning, Density-based clustering, Semi-supervised clustering",
author = "Jundong Li and J{\"o}rg Sander and Ricardo Campello and Arthur Zimek",
year = "2014",
doi = "10.1007/978-3-319-06483-3_16",
language = "English",
isbn = "978-3-319-06482-6",
series = "Lecture Notes in Computer Science",
publisher = "Springer VS",
pages = "179--190",
editor = "M. Sokolova and {van Beek}, P.",
booktitle = "Advances in Artificial Intelligence",
}

Li, J, Sander, J, Campello, R & Zimek, A 2014, Active learning strategies for semi-supervised DBSCAN. in M Sokolova & P van Beek (eds), Advances in Artificial Intelligence: Proceedings of the 27th Canadian Conference on Artificial Intelligence. Springer VS, Lecture Notes in Computer Science, vol. 8436, pp. 179-190, 27th Canadian Conference on Artificial Intelligence, Montreal, Canada, 06/05/2014. https://doi.org/10.1007/978-3-319-06483-3_16

Active learning strategies for semi-supervised DBSCAN. / Li, Jundong; Sander, Jörg; Campello, Ricardo; Zimek, Arthur.

Advances in Artificial Intelligence: Proceedings of the 27th Canadian Conference on Artificial Intelligence. ed. / M. Sokolova; P. van Beek. Springer VS, 2014. p. 179-190 (Lecture Notes in Computer Science, Vol. 8436).


TY - GEN
T1 - Active learning strategies for semi-supervised DBSCAN
AU - Li, Jundong
AU - Sander, Jörg
AU - Campello, Ricardo
AU - Zimek, Arthur
PY - 2014
Y1 - 2014

N2 - The semi-supervised, density-based clustering algorithm SSDBSCAN extracts clusters of a given dataset from different density levels by using a small set of labeled objects. A critical assumption of SSDBSCAN is, however, that at least one labeled object for each natural cluster in the dataset is provided. This assumption may be unrealistic when only a very few labeled objects can be provided, for instance due to the cost associated with determining the class label of an object. In this paper, we introduce a novel active learning strategy to select "most representative" objects whose class label should be determined as input for SSDBSCAN. By incorporating a Laplacian Graph Regularizer into a Local Linear Reconstruction method, our proposed algorithm selects objects that can represent the whole data space well. Experiments on synthetic and real datasets show that using the proposed active learning strategy, SSDBSCAN is able to extract more meaningful clusters even when only very few labeled objects are provided.

AB - The semi-supervised, density-based clustering algorithm SSDBSCAN extracts clusters of a given dataset from different density levels by using a small set of labeled objects. A critical assumption of SSDBSCAN is, however, that at least one labeled object for each natural cluster in the dataset is provided. This assumption may be unrealistic when only a very few labeled objects can be provided, for instance due to the cost associated with determining the class label of an object. In this paper, we introduce a novel active learning strategy to select "most representative" objects whose class label should be determined as input for SSDBSCAN. By incorporating a Laplacian Graph Regularizer into a Local Linear Reconstruction method, our proposed algorithm selects objects that can represent the whole data space well. Experiments on synthetic and real datasets show that using the proposed active learning strategy, SSDBSCAN is able to extract more meaningful clusters even when only very few labeled objects are provided.

KW - Active learning
KW - Density-based clustering
KW - Semi-supervised clustering
U2 - 10.1007/978-3-319-06483-3_16
DO - 10.1007/978-3-319-06483-3_16
M3 - Article in proceedings
AN - SCOPUS:84901657429
SN - 978-3-319-06482-6
T3 - Lecture Notes in Computer Science
SP - 179
EP - 190
BT - Advances in Artificial Intelligence
A2 - Sokolova, M.
A2 - van Beek, P.
PB - Springer VS
ER -

Li J, Sander J, Campello R, Zimek A. Active learning strategies for semi-supervised DBSCAN. In Sokolova M, van Beek P, editors, Advances in Artificial Intelligence: Proceedings of the 27th Canadian Conference on Artificial Intelligence. Springer VS. 2014. p. 179-190. (Lecture Notes in Computer Science, Vol. 8436). https://doi.org/10.1007/978-3-319-06483-3_16