Model selection for semi-supervised clustering

Mojgan Pourrajabi, Arthur Zimek, Davoud Moulavi, Jörg Sander, Ricardo J.G.B. Campello, Randy Goebel

Research output: Chapter in Book/Report/Conference proceeding > Article in proceedings > Research > peer-review

Abstract

Although there is a large and growing literature that tackles the semi-supervised clustering problem (i.e., using some labeled objects or cluster-guiding constraints like "must-link" or "cannot-link"), the evaluation of semi-supervised clustering approaches has rarely been discussed. The application of cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, yet the problems associated with evaluation have yet to be addressed. Here we summarize these problems and provide a solution. Furthermore, in order to demonstrate practical applicability of semi-supervised clustering methods, we provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure. Our method allows the user to select, based on the available information (labels or constraints), the most appropriate clustering model (e.g., number of clusters, density-parameters) for a given problem.

Original language: English
Title of host publication: Advances in Database Technology - EDBT 2014: Proc. 17th International Conference on Extending Database Technology
Editors: Vincent Leroy, Vassilis Christophides, Stratos Idreos, Anastasios Kementsietsidis, Minos Garofalakis, Sihem Amer-Yahia
Publisher: OpenProceedings
Publication date: 2014
Pages: 331-342
ISBN (Electronic): 978-3-89318065-3
DOI: https://doi.org/10.5441/002/edbt.2014.31
Publication status: Published - 2014
Externally published: Yes
Event: 17th International Conference on Extending Database Technology - Athens, Greece
Duration: 24 Mar 2014 to 28 Mar 2014

Conference

Conference: 17th International Conference on Extending Database Technology
Country: Greece
City: Athens
Period: 24/03/2014 to 28/03/2014

Cite this

Pourrajabi, M., Zimek, A., Moulavi, D., Sander, J., Campello, R. J. G. B., & Goebel, R. (2014). Model selection for semi-supervised clustering. In V. Leroy, V. Christophides, S. Idreos, A. Kementsietsidis, M. Garofalakis, & S. Amer-Yahia (Eds.), Advances in Database Technology - EDBT 2014: Proc. 17th International Conference on Extending Database Technology (pp. 331-342). OpenProceedings. https://doi.org/10.5441/002/edbt.2014.31
@inproceedings{2ada1a27017344bd85a37212b74979dc,
title = "Model selection for semi-supervised clustering",
abstract = "Although there is a large and growing literature that tackles the semi-supervised clustering problem (i.e., using some labeled objects or cluster-guiding constraints like ``must-link'' or ``cannot-link''), the evaluation of semi-supervised clustering approaches has rarely been discussed. The application of cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, yet the problems associated with evaluation have yet to be addressed. Here we summarize these problems and provide a solution. Furthermore, in order to demonstrate practical applicability of semi-supervised clustering methods, we provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure. Our method allows the user to select, based on the available information (labels or constraints), the most appropriate clustering model (e.g., number of clusters, density-parameters) for a given problem.",
author = "Mojgan Pourrajabi and Arthur Zimek and Davoud Moulavi and J{\"o}rg Sander and Campello, {Ricardo J.G.B.} and Randy Goebel",
year = "2014",
doi = "10.5441/002/edbt.2014.31",
language = "English",
pages = "331--342",
editor = "Vincent Leroy and Vassilis Christophides and Stratos Idreos and Anastasios Kementsietsidis and Minos Garofalakis and Sihem Amer-Yahia",
booktitle = "Advances in Database Technology - EDBT 2014",
publisher = "OpenProceedings",
}

TY  - GEN
T1  - Model selection for semi-supervised clustering
AU  - Pourrajabi, Mojgan
AU  - Zimek, Arthur
AU  - Moulavi, Davoud
AU  - Sander, Jörg
AU  - Campello, Ricardo J.G.B.
AU  - Goebel, Randy
PY  - 2014
Y1  - 2014
AB  - Although there is a large and growing literature that tackles the semi-supervised clustering problem (i.e., using some labeled objects or cluster-guiding constraints like "must-link" or "cannot-link"), the evaluation of semi-supervised clustering approaches has rarely been discussed. The application of cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, yet the problems associated with evaluation have yet to be addressed. Here we summarize these problems and provide a solution. Furthermore, in order to demonstrate practical applicability of semi-supervised clustering methods, we provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure. Our method allows the user to select, based on the available information (labels or constraints), the most appropriate clustering model (e.g., number of clusters, density-parameters) for a given problem.
U2  - 10.5441/002/edbt.2014.31
DO  - 10.5441/002/edbt.2014.31
M3  - Article in proceedings
AN  - SCOPUS:85014342568
SP  - 331
EP  - 342
BT  - Advances in Database Technology - EDBT 2014
A2  - Leroy, Vincent
A2  - Christophides, Vassilis
A2  - Idreos, Stratos
A2  - Kementsietsidis, Anastasios
A2  - Garofalakis, Minos
A2  - Amer-Yahia, Sihem
PB  - OpenProceedings
ER  -
