Comparing the performance of biomedical clustering methods

Christian Wiwie, Jan Baumbach, Richard Röttger

Research output: Contribution to journal > Journal article > Research > peer-review

Abstract

Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.

Original language: English
Journal: Nature Methods
Volume: 12
Issue number: 11
Pages (from-to): 1033-1038
ISSN: 1548-7091
DOI: 10.1038/nmeth.3583
Publication status: Published - Nov 2015

Fingerprint

Cluster Analysis
Computational methods
Gene expression
Reproducibility of Results
Research Personnel
Guidelines
Proteins
Datasets

Cite this

@article{8d9fe019f6534da98e9a907a13cc9e5a,
title = "Comparing the performance of biomedical clustering methods",
abstract = "Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.",
author = "Christian Wiwie and Jan Baumbach and Richard R{\"o}ttger",
year = "2015",
month = "11",
doi = "10.1038/nmeth.3583",
language = "English",
volume = "12",
pages = "1033--1038",
journal = "Nature Methods",
issn = "1548-7091",
publisher = "Nature Publishing Group",
number = "11",

}

Comparing the performance of biomedical clustering methods. / Wiwie, Christian; Baumbach, Jan; Röttger, Richard.

In: Nature Methods, Vol. 12, No. 11, 11.2015, p. 1033-1038.

TY - JOUR

T1 - Comparing the performance of biomedical clustering methods

AU - Wiwie, Christian

AU - Baumbach, Jan

AU - Röttger, Richard

PY - 2015/11

Y1 - 2015/11

AB - Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.

U2 - 10.1038/nmeth.3583

DO - 10.1038/nmeth.3583

M3 - Journal article

VL - 12

SP - 1033

EP - 1038

JO - Nature Methods

JF - Nature Methods

SN - 1548-7091

IS - 11

ER -