Guiding biomedical clustering with ClustEval

Christian Wiwie, Jan Baumbach*, Richard Röttger

*Kontaktforfatter

Publikation: Bidrag til tidsskriftTidsskriftartikelForskningpeer review

Abstrakt

Clustering is a popular technique for discovering groups of similar objects in large datasets. It is nowadays applied in all areas of life sciences, from biomedicine to physics. However, designing high-quality cluster analyses is a tedious and complicated task with manifold choices along the way. As a cluster analysis is often the first step of a succeeding downstream analysis, the clustering must be reliable, reproducible, and of the highest quality. To address these challenges, we recently developed ClustEval, an integrated and extensible platform for the automated and standardized design and execution of complex cluster analyses. It allows researchers to design and carry out cluster analyses involving a large number of clustering methods applied to many, large datasets. ClustEval helps to shed light on all major aspects of cluster analysis, from choosing the right similarity function to using validity indices and data preprocessing protocols. Only this high degree of automation allows the researcher to easily run a clustering task with many different tools, parameters, and settings in order to gain the best possible outcome. In this paper, we guide the user step by step through three fundamentally important and widely applicable use cases: (i) identification of the best clustering method for a new, user-given protein sequence similarity dataset; (ii) evaluation of the performance of a new, user-given clustering method (densityCut) against the state of the art; and (iii) prediction of the best method for a new protein sequence similarity dataset. This protocol guides the user through the most important features of ClustEval and takes â 1/44 h to complete.

OriginalsprogEngelsk
TidsskriftNature Protocols
Vol/bind13
Udgave nummer6
Sider (fra-til)1429-1444
ISSN1754-2189
DOI
StatusUdgivet - 1. jun. 2018

Fingeraftryk

Dyk ned i forskningsemnerne om 'Guiding biomedical clustering with ClustEval'. Sammen danner de et unikt fingeraftryk.

Citationsformater