On the Power and Limits of Sequence Similarity Based Clustering of Proteins Into Families

Christian Wiwie, Richard Röttger

Publication: Chapter in book/report/conference proceeding; Conference contribution in proceedings; Research; Peer-reviewed


Over the last decades, we have observed tremendous, ongoing growth in available sequencing data, fueled by advances in wet-lab technology. Sequencing information is only the first step toward understanding how organisms survive and prosper. It is, for instance, equally important to unravel the proteomic repertoire of an organism. A classical computational approach for detecting protein families is a sequence-based similarity calculation coupled with a subsequent cluster analysis. In this work, we analyzed various clustering tools intensively and on a large scale. We used the data to investigate the behavior of the tools' parameters, underlining the diversity of the protein families. Furthermore, we trained regression models to predict the expected performance of a clustering tool on an unknown data set, and aimed to suggest optimal parameters in an automated fashion. Our analysis demonstrates the benefits and limitations of clustering proteins with low sequence similarity, indicating that each protein family requires its own distinct set of tools and parameters. All results, a tool prediction service, and additional supporting material are available online at http://proteinclustering.compbio.sdu.dk.

Title: Biocomputing 2017: Proceedings of the Pacific Symposium
Editors: Russ B Altman, A Keith Dunker, Lawrence Hunter, Marylyn Ritchie, Tiffany Murray, Teri Klein
Publisher: World Scientific
ISBN (Print): 978-981-3207-80-6
ISBN (Electronic): 978-981-3207-82-0
Status: Published - 2017
Event: Pacific Symposium on Biocomputing 2017 - Hawaii, USA
Duration: 3 Jan 2017 to 7 Jan 2017
Conference number: 22


Conference: Pacific Symposium on Biocomputing 2017

