Density parameter estimation for finding clusters of homologous proteins-tracing actinobacterial pathogenicity lifestyles

Richard Röttger, Prabhav Kalaghatgi, Peng Sun, Siomar de Castro Soares, Vasco Azevedo, Tobias Wittkop, Jan Baumbach

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time-consuming task. Having a reliable method for detecting clusters of homologous proteins between a huge set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles.
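
The role of the density parameter described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a simple single-linkage (connected components) clustering as a stand-in for "any clustering tool", and the protein names and similarity scores are invented for illustration rather than taken from BLAST output. The `threshold` argument plays the part of the density parameter.

```python
from itertools import combinations


def cluster(proteins, similarity, threshold):
    """Single-linkage clustering: link every pair whose score >= threshold.

    `threshold` stands in for the density parameter; this is an assumed,
    simplified scheme, not the approach used in the paper.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(proteins, 2):
        # Scores are symmetric; missing pairs count as no similarity.
        score = similarity.get((a, b), similarity.get((b, a), 0.0))
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical proteins and pairwise similarity scores (illustrative only).
PROTEINS = ["p1", "p2", "p3", "p4", "p5"]
SIMILARITY = {
    ("p1", "p2"): 90.0,
    ("p2", "p3"): 40.0,
    ("p4", "p5"): 70.0,
    ("p3", "p4"): 10.0,
}

if __name__ == "__main__":
    for threshold in (5.0, 30.0, 60.0, 95.0):
        clusters = cluster(PROTEINS, SIMILARITY, threshold)
        print(threshold, len(clusters), clusters)
```

On this toy input, raising the threshold from 5 to 95 splits one large cluster into progressively more and smaller ones. Without gold standard data there is no obvious way to say which of these partitions is the right one, which is the estimation problem the abstract points to.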
Original language: English
Journal: Bioinformatics
Volume: 29
Issue number: 2
Pages (from-to): 215-222
ISSN: 1367-4803
DOI: 10.1093/bioinformatics/bts653
Publication status: Published - 15 Jan 2013

Fingerprint

Virulence
Computational Biology
Cluster Analysis
Proteins

Keywords

  • Actinobacteria
  • Algorithms
  • Bacterial Proteins
  • Cluster Analysis
  • Genome, Bacterial
  • Models, Genetic
  • Phylogeny
  • Sequence Alignment
  • Sequence Homology, Amino Acid

Cite this

Röttger, Richard ; Kalaghatgi, Prabhav ; Sun, Peng ; Soares, Siomar de Castro ; Azevedo, Vasco ; Wittkop, Tobias ; Baumbach, Jan. / Density parameter estimation for finding clusters of homologous proteins-tracing actinobacterial pathogenicity lifestyles. In: Bioinformatics. 2013 ; Vol. 29, No. 2. pp. 215-22.
@article{9e97de3b8405444aa16d5cb86fcd0dec,
title = "Density parameter estimation for finding clusters of homologous proteins-tracing actinobacterial pathogenicity lifestyles",
abstract = "Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time consuming task. Having a reliable method for detecting clusters of homologous proteins between a huge set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles.",
keywords = "Actinobacteria, Algorithms, Bacterial Proteins, Cluster Analysis, Genome, Bacterial, Models, Genetic, Phylogeny, Sequence Alignment, Sequence Homology, Amino Acid",
author = "Richard R{\"o}ttger and Prabhav Kalaghatgi and Peng Sun and Soares, {Siomar de Castro} and Vasco Azevedo and Tobias Wittkop and Jan Baumbach",
year = "2013",
month = "1",
day = "15",
doi = "10.1093/bioinformatics/bts653",
language = "English",
volume = "29",
pages = "215--22",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "2",

}

Density parameter estimation for finding clusters of homologous proteins-tracing actinobacterial pathogenicity lifestyles. / Röttger, Richard; Kalaghatgi, Prabhav; Sun, Peng; Soares, Siomar de Castro; Azevedo, Vasco; Wittkop, Tobias; Baumbach, Jan.

In: Bioinformatics, Vol. 29, No. 2, 15.01.2013, p. 215-22.

TY  - JOUR
T1  - Density parameter estimation for finding clusters of homologous proteins-tracing actinobacterial pathogenicity lifestyles
AU  - Röttger, Richard
AU  - Kalaghatgi, Prabhav
AU  - Sun, Peng
AU  - Soares, Siomar de Castro
AU  - Azevedo, Vasco
AU  - Wittkop, Tobias
AU  - Baumbach, Jan
PY  - 2013/1/15
Y1  - 2013/1/15
N2  - Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time consuming task. Having a reliable method for detecting clusters of homologous proteins between a huge set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles.
AB  - Homology detection is a long-standing challenge in computational biology. To tackle this problem, typically all-versus-all BLAST results are coupled with data partitioning approaches resulting in clusters of putative homologous proteins. One of the main problems, however, has been widely neglected: all clustering tools need a density parameter that adjusts the number and size of the clusters. This parameter is crucial but hard to estimate without gold standard data at hand. Developing a gold standard, however, is a difficult and time consuming task. Having a reliable method for detecting clusters of homologous proteins between a huge set of species would open opportunities for better understanding the genetic repertoire of bacteria with different lifestyles.
KW  - Actinobacteria
KW  - Algorithms
KW  - Bacterial Proteins
KW  - Cluster Analysis
KW  - Genome, Bacterial
KW  - Models, Genetic
KW  - Phylogeny
KW  - Sequence Alignment
KW  - Sequence Homology, Amino Acid
U2  - 10.1093/bioinformatics/bts653
DO  - 10.1093/bioinformatics/bts653
M3  - Journal article
VL  - 29
SP  - 215
EP  - 222
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 2
ER  - 