On the limits of computational functional genomics for bacterial lifestyle prediction

Eudes Barbosa, Richard Röttger, Anne-Christin Hauschild, Vasco Azevedo, Jan Baumbach

Research output: Contribution to journal > Journal article > Research > peer-review

Abstract

We review the level of genomic specificity regarding actinobacterial pathogenicity. As actinobacteria occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs), broad-spectrum pathogens (BPs), opportunistic pathogens (OPs) and non-pathogens (NPs). We hypothesize: (H1) Pathogens (HPs and BPs) possess specific pathogenicity signature genes. (H2) The same holds for OPs. (H3) BPs and exclusive HPs cannot be distinguished from each other because of an observation bias, i.e. many HPs might be as yet unclassified BPs. (H4) OPs have no intrinsic genomic characteristic that sets them apart from pathogens, as small mutations are likely to play a more dominant role in surviving the immune system. To study these hypotheses, we implemented a bioinformatics pipeline that combines evolutionary sequence analysis with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). In essence, we present orthologous gene sets that computationally distinguish pathogens from NPs (H1). We further show a clear limit in differentiating OPs from both NPs (H2) and pathogens (H4). Likewise, HPs may not be distinguishable from bacteria annotated as BPs based only on a small set of orthologous genes (H3), as many HPs might also target a broad range of mammals but have not been annotated accordingly. In conclusion, we illustrate that even in the post-genome era and despite next-generation sequencing technology, our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited.
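To make the idea of "orthologous gene sets that distinguish pathogens from NPs" more concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each genome is described by a presence/absence profile over orthologous groups and ranks the groups by the Gini impurity decrease of a single presence/absence split, a criterion commonly used inside Random Forest trees. The genome labels, group names (og_A, og_B, og_C) and values are hypothetical toy data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(presence, labels):
    """Impurity decrease when genomes are split by presence (1) or
    absence (0) of one orthologous group."""
    with_gene = [y for x, y in zip(presence, labels) if x]
    without_gene = [y for x, y in zip(presence, labels) if not x]
    n = len(labels)
    weighted = (len(with_gene) / n) * gini(with_gene) \
        + (len(without_gene) / n) * gini(without_gene)
    return gini(labels) - weighted


# Hypothetical toy data: six genomes, three pathogens (P) and three
# non-pathogens (NP), with presence/absence of three orthologous groups.
labels = ["P", "P", "P", "NP", "NP", "NP"]
profiles = {
    "og_A": [1, 1, 1, 0, 0, 0],  # present only in pathogens
    "og_B": [1, 1, 0, 1, 0, 0],  # weakly associated with pathogens
    "og_C": [1, 1, 1, 1, 1, 1],  # present everywhere, uninformative
}

# Rank orthologous groups from most to least class-separating.
ranking = sorted(profiles, key=lambda g: gini_gain(profiles[g], labels),
                 reverse=True)
print(ranking)
```

A full Random Forest averages many such splits over bootstrapped genomes and random feature subsets; this sketch only illustrates the per-feature score on which such a ranking could be based.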

Original language: English
Journal: Briefings in Functional Genomics
Volume: 13
Issue number: 5
Pages (from-to): 398-408
ISSN: 2041-2649
DOI: 10.1093/bfgp/elu014
Publication status: Published - 22 May 2014

Fingerprint

Pathogens
Virulence
Genes
Computational Biology
Ecosystem
Sequence Analysis
Genomics
Mammals
Immune System
Observation
Mutation
Bioinformatics

Cite this

Barbosa, Eudes ; Röttger, Richard ; Hauschild, Anne-Christin ; Azevedo, Vasco ; Baumbach, Jan. / On the limits of computational functional genomics for bacterial lifestyle prediction. In: Briefings in Functional Genomics. 2014 ; Vol. 13, No. 5. pp. 398-408.
@article{017691b37dd74b4f9b3fe608053aa7a5,
title = "On the limits of computational functional genomics for bacterial lifestyle prediction",
abstract = "We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs), broad-spectrum pathogens (BPs), opportunistic pathogens (OPs) and non-pathogenic (NP). We hypothesize: (H1) Pathogens (HPs and BPs) possess specific pathogenicity signature genes. (H2) The same holds for OPs. (H3) Broad-spectrum and exclusively HPs cannot be distinguished from each other because of an observation bias, i.e. many HPs might yet be unclassified BPs. (H4) There is no intrinsic genomic characteristic of OPs compared with pathogens, as small mutations are likely to play a more dominant role to survive the immune system. To study these hypotheses, we implemented a bioinformatics pipeline that combines evolutionary sequence analysis with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). Essentially, we present orthologous gene sets that computationally distinguish pathogens from NPs (H1). We further show a clear limit in differentiating OPs from both NPs (H2) and pathogens (H4). HPs may also not be distinguished from bacteria annotated as BPs based only on a small set of orthologous genes (H3), as many HPs might as well target a broad range of mammals but have not been annotated accordingly. In conclusion, we illustrate that even in the post-genome era and despite next-generation sequencing technology, our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited.",
author = "Eudes Barbosa and Richard R{\"o}ttger and Anne-Christin Hauschild and Vasco Azevedo and Jan Baumbach",
note = "{\textcopyright} The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.",
year = "2014",
month = "5",
day = "22",
doi = "10.1093/bfgp/elu014",
language = "English",
volume = "13",
pages = "398--408",
journal = "Briefings in Functional Genomics",
issn = "2041-2649",
publisher = "Oxford University Press",
number = "5",

}

On the limits of computational functional genomics for bacterial lifestyle prediction. / Barbosa, Eudes; Röttger, Richard; Hauschild, Anne-Christin; Azevedo, Vasco; Baumbach, Jan.

In: Briefings in Functional Genomics, Vol. 13, No. 5, 22.05.2014, p. 398-408.

TY - JOUR
T1 - On the limits of computational functional genomics for bacterial lifestyle prediction
AU - Barbosa, Eudes
AU - Röttger, Richard
AU - Hauschild, Anne-Christin
AU - Azevedo, Vasco
AU - Baumbach, Jan
N1 - © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
PY - 2014/5/22
Y1 - 2014/5/22

AB - We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs), broad-spectrum pathogens (BPs), opportunistic pathogens (OPs) and non-pathogenic (NP). We hypothesize: (H1) Pathogens (HPs and BPs) possess specific pathogenicity signature genes. (H2) The same holds for OPs. (H3) Broad-spectrum and exclusively HPs cannot be distinguished from each other because of an observation bias, i.e. many HPs might yet be unclassified BPs. (H4) There is no intrinsic genomic characteristic of OPs compared with pathogens, as small mutations are likely to play a more dominant role to survive the immune system. To study these hypotheses, we implemented a bioinformatics pipeline that combines evolutionary sequence analysis with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). Essentially, we present orthologous gene sets that computationally distinguish pathogens from NPs (H1). We further show a clear limit in differentiating OPs from both NPs (H2) and pathogens (H4). HPs may also not be distinguished from bacteria annotated as BPs based only on a small set of orthologous genes (H3), as many HPs might as well target a broad range of mammals but have not been annotated accordingly. In conclusion, we illustrate that even in the post-genome era and despite next-generation sequencing technology, our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited.

U2 - 10.1093/bfgp/elu014
DO - 10.1093/bfgp/elu014
M3 - Journal article
VL - 13
SP - 398
EP - 408
JO - Briefings in Functional Genomics
JF - Briefings in Functional Genomics
SN - 2041-2649
IS - 5
ER -