Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes

H Bjørn Nielsen, M. Almeida, A. S. Juncker, S. Rasmussen, J. H. Li, S. Sunagawa, D. R. Plichta, L. Gautier, A. G. Pedersen, E. Le Chatelier, E. Pelletier, I. Bonde, T. Nielsen, C. Manichanh, M. Arumugam, J. M. Batto, M. B. Q. dos Santos, N. Blom, N. Borruel, K. S. Burgdorf, F. Boumezbeur, F. Casellas, J. Dore, P. Dworzynski, F. Guarner, Torben Hansen, F. Hildebrand, R. S. Kaas, S. Kennedy, K. Kristiansen, J. R. Kultima, P. Leonard, F. Levenez, O. Lund, B. Moumen, D. Le Paslier, N. Pons, O. Pedersen, E. Prifti, J. J. Qin, J. Raes, S. Sorensen, J. Tap, S. Tims, D. W. Ussery, T. Yamada, P. Renault, T. Sicheritz-Ponten, P. Bork, J. Wang, S. Brunak, S. D. Ehrlich, MetaHIT Consortium

Publication: Contribution to journal, Journal article, Research, peer-reviewed

Abstract

Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
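
As an illustration only, the idea of binning co-abundant genes across samples can be sketched as follows. This is not the authors' implementation: the function name `bin_by_coabundance`, the use of Pearson correlation against a seed gene, the 0.9 threshold and the toy abundance matrix are all assumptions made for this example.

```python
import numpy as np


def bin_by_coabundance(abundance, threshold=0.9):
    """Greedily group genes whose abundance profiles correlate with a seed gene.

    abundance: array of shape (n_genes, n_samples).
    threshold: hypothetical Pearson correlation cutoff (an assumption).
    Returns a list of sorted lists of gene indices.
    """
    corr = np.corrcoef(abundance)  # gene-by-gene Pearson correlation matrix
    unassigned = set(range(abundance.shape[0]))
    groups = []
    while unassigned:
        seed = min(unassigned)  # deterministic seed choice
        # Always keep the seed itself so every pass removes at least one gene.
        members = sorted(
            g for g in unassigned if g == seed or corr[seed, g] >= threshold
        )
        groups.append(members)
        unassigned -= set(members)
    return groups


# Toy data: two hypothetical organisms measured across five samples.
org_a = np.array([1.0, 5.0, 2.0, 8.0, 3.0])
org_b = np.array([7.0, 1.0, 6.0, 2.0, 9.0])

# Genes from the same organism share its profile up to a scale factor.
abundance = np.vstack([org_a, 2 * org_a, org_b, 0.5 * org_b, 3 * org_a])

groups = bin_by_coabundance(abundance)
print(groups)  # [[0, 1, 4], [2, 3]]
```

Genes 0, 1 and 4 follow the first profile and genes 2 and 3 the second, so they fall into two groups without any reference sequence being consulted. A real analysis would need to handle noise, sparse counts and millions of genes, which this sketch does not attempt.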
Original language: English
Journal: Nature Biotechnology
Volume: 32
Issue number: 8
Pages (from-to): 822-828
ISSN: 1087-0156
DOI: https://doi.org/10.1038/nbt.2939
Status: Published - 2014

Bibliographical note

ISI Document Delivery No.: AW7PF. Times Cited: 7. Cited Reference Count: 50.

Authors: Nielsen, H. Bjorn; Almeida, Mathieu; Juncker, Agnieszka Sierakowska; Rasmussen, Simon; Li, Junhua; Sunagawa, Shinichi; Plichta, Damian R.; Gautier, Laurent; Pedersen, Anders G.; Le Chatelier, Emmanuelle; Pelletier, Eric; Bonde, Ida; Nielsen, Trine; Manichanh, Chaysavanh; Arumugam, Manimozhiyan; Batto, Jean-Michel; dos Santos, Marcelo B. Quintanilha; Blom, Nikolaj; Borruel, Natalia; Burgdorf, Kristoffer S.; Boumezbeur, Fouad; Casellas, Francesc; Dore, Joel; Dworzynski, Piotr; Guarner, Francisco; Hansen, Torben; Hildebrand, Falk; Kaas, Rolf S.; Kennedy, Sean; Kristiansen, Karsten; Kultima, Jens Roat; Leonard, Pierre; Levenez, Florence; Lund, Ole; Moumen, Bouziane; Le Paslier, Denis; Pons, Nicolas; Pedersen, Oluf; Prifti, Edi; Qin, Junjie; Raes, Jeroen; Sorensen, Soren; Tap, Julien; Tims, Sebastian; Ussery, David W.; Yamada, Takuji; Renault, Pierre; Sicheritz-Ponten, Thomas; Bork, Peer; Wang, Jun; Brunak, Soren; Ehrlich, S. Dusko.

ResearcherID: Veiga, Patrick/A-9862-2011; Blottiere, Herve/C-6120-2011; Lund, Ole/F-4437-2014; van Hylckama Vlieg, Johan/F-7887-2014.

ORCID: Lund, Ole/0000-0003-1108-0491; van Hylckama Vlieg, Johan/0000-0001-6656-8668.

Funding: European Community's Seventh Framework Programme [FP7-HEALTH-F4-2007-201052, FP7-HEALTH-2010-261376]; Novo Nordisk Foundation Center for Biosustainability; OpenGPU FUI collaborative research projects; DGCIS; Instituto de Salud Carlos III (Spain); Ministere de la Recherche et de l'Education Nationale (France); [ANR-11-DPBS-0001].

The research leading to these results has received funding from the European Community's Seventh Framework Programme FP7-HEALTH-F4-2007-201052: Metagenomics of the Human Intestinal Tract (MetaHIT) and FP7-HEALTH-2010-261376: International Human Microbiome Standards, as well as the Novo Nordisk Foundation Center for Biosustainability. Work on the clustering concept has been supported by the OpenGPU FUI collaborative research projects, with funding from DGCIS. Researchers on the project were granted access to the HPC resources of CCRT under the allocation 2011-036707 made by GENCI (Grand Equipement National de Calcul Intensif). The company Alliance Services Plus (AS+) has provided help to scale up the process, especially, V. Arslan, D. Tello, V. Ducrot, T. Saidani and S. Monot. The authors affiliated with MGP are funded, in part, by the Metagenopolis ANR-11-DPBS-0001 grant. Ciberehd is funded by the Instituto de Salud Carlos III (Spain). M.A. was supported by a grant from the Ministere de la Recherche et de l'Education Nationale (France).

Publisher: Nature Publishing Group, New York. Journal abbreviation: Nat Biotechnol.

Cite this

Nielsen, H. B., Almeida, M., Juncker, A. S., Rasmussen, S., Li, J. H., Sunagawa, S., ... Meta, H. I. T. C. (2014). Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nature Biotechnology, 32(8), 822-828. https://doi.org/10.1038/nbt.2939
Nielsen, H Bjørn ; Almeida, M. ; Juncker, A. S. ; Rasmussen, S. ; Li, J. H. ; Sunagawa, S. ; Plichta, D. R. ; Gautier, L. ; Pedersen, A. G. ; Le Chatelier, E. ; Pelletier, E. ; Bonde, I. ; Nielsen, T. ; Manichanh, C. ; Arumugam, M. ; Batto, J. M. ; dos Santos, M. B. Q. ; Blom, N. ; Borruel, N. ; Burgdorf, K. S. ; Boumezbeur, F. ; Casellas, F. ; Dore, J. ; Dworzynski, P. ; Guarner, F. ; Hansen, Torben ; Hildebrand, F. ; Kaas, R. S. ; Kennedy, S. ; Kristiansen, K. ; Kultima, J. R. ; Leonard, P. ; Levenez, F. ; Lund, O. ; Moumen, B. ; Le Paslier, D. ; Pons, N. ; Pedersen, O. ; Prifti, E. ; Qin, J. J. ; Raes, J. ; Sorensen, S. ; Tap, J. ; Tims, S. ; Ussery, D. W. ; Yamada, T. ; Renault, P. ; Sicheritz-Ponten, T. ; Bork, P. ; Wang, J. ; Brunak, S. ; Ehrlich, S. D. ; Meta, H. I. T. Consortium. / Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. In: Nature Biotechnology. 2014 ; Vol. 32, No. 8. pp. 822-828.
@article{6dd43d4277ad491592d10bfafed88134,
title = "Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes",
abstract = "Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.",
keywords = "SHORT READ ALIGNMENT SEQUENCES SYSTEMS ALGORITHMS MICROBIOTA PROTEIN LIFE SETS TREE TOOL",
author = "Nielsen, {H Bj{\o}rn} and M. Almeida and Juncker, {A. S.} and S. Rasmussen and Li, {J. H.} and S. Sunagawa and Plichta, {D. R.} and L. Gautier and Pedersen, {A. G.} and {Le Chatelier}, E. and E. Pelletier and I. Bonde and T. Nielsen and C. Manichanh and M. Arumugam and Batto, {J. M.} and {dos Santos}, {M. B. Q.} and N. Blom and N. Borruel and Burgdorf, {K. S.} and F. Boumezbeur and F. Casellas and J. Dore and P. Dworzynski and F. Guarner and Torben Hansen and F. Hildebrand and Kaas, {R. S.} and S. Kennedy and K. Kristiansen and Kultima, {J. R.} and P. Leonard and F. Levenez and O. Lund and B. Moumen and {Le Paslier}, D. and N. Pons and O. Pedersen and E. Prifti and Qin, {J. J.} and J. Raes and S. Sorensen and J. Tap and S. Tims and Ussery, {D. W.} and T. Yamada and P. Renault and T. Sicheritz-Ponten and P. Bork and J. Wang and S. Brunak and Ehrlich, {S. D.} and Meta, {H. I. T. Consortium}",
year = "2014",
doi = "10.1038/nbt.2939",
language = "English",
volume = "32",
pages = "822--828",
journal = "Nature Biotechnology",
issn = "1087-0156",
publisher = "Nature Publishing Group",
number = "8",

}

Nielsen, HB, Almeida, M, Juncker, AS, Rasmussen, S, Li, JH, Sunagawa, S, Plichta, DR, Gautier, L, Pedersen, AG, Le Chatelier, E, Pelletier, E, Bonde, I, Nielsen, T, Manichanh, C, Arumugam, M, Batto, JM, dos Santos, MBQ, Blom, N, Borruel, N, Burgdorf, KS, Boumezbeur, F, Casellas, F, Dore, J, Dworzynski, P, Guarner, F, Hansen, T, Hildebrand, F, Kaas, RS, Kennedy, S, Kristiansen, K, Kultima, JR, Leonard, P, Levenez, F, Lund, O, Moumen, B, Le Paslier, D, Pons, N, Pedersen, O, Prifti, E, Qin, JJ, Raes, J, Sorensen, S, Tap, J, Tims, S, Ussery, DW, Yamada, T, Renault, P, Sicheritz-Ponten, T, Bork, P, Wang, J, Brunak, S, Ehrlich, SD & Meta, HITC 2014, 'Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes', Nature Biotechnology, vol. 32, no. 8, pp. 822-828. https://doi.org/10.1038/nbt.2939

Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. / Nielsen, H Bjørn; Almeida, M.; Juncker, A. S.; Rasmussen, S.; Li, J. H.; Sunagawa, S.; Plichta, D. R.; Gautier, L.; Pedersen, A. G.; Le Chatelier, E.; Pelletier, E.; Bonde, I.; Nielsen, T.; Manichanh, C.; Arumugam, M.; Batto, J. M.; dos Santos, M. B. Q.; Blom, N.; Borruel, N.; Burgdorf, K. S.; Boumezbeur, F.; Casellas, F.; Dore, J.; Dworzynski, P.; Guarner, F.; Hansen, Torben; Hildebrand, F.; Kaas, R. S.; Kennedy, S.; Kristiansen, K.; Kultima, J. R.; Leonard, P.; Levenez, F.; Lund, O.; Moumen, B.; Le Paslier, D.; Pons, N.; Pedersen, O.; Prifti, E.; Qin, J. J.; Raes, J.; Sorensen, S.; Tap, J.; Tims, S.; Ussery, D. W.; Yamada, T.; Renault, P.; Sicheritz-Ponten, T.; Bork, P.; Wang, J.; Brunak, S.; Ehrlich, S. D.; Meta, H. I. T. Consortium.

In: Nature Biotechnology, Vol. 32, No. 8, 2014, pp. 822-828.

TY - JOUR

T1 - Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes

AU - Nielsen, H Bjørn

AU - Almeida, M.

AU - Juncker, A. S.

AU - Rasmussen, S.

AU - Li, J. H.

AU - Sunagawa, S.

AU - Plichta, D. R.

AU - Gautier, L.

AU - Pedersen, A. G.

AU - Le Chatelier, E.

AU - Pelletier, E.

AU - Bonde, I.

AU - Nielsen, T.

AU - Manichanh, C.

AU - Arumugam, M.

AU - Batto, J. M.

AU - dos Santos, M. B. Q.

AU - Blom, N.

AU - Borruel, N.

AU - Burgdorf, K. S.

AU - Boumezbeur, F.

AU - Casellas, F.

AU - Dore, J.

AU - Dworzynski, P.

AU - Guarner, F.

AU - Hansen, Torben

AU - Hildebrand, F.

AU - Kaas, R. S.

AU - Kennedy, S.

AU - Kristiansen, K.

AU - Kultima, J. R.

AU - Leonard, P.

AU - Levenez, F.

AU - Lund, O.

AU - Moumen, B.

AU - Le Paslier, D.

AU - Pons, N.

AU - Pedersen, O.

AU - Prifti, E.

AU - Qin, J. J.

AU - Raes, J.

AU - Sorensen, S.

AU - Tap, J.

AU - Tims, S.

AU - Ussery, D. W.

AU - Yamada, T.

AU - Renault, P.

AU - Sicheritz-Ponten, T.

AU - Bork, P.

AU - Wang, J.

AU - Brunak, S.

AU - Ehrlich, S. D.

AU - Meta, H. I. T. Consortium

PY - 2014

Y1 - 2014

N2 - Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.

KW - SHORT READ ALIGNMENT SEQUENCES SYSTEMS ALGORITHMS MICROBIOTA PROTEIN LIFE SETS TREE TOOL

U2 - 10.1038/nbt.2939

DO - 10.1038/nbt.2939

M3 - Journal article

VL - 32

SP - 822

EP - 828

JO - Nature Biotechnology

JF - Nature Biotechnology

SN - 1087-0156

IS - 8

ER -