Accurate genome relative abundance estimation based on shotgun metagenomic reads

Li C. Xia; Jacob A. Cram; Ting Chen; Jed A. Fuhrman; Fengzhu Sun

doi:10.1371/journal.pone.0027992

Research output: Contribution to journal › Article › peer-review

82 Scopus citations

Abstract

Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or their variants, and often produce biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute the Genome Relative Abundance of microbial communities using Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance of Bacteroides species in human gut samples, and provide a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based), even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
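To make the idea in the abstract concrete, the following is a minimal sketch of maximum likelihood estimation for a finite mixture via expectation-maximization, followed by a genome size correction. It is not the GRAMMy implementation. The function name `estimate_relative_abundance`, the input matrix `read_probs` (an assumed probability of each read given each reference genome, as might be derived from any read assignment tool), and the toy numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_relative_abundance(read_probs, genome_lengths, n_iter=200, tol=1e-10):
    """Illustrative EM for mixture proportions, then a genome-size correction.

    read_probs[i, j] is an ASSUMED probability of read i given genome j.
    Returns (pi, abundance): the per-genome read fractions and the
    length-normalized relative abundances, each summing to 1.
    """
    read_probs = np.asarray(read_probs, dtype=float)
    genome_lengths = np.asarray(genome_lengths, dtype=float)
    if np.any(read_probs.sum(axis=1) == 0):
        raise ValueError("every read needs a nonzero probability for some genome")

    n_genomes = read_probs.shape[1]
    pi = np.full(n_genomes, 1.0 / n_genomes)  # start from uniform proportions

    for _ in range(n_iter):
        # E-step: share each read among genomes in proportion to pi * p(read | genome).
        weighted = read_probs * pi
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: the new proportion of a genome is its average share of the reads.
        new_pi = resp.mean(axis=0)
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break

    # Longer genomes yield more reads per cell, so divide by length and renormalize.
    abundance = pi / genome_lengths
    return pi, abundance / abundance.sum()


# Hypothetical toy input: 5 reads, 2 genomes; the last read is ambiguous.
toy_probs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.5, 0.5]]
toy_pi, toy_abundance = estimate_relative_abundance(toy_probs, [1000, 2000])
print(toy_pi, toy_abundance)
```

The ambiguous read is split between the two genomes according to the current proportion estimates rather than being counted for both or discarded, and the final division by genome length is what separates the fraction of reads from the fraction of genomes.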

Original language	English (US)
Article number	e27992
Journal	PloS one
Volume	6
Issue number	12
DOIs	https://doi.org/10.1371/journal.pone.0027992
State	Published - Dec 6 2011
Externally published	Yes

ASJC Scopus subject areas

General Biochemistry, Genetics and Molecular Biology
General Agricultural and Biological Sciences
General

Access to Document

10.1371/journal.pone.0027992

Cite this

@article{36c579ceaefb440e9fee7a96f11d4c6e,
  title = "Accurate genome relative abundance estimation based on shotgun metagenomic reads",
  abstract = "Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or its variants; often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. Maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5\% abundant in at least 50\% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.",
  author = "Xia, {Li C.} and Cram, {Jacob A.} and Ting Chen and Fuhrman, {Jed A.} and Fengzhu Sun",
  year = "2011",
  month = dec,
  day = "6",
  doi = "10.1371/journal.pone.0027992",
  language = "English (US)",
  volume = "6",
  journal = "PloS one",
  issn = "1932-6203",
  publisher = "Public Library of Science",
  number = "12",
}

TY  - JOUR
T1  - Accurate genome relative abundance estimation based on shotgun metagenomic reads
AU  - Xia, Li C.
AU  - Cram, Jacob A.
AU  - Chen, Ting
AU  - Fuhrman, Jed A.
AU  - Sun, Fengzhu
PY  - 2011/12/6
Y1  - 2011/12/6
N2  - Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or its variants; often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. Maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
AB  - Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or its variants; often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. Maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
UR  - http://www.scopus.com/inward/record.url?scp=82755170566&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=82755170566&partnerID=8YFLogxK
U2  - 10.1371/journal.pone.0027992
DO  - 10.1371/journal.pone.0027992
M3  - Article
C2  - 22162995
AN  - SCOPUS:82755170566
SN  - 1932-6203
VL  - 6
JO  - PloS one
JF  - PloS one
IS  - 12
M1  - e27992
ER  -
