Computational analysis and experimental validation of gene predictions in Toxoplasma gondii

Joseph M. Dybas, Carlos J. Madrid-Aliste, Fa Yun Che, Edward Nieves, Dmitry Rykunov, Ruth Hogue Angeletti, Louis M. Weiss, Kami Kim, Andras Fiser

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

Background: Toxoplasma gondii is an obligate intracellular protozoan that infects 20 to 90% of the population. It can cause both acute and chronic infections, many of which are asymptomatic, and, in immunocompromised hosts, can cause fatal infection due to reactivation from an asymptomatic chronic infection. An essential step towards understanding molecular mechanisms controlling transitions between the various life stages and identifying candidate drug targets is to accurately characterize the T. gondii proteome.

Methodology/Principal Findings: We have explored the proteome of T. gondii tachyzoites with high-throughput proteomics experiments and by comparison to publicly available cDNA sequence data. Mass spectrometry analysis validated 2,477 gene coding regions with 6,438 possible alternative gene predictions, approximately one third of the T. gondii proteome. The proteomics survey identified 609 proteins that are unique to Toxoplasma as compared to any known species, including other Apicomplexa. Computational analysis identified 787 cases of possible gene duplication events and located at least 6,089 gene coding regions. Commonly used gene prediction algorithms produce very disparate sets of protein sequences, with pairwise overlaps ranging from 1.4% to 12%. Through this experimental and computational exercise we benchmarked gene prediction methods and observed false negative rates of 31 to 43%.

Conclusions/Significance: This study not only provides the largest proteomics exploration of the T. gondii proteome, but also illustrates how high-throughput proteomics experiments can elucidate correct gene structures in genomes.
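The abstract reports two benchmark quantities, pairwise overlap between gene prediction sets and false negative rate, without defining them. The sketch below shows one plausible way such numbers could be computed. It is an illustration under stated assumptions, not the authors' method: overlap is taken here as intersection over union of identical protein sequences, the false negative rate as the fraction of experimentally validated sequences a predictor failed to produce, and all predictor names and sequences are made-up toy data.

```python
from itertools import combinations


def pairwise_overlap(a: set[str], b: set[str]) -> float:
    """Share of sequences common to two prediction sets (intersection over union).

    Assumption: two predictions "overlap" only if the sequences are identical.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def false_negative_rate(predicted: set[str], validated: set[str]) -> float:
    """Fraction of validated sequences that the predictor did not produce."""
    if not validated:
        return 0.0
    return len(validated - predicted) / len(validated)


# Hypothetical toy data: three predictors and a validated reference set.
predictions = {
    "predictor_a": {"MKT", "MSA", "MLV", "MPQ"},
    "predictor_b": {"MKT", "MGG", "MLV"},
    "predictor_c": {"MKT", "MRR"},
}
validated = {"MKT", "MLV", "MRR", "MGG"}

# Overlap for every unordered pair of predictors.
overlaps = {
    (x, y): pairwise_overlap(predictions[x], predictions[y])
    for x, y in combinations(sorted(predictions), 2)
}

# False negative rate of each predictor against the validated set.
fn_rates = {
    name: false_negative_rate(seqs, validated)
    for name, seqs in predictions.items()
}

for pair, value in overlaps.items():
    print(pair, f"{value:.0%}")
for name, value in fn_rates.items():
    print(name, f"{value:.0%}")
```

Exact sequence identity is a strict criterion: predictions that differ by a single exon boundary count as disjoint, which is one reason overlaps between predictors can be as low as the figures quoted above.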

Original language: English (US)
Article number: e3899
Journal: PLoS One
Volume: 3
Issue number: 12
DOI: 10.1371/journal.pone.0003899
State: Published - Dec 9 2008

Fingerprint

Toxoplasma
Toxoplasma gondii
Genes
Proteome
proteome
Proteomics
proteomics
prediction
genes
infection
tachyzoites
Asymptomatic Infections
Gene Duplication
Throughput
gene duplication
Infection
Protozoa
Mass Spectrometry
Proteins
exercise

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Computational analysis and experimental validation of gene predictions in Toxoplasma gondii. / Dybas, Joseph M.; Madrid-Aliste, Carlos J.; Che, Fa Yun; Nieves, Edward; Rykunov, Dmitry; Angeletti, Ruth Hogue; Weiss, Louis M.; Kim, Kami; Fiser, Andras.

In: PLoS One, Vol. 3, No. 12, e3899, 09.12.2008.

Dybas, Joseph M. ; Madrid-Aliste, Carlos J. ; Che, Fa Yun ; Nieves, Edward ; Rykunov, Dmitry ; Angeletti, Ruth Hogue ; Weiss, Louis M. ; Kim, Kami ; Fiser, Andras. / Computational analysis and experimental validation of gene predictions in Toxoplasma gondii. In: PLoS One. 2008 ; Vol. 3, No. 12.
@article{bb6e72589f7a4df9a333ccb7cefe7cd1,
title = "Computational analysis and experimental validation of gene predictions in Toxoplasma gondii",
abstract = "Background: Toxoplasma gondii is an obligate intracellular protozoan that infects 20 to 90{\%} of the population. It can cause both acute and chronic infections, many of which are asymptomatic, and, in immunocompromised hosts, can cause fatal infection due to reactivation from an asymptomatic chronic infection. An essential step towards understanding molecular mechanisms controlling transitions between the various life stages and identifying candidate drug targets is to accurately characterize the T. gondii proteome. Methodology/Principal Findings: We have explored the proteome of T. gondii tachyzoites with high-throughput proteomics experiments and by comparison to publicly available cDNA sequence data. Mass spectrometry analysis validated 2,477 gene coding regions with 6,438 possible alternative gene predictions, approximately one third of the T. gondii proteome. The proteomics survey identified 609 proteins that are unique to Toxoplasma as compared to any known species, including other Apicomplexa. Computational analysis identified 787 cases of possible gene duplication events and located at least 6,089 gene coding regions. Commonly used gene prediction algorithms produce very disparate sets of protein sequences, with pairwise overlaps ranging from 1.4{\%} to 12{\%}. Through this experimental and computational exercise we benchmarked gene prediction methods and observed false negative rates of 31 to 43{\%}. Conclusions/Significance: This study not only provides the largest proteomics exploration of the T. gondii proteome, but also illustrates how high-throughput proteomics experiments can elucidate correct gene structures in genomes.",
author = "Dybas, {Joseph M.} and Madrid-Aliste, {Carlos J.} and Che, {Fa Yun} and Edward Nieves and Dmitry Rykunov and Angeletti, {Ruth Hogue} and Weiss, {Louis M.} and Kami Kim and Andras Fiser",
year = "2008",
month = "12",
day = "9",
doi = "10.1371/journal.pone.0003899",
language = "English (US)",
volume = "3",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "12",

}

TY - JOUR

T1 - Computational analysis and experimental validation of gene predictions in Toxoplasma gondii

AU - Dybas, Joseph M.

AU - Madrid-Aliste, Carlos J.

AU - Che, Fa Yun

AU - Nieves, Edward

AU - Rykunov, Dmitry

AU - Angeletti, Ruth Hogue

AU - Weiss, Louis M.

AU - Kim, Kami

AU - Fiser, Andras

PY - 2008/12/9

Y1 - 2008/12/9

AB - Background: Toxoplasma gondii is an obligate intracellular protozoan that infects 20 to 90% of the population. It can cause both acute and chronic infections, many of which are asymptomatic, and, in immunocompromised hosts, can cause fatal infection due to reactivation from an asymptomatic chronic infection. An essential step towards understanding molecular mechanisms controlling transitions between the various life stages and identifying candidate drug targets is to accurately characterize the T. gondii proteome. Methodology/Principal Findings: We have explored the proteome of T. gondii tachyzoites with high-throughput proteomics experiments and by comparison to publicly available cDNA sequence data. Mass spectrometry analysis validated 2,477 gene coding regions with 6,438 possible alternative gene predictions, approximately one third of the T. gondii proteome. The proteomics survey identified 609 proteins that are unique to Toxoplasma as compared to any known species, including other Apicomplexa. Computational analysis identified 787 cases of possible gene duplication events and located at least 6,089 gene coding regions. Commonly used gene prediction algorithms produce very disparate sets of protein sequences, with pairwise overlaps ranging from 1.4% to 12%. Through this experimental and computational exercise we benchmarked gene prediction methods and observed false negative rates of 31 to 43%. Conclusions/Significance: This study not only provides the largest proteomics exploration of the T. gondii proteome, but also illustrates how high-throughput proteomics experiments can elucidate correct gene structures in genomes.

UR - http://www.scopus.com/inward/record.url?scp=57549115570&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=57549115570&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0003899

DO - 10.1371/journal.pone.0003899

M3 - Article

VL - 3

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 12

M1 - e3899

ER -