Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome

Shuba Gopal, Mark Schroeder, Ursula Pieper, Alexander Sczyrba, Gulriz Aytekin-Kurban, Stefan Bekiranov, Jorge E. Fajardo, Narayanan Eswar, Roberto Sanchez, Andrej Sali, Terry Gaasterland

Research output: Contribution to journal › Article

52 Citations (Scopus)

Abstract

The approach used to annotate a genome critically affects the number and accuracy of the genes identified in its sequence. Genome annotation based on stringent gene identification tends to underestimate the complement of genes a genome encodes. In contrast, over-predicting putative genes and then running exhaustive computational sequence, motif and structural homology searches can find rarely expressed, possibly unique, new genes, at the risk of including non-functional ones. We developed a two-stage approach that combines the merits of stringent annotation with the benefits of over-prediction. In stage 1, we identify plausible genes regardless of whether they match EST, cDNA or protein sequences from the organism itself. In stage 2, the proteins predicted from these plausible genes are compared at the protein level with EST, cDNA and protein sequences, and with protein structures, from other organisms. Remote but biologically meaningful sequence or structure homologies provide supporting evidence that a plausible gene is genuine. Applied to the Drosophila melanogaster genome, the method identified 19,410 plausible genes, of which 12,124 matched the 13,601 originally annotated genes¹; homology filtering of these plausible genes validated 1,042 novel candidate genes. This annotation strategy is applicable to the genomes of all organisms, including that of humans.
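The two-stage logic described in the abstract can be sketched as a simple filter over gene predictions. This is a minimal illustration under stated assumptions, not the authors' pipeline. The record layout (`PlausibleGene`), the per-evidence homology scores, and the single `threshold` are hypothetical stand-ins for the real sequence, motif, and structure searches, which this sketch does not perform. The last lines check the bookkeeping behind the reported counts. Because a novel gene by definition does not match the existing annotation, the 1,042 novel candidates must lie among the 19,410 − 12,124 = 7,286 unmatched plausible genes.

```python
from dataclasses import dataclass, field


@dataclass
class PlausibleGene:
    """A stage-1 gene prediction (hypothetical record layout)."""
    gene_id: str
    # Hypothetical best homology scores of the predicted protein against
    # other organisms, keyed by evidence type ("sequence", "motif", "structure").
    homology_scores: dict[str, float] = field(default_factory=dict)
    # True if the prediction corresponds to a gene in the existing annotation.
    matches_annotation: bool = False


def has_homology_support(gene: PlausibleGene, threshold: float) -> bool:
    """Stage 2 (sketch): accept if any evidence type scores at or above threshold."""
    return any(score >= threshold for score in gene.homology_scores.values())


def classify(genes: list[PlausibleGene], threshold: float) -> dict[str, list[str]]:
    """Split stage-1 predictions into known, novel (supported) and unsupported genes."""
    result: dict[str, list[str]] = {"known": [], "novel": [], "unsupported": []}
    for gene in genes:
        if gene.matches_annotation:
            result["known"].append(gene.gene_id)
        elif has_homology_support(gene, threshold):
            result["novel"].append(gene.gene_id)
        else:
            result["unsupported"].append(gene.gene_id)
    return result


# Toy input with made-up identifiers and scores.
genes = [
    PlausibleGene("g1", {"sequence": 0.9}, matches_annotation=True),
    PlausibleGene("g2", {"structure": 0.7}),
    PlausibleGene("g3", {"sequence": 0.2}),
    PlausibleGene("g4"),
]
print(classify(genes, threshold=0.5))
# {'known': ['g1'], 'novel': ['g2'], 'unsupported': ['g3', 'g4']}

# Bookkeeping on the counts reported in the abstract.
plausible, matched, novel = 19_410, 12_124, 1_042
unmatched = plausible - matched  # predictions outside the original annotation
print(unmatched, f"{novel / unmatched:.1%}")
# 7286 14.3%
```

Known genes are labelled without a homology check here purely for simplicity. In the approach described above, stage 2 compares the proteins of all plausible genes against other organisms.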

Original language: English (US)
Pages (from-to): 337-340
Number of pages: 4
Journal: Nature Genetics
Volume: 27
Issue number: 3
DOI: 10.1038/85922
State: Published - 2001
Externally published: Yes

Fingerprint

Drosophila melanogaster
Genome
Genes
Proteins
Expressed Sequence Tags
Complementary DNA

ASJC Scopus subject areas

  • Genetics (clinical)
  • Genetics

Cite this

Gopal, S., Schroeder, M., Pieper, U., Sczyrba, A., Aytekin-Kurban, G., Bekiranov, S., ... Gaasterland, T. (2001). Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome. Nature Genetics, 27(3), 337-340. https://doi.org/10.1038/85922

@article{a22104a2c2de45e6b840698cf3e5e4bd,
title = "Homology-based annotation yields 1,042 new candidate genes in the {Drosophila melanogaster} genome",
author = "Shuba Gopal and Mark Schroeder and Ursula Pieper and Alexander Sczyrba and Gulriz Aytekin-Kurban and Stefan Bekiranov and Fajardo, {Jorge E.} and Narayanan Eswar and Roberto Sanchez and Andrej Sali and Terry Gaasterland",
year = "2001",
doi = "10.1038/85922",
language = "English (US)",
volume = "27",
pages = "337--340",
journal = "Nature Genetics",
issn = "1061-4036",
publisher = "Nature Publishing Group",
number = "3",

}

TY - JOUR

T1 - Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome

AU - Gopal, Shuba

AU - Schroeder, Mark

AU - Pieper, Ursula

AU - Sczyrba, Alexander

AU - Aytekin-Kurban, Gulriz

AU - Bekiranov, Stefan

AU - Fajardo, Jorge E.

AU - Eswar, Narayanan

AU - Sanchez, Roberto

AU - Sali, Andrej

AU - Gaasterland, Terry

PY - 2001

Y1 - 2001

UR - http://www.scopus.com/inward/record.url?scp=0035096629&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035096629&partnerID=8YFLogxK

U2 - 10.1038/85922

DO - 10.1038/85922

M3 - Article

C2 - 11242120

AN - SCOPUS:0035096629

VL - 27

SP - 337

EP - 340

JO - Nature Genetics

JF - Nature Genetics

SN - 1061-4036

IS - 3

ER -