Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution

Deyou Zheng, Adam Frankish, Robert Baertsch, Philipp Kapranov, Alexandre Reymond, Woh Choo Siew, Yontao Lu, France Denoeud, Stylianos E. Antonarakis, Michael Snyder, Yijun Ruan, Chia Lin Wei, Thomas R. Gingeras, Roderic Guigó, Jennifer Harrow, Mark B. Gerstein

Research output: Contribution to journal › Article

148 Citations (Scopus)

Abstract

Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are "genomic fossils" valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome's structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein-coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (∼80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implications of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed a systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high-throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.
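
The abstract gives its headline results as fractions of the 201 consensus pseudogenes. As a rough illustration only, the sketch below converts those fractions into approximate whole-number counts. The helper `approx_count` and the variable names are hypothetical, the fractions are the rounded figures quoted in the abstract, and the resulting counts are estimates rather than numbers reported by the authors.

```python
from fractions import Fraction

# Total consensus pseudogenes in the ENCODE regions, as stated in the abstract.
TOTAL = 201


def approx_count(total, fraction):
    """Round total * fraction to the nearest whole pseudogene (hypothetical helper)."""
    return round(total * fraction)


# "two-thirds of which originated from retrotransposition" (processed pseudogenes)
processed = approx_count(TOTAL, Fraction(2, 3))

# "~80% of the processed pseudogenes are primate-specific" (approximate figure)
primate_specific = approx_count(processed, Fraction(4, 5))

# "at least a fifth of the 201 pseudogenes are transcribed" (a lower bound)
transcribed_min = approx_count(TOTAL, Fraction(1, 5))

print(f"processed (approx.):         {processed}")
print(f"primate-specific (approx.):  {primate_specific}")
print(f"transcribed (at least ~):    {transcribed_min}")
```

Under these assumptions the estimates come to roughly 134 processed pseudogenes, about 107 of them primate-specific, and at least about 40 transcribed pseudogenes.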

Original language: English (US)
Pages (from-to): 839-851
Number of pages: 13
Journal: Genome Research
Volume: 17
Issue number: 6
DOI: https://doi.org/10.1101/gr.5586307
State: Published - Jun 2007
Externally published: Yes

Fingerprint

Encyclopedias
Pseudogenes
DNA
Genome
Primates
Gene Duplication

ASJC Scopus subject areas

  • Genetics

Cite this

Zheng, D., Frankish, A., Baertsch, R., Kapranov, P., Reymond, A., Siew, W. C., ... Gerstein, M. B. (2007). Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution. Genome Research, 17(6), 839-851. https://doi.org/10.1101/gr.5586307

@article{14e99e38b0304d7fa1dfe92c48b28e44,
title = "Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution",
author = "Deyou Zheng and Adam Frankish and Robert Baertsch and Philipp Kapranov and Alexandre Reymond and Siew, {Woh Choo} and Yontao Lu and France Denoeud and Antonarakis, {Stylianos E.} and Michael Snyder and Yijun Ruan and Wei, {Chia Lin} and Gingeras, {Thomas R.} and Roderic Guig{\'o} and Jennifer Harrow and Gerstein, {Mark B.}",
year = "2007",
month = "6",
doi = "10.1101/gr.5586307",
language = "English (US)",
volume = "17",
pages = "839--851",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "6",

}

TY - JOUR
T1 - Pseudogenes in the ENCODE regions
T2 - Consensus annotation, analysis of transcription, and evolution
AU - Zheng, Deyou
AU - Frankish, Adam
AU - Baertsch, Robert
AU - Kapranov, Philipp
AU - Reymond, Alexandre
AU - Siew, Woh Choo
AU - Lu, Yontao
AU - Denoeud, France
AU - Antonarakis, Stylianos E.
AU - Snyder, Michael
AU - Ruan, Yijun
AU - Wei, Chia Lin
AU - Gingeras, Thomas R.
AU - Guigó, Roderic
AU - Harrow, Jennifer
AU - Gerstein, Mark B.
PY - 2007/6
Y1 - 2007/6
UR - http://www.scopus.com/inward/record.url?scp=34250377325&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250377325&partnerID=8YFLogxK
U2 - 10.1101/gr.5586307
DO - 10.1101/gr.5586307
M3 - Article
C2 - 17568002
AN - SCOPUS:34250377325
VL - 17
SP - 839
EP - 851
JO - Genome Research
JF - Genome Research
SN - 1088-9051
IS - 6
ER -