Transcribed processed pseudogenes in the human genome: An intermediate form of expressed retrosequence lacking protein-coding ability

Paul M. Harrison, Deyou Zheng, Zhaolei Zhang, Nicholas Carriero, Mark Gerstein

Research output: Contribution to journal (Article)

138 Citations (Scopus)

Abstract

Pseudogenes, in the case of protein-coding genes, are gene copies that have lost the ability to code for a protein; they are typically identified through annotation of disabled, decayed or incomplete protein-coding sequences. Processed pseudogenes (Pψgs) are made through mRNA retrotransposition. There is overwhelming genomic evidence for thousands of human Pψgs and also dozens of human processed genes that comprise complete retrotransposed copies of other genes. Here, we survey for an intermediate entity, the transcribed processed pseudogene (TPψg), which is disabled but nonetheless transcribed. TPψgs may affect expression of paralogous genes, as observed in the case of the mouse makorin1-p1 TPψg. To elucidate their role, we identified human TPψgs by mapping expressed sequences onto Pψgs and, reciprocally, extracting TPψgs from known mRNAs. We consider only those Pψgs that are homologous to either non-mammalian eukaryotic proteins or protein domains of known structure, and require detection of identical coding-sequence disablements in both the expressed and genomic sequences. Oligonucleotide microarray data provide further expression verification. Overall, we find 166-233 TPψgs (∼4-6% of Pψgs). Proteins/transcripts with the highest numbers of homologous TPψgs generally have many homologous Pψgs and are abundantly expressed. TPψgs are significantly over-represented near both the 5′ and 3′ ends of genes; this suggests that TPψgs can be formed through gene-promoter co-option, or intrusion into untranslated regions. However, roughly half of the TPψgs are located away from genes in the intergenic DNA and thus may be co-opting cryptic promoters of undesignated origin. Furthermore, TPψgs are unlike other Pψgs and processed genes in the following ways: (i) they do not show a significant tendency to either deposit on or originate from the X chromosome; and (ii) only 5% of human TPψgs have potential orthologs in mouse. This latter finding indicates that the vast majority of TPψgs are lineage specific. This is likely linked to well-documented extensive lineage-specific SINE/LINE activity. The list of TPψgs is available at: http://www.biology.mcgill.ca/faculty/harrison/tppg/bppg.tov (or) http:pseudogene.org.
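The abstract's central filter, that the same coding-sequence disablements must appear in both the genomic and the expressed sequence, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Disablement` record, the `shares_disablements` function, and the coordinate convention are all hypothetical names chosen for illustration, and the sketch assumes disablements have already been called on each sequence.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Disablement:
    """A coding-sequence disablement (hypothetical record, for illustration only)."""
    kind: str      # e.g. "frameshift" or "stop"
    position: int  # coordinate along the pseudogene (assumed convention)


def shares_disablements(
    genomic: Iterable[Disablement],
    expressed: Iterable[Disablement],
    covered: Tuple[int, int],
) -> bool:
    """Return True if the expressed sequence carries exactly the genomic
    disablements that fall inside the region it covers, and there is at
    least one such disablement."""
    start, end = covered
    # Only genomic disablements inside the span covered by the expressed
    # sequence can be confirmed or refuted by that sequence.
    in_span = {d for d in genomic if start <= d.position <= end}
    return bool(in_span) and in_span == set(expressed)
```

Under these assumptions, an expressed sequence that covers a disabled region but shows no disablement is rejected, as is one whose disablements differ from the genomic copy; such a transcript more plausibly derives from the intact parent gene than from the pseudogene.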

Original language: English (US)
Pages (from-to): 2374-2383
Number of pages: 10
Journal: Nucleic Acids Research
Volume: 33
Issue number: 8
DOIs: 10.1093/nar/gki531
State: Published - 2005
Externally published: Yes

Fingerprint

Pseudogenes
Human Genome
Proteins
Genes
Short Interspersed Nucleotide Elements
Untranslated Regions
Messenger RNA
Intergenic DNA
X Chromosome
Oligonucleotide Array Sequence Analysis
Gene Expression

ASJC Scopus subject areas

  • Genetics

Cite this

Transcribed processed pseudogenes in the human genome: An intermediate form of expressed retrosequence lacking protein-coding ability. / Harrison, Paul M.; Zheng, Deyou; Zhang, Zhaolei; Carriero, Nicholas; Gerstein, Mark.

In: Nucleic Acids Research, Vol. 33, No. 8, 2005, p. 2374-2383.

@article{093015e19d2847dfb4e607d676527cbd,
title = "Transcribed processed pseudogenes in the human genome: An intermediate form of expressed retrosequence lacking protein-coding ability",
author = "Harrison, {Paul M.} and Deyou Zheng and Zhaolei Zhang and Nicholas Carriero and Mark Gerstein",
year = "2005",
doi = "10.1093/nar/gki531",
language = "English (US)",
volume = "33",
pages = "2374--2383",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "8",

}

TY - JOUR

T1 - Transcribed processed pseudogenes in the human genome

T2 - An intermediate form of expressed retrosequence lacking protein-coding ability

AU - Harrison, Paul M.

AU - Zheng, Deyou

AU - Zhang, Zhaolei

AU - Carriero, Nicholas

AU - Gerstein, Mark

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=17844396017&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=17844396017&partnerID=8YFLogxK

U2 - 10.1093/nar/gki531

DO - 10.1093/nar/gki531

M3 - Article

C2 - 15860774

AN - SCOPUS:17844396017

VL - 33

SP - 2374

EP - 2383

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 8

ER -