A computational approach for identifying pseudogenes in the ENCODE regions.

Deyou Zheng, Mark B. Gerstein

Research output: Contribution to journal (article)

27 Citations (Scopus)

Abstract

BACKGROUND: Pseudogenes are inheritable genetic elements showing sequence similarity to functional genes but with deleterious mutations. We describe a computational pipeline for identifying them, which in contrast to previous work explicitly uses intron-exon structure in parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, and this can be used to distinguish between true duplicated and processed pseudogenes (with insertions).

RESULTS: Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications.

CONCLUSION: Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computation pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.
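The classification idea in the BACKGROUND paragraph can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Block` type, the `classify` function, and the `min_intron` and `tolerance` thresholds are all hypothetical names and values chosen for illustration. The sketch assumes a pseudogene locus on the forward strand that has already been aligned to its parent coding sequence as a list of ungapped blocks. A gap in the locus that coincides with a parent exon junction is treated as intron-like (suggesting duplication), while a junction crossed by one continuous block suggests a processed pseudogene, even if an insertion exists elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One ungapped alignment block (hypothetical representation)."""
    cds_start: int  # 0-based inclusive, parent coding-sequence coordinates
    cds_end: int    # exclusive
    gen_start: int  # 0-based inclusive, genomic coordinates of the locus
    gen_end: int    # exclusive


def exon_junctions(exon_lengths: List[int]) -> List[int]:
    """CDS positions where one parent exon ends and the next begins."""
    junctions, pos = [], 0
    for length in exon_lengths[:-1]:
        pos += length
        junctions.append(pos)
    return junctions


def classify(blocks: List[Block], exon_lengths: List[int],
             min_intron: int = 60, tolerance: int = 10) -> str:
    """Return 'duplicated', 'processed', or 'ambiguous'.

    min_intron and tolerance are illustrative thresholds, not values
    taken from the paper.
    """
    blocks = sorted(blocks, key=lambda b: b.cds_start)
    intron_like = 0  # junctions matched by a large genomic gap
    contiguous = 0   # junctions crossed by a single block
    for j in exon_junctions(exon_lengths):
        # A block that extends well past the junction on both sides means
        # the locus is continuous there, as expected after retrotransposition.
        if any(b.cds_start + tolerance <= j <= b.cds_end - tolerance
               for b in blocks):
            contiguous += 1
            continue
        # Otherwise look for two consecutive blocks that meet at the
        # junction and are separated by an intron-sized genomic gap.
        for left, right in zip(blocks, blocks[1:]):
            meets = (abs(left.cds_end - j) <= tolerance
                     and abs(right.cds_start - j) <= tolerance)
            if meets and right.gen_start - left.gen_end >= min_intron:
                intron_like += 1
                break
    if intron_like > 0:
        return "duplicated"
    if contiguous > 0:
        return "processed"
    return "ambiguous"


if __name__ == "__main__":
    parent_exons = [100, 120, 90]  # made-up exon lengths for the example
    locus = [Block(0, 100, 1000, 1100),
             Block(100, 220, 1600, 1720),
             Block(220, 310, 2000, 2090)]
    print(classify(locus, parent_exons))
```

Requiring the gap to sit at a parent exon junction is what keeps a processed pseudogene with an unrelated insertion from being mistaken for a duplicated one; a locus whose alignment covers no junction at all is left as ambiguous rather than forced into either class.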

Original language: English (US)
Journal: Genome Biology
Volume: 7 Suppl 1
State: Published - 2006
Externally published: Yes

Fingerprint

Pseudogenes
Genes
Introns
Exons
Mutation

ASJC Scopus subject areas

  • Genetics

Cite this

A computational approach for identifying pseudogenes in the ENCODE regions. / Zheng, Deyou; Gerstein, Mark B.

In: Genome Biology, Vol. 7 Suppl 1, 2006.

@article{a38a7638e5de4b6595ae275089d1e01b,
title = "A computational approach for identifying pseudogenes in the ENCODE regions.",
abstract = "BACKGROUND: Pseudogenes are inheritable genetic elements showing sequence similarity to functional genes but with deleterious mutations. We describe a computational pipeline for identifying them, which in contrast to previous work explicitly uses intron-exon structure in parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, and this can be used to distinguish between true duplicated and processed pseudogenes (with insertions). RESULTS: Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10{\%} of which have clear 'intron-exon' structure and are thus likely generated from recent duplications. CONCLUSION: Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computation pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.",
author = "Zheng, Deyou and Gerstein, {Mark B.}",
year = "2006",
language = "English (US)",
volume = "7 Suppl 1",
journal = "Genome Biology",
issn = "1474-7596",
publisher = "BioMed Central",
}

TY - JOUR

T1 - A computational approach for identifying pseudogenes in the ENCODE regions.

AU - Zheng, Deyou

AU - Gerstein, Mark B.

PY - 2006

Y1 - 2006

AB - BACKGROUND: Pseudogenes are inheritable genetic elements showing sequence similarity to functional genes but with deleterious mutations. We describe a computational pipeline for identifying them, which in contrast to previous work explicitly uses intron-exon structure in parent genes to classify pseudogenes. We require alignments between duplicated pseudogenes and their parents to span intron-exon junctions, and this can be used to distinguish between true duplicated and processed pseudogenes (with insertions). RESULTS: Applying our approach to the ENCODE regions, we identify about 160 pseudogenes, 10% of which have clear 'intron-exon' structure and are thus likely generated from recent duplications. CONCLUSION: Detailed examination of our results and comparison of our annotation with the GENCODE reference annotation demonstrate that our computation pipeline provides a good balance between identifying all pseudogenes and delineating the precise structure of duplicated genes.

UR - http://www.scopus.com/inward/record.url?scp=33748664359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33748664359&partnerID=8YFLogxK

M3 - Article

C2 - 16925835

AN - SCOPUS:33748664359

VL - 7 Suppl 1

JO - Genome Biology

JF - Genome Biology

SN - 1474-7596

ER -