PseudoPipe: An automated pseudogene identification pipeline

Zhaolei Zhang, Nicholas Carriero, Deyou Zheng, John Karro, Paul M. Harrison, Mark Gerstein

Research output: Contribution to journal › Article

88 Citations (Scopus)

Abstract

Motivation: Mammalian genomes contain many "genomic fossils", i.e. pseudogenes. These are disabled copies of functional genes that arose through gene duplication or retrotransposition events and have been retained in the genome. Pseudogenes are important resources for understanding the evolutionary history of genes and genomes.

Results: We have developed a homology-based computational pipeline ("PseudoPipe") that can search a mammalian genome and identify pseudogene sequences in a comprehensive and consistent manner. The key steps in the pipeline involve using BLAST to rapidly cross-reference potential "parent" proteins against the intergenic regions of the genome and then processing the resulting "raw hits", i.e. eliminating redundant ones, clustering together neighbors, and associating and aligning clusters with a unique parent. Finally, pseudogenes are classified based on a combination of criteria including homology, intron-exon structure, and existence of stop codons and frameshifts.
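As an illustration only, the hit-processing steps named in the abstract (eliminating redundant hits, clustering neighbors, and associating each cluster with a single parent) could be sketched in Python as follows. This is not PseudoPipe's code: the `Hit` record, the function names, the greedy best-score overlap rule, the 60 bp `max_gap` threshold, and the summed-score choice of parent are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one protein-to-genome hit."""
    chrom: str
    strand: str
    start: int    # inclusive genomic start
    end: int      # inclusive genomic end
    parent: str   # identifier of the query ("parent") protein
    score: float  # alignment score, higher is better


def drop_redundant(hits: List[Hit]) -> List[Hit]:
    """Greedily keep the best-scoring hit among overlapping hits
    on the same chromosome and strand (assumed rule)."""
    kept: List[Hit] = []
    for h in sorted(hits, key=lambda x: -x.score):
        overlaps = any(
            k.chrom == h.chrom and k.strand == h.strand
            and h.start <= k.end and k.start <= h.end
            for k in kept
        )
        if not overlaps:
            kept.append(h)
    return kept


def cluster_neighbors(hits: List[Hit], max_gap: int = 60) -> List[List[Hit]]:
    """Group non-overlapping hits on the same chromosome and strand
    when the gap to the previous hit is at most max_gap (assumed threshold)."""
    clusters: List[List[Hit]] = []
    for h in sorted(hits, key=lambda x: (x.chrom, x.strand, x.start)):
        last = clusters[-1][-1] if clusters else None
        if (last is not None
                and (last.chrom, last.strand) == (h.chrom, h.strand)
                and h.start - last.end <= max_gap):
            clusters[-1].append(h)
        else:
            clusters.append([h])
    return clusters


def assign_parent(cluster: List[Hit]) -> str:
    """Pick the parent with the highest summed score in the cluster (assumed rule)."""
    totals: Dict[str, float] = {}
    for h in cluster:
        totals[h.parent] = totals.get(h.parent, 0.0) + h.score
    return max(totals, key=lambda p: totals[p])


if __name__ == "__main__":
    raw_hits = [
        Hit("chr1", "+", 100, 200, "P1", 50.0),
        Hit("chr1", "+", 150, 220, "P2", 30.0),    # overlaps the first hit
        Hit("chr1", "+", 230, 300, "P1", 40.0),    # close neighbor of the first hit
        Hit("chr1", "+", 5000, 5100, "P3", 20.0),  # distant hit
    ]
    for cluster in cluster_neighbors(drop_redundant(raw_hits)):
        print(assign_parent(cluster), [(h.start, h.end) for h in cluster])
```

A real pipeline would go on to realign each cluster against its parent protein and apply the classification criteria; the sketch stops at parent assignment because the abstract does not give those details.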

Original language: English (US)
Pages (from-to): 1437-1439
Number of pages: 3
Journal: Bioinformatics
Volume: 22
Issue number: 12
DOI: https://doi.org/10.1093/bioinformatics/btl116
State: Published - Jun 15 2006
Externally published: Yes

Fingerprint

Pseudogenes
Genome
Pipelines
Genes
Gene
Homology
Intergenic DNA
Gene Duplication
Terminator Codon
Duplication
Hits
Introns
Genomics
Cluster Analysis
Exons
History
Clustering
Protein
Resources
Proteins

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Zhang, Z., Carriero, N., Zheng, D., Karro, J., Harrison, P. M., & Gerstein, M. (2006). PseudoPipe: An automated pseudogene identification pipeline. Bioinformatics, 22(12), 1437-1439. https://doi.org/10.1093/bioinformatics/btl116

@article{2601154388dc4ebd832caef6a3c51b0b,
title = "PseudoPipe: An automated pseudogene identification pipeline",
abstract = "Motivation: Mammalian genomes contain many 'genomic fossils' i.e. pseudogenes. These are disabled copies of functional genes that have been retained in the genome by gene duplication or retrotransposition events. Pseudogenes are important resources in understanding the evolutionary history of genes and genomes. Results: We have developed a homology-based computational pipeline ('PseudoPipe') that can search a mammalian genome and identify pseudogene sequences in a comprehensive and consistent manner. The key steps in the pipeline involve using BLAST to rapidly cross-reference potential {"}parent{"} proteins against the intergenic regions of the genome and then processing the resulting {"}raw hits{"} - i.e. eliminating redundant ones, clustering together neighbors, and associating and aligning clusters with a unique parent. Finally, pseudogenes are classified based on a combination of criteria including homology, intron-exon structure, and existence of stop codons and frameshifts.",
author = "Zhaolei Zhang and Nicholas Carriero and Deyou Zheng and John Karro and Harrison, {Paul M.} and Mark Gerstein",
year = "2006",
month = "6",
day = "15",
doi = "10.1093/bioinformatics/btl116",
language = "English (US)",
volume = "22",
pages = "1437--1439",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "12",

}

TY  - JOUR
T1  - PseudoPipe
T2  - An automated pseudogene identification pipeline
AU  - Zhang, Zhaolei
AU  - Carriero, Nicholas
AU  - Zheng, Deyou
AU  - Karro, John
AU  - Harrison, Paul M.
AU  - Gerstein, Mark
PY  - 2006/6/15
Y1  - 2006/6/15
AB  - Motivation: Mammalian genomes contain many 'genomic fossils' i.e. pseudogenes. These are disabled copies of functional genes that have been retained in the genome by gene duplication or retrotransposition events. Pseudogenes are important resources in understanding the evolutionary history of genes and genomes. Results: We have developed a homology-based computational pipeline ('PseudoPipe') that can search a mammalian genome and identify pseudogene sequences in a comprehensive and consistent manner. The key steps in the pipeline involve using BLAST to rapidly cross-reference potential "parent" proteins against the intergenic regions of the genome and then processing the resulting "raw hits" - i.e. eliminating redundant ones, clustering together neighbors, and associating and aligning clusters with a unique parent. Finally, pseudogenes are classified based on a combination of criteria including homology, intron-exon structure, and existence of stop codons and frameshifts.
UR  - http://www.scopus.com/inward/record.url?scp=33745614319&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33745614319&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btl116
DO  - 10.1093/bioinformatics/btl116
M3  - Article
C2  - 16574694
AN  - SCOPUS:33745614319
VL  - 22
SP  - 1437
EP  - 1439
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 12
ER  -