PseudoPipe: An automated pseudogene identification pipeline

Zhaolei Zhang, Nicholas Carriero, Deyou Zheng, John Karro, Paul M. Harrison, Mark Gerstein

Research output: Contribution to journal (Article)

94 Scopus citations

Abstract

Motivation: Mammalian genomes contain many 'genomic fossils' i.e. pseudogenes. These are disabled copies of functional genes that have been retained in the genome by gene duplication or retrotransposition events. Pseudogenes are important resources in understanding the evolutionary history of genes and genomes.

Results: We have developed a homology-based computational pipeline ('PseudoPipe') that can search a mammalian genome and identify pseudogene sequences in a comprehensive and consistent manner. The key steps in the pipeline involve using BLAST to rapidly cross-reference potential "parent" proteins against the intergenic regions of the genome and then processing the resulting "raw hits" - i.e. eliminating redundant ones, clustering together neighbors, and associating and aligning clusters with a unique parent. Finally, pseudogenes are classified based on a combination of criteria including homology, intron-exon structure, and existence of stop codons and frameshifts.
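The raw-hit processing described in the abstract (eliminating redundant hits, clustering neighbors, and associating each cluster with a unique parent) can be illustrated with a minimal Python sketch. This is not PseudoPipe's actual code: the `Hit` record, the function names, the greedy lowest-e-value rule for redundancy, and the `max_gap=60` neighbor distance are all assumptions chosen for illustration, and the alignment and classification steps are omitted.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Hit:
    """One protein-vs-genome BLAST hit (hypothetical record layout)."""
    chrom: str
    strand: str
    start: int      # genomic start, inclusive
    end: int        # genomic end, inclusive
    parent: str     # ID of the query ("parent") protein
    evalue: float


def overlaps(a, b):
    """True if two hits share a chromosome, a strand and at least one base."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def drop_redundant(hits):
    """Greedy filter: visit hits from best to worst e-value and keep a hit
    only if it does not overlap one that was already kept."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return kept


def cluster_neighbors(hits, max_gap=60):
    """Group hits on the same chromosome and strand whose gap to the
    previous hits is at most max_gap bases (assumed threshold)."""
    ordered = sorted(hits, key=lambda h: (h.chrom, h.strand, h.start))
    clusters = []
    for _, group in groupby(ordered, key=lambda h: (h.chrom, h.strand)):
        current = []
        for hit in group:
            if current and hit.start - max(h.end for h in current) > max_gap:
                clusters.append(current)
                current = []
            current.append(hit)
        clusters.append(current)
    return clusters


def assign_parent(cluster):
    """Pick the parent of the best-scoring hit as the cluster's unique parent."""
    return min(cluster, key=lambda h: h.evalue).parent


# Toy input: made-up coordinates and protein IDs, not real data.
raw_hits = [
    Hit("chr1", "+", 100, 400, "P1", 1e-50),
    Hit("chr1", "+", 120, 380, "P2", 1e-20),    # overlaps the first hit
    Hit("chr1", "+", 430, 700, "P1", 1e-30),    # 30 bases downstream
    Hit("chr1", "+", 5000, 5300, "P3", 1e-10),  # far away, separate locus
]

clusters = cluster_neighbors(drop_redundant(raw_hits))
parents = [assign_parent(c) for c in clusters]
spans = [(c[0].start, max(h.end for h in c)) for c in clusters]
```

On this toy input the overlapping `P2` hit is discarded, the two nearby `P1` hits are merged into one candidate locus, and the distant `P3` hit forms its own cluster. A real pipeline would also need to weigh partially overlapping hits and realign each cluster against its parent protein before classification.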

Original language: English (US)
Pages (from-to): 1437-1439
Number of pages: 3
Journal: Bioinformatics
Volume: 22
Issue number: 12
DOIs: https://doi.org/10.1093/bioinformatics/btl116
State: Published - Jun 15 2006
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

  • Cite this

    Zhang, Z., Carriero, N., Zheng, D., Karro, J., Harrison, P. M., & Gerstein, M. (2006). PseudoPipe: An automated pseudogene identification pipeline. Bioinformatics, 22(12), 1437-1439. https://doi.org/10.1093/bioinformatics/btl116