Predictive modeling of plant messenger RNA polyadenylation sites

Guoli Ji, Jianti Zheng, Yingjia Shen, Xiaohui Wu, Ronghan Jiang, Yun Lin, Johnny C. Loke, Kimberly M. Davis, Greg J. Reese, Qingshun Quinn Li

Research output: Contribution to journal › Article

44 Citations (Scopus)

Abstract

Background: One of the essential processing events during pre-mRNA maturation is the post-transcriptional addition of a polyadenine [poly(A)] tail. The 3′-end poly(A) tract protects mRNA from unregulated degradation and signals the integrity of the mRNA through recognition by the mRNA export and translation machinery. The position of a poly(A) site is predetermined by signals in the pre-mRNA sequence that are recognized by a complex of polyadenylation factors. These signals are generally tripartite sequence patterns around the cleavage site that serves as the future poly(A) site. In plants, there is little sequence conservation among these signal elements, which makes it difficult to develop an accurate algorithm to predict the poly(A) site of a given gene. We attempted to solve this problem.

Results: Based on our current working model and the profile of nucleotide sequence distribution of the poly(A) signals and around poly(A) sites in Arabidopsis, we devised a Generalized Hidden Markov Model-based algorithm to predict potential poly(A) sites. The high specificity and sensitivity of the algorithm were demonstrated by testing several datasets; at the best combinations, both reached 97%. The accuracy of the program, called poly(A) site sleuth or PASS, was demonstrated by the prediction of many validated poly(A) sites. PASS also predicted the changes in poly(A) site efficiency in poly(A) signal mutants that were constructed and characterized by traditional genetic experiments. The efficacy of PASS was further demonstrated by predicting poly(A) sites within long genomic sequences.

Conclusion: Based on the features of plant poly(A) signals, a computational model was built to effectively predict the poly(A) sites in Arabidopsis genes. The algorithm will be useful in gene annotation because a poly(A) site signifies the end of the transcript. It can also be used to predict alternative poly(A) sites in known genes, and will be useful in the design of transgenes for crop genetic engineering by predicting and eliminating undesirable poly(A) sites.
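The abstract reports sensitivity and specificity but does not state how they were computed. As a minimal sketch, assuming the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and a simple score threshold, the two measures could be obtained as follows. The function names, the threshold, and the toy labels and scores are all hypothetical illustrations; they are not taken from PASS or from the paper's datasets.

```python
from typing import Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) for candidate sites.

    labels: 1 for a real poly(A) site, 0 for a control position.
    scores: predictor score per candidate (hypothetical values here).
    A candidate is called a predicted site when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of real sites that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of control positions correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

tp, fp, tn, fn = confusion_counts(toy_labels, toy_scores, threshold=0.5)
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Raising the threshold generally trades sensitivity for specificity, which is one plausible reading of the abstract's phrase "at the best combinations": the reported 97% figures would correspond to the settings at which both measures were highest.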

Original language: English (US)
Article number: 43
Journal: BMC Bioinformatics
Volume: 8
DOI: https://doi.org/10.1186/1471-2105-8-43
State: Published - 2007
Externally published: Yes

Fingerprint

Plant RNA
Predictive Modeling
Polyadenylation
Messenger RNA
Genes
Gene
RNA Precursors
Arabidopsis
Predict
mRNA Cleavage and Polyadenylation Factors
Molecular Sequence Annotation
Genetic engineering
Genetic Engineering
RNA Stability
Protein Biosynthesis
Hidden Markov models
Nucleotides
Transgenes
Crops
Machinery

ASJC Scopus subject areas

  • Medicine(all)
  • Structural Biology
  • Applied Mathematics

Cite this

Ji, G., Zheng, J., Shen, Y., Wu, X., Jiang, R., Lin, Y., ... Li, Q. Q. (2007). Predictive modeling of plant messenger RNA polyadenylation sites. BMC Bioinformatics, 8, [43]. https://doi.org/10.1186/1471-2105-8-43

@article{89a63db7999743848d1a6f9e05dd75a3,
title = "Predictive modeling of plant messenger RNA polyadenylation sites",
author = "Guoli Ji and Jianti Zheng and Yingjia Shen and Xiaohui Wu and Ronghan Jiang and Yun Lin and Loke, {Johnny C.} and Davis, {Kimberly M.} and Reese, {Greg J.} and Li, {Qingshun Quinn}",
year = "2007",
doi = "10.1186/1471-2105-8-43",
language = "English (US)",
volume = "8",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Predictive modeling of plant messenger RNA polyadenylation sites

AU - Ji, Guoli

AU - Zheng, Jianti

AU - Shen, Yingjia

AU - Wu, Xiaohui

AU - Jiang, Ronghan

AU - Lin, Yun

AU - Loke, Johnny C.

AU - Davis, Kimberly M.

AU - Reese, Greg J.

AU - Li, Qingshun Quinn

PY - 2007

Y1 - 2007

UR - http://www.scopus.com/inward/record.url?scp=33847669324&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33847669324&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-8-43

DO - 10.1186/1471-2105-8-43

M3 - Article

C2 - 17286857

AN - SCOPUS:33847669324

VL - 8

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 43

ER -