TY - GEN
T1 - Using SAGA and the Open Science Grid to search for aptamers
AU - Shieh, Kevin
AU - Ó Broin, Pilib
AU - Rhee, David
AU - Levy, Matthew
AU - Golden, Aaron
PY - 2014/1/1
Y1 - 2014/1/1
N2 - RNA aptamers are small oligonucleotide molecules whose composition and resulting folded structure enable them to bind with high affinity and high selectivity to target ligands and therefore hold great promise as potential therapeutic drugs. Functional aptamers are selected from a large, randomized initial library in a process known as SELEX (systematic evolution of ligands by exponential enrichment). This is an iterative process involving numerous rounds of binding, elution, and amplification against a specific target substrate. During each iteration, or round of selection, we enrich for the species with the highest binding affinity to the target. After multiple rounds, we ideally have an enriched aptamer library suitable for subsequent investigation. Modern techniques employ massively parallel sequencing, enabling the generation of large libraries (~10^6 sequences) in a matter of hours for each round of selection. As RNA is single-stranded, covariance models (CMs) are ideal for representing motifs in its secondary structures, allowing us to discover patterns within functional aptamer populations following each round. CMs have been implemented in Infernal, a program that infers RNA alignments based on RNA sequence and structure. Calibrating a single CM in Infernal can take several hours and is a significant performance bottleneck for our work. However, as each CM calculation is independent of the others and requires defined processing and memory resources, computing them in parallel offers a potential solution to this problem. In this paper, we describe using the Open Science Grid (OSG) to facilitate the identification of aptamer motifs by running CM calibrations and refinements in parallel across up to ten OSG clients. We use the Simple API for Grid Applications (SAGA) to interface with OSG and manage job submissions and file transfers. Our results show a significant speedup when the calibrations are run in parallel, constrained by typical latencies and QoS associated with nominal OSG usage. Our work demonstrates the ability of SAGA and the OSG to assist in parallelizing solutions to complex sequencing-based biomedical challenges.
AB - RNA aptamers are small oligonucleotide molecules whose composition and resulting folded structure enable them to bind with high affinity and high selectivity to target ligands and therefore hold great promise as potential therapeutic drugs. Functional aptamers are selected from a large, randomized initial library in a process known as SELEX (systematic evolution of ligands by exponential enrichment). This is an iterative process involving numerous rounds of binding, elution, and amplification against a specific target substrate. During each iteration, or round of selection, we enrich for the species with the highest binding affinity to the target. After multiple rounds, we ideally have an enriched aptamer library suitable for subsequent investigation. Modern techniques employ massively parallel sequencing, enabling the generation of large libraries (~10^6 sequences) in a matter of hours for each round of selection. As RNA is single-stranded, covariance models (CMs) are ideal for representing motifs in its secondary structures, allowing us to discover patterns within functional aptamer populations following each round. CMs have been implemented in Infernal, a program that infers RNA alignments based on RNA sequence and structure. Calibrating a single CM in Infernal can take several hours and is a significant performance bottleneck for our work. However, as each CM calculation is independent of the others and requires defined processing and memory resources, computing them in parallel offers a potential solution to this problem. In this paper, we describe using the Open Science Grid (OSG) to facilitate the identification of aptamer motifs by running CM calibrations and refinements in parallel across up to ten OSG clients. We use the Simple API for Grid Applications (SAGA) to interface with OSG and manage job submissions and file transfers. Our results show a significant speedup when the calibrations are run in parallel, constrained by typical latencies and QoS associated with nominal OSG usage. Our work demonstrates the ability of SAGA and the OSG to assist in parallelizing solutions to complex sequencing-based biomedical challenges.
KW - Aptamers
KW - HTCondor
KW - High-throughput computing
KW - Open Science Grid
KW - Parallel computing
KW - SAGA
UR - http://www.scopus.com/inward/record.url?scp=84905496701&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905496701&partnerID=8YFLogxK
U2 - 10.1145/2616498.2616517
DO - 10.1145/2616498.2616517
M3 - Conference contribution
AN - SCOPUS:84905496701
SN - 9781450328937
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the XSEDE 2014 Conference
PB - Association for Computing Machinery
T2 - 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014
Y2 - 13 July 2014 through 18 July 2014
ER -