The Einstein Genome Gateway using WASP - A high throughput multi-layered life sciences portal for XSEDE

Aaron Golden, Andrew S. McLellan, Robert A. Dubin, Qiang Jing, Pilib Ó Broin, David Moskowitz, Zhengdong Zhang, Masako Suzuki, Joseph Hargitai, R. Brent Calder, John M. Greally

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Massively parallel sequencing (MPS) technologies and their diverse applications in genomics and epigenomics research have yielded enormous new insights into the physiology and pathophysiology of the human genome. The biggest hurdle remains the magnitude and diversity of the datasets generated, which compromise our ability to manage, organize, process and ultimately analyse data. The Wiki-based Automated Sequence Processor (WASP), developed at the Albert Einstein College of Medicine (hereafter Einstein), tightly couples the sequencing platform, the sequencing assay, sample metadata and the automated workflows deployed on a heterogeneous high performance computing cluster infrastructure that yield sequenced, quality-controlled and 'mapped' sequence data, all within a single operating environment accessible through a web-based GUI. WASP at Einstein processes 4-6 TB of data per week and has processed ∼1 PB of data overall since its production cycle commenced. It has revolutionized how users interact with these new genomic technologies: they remain blissfully unaware of the data storage, management and, most importantly, processing services they request. This abstraction of computational complexity away from the user in effect makes WASP an ideal middleware solution, and an appropriate basis for the development of a grid-enabled resource - the Einstein Genome Gateway - as part of the Extreme Science and Engineering Discovery Environment (XSEDE) program. In this paper we discuss the existing WASP system, its proposed middleware role, and its planned interaction with XSEDE to form the Einstein Genome Gateway.

Original language: English (US)
Title of host publication: Studies in Health Technology and Informatics
Pages: 182-191
Number of pages: 10
Volume: 175
DOI: https://doi.org/10.3233/978-1-61499-054-3-182
State: Published - 2012
Event: 10th HealthGrid Conference and the 4th International Workshop on Science Gateways for Life Sciences, IWSG-Life 2012 - Amsterdam, Netherlands
Duration: May 21 2012 to May 25 2012

Other

Other: 10th HealthGrid Conference and the 4th International Workshop on Science Gateways for Life Sciences, IWSG-Life 2012
Country: Netherlands
City: Amsterdam
Period: 5/21/12 to 5/25/12

Fingerprint

Biological Science Disciplines
Computing Methodologies
Genes
Throughput
Genome
Middleware
Technology
High-Throughput Nucleotide Sequencing
Workflow
Information Storage and Retrieval
Human Genome
Genomics
Epigenomics
Storage management
Cluster computing
Physiology
Medicine
Graphical user interfaces
Metadata
Interfaces (computer)

Keywords

  • Genomics
  • Grid Computing
  • Integrative Analysis
  • Life Science Gateways
  • Massively Parallel Sequencing
  • XSEDE

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Cite this

Golden, A., McLellan, A. S., Dubin, R. A., Jing, Q., Ó Broin, P., Moskowitz, D., ... Greally, J. M. (2012). The Einstein Genome Gateway using WASP - A high throughput multi-layered life sciences portal for XSEDE. In Studies in Health Technology and Informatics (Vol. 175, pp. 182-191). https://doi.org/10.3233/978-1-61499-054-3-182

@inproceedings{11f566c288674c05b1066b448d707c15,
title = "The Einstein Genome Gateway using WASP - A high throughput multi-layered life sciences portal for XSEDE",
keywords = "Genomics, Grid Computing, Integrative Analysis, Life Science Gateways, Massively Parallel Sequencing, XSEDE",
author = "Aaron Golden and McLellan, {Andrew S.} and Dubin, {Robert A.} and Qiang Jing and {\'O} Broin, Pilib and David Moskowitz and Zhengdong Zhang and Masako Suzuki and Joseph Hargitai and Calder, {R. Brent} and Greally, {John M.}",
year = "2012",
doi = "10.3233/978-1-61499-054-3-182",
language = "English (US)",
isbn = "9781614990536",
volume = "175",
pages = "182--191",
booktitle = "Studies in Health Technology and Informatics",

}

TY - GEN

T1 - The Einstein Genome Gateway using WASP - A high throughput multi-layered life sciences portal for XSEDE

AU - Golden, Aaron

AU - McLellan, Andrew S.

AU - Dubin, Robert A.

AU - Jing, Qiang

AU - Ó Broin, Pilib

AU - Moskowitz, David

AU - Zhang, Zhengdong

AU - Suzuki, Masako

AU - Hargitai, Joseph

AU - Calder, R. Brent

AU - Greally, John M.

PY - 2012

Y1 - 2012

KW - Genomics

KW - Grid Computing

KW - Integrative Analysis

KW - Life Science Gateways

KW - Massively Parallel Sequencing

KW - XSEDE

UR - http://www.scopus.com/inward/record.url?scp=84866750381&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866750381&partnerID=8YFLogxK

U2 - 10.3233/978-1-61499-054-3-182

DO - 10.3233/978-1-61499-054-3-182

M3 - Conference contribution

SN - 9781614990536

VL - 175

SP - 182

EP - 191

BT - Studies in Health Technology and Informatics

ER -