A common open representation of mass spectrometry data and its application to proteomics research

Patrick G A Pedrioli, Jimmy K. Eng, Robert Hubley, Mathijs Vogelzang, Eric W. Deutsch, Brian Raught, Brian Pratt, Erik Nilsson, Ruth H. Angeletti, Rolf Apweiler, Kei Cheung, Catherine E. Costello, Henning Hermjakob, Sequin Huang, Randall K. Julian, Eugene Kapp, Mark E. McComb, Stephen G. Oliver, Gilbert Omenn, Norman W. Paton, Richard Simpson, Richard Smith, Chris F. Taylor, Weimin Zhu, Ruedi Aebersold

Research output: Contribution to journal › Article

574 Citations (Scopus)

Abstract

A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.
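The abstract does not describe the encoding itself, but the published mzXML schema stores each spectrum's peak list inside a `<peaks>` element as a base64 string of binary floating-point values, with m/z and intensity interleaved. The sketch below, using only the Python standard library, shows how a generic tool might read such a payload. It assumes a simplified, hypothetical fragment with no XML namespace, uncompressed 32-bit big-endian ("network" order) values and `m/z-int` pair order; the helper names `encode_peaks`, `decode_peaks` and `read_scans` are illustrative only. Real mzXML files declare a namespace, may use 64-bit precision or zlib compression, and carry far more metadata, so this is not a conforming parser.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def encode_peaks(pairs):
    """Hypothetical helper: pack (m/z, intensity) pairs as big-endian
    32-bit floats, interleaved, then base64-encode them."""
    flat = [value for pair in pairs for value in pair]
    raw = struct.pack(">%df" % len(flat), *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text, precision=32):
    """Hypothetical helper: decode a base64 <peaks> payload (assumed
    uncompressed, network byte order) into (m/z, intensity) tuples."""
    raw = base64.b64decode(text)
    code = "f" if precision == 32 else "d"
    count = len(raw) // (precision // 8)
    values = struct.unpack(">%d%s" % (count, code), raw)
    # Even positions are m/z values, odd positions are intensities.
    return list(zip(values[0::2], values[1::2]))


def read_scans(xml_text):
    """Collect every <scan>, including nested ones, keyed by scan number."""
    root = ET.fromstring(xml_text)
    scans = {}
    for scan in root.iter("scan"):
        peaks = scan.find("peaks")
        precision = int(peaks.get("precision", "32"))
        scans[int(scan.get("num"))] = {
            "msLevel": int(scan.get("msLevel")),
            "peaks": decode_peaks(peaks.text or "", precision),
        }
    return scans


# Simplified, namespace-free fragment built for illustration only.
example = f"""<msRun>
  <scan num="1" msLevel="1" peaksCount="2">
    <peaks precision="32" byteOrder="network" pairOrder="m/z-int">{encode_peaks([(445.5, 1000.0), (446.25, 250.0)])}</peaks>
  </scan>
</msRun>"""

if __name__ == "__main__":
    for num, scan in read_scans(example).items():
        print(num, scan["msLevel"], scan["peaks"])
```

Because the values chosen here are exactly representable as 32-bit floats, the round trip returns them unchanged. Using `root.iter("scan")` rather than direct children matters because mzXML allows tandem (MS/MS) scans to be nested inside their parent survey scans.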

Original language: English (US)
Pages (from-to): 1459-1466
Number of pages: 8
Journal: Nature Biotechnology
Volume: 22
Issue number: 11
DOI: https://doi.org/10.1038/nbt1031
State: Published - Nov 2004

Fingerprint

Mass spectrometers
XML
Proteomics
Mass spectrometry
Mass Spectrometry
Bioinformatics
Computational Biology
Research
Information Systems
Information management
Data structures
Publications
Software engineering
Language
Software
Experiments
Specifications
Datasets

ASJC Scopus subject areas

  • Microbiology

Cite this

Pedrioli, P. G. A., Eng, J. K., Hubley, R., Vogelzang, M., Deutsch, E. W., Raught, B., ... Aebersold, R. (2004). A common open representation of mass spectrometry data and its application to proteomics research. Nature Biotechnology, 22(11), 1459-1466. https://doi.org/10.1038/nbt1031

A common open representation of mass spectrometry data and its application to proteomics research. / Pedrioli, Patrick G A; Eng, Jimmy K.; Hubley, Robert; Vogelzang, Mathijs; Deutsch, Eric W.; Raught, Brian; Pratt, Brian; Nilsson, Erik; Angeletti, Ruth H.; Apweiler, Rolf; Cheung, Kei; Costello, Catherine E.; Hermjakob, Henning; Huang, Sequin; Julian, Randall K.; Kapp, Eugene; McComb, Mark E.; Oliver, Stephen G.; Omenn, Gilbert; Paton, Norman W.; Simpson, Richard; Smith, Richard; Taylor, Chris F.; Zhu, Weimin; Aebersold, Ruedi.

In: Nature Biotechnology, Vol. 22, No. 11, 11.2004, p. 1459-1466.


Pedrioli, PGA, Eng, JK, Hubley, R, Vogelzang, M, Deutsch, EW, Raught, B, Pratt, B, Nilsson, E, Angeletti, RH, Apweiler, R, Cheung, K, Costello, CE, Hermjakob, H, Huang, S, Julian, RK, Kapp, E, McComb, ME, Oliver, SG, Omenn, G, Paton, NW, Simpson, R, Smith, R, Taylor, CF, Zhu, W & Aebersold, R 2004, 'A common open representation of mass spectrometry data and its application to proteomics research', Nature Biotechnology, vol. 22, no. 11, pp. 1459-1466. https://doi.org/10.1038/nbt1031
Pedrioli PGA, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B et al. A common open representation of mass spectrometry data and its application to proteomics research. Nature Biotechnology. 2004 Nov;22(11):1459-1466. https://doi.org/10.1038/nbt1031
Pedrioli, Patrick G A ; Eng, Jimmy K. ; Hubley, Robert ; Vogelzang, Mathijs ; Deutsch, Eric W. ; Raught, Brian ; Pratt, Brian ; Nilsson, Erik ; Angeletti, Ruth H. ; Apweiler, Rolf ; Cheung, Kei ; Costello, Catherine E. ; Hermjakob, Henning ; Huang, Sequin ; Julian, Randall K. ; Kapp, Eugene ; McComb, Mark E. ; Oliver, Stephen G. ; Omenn, Gilbert ; Paton, Norman W. ; Simpson, Richard ; Smith, Richard ; Taylor, Chris F. ; Zhu, Weimin ; Aebersold, Ruedi. / A common open representation of mass spectrometry data and its application to proteomics research. In: Nature Biotechnology. 2004 ; Vol. 22, No. 11. pp. 1459-1466.
@article{96e4d82e0021465899a78eb22dd035c7,
title = "A common open representation of mass spectrometry data and its application to proteomics research",
abstract = "A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.",
author = "Pedrioli, {Patrick G A} and Eng, {Jimmy K.} and Robert Hubley and Mathijs Vogelzang and Deutsch, {Eric W.} and Brian Raught and Brian Pratt and Erik Nilsson and Angeletti, {Ruth H.} and Rolf Apweiler and Kei Cheung and Costello, {Catherine E.} and Henning Hermjakob and Sequin Huang and Julian, {Randall K.} and Eugene Kapp and McComb, {Mark E.} and Oliver, {Stephen G.} and Gilbert Omenn and Paton, {Norman W.} and Richard Simpson and Richard Smith and Taylor, {Chris F.} and Weimin Zhu and Ruedi Aebersold",
year = "2004",
month = "11",
doi = "10.1038/nbt1031",
language = "English (US)",
volume = "22",
pages = "1459--1466",
journal = "Nature Biotechnology",
issn = "1087-0156",
publisher = "Nature Publishing Group",
number = "11",

}

TY - JOUR

T1 - A common open representation of mass spectrometry data and its application to proteomics research

AU - Pedrioli, Patrick G A

AU - Eng, Jimmy K.

AU - Hubley, Robert

AU - Vogelzang, Mathijs

AU - Deutsch, Eric W.

AU - Raught, Brian

AU - Pratt, Brian

AU - Nilsson, Erik

AU - Angeletti, Ruth H.

AU - Apweiler, Rolf

AU - Cheung, Kei

AU - Costello, Catherine E.

AU - Hermjakob, Henning

AU - Huang, Sequin

AU - Julian, Randall K.

AU - Kapp, Eugene

AU - McComb, Mark E.

AU - Oliver, Stephen G.

AU - Omenn, Gilbert

AU - Paton, Norman W.

AU - Simpson, Richard

AU - Smith, Richard

AU - Taylor, Chris F.

AU - Zhu, Weimin

AU - Aebersold, Ruedi

PY - 2004/11

Y1 - 2004/11

N2 - A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.

AB - A broad range of mass spectrometers are used in mass spectrometry (MS)-based proteomics research. Each type of instrument possesses a unique design, data system and performance specifications, resulting in strengths and weaknesses for different types of experiments. Unfortunately, the native binary data formats produced by each type of mass spectrometer also differ and are usually proprietary. The diverse, nontransparent nature of the data structure complicates the integration of new instruments into preexisting infrastructure, impedes the analysis, exchange, comparison and publication of results from different experiments and laboratories, and prevents the bioinformatics community from accessing data sets required for software development. Here, we introduce the 'mzXML' format, an open, generic XML (extensible markup language) representation of MS data. We have also developed an accompanying suite of supporting programs. We expect that this format will facilitate data management, interpretation and dissemination in proteomics research.

UR - http://www.scopus.com/inward/record.url?scp=8344284323&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=8344284323&partnerID=8YFLogxK

U2 - 10.1038/nbt1031

DO - 10.1038/nbt1031

M3 - Article

VL - 22

SP - 1459

EP - 1466

JO - Nat. Biotechnol.

JF - Nature Biotechnology

SN - 1087-0156

IS - 11

ER -