TY - JOUR
T1 - Variational autoencoders learn transferrable representations of metabolomics data
AU - Gomari, Daniel P.
AU - Schweickart, Annalise
AU - Cerchietti, Leandro
AU - Paietta, Elisabeth
AU - Fernandez, Hugo
AU - Al-Amin, Hassen
AU - Suhre, Karsten
AU - Krumsiek, Jan
N1 - Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
AB - Dimensionality reduction approaches are commonly used for the deconvolution of high-dimensional metabolomics datasets into underlying core metabolic processes. However, current state-of-the-art methods are widely incapable of detecting nonlinearities in metabolomics data. Variational Autoencoders (VAEs) are a deep learning method designed to learn nonlinear latent representations which generalize to unseen data. Here, we trained a VAE on a large-scale metabolomics population cohort of human blood samples consisting of over 4500 individuals. We analyzed the pathway composition of the latent space using a global feature importance score, which demonstrated that latent dimensions represent distinct cellular processes. To demonstrate model generalizability, we generated latent representations of unseen metabolomics datasets on type 2 diabetes, acute myeloid leukemia, and schizophrenia and found significant correlations with clinical patient groups. Notably, the VAE representations showed stronger effects than latent dimensions derived by linear and non-linear principal component analysis. Taken together, we demonstrate that the VAE is a powerful method that learns biologically meaningful, nonlinear, and transferrable latent representations of metabolomics data.
UR - http://www.scopus.com/inward/record.url?scp=85133137437&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133137437&partnerID=8YFLogxK
U2 - 10.1038/s42003-022-03579-3
DO - 10.1038/s42003-022-03579-3
M3 - Article
C2 - 35773471
AN - SCOPUS:85133137437
SN - 2399-3642
VL - 5
JO - Communications Biology
JF - Communications Biology
IS - 1
M1 - 645
ER -