TY - JOUR
T1 - Variational autoencoders learn transferrable representations of metabolomics data
AU - Gomari, Daniel P.
AU - Schweickart, Annalise
AU - Cerchietti, Leandro
AU - Paietta, Elisabeth
AU - Fernandez, Hugo
AU - Al-Amin, Hassen
AU - Suhre, Karsten
AU - Krumsiek, Jan
N1 - Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
AB - Dimensionality reduction approaches are commonly used for the deconvolution of high-dimensional metabolomics datasets into underlying core metabolic processes. However, current state-of-the-art methods are widely incapable of detecting nonlinearities in metabolomics data. Variational Autoencoders (VAEs) are a deep learning method designed to learn nonlinear latent representations which generalize to unseen data. Here, we trained a VAE on a large-scale metabolomics population cohort of human blood samples consisting of over 4500 individuals. We analyzed the pathway composition of the latent space using a global feature importance score, which demonstrated that latent dimensions represent distinct cellular processes. To demonstrate model generalizability, we generated latent representations of unseen metabolomics datasets on type 2 diabetes, acute myeloid leukemia, and schizophrenia and found significant correlations with clinical patient groups. Notably, the VAE representations showed stronger effects than latent dimensions derived by linear and non-linear principal component analysis. Taken together, we demonstrate that the VAE is a powerful method that learns biologically meaningful, nonlinear, and transferrable latent representations of metabolomics data.
UR - http://www.scopus.com/inward/record.url?scp=85133137437&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133137437&partnerID=8YFLogxK
U2 - 10.1038/s42003-022-03579-3
DO - 10.1038/s42003-022-03579-3
M3 - Article
C2 - 35773471
AN - SCOPUS:85133137437
SN - 2399-3642
VL - 5
JO - Communications Biology
JF - Communications Biology
IS - 1
M1 - 645
ER -