Abstract
The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we utilize 5643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease classifier of this scale, able to discriminate between 18 different diseases and healthy. We compared three different machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture which exploits the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (achieving 75% average test-set accuracy), receiver-operator-characteristics (92.1% average area-under-ROC (AUC)), and precision-recall (50% average area-under-precision-recall (AUPR)). Additionally, the convolutional net's performance complements that of the random forest, showing a lower propensity for Type-I errors (false-positives) while the random forest makes fewer Type-II errors (false-negatives). Lastly, we are able to achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease-specific signatures across microbiomes that can be used for diagnostic purposes.
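The abstract reports top-3 accuracy and contrasts Type-I (false-positive) with Type-II (false-negative) errors. As a minimal illustration of how such quantities are commonly computed for a multiclass classifier, the sketch below uses plain Python. The function names `top_k_accuracy` and `false_pos_neg`, the four-class toy scores, and the labels are all hypothetical and are not taken from the paper; the paper's own evaluation code may differ.

```python
from typing import Sequence, Tuple


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 3) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, true_class in zip(scores, labels):
        # Rank class indices by descending score (ties keep index order).
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_class in ranked[:k]:
            hits += 1
    return hits / len(labels)


def false_pos_neg(predicted: Sequence[int], labels: Sequence[int],
                  target: int) -> Tuple[int, int]:
    """One-vs-rest counts of (false positives, false negatives) for class `target`."""
    fp = sum(1 for p, y in zip(predicted, labels) if p == target and y != target)
    fn = sum(1 for p, y in zip(predicted, labels) if p != target and y == target)
    return fp, fn


# Hypothetical toy data: 4 samples, 4 classes (not data from the paper).
scores = [
    [0.60, 0.30, 0.10, 0.00],
    [0.10, 0.20, 0.30, 0.40],
    [0.50, 0.30, 0.15, 0.05],
    [0.20, 0.50, 0.20, 0.10],
]
labels = [0, 1, 3, 1]

# Top-1 prediction per sample (argmax over class scores).
predicted = [max(range(len(row)), key=lambda c: row[c]) for row in scores]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
fp_class0, fn_class0 = false_pos_neg(predicted, labels, target=0)
```

In this toy case the true class of the second sample is ranked third, so it counts as a miss for top-1 accuracy but as a hit for top-3 accuracy, which is why top-3 figures can sit well above plain accuracy.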
Original language | English (US) |
---|---|
Pages (from-to) | 55-66 |
Number of pages | 12 |
Journal | Pacific Symposium on Biocomputing |
Volume | 25 |
Issue number | 2020 |
State | Published - 2020 |
Event | 25th Pacific Symposium on Biocomputing, PSB 2020 - Big Island, United States Duration: Jan 3 2020 → Jan 7 2020 |
Keywords
- Machine learning
- Metagenomics
- Microbiome
ASJC Scopus subject areas
- Biomedical Engineering
- Computational Theory and Mathematics