Menagerie: A text-mining tool to support animal-human translation in neurodegeneration research

Caroline J. Zeiss, Dongwook Shin, Brent Vander Wyk, Amanda P. Beck, Natalie Zatz, Charles A. Sneiderman, Halil Kilicoglu

Research output: Contribution to journal › Article

Abstract

Discovery studies in animals constitute a cornerstone of biomedical research, but suffer from lack of generalizability to human populations. We propose that large-scale interrogation of these data could reveal patterns of animal use that could narrow the translational divide. We describe a text-mining approach that extracts translationally useful data from PubMed abstracts. These comprise six modules: species, model, genes, interventions/disease modifiers, overall outcome and functional outcome measures. Existing National Library of Medicine natural language processing tools (SemRep, GNormPlus and the Chemical annotator) underpin the program and are further augmented by various rules, term lists, and machine learning models. Evaluation of the program using a 98-abstract test set achieved F1 scores ranging from 0.75–0.95 across all modules, and exceeded F1 scores obtained from comparable baseline programs. Next, the program was applied to a larger 14,481 abstract data set (2008–2017). Expected and previously identified patterns of species and model use for the field were obtained. As previously noted, the majority of studies reported promising outcomes. Longitudinal patterns of intervention type or gene mentions were demonstrated, and patterns of animal model use characteristic of the Parkinson’s disease field were confirmed. The primary function of the program is to overcome low external validity of animal model systems by aggregating evidence across a diversity of models that capture different aspects of a multifaceted cellular process. Some aspects of the tool are generalizable, whereas others are field-specific. In the initial version presented here, we demonstrate proof of concept within a single disease area, Parkinson’s disease. However, the program can be expanded in modular fashion to support a wider range of neurodegenerative diseases.
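As a reading aid for the evaluation figures quoted in the abstract, the sketch below shows how an F1 score for a single extraction module can be computed from true-positive, false-positive and false-negative counts. The function name `f1_score`, the module labels and all counts are illustrative assumptions; none of them are taken from the paper or from the Menagerie code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp and fn are the counts of true positives, false positives
    and false negatives for one extraction module.
    """
    if tp == 0:
        # With no correct extractions, precision or recall is zero,
        # so F1 is defined here as 0 (this also avoids dividing by zero).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two modules (NOT results from the paper).
example_counts = {
    "species": (8, 2, 2),
    "model": (60, 20, 20),
}

for module, (tp, fp, fn) in example_counts.items():
    print(f"{module}: F1 = {f1_score(tp, fp, fn):.2f}")
```

Because F1 is a harmonic mean, a module scores highly only when precision and recall are both high, which is why it is a common single-number summary for extraction tasks of this kind.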

Original language: English (US)
Article number: e0226176
Journal: PLoS One
Volume: 14
Issue number: 12
DOIs: https://doi.org/10.1371/journal.pone.0226176
State: Published - Dec 2019

Fingerprint

Data Mining
translation (genetics)
Parkinson Disease
Animals
Animal Models
Natural Language Processing
National Library of Medicine (U.S.)
Parkinson disease
Program Evaluation
Research
PubMed
Neurodegenerative Diseases
Genes
Biomedical Research
National Library of Medicine
animals
animal models
Outcome Assessment (Health Care)
program evaluation
modifiers (genes)

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)
  • General

Cite this

Zeiss, C. J., Shin, D., Vander Wyk, B., Beck, A. P., Zatz, N., Sneiderman, C. A., & Kilicoglu, H. (2019). Menagerie: A text-mining tool to support animal-human translation in neurodegeneration research. PLoS One, 14(12), [e0226176]. https://doi.org/10.1371/journal.pone.0226176

Menagerie: A text-mining tool to support animal-human translation in neurodegeneration research. / Zeiss, Caroline J.; Shin, Dongwook; Vander Wyk, Brent; Beck, Amanda P.; Zatz, Natalie; Sneiderman, Charles A.; Kilicoglu, Halil.

In: PLoS One, Vol. 14, No. 12, e0226176, 12.2019.

Zeiss, CJ, Shin, D, Vander Wyk, B, Beck, AP, Zatz, N, Sneiderman, CA & Kilicoglu, H 2019, 'Menagerie: A text-mining tool to support animal-human translation in neurodegeneration research', PLoS One, vol. 14, no. 12, e0226176. https://doi.org/10.1371/journal.pone.0226176
Zeiss, Caroline J.; Shin, Dongwook; Vander Wyk, Brent; Beck, Amanda P.; Zatz, Natalie; Sneiderman, Charles A.; Kilicoglu, Halil. / Menagerie: A text-mining tool to support animal-human translation in neurodegeneration research. In: PLoS One. 2019; Vol. 14, No. 12.
@article{d90355d792e7448f99ac03c5745fde80,
title = "Menagerie: A text-mining tool to support animal-human translation in neurodegeneration research",
author = "Zeiss, {Caroline J.} and Dongwook Shin and {Vander Wyk}, Brent and Beck, {Amanda P.} and Natalie Zatz and Sneiderman, {Charles A.} and Halil Kilicoglu",
year = "2019",
month = "12",
doi = "10.1371/journal.pone.0226176",
language = "English (US)",
volume = "14",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "12",

}

TY - JOUR

T1 - Menagerie

T2 - A text-mining tool to support animal-human translation in neurodegeneration research

AU - Zeiss, Caroline J.

AU - Shin, Dongwook

AU - Vander Wyk, Brent

AU - Beck, Amanda P.

AU - Zatz, Natalie

AU - Sneiderman, Charles A.

AU - Kilicoglu, Halil

PY - 2019/12

Y1 - 2019/12

AB - Discovery studies in animals constitute a cornerstone of biomedical research, but suffer from lack of generalizability to human populations. We propose that large-scale interrogation of these data could reveal patterns of animal use that could narrow the translational divide. We describe a text-mining approach that extracts translationally useful data from PubMed abstracts. These comprise six modules: species, model, genes, interventions/disease modifiers, overall outcome and functional outcome measures. Existing National Library of Medicine natural language processing tools (SemRep, GNormPlus and the Chemical annotator) underpin the program and are further augmented by various rules, term lists, and machine learning models. Evaluation of the program using a 98-abstract test set achieved F1 scores ranging from 0.75–0.95 across all modules, and exceeded F1 scores obtained from comparable baseline programs. Next, the program was applied to a larger 14,481 abstract data set (2008–2017). Expected and previously identified patterns of species and model use for the field were obtained. As previously noted, the majority of studies reported promising outcomes. Longitudinal patterns of intervention type or gene mentions were demonstrated, and patterns of animal model use characteristic of the Parkinson’s disease field were confirmed. The primary function of the program is to overcome low external validity of animal model systems by aggregating evidence across a diversity of models that capture different aspects of a multifaceted cellular process. Some aspects of the tool are generalizable, whereas others are field-specific. In the initial version presented here, we demonstrate proof of concept within a single disease area, Parkinson’s disease. However, the program can be expanded in modular fashion to support a wider range of neurodegenerative diseases.

UR - http://www.scopus.com/inward/record.url?scp=85076704913&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85076704913&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0226176

DO - 10.1371/journal.pone.0226176

M3 - Article

C2 - 31846471

AN - SCOPUS:85076704913

VL - 14

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 12

M1 - e0226176

ER -