Menagerie: A text-mining tool to support animal-human translation in neurodegeneration research

Caroline J. Zeiss; Dongwook Shin; Brent Vander Wyk; Amanda P. Beck; Natalie Zatz; Charles A. Sneiderman; Halil Kilicoglu

doi:10.1371/journal.pone.0226176

Pathology

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Discovery studies in animals constitute a cornerstone of biomedical research, but suffer from lack of generalizability to human populations. We propose that large-scale interrogation of these data could reveal patterns of animal use that could narrow the translational divide. We describe a text-mining approach that extracts translationally useful data from PubMed abstracts. These comprise six modules: species, model, genes, interventions/disease modifiers, overall outcome and functional outcome measures. Existing National Library of Medicine natural language processing tools (SemRep, GNormPlus and the Chemical annotator) underpin the program and are further augmented by various rules, term lists, and machine learning models. Evaluation of the program using a 98-abstract test set achieved F₁ scores ranging from 0.75–0.95 across all modules, and exceeded F₁ scores obtained from comparable baseline programs. Next, the program was applied to a larger 14,481 abstract data set (2008–2017). Expected and previously identified patterns of species and model use for the field were obtained. As previously noted, the majority of studies reported promising outcomes. Longitudinal patterns of intervention type or gene mentions were demonstrated, and patterns of animal model use characteristic of the Parkinson’s disease field were confirmed. The primary function of the program is to overcome low external validity of animal model systems by aggregating evidence across a diversity of models that capture different aspects of a multifaceted cellular process. Some aspects of the tool are generalizable, whereas others are field-specific. In the initial version presented here, we demonstrate proof of concept within a single disease area, Parkinson’s disease. However, the program can be expanded in modular fashion to support a wider range of neurodegenerative diseases.
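The F₁ scores quoted above are the harmonic mean of precision and recall, computed separately for each module. The following is a minimal sketch of that calculation, not the authors' evaluation code: the `Counts` class, the `f1_score` function, and the tallies are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-module tallies against a gold-standard annotation."""
    true_pos: int
    false_pos: int
    false_neg: int


def f1_score(c: Counts) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    predicted = c.true_pos + c.false_pos
    actual = c.true_pos + c.false_neg
    if predicted == 0 or actual == 0 or c.true_pos == 0:
        return 0.0
    precision = c.true_pos / predicted
    recall = c.true_pos / actual
    return 2 * precision * recall / (precision + recall)


# Illustrative numbers only; these are NOT results from the paper.
example = {
    "species": Counts(true_pos=90, false_pos=5, false_neg=5),
    "model": Counts(true_pos=60, false_pos=20, false_neg=20),
}

for module, counts in example.items():
    print(f"{module}: F1 = {f1_score(counts):.2f}")
```

Reporting one score per module, rather than a single pooled figure, keeps a weak module (for example, model extraction) from being masked by a strong one.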

Original language	English (US)
Article number	e0226176
Journal	PLOS ONE
Volume	14
Issue number	12
DOIs	https://doi.org/10.1371/journal.pone.0226176
State	Published - Dec 2019

ASJC Scopus subject areas

General

Cite this

@article{d90355d792e7448f99ac03c5745fde80,

title = "Menagerie: A text-mining tool to support animal-human translation in neurodegeneration research",

abstract = "Discovery studies in animals constitute a cornerstone of biomedical research, but suffer from lack of generalizability to human populations. We propose that large-scale interrogation of these data could reveal patterns of animal use that could narrow the translational divide. We describe a text-mining approach that extracts translationally useful data from PubMed abstracts. These comprise six modules: species, model, genes, interventions/disease modifiers, overall outcome and functional outcome measures. Existing National Library of Medicine natural language processing tools (SemRep, GNormPlus and the Chemical annotator) underpin the program and are further augmented by various rules, term lists, and machine learning models. Evaluation of the program using a 98-abstract test set achieved F1 scores ranging from 0.75–0.95 across all modules, and exceeded F1 scores obtained from comparable baseline programs. Next, the program was applied to a larger 14,481 abstract data set (2008–2017). Expected and previously identified patterns of species and model use for the field were obtained. As previously noted, the majority of studies reported promising outcomes. Longitudinal patterns of intervention type or gene mentions were demonstrated, and patterns of animal model use characteristic of the Parkinson{\textquoteright}s disease field were confirmed. The primary function of the program is to overcome low external validity of animal model systems by aggregating evidence across a diversity of models that capture different aspects of a multifaceted cellular process. Some aspects of the tool are generalizable, whereas others are field-specific. In the initial version presented here, we demonstrate proof of concept within a single disease area, Parkinson{\textquoteright}s disease. However, the program can be expanded in modular fashion to support a wider range of neurodegenerative diseases.",

author = "Zeiss, {Caroline J.} and Dongwook Shin and {Vander Wyk}, Brent and Beck, {Amanda P.} and Natalie Zatz and Sneiderman, {Charles A.} and Halil Kilicoglu",

year = "2019",

month = dec,

doi = "10.1371/journal.pone.0226176",

language = "English (US)",

volume = "14",

journal = "PLOS ONE",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "12",

}

TY - JOUR

T1 - Menagerie

T2 - A text-mining tool to support animal-human translation in neurodegeneration research

AU - Zeiss, Caroline J.

AU - Shin, Dongwook

AU - Vander Wyk, Brent

AU - Beck, Amanda P.

AU - Zatz, Natalie

AU - Sneiderman, Charles A.

AU - Kilicoglu, Halil

PY - 2019/12

Y1 - 2019/12

AB - Discovery studies in animals constitute a cornerstone of biomedical research, but suffer from lack of generalizability to human populations. We propose that large-scale interrogation of these data could reveal patterns of animal use that could narrow the translational divide. We describe a text-mining approach that extracts translationally useful data from PubMed abstracts. These comprise six modules: species, model, genes, interventions/disease modifiers, overall outcome and functional outcome measures. Existing National Library of Medicine natural language processing tools (SemRep, GNormPlus and the Chemical annotator) underpin the program and are further augmented by various rules, term lists, and machine learning models. Evaluation of the program using a 98-abstract test set achieved F1 scores ranging from 0.75–0.95 across all modules, and exceeded F1 scores obtained from comparable baseline programs. Next, the program was applied to a larger 14,481 abstract data set (2008–2017). Expected and previously identified patterns of species and model use for the field were obtained. As previously noted, the majority of studies reported promising outcomes. Longitudinal patterns of intervention type or gene mentions were demonstrated, and patterns of animal model use characteristic of the Parkinson’s disease field were confirmed. The primary function of the program is to overcome low external validity of animal model systems by aggregating evidence across a diversity of models that capture different aspects of a multifaceted cellular process. Some aspects of the tool are generalizable, whereas others are field-specific. In the initial version presented here, we demonstrate proof of concept within a single disease area, Parkinson’s disease. However, the program can be expanded in modular fashion to support a wider range of neurodegenerative diseases.

UR - http://www.scopus.com/inward/record.url?scp=85076704913&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85076704913&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0226176

DO - 10.1371/journal.pone.0226176

M3 - Article

C2 - 31846471

AN - SCOPUS:85076704913

SN - 1932-6203

VL - 14

JO - PLOS ONE

JF - PLOS ONE

IS - 12

M1 - e0226176

ER -
