GAPscreener

An automatic tool for screening human genetic association literature in PubMed using the support vector machine technique

Wei Yu, Melinda Clyne, Siobhan M. Dolan, Ajay Yesupriya, Anja Wulf, Tiebin Liu, Muin J. Khoury, Marta Gwinn

Research output: Contribution to journal (Article)

33 Citations (Scopus)

Abstract

Background: Synthesis of data from published human genetic association studies is a critical step in translating human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of the labor-intensive, time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies.

Results: The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z-score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced the number of abstracts requiring individual review by the database curator by about 90%. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, GAPscreener both reduced effort and improved accuracy.

Conclusion: GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge.
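The reported precision (31.9%) can look surprisingly low next to the recall (97.5%) and specificity (98.3%). The sketch below is an illustration, not part of the published method: it shows how the three figures relate through the share of relevant abstracts in the screened set. It assumes all three figures were measured on the same test set, which the abstract does not state explicitly. The function names are hypothetical.

```python
def precision_from_rates(recall, specificity, prevalence):
    """Precision implied by recall, specificity and the share of true positives."""
    true_pos = recall * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_prevalence(recall, specificity, precision):
    """Solve the precision formula for prevalence.

    precision = r*p / (r*p + f*(1 - p)), with f = 1 - specificity,
    rearranges to p = precision*f / (r*(1 - precision) + precision*f).
    """
    false_pos_rate = 1.0 - specificity
    return precision * false_pos_rate / (
        recall * (1.0 - precision) + precision * false_pos_rate
    )


if __name__ == "__main__":
    # Figures quoted in the abstract; the same-test-set reading is an assumption.
    p = implied_prevalence(recall=0.975, specificity=0.983, precision=0.319)
    print(f"implied share of relevant abstracts: {p:.2%}")
```

Under that assumption, the figures are consistent with relevant abstracts making up roughly 0.8% of the screened set. When true positives are that rare, even a 1.7% false-positive rate produces more false hits than true ones, so a precision near 32% is what the other two figures would lead one to expect.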

Original language: English (US)
Article number: 205
Journal: BMC Bioinformatics
Volume: 9
DOI: 10.1186/1471-2105-9-205
State: Published - Apr 22 2008

Fingerprint

Genetic Association
Medical Genetics
PubMed
Screening
Support vector machines
Support Vector Machine
Association reactions
Genetic Association Studies
Software
Specificity
Databases
Query
Information Storage and Retrieval
Premature Birth
Human Genome
Machine Tool
Graphical User Interface
Graphical user interfaces
Software Tools
Machine tools

ASJC Scopus subject areas

  • Medicine(all)
  • Structural Biology
  • Applied Mathematics

Cite this

GAPscreener : An automatic tool for screening human genetic association literature in PubMed using the support vector machine technique. / Yu, Wei; Clyne, Melinda; Dolan, Siobhan M.; Yesupriya, Ajay; Wulf, Anja; Liu, Tiebin; Khoury, Muin J.; Gwinn, Marta.

In: BMC Bioinformatics, Vol. 9, 205, 22.04.2008.

Yu, Wei ; Clyne, Melinda ; Dolan, Siobhan M. ; Yesupriya, Ajay ; Wulf, Anja ; Liu, Tiebin ; Khoury, Muin J. ; Gwinn, Marta. / GAPscreener : An automatic tool for screening human genetic association literature in PubMed using the support vector machine technique. In: BMC Bioinformatics. 2008 ; Vol. 9.
@article{5e68ca2eb26741d2b6c48ad51475c853,
title = "GAPscreener: An automatic tool for screening human genetic association literature in PubMed using the support vector machine technique",
author = "Wei Yu and Melinda Clyne and Dolan, {Siobhan M.} and Ajay Yesupriya and Anja Wulf and Tiebin Liu and Khoury, {Muin J.} and Marta Gwinn",
year = "2008",
month = "4",
day = "22",
doi = "10.1186/1471-2105-9-205",
language = "English (US)",
volume = "9",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - GAPscreener

T2 - An automatic tool for screening human genetic association literature in PubMed using the support vector machine technique

AU - Yu, Wei

AU - Clyne, Melinda

AU - Dolan, Siobhan M.

AU - Yesupriya, Ajay

AU - Wulf, Anja

AU - Liu, Tiebin

AU - Khoury, Muin J.

AU - Gwinn, Marta

PY - 2008/4/22

Y1 - 2008/4/22

UR - http://www.scopus.com/inward/record.url?scp=44249093513&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=44249093513&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-9-205

DO - 10.1186/1471-2105-9-205

M3 - Article

VL - 9

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 205

ER -