A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data

Benjamin J. Ainscough; Erica K. Barnell; Peter Ronning; Katie M. Campbell; Alex H. Wagner; Todd A. Fehniger; Gavin P. Dunn; Ravindra Uppaluri; Ramaswamy Govindan; Thomas E. Rohan; Malachi Griffith; Elaine R. Mardis; S. Joshua Swamidass; Obi L. Griffith

doi:10.1038/s41588-018-0257-y

Epidemiology & Population Health

Research output: Contribution to journal › Article › peer-review

47 Scopus citations

Abstract

Cancer genomic analysis requires accurate identification of somatic variants in sequencing data. Manual review to refine somatic variant calls is required as a final step after automated processing. However, manual variant refinement is time-consuming, costly, poorly standardized, and non-reproducible. Here, we systematized and standardized somatic variant refinement using a machine learning approach. The final model incorporates 41,000 variants from 440 sequencing cases. This model accurately recapitulated manual refinement labels for three independent testing sets (13,579 variants) and accurately predicted somatic variants confirmed by orthogonal validation sequencing data (212,158 variants). The model improves on manual somatic refinement by reducing bias on calls otherwise subject to high inter-reviewer variability.

Original language	English (US)
Pages (from-to)	1735-1743
Number of pages	9
Journal	Nature Genetics
Volume	50
Issue number	12
DOIs	https://doi.org/10.1038/s41588-018-0257-y
State	Published - Dec 1 2018

ASJC Scopus subject areas

Genetics

Cite this

Ainscough, B. J., Barnell, E. K., Ronning, P., Campbell, K. M., Wagner, A. H., Fehniger, T. A., Dunn, G. P., Uppaluri, R., Govindan, R., Rohan, T. E., Griffith, M., Mardis, E. R., Swamidass, S. J., & Griffith, O. L. (2018). A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data. Nature Genetics, 50(12), 1735-1743. https://doi.org/10.1038/s41588-018-0257-y

Ainscough, BJ, Barnell, EK, Ronning, P, Campbell, KM, Wagner, AH, Fehniger, TA, Dunn, GP, Uppaluri, R, Govindan, R, Rohan, TE, Griffith, M, Mardis, ER, Swamidass, SJ & Griffith, OL 2018, 'A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data', Nature Genetics, vol. 50, no. 12, pp. 1735-1743. https://doi.org/10.1038/s41588-018-0257-y

@article{2d39f37ffac24fd9be455cf2552042d1,

title = "A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data",

abstract = "Cancer genomic analysis requires accurate identification of somatic variants in sequencing data. Manual review to refine somatic variant calls is required as a final step after automated processing. However, manual variant refinement is time-consuming, costly, poorly standardized, and non-reproducible. Here, we systematized and standardized somatic variant refinement using a machine learning approach. The final model incorporates 41,000 variants from 440 sequencing cases. This model accurately recapitulated manual refinement labels for three independent testing sets (13,579 variants) and accurately predicted somatic variants confirmed by orthogonal validation sequencing data (212,158 variants). The model improves on manual somatic refinement by reducing bias on calls otherwise subject to high inter-reviewer variability.",

author = "Ainscough, {Benjamin J.} and Barnell, {Erica K.} and Peter Ronning and Campbell, {Katie M.} and Wagner, {Alex H.} and Fehniger, {Todd A.} and Dunn, {Gavin P.} and Ravindra Uppaluri and Ramaswamy Govindan and Rohan, {Thomas E.} and Malachi Griffith and Mardis, {Elaine R.} and Swamidass, {S. Joshua} and Griffith, {Obi L.}",

note = "Publisher Copyright: {\textcopyright} 2018, The Author(s), under exclusive licence to Springer Nature America, Inc.",

year = "2018",

month = dec,

day = "1",

doi = "10.1038/s41588-018-0257-y",

language = "English (US)",

volume = "50",

pages = "1735--1743",

journal = "Nature Genetics",

issn = "1061-4036",

publisher = "Nature Publishing Group",

number = "12",

}

TY  - JOUR
T1  - A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data
AU  - Ainscough, Benjamin J.
AU  - Barnell, Erica K.
AU  - Ronning, Peter
AU  - Campbell, Katie M.
AU  - Wagner, Alex H.
AU  - Fehniger, Todd A.
AU  - Dunn, Gavin P.
AU  - Uppaluri, Ravindra
AU  - Govindan, Ramaswamy
AU  - Rohan, Thomas E.
AU  - Griffith, Malachi
AU  - Mardis, Elaine R.
AU  - Swamidass, S. Joshua
AU  - Griffith, Obi L.
PY  - 2018/12/1
Y1  - 2018/12/1
N2  - Cancer genomic analysis requires accurate identification of somatic variants in sequencing data. Manual review to refine somatic variant calls is required as a final step after automated processing. However, manual variant refinement is time-consuming, costly, poorly standardized, and non-reproducible. Here, we systematized and standardized somatic variant refinement using a machine learning approach. The final model incorporates 41,000 variants from 440 sequencing cases. This model accurately recapitulated manual refinement labels for three independent testing sets (13,579 variants) and accurately predicted somatic variants confirmed by orthogonal validation sequencing data (212,158 variants). The model improves on manual somatic refinement by reducing bias on calls otherwise subject to high inter-reviewer variability.
AB  - Cancer genomic analysis requires accurate identification of somatic variants in sequencing data. Manual review to refine somatic variant calls is required as a final step after automated processing. However, manual variant refinement is time-consuming, costly, poorly standardized, and non-reproducible. Here, we systematized and standardized somatic variant refinement using a machine learning approach. The final model incorporates 41,000 variants from 440 sequencing cases. This model accurately recapitulated manual refinement labels for three independent testing sets (13,579 variants) and accurately predicted somatic variants confirmed by orthogonal validation sequencing data (212,158 variants). The model improves on manual somatic refinement by reducing bias on calls otherwise subject to high inter-reviewer variability.
UR  - http://www.scopus.com/inward/record.url?scp=85056204832&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85056204832&partnerID=8YFLogxK
U2  - 10.1038/s41588-018-0257-y
DO  - 10.1038/s41588-018-0257-y
M3  - Article
C2  - 30397337
AN  - SCOPUS:85056204832
SN  - 1061-4036
VL  - 50
SP  - 1735
EP  - 1743
JO  - Nature Genetics
JF  - Nature Genetics
IS  - 12
ER  - 