Viral coinfection analysis using a MinHash toolkit

Eric T. Dawson, Sarah Wagner, David Roberson, Meredith Yeager, Joseph Boland, Erik Garrison, Stephen Chanock, Mark Schiffman, Tina Raine-Bennett, Thomas Lorey, Philip E. Castle, Lisa Mirabello, Richard Durbin

Research output: Contribution to journal > Article

Abstract

Background: Human papillomavirus (HPV) is a common sexually transmitted infection associated with cervical cancer that frequently occurs as a coinfection of types and subtypes. Highly similar sublineages that show over 100-fold differences in cancer risk are not distinguishable in coinfections with current typing methods.

Results: We describe an efficient set of computational tools, rkmh, for analyzing complex mixed infections of related viruses based on sequence data. rkmh makes extensive use of MinHash similarity measures, and includes utilities for removing host DNA and classifying reads by type, lineage, and sublineage. We show that rkmh is capable of assigning reads to their HPV type as well as HPV16 lineage and sublineages.

Conclusions: Accurate read classification enables estimates of percent composition when there are multiple infecting lineages or sublineages. While we demonstrate rkmh for HPV with multiple sequencing technologies, it is also applicable to other mixtures of related sequences.
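To make the idea of MinHash-based read classification concrete, the following is a minimal Python sketch. It is not rkmh's implementation or command-line interface: the function names (`sketch`, `jaccard_estimate`, `classify`, `percent_composition`), the k-mer length, the sketch size, the choice of hash function, and the toy sequences are all assumptions made for illustration, and reverse complements are ignored for brevity. It builds bottom-k MinHash sketches of k-mers, assigns each read to the most similar reference, and reports percent composition over the classified reads.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Return the set of k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=64):
    """Bottom-k MinHash sketch: the `size` smallest 64-bit k-mer hashes."""
    hashes = {
        int.from_bytes(
            hashlib.blake2b(km.encode(), digest_size=8).digest(), "big"
        )
        for km in kmers(seq, k)
    }
    return frozenset(sorted(hashes)[:size])


def jaccard_estimate(a, b, size=64):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    merged = sorted(a | b)[:size]  # smallest hashes of the union
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


def classify(read, ref_sketches, k=5, size=64, min_similarity=0.0):
    """Return the label of the most similar reference, or None."""
    read_sketch = sketch(read, k, size)
    best_label, best_score = None, min_similarity
    for label, ref_sketch in ref_sketches.items():
        score = jaccard_estimate(read_sketch, ref_sketch, size)
        if score > best_score:  # strict: zero similarity stays unclassified
            best_label, best_score = label, score
    return best_label


def percent_composition(labels):
    """Percent of classified reads assigned to each label."""
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy references that differ at two positions.
    refs = {
        "A": "ACGTTGCAAGGCTTACCGATAGTCA",
        "B": "ACGTTGCTAGGCTTACCAATAGTCA",
    }
    ref_sketches = {name: sketch(seq) for name, seq in refs.items()}
    reads = ["TTGCAAGGCTTA", "GCAAGGCTTACC", "ACGTTGCAAGGC", "TTACCAATAGTC"]
    labels = [classify(r, ref_sketches) for r in reads]
    print(labels)
    print(percent_composition(labels))
```

In this toy setup the sketch size exceeds the number of distinct k-mers, so the estimate equals the exact Jaccard similarity; with real genomes and reads the sketch would be a subsample, and the parameters would need tuning for the sequencing technology and the divergence between references.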

Original language: English (US)
Article number: 389
Journal: BMC Bioinformatics
Volume: 20
Issue number: 1
DOIs: https://doi.org/10.1186/s12859-019-2918-y
State: Published - Jul 12 2019

Fingerprint

Coinfection
Viruses
DNA
Infection
Cancer
Chemical analysis
Efficient Set
Sexually Transmitted Diseases
Similarity Measure
Uterine Cervical Neoplasms
Percent
Sequencing
Virus
Fold
Technology
Estimate
Demonstrate
Human
Neoplasms

Keywords

  • Bioinformatics
  • Coinfection
  • HPV
  • Human papillomavirus
  • Kmers
  • MinHash

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Dawson, E. T., Wagner, S., Roberson, D., Yeager, M., Boland, J., Garrison, E., ... Durbin, R. (2019). Viral coinfection analysis using a MinHash toolkit. BMC Bioinformatics, 20(1), [389]. https://doi.org/10.1186/s12859-019-2918-y

@article{a43ef2b665bf4207a905a9d15170d096,
title = "Viral coinfection analysis using a MinHash toolkit",
abstract = "Background: Human papillomavirus (HPV) is a common sexually transmitted infection associated with cervical cancer that frequently occurs as a coinfection of types and subtypes. Highly similar sublineages that show over 100-fold differences in cancer risk are not distinguishable in coinfections with current typing methods. Results: We describe an efficient set of computational tools, rkmh, for analyzing complex mixed infections of related viruses based on sequence data. rkmh makes extensive use of MinHash similarity measures, and includes utilities for removing host DNA and classifying reads by type, lineage, and sublineage. We show that rkmh is capable of assigning reads to their HPV type as well as HPV16 lineage and sublineages. Conclusions: Accurate read classification enables estimates of percent composition when there are multiple infecting lineages or sublineages. While we demonstrate rkmh for HPV with multiple sequencing technologies, it is also applicable to other mixtures of related sequences.",
keywords = "Bioinformatics, Coinfection, HPV, Human papillomavirus, Kmers, MinHash",
author = "Dawson, {Eric T.} and Sarah Wagner and David Roberson and Meredith Yeager and Joseph Boland and Erik Garrison and Stephen Chanock and Mark Schiffman and Tina Raine-Bennett and Thomas Lorey and Castle, {Philip E.} and Lisa Mirabello and Richard Durbin",
year = "2019",
month = "7",
day = "12",
doi = "10.1186/s12859-019-2918-y",
language = "English (US)",
volume = "20",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Viral coinfection analysis using a MinHash toolkit

AU - Dawson, Eric T.

AU - Wagner, Sarah

AU - Roberson, David

AU - Yeager, Meredith

AU - Boland, Joseph

AU - Garrison, Erik

AU - Chanock, Stephen

AU - Schiffman, Mark

AU - Raine-Bennett, Tina

AU - Lorey, Thomas

AU - Castle, Philip E.

AU - Mirabello, Lisa

AU - Durbin, Richard

PY - 2019/7/12

Y1 - 2019/7/12

N2 - Background: Human papillomavirus (HPV) is a common sexually transmitted infection associated with cervical cancer that frequently occurs as a coinfection of types and subtypes. Highly similar sublineages that show over 100-fold differences in cancer risk are not distinguishable in coinfections with current typing methods. Results: We describe an efficient set of computational tools, rkmh, for analyzing complex mixed infections of related viruses based on sequence data. rkmh makes extensive use of MinHash similarity measures, and includes utilities for removing host DNA and classifying reads by type, lineage, and sublineage. We show that rkmh is capable of assigning reads to their HPV type as well as HPV16 lineage and sublineages. Conclusions: Accurate read classification enables estimates of percent composition when there are multiple infecting lineages or sublineages. While we demonstrate rkmh for HPV with multiple sequencing technologies, it is also applicable to other mixtures of related sequences.

KW - Bioinformatics

KW - Coinfection

KW - HPV

KW - Human papillomavirus

KW - Kmers

KW - MinHash

UR - http://www.scopus.com/inward/record.url?scp=85068822499&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068822499&partnerID=8YFLogxK

U2 - 10.1186/s12859-019-2918-y

DO - 10.1186/s12859-019-2918-y

M3 - Article

VL - 20

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 389

ER -