Sequence imputation of HPV16 genomes for genetic association studies

Benjamin Smith, Zigui Chen, Laura Reimers, Koenraad van Doorslaer, Mark Schiffman, Rob DeSalle, Rolando Herrero, Kai Yu, Sholom Wacholder, Tao Wang, Robert D. Burk

Research output: Contribution to journal › Article

53 Citations (Scopus)

Abstract

Background: Human Papillomavirus type 16 (HPV16) causes over half of all cervical cancers, and some HPV16 variants are more oncogenic than others. The genetic basis for the extraordinary oncogenic properties of HPV16 compared to other HPVs is unknown. In addition, we know neither which nucleotides vary across and within HPV types and lineages, nor which of the single nucleotide polymorphisms (SNPs) determine oncogenicity. Methods: A reference set of 62 HPV16 complete genome sequences was established and used to examine patterns of evolutionary relatedness amongst variants using a pairwise identity heatmap and HPV16 phylogeny. A BLAST-based algorithm was developed to impute complete genome data from partial sequence information using the reference database. To interrogate the oncogenic risk of determined and imputed HPV16 SNPs, odds ratios for each SNP were calculated in a case-control viral genome-wide association study (VWAS) using biopsy-confirmed high-grade cervical neoplasia and self-limited HPV16 infections from Guanacaste, Costa Rica. Results: HPV16 variants display evolutionarily stable lineages that contain conserved diagnostic SNPs. The imputation algorithm indicated that an average of 97.5 ± 1.03% of SNPs could be accurately imputed. The VWAS revealed specific HPV16 viral SNPs associated with variant lineages and elevated odds ratios; however, individual causal SNPs could not be distinguished with certainty due to the nature of HPV evolution. Conclusions: Conserved and lineage-specific SNPs can be imputed with a high degree of accuracy from limited viral polymorphic data due to the lack of recombination and the stochastic mechanism of variation accumulation in the HPV genome. However, determining the role of novel variants or non-lineage-specific SNPs by VWAS will require direct sequence analysis.
The investigation of patterns of genetic variation and the identification of diagnostic SNPs for lineages of HPV16 variants provide a valuable resource for future studies of HPV16 pathogenicity.
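The abstract names three computations: reference-based imputation of unobserved positions, an accuracy score for imputed SNPs, and a per-SNP odds ratio from case-control counts. The Python sketch below illustrates them under stated assumptions and is not the authors' BLAST-based algorithm. It assumes the query's observed bases are already placed on a shared alignment coordinate system. It picks the reference genome with the highest identity at those positions (a plain-identity stand-in for a BLAST search) and copies that reference's bases into the gaps. It then scores accuracy as the fraction of SNP positions recovered, and computes a 2×2 odds ratio with a Haldane-Anscombe 0.5 correction when a cell is zero. All function names and toy sequences are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: every genome is an aligned string of equal length,
# so position i refers to the same site in every sequence.
Genome = str


def identity(query: Dict[int, str], ref: Genome) -> float:
    """Fraction of the query's observed positions whose base matches ref."""
    if not query:
        return 0.0
    matches = sum(1 for pos, base in query.items() if ref[pos] == base)
    return matches / len(query)


def impute(query: Dict[int, str], references: List[Genome]) -> Genome:
    """Fill every unobserved position from the most similar reference.

    Simplified stand-in for a BLAST-based search: similarity is plain
    identity over the observed positions; ties go to the first reference.
    """
    best = max(references, key=lambda ref: identity(query, ref))
    # Keep observed bases as-is; copy the best reference's base elsewhere.
    return "".join(query.get(i, best[i]) for i in range(len(best)))


def imputation_accuracy(imputed: Genome, truth: Genome,
                        snp_positions: List[int]) -> float:
    """Share of SNP positions where the imputed base equals the true base."""
    correct = sum(1 for p in snp_positions if imputed[p] == truth[p])
    return correct / len(snp_positions)


def odds_ratio(case_alt: int, case_ref: int,
               ctrl_alt: int, ctrl_ref: int) -> float:
    """Odds ratio for one SNP from a 2x2 case-control table.

    Adds 0.5 to every cell if any cell is zero (Haldane-Anscombe correction).
    """
    cells = [case_alt, case_ref, ctrl_alt, ctrl_ref]
    if 0 in cells:
        cells = [x + 0.5 for x in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


if __name__ == "__main__":
    refs = ["ACGTACGTAC", "ACGTTCGAAC", "TCGTACGTAG"]
    partial = {0: "A", 4: "T", 7: "A"}  # bases observed at three sites
    full = impute(partial, refs)
    print(full)  # ACGTTCGAAC (second reference matches all three sites)
    print(imputation_accuracy(full, "ACGTTCGAAG", [0, 4, 7, 9]))  # 0.75
    print(odds_ratio(30, 10, 20, 40))  # 6.0
```

Because HPV16 lineages do not recombine, the best-matching reference tends to carry the query's lineage-specific SNPs, which is the intuition the abstract gives for high imputation accuracy. A SNP private to the query (position 9 in the toy example) cannot be recovered this way, consistent with the abstract's point that novel variants require direct sequencing.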

Original language: English (US)
Article number: e21375
Journal: PLoS One
Volume: 6
Issue number: 6
DOI: 10.1371/journal.pone.0021375
State: Published - 2011

Fingerprint

Human papillomavirus 16
Papillomaviridae
Genetic Association Studies
Polymorphism
Single Nucleotide Polymorphism
Nucleotides
Genes
Genome
Viral Genome
Genome-Wide Association Study
Odds Ratio
Costa Rica
carcinogenicity
uterine cervical neoplasms
Papillomavirus Infections
cervix
Biopsy

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Smith, B., Chen, Z., Reimers, L., van Doorslaer, K., Schiffman, M., DeSalle, R., ... Burk, R. D. (2011). Sequence imputation of HPV16 genomes for genetic association studies. PLoS One, 6(6), [e21375]. https://doi.org/10.1371/journal.pone.0021375

Sequence imputation of HPV16 genomes for genetic association studies. / Smith, Benjamin; Chen, Zigui; Reimers, Laura; van Doorslaer, Koenraad; Schiffman, Mark; DeSalle, Rob; Herrero, Rolando; Yu, Kai; Wacholder, Sholom; Wang, Tao; Burk, Robert D.

In: PLoS One, Vol. 6, No. 6, e21375, 2011.

Smith, B, Chen, Z, Reimers, L, van Doorslaer, K, Schiffman, M, DeSalle, R, Herrero, R, Yu, K, Wacholder, S, Wang, T & Burk, RD 2011, 'Sequence imputation of HPV16 genomes for genetic association studies', PLoS One, vol. 6, no. 6, e21375. https://doi.org/10.1371/journal.pone.0021375
Smith B, Chen Z, Reimers L, van Doorslaer K, Schiffman M, DeSalle R et al. Sequence imputation of HPV16 genomes for genetic association studies. PLoS One. 2011;6(6). e21375. https://doi.org/10.1371/journal.pone.0021375
Smith, Benjamin ; Chen, Zigui ; Reimers, Laura ; van Doorslaer, Koenraad ; Schiffman, Mark ; DeSalle, Rob ; Herrero, Rolando ; Yu, Kai ; Wacholder, Sholom ; Wang, Tao ; Burk, Robert D. / Sequence imputation of HPV16 genomes for genetic association studies. In: PLoS One. 2011 ; Vol. 6, No. 6.
@article{b00e774e6fa54f8e85ca189cabc81b21,
title = "Sequence imputation of HPV16 genomes for genetic association studies",
author = "Benjamin Smith and Zigui Chen and Laura Reimers and {van Doorslaer}, Koenraad and Mark Schiffman and Rob DeSalle and Rolando Herrero and Kai Yu and Sholom Wacholder and Tao Wang and Burk, {Robert D.}",
year = "2011",
doi = "10.1371/journal.pone.0021375",
language = "English (US)",
volume = "6",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "6",

}

TY - JOUR

T1 - Sequence imputation of HPV16 genomes for genetic association studies

AU - Smith, Benjamin

AU - Chen, Zigui

AU - Reimers, Laura

AU - van Doorslaer, Koenraad

AU - Schiffman, Mark

AU - DeSalle, Rob

AU - Herrero, Rolando

AU - Yu, Kai

AU - Wacholder, Sholom

AU - Wang, Tao

AU - Burk, Robert D.

PY - 2011

Y1 - 2011

UR - http://www.scopus.com/inward/record.url?scp=79959563617&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959563617&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0021375

DO - 10.1371/journal.pone.0021375

M3 - Article

C2 - 21731721

AN - SCOPUS:79959563617

VL - 6

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 6

M1 - e21375

ER -