Human leucocyte antigen class I and II imputation in a multiracial population

M. H. Kuniholm, Xianhong Xie, Kathryn Anastos, Xiaonan (Nan) Xue, L. Reimers, A. L. French, S. J. Gange, S. G. Kassaye, A. Kovacs, Tao Wang, B. E. Aouizerat, Howard Strickler

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Human leucocyte antigen (HLA) genes play a central role in response to pathogens and in autoimmunity. Research to understand the effects of HLA genes on health has been limited because HLA genotyping protocols are labour intensive and expensive. Recently, algorithms to impute HLA genotype data using genome-wide association study (GWAS) data have been published. However, imputation accuracy for most of these algorithms was based primarily on training data sets of European ancestry individuals. We considered performance of two HLA-dedicated imputation algorithms – SNP2HLA and HIBAG – in a multiracial population of n = 1587 women with HLA genotyping data by gold standard methods. We first compared accuracy – defined as the percentage of correctly predicted alleles – of HLA-B and HLA-C imputation using SNP2HLA and HIBAG using a breakdown of the data set into an 80% training group and a 20% testing group. Estimates of accuracy for HIBAG were either the same or better than those for SNP2HLA. We then conducted a more thorough test of HIBAG imputation accuracy using five independent 10-fold cross-validation procedures with delineation of ancestry groups using ancestry informative markers. Overall accuracy for HIBAG was 89%. Accuracy by HLA gene was 93% for HLA-A, 84% for HLA-B, 94% for HLA-C, 83% for HLA-DQA1, 91% for HLA-DQB1 and 88% for HLA-DRB1. Accuracy was highest in the African ancestry group (the largest group) and lowest in the Hispanic group (the smallest group). Despite suboptimal imputation accuracy for some HLA gene/ancestry group combinations, the HIBAG algorithm has the advantage of providing posterior estimates of accuracy which enable the investigator to analyse subsets of the population with high predicted (e.g. >95%) imputation accuracy.
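The accuracy measure in the abstract (the percentage of correctly predicted alleles) and the idea of restricting analysis to individuals with a high posterior estimate can be illustrated with a minimal sketch. This is not the authors' code and it does not call SNP2HLA or HIBAG. The function names, the toy genotypes and the posterior values are all hypothetical, and the sketch assumes each genotype is an unordered pair of alleles so that a prediction can score 0, 1 or 2 correct alleles per individual.

```python
from collections import Counter


def allele_matches(true_pair, pred_pair):
    """Count alleles shared by two unordered genotype pairs (0, 1 or 2)."""
    overlap = Counter(true_pair) & Counter(pred_pair)  # multiset intersection
    return sum(overlap.values())


def allele_accuracy(true_genotypes, pred_genotypes):
    """Fraction of correctly predicted alleles across all individuals."""
    if len(true_genotypes) != len(pred_genotypes):
        raise ValueError("true and predicted genotype lists differ in length")
    if not true_genotypes:
        raise ValueError("no genotypes supplied")
    matched = sum(
        allele_matches(t, p) for t, p in zip(true_genotypes, pred_genotypes)
    )
    return matched / (2 * len(true_genotypes))  # two alleles per individual


def filter_by_posterior(true_genotypes, pred_genotypes, posteriors, threshold=0.95):
    """Keep individuals whose posterior estimate exceeds the threshold.

    Returns the retained true and predicted genotypes plus the call rate
    (the fraction of individuals retained).
    """
    keep = [i for i, q in enumerate(posteriors) if q > threshold]
    kept_true = [true_genotypes[i] for i in keep]
    kept_pred = [pred_genotypes[i] for i in keep]
    return kept_true, kept_pred, len(keep) / len(posteriors)


# Hypothetical toy data: four individuals typed at one gene.
true_gt = [
    ("A*02:01", "A*01:01"),
    ("A*03:01", "A*03:01"),
    ("A*24:02", "A*02:01"),
    ("A*68:01", "A*30:01"),
]
pred_gt = [
    ("A*01:01", "A*02:01"),  # both alleles correct, order ignored
    ("A*03:01", "A*02:01"),  # one allele correct
    ("A*24:02", "A*02:01"),  # both alleles correct
    ("A*68:02", "A*30:02"),  # neither allele correct
]
posterior = [0.99, 0.60, 0.97, 0.40]  # hypothetical per-individual estimates

overall = allele_accuracy(true_gt, pred_gt)
kept_true, kept_pred, call_rate = filter_by_posterior(true_gt, pred_gt, posterior)
filtered = allele_accuracy(kept_true, kept_pred)
print(overall, filtered, call_rate)
```

In this toy example, filtering raises accuracy at the cost of discarding half of the individuals, which is the trade-off an investigator would weigh when choosing a posterior threshold.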

Original language: English (US)
Pages (from-to): 369-375
Number of pages: 7
Journal: International Journal of Immunogenetics
Volume: 43
Issue number: 6
DOI: 10.1111/iji.12292
State: Published - Dec 1 2016

Fingerprint

HLA Antigens
Population
Genes
Genome-Wide Association Study
Autoimmunity
Hispanic Americans

ASJC Scopus subject areas

  • Immunology
  • Molecular Biology
  • Genetics
  • Genetics (clinical)

Cite this

Human leucocyte antigen class I and II imputation in a multiracial population. / Kuniholm, M. H.; Xie, Xianhong; Anastos, Kathryn; Xue, Xiaonan (Nan); Reimers, L.; French, A. L.; Gange, S. J.; Kassaye, S. G.; Kovacs, A.; Wang, Tao; Aouizerat, B. E.; Strickler, Howard.

In: International Journal of Immunogenetics, Vol. 43, No. 6, 01.12.2016, p. 369-375.

Kuniholm, M. H. ; Xie, Xianhong ; Anastos, Kathryn ; Xue, Xiaonan (Nan) ; Reimers, L. ; French, A. L. ; Gange, S. J. ; Kassaye, S. G. ; Kovacs, A. ; Wang, Tao ; Aouizerat, B. E. ; Strickler, Howard. / Human leucocyte antigen class I and II imputation in a multiracial population. In: International Journal of Immunogenetics. 2016 ; Vol. 43, No. 6. pp. 369-375.
@article{a4e679f1a61548bdb403351c564ee6bc,
title = "Human leucocyte antigen class I and II imputation in a multiracial population",
abstract = "Human leucocyte antigen (HLA) genes play a central role in response to pathogens and in autoimmunity. Research to understand the effects of HLA genes on health has been limited because HLA genotyping protocols are labour intensive and expensive. Recently, algorithms to impute HLA genotype data using genome-wide association study (GWAS) data have been published. However, imputation accuracy for most of these algorithms was based primarily on training data sets of European ancestry individuals. We considered performance of two HLA-dedicated imputation algorithms – SNP2HLA and HIBAG – in a multiracial population of n = 1587 women with HLA genotyping data by gold standard methods. We first compared accuracy – defined as the percentage of correctly predicted alleles – of HLA-B and HLA-C imputation using SNP2HLA and HIBAG using a breakdown of the data set into an 80{\%} training group and a 20{\%} testing group. Estimates of accuracy for HIBAG were either the same or better than those for SNP2HLA. We then conducted a more thorough test of HIBAG imputation accuracy using five independent 10-fold cross-validation procedures with delineation of ancestry groups using ancestry informative markers. Overall accuracy for HIBAG was 89{\%}. Accuracy by HLA gene was 93{\%} for HLA-A, 84{\%} for HLA-B, 94{\%} for HLA-C, 83{\%} for HLA-DQA1, 91{\%} for HLA-DQB1 and 88{\%} for HLA-DRB1. Accuracy was highest in the African ancestry group (the largest group) and lowest in the Hispanic group (the smallest group). Despite suboptimal imputation accuracy for some HLA gene/ancestry group combinations, the HIBAG algorithm has the advantage of providing posterior estimates of accuracy which enable the investigator to analyse subsets of the population with high predicted (e.g. >95{\%}) imputation accuracy.",
author = "Kuniholm, {M. H.} and Xianhong Xie and Kathryn Anastos and Xue, {Xiaonan (Nan)} and L. Reimers and French, {A. L.} and Gange, {S. J.} and Kassaye, {S. G.} and A. Kovacs and Tao Wang and Aouizerat, {B. E.} and Howard Strickler",
year = "2016",
month = "12",
day = "1",
doi = "10.1111/iji.12292",
language = "English (US)",
volume = "43",
pages = "369--375",
journal = "International Journal of Immunogenetics",
issn = "1744-3121",
publisher = "Wiley-Blackwell",
number = "6"
}

TY - JOUR

T1 - Human leucocyte antigen class I and II imputation in a multiracial population

AU - Kuniholm, M. H.

AU - Xie, Xianhong

AU - Anastos, Kathryn

AU - Xue, Xiaonan (Nan)

AU - Reimers, L.

AU - French, A. L.

AU - Gange, S. J.

AU - Kassaye, S. G.

AU - Kovacs, A.

AU - Wang, Tao

AU - Aouizerat, B. E.

AU - Strickler, Howard

PY - 2016/12/1

Y1 - 2016/12/1

N2 - Human leucocyte antigen (HLA) genes play a central role in response to pathogens and in autoimmunity. Research to understand the effects of HLA genes on health has been limited because HLA genotyping protocols are labour intensive and expensive. Recently, algorithms to impute HLA genotype data using genome-wide association study (GWAS) data have been published. However, imputation accuracy for most of these algorithms was based primarily on training data sets of European ancestry individuals. We considered performance of two HLA-dedicated imputation algorithms – SNP2HLA and HIBAG – in a multiracial population of n = 1587 women with HLA genotyping data by gold standard methods. We first compared accuracy – defined as the percentage of correctly predicted alleles – of HLA-B and HLA-C imputation using SNP2HLA and HIBAG using a breakdown of the data set into an 80% training group and a 20% testing group. Estimates of accuracy for HIBAG were either the same or better than those for SNP2HLA. We then conducted a more thorough test of HIBAG imputation accuracy using five independent 10-fold cross-validation procedures with delineation of ancestry groups using ancestry informative markers. Overall accuracy for HIBAG was 89%. Accuracy by HLA gene was 93% for HLA-A, 84% for HLA-B, 94% for HLA-C, 83% for HLA-DQA1, 91% for HLA-DQB1 and 88% for HLA-DRB1. Accuracy was highest in the African ancestry group (the largest group) and lowest in the Hispanic group (the smallest group). Despite suboptimal imputation accuracy for some HLA gene/ancestry group combinations, the HIBAG algorithm has the advantage of providing posterior estimates of accuracy which enable the investigator to analyse subsets of the population with high predicted (e.g. >95%) imputation accuracy.

UR - http://www.scopus.com/inward/record.url?scp=84995752929&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84995752929&partnerID=8YFLogxK

U2 - 10.1111/iji.12292

DO - 10.1111/iji.12292

M3 - Article

C2 - 27774761

AN - SCOPUS:84995752929

VL - 43

SP - 369

EP - 375

JO - International Journal of Immunogenetics

JF - International Journal of Immunogenetics

SN - 1744-3121

IS - 6

ER -