A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

T2D-Genes Consortium and Taesung Park

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next-generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250 K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants.

Results: In this study, we describe a combined imputation approach that uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach were demonstrated using a reference panel of 848 samples constructed from exome sequencing data from the T2D-GENES consortium and genotype panels of 5,349 samples consisting of exome chip and SNP chip data. The combined approach increased imputation quality by up to 11 % and genomic coverage for rare variants (MAF <1 %) by up to 117.7 %, compared to imputation using the SNP chip alone. We also investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. The best-performing approach was the combination of the study-specific reference panel and the genotype panel of combined data.

Conclusions: Our study demonstrates that combined datasets, including SNP chips and exome chips, enhance both the imputation quality and genomic coverage of rare variants.

Original language: English (US)
Article number: 1109
Journal: BMC Genomics
Volume: 16
Issue number: 1
DOI: 10.1186/s12864-015-2192-y
State: Published - Dec 29 2015

Fingerprint

Exome
Single Nucleotide Polymorphism
Genotype
Sample Size
Technology
Costs and Cost Analysis

Keywords

  • Combined approach
  • Exome chip
  • Imputation
  • Rare variant

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data. / T2D-Genes Consortium and Taesung Park.

In: BMC Genomics, Vol. 16, No. 1, 1109, 29.12.2015.


@article{cdcd56dba16245ea94ef8e337ded162b,
title = "A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data",
abstract = "Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next-generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250 K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants. Results: In this study, we describe a combined imputation approach that uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach were demonstrated using a reference panel of 848 samples constructed from exome sequencing data from the T2D-GENES consortium and genotype panels of 5,349 samples consisting of exome chip and SNP chip data. The combined approach increased imputation quality by up to 11 {\%} and genomic coverage for rare variants (MAF <1 {\%}) by up to 117.7 {\%}, compared to imputation using the SNP chip alone. We also investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. The best-performing approach was the combination of the study-specific reference panel and the genotype panel of combined data. Conclusions: Our study demonstrates that combined datasets, including SNP chips and exome chips, enhance both the imputation quality and genomic coverage of rare variants.",
keywords = "Combined approach, Exome chip, Imputation, Rare variant",
author = "{T2D-Genes Consortium and Taesung Park} and Kim, {Young Jin} and Juyoung Lee and Kim, {Bong Jo} and Taesung Park and Gon{\c{c}}alo Abecasis and Marcio Almeida and David Altshuler and Asimit, {Jennifer L.} and Gil Atzmon and Mathew Barber and Nir Barzilai and Beer, {Nicola L.} and Bell, {Graeme I.} and Jennifer Below and Tom Blackwell and John Blangero and Michael Boehnke and Bowden, {Donald W.} and No{\"e}l Burtt and John Chambers and Han Chen and Peng Chen and Chines, {Peter S.} and Sungkyoung Choi and Claire Churchhouse and Pablo Cingolani and Cornes, {Belinda K.} and Nancy Cox and Day-Williams, {Aaron G.} and Ravindranath Duggirala and Jos{\'e}e Dupuis and Thomas Dyer and Shuang Feng and Juan Fernandez-Tajes and Teresa Ferreira and Fingerlin, {Tasha E.} and Jason Flannick and Jose Florez and Pierre Fontanillas and Frayling, {Timothy M.} and Christian Fuchsberger and Gamazon, {Eric R.} and Kyle Gaulton and Saurabh Ghosh and Benjamin Glaser and Anna Gloyn and Grossman, {Robert L.} and Jason Grundstad and Craig Hanis and Allison Heath",
year = "2015",
month = "12",
day = "29",
doi = "10.1186/s12864-015-2192-y",
language = "English (US)",
volume = "16",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

AU - T2D-Genes Consortium and Taesung Park

AU - Kim, Young Jin

AU - Lee, Juyoung

AU - Kim, Bong Jo

AU - Park, Taesung

AU - Abecasis, Gonçalo

AU - Almeida, Marcio

AU - Altshuler, David

AU - Asimit, Jennifer L.

AU - Atzmon, Gil

AU - Barber, Mathew

AU - Barzilai, Nir

AU - Beer, Nicola L.

AU - Bell, Graeme I.

AU - Below, Jennifer

AU - Blackwell, Tom

AU - Blangero, John

AU - Boehnke, Michael

AU - Bowden, Donald W.

AU - Burtt, Noël

AU - Chambers, John

AU - Chen, Han

AU - Chen, Peng

AU - Chines, Peter S.

AU - Choi, Sungkyoung

AU - Churchhouse, Claire

AU - Cingolani, Pablo

AU - Cornes, Belinda K.

AU - Cox, Nancy

AU - Day-Williams, Aaron G.

AU - Duggirala, Ravindranath

AU - Dupuis, Josée

AU - Dyer, Thomas

AU - Feng, Shuang

AU - Fernandez-Tajes, Juan

AU - Ferreira, Teresa

AU - Fingerlin, Tasha E.

AU - Flannick, Jason

AU - Florez, Jose

AU - Fontanillas, Pierre

AU - Frayling, Timothy M.

AU - Fuchsberger, Christian

AU - Gamazon, Eric R.

AU - Gaulton, Kyle

AU - Ghosh, Saurabh

AU - Glaser, Benjamin

AU - Gloyn, Anna

AU - Grossman, Robert L.

AU - Grundstad, Jason

AU - Hanis, Craig

AU - Heath, Allison

PY - 2015/12/29

Y1 - 2015/12/29

AB - Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next-generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250 K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants. Results: In this study, we describe a combined imputation approach that uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach were demonstrated using a reference panel of 848 samples constructed from exome sequencing data from the T2D-GENES consortium and genotype panels of 5,349 samples consisting of exome chip and SNP chip data. The combined approach increased imputation quality by up to 11 % and genomic coverage for rare variants (MAF <1 %) by up to 117.7 %, compared to imputation using the SNP chip alone. We also investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. The best-performing approach was the combination of the study-specific reference panel and the genotype panel of combined data. Conclusions: Our study demonstrates that combined datasets, including SNP chips and exome chips, enhance both the imputation quality and genomic coverage of rare variants.

KW - Combined approach

KW - Exome chip

KW - Imputation

KW - Rare variant

UR - http://www.scopus.com/inward/record.url?scp=84952882965&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84952882965&partnerID=8YFLogxK

U2 - 10.1186/s12864-015-2192-y

DO - 10.1186/s12864-015-2192-y

M3 - Article

VL - 16

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 1109

ER -