PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects

Anastasia Gurinovich, Harold Bae, John J. Farrell, Stacy L. Andersen, Stefano Monti, Annibale Puca, Gil Atzmon, Nir Barzilai, Thomas T. Perls, Paola Sebastiani

Research output: Contribution to journal › Article

Abstract

MOTIVATION: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery.

RESULTS: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.

AVAILABILITY AND IMPLEMENTATION: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
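The idea of a recursive bottom-up pass over a cluster tree can be illustrated with a small sketch. This is not the PopCluster implementation (which, per the abstract, uses R, PLINK and Eigensoft). It rests on several simplifying assumptions: the tree is written by hand rather than produced by hierarchical clustering of principal components, each leaf holds a 2x2 table of allele counts, and a z-test on log odds ratios stands in for the logistic regression modeling. All function names (`log_odds_ratio`, `effects_differ`, `parse_tree`) and all counts are hypothetical.

```python
import math

def log_odds_ratio(table):
    """Return (log odds ratio, standard error) for a 2x2 allele-count table.

    table = (case_alt, case_ref, control_alt, control_ref).
    Adding 0.5 to every cell keeps the estimate finite if a cell is zero.
    """
    a, b, c, d = (x + 0.5 for x in table)
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def effects_differ(t1, t2, alpha=0.05):
    """Two-sided z-test for a difference between two log odds ratios."""
    l1, s1 = log_odds_ratio(t1)
    l2, s2 = log_odds_ratio(t2)
    z = (l1 - l2) / math.sqrt(s1 ** 2 + s2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_value < alpha

def parse_tree(node):
    """Bottom-up pass over a binary cluster tree.

    A leaf is a tuple of four counts; an internal node is a pair (left, right).
    Returns one pooled count table per group of clusters whose effects
    were not found to differ.
    """
    if isinstance(node[0], (int, float)):  # leaf: a 2x2 count table
        return [node]
    left, right = (parse_tree(child) for child in node)
    # Merge siblings only if each side is a single group and the
    # two groups show no significant difference in effect.
    if len(left) == 1 and len(right) == 1 and not effects_differ(left[0], right[0]):
        return [tuple(x + y for x, y in zip(left[0], right[0]))]
    return left + right

# Hypothetical counts: clusters A and B share a risk effect, C shows the opposite.
cluster_a = (60, 40, 40, 60)
cluster_b = (55, 45, 42, 58)
cluster_c = (40, 60, 60, 40)

groups = parse_tree(((cluster_a, cluster_b), cluster_c))
print(groups)  # two groups: A and B pooled, C on its own
```

In this toy example, pooling all three clusters gives an odds ratio of roughly 1.2, much weaker than the opposing effects seen within the two recovered groups, which is the kind of masking the abstract describes for whole-dataset analyses.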

Original language: English (US)
Pages (from-to): 3046-3054
Number of pages: 9
Journal: Bioinformatics (Oxford, England)
Volume: 35
Issue number: 17
DOIs: https://doi.org/10.1093/bioinformatics/btz017
State: Published - Sep 1 2019

Fingerprint

Genes
Genome-Wide Association Study
Dependent
Bioinformatics
Set theory
Phenotype
Computer programming languages
Principal component analysis
Population
Programming Languages
Logistics
Genome
Availability
Drug Discovery
Principal Component Analysis
Computational Biology
Routine Diagnostic Tests
Gene Frequency
Diagnostic Tests
Cluster Analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Gurinovich, A., Bae, H., Farrell, J. J., Andersen, S. L., Monti, S., Puca, A., ... Sebastiani, P. (2019). PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects. Bioinformatics (Oxford, England), 35(17), 3046-3054. https://doi.org/10.1093/bioinformatics/btz017

PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects. / Gurinovich, Anastasia; Bae, Harold; Farrell, John J.; Andersen, Stacy L.; Monti, Stefano; Puca, Annibale; Atzmon, Gil; Barzilai, Nir; Perls, Thomas T.; Sebastiani, Paola. In: Bioinformatics (Oxford, England), Vol. 35, No. 17, 01.09.2019, p. 3046-3054.

Gurinovich, A, Bae, H, Farrell, JJ, Andersen, SL, Monti, S, Puca, A, Atzmon, G, Barzilai, N, Perls, TT & Sebastiani, P 2019, 'PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects', Bioinformatics (Oxford, England), vol. 35, no. 17, pp. 3046-3054. https://doi.org/10.1093/bioinformatics/btz017

Gurinovich, Anastasia; Bae, Harold; Farrell, John J.; Andersen, Stacy L.; Monti, Stefano; Puca, Annibale; Atzmon, Gil; Barzilai, Nir; Perls, Thomas T.; Sebastiani, Paola. / PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects. In: Bioinformatics (Oxford, England). 2019; Vol. 35, No. 17. pp. 3046-3054.

@article{53d26b63406140688fea5e4938e1052b,
title = "PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects",
abstract = "MOTIVATION: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. RESULTS: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4{\%}) and high true positive rate (>80{\%}) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype. AVAILABILITY AND IMPLEMENTATION: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.",
author = "Anastasia Gurinovich and Harold Bae and Farrell, {John J.} and Andersen, {Stacy L.} and Stefano Monti and Annibale Puca and Gil Atzmon and Nir Barzilai and Perls, {Thomas T.} and Paola Sebastiani",
year = "2019",
month = "9",
day = "1",
doi = "10.1093/bioinformatics/btz017",
language = "English (US)",
volume = "35",
pages = "3046--3054",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "17",

}

TY - JOUR
T1 - PopCluster
T2 - an algorithm to identify genetic variants with ethnicity-dependent effects
AU - Gurinovich, Anastasia
AU - Bae, Harold
AU - Farrell, John J.
AU - Andersen, Stacy L.
AU - Monti, Stefano
AU - Puca, Annibale
AU - Atzmon, Gil
AU - Barzilai, Nir
AU - Perls, Thomas T.
AU - Sebastiani, Paola
PY - 2019/9/1
Y1 - 2019/9/1

AB - MOTIVATION: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. RESULTS: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype. AVAILABILITY AND IMPLEMENTATION: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=85072058253&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072058253&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btz017
DO - 10.1093/bioinformatics/btz017
M3 - Article
VL - 35
SP - 3046
EP - 3054
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 17
ER -