PopCluster: An algorithm to identify genetic variants with ethnicity-dependent effects

Anastasia Gurinovich, Harold Bae, John J. Farrell, Stacy L. Andersen, Stefano Monti, Annibale Puca, Gil Atzmon, Nir Barzilai, Thomas T. Perls, Paola Sebastiani

Research output: Contribution to journal › Article


Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery.

Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects' ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype.
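The abstract names four ingredients (logistic regression, principal component analysis, hierarchical clustering, and a bottom-up tree traversal) without giving implementation details. The following is a minimal illustrative sketch of how such pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`fit_logistic`, `top_pcs`, `heterogeneous_splits`), the use of Ward linkage on the top two principal components, the minimum cluster size, the Wald z-test comparing the SNP slope between sibling clusters, and the simulated data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.stats import norm


def fit_logistic(x, y, n_iter=25, ridge=1e-6):
    """Newton-Raphson fit of y ~ 1 + x; returns (slope, standard error)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        w = p * (1.0 - p)
        # Tiny ridge term (an assumption) keeps the Hessian invertible.
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(2)
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]) + ridge * np.eye(2))
    return beta[1], np.sqrt(cov[1, 1])


def top_pcs(genotypes, k=2):
    """Project subjects onto the top k principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]


def heterogeneous_splits(genotypes, snp, phenotype, min_size=50, alpha=0.05):
    """Visit the cluster tree children-first and report sibling clusters
    whose SNP effects differ (hypothetical simplified procedure)."""
    root = to_tree(linkage(top_pcs(genotypes), method="ward"))
    found = []

    def visit(node):
        # Subtrees too small to contain two testable clusters are skipped.
        if node.is_leaf() or node.get_count() < 2 * min_size:
            return
        visit(node.get_left())
        visit(node.get_right())
        left = np.array(node.get_left().pre_order())
        right = np.array(node.get_right().pre_order())
        if min(len(left), len(right)) < min_size:
            return
        b1, se1 = fit_logistic(snp[left], phenotype[left])
        b2, se2 = fit_logistic(snp[right], phenotype[right])
        z = (b1 - b2) / np.hypot(se1, se2)  # Wald test for equal slopes
        p_value = 2.0 * norm.sf(abs(z))
        if p_value < alpha:
            found.append((left, right, p_value))

    visit(root)
    return found


# Simulated data (an assumption): two populations of 300 subjects with
# different allele frequencies and opposite effects of the tested SNP.
rng = np.random.default_rng(0)
n, m = 300, 200
freq_a, freq_b = rng.uniform(0.1, 0.9, m), rng.uniform(0.1, 0.9, m)
genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(n, m)),
    rng.binomial(2, freq_b, size=(n, m)),
]).astype(float)
snp = rng.binomial(2, 0.5, size=2 * n).astype(float)
effect = np.repeat([1.5, -1.5], n)
prob = 1.0 / (1.0 + np.exp(-effect * (snp - 1.0)))
phenotype = (rng.random(2 * n) < prob).astype(float)

splits = heterogeneous_splits(genotypes, snp, phenotype)
for left, right, p_value in splits:
    print(len(left), len(right), p_value)
```

In this sketch, each reported tuple holds the subject indices of two sibling clusters and the p-value for the difference in their SNP slopes. No multiple-testing correction is applied across tree nodes, which a real analysis would likely need.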

Original language: English (US)
Pages (from-to): 3046-3054
Number of pages: 9
Issue number: 17
State: Published - Sep 1 2019

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Gurinovich, A., Bae, H., Farrell, J. J., Andersen, S. L., Monti, S., Puca, A., Atzmon, G., Barzilai, N., Perls, T. T., & Sebastiani, P. (2019). PopCluster: An algorithm to identify genetic variants with ethnicity-dependent effects. Bioinformatics, 35(17), 3046-3054. https://doi.org/10.1093/bioinformatics/btz017