Simpson's paradox - Aggregating and partitioning populations in health disparities of lung cancer patients

P. Fu, A. Panneerselvam, B. Clifford, A. Dowlati, P. C. Ma, G. Zeng, Balazs Halmos, R. S. Leidner

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

It is well known that non-small cell lung cancer (NSCLC) is a heterogeneous group of diseases. Previous studies have demonstrated genetic variation among different ethnic groups in the epidermal growth factor receptor (EGFR) in NSCLC. Research by our group and others has recently shown a lower frequency of EGFR mutations in African Americans with NSCLC, as compared to their White counterparts. In this study, we use our original study data of EGFR pathway genetics in African American NSCLC as an example to illustrate that univariate analyses based on aggregation versus partition of data lead to contradictory results, in order to emphasize the importance of controlling statistical confounding. We further investigate analytic approaches in logistic regression for data with separation, as is the case in our example data set, and apply appropriate methods to identify predictors of EGFR mutation. Our simulation shows that with separated or nearly separated data, penalized maximum likelihood (PML) produces estimates with the smallest bias and approximately maintains the nominal value with statistical power equal to or better than that from maximum likelihood and exact conditional likelihood methods. Application of the PML method in our example data set shows that race and EGFR-FISH are independently significant predictors of EGFR mutation.
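The contrast between aggregated and partitioned analyses described in the abstract can be sketched with a small numerical example. The counts below are hypothetical and are not taken from the study; the group labels `A` and `B` and the stratum names are likewise placeholders. The sketch only shows how a group can have the higher event rate within every stratum and still have the lower rate once the strata are pooled.

```python
from fractions import Fraction

# Hypothetical counts (NOT from the study): (events, total) for each group,
# recorded separately within each stratum of a confounding variable.
strata = {
    "stratum_1": {"A": (8, 10), "B": (70, 100)},
    "stratum_2": {"A": (20, 100), "B": (1, 10)},
}


def stratum_rates(strata):
    """Event rate per group within each stratum (partitioned analysis)."""
    return {
        name: {group: Fraction(events, total)
               for group, (events, total) in groups.items()}
        for name, groups in strata.items()
    }


def aggregated_rates(strata):
    """Event rate per group after pooling all strata (aggregated analysis)."""
    totals = {}
    for groups in strata.values():
        for group, (events, total) in groups.items():
            pooled_events, pooled_total = totals.get(group, (0, 0))
            totals[group] = (pooled_events + events, pooled_total + total)
    return {group: Fraction(events, total)
            for group, (events, total) in totals.items()}


per_stratum = stratum_rates(strata)
overall = aggregated_rates(strata)

for name, rates in per_stratum.items():
    print(name, {group: float(rate) for group, rate in rates.items()})
print("aggregated", {group: float(rate) for group, rate in overall.items()})
```

In these made-up counts the reversal arises because group A is concentrated in the low-rate stratum and group B in the high-rate stratum, which is the kind of confounding the abstract warns about.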

Original language: English (US)
Pages (from-to): 937-948
Number of pages: 12
Journal: Statistical Methods in Medical Research
Volume: 24
Issue number: 6
DOIs: 10.1177/0962280211434179
State: Published - Dec 1 2015
Externally published: Yes

Fingerprint

Simpson's Paradox
Lung Cancer
Growth Factors
Epidermal Growth Factor Receptor
Receptor
Partitioning
Lung Neoplasms
Health
Non-Small Cell Lung Carcinoma
Population
Penalized Maximum Likelihood
Mutation
Cell
African Americans
Predictors
Likelihood Functions
Conditional Likelihood
Genetic Variation
Statistical Power
Confounding

Keywords

  • data with separation
  • exact logistic regression
  • penalized likelihood
  • Simpson's paradox
  • targeted therapy
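The keywords "data with separation" and "penalized likelihood" can be illustrated with a minimal sketch for a 2x2 table. It assumes a Firth-type (Jeffreys prior) penalty, which for a logistic model with a single binary covariate coincides with adding 0.5 to every cell; whether this is the exact penalized variant used in the study is an assumption. The counts and the function name `log_odds_ratio` are hypothetical.

```python
import math


def log_odds_ratio(a, b, c, d, add=0.0):
    """Log odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows are covariate levels (x=1, x=0); columns are (event, no event).
    `add` is a constant added to every cell before the ratio is computed.
    """
    a, b, c, d = (cell + add for cell in (a, b, c, d))
    numerator = a * d
    denominator = b * c
    if numerator == 0 and denominator == 0:
        return math.nan          # ratio is undefined
    if denominator == 0:
        return math.inf          # estimate diverges upward
    if numerator == 0:
        return -math.inf         # estimate diverges downward
    return math.log(numerator / denominator)


# Hypothetical separated data (NOT from the study):
# no events at all among the 8 subjects with x=1, 6 events among 12 with x=0.
table = (0, 8, 6, 6)

# Ordinary maximum likelihood: the zero cell makes the estimate infinite.
ml_estimate = log_odds_ratio(*table)

# Firth-type penalized estimate for this saturated model: add 0.5 per cell.
penalized_estimate = log_odds_ratio(*table, add=0.5)

print("ML log odds ratio:", ml_estimate)
print("Penalized log odds ratio:", penalized_estimate)
```

With more than one covariate the penalized estimate has no such closed form and is obtained iteratively, so this shortcut is only meant to show why the penalty keeps the estimate finite under separation.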

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management

Cite this

Simpson's paradox - Aggregating and partitioning populations in health disparities of lung cancer patients. / Fu, P.; Panneerselvam, A.; Clifford, B.; Dowlati, A.; Ma, P. C.; Zeng, G.; Halmos, Balazs; Leidner, R. S.

In: Statistical Methods in Medical Research, Vol. 24, No. 6, 01.12.2015, p. 937-948.

Fu, P. ; Panneerselvam, A. ; Clifford, B. ; Dowlati, A. ; Ma, P. C. ; Zeng, G. ; Halmos, Balazs ; Leidner, R. S. / Simpson's paradox - Aggregating and partitioning populations in health disparities of lung cancer patients. In: Statistical Methods in Medical Research. 2015 ; Vol. 24, No. 6. pp. 937-948.
@article{50038604784a4dae85f749af11f58814,
title = "Simpson's paradox - Aggregating and partitioning populations in health disparities of lung cancer patients",
abstract = "It is well known that non-small cell lung cancer (NSCLC) is a heterogeneous group of diseases. Previous studies have demonstrated genetic variation among different ethnic groups in the epidermal growth factor receptor (EGFR) in NSCLC. Research by our group and others has recently shown a lower frequency of EGFR mutations in African Americans with NSCLC, as compared to their White counterparts. In this study, we use our original study data of EGFR pathway genetics in African American NSCLC as an example to illustrate that univariate analyses based on aggregation versus partition of data leads to contradictory results, in order to emphasize the importance of controlling statistical confounding. We further investigate analytic approaches in logistic regression for data with separation, as is the case in our example data set, and apply appropriate methods to identify predictors of EGFR mutation. Our simulation shows that with separated or nearly separated data, penalized maximum likelihood (PML) produces estimates with smallest bias and approximately maintains the nominal value with statistical power equal to or better than that from maximum likelihood and exact conditional likelihood methods. Application of the PML method in our example data set shows that race and EGFR-FISH are independently significant predictors of EGFR mutation.",
keywords = "data with separation, exact logistic regression, penalized likelihood, Simpson's paradox, targeted therapy",
author = "P. Fu and A. Panneerselvam and B. Clifford and A. Dowlati and Ma, {P. C.} and G. Zeng and Balazs Halmos and Leidner, {R. S.}",
year = "2015",
month = "12",
day = "1",
doi = "10.1177/0962280211434179",
language = "English (US)",
volume = "24",
pages = "937--948",
journal = "Statistical Methods in Medical Research",
issn = "0962-2802",
publisher = "SAGE Publications Ltd",
number = "6",

}

TY - JOUR

T1 - Simpson's paradox - Aggregating and partitioning populations in health disparities of lung cancer patients

AU - Fu, P.

AU - Panneerselvam, A.

AU - Clifford, B.

AU - Dowlati, A.

AU - Ma, P. C.

AU - Zeng, G.

AU - Halmos, Balazs

AU - Leidner, R. S.

PY - 2015/12/1

Y1 - 2015/12/1

N2 - It is well known that non-small cell lung cancer (NSCLC) is a heterogeneous group of diseases. Previous studies have demonstrated genetic variation among different ethnic groups in the epidermal growth factor receptor (EGFR) in NSCLC. Research by our group and others has recently shown a lower frequency of EGFR mutations in African Americans with NSCLC, as compared to their White counterparts. In this study, we use our original study data of EGFR pathway genetics in African American NSCLC as an example to illustrate that univariate analyses based on aggregation versus partition of data leads to contradictory results, in order to emphasize the importance of controlling statistical confounding. We further investigate analytic approaches in logistic regression for data with separation, as is the case in our example data set, and apply appropriate methods to identify predictors of EGFR mutation. Our simulation shows that with separated or nearly separated data, penalized maximum likelihood (PML) produces estimates with smallest bias and approximately maintains the nominal value with statistical power equal to or better than that from maximum likelihood and exact conditional likelihood methods. Application of the PML method in our example data set shows that race and EGFR-FISH are independently significant predictors of EGFR mutation.

KW - data with separation

KW - exact logistic regression

KW - penalized likelihood

KW - Simpson's paradox

KW - targeted therapy

UR - http://www.scopus.com/inward/record.url?scp=84948402999&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84948402999&partnerID=8YFLogxK

U2 - 10.1177/0962280211434179

DO - 10.1177/0962280211434179

M3 - Article

C2 - 22246415

AN - SCOPUS:84948402999

VL - 24

SP - 937

EP - 948

JO - Statistical Methods in Medical Research

JF - Statistical Methods in Medical Research

SN - 0962-2802

IS - 6

ER -