Binary regression with differentially misclassified response and exposure variables

Li Tang, Robert H. Lyles, Caroline C. King, David D. Celentano, Yungtai Lo

Research output: Contribution to journal > Article

3 Citations (Scopus)

Abstract

Misclassification is a long-standing statistical problem in epidemiology. In many real studies, either an exposure or a response variable or both may be misclassified. As such, potential threats to the validity of the analytic results (e.g., estimates of odds ratios) that stem from misclassification are widely discussed in the literature. Much of the discussion has been restricted to the nondifferential case, in which misclassification rates for a particular variable are assumed not to depend on other variables. However, complex differential misclassification patterns are common in practice, as we illustrate here using bacterial vaginosis and Trichomoniasis data from the HIV Epidemiology Research Study (HERS). Therefore, clear illustrations of valid and accessible methods that deal with complex misclassification are still in high demand. We formulate a maximum likelihood (ML) framework that allows flexible modeling of misclassification in both the response and a key binary exposure variable, while adjusting for other covariates via logistic regression. The approach emphasizes the use of internal validation data in order to evaluate the underlying misclassification mechanisms. Data-driven simulations show that the proposed ML analysis outperforms less flexible approaches that fail to appropriately account for complex misclassification patterns. The value and validity of the method are further demonstrated through a comprehensive analysis of the HERS example data.
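
As a rough illustration of the kind of likelihood the abstract describes, the sketch below covers a deliberately simplified case: only the binary response is misclassified, the binary exposure x is assumed to be measured without error, and sensitivity and specificity are allowed to differ by exposure level (a simple differential pattern). It is not the authors' model, which also handles exposure misclassification and covariate adjustment. All function and parameter names (`observed_prob`, `log_likelihood`, `sens_by_x`, `spec_by_x`) are hypothetical. Main-study records contribute the probability of the error-prone response Y*, while internal validation records contribute the joint probability of the true response Y and Y*.

```python
import math


def expit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def observed_prob(p_true, sens, spec):
    """P(Y* = 1) implied by the true P(Y = 1), sensitivity and specificity."""
    return sens * p_true + (1.0 - spec) * (1.0 - p_true)


def log_likelihood(beta0, beta1, sens_by_x, spec_by_x, main, validation):
    """Log-likelihood for a main study plus an internal validation subsample.

    main:       iterable of (x, y_star) records, true response unobserved.
    validation: iterable of (x, y, y_star) records, both responses observed.
    sens_by_x, spec_by_x: dicts keyed by exposure level (0 or 1); letting the
        values differ across x is what makes the misclassification
        differential in this simplified sketch.
    """
    ll = 0.0
    # Main-study records: only the error-prone response is available.
    for x, y_star in main:
        p = expit(beta0 + beta1 * x)
        q = observed_prob(p, sens_by_x[x], spec_by_x[x])
        ll += math.log(q if y_star == 1 else 1.0 - q)
    # Validation records: P(Y = y | x) * P(Y* = y_star | Y = y, x).
    for x, y, y_star in validation:
        p = expit(beta0 + beta1 * x)
        if y == 1:
            p_y = p
            p_star = sens_by_x[x] if y_star == 1 else 1.0 - sens_by_x[x]
        else:
            p_y = 1.0 - p
            p_star = 1.0 - spec_by_x[x] if y_star == 1 else spec_by_x[x]
        ll += math.log(p_y * p_star)
    return ll
```

In practice, a function like this would be maximized numerically over the regression coefficients and the misclassification parameters jointly. Setting the sensitivities (and specificities) equal across exposure levels recovers the nondifferential special case.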

Original language: English (US)
Pages (from-to): 1605-1620
Number of pages: 16
Journal: Statistics in Medicine
Volume: 34
Issue number: 9
DOI: 10.1002/sim.6440
State: Published - Apr 30 2015

Keywords

  • Likelihood
  • Logistic regressions
  • Misclassification
  • Odds ratio

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Binary regression with differentially misclassified response and exposure variables. / Tang, Li; Lyles, Robert H.; King, Caroline C.; Celentano, David D.; Lo, Yungtai.

In: Statistics in Medicine, Vol. 34, No. 9, 30.04.2015, p. 1605-1620.

Tang, Li ; Lyles, Robert H. ; King, Caroline C. ; Celentano, David D. ; Lo, Yungtai. / Binary regression with differentially misclassified response and exposure variables. In: Statistics in Medicine. 2015 ; Vol. 34, No. 9. pp. 1605-1620.
@article{04a4528eac1548fc891912e512b61d36,
title = "Binary regression with differentially misclassified response and exposure variables",
keywords = "Likelihood, Logistic regressions, Misclassification, Odds ratio",
author = "Li Tang and Lyles, {Robert H.} and King, {Caroline C.} and Celentano, {David D.} and Yungtai Lo",
year = "2015",
month = "4",
day = "30",
doi = "10.1002/sim.6440",
language = "English (US)",
volume = "34",
pages = "1605--1620",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "9",

}

TY - JOUR

T1 - Binary regression with differentially misclassified response and exposure variables

AU - Tang, Li

AU - Lyles, Robert H.

AU - King, Caroline C.

AU - Celentano, David D.

AU - Lo, Yungtai

PY - 2015/4/30

Y1 - 2015/4/30

KW - Likelihood

KW - Logistic regressions

KW - Misclassification

KW - Odds ratio

UR - http://www.scopus.com/inward/record.url?scp=84926419719&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84926419719&partnerID=8YFLogxK

U2 - 10.1002/sim.6440

DO - 10.1002/sim.6440

M3 - Article

VL - 34

SP - 1605

EP - 1620

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 9

ER -