Efficiencies of methods dealing with missing covariates in regression analysis

Cuiling Wang, Myunghee Cho Paik

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Various approaches have been developed to deal with missing covariate problems in regression analysis when the data are missing at random. Among them, the three main non-likelihood approaches are weighting, imputation and conditional likelihood. The imputation method replaces the missing contribution to the estimating function with its conditional expectation. The inverse probability weighting method weights each observed record by the inverse of the observation probability. The conditional method constructs an unbiased estimating function using only complete records by modelling the conditional mean, given that the record is observed. In the literature, the efficiencies of these methods have been compared via simulation. In this paper we compare the asymptotic variances and prove some inequalities. We show that in logistic regression the asymptotic variance of the conditional likelihood method is smaller than or equal to that of the inverse probability weighting method. When the fully observed variables are categorical, the imputation method is more efficient than the inverse probability weighting method, provided that the observation model is correctly specified. We also show that if the data are missing completely at random (MCAR) and the true known probability of observation is used, the asymptotic variance of the inverse probability weighting method is greater than or equal to that of the complete case analysis. We also conduct simulation studies to compare finite-sample performance and illustrate the methods using data from a stroke study.
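
The inverse probability weighting idea described in the abstract can be sketched in a few lines of Python (NumPy only). This is an illustration under stated assumptions, not the authors' code: the simulation design, the coefficient values in `TRUE_BETA`, and all function names are hypothetical. It fits a logistic regression to the complete records twice, once unweighted (complete case analysis) and once with each complete record weighted by the inverse of its observation probability. The imputation and conditional likelihood methods are not shown.

```python
import numpy as np

# Assumed (intercept, x, z) coefficients for the illustration only.
TRUE_BETA = np.array([-0.5, 1.0, 0.5])


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))


def simulate(n, beta, rng):
    """Simulate a logistic model in which covariate x is missing at random.

    z and y are always observed; the probability that x is observed
    depends only on (y, z), which is an assumed observation model.
    """
    z = rng.binomial(1, 0.5, size=n)            # fully observed covariate
    x = rng.normal(0.5 * z, 1.0)                # covariate subject to missingness
    design = np.column_stack([np.ones(n), x, z])
    y = rng.binomial(1, expit(design @ beta))   # binary outcome
    pi = expit(0.5 + y - z)                     # true observation probability
    r = rng.binomial(1, pi)                     # r = 1 means x is observed
    return design, y, r, pi


def fit_logistic(design, y, weights, n_iter=25):
    """Solve sum_i w_i x_i (y_i - p_i) = 0 by Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ beta)
        score = design.T @ (weights * (y - p))
        info = (design * (weights * p * (1.0 - p))[:, None]).T @ design
        beta = beta + np.linalg.solve(info, score)
    return beta


def complete_case_and_ipw(design, y, r, pi):
    """Return (complete case, inverse probability weighted) estimates."""
    seen = r == 1
    cc = fit_logistic(design[seen], y[seen], np.ones(seen.sum()))
    ipw = fit_logistic(design[seen], y[seen], 1.0 / pi[seen])
    return cc, ipw


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    design, y, r, pi = simulate(20000, TRUE_BETA, rng)
    cc, ipw = complete_case_and_ipw(design, y, r, pi)
    print("complete case:", cc)
    print("IPW          :", ipw)
```

The weights here use the true observation probabilities, which are known only because the data are simulated; in practice they would be estimated from an observation model fitted to the fully observed variables.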

Original language: English (US)
Pages (from-to): 1169-1192
Number of pages: 24
Journal: Statistica Sinica
Volume: 16
Issue number: 4
State: Published - Oct 2006

Fingerprint

Missing Covariates
Regression Analysis
Inverse Probability Weighting
Imputation
Asymptotic Variance
Conditional Likelihood
Estimating Function
Covariates
Missing at Random
Likelihood Methods
Conditional Expectation
Logistic Regression
Stroke
Categorical
Weighting
Simulation Study

Keywords

  • Efficiency
  • Estimating equation
  • Imputation
  • Inverse probability weighting
  • Logistic regression
  • Missing at random
  • Missing covariate

ASJC Scopus subject areas

  • Mathematics(all)
  • Statistics and Probability

Cite this

Efficiencies of methods dealing with missing covariates in regression analysis. / Wang, Cuiling; Paik, Myunghee Cho.

In: Statistica Sinica, Vol. 16, No. 4, 10.2006, p. 1169-1192.

@article{934a25b84a6747b3aa27c2b95c1dad44,
title = "Efficiencies of methods dealing with missing covariates in regression analysis",
abstract = "Various approaches have been developed to deal with missing covariate problems in regression analysis when the data are missing at random. Among them, three main non-likelihood approaches are through weighting, imputation and conditional likelihood. The imputation method replaces the missing contribution to the estimating function with its conditional expectation. The inverse probability weighting method weights each observed record by the inverse of the observation probability. The conditional method constructs an unbiased estimating function using only complete records by modelling the conditional mean, given that record is observed. In the literature, the efficiencies of these methods have been compared via simulation. In this paper we compare the asymptotic variances and prove some inequalities. We show that in logistic regression the asymptotic variance of the conditional likelihood method is smaller than or equal to that of the inverse probability weighting method. When the fully observed variables are categorical, the imputation method is more efficient than the inverse probability weighting method given that the observation model is correctly specified. We also show that if the missing mechanism is MCAR and the true known probability of observation is used, the asymptotic variance of the inverse probability weighting method is greater than or equal to that of the complete case analysis. We also conduct simulation studies to compare performances in finite samples and later illustrate the methods using data from a stroke study.",
keywords = "Efficiency, Estimating equation, Imputation, Inverse probability weighting, Logistic regression, Missing at random, Missing covariate",
author = "Wang, Cuiling and Paik, {Myunghee Cho}",
year = "2006",
month = "10",
language = "English (US)",
volume = "16",
pages = "1169--1192",
journal = "Statistica Sinica",
issn = "1017-0405",
publisher = "Institute of Statistical Science",
number = "4",
}

TY - JOUR

T1 - Efficiencies of methods dealing with missing covariates in regression analysis

AU - Wang, Cuiling

AU - Paik, Myunghee Cho

PY - 2006/10

Y1 - 2006/10

N2 - Various approaches have been developed to deal with missing covariate problems in regression analysis when the data are missing at random. Among them, three main non-likelihood approaches are through weighting, imputation and conditional likelihood. The imputation method replaces the missing contribution to the estimating function with its conditional expectation. The inverse probability weighting method weights each observed record by the inverse of the observation probability. The conditional method constructs an unbiased estimating function using only complete records by modelling the conditional mean, given that record is observed. In the literature, the efficiencies of these methods have been compared via simulation. In this paper we compare the asymptotic variances and prove some inequalities. We show that in logistic regression the asymptotic variance of the conditional likelihood method is smaller than or equal to that of the inverse probability weighting method. When the fully observed variables are categorical, the imputation method is more efficient than the inverse probability weighting method given that the observation model is correctly specified. We also show that if the missing mechanism is MCAR and the true known probability of observation is used, the asymptotic variance of the inverse probability weighting method is greater than or equal to that of the complete case analysis. We also conduct simulation studies to compare performances in finite samples and later illustrate the methods using data from a stroke study.

AB - Various approaches have been developed to deal with missing covariate problems in regression analysis when the data are missing at random. Among them, three main non-likelihood approaches are through weighting, imputation and conditional likelihood. The imputation method replaces the missing contribution to the estimating function with its conditional expectation. The inverse probability weighting method weights each observed record by the inverse of the observation probability. The conditional method constructs an unbiased estimating function using only complete records by modelling the conditional mean, given that record is observed. In the literature, the efficiencies of these methods have been compared via simulation. In this paper we compare the asymptotic variances and prove some inequalities. We show that in logistic regression the asymptotic variance of the conditional likelihood method is smaller than or equal to that of the inverse probability weighting method. When the fully observed variables are categorical, the imputation method is more efficient than the inverse probability weighting method given that the observation model is correctly specified. We also show that if the missing mechanism is MCAR and the true known probability of observation is used, the asymptotic variance of the inverse probability weighting method is greater than or equal to that of the complete case analysis. We also conduct simulation studies to compare performances in finite samples and later illustrate the methods using data from a stroke study.

KW - Efficiency

KW - Estimating equation

KW - Imputation

KW - Inverse probability weighting

KW - Logistic regression

KW - Missing at random

KW - Missing covariate

UR - http://www.scopus.com/inward/record.url?scp=33846515081&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846515081&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:33846515081

VL - 16

SP - 1169

EP - 1192

JO - Statistica Sinica

JF - Statistica Sinica

SN - 1017-0405

IS - 4

ER -