Efficiencies of methods dealing with missing covariates in regression analysis

Cuiling Wang; Myunghee Cho Paik

Epidemiology & Population Health

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Various approaches have been developed to deal with missing covariate problems in regression analysis when the data are missing at random. Among them, three main non-likelihood approaches are through weighting, imputation and conditional likelihood. The imputation method replaces the missing contribution to the estimating function with its conditional expectation. The inverse probability weighting method weights each observed record by the inverse of the observation probability. The conditional method constructs an unbiased estimating function using only complete records by modelling the conditional mean, given that record is observed. In the literature, the efficiencies of these methods have been compared via simulation. In this paper we compare the asymptotic variances and prove some inequalities. We show that in logistic regression the asymptotic variance of the conditional likelihood method is smaller than or equal to that of the inverse probability weighting method. When the fully observed variables are categorical, the imputation method is more efficient than the inverse probability weighting method given that the observation model is correctly specified. We also show that if the missing mechanism is MCAR and the true known probability of observation is used, the asymptotic variance of the inverse probability weighting method is greater than or equal to that of the complete case analysis. We also conduct simulation studies to compare performances in finite samples and later illustrate the methods using data from a stroke study.
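
As a rough illustration of the inverse probability weighting idea described in the abstract, the following Python sketch simulates a logistic regression in which a covariate x is observed with a probability that depends on the outcome y and a fully observed binary variable z (missing at random). It then compares a complete-case fit with a fit that weights each observed record by the inverse of its observation probability, which is treated as known here. This is not the authors' code: the data-generating model, the coefficient values, the observation probabilities, and the function names `weighted_logistic` and `run_demo` are all assumptions made for illustration.

```python
import numpy as np


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))


def weighted_logistic(X, y, w, n_iter=25):
    """Solve sum_i w_i * x_i * (y_i - expit(x_i' beta)) = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        score = X.T @ (w * (y - p))                      # weighted estimating function
        info = (X * (w * p * (1.0 - p))[:, None]).T @ X  # minus its derivative in beta
        beta = beta + np.linalg.solve(info, score)
    return beta


def run_demo(n=50_000, seed=0):
    """Return (beta_true, beta_complete_case, beta_ipw) for one simulated data set."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([-0.5, 1.0, 0.5])   # assumed values: intercept, x, z

    z = rng.binomial(1, 0.5, size=n)         # fully observed binary covariate
    x = rng.normal(size=n) + 0.5 * z         # covariate subject to missingness
    X = np.column_stack([np.ones(n), x, z])
    y = rng.binomial(1, expit(X @ beta_true))

    # Assumed observation probabilities P(R = 1 | y, z), indexed as [y, z].
    # They depend only on fully observed variables, so x is missing at random.
    pi_table = np.array([[0.8, 0.4],
                         [0.3, 0.9]])
    pi = pi_table[y, z]
    obs = rng.binomial(1, pi).astype(bool)   # True where x is observed

    # Both fits use only the records in which x is observed.
    X_obs, y_obs = X[obs], y[obs]
    beta_cc = weighted_logistic(X_obs, y_obs, np.ones(obs.sum()))
    beta_ipw = weighted_logistic(X_obs, y_obs, 1.0 / pi[obs])
    return beta_true, beta_cc, beta_ipw


if __name__ == "__main__":
    beta_true, beta_cc, beta_ipw = run_demo()
    print("true coefficients:", beta_true)
    print("complete case    :", beta_cc)
    print("IPW              :", beta_ipw)
```

Under these assumed settings the weighted fit should land near the true coefficients, while the complete-case intercept and coefficient on z are shifted, because selection depends jointly on y and z. The sketch only illustrates the weighting step and does not reproduce the efficiency comparisons that are the subject of the paper.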

Original language	English (US)
Pages (from-to)	1169-1192
Number of pages	24
Journal	Statistica Sinica
Volume	16
Issue number	4
State	Published - Oct 2006

Keywords

Efficiency
Estimating equation
Imputation
Inverse probability weighting
Logistic regression
Missing at random
Missing covariate

ASJC Scopus subject areas

Statistics and Probability
Statistics, Probability and Uncertainty

Cite this

@article{934a25b84a6747b3aa27c2b95c1dad44,
title = "Efficiencies of methods dealing with missing covariates in regression analysis",

abstract = "Various approaches have been developed to deal with missing covariate problems in regression analysis when the data are missing at random. Among them, three main non-likelihood approaches are through weighting, imputation and conditional likelihood. The imputation method replaces the missing contribution to the estimating function with its conditional expectation. The inverse probability weighting method weights each observed record by the inverse of the observation probability. The conditional method constructs an unbiased estimating function using only complete records by modelling the conditional mean, given that record is observed. In the literature, the efficiencies of these methods have been compared via simulation. In this paper we compare the asymptotic variances and prove some inequalities. We show that in logistic regression the asymptotic variance of the conditional likelihood method is smaller than or equal to that of the inverse probability weighting method. When the fully observed variables are categorical, the imputation method is more efficient than the inverse probability weighting method given that the observation model is correctly specified. We also show that if the missing mechanism is MCAR and the true known probability of observation is used, the asymptotic variance of the inverse probability weighting method is greater than or equal to that of the complete case analysis. We also conduct simulation studies to compare performances in finite samples and later illustrate the methods using data from a stroke study.",

keywords = "Efficiency, Estimating equation, Imputation, Inverse probability weighting, Logistic regression, Missing at random, Missing covariate",
author = "Wang, Cuiling and Paik, {Myunghee Cho}",
year = "2006",
month = oct,
language = "English (US)",
volume = "16",
pages = "1169--1192",
journal = "Statistica Sinica",
issn = "1017-0405",
publisher = "Institute of Statistical Science",
number = "4",
}

TY - JOUR
T1 - Efficiencies of methods dealing with missing covariates in regression analysis
AU - Wang, Cuiling
AU - Paik, Myunghee Cho
PY - 2006/10
Y1 - 2006/10

AB - Various approaches have been developed to deal with missing covariate problems in regression analysis when the data are missing at random. Among them, three main non-likelihood approaches are through weighting, imputation and conditional likelihood. The imputation method replaces the missing contribution to the estimating function with its conditional expectation. The inverse probability weighting method weights each observed record by the inverse of the observation probability. The conditional method constructs an unbiased estimating function using only complete records by modelling the conditional mean, given that record is observed. In the literature, the efficiencies of these methods have been compared via simulation. In this paper we compare the asymptotic variances and prove some inequalities. We show that in logistic regression the asymptotic variance of the conditional likelihood method is smaller than or equal to that of the inverse probability weighting method. When the fully observed variables are categorical, the imputation method is more efficient than the inverse probability weighting method given that the observation model is correctly specified. We also show that if the missing mechanism is MCAR and the true known probability of observation is used, the asymptotic variance of the inverse probability weighting method is greater than or equal to that of the complete case analysis. We also conduct simulation studies to compare performances in finite samples and later illustrate the methods using data from a stroke study.

KW - Efficiency
KW - Estimating equation
KW - Imputation
KW - Inverse probability weighting
KW - Logistic regression
KW - Missing at random
KW - Missing covariate
UR - http://www.scopus.com/inward/record.url?scp=33846515081&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33846515081&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:33846515081
SN - 1017-0405
VL - 16
SP - 1169
EP - 1192
JO - Statistica Sinica
JF - Statistica Sinica
IS - 4
ER -
