### Abstract

Various approaches have been developed to deal with missing covariate problems in regression analysis when the data are missing at random. Among them, the three main non-likelihood approaches are weighting, imputation and conditional likelihood. The imputation method replaces the missing contribution to the estimating function with its conditional expectation. The inverse probability weighting method weights each observed record by the inverse of the observation probability. The conditional method constructs an unbiased estimating function using only complete records by modelling the conditional mean, given that the record is observed. In the literature, the efficiencies of these methods have been compared via simulation. In this paper we compare the asymptotic variances and prove some inequalities. We show that in logistic regression the asymptotic variance of the conditional likelihood method is smaller than or equal to that of the inverse probability weighting method. When the fully observed variables are categorical, the imputation method is more efficient than the inverse probability weighting method given that the observation model is correctly specified. We also show that if the missing mechanism is MCAR and the true known probability of observation is used, the asymptotic variance of the inverse probability weighting method is greater than or equal to that of the complete case analysis. We also conduct simulation studies to compare performances in finite samples and later illustrate the methods using data from a stroke study.
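As an illustration of the weighting idea described in the abstract, the following is a minimal sketch, not the authors' simulation design. It assumes a logistic model with one covariate `x` subject to missingness and one fully observed binary covariate `z`, and it assumes the true observation probabilities are known and depend only on the outcome and `z`. All names (`fit_weighted_logistic`, `simulate`, the coefficient values and the observation probabilities 0.4 and 0.8) are hypothetical choices made for this example.

```python
import numpy as np


def fit_weighted_logistic(design, y, w, n_iter=25):
    """Solve the weighted logistic score equation sum_i w_i x_i (y_i - p_i) = 0
    by Newton-Raphson. Rows with weight 0 do not contribute."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        score = design.T @ (w * (y - p))
        info = (design * (w * p * (1.0 - p))[:, None]).T @ design
        beta = beta + np.linalg.solve(info, score)
    return beta


def simulate(n=50_000, seed=0):
    """Generate data where x is observed (r = 1) with a probability that
    depends on the outcome y and the fully observed covariate z only (MAR)."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, size=n)
    x = rng.normal(size=n)
    design = np.column_stack([np.ones(n), x, z])
    beta_true = np.array([-0.5, 1.0, 0.5])  # assumed values for illustration
    p = 1.0 / (1.0 + np.exp(-design @ beta_true))
    y = rng.binomial(1, p)
    pi = np.where(y == z, 0.4, 0.8)  # assumed known observation probability
    r = rng.binomial(1, pi)
    return design, y, r, pi, beta_true


design, y, r, pi, beta_true = simulate()

# Complete case analysis: every observed record gets weight 1.
beta_cc = fit_weighted_logistic(design, y, r.astype(float))

# Inverse probability weighting: each observed record gets weight 1 / pi.
beta_ipw = fit_weighted_logistic(design, y, r / pi)

print("true         :", beta_true)
print("complete case:", np.round(beta_cc, 3))
print("IPW          :", np.round(beta_ipw, 3))
```

Under these assumptions the complete case fit should recover the coefficient of `x` but not the intercept or the coefficient of `z`, because the observation probability shifts the logit by a term that depends on `z`; the weighted fit should be close to all three true values. The sketch says nothing about relative efficiency, which is the subject of the paper's asymptotic comparisons.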

Original language | English (US) |
---|---|
Pages (from-to) | 1169-1192 |
Number of pages | 24 |
Journal | Statistica Sinica |
Volume | 16 |
Issue number | 4 |
State | Published - Oct 2006 |

### Keywords

- Efficiency
- Estimating equation
- Imputation
- Inverse probability weighting
- Logistic regression
- Missing at random
- Missing covariate

### ASJC Scopus subject areas

- Mathematics(all)
- Statistics and Probability

### Cite this

Wang, C., & Paik, M. C. (2006). Efficiencies of methods dealing with missing covariates in regression analysis. *Statistica Sinica*, *16*(4), 1169-1192.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Efficiencies of methods dealing with missing covariates in regression analysis

AU - Wang, Cuiling

AU - Paik, Myunghee Cho

PY - 2006/10

Y1 - 2006/10

KW - Efficiency

KW - Estimating equation

KW - Imputation

KW - Inverse probability weighting

KW - Logistic regression

KW - Missing at random

KW - Missing covariate

UR - http://www.scopus.com/inward/record.url?scp=33846515081&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846515081&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:33846515081

VL - 16

SP - 1169

EP - 1192

JO - Statistica Sinica

JF - Statistica Sinica

SN - 1017-0405

IS - 4

ER -