Handling missing data by deleting completely observed records

Myunghee Cho Paik; Cuiling Wang

doi:10.1016/j.jspi.2008.10.024

Epidemiology & Population Health

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

When data are missing, analyzing only records that are completely observed may cause bias or inefficiency. Existing approaches to handling missing data include likelihood, imputation and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of the outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao, L., Lipsitz, S., 1992. Designs and analysis of two-stage studies. Statistics in Medicine 11, 769-782). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computations and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.
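The contrast the abstract draws between complete-record analysis, inverse probability weighting, and deleting some complete records can be illustrated with a toy simulation. The sketch below is not the authors' estimators: it targets a simple mean rather than regression parameters, it assumes the observation probabilities are known, and every name in it (`simulate`, `complete_case_mean`, `ipw_mean`, `thinned_mean`, the probabilities 0.3 and 0.9) is hypothetical. The thinning step keeps each complete record with probability `c / pi[y]`, which is one simple way to build an artificial observation indicator whose probability no longer depends on the observed data.

```python
import random


def simulate(n, seed=0):
    """Simulate records (y, x, r): y is always observed, x only when r == 1."""
    rng = random.Random(seed)
    # Assumed known observation probabilities P(r = 1 | y); hypothetical values.
    pi = {0: 0.3, 1: 0.9}
    records = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        x = y + rng.gauss(0.0, 1.0)  # E[x] = 0.5 in this toy model
        r = 1 if rng.random() < pi[y] else 0
        records.append((y, x, r))
    return records, pi


def complete_case_mean(records):
    """Unweighted mean of x over completely observed records (biased here)."""
    xs = [x for _, x, r in records if r == 1]
    return sum(xs) / len(xs)


def ipw_mean(records, pi):
    """Normalized inverse-probability-weighted mean of x over complete records."""
    num = sum(x / pi[y] for y, x, r in records if r == 1)
    den = sum(1.0 / pi[y] for y, x, r in records if r == 1)
    return num / den


def thinned_mean(records, pi, seed=1):
    """Delete some complete records so every record is retained with the
    same probability c = min(pi), then take an unweighted mean."""
    rng = random.Random(seed)
    c = min(pi.values())
    kept = [x for y, x, r in records if r == 1 and rng.random() < c / pi[y]]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    records, pi = simulate(20000)
    print("complete-case:", complete_case_mean(records))
    print("IPW:          ", ipw_mean(records, pi))
    print("thinned:      ", thinned_mean(records, pi))
```

In this toy setup the complete-case mean over-represents records with `y == 1`, while the weighted and thinned means both target the full-population mean; thinning trades some sample size for the absence of weights.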

Original language	English (US)
Pages (from-to)	2341-2350
Number of pages	10
Journal	Journal of Statistical Planning and Inference
Volume	139
Issue number	7
DOIs	https://doi.org/10.1016/j.jspi.2008.10.024
State	Published - Jul 1 2009

Keywords

Deletion method
Inverse probability weighting
Missing data

ASJC Scopus subject areas

Statistics and Probability
Statistics, Probability and Uncertainty
Applied Mathematics

Cite this

@article{69ede5dab8254e50a9c72fb9ab22da5d,

title = "Handling missing data by deleting completely observed records",

abstract = "When data are missing, analyzing only records that are completely observed may cause bias or inefficiency. Existing approaches to handling missing data include likelihood, imputation and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of the outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao, L., Lipsitz, S., 1992. Designs and analysis of two-stage studies. Statistics in Medicine 11, 769-782). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computations and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.",

keywords = "Deletion method, Inverse probability weighting, Missing data",

author = "Paik, {Myunghee Cho} and Wang, Cuiling",

year = "2009",

month = jul,

day = "1",

doi = "10.1016/j.jspi.2008.10.024",

language = "English (US)",

volume = "139",

pages = "2341--2350",

journal = "Journal of Statistical Planning and Inference",

issn = "0378-3758",

publisher = "Elsevier",

number = "7",

}

TY - JOUR

T1 - Handling missing data by deleting completely observed records

AU - Paik, Myunghee Cho

AU - Wang, Cuiling

PY - 2009/7/1

Y1 - 2009/7/1

N2 - When data are missing, analyzing only records that are completely observed may cause bias or inefficiency. Existing approaches to handling missing data include likelihood, imputation and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of the outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao, L., Lipsitz, S., 1992. Designs and analysis of two-stage studies. Statistics in Medicine 11, 769-782). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computations and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.

KW - Deletion method

KW - Inverse probability weighting

KW - Missing data

UR - http://www.scopus.com/inward/record.url?scp=62049084143&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=62049084143&partnerID=8YFLogxK

U2 - 10.1016/j.jspi.2008.10.024

DO - 10.1016/j.jspi.2008.10.024

M3 - Article

AN - SCOPUS:62049084143

SN - 0378-3758

VL - 139

SP - 2341

EP - 2350

JO - Journal of Statistical Planning and Inference

JF - Journal of Statistical Planning and Inference

IS - 7

ER -
