Handling missing data by deleting completely observed records

Myunghee Cho Paik, Cuiling Wang

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

When data are missing, analyzing records that are completely observed may cause bias or inefficiency. Existing approaches to handling missing data include likelihood, imputation, and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of the outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao, L., Lipsitz, S., 1992. Designs and analysis of two-stage studies. Statistics in Medicine 11, 769-782). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computations and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.
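As a rough illustration of the two baselines the abstract compares against (analysis of completely observed records only, and inverse probability weighting), the following Python sketch simulates a simple linear regression in which the covariate is observed with a probability that depends on the outcome. It does not implement the three estimators proposed in the paper. The data model, the observation probabilities (0.9 and 0.3, treated as known), and the function names `weighted_slope`, `simulate_complete_records`, and `compare_estimators` are all assumptions made for this example.

```python
import random


def weighted_slope(xs, ys, ws):
    """Slope from weighted least squares of y on x, with an intercept."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


def simulate_complete_records(n=50000, beta=1.0, seed=0):
    """Draw (x, y) pairs and keep only the completely observed records.

    Assumed (hypothetical) setup: y = beta * x + noise, y is always seen,
    and x is observed with probability 0.9 when y > 0 and 0.3 otherwise.
    """
    rng = random.Random(seed)
    xs, ys, probs = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta * x + rng.gauss(0.0, 1.0)
        p_obs = 0.9 if y > 0 else 0.3
        if rng.random() < p_obs:
            xs.append(x)
            ys.append(y)
            probs.append(p_obs)
    return xs, ys, probs


def compare_estimators(n=50000, beta=1.0, seed=0):
    """Return (complete-record slope, inverse-probability-weighted slope)."""
    xs, ys, probs = simulate_complete_records(n, beta, seed)
    # Complete-record analysis: every retained record gets weight 1.
    cc = weighted_slope(xs, ys, [1.0] * len(xs))
    # Inverse probability weighting: weight each record by 1 / P(observed).
    ipw = weighted_slope(xs, ys, [1.0 / p for p in probs])
    return cc, ipw


if __name__ == "__main__":
    cc, ipw = compare_estimators()
    print(f"complete-record slope: {cc:.3f}")
    print(f"IPW slope:             {ipw:.3f}")
```

In this assumed setup the unweighted complete-record slope is pulled away from the true value because selection depends on the outcome, while the weighted slope is not. The weights range from 1/0.9 to 1/0.3; when some observation probabilities are small, such weights become large, which is the instability the abstract contrasts with the proposed weighting method.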

Original language: English (US)
Pages (from-to): 2341-2350
Number of pages: 10
Journal: Journal of Statistical Planning and Inference
Volume: 139
Issue number: 7
DOI: 10.1016/j.jspi.2008.10.024
State: Published - Jul 1 2009

Fingerprint

Data handling
Missing Data
Inverse Probability Weighting
Estimator
Weighting
Medicine
Statistics
Missing Completely at Random
Asymptotic Relative Efficiency
Estimating Function
Tangent Space
Imputation
Asymptotic Variance
Conditioning
Likelihood
Proportion
Regression
Simulation Study
Projection

Keywords

  • Deletion method
  • Inverse probability weighting
  • Missing data

ASJC Scopus subject areas

  • Statistics, Probability and Uncertainty
  • Applied Mathematics
  • Statistics and Probability

Cite this

Handling missing data by deleting completely observed records. / Paik, Myunghee Cho; Wang, Cuiling.

In: Journal of Statistical Planning and Inference, Vol. 139, No. 7, 01.07.2009, p. 2341-2350.

@article{69ede5dab8254e50a9c72fb9ab22da5d,
title = "Handling missing data by deleting completely observed records",
abstract = "When data are missing, analyzing records that are completely observed may cause bias or inefficiency. Existing approaches to handling missing data include likelihood, imputation, and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of the outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao, L., Lipsitz, S., 1992. Designs and analysis of two-stage studies. Statistics in Medicine 11, 769-782). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computations and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.",
keywords = "Deletion method, Inverse probability weighting, Missing data",
author = "Paik, {Myunghee Cho} and Wang, Cuiling",
year = "2009",
month = "7",
day = "1",
doi = "10.1016/j.jspi.2008.10.024",
language = "English (US)",
volume = "139",
pages = "2341--2350",
journal = "Journal of Statistical Planning and Inference",
issn = "0378-3758",
publisher = "Elsevier",
number = "7",
}

TY - JOUR

T1 - Handling missing data by deleting completely observed records

AU - Paik, Myunghee Cho

AU - Wang, Cuiling

PY - 2009/7/1

Y1 - 2009/7/1

AB - When data are missing, analyzing records that are completely observed may cause bias or inefficiency. Existing approaches to handling missing data include likelihood, imputation, and inverse probability weighting. In this paper, we propose three estimators inspired by deleting some completely observed data in the regression setting. First, we generate artificial observation indicators that are independent of the outcome given the observed data and draw inferences conditioning on the artificial observation indicators. Second, we propose a closely related weighting method. The proposed weighting method has more stable weights than those of the inverse probability weighting method (Zhao, L., Lipsitz, S., 1992. Designs and analysis of two-stage studies. Statistics in Medicine 11, 769-782). Third, we improve the efficiency of the proposed weighting estimator by subtracting the projection of the estimating function onto the nuisance tangent space. When data are missing completely at random, we show that the proposed estimators have asymptotic variances smaller than or equal to the variance of the estimator obtained from using completely observed records only. Asymptotic relative efficiency computations and simulation studies indicate that the proposed weighting estimators are more efficient than the inverse probability weighting estimators under a wide range of practical situations, especially when the missingness proportion is large.

KW - Deletion method

KW - Inverse probability weighting

KW - Missing data

UR - http://www.scopus.com/inward/record.url?scp=62049084143&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=62049084143&partnerID=8YFLogxK

U2 - 10.1016/j.jspi.2008.10.024

DO - 10.1016/j.jspi.2008.10.024

M3 - Article

VL - 139

SP - 2341

EP - 2350

JO - Journal of Statistical Planning and Inference

JF - Journal of Statistical Planning and Inference

SN - 0378-3758

IS - 7

ER -