Correction of bias from non-random missing longitudinal data using auxiliary information

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

Missing data are common in longitudinal studies due to drop-out, loss to follow-up, and death. Likelihood-based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). These assumptions, however, are not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood-based joint modeling approach is consistent and most efficient when correctly specified. However, mis-specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis-specified, though it is more robust to mis-specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study.
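The contrast the abstract draws, bias when the auxiliary variable is ignored versus correction when it enters the imputation model, can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the authors' method: it uses a single cross-sectional outcome instead of a longitudinal mixed effects model, a linear-normal imputation model, and a bootstrap step to approximate parameter uncertainty. The function names `simulate` and `mi_mean` and all numeric settings (intercept 1.0, slope 0.8, the logistic missingness curve) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n, rng):
    """Simulate an outcome y, an auxiliary variable a, and an observed-indicator.

    Assumption for illustration: missingness depends on a only, so the data
    are MAR given a but not MAR when a is left out of the analysis.
    """
    a = rng.normal(size=n)
    y = 1.0 + 0.8 * a + rng.normal(scale=0.6, size=n)  # true mean of y is 1.0
    p_obs = 1.0 / (1.0 + np.exp(1.5 * a))  # larger a -> more likely missing
    observed = rng.random(n) < p_obs
    return y, a, observed


def mi_mean(y, a, observed, m, rng):
    """Estimate E[y] by multiple imputation, using a in the imputation model.

    Only observed y values are used to fit the imputation model; the true
    values of the missing y entries are overwritten before any use.
    Returns the pooled estimate and its standard error (Rubin's rules).
    """
    idx = np.flatnonzero(observed)
    miss = ~observed
    n = len(y)
    estimates = np.empty(m)
    variances = np.empty(m)
    for k in range(m):
        # Bootstrap the observed cases to reflect uncertainty in the fit.
        boot = rng.choice(idx, size=idx.size, replace=True)
        X = np.column_stack([np.ones(boot.size), a[boot]])
        beta, *_ = np.linalg.lstsq(X, y[boot], rcond=None)
        resid = y[boot] - X @ beta
        sigma = np.sqrt(resid @ resid / (boot.size - 2))
        # Draw imputations from the fitted regression of y on a.
        y_imp = y.copy()
        y_imp[miss] = (beta[0] + beta[1] * a[miss]
                       + rng.normal(scale=sigma, size=miss.sum()))
        estimates[k] = y_imp.mean()
        variances[k] = y_imp.var(ddof=1) / n
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    pooled = estimates.mean()
    total_var = variances.mean() + (1.0 + 1.0 / m) * estimates.var(ddof=1)
    return pooled, np.sqrt(total_var)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y, a, observed = simulate(20000, rng)
    print("complete-case mean:", y[observed].mean())
    print("MI mean and SE:", mi_mean(y, a, observed, 20, rng))
```

In this setup the complete-case mean is pulled well below the true value of 1.0, because cases with large values of the auxiliary variable (and hence large outcomes) are more likely to be missing, while the imputation-based estimate should land close to 1.0. If the imputation model left out the auxiliary variable or misspecified its relationship to the outcome, the correction would be expected to degrade, which is the sensitivity the abstract describes.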

Original language: English (US)
Pages (from-to): 671-679
Number of pages: 9
Journal: Statistics in Medicine
Volume: 29
Issue number: 6
DOI: 10.1002/sim.3821
State: Published - Mar 15 2010

Fingerprint

Auxiliary Information
Longitudinal Data
Missing Data
Joint Modeling
Missing at Random
Imputation
Joints
Multiple Imputation
Auxiliary Variables
Misspecification
Biased
Likelihood
Missing Data Mechanism
Mixed Effects Model
Dementia
Longitudinal Studies
Drop out
Longitudinal Study
Joint Distribution
Screening

Keywords

  • Auxiliary variable MAR (A-MAR)
  • Joint modeling
  • Linear mixed effects model
  • Missing data
  • MNAR
  • Multiple imputation (MI)

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Correction of bias from non-random missing longitudinal data using auxiliary information. / Wang, Cuiling; Hall, Charles B.

In: Statistics in Medicine, Vol. 29, No. 6, 15.03.2010, p. 671-679.

@article{37addd67c0074adc8909e8507e41822d,
title = "Correction of bias from non-random missing longitudinal data using auxiliary information",
abstract = "Missing data are common in longitudinal studies due to drop-out, loss to follow-up, and death. Likelihood-based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). These assumptions, however, are not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood-based joint modeling approach is consistent and most efficient when correctly specified. However, mis-specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis-specified, though it is more robust to mis-specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study.",
keywords = "Auxiliary variable MAR (A-MAR), Joint modeling, Linear mixed effects model, Missing data, MNAR, Multiple imputation (MI)",
author = "Wang, Cuiling and Hall, {Charles B.}",
year = "2010",
month = "3",
day = "15",
doi = "10.1002/sim.3821",
language = "English (US)",
volume = "29",
pages = "671--679",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",
number = "6",

}

TY - JOUR

T1 - Correction of bias from non-random missing longitudinal data using auxiliary information

AU - Wang, Cuiling

AU - Hall, Charles B.

PY - 2010/3/15

Y1 - 2010/3/15

N2 - Missing data are common in longitudinal studies due to drop-out, loss to follow-up, and death. Likelihood-based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). These assumptions, however, are not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood-based joint modeling approach is consistent and most efficient when correctly specified. However, mis-specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis-specified, though it is more robust to mis-specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study.

AB - Missing data are common in longitudinal studies due to drop-out, loss to follow-up, and death. Likelihood-based mixed effects models for longitudinal data give valid estimates when the data are missing at random (MAR). These assumptions, however, are not testable without further information. In some studies, there is additional information available in the form of an auxiliary variable known to be correlated with the missing outcome of interest. Availability of such auxiliary information provides us with an opportunity to test the MAR assumption. If the MAR assumption is violated, such information can be utilized to reduce or eliminate bias when the missing data process depends on the unobserved outcome through the auxiliary information. We compare two methods of utilizing the auxiliary information: joint modeling of the outcome of interest and the auxiliary variable, and multiple imputation (MI). Simulation studies are performed to examine the two methods. The likelihood-based joint modeling approach is consistent and most efficient when correctly specified. However, mis-specification of the joint distribution can lead to biased results. MI is slightly less efficient than a correct joint modeling approach and can also be biased when the imputation model is mis-specified, though it is more robust to mis-specification of the imputation distribution when all the variables affecting the missing data mechanism and the missing outcome are included in the imputation model. An example is presented from a dementia screening study.

KW - Auxiliary variable MAR (A-MAR)

KW - Joint modeling

KW - Linear mixed effects model

KW - Missing data

KW - MNAR

KW - Multiple imputation (MI)

UR - http://www.scopus.com/inward/record.url?scp=77649220400&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77649220400&partnerID=8YFLogxK

U2 - 10.1002/sim.3821

DO - 10.1002/sim.3821

M3 - Article

VL - 29

SP - 671

EP - 679

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

IS - 6

ER -