Observer agreement paradoxes in 2x2 tables: Comparison of agreement measures

Shankar Viswanathan, Shrikant I. Bangdiwala

Research output: Contribution to journal (Article)

19 Citations (Scopus)

Abstract

Background: Various measures of observer agreement have been proposed for 2x2 tables. We examine the behavior of these alternative measures. Methods: The alternative measures of observer agreement and the corresponding agreement chart were calculated under various scenarios of marginal distributions (symmetrical or not, balanced or not) and of degree of diagonal agreement, and their behaviors were compared. Specifically, two paradoxes previously identified for kappa were examined: (1) low kappa values despite high observed agreement under highly symmetrically imbalanced marginals, and (2) higher kappa values for asymmetrical imbalanced marginal distributions. Results: Kappa and alpha behave similarly and are affected by the marginal distributions more than the B-statistic, AC1-index and delta measures. Delta and kappa provide similar values when the marginal totals are asymmetrically imbalanced or symmetrical but not excessively imbalanced. The AC1-index and B-statistic provide closer results when the marginal distributions are symmetrically imbalanced and the observed agreement is greater than 50%. Also, the B-statistic and the AC1-index provide values closer to the observed agreement when the subjects are classified mostly in one of the diagonal cells. Finally, the B-statistic is seen to be consistent and more stable than kappa under both types of paradoxes studied. Conclusions: The B-statistic behaved better than the other measures under all scenarios studied, as well as with varying prevalences, sensitivities and specificities. We recommend using the B-statistic along with its corresponding agreement chart as an alternative to kappa when assessing agreement in 2x2 tables.
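
To make the first paradox concrete, the following is a minimal sketch that computes observed agreement, Cohen's kappa, Gwet's AC1 and Bangdiwala's B for a single 2x2 table. It assumes the standard textbook definitions of these statistics for two raters and two categories. The function name `agreement_measures`, the cell labels `a, b, c, d` and the example counts are illustrative assumptions, not taken from the paper, and Aickin's alpha and delta are omitted because they require iterative estimation.

```python
def agreement_measures(a, b, c, d):
    """Agreement statistics for a 2x2 table (hypothetical helper).

    Cell layout (rows = rater 1, columns = rater 2):
        a = both say "+",  b = rater 1 "+", rater 2 "-"
        c = rater 1 "-", rater 2 "+",  d = both say "-"
    Raises ZeroDivisionError in degenerate tables where chance
    agreement equals 1 (e.g. every subject falls in a single cell).
    """
    n = a + b + c + d
    r1, r2 = a + b, c + d          # row totals (rater 1 marginals)
    c1, c2 = a + c, b + d          # column totals (rater 2 marginals)

    p_o = (a + d) / n              # observed agreement

    # Cohen's kappa: chance agreement from the product of marginals.
    pe_kappa = (r1 * c1 + r2 * c2) / n ** 2
    kappa = (p_o - pe_kappa) / (1 - pe_kappa)

    # Gwet's AC1 (two categories): chance agreement from the average
    # of the two raters' marginal proportions for category "+".
    pi_plus = (r1 + c1) / (2 * n)
    pe_ac1 = 2 * pi_plus * (1 - pi_plus)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)

    # Bangdiwala's B: squared diagonal counts over products of marginals.
    b_stat = (a ** 2 + d ** 2) / (r1 * c1 + r2 * c2)

    return {"p_o": p_o, "kappa": kappa, "AC1": ac1, "B": b_stat}


# Illustrative table with symmetrically imbalanced marginals (90/10 for both raters).
result = agreement_measures(85, 5, 5, 5)
print({name: round(value, 3) for name, value in result.items()})
# {'p_o': 0.9, 'kappa': 0.444, 'AC1': 0.878, 'B': 0.884}
```

In this made-up table the raters agree on 90% of subjects, yet kappa is only about 0.44, while AC1 and B stay near the observed agreement. This matches the qualitative pattern the abstract describes for paradox (1), though the specific numbers are not from the study.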

Original language: English (US)
Article number: 100
Journal: BMC Medical Research Methodology
Volume: 14
Issue number: 1
DOI: 10.1186/1471-2288-14-100
State: Published - Aug 28 2014

Keywords

  • 2x2 table
  • AC1-index
  • Aickin's alpha
  • B-statistic
  • Cohen's kappa
  • Delta
  • Rater agreement

ASJC Scopus subject areas

  • Health Informatics
  • Epidemiology
  • Medicine(all)

Cite this

Observer agreement paradoxes in 2x2 tables: Comparison of agreement measures. / Viswanathan, Shankar; Bangdiwala, Shrikant I.

In: BMC Medical Research Methodology, Vol. 14, No. 1, 100, 28.08.2014.

@article{cb6cabe3e9cb4d4b9aebdb1a81d1b68d,
title = "Observer agreement paradoxes in 2x2 tables: Comparison of agreement measures",
abstract = "Background: Various measures of observer agreement have been proposed for 2x2 tables. We examine the behavior of alternative measures of observer agreement for 2x2 tables. Methods: The alternative measures of observer agreement and the corresponding agreement chart were calculated under various scenarios of marginal distributions (symmetrical or not, balanced or not) and of degree of diagonal agreement, and their behaviors are compared. Specifically, two specific paradoxes previously identified for kappa were examined: (1) low kappa values despite high observed agreement under highly symmetrically imbalanced marginals, and (2) higher kappa values for asymmetrical imbalanced marginal distributions. Results: Kappa and alpha behave similarly and are affected by the marginal distributions more so than the B-statistic, AC1-index and delta measures. Delta and kappa provide values that are similar when the marginal totals are asymmetrically imbalanced or symmetrical but not excessively imbalanced. The AC1-index and B-statistics provide closer results when the marginal distributions are symmetrically imbalanced and the observed agreement is greater than 50{\%}. Also, the B-statistic and the AC1-index provide values closer to the observed agreement when the subjects are classified mostly in one of the diagonal cells. Finally, the B-statistic is seen to be consistent and more stable than kappa under both types of paradoxes studied. Conclusions: The B-statistic behaved better under all scenarios studied as well as with varying prevalences, sensitivities and specificities than the other measures, we recommend using B-statistic along with its corresponding agreement chart as an alternative to kappa when assessing agreement in 2x2 tables.",
keywords = "2x2 table, AC1-index, Aickin's alpha, B-statistic, Cohen's kappa, Delta, Rater agreement",
author = "Viswanathan, Shankar and Bangdiwala, {Shrikant I.}",
year = "2014",
month = "8",
day = "28",
doi = "10.1186/1471-2288-14-100",
language = "English (US)",
volume = "14",
journal = "BMC Medical Research Methodology",
issn = "1471-2288",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Observer agreement paradoxes in 2x2 tables

T2 - Comparison of agreement measures

AU - Viswanathan, Shankar

AU - Bangdiwala, Shrikant I.

PY - 2014/8/28

Y1 - 2014/8/28

AB - Background: Various measures of observer agreement have been proposed for 2x2 tables. We examine the behavior of alternative measures of observer agreement for 2x2 tables. Methods: The alternative measures of observer agreement and the corresponding agreement chart were calculated under various scenarios of marginal distributions (symmetrical or not, balanced or not) and of degree of diagonal agreement, and their behaviors are compared. Specifically, two specific paradoxes previously identified for kappa were examined: (1) low kappa values despite high observed agreement under highly symmetrically imbalanced marginals, and (2) higher kappa values for asymmetrical imbalanced marginal distributions. Results: Kappa and alpha behave similarly and are affected by the marginal distributions more so than the B-statistic, AC1-index and delta measures. Delta and kappa provide values that are similar when the marginal totals are asymmetrically imbalanced or symmetrical but not excessively imbalanced. The AC1-index and B-statistics provide closer results when the marginal distributions are symmetrically imbalanced and the observed agreement is greater than 50%. Also, the B-statistic and the AC1-index provide values closer to the observed agreement when the subjects are classified mostly in one of the diagonal cells. Finally, the B-statistic is seen to be consistent and more stable than kappa under both types of paradoxes studied. Conclusions: The B-statistic behaved better under all scenarios studied as well as with varying prevalences, sensitivities and specificities than the other measures, we recommend using B-statistic along with its corresponding agreement chart as an alternative to kappa when assessing agreement in 2x2 tables.

KW - 2x2 table

KW - AC1-index

KW - Aickin's alpha

KW - B-statistic

KW - Cohen's kappa

KW - Delta

KW - Rater agreement

UR - http://www.scopus.com/inward/record.url?scp=84908395910&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84908395910&partnerID=8YFLogxK

U2 - 10.1186/1471-2288-14-100

DO - 10.1186/1471-2288-14-100

M3 - Article

C2 - 25168681

AN - SCOPUS:84908395910

VL - 14

JO - BMC Medical Research Methodology

JF - BMC Medical Research Methodology

SN - 1471-2288

IS - 1

M1 - 100

ER -