Randomization tests for small samples: An application for genetic expression data

Gary L. Gadbury, Grier P. Page, Moonseong Heo, John D. Mountz, David B. Allison

Research output: Contribution to journal, Article

25 Citations (Scopus)

Abstract

An advantage of randomization tests for small samples is that an exact P-value can be computed under an additive model. A disadvantage with very small sample sizes is that the resulting discrete distribution for P-values can make it mathematically impossible for a P-value to attain a particular degree of significance. We investigate a distribution of P-values that arises when several thousand randomization tests are conducted simultaneously using small samples, a situation that arises with microarray gene expression data. We show that the distribution yields valuable information regarding groups of genes that are differentially expressed between two groups: a treatment group and a control group. This distribution helps to categorize genes with varying degrees of overlap of genetic expression values between the two groups, and it helps to quantify the degree of overlap by using the P-value from a randomization test. Moreover, a statistical test is available that compares the actual distribution of P-values with an expected distribution if there are no genes that are differentially expressed. We demonstrate the method and illustrate the results by using a microarray data set involving a cell line for rheumatoid arthritis. A small simulation study evaluates the effect that correlated gene expression levels could have on results from the analysis.
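
The discreteness described above can be illustrated with a minimal sketch of an exact two-group randomization test. This is not the authors' implementation: the function names, the choice of the absolute difference in group means as the test statistic, and the expression values in the usage note are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def exact_randomization_pvalue(treatment, control):
    """Two-sided exact P-value for the absolute difference in group means.

    Enumerates every way of assigning the pooled values to a treatment
    group of the observed size (an illustrative choice of statistic,
    not necessarily the one used in the paper).
    """
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    n_c = len(pooled) - n_t
    total = sum(pooled)

    def mean_diff(indices):
        sum_t = sum(pooled[i] for i in indices)
        return sum_t / n_t - (total - sum_t) / n_c

    # The observed assignment is the first n_t pooled positions.
    observed = abs(mean_diff(range(n_t)))

    as_extreme = 0
    n_assignments = 0
    for indices in combinations(range(len(pooled)), n_t):
        n_assignments += 1
        # Small tolerance guards against floating-point round-off.
        if abs(mean_diff(indices)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_assignments


def smallest_attainable_pvalue(n_t, n_c):
    """Smallest two-sided P-value the test above can produce.

    With equal group sizes, swapping the two group labels gives a mirror
    assignment with the same absolute difference, so the minimum is
    2 / C(n, n_t); otherwise it is 1 / C(n, n_t).
    """
    n_assignments = comb(n_t + n_c, n_t)
    return (2 if n_t == n_c else 1) / n_assignments


if __name__ == "__main__":
    # Hypothetical expression values with no overlap between groups.
    separated = exact_randomization_pvalue([9.1, 8.7, 9.5], [5.2, 6.1, 5.8])
    # Hypothetical expression values with one overlapping observation.
    overlapping = exact_randomization_pvalue([9.1, 5.5, 9.5], [5.2, 6.1, 5.8])
    print(separated, overlapping, smallest_attainable_pvalue(3, 3))
```

With three observations per group there are only 20 possible assignments, so even complete separation of the two groups yields a two-sided P-value of 2/20 = 0.1, and significance at the 0.05 level is unattainable for any single gene. Genes whose values overlap between groups land on larger points of the same discrete scale, which is why the P-value can serve as a measure of overlap.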

Original language: English (US)
Pages (from-to): 365-376
Number of pages: 12
Journal: Journal of the Royal Statistical Society. Series C: Applied Statistics
Volume: 52
Issue number: 3
DOI: 10.1111/1467-9876.00410
State: Published - 2003
Externally published: Yes

Keywords

  • Additivity
  • Microarray
  • Nonparametric test
  • Permutation
  • Randomization

ASJC Scopus subject areas

  • Mathematics (all)
  • Statistics and Probability

Cite this

Randomization tests for small samples: An application for genetic expression data. / Gadbury, Gary L.; Page, Grier P.; Heo, Moonseong; Mountz, John D.; Allison, David B.

In: Journal of the Royal Statistical Society. Series C: Applied Statistics, Vol. 52, No. 3, 2003, p. 365-376.

@article{139e0658e6d44101ae9d76ef5f851aaf,
title = "Randomization tests for small samples: An application for genetic expression data",
keywords = "Additivity, Microarray, Nonparametric test, Permutation, Randomization",
author = "Gadbury, {Gary L.} and Page, {Grier P.} and Heo, Moonseong and Mountz, {John D.} and Allison, {David B.}",
year = "2003",
doi = "10.1111/1467-9876.00410",
language = "English (US)",
volume = "52",
pages = "365--376",
journal = "Journal of the Royal Statistical Society. Series C: Applied Statistics",
issn = "0035-9254",
publisher = "Wiley-Blackwell",
number = "3",
}

TY  - JOUR
T1  - Randomization tests for small samples
T2  - An application for genetic expression data
AU  - Gadbury, Gary L.
AU  - Page, Grier P.
AU  - Heo, Moonseong
AU  - Mountz, John D.
AU  - Allison, David B.
PY  - 2003
Y1  - 2003
KW  - Additivity
KW  - Microarray
KW  - Nonparametric test
KW  - Permutation
KW  - Randomization
UR  - http://www.scopus.com/inward/record.url?scp=0038041495&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0038041495&partnerID=8YFLogxK
U2  - 10.1111/1467-9876.00410
DO  - 10.1111/1467-9876.00410
M3  - Article
AN  - SCOPUS:0038041495
VL  - 52
SP  - 365
EP  - 376
JO  - Journal of the Royal Statistical Society. Series C: Applied Statistics
JF  - Journal of the Royal Statistical Society. Series C: Applied Statistics
SN  - 0035-9254
IS  - 3
ER  -