Efficient statistical significance approximation for local similarity analysis of high-throughput time series data

Li C. Xia; Dongmei Ai; Jacob Cram; Jed A. Fuhrman; Fengzhu Sun

doi:10.1093/bioinformatics/bts668

Research output: Contribution to journal › Article › peer-review

101 Scopus citations

Abstract

Motivation: Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its application to large-scale high-throughput data is limited by slow permutation procedures for evaluating statistical significance.

Results: We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (for series with more than 10 time points when no delay is allowed, and more than 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making feasible all-to-all local association studies that would otherwise be computationally prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and that some of the associations are body-site specific across samples.
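The abstract contrasts the new approximation with slow permutation testing. As a hedged illustration of that permutation baseline only (not the paper's approximation formula), the Python sketch below computes a delay-free local similarity score with the maximum-partial-sum dynamic programming commonly used for local similarity analysis, then a Monte Carlo permutation P-value. Assumptions, all mine rather than the source's: series are z-score standardized, the score is divided by the series length n, no time delay is allowed, and the function names `zscore`, `local_similarity` and `permutation_pvalue` are hypothetical.

```python
import random
import statistics


def zscore(xs):
    """Standardize a series to mean 0 and population standard deviation 1 (assumed normalization)."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        raise ValueError("constant series cannot be standardized")
    return [(x - mu) / sd for x in xs]


def local_similarity(x, y):
    """Delay-free local similarity score (hypothetical helper).

    Scans the products x_i * y_i and keeps the largest contiguous partial sum
    for positive association and for negative association. Returns
    (score, sign), with the score divided by the series length n.
    """
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    xs, ys = zscore(x), zscore(y)
    pos = neg = best_pos = best_neg = 0.0
    for a, b in zip(xs, ys):
        prod = a * b
        pos = max(0.0, pos + prod)  # best positive partial sum ending here
        neg = max(0.0, neg - prod)  # best negative partial sum ending here
        best_pos = max(best_pos, pos)
        best_neg = max(best_neg, neg)
    n = len(xs)
    if best_pos >= best_neg:
        return best_pos / n, 1
    return best_neg / n, -1


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Monte Carlo permutation P-value: shuffle y and recompute the score each time."""
    rng = random.Random(seed)
    observed, _ = local_similarity(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        score, _ = local_similarity(x, y_perm)
        if score >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

For example, `local_similarity([1, 2, 3, 4], [4, 3, 2, 1])` returns a score of about 1.0 with sign -1. Each P-value costs `n_perm` full rescans of the pair, so an all-to-all scan over m series needs roughly m(m - 1)/2 × `n_perm` calls; that is the cost an analytic tail approximation such as the one described above is meant to avoid.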

Original language	English (US)
Pages (from-to)	230-237
Number of pages	8
Journal	Bioinformatics
Volume	29
Issue number	2
DOIs	https://doi.org/10.1093/bioinformatics/bts668
State	Published - Jan 15 2013
Externally published	Yes

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{9932f38cc0e440ae86f82d84a59e6ff8,

title = "Efficient statistical significance approximation for local similarity analysis of high-throughput time series data",

abstract = "Motivation: Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its applications to large scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation. Results: We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making possible all-to-all local association studies otherwise prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and some of the associations are body-site specific across samples.",

author = "Xia, {Li C.} and Dongmei Ai and Jacob Cram and Fuhrman, {Jed A.} and Fengzhu Sun",

note = "Funding Information: Funding: This research is partially supported by US NSF DMS-1043075 and OCE 1136818, and National Natural Science Foundation of China (60928007 and 60805010).",

year = "2013",

month = jan,

day = "15",

doi = "10.1093/bioinformatics/bts668",

language = "English (US)",

volume = "29",

pages = "230--237",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "2",

}

TY  - JOUR
T1  - Efficient statistical significance approximation for local similarity analysis of high-throughput time series data
AU  - Xia, Li C.
AU  - Ai, Dongmei
AU  - Cram, Jacob
AU  - Fuhrman, Jed A.
AU  - Sun, Fengzhu
N1  - Funding Information: Funding: This research is partially supported by US NSF DMS-1043075 and OCE 1136818, and National Natural Science Foundation of China (60928007 and 60805010).
PY  - 2013/1/15
Y1  - 2013/1/15
N2  - Motivation: Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its applications to large scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation. Results: We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making possible all-to-all local association studies otherwise prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and some of the associations are body-site specific across samples.
AB  - Motivation: Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its applications to large scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation. Results: We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making possible all-to-all local association studies otherwise prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and some of the associations are body-site specific across samples.
UR  - http://www.scopus.com/inward/record.url?scp=84872568885&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84872568885&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/bts668
DO  - 10.1093/bioinformatics/bts668
M3  - Article
C2  - 23178636
AN  - SCOPUS:84872568885
SN  - 1367-4803
VL  - 29
SP  - 230
EP  - 237
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 2
ER  - 