Efficient statistical significance approximation for local similarity analysis of high-throughput time series data

Li C. Xia, Dongmei Ai, Jacob Cram, Jed A. Fuhrman, Fengzhu Sun

Research output: Contribution to journal › Article › peer-review

74 Scopus citations

Abstract

Motivation: Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its applications to large scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation.

Results: We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making possible all-to-all local association studies otherwise prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and some of the associations are body-site specific across samples.
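To make the cost of the permutation baseline concrete, here is a minimal sketch of a local similarity score and its permutation P-value. The abstract does not spell out the scoring recurrence, so this assumes the commonly used dynamic-programming formulation (running positive and negative partial sums of products, reset at zero, within a maximum delay) and assumes both series have already been normalized. The function names, the `max_delay` and `n_perm` parameters, and the add-one correction in the P-value are illustrative choices, not taken from the paper. The paper's closed-form approximation is not reproduced here.

```python
import random


def local_similarity(x, y, max_delay=0):
    """Local similarity score of two equal-length, pre-normalized series.

    Assumed formulation: the largest positive or negative partial sum of
    products x[i] * y[j] along a diagonal with |i - j| <= max_delay,
    divided by the series length.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    # pos/neg hold partial sums ending at (i, j); row/column 0 is padding.
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best = max(best, pos[i][j], neg[i][j])
    return best / n


def permutation_p_value(x, y, max_delay=0, n_perm=1000, seed=0):
    """Estimate a P-value by shuffling y and recomputing the score."""
    rng = random.Random(seed)
    observed = local_similarity(x, y, max_delay)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if local_similarity(x, shuffled, max_delay) >= observed:
            hits += 1
    # Add-one correction (an illustrative choice) avoids reporting exactly 0.
    return (hits + 1) / (n_perm + 1)
```

Each permutation repeats the full score computation, so the work for one pair grows with `n_perm`, and an all-to-all study multiplies that by the number of pairs. That repeated cost is what an analytic tail approximation avoids.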

Original language: English (US)
Pages (from-to): 230-237
Number of pages: 8
Journal: Bioinformatics
Volume: 29
Issue number: 2
State: Published - Jan 15 2013
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
