Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains

Li C. Xia; Dongmei Ai; Jacob A. Cram; Xiaoyi Liang; Jed A. Fuhrman; Fengzhu Sun

doi:10.1186/s12859-015-0732-8

Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains

Li C. Xia, Dongmei Ai, Jacob A. Cram, Xiaoyi Liang, Jed A. Fuhrman, Fengzhu Sun

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Background: Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from the next generation sequencing technology based studies. Results: By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Polymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. Availability: The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.

Original language	English (US)
Article number	301
Journal	BMC bioinformatics
Volume	16
Issue number	1
DOIs	https://doi.org/10.1186/s12859-015-0732-8
State	Published - Sep 21 2015
Externally published	Yes

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

Access to Document

10.1186/s12859-015-0732-8

Cite this

@article{cf681ec831174b00b4bef1223b4c1b68,

title = "Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains",

abstract = "Background: Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from the next generation sequencing technology based studies. Results: By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Polymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. Availability: The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.",

author = "Xia, {Li C.} and Dongmei Ai and Cram, {Jacob A.} and Xiaoyi Liang and Fuhrman, {Jed A.} and Fengzhu Sun",

note = "Publisher Copyright: {\textcopyright} 2015 Xia et al.",

year = "2015",

month = sep,

day = "21",

doi = "10.1186/s12859-015-0732-8",

language = "English (US)",

volume = "16",

journal = "BMC bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

number = "1",

}

TY - JOUR

T1 - Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains

AU - Xia, Li C.

AU - Ai, Dongmei

AU - Cram, Jacob A.

AU - Liang, Xiaoyi

AU - Fuhrman, Jed A.

AU - Sun, Fengzhu

PY - 2015/9/21

Y1 - 2015/9/21

N2 - Background: Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from the next generation sequencing technology based studies. Results: By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Polymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. Availability: The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.

AB - Background: Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from the next generation sequencing technology based studies. Results: By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Polymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. Availability: The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.

UR - http://www.scopus.com/inward/record.url?scp=84959131144&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959131144&partnerID=8YFLogxK

U2 - 10.1186/s12859-015-0732-8

DO - 10.1186/s12859-015-0732-8

M3 - Article

C2 - 26390921

AN - SCOPUS:84959131144

SN - 1471-2105

VL - 16

JO - BMC bioinformatics

JF - BMC bioinformatics

IS - 1

M1 - 301

ER -

Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains

Abstract

ASJC Scopus subject areas

UN SDGs

Access to Document

Other files and links

Fingerprint

Cite this