PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

Joel Rozowsky, Ghia Euskirchen, Raymond K. Auerbach, Zhengdong Zhang, Theodore Gibson, Robert Bjornson, Nicholas Carriero, Michael Snyder, Mark B. Gerstein

Research output: Contribution to journal: Article

390 Citations (Scopus)

Abstract

Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.
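The two-pass idea in the abstract can be illustrated with a minimal Python sketch. This is not the PeakSeq implementation: the fixed-window layout, the function names, the Poisson threshold in the first pass, the ratio-of-sums scaling of the control, and the one-sided binomial test in the second pass are all assumptions chosen to keep the example short and dependency-free.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def first_pass(chip, mappable_frac, alpha=0.01):
    """Flag windows whose ChIP tag count is unlikely under a uniform
    background scaled by each window's mappable fraction (assumed model)."""
    rate = sum(chip) / sum(mappable_frac)  # tags per fully mappable window
    return [i for i, (c, m) in enumerate(zip(chip, mappable_frac))
            if m > 0 and poisson_sf(c, rate * m) < alpha]


def second_pass(chip, control, candidates, alpha=0.01):
    """Keep candidates that are significantly enriched over the control
    after scaling the control to the ChIP sample (assumed normalization:
    ratio of tag sums over non-candidate windows)."""
    cand = set(candidates)
    background = [i for i in range(len(chip)) if i not in cand]
    scale = (sum(chip[i] for i in background)
             / max(1, sum(control[i] for i in background)))
    kept = []
    for i in candidates:
        scaled = scale * control[i]
        n = chip[i] + round(scaled)
        # One-sided test: are ChIP tags over-represented among all tags?
        p_value = binom_sf(chip[i], n, 0.5)
        if p_value < alpha:
            enrichment = chip[i] / scaled if scaled > 0 else float("inf")
            kept.append((i, enrichment, p_value))
    return kept


# Hypothetical tag counts per window; window 5 is only half mappable.
chip = [5, 6, 40, 5, 30, 4, 5, 6]
control = [5, 5, 6, 5, 28, 5, 4, 6]
mappable = [1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 1.0]

candidates = first_pass(chip, mappable)
peaks = second_pass(chip, control, candidates)
```

With these made-up counts, the first pass flags windows 2 and 4, and the second pass keeps only window 2, because the control shows a comparable peak at window 4. A real analysis would also correct the p-values for multiple testing.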

Original language: English (US)
Pages (from-to): 66-75
Number of pages: 10
Journal: Nature Biotechnology
Volume: 27
Issue number: 1
DOI: https://doi.org/10.1038/nbt.1518
State: Published - Jan 25 2009
Externally published: Yes

Fingerprint

Chromatin Immunoprecipitation
Chromatin
Experiments
Transcription factors
Binding Sites
RNA
Design of experiments
RNA Polymerase II
DNA
Genes
Throughput
Research Design
Genome

ASJC Scopus subject areas

  • Applied Microbiology and Biotechnology
  • Biotechnology
  • Molecular Medicine
  • Bioengineering
  • Biomedical Engineering

Cite this

Rozowsky, J., Euskirchen, G., Auerbach, R. K., Zhang, Z., Gibson, T., Bjornson, R., ... Gerstein, M. B. (2009). PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotechnology, 27(1), 66-75. https://doi.org/10.1038/nbt.1518

PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. / Rozowsky, Joel; Euskirchen, Ghia; Auerbach, Raymond K.; Zhang, Zhengdong; Gibson, Theodore; Bjornson, Robert; Carriero, Nicholas; Snyder, Michael; Gerstein, Mark B.

In: Nature Biotechnology, Vol. 27, No. 1, 25.01.2009, p. 66-75.

Rozowsky, J, Euskirchen, G, Auerbach, RK, Zhang, Z, Gibson, T, Bjornson, R, Carriero, N, Snyder, M & Gerstein, MB 2009, 'PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls', Nature Biotechnology, vol. 27, no. 1, pp. 66-75. https://doi.org/10.1038/nbt.1518
Rozowsky J, Euskirchen G, Auerbach RK, Zhang Z, Gibson T, Bjornson R et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotechnology. 2009 Jan 25;27(1):66-75. https://doi.org/10.1038/nbt.1518
Rozowsky, Joel ; Euskirchen, Ghia ; Auerbach, Raymond K. ; Zhang, Zhengdong ; Gibson, Theodore ; Bjornson, Robert ; Carriero, Nicholas ; Snyder, Michael ; Gerstein, Mark B. / PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. In: Nature Biotechnology. 2009 ; Vol. 27, No. 1. pp. 66-75.
@article{4c6bdeb4adbc45dc97594b4732a05132,
title = "PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls",
abstract = "Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.",
author = "Joel Rozowsky and Ghia Euskirchen and Auerbach, {Raymond K.} and Zhengdong Zhang and Theodore Gibson and Robert Bjornson and Nicholas Carriero and Michael Snyder and Gerstein, {Mark B.}",
year = "2009",
month = "1",
day = "25",
doi = "10.1038/nbt.1518",
language = "English (US)",
volume = "27",
pages = "66--75",
journal = "Nature Biotechnology",
issn = "1087-0156",
publisher = "Nature Publishing Group",
number = "1",

}

TY - JOUR

T1 - PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

AU - Rozowsky, Joel

AU - Euskirchen, Ghia

AU - Auerbach, Raymond K.

AU - Zhang, Zhengdong

AU - Gibson, Theodore

AU - Bjornson, Robert

AU - Carriero, Nicholas

AU - Snyder, Michael

AU - Gerstein, Mark B.

PY - 2009/1/25

Y1 - 2009/1/25

AB - Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.

UR - http://www.scopus.com/inward/record.url?scp=60149112271&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=60149112271&partnerID=8YFLogxK

U2 - 10.1038/nbt.1518

DO - 10.1038/nbt.1518

M3 - Article

VL - 27

SP - 66

EP - 75

JO - Nature Biotechnology

JF - Nature Biotechnology

SN - 1087-0156

IS - 1

ER -