PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

Joel Rozowsky; Ghia Euskirchen; Raymond K. Auerbach; Zhengdong D. Zhang; Theodore Gibson; Robert Bjornson; Nicholas Carriero; Michael Snyder; Mark B. Gerstein

doi:10.1038/nbt.1518

Research output: Contribution to journal › Article › peer-review

440 Scopus citations

Abstract

Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal caused by open chromatin, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.
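The two-pass idea in the abstract can be illustrated with a minimal sketch. This is not the PeakSeq implementation: the Poisson background in the first pass, the ratio-based control scaling, the binomial test in the second pass, the per-bin counts, and every function and variable name are assumptions chosen for illustration only.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the CDF."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def first_pass(chip, mappable_fraction, alpha=0.05):
    """Flag bins whose ChIP count is unlikely under a background
    rate scaled by each bin's mappable fraction (assumed Poisson null)."""
    rate = sum(chip) / sum(mappable_fraction)  # tags per fully mappable bin
    return [
        i
        for i, (count, frac) in enumerate(zip(chip, mappable_fraction))
        if frac > 0 and poisson_sf(count, rate * frac) < alpha
    ]


def second_pass(chip, control, candidates, alpha=0.05):
    """Keep candidates significantly enriched over the scaled control.
    Returns (bin index, p-value) pairs."""
    candidate_set = set(candidates)
    background = [i for i in range(len(chip)) if i not in candidate_set]
    control_total = sum(control[i] for i in background)
    if control_total == 0:
        raise ValueError("control has no tags outside candidate bins")
    # Scale the control so that non-candidate bins match the ChIP sample.
    scale = sum(chip[i] for i in background) / control_total
    kept = []
    for i in candidates:
        scaled_control = round(scale * control[i])
        n = chip[i] + scaled_control
        p_value = binomial_sf(chip[i], n, 0.5)
        if p_value < alpha:
            kept.append((i, p_value))
    return kept


# Hypothetical per-bin tag counts; bin 4 is high in both ChIP and control.
chip = [2, 3, 40, 2, 30, 3, 2, 1]
control = [2, 2, 3, 3, 28, 2, 3, 2]
mappable = [1.0, 1.0, 1.0, 0.5, 1.0, 1.0, 1.0, 1.0]

candidates = first_pass(chip, mappable)
enriched = second_pass(chip, control, candidates)
```

With these made-up counts, the first pass flags bins 2 and 4, and the second pass discards bin 4 because its control signal is equally high, which is the kind of open-chromatin artifact the abstract describes.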

Original language	English (US)
Pages (from-to)	66-75
Number of pages	10
Journal	Nature Biotechnology
Volume	27
Issue number	1
DOIs	https://doi.org/10.1038/nbt.1518
State	Published - Jan 25 2009
Externally published	Yes

ASJC Scopus subject areas

Biotechnology
Bioengineering
Applied Microbiology and Biotechnology
Molecular Medicine
Biomedical Engineering

Cite this

@article{4c6bdeb4adbc45dc97594b4732a05132,

title = "PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls",

abstract = "Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal caused by open chromatin, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.",

author = "Joel Rozowsky and Ghia Euskirchen and Auerbach, {Raymond K.} and Zhang, {Zhengdong D.} and Theodore Gibson and Robert Bjornson and Nicholas Carriero and Michael Snyder and Gerstein, {Mark B.}",

note = "Funding Information: This work was done with support by grants from the National Institutes of Health (NIH) and made use of the Yale University Life Sciences Computing Center (NIH grant RR19895). We acknowledge Mike Wilson{\textquoteright}s assistance with submission of data to GEO.",

year = "2009",

month = jan,

day = "25",

doi = "10.1038/nbt.1518",

language = "English (US)",

volume = "27",

pages = "66--75",

journal = "Nature Biotechnology",

issn = "1087-0156",

publisher = "Nature Publishing Group",

number = "1",

}

TY - JOUR

T1 - PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

AU - Rozowsky, Joel

AU - Euskirchen, Ghia

AU - Auerbach, Raymond K.

AU - Zhang, Zhengdong D.

AU - Gibson, Theodore

AU - Bjornson, Robert

AU - Carriero, Nicholas

AU - Snyder, Michael

AU - Gerstein, Mark B.

N1 - Funding Information: This work was done with support by grants from the National Institutes of Health (NIH) and made use of the Yale University Life Sciences Computing Center (NIH grant RR19895). We acknowledge Mike Wilson’s assistance with submission of data to GEO.

PY - 2009/1/25

Y1 - 2009/1/25

N2 - Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal caused by open chromatin, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.

AB - Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal caused by open chromatin, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.

UR - http://www.scopus.com/inward/record.url?scp=60149112271&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=60149112271&partnerID=8YFLogxK

U2 - 10.1038/nbt.1518

DO - 10.1038/nbt.1518

M3 - Article

C2 - 19122651

AN - SCOPUS:60149112271

SN - 1087-0156

VL - 27

SP - 66

EP - 75

JO - Nature Biotechnology

JF - Nature Biotechnology

IS - 1

ER -