Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome

Nathan D. Trinklein, Ulaş Karaöz, Jiaqian Wu, Anason Halees, Shelley Force Aldred, Patrick J. Collins, Deyou Zheng, Zhengdong Zhang, Mark B. Gerstein, Michael Snyder, Richard M. Myers, Zhiping Weng

Research output: Contribution to journal › Article

20 Citations (Scopus)

Abstract

The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3′-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5′-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5′-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5′-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.
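The novelty criterion in the abstract (a predicted region is novel when it lies more than 2 kb from the 5′-end of any previously annotated cDNA) can be illustrated with a minimal sketch. The abstract gives only the threshold, so everything else here is an assumption made for illustration: the half-open coordinate convention, ignoring strand, treating a chromosome with no annotated 5′-ends as novel, and the names `distance_to_nearest` and `is_novel`. It is not the authors' implementation.

```python
from bisect import bisect_left

NOVELTY_DISTANCE_BP = 2000  # the ">2 kb" threshold quoted in the abstract


def distance_to_nearest(start, end, sorted_sites):
    """Distance in bp from the half-open interval [start, end) to the nearest
    site in sorted_sites (ascending). Returns None if there are no sites."""
    if not sorted_sites:
        return None
    i = bisect_left(sorted_sites, start)  # first site at or right of start
    best = None
    # Only two candidates can be nearest: the site just left of start,
    # and the first site at or right of start.
    for j in (i - 1, i):
        if 0 <= j < len(sorted_sites):
            site = sorted_sites[j]
            if site < start:
                d = start - site
            elif site >= end:
                d = site - (end - 1)
            else:
                d = 0  # site falls inside the region
            best = d if best is None else min(best, d)
    return best


def is_novel(region, annotated_5p_ends):
    """region is (chrom, start, end); annotated_5p_ends maps chrom -> sorted
    list of annotated 5'-end positions (hypothetical input layout)."""
    chrom, start, end = region
    d = distance_to_nearest(start, end, annotated_5p_ends.get(chrom, []))
    # Assumption: no annotated 5'-ends on the chromosome counts as novel.
    return d is None or d > NOVELTY_DISTANCE_BP


# Made-up coordinates, for illustration only.
annotated = {"chr1": [1000, 50000]}
regions = [
    ("chr1", 900, 1100),     # overlaps an annotated 5'-end
    ("chr1", 2500, 2700),    # 1.5 kb from the nearest annotated 5'-end
    ("chr1", 10000, 10200),  # 9 kb from the nearest annotated 5'-end
]
novel = [r for r in regions if is_novel(r, annotated)]
```

With these made-up inputs only the third region passes the filter. A strand-aware version would keep separate site lists per strand.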

Original language: English (US)
Pages (from-to): 720-731
Number of pages: 12
Journal: Genome Research
Volume: 17
Issue number: 6
DOI: https://doi.org/10.1101/gr.5716607
State: Published - Jun 2007
Externally published: Yes

Fingerprint

Human Genome
Gene Components
Cell Line
Chromatin Immunoprecipitation
Nucleic Acid Regulatory Sequences
Transcription Initiation Site
Deoxyribonuclease I
3' Untranslated Regions
Genetic Promoter Regions
Transfection
Complementary DNA
Genome
Genes
Datasets

ASJC Scopus subject areas

  • Genetics

Cite this

Trinklein, N. D., Karaöz, U., Wu, J., Halees, A., Aldred, S. F., Collins, P. J., ... Weng, Z. (2007). Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome. Genome Research, 17(6), 720-731. https://doi.org/10.1101/gr.5716607

Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome. / Trinklein, Nathan D.; Karaöz, Ulaş; Wu, Jiaqian; Halees, Anason; Aldred, Shelley Force; Collins, Patrick J.; Zheng, Deyou; Zhang, Zhengdong; Gerstein, Mark B.; Snyder, Michael; Myers, Richard M.; Weng, Zhiping.

In: Genome Research, Vol. 17, No. 6, 06.2007, p. 720-731.

Trinklein, ND, Karaöz, U, Wu, J, Halees, A, Aldred, SF, Collins, PJ, Zheng, D, Zhang, Z, Gerstein, MB, Snyder, M, Myers, RM & Weng, Z 2007, 'Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome', Genome Research, vol. 17, no. 6, pp. 720-731. https://doi.org/10.1101/gr.5716607
Trinklein, Nathan D. ; Karaöz, Ulaş ; Wu, Jiaqian ; Halees, Anason ; Aldred, Shelley Force ; Collins, Patrick J. ; Zheng, Deyou ; Zhang, Zhengdong ; Gerstein, Mark B. ; Snyder, Michael ; Myers, Richard M. ; Weng, Zhiping. / Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome. In: Genome Research. 2007 ; Vol. 17, No. 6. pp. 720-731.
@article{ffeaebe2a821448093d7a442d745191b,
title = "Integrated analysis of experimental data sets reveals many novel promoters in 1{\%} of the human genome",
abstract = "The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47{\%} of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3′-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5′-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25{\%} of them showed transcriptional activity above background in at least one cell line. We also performed 5′-RACE experiments on 62 novel regions, and 76{\%} of the regions were associated with the 5′-ends of at least two RACE products. Our results suggest that there are at least 35{\%} more functional promoters in the human genome than currently annotated.",
author = "Trinklein, {Nathan D.} and Ulaş Kara{\"o}z and Jiaqian Wu and Anason Halees and Aldred, {Shelley Force} and Collins, {Patrick J.} and Deyou Zheng and Zhengdong Zhang and Gerstein, {Mark B.} and Michael Snyder and Myers, {Richard M.} and Zhiping Weng",
year = "2007",
month = "6",
doi = "10.1101/gr.5716607",
language = "English (US)",
volume = "17",
pages = "720--731",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "6",

}

TY - JOUR
T1 - Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome
AU - Trinklein, Nathan D.
AU - Karaöz, Ulaş
AU - Wu, Jiaqian
AU - Halees, Anason
AU - Aldred, Shelley Force
AU - Collins, Patrick J.
AU - Zheng, Deyou
AU - Zhang, Zhengdong
AU - Gerstein, Mark B.
AU - Snyder, Michael
AU - Myers, Richard M.
AU - Weng, Zhiping
PY - 2007/6
Y1 - 2007/6
N2 - The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3′-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5′-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5′-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5′-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.

UR - http://www.scopus.com/inward/record.url?scp=34250322116&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250322116&partnerID=8YFLogxK
U2 - 10.1101/gr.5716607
DO - 10.1101/gr.5716607
M3 - Article
VL - 17
SP - 720
EP - 731
JO - Genome Research
JF - Genome Research
SN - 1088-9051
IS - 6
ER -