Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative

Kamil Khafizov; Carlos Madrid-Aliste; Steven C. Almo; Andras Fiser

doi:10.1073/pnas.1321614111

Research output: Contribution to journal › Article › peer-review

75 Scopus citations

Abstract

The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins (including proteins for which reliable homology models can be generated) on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.

Original language	English (US)
Pages (from-to)	3733-3738
Number of pages	6
Journal	Proceedings of the National Academy of Sciences of the United States of America
Volume	111
Issue number	10
DOIs	https://doi.org/10.1073/pnas.1321614111
State	Published - Mar 11 2014

ASJC Scopus subject areas

General

Cite this

@article{078ca79123a1415e9d6844a752e72706,

title = "Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative",

abstract = "The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins (including proteins for which reliable homology models can be generated) on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.",

author = "Khafizov, Kamil and Madrid-Aliste, Carlos and Almo, {Steven C.} and Fiser, Andras",

year = "2014",

month = mar,

day = "11",

doi = "10.1073/pnas.1321614111",

language = "English (US)",

volume = "111",

pages = "3733--3738",

journal = "Proceedings of the National Academy of Sciences of the United States of America",

issn = "0027-8424",

publisher = "National Academy of Sciences",

number = "10",

}

TY - JOUR

T1 - Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative

AU - Khafizov, Kamil

AU - Madrid-Aliste, Carlos

AU - Almo, Steven C.

AU - Fiser, Andras

PY - 2014/3/11

Y1 - 2014/3/11

N2 - The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins (including proteins for which reliable homology models can be generated) on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.

AB - The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins (including proteins for which reliable homology models can be generated) on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.

UR - http://www.scopus.com/inward/record.url?scp=84896271802&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84896271802&partnerID=8YFLogxK

U2 - 10.1073/pnas.1321614111

DO - 10.1073/pnas.1321614111

M3 - Article

C2 - 24567391

AN - SCOPUS:84896271802

SN - 0027-8424

VL - 111

SP - 3733

EP - 3738

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

IS - 10

ER -
