A supersecondary structure library and search algorithm for modeling loops in protein structures

Narcis Fernandez-Fuentes; Baldomero Oliva; András Fiser

doi:10.1093/nar/gkl156

Systems & Computational Biology

Research output: Contribution to journal › Article › peer-review

63 Scopus citations

Abstract

We present a fragment-search-based method for predicting loop conformations in protein models. A hierarchical and multidimensional database has been set up that currently classifies 105,950 loop fragments and loop-flanking secondary structures. Besides the length of the loops and the types of bracing secondary structures, the database is organized along four internal coordinates: a distance and three types of angles characterizing the geometry of the stem regions. Candidate fragments are selected from this library by matching the length and the types of bracing secondary structures of the query and by satisfying the geometrical restraints of the stems. They are subsequently inserted into the query protein framework, where their fit is assessed by the root mean square deviation (r.m.s.d.) of the stem regions and by the number of rigid-body clashes with the environment. In the final step, the remaining candidate loops are ranked by a Z-score that combines information on sequence similarity and the fit of predicted and observed φ/ψ main-chain dihedral angle propensities. Confidence Z-score cut-offs were determined for each loop length that identify those predicted fragments that outperform a competitive ab initio method. A web server implements the method, regularly updates the fragment library and performs predictions. Predicted segments are returned or, optionally, can be completed with side-chain reconstruction and subsequently annealed in the environment of the query protein by conjugate gradient minimization. The prediction method was tested on artificially prepared search datasets from which all trivial sequence similarities at the SCOP superfamily level were removed. Under these conditions it is possible to predict loops of length 4, 8 and 12 with coverage of 98, 78 and 28% and an r.m.s.d. accuracy of at least 0.22, 1.38 and 2.47 Å, respectively. In a head-to-head comparison on loops extracted from freshly deposited new protein folds, the current method outperformed an earlier database search method in a ∼5:1 ratio.
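The selection and ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Fragment` record, the tolerance values, the `raw_score` field and the function names are all hypothetical, and the Z-score is assumed to be a plain standardization over the candidate pool. The stem r.m.s.d. and clash filters mentioned in the abstract are omitted for brevity.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Fragment:
    """Hypothetical library entry; field names are assumptions."""
    name: str
    length: int        # number of loop residues
    flanks: str        # bracing secondary structures, e.g. "HE" = helix, strand
    geometry: tuple    # (stem distance in Å, three stem angles in degrees)
    raw_score: float   # assumed combined sequence / dihedral propensity score


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def select_candidates(library, length, flanks, geometry,
                      dist_tol=1.0, angle_tol=30.0):
    """Keep fragments that match loop length, flank types and stem geometry.

    The tolerances are illustrative values, not taken from the paper.
    """
    q_dist, *q_angles = geometry
    selected = []
    for frag in library:
        if frag.length != length or frag.flanks != flanks:
            continue
        f_dist, *f_angles = frag.geometry
        if abs(f_dist - q_dist) > dist_tol:
            continue
        if any(angle_diff(fa, qa) > angle_tol
               for fa, qa in zip(f_angles, q_angles)):
            continue
        selected.append(frag)
    return selected


def rank_by_z_score(candidates):
    """Return (fragment, z) pairs sorted from highest to lowest Z-score."""
    if not candidates:
        return []
    scores = [c.raw_score for c in candidates]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [(c, 0.0) for c in candidates]
    ranked = [(c, (c.raw_score - mu) / sigma) for c in candidates]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Toy library with made-up values, for illustration only.
library = [
    Fragment("a", 4, "HE", (10.0, 90.0, 45.0, 170.0), 2.0),
    Fragment("b", 4, "HE", (10.5, 100.0, 50.0, -175.0), 4.0),
    Fragment("c", 4, "HE", (14.0, 90.0, 45.0, 170.0), 9.0),  # stem distance too far
    Fragment("d", 8, "HE", (10.0, 90.0, 45.0, 170.0), 9.0),  # wrong loop length
    Fragment("e", 4, "EE", (10.0, 90.0, 45.0, 170.0), 9.0),  # wrong flank types
]
query_geometry = (10.0, 90.0, 45.0, 175.0)
candidates = select_candidates(library, 4, "HE", query_geometry)
ranked = rank_by_z_score(candidates)
```

Filtering on the cheap discrete keys (length, flank types) before the geometric comparison mirrors the hierarchical organization the abstract describes; in a real library those keys would more likely index into buckets than be checked in a linear scan.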

Original language	English (US)
Pages (from-to)	2085-2097
Number of pages	13
Journal	Nucleic Acids Research
Volume	34
Issue number	7
DOIs	https://doi.org/10.1093/nar/gkl156
State	Published - 2006

ASJC Scopus subject areas

Genetics

Cite this

@article{230ddab87e524001887c4aee0df83df0,

title = "A supersecondary structure library and search algorithm for modeling loops in protein structures",

abstract = "We present a fragment-search-based method for predicting loop conformations in protein models. A hierarchical and multidimensional database has been set up that currently classifies 105,950 loop fragments and loop-flanking secondary structures. Besides the length of the loops and the types of bracing secondary structures, the database is organized along four internal coordinates: a distance and three types of angles characterizing the geometry of the stem regions. Candidate fragments are selected from this library by matching the length and the types of bracing secondary structures of the query and by satisfying the geometrical restraints of the stems. They are subsequently inserted into the query protein framework, where their fit is assessed by the root mean square deviation (r.m.s.d.) of the stem regions and by the number of rigid-body clashes with the environment. In the final step, the remaining candidate loops are ranked by a Z-score that combines information on sequence similarity and the fit of predicted and observed φ/ψ main-chain dihedral angle propensities. Confidence Z-score cut-offs were determined for each loop length that identify those predicted fragments that outperform a competitive ab initio method. A web server implements the method, regularly updates the fragment library and performs predictions. Predicted segments are returned or, optionally, can be completed with side-chain reconstruction and subsequently annealed in the environment of the query protein by conjugate gradient minimization. The prediction method was tested on artificially prepared search datasets from which all trivial sequence similarities at the SCOP superfamily level were removed. Under these conditions it is possible to predict loops of length 4, 8 and 12 with coverage of 98, 78 and 28\% and an r.m.s.d. accuracy of at least 0.22, 1.38 and 2.47 {\AA}, respectively. In a head-to-head comparison on loops extracted from freshly deposited new protein folds, the current method outperformed an earlier database search method in a ∼5:1 ratio.",

author = "Narcis Fernandez-Fuentes and Baldomero Oliva and Andr{\'a}s Fiser",

note = "Funding Information: The authors acknowledge all Fiser lab members for their insightful comments on the work, especially Dr D. Rykunov. N.F.F. was partially supported by a Boehringer fellowship. Financial support provided by NIH GM62519-04 and the Seaver Foundation. Funding to pay the Open Access publication charges for this article was provided by NIH GM62519-04 and MEC BI02005-00533.",

year = "2006",

doi = "10.1093/nar/gkl156",

language = "English (US)",

volume = "34",

pages = "2085--2097",

journal = "Nucleic Acids Research",

issn = "0305-1048",

publisher = "Oxford University Press",

number = "7",

}

TY - JOUR

T1 - A supersecondary structure library and search algorithm for modeling loops in protein structures

AU - Fernandez-Fuentes, Narcis

AU - Oliva, Baldomero

AU - Fiser, András

N1 - Funding Information: The authors acknowledge all Fiser lab members for their insightful comments on the work, especially Dr D. Rykunov. N.F.F. was partially supported by a Boehringer fellowship. Financial support provided by NIH GM62519-04 and the Seaver Foundation. Funding to pay the Open Access publication charges for this article was provided by NIH GM62519-04 and MEC BI02005-00533.

PY - 2006

Y1 - 2006

N2 - We present a fragment-search-based method for predicting loop conformations in protein models. A hierarchical and multidimensional database has been set up that currently classifies 105,950 loop fragments and loop-flanking secondary structures. Besides the length of the loops and the types of bracing secondary structures, the database is organized along four internal coordinates: a distance and three types of angles characterizing the geometry of the stem regions. Candidate fragments are selected from this library by matching the length and the types of bracing secondary structures of the query and by satisfying the geometrical restraints of the stems. They are subsequently inserted into the query protein framework, where their fit is assessed by the root mean square deviation (r.m.s.d.) of the stem regions and by the number of rigid-body clashes with the environment. In the final step, the remaining candidate loops are ranked by a Z-score that combines information on sequence similarity and the fit of predicted and observed φ/ψ main-chain dihedral angle propensities. Confidence Z-score cut-offs were determined for each loop length that identify those predicted fragments that outperform a competitive ab initio method. A web server implements the method, regularly updates the fragment library and performs predictions. Predicted segments are returned or, optionally, can be completed with side-chain reconstruction and subsequently annealed in the environment of the query protein by conjugate gradient minimization. The prediction method was tested on artificially prepared search datasets from which all trivial sequence similarities at the SCOP superfamily level were removed. Under these conditions it is possible to predict loops of length 4, 8 and 12 with coverage of 98, 78 and 28% and an r.m.s.d. accuracy of at least 0.22, 1.38 and 2.47 Å, respectively. In a head-to-head comparison on loops extracted from freshly deposited new protein folds, the current method outperformed an earlier database search method in a ∼5:1 ratio.

UR - http://www.scopus.com/inward/record.url?scp=33645923385&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33645923385&partnerID=8YFLogxK

U2 - 10.1093/nar/gkl156

DO - 10.1093/nar/gkl156

M3 - Article

C2 - 16617149

AN - SCOPUS:33645923385

SN - 0305-1048

VL - 34

SP - 2085

EP - 2097

JO - Nucleic Acids Research

JF - Nucleic Acids Research

IS - 7

ER -
