Integrating NOE and RDC using sum-of-squares relaxation for protein structure determination

Y. Khoo, A. Singer, David Cowburn

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

We revisit the problem of protein structure determination from geometric restraints obtained by NMR, using convex optimization. It is well known that the NP-hard distance geometry problem of determining atomic positions from pairwise distance restraints can be relaxed into a convex semidefinite program (SDP). However, the nuclear Overhauser effect (NOE) distance restraints are often too imprecise and sparse for accurate structure determination. Residual dipolar coupling (RDC) measurements provide additional geometric information on the angles between atom-pair directions and the axes of the principal axis frame. The optimization problem involving RDC is highly non-convex and requires a good initialization even within the simulated annealing framework. In this paper, we model the protein backbone as an articulated structure composed of rigid units. Determining the rotation of each rigid unit gives the full protein structure. We propose solving the non-convex optimization problems using the sum-of-squares (SOS) hierarchy, a hierarchy of convex relaxations with increasing complexity and approximation power. Unlike classical global optimization approaches, SOS optimization returns a certificate of optimality if the global optimum is found. Based on the SOS method, we propose two algorithms, RDC-SOS and RDC–NOE-SOS, that have polynomial-time complexity in the number of amino-acid residues and run efficiently on a standard desktop. In many instances, the proposed methods exactly recover the solution to the original non-convex optimization problem. To the best of our knowledge, this is the first time SOS relaxation has been introduced to solve non-convex optimization problems in structural biology. We further introduce a statistical tool, the Cramér–Rao bound (CRB), to provide an information-theoretic bound on the highest resolution one can hope to achieve when determining protein structure from noisy measurements using any unbiased estimator. Our simulation results show that when the RDC measurements are corrupted by Gaussian noise of realistic variance, both SOS-based algorithms attain the CRB. We successfully apply our method in a divide-and-conquer fashion to determine the structure of ubiquitin from experimental NOE and RDC measurements obtained in two alignment media, achieving more accurate and faster reconstructions than the current state of the art.
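The two geometric facts the abstract relies on can be illustrated with a small numerical sketch. It is not the authors' RDC-SOS or RDC–NOE-SOS algorithm and it solves no SDP. It only checks, on random synthetic data, that (a) squared pairwise distances are linear in the Gram matrix of the coordinates, which is what allows the distance geometry problem to be relaxed by dropping the rank constraint, and (b) an RDC modeled as a quadratic form in a unit bond direction depends only on the angles between that direction and the principal axes. The function names, the random coordinates, and the normalization of the alignment (Saupe) tensor with the physical prefactor omitted are all assumptions made for illustration.

```python
import numpy as np


def gram_matrix(coords):
    """Gram matrix G = X X^T of centered coordinates X (n atoms x 3)."""
    centered = coords - coords.mean(axis=0)
    return centered @ centered.T


def squared_distances_from_gram(G):
    """Squared distances d_ij^2 = G_ii + G_jj - 2 G_ij (linear in G)."""
    diag = np.diag(G)
    return diag[:, None] + diag[None, :] - 2.0 * G


def rdc_quadratic_form(v, saupe):
    """RDC up to a constant prefactor: v^T S v for a unit direction v."""
    return float(v @ saupe @ v)


def rdc_from_principal_axes(v, saupe):
    """Same quantity written with the cosines of the angles between v
    and the principal axes (eigenvectors) of the alignment tensor."""
    eigvals, eigvecs = np.linalg.eigh(saupe)
    cosines = eigvecs.T @ v  # cos(angle) to each principal axis
    return float(np.sum(eigvals * cosines ** 2))


# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
coords = rng.normal(size=(6, 3))

G = gram_matrix(coords)
d2_from_gram = squared_distances_from_gram(G)
diff = coords[:, None, :] - coords[None, :, :]
d2_direct = np.sum(diff ** 2, axis=-1)

# Random symmetric, traceless 3x3 tensor standing in for a Saupe tensor.
A = rng.normal(size=(3, 3))
saupe = 0.5 * (A + A.T)
saupe -= np.trace(saupe) / 3.0 * np.eye(3)

# Random unit vector standing in for a bond direction.
v = rng.normal(size=3)
v /= np.linalg.norm(v)

rdc_a = rdc_quadratic_form(v, saupe)
rdc_b = rdc_from_principal_axes(v, saupe)
```

In this sketch `G` is positive semidefinite with rank 3. A convex relaxation of the kind mentioned in the abstract keeps the semidefinite constraint and the linear distance constraints while dropping the rank condition. The agreement of `rdc_a` and `rdc_b` shows why the RDC term is quadratic, and hence non-convex, in the unknown orientation of a rigid unit.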

Original language: English (US)
Pages (from-to): 1-23
Number of pages: 23
Journal: Journal of Biomolecular NMR
DOI: 10.1007/s10858-017-0108-7
State: Accepted/In press - Jun 14 2017

Fingerprint

Proteins
Ubiquitin
Convex optimization
Global optimization
Simulated annealing
Amino Acids
Nuclear magnetic resonance
Polynomials
Atoms
Geometry
Direction compound

Keywords

  • Convex optimization
  • Cramér–Rao lower-bound
  • Nuclear Overhauser effect
  • Protein structure determination
  • Residual dipolar coupling
  • Semidefinite programming
  • Sum-of-squares optimization

ASJC Scopus subject areas

  • Biochemistry
  • Spectroscopy

Cite this

Integrating NOE and RDC using sum-of-squares relaxation for protein structure determination. / Khoo, Y.; Singer, A.; Cowburn, David.

In: Journal of Biomolecular NMR, 14.06.2017, p. 1-23.

@article{302a61a739f74672bea8347756b58af6,
title = "Integrating NOE and RDC using sum-of-squares relaxation for protein structure determination",
keywords = "Convex optimization, Cram{\'e}r–Rao lower-bound, Nuclear Overhauser effect, Protein structure determination, Residual dipolar coupling, Semidefinite programming, Sum-of-squares optimization",
author = "Y. Khoo and A. Singer and David Cowburn",
year = "2017",
month = "6",
day = "14",
doi = "10.1007/s10858-017-0108-7",
language = "English (US)",
pages = "1--23",
journal = "Journal of Biomolecular NMR",
issn = "0925-2738",
publisher = "Springer Netherlands",

}

TY - JOUR

T1 - Integrating NOE and RDC using sum-of-squares relaxation for protein structure determination

AU - Khoo, Y.

AU - Singer, A.

AU - Cowburn, David

PY - 2017/6/14

Y1 - 2017/6/14

KW - Convex optimization

KW - Cramér–Rao lower-bound

KW - Nuclear Overhauser effect

KW - Protein structure determination

KW - Residual dipolar coupling

KW - Semidefinite programming

KW - Sum-of-squares optimization

UR - http://www.scopus.com/inward/record.url?scp=85020722874&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020722874&partnerID=8YFLogxK

U2 - 10.1007/s10858-017-0108-7

DO - 10.1007/s10858-017-0108-7

M3 - Article

C2 - 28616711

AN - SCOPUS:85020722874

SP - 1

EP - 23

JO - Journal of Biomolecular NMR

JF - Journal of Biomolecular NMR

SN - 0925-2738

ER -