Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning

Zhong Ru Xie, Jiawen Chen, Yinghao Wu

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.

Original language: English (US)
Article number: 46622
Journal: Scientific Reports
Volume: 7
DOI: 10.1038/srep46622
State: Published - Apr 18 2017

Fingerprint

Proteins
Benchmarking
Machine Learning
Biological Phenomena

ASJC Scopus subject areas

  • General

Cite this

Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning. / Xie, Zhong Ru; Chen, Jiawen; Wu, Yinghao.

In: Scientific Reports, Vol. 7, 46622, 18.04.2017.

Research output: Contribution to journal › Article

@article{c6374c3feda943f691d946fec72a2b6a,
title = "Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning",
abstract = "Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.",
author = "Xie, {Zhong Ru} and Jiawen Chen and Yinghao Wu",
year = "2017",
month = "4",
day = "18",
doi = "10.1038/srep46622",
language = "English (US)",
volume = "7",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",

}

TY  - JOUR
T1  - Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning
AU  - Xie, Zhong Ru
AU  - Chen, Jiawen
AU  - Wu, Yinghao
PY  - 2017/4/18
Y1  - 2017/4/18
N2  - Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.
AB  - Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.
UR  - http://www.scopus.com/inward/record.url?scp=85017653243&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85017653243&partnerID=8YFLogxK
U2  - 10.1038/srep46622
DO  - 10.1038/srep46622
M3  - Article
C2  - 28418043
AN  - SCOPUS:85017653243
VL  - 7
JO  - Scientific Reports
JF  - Scientific Reports
SN  - 2045-2322
M1  - 46622
ER  - 