A machine learning-based method to improve docking scoring functions and its application to drug repurposing

Sarah L. Kinnings, Nina Liu, Peter J. Tonge, Richard M. Jackson, Lei Xie, Philip E. Bourne

Research output: Contribution to journal › Article

84 Citations (Scopus)

Abstract

Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family dependent. In addition, they incorrectly assume that individual interactions contribute toward the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper, we show how the use of support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC50 values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of Mycobacterium tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.

Original language: English (US)
Pages (from-to): 408-419
Number of pages: 12
Journal: Journal of Chemical Information and Modeling
Volume: 51
Issue number: 2
DOI: 10.1021/ci100369f
State: Published - Feb 28 2011
Externally published: Yes

Fingerprint

Learning systems
drug
Support vector machines
energy
Pharmaceutical Preparations
learning
contagious disease
Screening
Genes
Throughput
Phosphodiesterase Inhibitors
methodology
Binding sites
interaction
Ligands
regression
experiment
performance
Values

ASJC Scopus subject areas

  • Chemistry(all)
  • Chemical Engineering(all)
  • Computer Science Applications
  • Library and Information Sciences

Cite this

A machine learning-based method to improve docking scoring functions and its application to drug repurposing. / Kinnings, Sarah L.; Liu, Nina; Tonge, Peter J.; Jackson, Richard M.; Xie, Lei; Bourne, Philip E.

In: Journal of Chemical Information and Modeling, Vol. 51, No. 2, 28.02.2011, p. 408-419.

Kinnings, Sarah L.; Liu, Nina; Tonge, Peter J.; Jackson, Richard M.; Xie, Lei; Bourne, Philip E. / A machine learning-based method to improve docking scoring functions and its application to drug repurposing. In: Journal of Chemical Information and Modeling. 2011; Vol. 51, No. 2. pp. 408-419.
@article{79dc8f83fb9d47ae84da1fb753b63a49,
title = "A machine learning-based method to improve docking scoring functions and its application to drug repurposing",
abstract = "Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family dependent. In addition, they incorrectly assume that individual interactions contribute toward the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper, we show how the use of support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC50 values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of Mycobacterium tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.",
author = "Kinnings, {Sarah L.} and Liu, Nina and Tonge, {Peter J.} and Jackson, {Richard M.} and Xie, Lei and Bourne, {Philip E.}",
year = "2011",
month = feb,
day = "28",
doi = "10.1021/ci100369f",
language = "English (US)",
volume = "51",
pages = "408--419",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "2",

}

TY - JOUR

T1 - A machine learning-based method to improve docking scoring functions and its application to drug repurposing

AU - Kinnings, Sarah L.

AU - Liu, Nina

AU - Tonge, Peter J.

AU - Jackson, Richard M.

AU - Xie, Lei

AU - Bourne, Philip E.

PY - 2011/2/28

Y1 - 2011/2/28

AB - Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family dependent. In addition, they incorrectly assume that individual interactions contribute toward the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper, we show how the use of support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC50 values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of Mycobacterium tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.

UR - http://www.scopus.com/inward/record.url?scp=79952178127&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952178127&partnerID=8YFLogxK

U2 - 10.1021/ci100369f

DO - 10.1021/ci100369f

M3 - Article

VL - 51

SP - 408

EP - 419

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

SN - 1549-9596

IS - 2

ER -