Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials

Dmitry Rykunov, Andras Fiser

Research output: Contribution to journal › Article

44 Citations (Scopus)

Abstract

Statistical distance dependent pair potentials are frequently used in a variety of folding, threading, and modeling studies of proteins. The applicability of these types of potentials is tightly connected to the reliability of statistical observations. We explored the possible origin and extent of false positive signals in statistical potentials by analyzing their distance dependence in a variety of randomized protein-like models. While on average potentials derived from such models are expected to equal zero at any distance, we demonstrate that systematic and significant distortions exist. These distortions originate from the limited statistical counts in local environments of proteins and from the limited size of protein structures at large distances. We suggest that these systematic errors in statistical potentials are connected to the dependence of amino acid composition on protein size and to variation in protein sizes. Additionally, atom-based potentials are dominated by a false positive signal that is due to correlation among distances measured from atoms of one residue to atoms of another residue. The significance of residue-based pairwise potentials at various spatial pair separations was assessed in this study and it was found that as few as ∼50% of potential values were statistically significant at distances below 4 Å, and only at most ∼80% of them were significant at larger pair separations. A new definition for reference state, free of the observed systematic errors, is suggested. It has been demonstrated to generate statistical potentials that compare favorably to other publicly available ones.
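The abstract's point that randomized models should give zero potential on average, yet do not when counts are limited, can be illustrated with a small numerical sketch. This is not the authors' procedure. It assumes a generic inverse-Boltzmann form, E = -ln(N_obs / N_exp) in units of kT, Poisson-distributed counts in a distance bin, and a pseudocount of 1 to avoid taking the logarithm of zero. The function names and parameters are hypothetical.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of observing n counts, computed in log space."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def mean_random_potential(lam, pseudo=1.0):
    """Average inverse-Boltzmann energy (in kT) for purely random counts.

    The observed count n is Poisson with mean `lam`, which is also the
    expected (reference) count, so an unbiased potential would average zero.
    The pseudocount is an assumption of this sketch.
    """
    # Sum far enough into the upper tail that the remainder is negligible.
    n_max = int(lam + 10 * math.sqrt(lam) + 20)
    total = 0.0
    for n in range(n_max + 1):
        energy = -math.log((n + pseudo) / (lam + pseudo))
        total += poisson_pmf(n, lam) * energy
    return total


if __name__ == "__main__":
    for lam in (2.0, 20.0, 200.0, 2000.0):
        print(f"expected count {lam:7.1f}: mean energy {mean_random_potential(lam):.5f} kT")
```

Because the negative logarithm is convex, the average energy is positive for any finite expected count and shrinks toward zero only as counts grow. Sparsely populated bins therefore carry a systematic offset even when no real interaction is present, which is one generic way a false positive signal can arise.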

Original language: English (US)
Pages (from-to): 559-568
Number of pages: 10
Journal: Proteins: Structure, Function and Bioinformatics
Volume: 67
Issue number: 3
DOI: 10.1002/prot.21279
State: Published - May 15 2007

Keywords

  • Amino acid composition
  • Distance dependent statistical potentials
  • Protein structure prediction
  • Systematic errors

ASJC Scopus subject areas

  • Genetics
  • Structural Biology
  • Biochemistry

Cite this

@article{2cf78ac7769b48c983f7c413a35f1780,
title = "Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials",
abstract = "Statistical distance dependent pair potentials are frequently used in a variety of folding, threading, and modeling studies of proteins. The applicability of these types of potentials is tightly connected to the reliability of statistical observations. We explored the possible origin and extent of false positive signals in statistical potentials by analyzing their distance dependence in a variety of randomized protein-like models. While on average potentials derived from such models are expected to equal zero at any distance, we demonstrate that systematic and significant distortions exist. These distortions originate from the limited statistical counts in local environments of proteins and from the limited size of protein structures at large distances. We suggest that these systematic errors in statistical potentials are connected to the dependence of amino acid composition on protein size and to variation in protein sizes. Additionally, atom-based potentials are dominated by a false positive signal that is due to correlation among distances measured from atoms of one residue to atoms of another residue. The significance of residue-based pairwise potentials at various spatial pair separations was assessed in this study and it was found that as few as ∼50{\%} of potential values were statistically significant at distances below 4 {\AA}, and only at most ∼80{\%} of them were significant at larger pair separations. A new definition for reference state, free of the observed systematic errors, is suggested. It has been demonstrated to generate statistical potentials that compare favorably to other publicly available ones.",
keywords = "Amino acid composition, Distance dependent statistical potentials, Protein structure prediction, Systematic errors",
author = "Dmitry Rykunov and Andras Fiser",
year = "2007",
month = "5",
day = "15",
doi = "10.1002/prot.21279",
language = "English (US)",
volume = "67",
pages = "559--568",
journal = "Proteins: Structure, Function and Bioinformatics",
issn = "0887-3585",
publisher = "Wiley-Liss Inc.",
number = "3"
}

TY - JOUR

T1 - Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials

AU - Rykunov, Dmitry

AU - Fiser, Andras

PY - 2007/5/15

Y1 - 2007/5/15

N2 - Statistical distance dependent pair potentials are frequently used in a variety of folding, threading, and modeling studies of proteins. The applicability of these types of potentials is tightly connected to the reliability of statistical observations. We explored the possible origin and extent of false positive signals in statistical potentials by analyzing their distance dependence in a variety of randomized protein-like models. While on average potentials derived from such models are expected to equal zero at any distance, we demonstrate that systematic and significant distortions exist. These distortions originate from the limited statistical counts in local environments of proteins and from the limited size of protein structures at large distances. We suggest that these systematic errors in statistical potentials are connected to the dependence of amino acid composition on protein size and to variation in protein sizes. Additionally, atom-based potentials are dominated by a false positive signal that is due to correlation among distances measured from atoms of one residue to atoms of another residue. The significance of residue-based pairwise potentials at various spatial pair separations was assessed in this study and it was found that as few as ∼50% of potential values were statistically significant at distances below 4 Å, and only at most ∼80% of them were significant at larger pair separations. A new definition for reference state, free of the observed systematic errors, is suggested. It has been demonstrated to generate statistical potentials that compare favorably to other publicly available ones.

KW - Amino acid composition

KW - Distance dependent statistical potentials

KW - Protein structure prediction

KW - Systematic errors

UR - http://www.scopus.com/inward/record.url?scp=34247281977&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34247281977&partnerID=8YFLogxK

U2 - 10.1002/prot.21279

DO - 10.1002/prot.21279

M3 - Article

VL - 67

SP - 559

EP - 568

JO - Proteins: Structure, Function and Bioinformatics

JF - Proteins: Structure, Function and Bioinformatics

SN - 0887-3585

IS - 3

ER -