Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties

Tao Huang, Ping Wang, Zhiqiang Ye, Heng Xu, Zhisong He, Kaiyan Feng, Lele Hu, Weiren Cui, Kai Wang, Xiao Dong, Lu Xie, Xiangyin Kong, Yu Dong Cai, Yixue Li

Research output: Contribution to journal › Article

69 Citations (Scopus)

Abstract

Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs), account for the majority of human inherited diseases. It is important to distinguish deleterious SAPs from neutral ones. Most traditional computational methods for classifying SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe a better rationale for deleterious SAP prediction is this: if a SAP lies in a protein with important functions and severely changes the protein's sequence and structure, it is more likely to be related to disease. We therefore established a method to predict deleterious SAPs based on both the protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features, comprising sequential features, structural features and network features. The Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set, and the prediction model was the Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized set of 263 features was used. The optimized predictor with 263 features was also tested on an independent dataset, where its accuracy was 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, had a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be the most important for accurate prediction and significantly improved prediction performance. Our results suggest that the protein interaction context could provide important clues to help better explain the functional associations of SAPs. This research will facilitate post-genome-wide association studies.
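The selection-and-validation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the feature ranking has already been produced by mRMR (passed in as the hypothetical `ranked_features` list), it uses squared Euclidean distance for the nearest-neighbor step (the abstract does not state which distance the NNA used), and the four-sample dataset is invented purely for illustration.

```python
import math


def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training sample closest to x.

    Assumption: squared Euclidean distance; the abstract does not
    specify the distance measure.
    """
    best_label, best_dist = None, math.inf
    for row, label in zip(train_X, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(row, x))
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label


def jackknife_accuracy(X, y, feature_idx):
    """Leave-one-out (jackknife) accuracy of 1-NN on the chosen columns."""
    reduced = [[row[j] for j in feature_idx] for row in X]
    correct = 0
    for i in range(len(reduced)):
        # Hold out sample i and predict it from all remaining samples.
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_neighbor_predict(train_X, train_y, reduced[i]) == y[i]:
            correct += 1
    return correct / len(reduced)


def incremental_feature_selection(X, y, ranked_features):
    """Score the top-1, top-2, ... ranked features; keep the best prefix.

    ranked_features is assumed to come from an mRMR ranking computed
    elsewhere; this sketch does not compute mRMR itself.
    """
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        acc = jackknife_accuracy(X, y, ranked_features[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc


# Hypothetical toy data: column 0 separates the classes, column 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [0.9, 1.1]]
y = [0, 0, 1, 1]
selected, acc = incremental_feature_selection(X, y, ranked_features=[0, 1])
print(selected, acc)
```

On this toy data the noisy second column misleads the nearest-neighbor step, so the loop keeps only the first ranked feature. The same principle, applied to the 472 ranked features, is how a prefix such as the 263-feature set reported in the abstract would be chosen.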

Original language: English (US)
Article number: e11900
Journal: PLoS One
Volume: 5
Issue number: 7
DOI: https://doi.org/10.1371/journal.pone.0011900
State: Published - Aug 20 2010
Externally published: Yes

Fingerprint

Protein Interaction Maps
Polymorphism
Single Nucleotide Polymorphism
polymorphism
genetic polymorphism
Amino Acids
amino acids
prediction
Proteins
proteins
Neutral Amino Acids
Genome-Wide Association Study
protein structure
Computational methods
Redundancy
Feature extraction
amino acid sequences
Genes
methodology
Phenotype

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties. / Huang, Tao; Wang, Ping; Ye, Zhiqiang; Xu, Heng; He, Zhisong; Feng, Kaiyan; Hu, Lele; Cui, Weiren; Wang, Kai; Dong, Xiao; Xie, Lu; Kong, Xiangyin; Cai, Yu Dong; Li, Yixue.

In: PLoS One, Vol. 5, No. 7, e11900, 20.08.2010.


Huang, T, Wang, P, Ye, Z, Xu, H, He, Z, Feng, K, Hu, L, Cui, W, Wang, K, Dong, X, Xie, L, Kong, X, Cai, YD & Li, Y 2010, 'Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties', PLoS One, vol. 5, no. 7, e11900. https://doi.org/10.1371/journal.pone.0011900
@article{762ae242c4244ec98f032615c47d20d8,
title = "Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties",
author = "Tao Huang and Ping Wang and Zhiqiang Ye and Heng Xu and Zhisong He and Kaiyan Feng and Lele Hu and Weiren Cui and Kai Wang and Xiao Dong and Lu Xie and Xiangyin Kong and Cai, {Yu Dong} and Yixue Li",
year = "2010",
month = "8",
day = "20",
doi = "10.1371/journal.pone.0011900",
language = "English (US)",
volume = "5",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "7",

}

TY - JOUR

T1 - Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties

AU - Huang, Tao

AU - Wang, Ping

AU - Ye, Zhiqiang

AU - Xu, Heng

AU - He, Zhisong

AU - Feng, Kaiyan

AU - Hu, Lele

AU - Cui, Weiren

AU - Wang, Kai

AU - Dong, Xiao

AU - Xie, Lu

AU - Kong, Xiangyin

AU - Cai, Yu Dong

AU - Li, Yixue

PY - 2010/8/20

Y1 - 2010/8/20

UR - http://www.scopus.com/inward/record.url?scp=77955645823&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955645823&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0011900

DO - 10.1371/journal.pone.0011900

M3 - Article

C2 - 20689580

AN - SCOPUS:77955645823

VL - 5

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 7

M1 - e11900

ER -