Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties

Tao Huang; Ping Wang; Zhiqiang Ye; Heng Xu; Zhisong He; Kaiyan Feng; Lele Hu; Weiren Cui; Kai Wang; Xiao Dong; Lu Xie; Xiangyin Kong; Yu Dong Cai; Yixue Li

doi:10.1371/journal.pone.0011900

Research output: Contribution to journal › Article › peer-review

74 Scopus citations

Abstract

Non-synonymous SNPs (nsSNPs), also known as single amino acid polymorphisms (SAPs), account for the majority of human inherited diseases. It is therefore important to distinguish deleterious SAPs from neutral ones. Most traditional computational methods classify SAPs using sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We argue that a better rationale for deleterious SAP prediction is this: a SAP that lies in a protein with important functions, and that severely alters the protein's sequence and structure, is more likely to be related to disease. We therefore developed a method that predicts deleterious SAPs from both the protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features, comprising sequential, structural and network features. The Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set, and the Nearest Neighbor Algorithm (NNA) was used as the prediction model. In jackknife cross-validation, 83.27% of SAPs were correctly predicted using the optimized set of 263 features. On an independent dataset, the optimized 263-feature predictor still reached an accuracy of 80.00%, whereas SIFT, a widely used predictor of deleterious SAPs based on sequential features, achieved 71.05% on the same dataset. Network features were found to be the most important for accurate prediction and significantly improved prediction performance. These results suggest that the protein interaction context can provide important clues for explaining a SAP's functional association. This research should facilitate follow-up analyses after genome-wide association studies.
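The abstract names three steps: mRMR ranks the features, IFS evaluates growing subsets of that ranking, and an NNA classifier is scored by jackknife cross-validation on each candidate subset. The Python sketch below illustrates one common reading of that pipeline on a synthetic toy dataset. It is not the authors' implementation. The function names (mutual_info, mrmr_rank, jackknife_1nn_accuracy, incremental_feature_selection) are hypothetical, mRMR is shown with its mutual-information "difference" criterion on features assumed to be pre-discretized, and squared Euclidean distance for the nearest-neighbor search is an assumption, because the abstract does not state the metric.

```python
# Minimal sketch (not the authors' code): mRMR feature ranking followed by
# Incremental Feature Selection (IFS), scoring each candidate subset with a
# 1-nearest-neighbour classifier under jackknife (leave-one-out) testing.
# Assumptions: features are pre-discretised to small integers, and squared
# Euclidean distance is used for the nearest-neighbour search.
from collections import Counter
from math import log


def mutual_info(xs, ys):
    """Mutual information I(X;Y), in nats, of two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_rank(X, y):
    """Rank feature indices greedily by the mRMR difference criterion:
    relevance I(f; y) minus mean redundancy I(f; s) over selected features s."""
    cols = [list(col) for col in zip(*X)]
    relevance = [mutual_info(col, y) for col in cols]
    remaining, selected = set(range(len(cols))), []

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_info(cols[j], cols[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining:
        best = max(sorted(remaining), key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


def jackknife_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN restricted to `features` (needs >= 2 samples)."""
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other sample; the first one found wins on distance ties.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((xi[f] - X[j][f]) ** 2 for f in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y):
    """IFS: evaluate the top-k mRMR features for k = 1..n and keep the best k."""
    ranking = mrmr_rank(X, y)
    scores = [jackknife_1nn_accuracy(X, y, ranking[:k])
              for k in range(1, len(ranking) + 1)]
    best_k = max(range(len(scores)), key=scores.__getitem__) + 1
    return ranking[:best_k], scores


if __name__ == "__main__":
    # Synthetic toy data: f0 and f1 both equal the label, f2 is label-independent.
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    X = [[label, label, i % 2] for i, label in enumerate(y)]
    chosen, accuracies = incremental_feature_selection(X, y)
    print(chosen, accuracies)  # prints: [0] [1.0, 1.0, 1.0]
```

In this sketch, ties in jackknife accuracy are resolved in favor of the smallest k, so the toy run keeps only feature 0. On the paper's data, the selected subset contained 263 of the 472 features.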

Original language	English (US)
Article number	e11900
Journal	PLoS ONE
Volume	5
Issue number	7
DOIs	https://doi.org/10.1371/journal.pone.0011900
State	Published - 2010
Externally published	Yes

ASJC Scopus subject areas

General

Cite this

@article{762ae242c4244ec98f032615c47d20d8,

title = "Prediction of deleterious non-synonymous {SNPs} based on protein interaction network and hybrid properties",

abstract = "Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs) account for the majority of human inherited diseases. It is important to distinguish the deleterious SAPs from neutral ones. Most traditional computational methods to classify SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe the better rationale for deleterious SAP prediction should be: If a SAP lies in the protein with important functions and it can change the protein sequence and structure severely, it is more likely related to disease. So we established a method to predict deleterious SAPs based on both protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features that include sequential features, structural features and network features. Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set and the prediction model was Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested in an independent dataset and the accuracy was still 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, has a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be most important for accurate prediction and can significantly improve the prediction performance. Our results suggest that the protein interaction context could provide important clues to help better illustrate SAP's functional association. This research will facilitate the post genome-wide association studies.",

author = "Tao Huang and Ping Wang and Zhiqiang Ye and Heng Xu and Zhisong He and Kaiyan Feng and Lele Hu and Weiren Cui and Kai Wang and Xiao Dong and Lu Xie and Xiangyin Kong and Cai, {Yu Dong} and Yixue Li",

year = "2010",

doi = "10.1371/journal.pone.0011900",

language = "English (US)",

volume = "5",

journal = "PLoS ONE",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "7",

pages = "e11900",

}

TY  - JOUR
T1  - Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties
AU  - Huang, Tao
AU  - Wang, Ping
AU  - Ye, Zhiqiang
AU  - Xu, Heng
AU  - He, Zhisong
AU  - Feng, Kaiyan
AU  - Hu, Lele
AU  - Cui, Weiren
AU  - Wang, Kai
AU  - Dong, Xiao
AU  - Xie, Lu
AU  - Kong, Xiangyin
AU  - Cai, Yu Dong
AU  - Li, Yixue
PY  - 2010
Y1  - 2010
AB  - Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs) account for the majority of human inherited diseases. It is important to distinguish the deleterious SAPs from neutral ones. Most traditional computational methods to classify SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe the better rationale for deleterious SAP prediction should be: If a SAP lies in the protein with important functions and it can change the protein sequence and structure severely, it is more likely related to disease. So we established a method to predict deleterious SAPs based on both protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features that include sequential features, structural features and network features. Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set and the prediction model was Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested in an independent dataset and the accuracy was still 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, has a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be most important for accurate prediction and can significantly improve the prediction performance. Our results suggest that the protein interaction context could provide important clues to help better illustrate SAP's functional association. This research will facilitate the post genome-wide association studies.
UR  - http://www.scopus.com/inward/record.url?scp=77955645823&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=77955645823&partnerID=8YFLogxK
U2  - 10.1371/journal.pone.0011900
DO  - 10.1371/journal.pone.0011900
M3  - Article
C2  - 20689580
AN  - SCOPUS:77955645823
SN  - 1932-6203
VL  - 5
JO  - PLoS ONE
JF  - PLoS ONE
IS  - 7
M1  - e11900
ER  - 
