Identifying functionally informative evolutionary sequence profiles

Nelson Gil; Andras Fiser

doi:10.1093/bioinformatics/btx779

Systems & Computational Biology

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Motivation: Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually.

Results: We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases.
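The selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how SAMMI aggregates mutual information, so the mean over all column pairs used here is an assumption, as are the function names (`column_mutual_information`, `mean_pairwise_mi`, `select_msa`) and the toy alignments. Gap characters would be treated as ordinary symbols, and no finite-sample correction is applied.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(col_a, col_b):
    """Shannon mutual information (in bits) between two equal-length alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), expressed with raw counts
        mi += (joint / n) * log2(joint * n / (count_a[a] * count_b[b]))
    return mi


def mean_pairwise_mi(msa):
    """Average MI over all column pairs of an MSA given as equal-length strings.

    Assumption: the mean over column pairs stands in for whatever
    aggregate score the published method actually uses.
    """
    columns = list(zip(*msa))
    pairs = list(combinations(range(len(columns)), 2))
    if not pairs:
        return 0.0
    total = sum(column_mutual_information(columns[i], columns[j]) for i, j in pairs)
    return total / len(pairs)


def select_msa(candidates):
    """Return the name of the candidate MSA with the highest mean pairwise MI."""
    return max(candidates, key=lambda name: mean_pairwise_mi(candidates[name]))


# Hypothetical toy alignments: two sequences positions, four sequences each.
candidates = {
    "covarying": ["AC", "AC", "GT", "GT"],    # columns vary together
    "independent": ["AC", "AT", "GC", "GT"],  # columns vary independently
}
best = select_msa(candidates)
print(best)
```

In this toy case the two columns of the first alignment change together, so their mutual information is higher than in the second alignment, where they vary independently, and the first alignment is selected.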

Original language	English (US)
Pages (from-to)	1278-1286
Number of pages	9
Journal	Bioinformatics
Volume	34
Issue number	8
DOIs	https://doi.org/10.1093/bioinformatics/btx779
State	Published - Apr 15 2018

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Access to Document

10.1093/bioinformatics/btx779

Cite this

@article{7af4691defcc45ee8e2dadf8a0066a9e,

title = "Identifying functionally informative evolutionary sequence profiles",

abstract = "Motivation: Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. Results: We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases.",

author = "Nelson Gil and Andras Fiser",

note = "Publisher Copyright: {\textcopyright} 2017 The Author. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.",

year = "2018",

month = apr,

day = "15",

doi = "10.1093/bioinformatics/btx779",

language = "English (US)",

volume = "34",

pages = "1278--1286",

journal = "Bioinformatics",

issn = "1367-4803",

publisher = "Oxford University Press",

number = "8",

}

TY - JOUR

T1 - Identifying functionally informative evolutionary sequence profiles

AU - Gil, Nelson

AU - Fiser, Andras

PY - 2018/4/15

Y1 - 2018/4/15

N2 - Motivation: Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. Results: We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases.

UR - http://www.scopus.com/inward/record.url?scp=85046729786&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046729786&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btx779

DO - 10.1093/bioinformatics/btx779

M3 - Article

C2 - 29211823

AN - SCOPUS:85046729786

SN - 1367-4803

VL - 34

SP - 1278

EP - 1286

JO - Bioinformatics

JF - Bioinformatics

IS - 8

ER -
