Identifying functionally informative evolutionary sequence profiles

Nelson Gil, Andras Fiser

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Motivation: Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually.

Results: We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases.
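The selection idea in the abstract can be illustrated with a minimal Python sketch. This is not the published SAMMI scoring; it only assumes that each candidate MSA is scored by the plain Shannon mutual information averaged over all column pairs, with gaps treated as ordinary symbols and no sequence weighting or finite-sample correction. The function names (`column_mutual_information`, `mean_pairwise_mi`, `select_alignment`) are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(col_a, col_b):
    """Shannon mutual information (in bits) between two aligned columns.

    Each column is a sequence of residue symbols, one per aligned sequence.
    Assumption: gaps are counted like any other symbol.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def mean_pairwise_mi(msa):
    """Average mutual information over all pairs of columns in one MSA.

    `msa` is a list of equal-length aligned sequences (strings).
    """
    columns = list(zip(*msa))
    pairs = list(combinations(columns, 2))
    if not pairs:
        return 0.0
    return sum(column_mutual_information(a, b) for a, b in pairs) / len(pairs)


def select_alignment(candidates):
    """Return the candidate MSA with the highest mean pairwise column MI."""
    return max(candidates, key=mean_pairwise_mi)


# Toy alignments (invented data): the first has two covarying columns,
# the second is fully conserved and therefore carries no mutual information.
covarying = ["AG", "AG", "CT", "CT"]
conserved = ["AG", "AG", "AG", "AG"]
best = select_alignment([conserved, covarying])
```

In this toy case the covarying alignment is selected, because fully conserved columns contribute zero mutual information. A realistic scorer would also need to handle gaps, redundant sequences and small-sample bias, none of which this sketch attempts.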

Original language: English (US)
Pages (from-to): 1278-1286
Number of pages: 9
Journal: Bioinformatics
Volume: 34
Issue number: 8
DOI: 10.1093/bioinformatics/btx779
State: Published - Apr 15 2018

Fingerprint

Multiple Sequence Alignment
Sequence Alignment
Mutual Information
Proteins
Alignment
Peptide Nucleic Acids
Protein
Protein Structure Prediction
Protein Databases
Protein-protein Interaction
Protein Sequence
Computational Biology
Large Set
Peptides
Annotation
Profile
Conservation
Bioinformatics
Substrate
Databases

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Identifying functionally informative evolutionary sequence profiles. / Gil, Nelson; Fiser, Andras.

In: Bioinformatics, Vol. 34, No. 8, 15.04.2018, p. 1278-1286.

@article{7af4691defcc45ee8e2dadf8a0066a9e,
title = "Identifying functionally informative evolutionary sequence profiles",
abstract = "Motivation: Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. Results: We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases.",
author = "Nelson Gil and Andras Fiser",
year = "2018",
month = "4",
day = "15",
doi = "10.1093/bioinformatics/btx779",
language = "English (US)",
volume = "34",
pages = "1278--1286",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "8",
}

TY - JOUR

T1 - Identifying functionally informative evolutionary sequence profiles

AU - Gil, Nelson

AU - Fiser, Andras

PY - 2018/4/15

Y1 - 2018/4/15

N2 - Motivation: Multiple sequence alignments (MSAs) can provide essential input to many bioinformatics applications, including protein structure prediction and functional annotation. However, the optimal selection of sequences to obtain biologically informative MSAs for such purposes is poorly explored, and has traditionally been performed manually. Results: We present Selection of Alignment by Maximal Mutual Information (SAMMI), an automated, sequence-based approach to objectively select an optimal MSA from a large set of alternatives sampled from a general sequence database search. The hypothesis of this approach is that the mutual information among MSA columns will be maximal for those MSAs that contain the most diverse set possible of the most structurally and functionally homogeneous protein sequences. SAMMI was tested to select MSAs for functional site residue prediction by analysis of conservation patterns on a set of 435 proteins obtained from protein-ligand (peptides, nucleic acids and small substrates) and protein-protein interaction databases.

UR - http://www.scopus.com/inward/record.url?scp=85046729786&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046729786&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btx779

DO - 10.1093/bioinformatics/btx779

M3 - Article

C2 - 29211823

AN - SCOPUS:85046729786

VL - 34

SP - 1278

EP - 1286

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 8

ER -