Amino acid runs in eukaryotic proteomes and disease associations

Samuel Karlin, Luciano Brocchieri, Aviv Bergman, Jan Mrázek, Andrew J. Gentles

Research output: Contribution to journal › Article

154 Citations (Scopus)

Abstract

We present a comparative proteome analysis of the five complete eukaryotic genomes (human, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana), focusing on individual and multiple amino acid runs, charge and hydrophobic runs. We found that human proteins with multiple long runs are often associated with diseases; these include long glutamine runs that induce neurological disorders, various cancers, categories of leukemias (mostly involving chromosomal translocations), and an abundance of Ca²⁺ and K⁺ channel proteins. Many human proteins with multiple runs function in development and/or transcription regulation and are Drosophila homeotic homologs. A large number of these proteins are expressed in the nervous system. More than 80% of Drosophila proteins with multiple runs seem to function in transcription regulation. The most frequent amino acid runs in Drosophila sequences occur for glutamine, alanine, and serine, whereas human sequences highlight glutamate, proline, and leucine. The most frequent runs in yeast are of serine, glutamine, and acidic residues. Compared with the other eukaryotic proteomes, amino acid runs are significantly more abundant in the fly. This finding might be interpreted in terms of innate differences in DNA-replication processes, repair mechanisms, DNA-modification systems, and mutational biases. There are striking differences in amino acid runs for glutamine, asparagine, and leucine among the five proteomes.
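The abstract turns on the notion of an amino acid run: a stretch of consecutive residues that are either identical (for example a polyglutamine tract) or share a property such as charge or hydrophobicity. The Python sketch below only illustrates how such runs can be located in a protein sequence; it is not the authors' method. The names find_runs and Run, the minimum length of 5, and the residue sets CHARGED and HYDROPHOBIC are illustrative assumptions, and the paper's own criteria for what counts as a run are not reproduced here.

```python
from itertools import groupby
from typing import Callable, List, NamedTuple, Optional

# Illustrative residue classes (assumptions, not the paper's definitions).
CHARGED = frozenset("DEKR")          # acidic (D, E) and basic (K, R) residues
HYDROPHOBIC = frozenset("ACFILMVW")  # one common hydrophobic grouping


class Run(NamedTuple):
    label: str   # residue letter or class name shared by the whole run
    start: int   # 0-based start position in the sequence
    length: int  # number of consecutive residues in the run


def find_runs(seq: str, classify: Callable[[str], Optional[str]],
              min_len: int) -> List[Run]:
    """Return maximal runs of consecutive residues sharing a class label.

    Residues for which classify() returns None interrupt runs and are
    never reported themselves.
    """
    runs: List[Run] = []
    pos = 0
    # groupby merges adjacent residues with the same label into one group.
    for label, group in groupby(seq.upper(), key=classify):
        length = sum(1 for _ in group)
        if label is not None and length >= min_len:
            runs.append(Run(label, pos, length))
        pos += length
    return runs


def same_residue(aa: str) -> Optional[str]:
    """Single amino acid runs: every residue is its own class."""
    return aa


def charge_class(aa: str) -> Optional[str]:
    """Charge runs: any mixture of D, E, K, R forms one run (assumption)."""
    return "charged" if aa in CHARGED else None


def hydrophobic_class(aa: str) -> Optional[str]:
    """Hydrophobic runs over the assumed HYDROPHOBIC set."""
    return "hydrophobic" if aa in HYDROPHOBIC else None


if __name__ == "__main__":
    seq = "MAQQQQQQQSDEEKRRKLLLIVFA"  # made-up example sequence
    for classify in (same_residue, charge_class, hydrophobic_class):
        print(classify.__name__, find_runs(seq, classify, min_len=5))
```

Passing a different classifier is all it takes to switch between identical-residue, charge, and hydrophobic runs; counting the returned labels over every protein in a proteome would give per-residue run frequencies of the kind compared in the abstract.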

Original language: English (US)
Pages (from-to): 333-338
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 99
Issue number: 1
DOIs: 10.1073/pnas.012608599
State: Published - Jan 8 2002
Externally published: Yes

Fingerprint

Proteome
Glutamine
Amino Acids
Leucine
Serine
Drosophila
Drosophila Proteins
Genetic Translocation
Proteins
Asparagine
Caenorhabditis elegans
Nervous System Diseases
Drosophila melanogaster
DNA Replication
Arabidopsis
Proline
Diptera
DNA Repair
Alanine
Nervous System

ASJC Scopus subject areas

  • Genetics
  • General

Cite this

Amino acid runs in eukaryotic proteomes and disease associations. / Karlin, Samuel; Brocchieri, Luciano; Bergman, Aviv; Mrázek, Jan; Gentles, Andrew J.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 99, No. 1, 08.01.2002, p. 333-338.


Karlin, Samuel ; Brocchieri, Luciano ; Bergman, Aviv ; Mrázek, Jan ; Gentles, Andrew J. / Amino acid runs in eukaryotic proteomes and disease associations. In: Proceedings of the National Academy of Sciences of the United States of America. 2002 ; Vol. 99, No. 1. pp. 333-338.
@article{70835c5030004ed9a2c8f2c5cd93f71c,
title = "Amino acid runs in eukaryotic proteomes and disease associations",
abstract = "We present a comparative proteome analysis of the five complete eukaryotic genomes (human, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana), focusing on individual and multiple amino acid runs, charge and hydrophobic runs. We found that human proteins with multiple long runs are often associated with diseases; these include long glutamine runs that induce neurological disorders, various cancers, categories of leukemias (mostly involving chromosomal translocations), and an abundance of Ca2+ and K+ channel proteins. Many human proteins with multiple runs function in development and/or transcription regulation and are Drosophila homeotic homologs. A large number of these proteins are expressed in the nervous system. More than 80{\%} of Drosophila proteins with multiple runs seem to function in transcription regulation. The most frequent amino acid runs in Drosophila sequences occur for glutamine, alanine, and serine, whereas human sequences highlight glutamate, proline, and leucine. The most frequent runs in yeast are of serine, glutamine, and acidic residues. Compared with the other eukaryotic proteomes, amino acid runs are significantly more abundant in the fly. This finding might be interpreted in terms of innate differences in DNA-replication processes, repair mechanisms, DNA-modification systems, and mutational biases. There are striking differences in amino acid runs for glutamine, asparagine, and leucine among the five proteomes.",
author = "Samuel Karlin and Luciano Brocchieri and Aviv Bergman and Jan Mr{\'a}zek and Gentles, {Andrew J.}",
year = "2002",
month = "1",
day = "8",
doi = "10.1073/pnas.012608599",
language = "English (US)",
volume = "99",
pages = "333--338",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "1",

}

TY - JOUR

T1 - Amino acid runs in eukaryotic proteomes and disease associations

AU - Karlin, Samuel

AU - Brocchieri, Luciano

AU - Bergman, Aviv

AU - Mrázek, Jan

AU - Gentles, Andrew J.

PY - 2002/1/8

Y1 - 2002/1/8

N2 - We present a comparative proteome analysis of the five complete eukaryotic genomes (human, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana), focusing on individual and multiple amino acid runs, charge and hydrophobic runs. We found that human proteins with multiple long runs are often associated with diseases; these include long glutamine runs that induce neurological disorders, various cancers, categories of leukemias (mostly involving chromosomal translocations), and an abundance of Ca2+ and K+ channel proteins. Many human proteins with multiple runs function in development and/or transcription regulation and are Drosophila homeotic homologs. A large number of these proteins are expressed in the nervous system. More than 80% of Drosophila proteins with multiple runs seem to function in transcription regulation. The most frequent amino acid runs in Drosophila sequences occur for glutamine, alanine, and serine, whereas human sequences highlight glutamate, proline, and leucine. The most frequent runs in yeast are of serine, glutamine, and acidic residues. Compared with the other eukaryotic proteomes, amino acid runs are significantly more abundant in the fly. This finding might be interpreted in terms of innate differences in DNA-replication processes, repair mechanisms, DNA-modification systems, and mutational biases. There are striking differences in amino acid runs for glutamine, asparagine, and leucine among the five proteomes.

AB - We present a comparative proteome analysis of the five complete eukaryotic genomes (human, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana), focusing on individual and multiple amino acid runs, charge and hydrophobic runs. We found that human proteins with multiple long runs are often associated with diseases; these include long glutamine runs that induce neurological disorders, various cancers, categories of leukemias (mostly involving chromosomal translocations), and an abundance of Ca2+ and K+ channel proteins. Many human proteins with multiple runs function in development and/or transcription regulation and are Drosophila homeotic homologs. A large number of these proteins are expressed in the nervous system. More than 80% of Drosophila proteins with multiple runs seem to function in transcription regulation. The most frequent amino acid runs in Drosophila sequences occur for glutamine, alanine, and serine, whereas human sequences highlight glutamate, proline, and leucine. The most frequent runs in yeast are of serine, glutamine, and acidic residues. Compared with the other eukaryotic proteomes, amino acid runs are significantly more abundant in the fly. This finding might be interpreted in terms of innate differences in DNA-replication processes, repair mechanisms, DNA-modification systems, and mutational biases. There are striking differences in amino acid runs for glutamine, asparagine, and leucine among the five proteomes.

UR - http://www.scopus.com/inward/record.url?scp=0037039436&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0037039436&partnerID=8YFLogxK

U2 - 10.1073/pnas.012608599

DO - 10.1073/pnas.012608599

M3 - Article

VL - 99

SP - 333

EP - 338

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 1

ER -