Table I

Organisms surveyed for protein simple sequences, with the number of proteins in each proteome, the total number of simple sequences found (SSTot), the number of proteins containing at least one simple sequence (ProtSS), and the mean number of simple sequences per simple-sequence-containing protein (SSTot/ProtSS)

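| Organism | Code | Type | Number of proteins in proteome | SSTot | ProtSS | SSTot/ProtSS |
|---|---|---|---:|---:|---:|---:|
| Saccharomyces cerevisiae | SC | Eukaryote | 6,203 | 7,177 | 3,293 | 2.18 |
| Caenorhabditis elegans | CE | Eukaryote | 21,962 | 23,295 | 11,125 | 2.09 |
| Drosophila melanogaster | DM | Eukaryote | 13,608 | 24,725 | 7,989 | 3.09 |
| Arabidopsis thaliana (a) | AT | Eukaryote | 26,496 | 27,542 | 14,637 | 1.88 |
| Synechocytis sp. PCC6803 | SS | Cyanobacteria | 3,169 | 1,493 | 1,034 | 1.44 |
| Nostoc sp. PCC7120 | Nos | Cyanobacteria | 5,368 | 2,497 | 1,762 | 1.42 |
| Escherichia coli K-12 | EC | Gram-negative bacteria | 4,289 | 2,064 | 1,636 | 1.26 |
| Haemophilus influenzae | HI | Gram-negative bacteria | 1,709 | 614 | 476 | 1.29 |
| Vibrio cholerae chr1 | VC | Gram-negative bacteria | 2,736 | 1,219 | 881 | 1.38 |
| Helicobacter pylori 26695 | HP | Gram-negative bacteria | 1,566 | 699 | 500 | 1.40 |
| Brucella melitensis 16M chr1 | BM | Gram-negative bacteria | 2,059 | 1,067 | 756 | 1.41 |
| Agrobacterium tumefaciens C58 | AgT | Gram-negative bacteria | 2,722 | 1,610 | 1,078 | 1.49 |
| Bacillus subtilis | BS | Gram-positive bacteria | 4,367 | 1,723 | 1,270 | 1.36 |
| Bacillus halodurans | BH | Gram-positive bacteria | 4,066 | 1,597 | 1,182 | 1.35 |
| Mycoplasma pneumoniae | MP | Gram-positive bacteria | 688 | 360 | 242 | 1.49 |
| Mycoplasma genitalium | MG | Gram-positive bacteria | 480 | 251 | 170 | 1.48 |
| Deinococcus radiodurans chr1 | DR | Gram-positive bacteria | 2,579 | 2,274 | 1,311 | 1.73 |
| Clostridium acetobutylicum ATCC824 | CA | Gram-positive bacteria | 3,672 | 1,550 | 1,132 | 1.37 |
| Archaeoglobus fulgidus | AF | Archaea | 2,421 | 990 | 747 | 1.32 |
| Aeropyrum pernix K1 | AP | Archaea | 2,694 | 1,827 | 1,176 | 1.55 |
| Methanobacterium thermoautotrophicum | MT | Archaea | 1,869 | 691 | 531 | 1.30 |
| Methanococcus jannaschii | MJ | Archaea | 1,715 | 875 | 641 | 1.36 |
| Pyrococcus abyssi | PA | Archaea | 1,765 | 847 | 613 | 1.38 |
| Pyrococcus horikoshii | PH | Archaea | 2,064 | 973 | 730 | 1.33 |
| Halobacterium sp. NRC-1 | HS | Archaea | 2,058 | 1,785 | 1,060 | 1.68 |
| Thermoplasma acidophilum | TA | Archaea | 1,478 | 511 | 395 | 1.29 |
| Thermoplasma volcanium | TV | Archaea | 1,526 | 452 | 376 | 1.20 |
| Pyrobaculum aerophilum | PAe | Archaea | 2,605 | 1,220 | 866 | 1.41 |
| Sulfolobus tokodaii | ST | Archaea | 2,826 | 1,203 | 866 | 1.39 |
| Sulfolobus solfataricus | SSol | Archaea | 2,994 | 1,335 | 962 | 1.39 |

(a) Some AT protein sequences were incomplete and were excluded from the analysis. The number of proteins listed for AT is the number actually used.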
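The SSTot/ProtSS column is simply SSTot divided by ProtSS for each organism. As a minimal sketch, the Python below recomputes that quotient from the transcribed counts and flags any row whose reported value differs from it by more than a tolerance. The names `TABLE_I`, `ss_per_protein` and `check_ratios` are hypothetical, introduced only for illustration, and the tolerance of 0.01 is an assumption chosen to absorb differences from reporting the ratio to two decimal places.

```python
# Hypothetical helper for re-checking the SSTot/ProtSS column of Table I.
# Counts are transcribed from the table; the Type column is omitted.
# code -> (proteins in proteome, SSTot, ProtSS, reported SSTot/ProtSS)
TABLE_I = {
    "SC": (6203, 7177, 3293, 2.18),
    "CE": (21962, 23295, 11125, 2.09),
    "DM": (13608, 24725, 7989, 3.09),
    "AT": (26496, 27542, 14637, 1.88),
    "SS": (3169, 1493, 1034, 1.44),
    "Nos": (5368, 2497, 1762, 1.42),
    "EC": (4289, 2064, 1636, 1.26),
    "HI": (1709, 614, 476, 1.29),
    "VC": (2736, 1219, 881, 1.38),
    "HP": (1566, 699, 500, 1.40),
    "BM": (2059, 1067, 756, 1.41),
    "AgT": (2722, 1610, 1078, 1.49),
    "BS": (4367, 1723, 1270, 1.36),
    "BH": (4066, 1597, 1182, 1.35),
    "MP": (688, 360, 242, 1.49),
    "MG": (480, 251, 170, 1.48),
    "DR": (2579, 2274, 1311, 1.73),
    "CA": (3672, 1550, 1132, 1.37),
    "AF": (2421, 990, 747, 1.32),
    "AP": (2694, 1827, 1176, 1.55),
    "MT": (1869, 691, 531, 1.30),
    "MJ": (1715, 875, 641, 1.36),
    "PA": (1765, 847, 613, 1.38),
    "PH": (2064, 973, 730, 1.33),
    "HS": (2058, 1785, 1060, 1.68),
    "TA": (1478, 511, 395, 1.29),
    "TV": (1526, 452, 376, 1.20),
    "PAe": (2605, 1220, 866, 1.41),
    "ST": (2826, 1203, 866, 1.39),
    "SSol": (2994, 1335, 962, 1.39),
}


def ss_per_protein(sstot: int, protss: int) -> float:
    """Mean number of simple sequences per protein that contains at least one."""
    return sstot / protss


def check_ratios(table: dict, tol: float = 0.01) -> list:
    """Return codes whose recomputed ratio differs from the reported one by more than tol."""
    return [
        code
        for code, (_, sstot, protss, reported) in table.items()
        if abs(ss_per_protein(sstot, protss) - reported) > tol
    ]


if __name__ == "__main__":
    # Print each recomputed ratio next to the value reported in the table.
    for code, (_, sstot, protss, reported) in TABLE_I.items():
        print(f"{code:<5} recomputed {ss_per_protein(sstot, protss):.3f}  reported {reported:.2f}")
    print("rows outside tolerance:", check_ratios(TABLE_I))
```

Run as a script, it lists each organism code with its recomputed and reported ratios, followed by the codes (if any) that fall outside the tolerance.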