Table VI

Analysis of the training dataset used by ChloroP (www.cbs.dtu.dk/services/ChloroP/pages/datasets.html)

Examination of the ChloroP training dataset
Amino terminus explicitly determined for the Swiss-Prot entry	49
Amino terminus determined in orthologous protein, “By Similarity” annotation omitted in Swiss-Prot	9
Amino terminus not experimentally determined, “Probable” or “Potential” omitted in Swiss-Prot	8
Amino terminus annotation is incorrect in Swiss-Prot when compared to the literature	6
Amino terminus from translation products imported into heterologous plastids or from transformed E. coli	3
    Total	75
ChloroP training dataset is biased towards proteins targeted to the stroma
Stroma	62
Thylakoid	12
Envelope	1
Lumen	0
ChloroP training dataset is biased towards two species and has a significant number of algal sequences
Streptophyta	62
    Spinach	18
    Pea	15
Chlorophyta	12
    Chlamydomonas reinhardtii	8
Rhodophyta	1
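
The proportions behind the three headings in Table VI can be checked with a short calculation. The sketch below is only an illustration: the counts are copied from the table, while the dictionary names, the `percent` helper, and the grouping of Chlorophyta and Rhodophyta as "algal" are assumptions made here, not part of the original analysis.

```python
# Counts copied from Table VI. All variable names are illustrative.
n_terminus_evidence = {
    "explicitly determined": 49,
    "by similarity, annotation omitted": 9,
    "probable/potential, annotation omitted": 8,
    "incorrect compared to the literature": 6,
    "heterologous import or transformed E. coli": 3,
}
localization = {"stroma": 62, "thylakoid": 12, "envelope": 1, "lumen": 0}
lineage = {"Streptophyta": 62, "Chlorophyta": 12, "Rhodophyta": 1}
species = {"spinach": 18, "pea": 15, "Chlamydomonas reinhardtii": 8}

TOTAL = 75  # total number of proteins given in the table


def percent(count, total=TOTAL):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


# Each of the three breakdowns should account for all 75 proteins.
for breakdown in (n_terminus_evidence, localization, lineage):
    assert sum(breakdown.values()) == TOTAL

summary = {
    "explicitly determined N-terminus": percent(
        n_terminus_evidence["explicitly determined"]
    ),
    "stromal proteins": percent(localization["stroma"]),
    "spinach plus pea": percent(species["spinach"] + species["pea"]),
    # Assumption: "algal" is taken here as Chlorophyta plus Rhodophyta.
    "algal sequences": percent(lineage["Chlorophyta"] + lineage["Rhodophyta"]),
}

for label, value in summary.items():
    print(f"{label}: {value}% of {TOTAL}")
```

Under these assumptions, roughly two thirds of the entries have an explicitly determined amino terminus, more than four fifths are stromal, and spinach and pea together contribute 33 of the 75 sequences.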