Table II Characteristics of the three largest clusters

The three largest clusters of expression patterns (R > 0.80), as indicated in Figure 1, out of 12 clusters with >10 members. Function enrichment was analyzed with FuncAssociate (p value < 0.001) (44). The F measure is the harmonic mean of precision and recall. Combined prediction aims to predict membership for all 12 clusters simultaneously, i.e. membership in cluster (A, B, C, D…); individual predictions predict membership of a gene in one cluster at a time, i.e. in cluster A (yes or no). All of the attributes (features) used are listed in Table I with more detailed descriptions. The value listed next to each selected attribute is the result of a t test. A t value with a magnitude of >3.40 is significant at the p < 0.001 level for all three clusters (given the cluster sizes) and is printed in bold type. Negative and positive t values indicate depletion and enrichment of the feature in the test set, respectively. [T]5–10 and [TA]3–6 are binding motifs for the poly(A)-binding proteins Pub1 and Pab1, respectively (54). AUC, area under the curve (where the curve is the receiver operating characteristic); the closer to 1, the better the prediction. PARS, parallel analysis of RNA structure, i.e. an experimental measure of double-strandedness in RNA, taken from Ref. 103.

Characteristic | Cluster A | Cluster B | Cluster C
Cluster size | 127 | 76 | 66
Combined prediction (157 features)
    F measure | 0.65 | 0.14 | 0.08
    AUC | 0.85 | 0.65 | 0.78
Individual prediction (17 features)
    F measure | 0.66 | 0.33 | 0.06
    AUC | 0.88 | 0.71 | 0.66
Protein function enrichment | Ribosome, translation | Oxidoreductase, protein folding | Proteases
Attribute selection
    Merit of best subset of attributes in prediction | 0.38 | 0.22 | 0.24
Subset of predictive features
    Arginine content | 6.35 | −6.19 | −5.51
    Aspartate content | −5.29 | 1.53 | −0.15
    Codon adaptation index (101) | 14.05 | −2.11 | 0.06
    DISEMBL hot loops (disorder measure) (94) | 6.24 | −4.99 | −2.95
    Target of Khd1 (53) | −9.09 | −0.73 | −0.28
    Target of Lhp1 (52) | 6.14 | −0.97 | −0.82
    Logarithm of mRNA half-life under normal conditions (98) | −5.79 | 5.47 | 3.57
    Logarithm of protein production rate under normal conditions (97) | 13.32 | 0.59 | 4.62
    PARS, average score in 5′-UTR (103) | −4.82 | 3.27 | 0.89
    PARS, average score amongst first 10 nucleotides in coding sequence (103) | 2.28 | −2.68 | 1.86
    PARS CDS, maximum score in coding sequence (103) | 18.00 | −0.68 | 2.44
    PARS CDS, spread (standard deviation) of scores in coding sequence (103) | 13.18 | −1.77 | 2.98
    PARS, spread (relative standard deviation) of scores in 3′-UTR (103) | −1.20 | −0.58 | −0.54
    Isoelectric point (101) | 6.51 | −5.61 | −5.43
    Translation efficiency under menadione stress (30) | −4.58 | 2.05 | 0.92
Motif presence
    In 5′-UTR | [T]5–10 | [T]5–10, [TA]3–6 | [T]5–10
    In 3′-UTR | [T]5–10 | [T]5–10, [TA]3–6 | [T]5–10, [TA]3–6
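
The two quantities defined in the legend can be illustrated with a minimal Python sketch. It is not the authors' analysis pipeline. The function names `f_measure` and `interpret_t` are hypothetical, the precision and recall values in the usage note are made up for illustration (the table reports only the resulting F measures), and applying the 3.40 cutoff to the magnitude of t is an assumption based on the legend's statement that negative values indicate depletion.

```python
T_THRESHOLD = 3.40  # cutoff quoted in the legend for p < 0.001


def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are 0
    return 2 * precision * recall / (precision + recall)


def interpret_t(t: float, threshold: float = T_THRESHOLD) -> str:
    """Label a t value as enriched, depleted, or not significant.

    Assumption: the cutoff applies to |t|; the sign gives the direction.
    """
    if abs(t) <= threshold:
        return "not significant"
    return "enriched" if t > 0 else "depleted"


# t values copied from the "Arginine content" row of the table.
arginine = {"A": 6.35, "B": -6.19, "C": -5.51}
arginine_calls = {cluster: interpret_t(t) for cluster, t in arginine.items()}

# t values copied from the "Aspartate content" row of the table.
aspartate = {"A": -5.29, "B": 1.53, "C": -0.15}
aspartate_calls = {cluster: interpret_t(t) for cluster, t in aspartate.items()}
```

Under these assumptions, `arginine_calls` labels cluster A as enriched and clusters B and C as depleted, whereas only cluster A passes the cutoff for aspartate content. Calling `f_measure(0.5, 0.5)` with the illustrative inputs returns 0.5; because the harmonic mean is pulled toward the smaller of its two inputs, a classifier with high precision but low recall still scores a low F measure.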