Table IV Accuracy of our binary-QSAR model against the P87mix and Lit data sets

[vs P87] Our binary-QSAR model was tested against either all possible cleavage sequences of P87mix (All) or only those having aars at every position from P6 to P6′ (P6–P6′). For example, the peptide ACDEFGHIKLMNPQRSTVWY has 19 possible cleavage sites; among them, ACDEFG/HIKLMNPQRSTVWY (where "/" is the calpain cleavage site; this cut has aars at P6–P14′) is included, but ACDEF/GHIKLMNPQRSTVWY (P5–P15′, with no aar at P6) is excluded. The accuracy and leave-one-out (LOO) accuracy rates for cleaved, uncleaved, and total sequences are shown. [vs Lit] Of the 420 cleaved sequences in the literature, 132 P10–P10′ (20-mer) sequences that were not used to train any of the predictors shown here were tested as positive samples, together with their reversed sequences as negative samples (total n = 264). Prediction rates are shown for the binary-QSAR model with a threshold of 0.5 or 0.95 (B-QSAR (0.5) or B-QSAR (0.95), respectively) in comparison with previously reported methods (GPS-H, -M, and -L: ccd.biocuckoo.org (16); SVL-R, -L, PSSM, and MKL: www.calpain.org (15); SP-C1 and -C2: www.dmbr.ugent.be/prx/bioit2-public/SitePrediction/ (18)). Bold numbers indicate the best scores. For each prediction result, see supplemental Table S14.
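
The P6–P6′ inclusion rule in the caption can be illustrated with a minimal Python sketch. The function name `candidate_sites` and its parameters are hypothetical and are not taken from the paper; the sketch only assumes that a cut is kept when at least six residues lie on each side of it.

```python
def candidate_sites(peptide, n_side=6, c_side=6):
    """List cuts that leave at least n_side residues before and c_side after.

    Returns (cut_index, "N-side/C-side") tuples; "/" marks the cut.
    Hypothetical helper for illustration, not the authors' code.
    """
    sites = []
    for cut in range(1, len(peptide)):  # a peptide of length L has L-1 cuts
        if cut >= n_side and len(peptide) - cut >= c_side:
            sites.append((cut, peptide[:cut] + "/" + peptide[cut:]))
    return sites


peptide = "ACDEFGHIKLMNPQRSTVWY"  # example peptide from the caption

# "All": every cut with at least one residue on each side.
all_sites = candidate_sites(peptide, n_side=1, c_side=1)

# "P6-P6'": only cuts with residues at every position from P6 to P6'.
p6_sites = candidate_sites(peptide)

print(len(all_sites), len(p6_sites))  # 19 9
print(p6_sites[0][1])                 # ACDEFG/HIKLMNPQRSTVWY
```

For the 20-residue example, 9 of the 19 cuts pass the filter; the cut after ACDEF is dropped because only five residues precede it.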

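| vs P87 | P6–P6′: nᵃ | P6–P6′: Accuracy | P6–P6′: LOO accuracy | All: n | All: Accuracy |
|---|---|---|---|---|---|
| Cleaved | 314 | 0.576 | 0.573 | 483 | 0.582 |
| Uncleaved | 492 | 0.868 | 0.862 | 1,220 | 0.744 |
| Total | 806 | 0.754 | 0.749 | 1,703 | 0.698 |

| vs Lit | B-QSAR (0.5) | B-QSAR (0.95) | GPS-H | GPS-M | GPS-L | SVL-R | SVL-L | PSSM | MKL | SP-C1 | SP-C2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity [TP/(TP+FN)] | 0.538 | 0.053 | 0.242 | 0.348 | 0.394 | 0.258 | 0.205 | 0.288 | 0.288 | 0.197 | 0.220 |
| Specificity [TN/(TN+FP)] | 0.758 | 0.992 | 0.947 | 0.833 | 0.742 | 0.939 | 0.955 | 0.909 | 0.909 | 0.932 | 0.871 |
| PPV, positive prediction value [TP/(TP+FP)] | 0.689 | 0.875 | 0.821 | 0.676 | 0.605 | 0.810 | 0.818 | 0.760 | 0.760 | 0.743 | 0.630 |
| NPV, negative prediction value [TN/(TN+FN)] | 0.621 | 0.512 | 0.556 | 0.561 | 0.551 | 0.559 | 0.545 | 0.561 | 0.561 | 0.537 | 0.528 |
| Total accuracy [(TP+TN)/n] | 0.648 | 0.523 | 0.595 | 0.591 | 0.568 | 0.598 | 0.580 | 0.598 | 0.598 | 0.564 | 0.545 |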
  • a Abbreviations used: GPS-H, -M, or -L: high-, medium-, or low-threshold mode of GPS-CCD Ver.1; SVL-R or -L: support vector machine using RBF or linear kernels; PSSM: position-specific scoring matrix method; MKL: multiple kernel learning method; SP-C1 or -C2: Site Prediction for cleavage by calpain-1 or -2 (all species); n: number of samples used; TP: true positive; FN: false negative; TN: true negative; FP: false positive.
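
The five rates in the "vs Lit" rows follow directly from the bracketed formulas. The Python sketch below applies them to one set of counts. The function name `binary_metrics` is hypothetical, and the counts TP = 71, FN = 61, TN = 100, FP = 32 are not reported in the table: they are back-calculated here as the integer counts consistent with the B-QSAR (0.5) column, given 132 positive and 132 negative samples, and should be treated as an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute the table's rates from confusion-matrix counts."""
    n = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # TP/(TP+FN)
        "specificity": tn / (tn + fp),   # TN/(TN+FP)
        "ppv": tp / (tp + fp),           # TP/(TP+FP)
        "npv": tn / (tn + fn),           # TN/(TN+FN)
        "accuracy": (tp + tn) / n,       # (TP+TN)/n
    }


# Assumed counts, back-calculated from the B-QSAR (0.5) column (not in the source).
m = binary_metrics(tp=71, fn=61, tn=100, fp=32)

for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts the sketch reproduces the B-QSAR (0.5) column to three decimals (0.538, 0.758, 0.689, 0.621, 0.648).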