Table I. Features used to represent PSMs

Each PSM obtained from the search is represented using 17 features. These are the features used by Percolator, minus three that capture properties of the entire collection of PSMs (for example, the number of other spectra that match the same peptide). Because such features are computed across all PSMs, including those in the test set, we removed them to ensure complete separation between the training set and the test set.

No.     Feature      Description
1       XCorr        Cross-correlation between calculated and observed spectra
2       ΔCn          Fractional difference between current and second best XCorr
3       ΔCnL         Fractional difference between current and fifth best XCorr
4       Sp           Preliminary score for peptide versus predicted fragment ion values
5       ln(rSp)      The natural logarithm of the rank of the match based on the Sp score
6       ΔM           The difference in calculated and observed mass
7       abs(ΔM)      The absolute value of the difference in calculated and observed mass
8       Mass         The observed mass [M + H]+
9       ionFrac      The fraction of matched b and y ions
10      ln(NumSp)    The natural logarithm of the number of database peptides within the specified m/z range
11      enzN         Boolean: Is the peptide preceded by an enzymatic (tryptic) site?
12      enzC         Boolean: Does the peptide have an enzymatic (tryptic) C terminus?
13      enzInt       Number of missed internal enzymatic (tryptic) sites
14      pepLen       The length of the matched peptide, in residues
15–17   charge1–3    Three Boolean features indicating the charge state
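
As an illustration of how the entries in Table I combine into a single 17-dimensional vector, the following is a minimal sketch rather than the implementation used in this work. The `PSM` record, its field names, and the function `psm_features` are hypothetical. The sign of ΔM (calculated minus observed) and the encoding of the charge features as a one-hot vector over charges 1 to 3 are assumptions, and the example values are invented for demonstration only.

```python
import math
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical container for the raw quantities behind Table I."""
    xcorr: float
    delta_cn: float
    delta_cn_l: float
    sp: float
    sp_rank: int          # rank of the match by Sp score (1 = best)
    calc_mass: float      # calculated peptide mass
    obs_mass: float       # observed mass [M + H]+
    ion_frac: float       # fraction of matched b and y ions
    num_candidates: int   # database peptides within the m/z range
    enz_n: bool
    enz_c: bool
    enz_int: int          # missed internal enzymatic sites
    peptide: str
    charge: int


def psm_features(psm: PSM) -> list[float]:
    """Return the 17 features in the order of Table I."""
    # Assumption: delta M is taken as calculated minus observed mass.
    delta_m = psm.calc_mass - psm.obs_mass
    return [
        psm.xcorr,                          # 1  XCorr
        psm.delta_cn,                       # 2  delta Cn
        psm.delta_cn_l,                     # 3  delta CnL
        psm.sp,                             # 4  Sp
        math.log(psm.sp_rank),              # 5  ln(rSp)
        delta_m,                            # 6  delta M
        abs(delta_m),                       # 7  abs(delta M)
        psm.obs_mass,                       # 8  Mass
        psm.ion_frac,                       # 9  ionFrac
        math.log(psm.num_candidates),       # 10 ln(NumSp)
        float(psm.enz_n),                   # 11 enzN
        float(psm.enz_c),                   # 12 enzC
        float(psm.enz_int),                 # 13 enzInt
        float(len(psm.peptide)),            # 14 pepLen
        # 15-17: assumed one-hot encoding of charge states 1, 2, 3
        *(float(psm.charge == z) for z in (1, 2, 3)),
    ]


# Invented example values, for demonstration only.
example = PSM(
    xcorr=3.2, delta_cn=0.4, delta_cn_l=0.6, sp=850.0, sp_rank=1,
    calc_mass=1500.75, obs_mass=1500.25, ion_frac=0.7,
    num_candidates=200, enz_n=True, enz_c=True, enz_int=0,
    peptide="LVNELTEFAK", charge=2,
)
vec = psm_features(example)
```

Every feature here is computed from a single PSM, so no value depends on other PSMs in the collection.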