Submitted on April 19, 2005
Revised on May 9, 2005
Accepted on May 22, 2005
New database-independent, sequence-tag-based scoring of peptide MS/MS data validates Mowse scores, recovers below-threshold data, singles out modified peptides and assesses the quality of MS/MS techniques
Mikhail M. Savitski, Michael L. Nielsen, and Roman A. Zubarev
Uppsala University, Laboratory for Biological and Medical Mass Spectrometry, Uppsala SE-75 123
Corresponding Author: Mikhail.Savitski{at}bmms.uu.se
The Mowse score (M-score) is one of the conventional validity measures in database identification of peptides and proteins from MS/MS data. Although tremendously useful, the M-score has a number of limitations. For the same MS/MS data, the M-score may change if the protein database is expanded. A low M-score does not necessarily mean a poor match; it may instead reflect poor MS/MS quality. In addition, the M-score does not take advantage of the combined use of the complementary fragmentation techniques collisionally activated dissociation (CAD) and electron capture dissociation (ECD). To address these issues, a new database-independent scoring method (S-score) was designed, based on the maximum length of the peptide sequence tag provided by the combined CAD and ECD data. Assessing the quality of MS/MS spectra by S-score makes it possible to filter out poor data (39% of all MS/MS spectra) before the database search, which speeds up data analysis and eliminates a major source of false-positive identifications. Spectra with below-threshold M-scores (poor matches) but high S-scores are validated. Spectra with zero M-score (no database match) but high S-score are classified as belonging to modified sequences. As an extension of the S-score, an extremely reliable sequence tag was developed, based on complementary fragments appearing simultaneously in CAD and ECD spectra. Comparison of this tag with the database-derived sequence gives the most reliable peptide ID validation to date. The combined use of M- and S-scoring provides positive sequence identification from >25% of all MS/MS data, a 100% improvement over traditional M-scoring performed on the same Fourier transform MS instrumentation. The number of proteins reliably identified from an E. coli cell lysate thereby increased by 160% compared with the traditional CAD-only, M-score approach. Finally, S-scoring provides a quantitative measure of the quality of fragmentation techniques, such as the minimum precursor-ion abundance for which MS/MS gives the threshold S-score value of two.
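The way the two scores are combined to sort spectra can be illustrated with a minimal Python sketch. It is an illustration only, not the authors' implementation: the names `Verdict` and `triage` are hypothetical, the M-score threshold must be supplied by the caller because the abstract does not state one, and treating any S-score at or above the threshold of two as "high" is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical labels for the four outcomes described in the abstract."""
    POOR_SPECTRUM = "filtered out before the database search"
    IDENTIFIED = "M-score at or above threshold"
    RECOVERED = "below-threshold M-score validated by S-score"
    POSSIBLY_MODIFIED = "no database match, but a good sequence tag"


def triage(m_score: float, s_score: float,
           m_threshold: float, s_threshold: float = 2.0) -> Verdict:
    """Classify one spectrum from its M-score and S-score.

    s_threshold defaults to 2, the threshold S-score named in the abstract.
    m_threshold is an assumption the caller must provide.
    """
    # Poor-quality spectra are rejected first, independent of any database.
    if s_score < s_threshold:
        return Verdict.POOR_SPECTRUM
    # Conventional identification: the database match is already significant.
    if m_score >= m_threshold:
        return Verdict.IDENTIFIED
    # A weak database match backed by a good sequence tag is recovered.
    if m_score > 0:
        return Verdict.RECOVERED
    # Good spectrum with no database match: candidate modified sequence.
    return Verdict.POSSIBLY_MODIFIED
```

For example, with an assumed M-score threshold of 30, `triage(12.0, 5.0, m_threshold=30.0)` yields `Verdict.RECOVERED`. Checking the S-score first mirrors the point that poor spectra can be discarded before any database search is run.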