Submitted on May 23, 2007
Revised on February 19, 2008
Accepted on February 25, 2008
Centro de Biología Molecular Severo Ochoa, Cantoblanco, Madrid 28049
Corresponding Author: jvazquez{at}cbm.uam.es
High-throughput identification of peptides in databases from tandem mass spectrometry data is a key technique in modern proteomics. Common approaches to interpreting large-scale peptide identification results rely on the statistical analysis of average score distributions, which are constructed from the set of best scores that search engines such as SEQUEST produce for large collections of MS/MS spectra. Other approaches calculate individual peptide identification probabilities on the basis of theoretical models, or from single-spectrum score distributions constructed from the set of scores produced by each MS/MS spectrum. In this work, we study the mathematical properties of average SEQUEST score distributions by introducing the concept of spectrum quality and expressing these average distributions as compositions of single-spectrum distributions. We predict, and demonstrate in practice, that average score distributions are dominated by the distribution of spectrum quality in the collection, except in the low-probability region, where the dependence of average probability on database size can be predicted. Our analysis leads to a novel indicator, the probability ratio, which makes optimal use of the statistical information provided by the first and second best scores. We also demonstrate that the probability ratio is a robust indicator whose peptide identification performance, judged by error rates, is at least as good as that obtained by more sophisticated empirical algorithms. The probability ratio also compares favorably with statistical probability indicators obtained by constructing single-spectrum SEQUEST score distributions through extensive computation. Given these results, the conceptual simplicity and ease of automation of the probability ratio algorithm make it a very attractive alternative for determining peptide identification confidences and error rates in high-throughput experiments.
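The database-size dependence mentioned above can be illustrated with a short sketch. It assumes that the candidate peptides scored against one spectrum behave as independent random matches, each with the same tail probability `p_single` of reaching a given score; under that assumption the chance that the best of `n_candidates` reaches the score is 1 - (1 - p)^N, which is close to N·p when p is small. The function names, and the reading of the probability ratio as a plain quotient of the first- and second-best probabilities, are assumptions made for illustration and are not taken from the abstract.

```python
import math


def best_score_probability(p_single: float, n_candidates: int) -> float:
    """Chance that at least one of n_candidates random matches reaches a score
    whose single-candidate tail probability is p_single.

    Assumes independent candidates (an illustrative simplification).
    """
    if not 0.0 <= p_single < 1.0:
        raise ValueError("p_single must lie in [0, 1)")
    if n_candidates < 0:
        raise ValueError("n_candidates must be non-negative")
    # 1 - (1 - p)^N, evaluated with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_candidates * math.log1p(-p_single))


def probability_ratio(p_first: float, p_second: float) -> float:
    """Hypothetical indicator: probability of the best score divided by the
    probability of the second-best score (an assumed form, not the paper's)."""
    if p_second <= 0.0:
        raise ValueError("p_second must be positive")
    return p_first / p_second


if __name__ == "__main__":
    p = 1e-9
    small_db = best_score_probability(p, 1_000)
    large_db = best_score_probability(p, 2_000)
    # In the low-probability region, doubling the database roughly doubles
    # the probability of a random match reaching the same score.
    print(small_db, large_db, large_db / small_db)
    print(probability_ratio(small_db, 1e-3))
```

In this simplified picture, a small ratio means the best candidate is far less likely to be a random match than the runner-up, which is the kind of separation between first and second scores that the abstract describes.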