Submitted on September 6, 2006
Revised on November 16, 2006
Accepted on December 10, 2006
EBP: Protein identification using multiple tandem mass spectrometry datasets
Thomas S. Price, Margaret B. Lucitt, Weichen Wu, David J. Austin, Angel Pizarro, Anastasia K. Yocum, Ian A. Blair, Garret A. FitzGerald, and Tilo Grosser
Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104
Corresponding Author: tilo{at}spirit.gcrc.upenn.edu
Tandem mass spectrometry (MS/MS) combined with database search methods can identify the proteins present in complex mixtures. High-throughput methods that infer probable peptide sequences from enzymatically digested protein samples raise the question of how best to aggregate the evidence for candidate proteins. Typically, the results of multiple technical and/or biological replicate experiments must be combined to maximize sensitivity. We present a statistical method for estimating probabilities of protein expression that integrates peptide sequence identifications from multiple search algorithms and replicate experimental runs. The method was applied to create a repository of 797 non-homologous zebrafish (Danio rerio) proteins, at an empirically validated false identification rate under 1%, as a resource for the development of targeted quantitative proteomics assays. We have implemented this statistical method as an analytic module that can be integrated with an existing suite of open-source proteomics software.
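To make the aggregation problem concrete, the following minimal sketch combines peptide-level probabilities from several runs and search engines into one protein-level score. It is a generic independence-based baseline, not the statistical method presented in this paper, whose details are not given in the abstract. The function name, the tuple layout, and the example values are all hypothetical.

```python
from collections import defaultdict


def combine_peptide_evidence(identifications):
    """Naive protein-level score from peptide-level probabilities.

    identifications: iterable of (protein, peptide, run, engine, probability).
    This is an illustrative baseline only (assumption: distinct peptides
    are independent evidence); it is NOT the method described in the paper.
    """
    # Keep the best probability seen for each (protein, peptide) pair, so that
    # the same peptide observed in several runs or by several search engines
    # is not counted as independent evidence.
    best = defaultdict(float)
    for protein, peptide, _run, _engine, prob in identifications:
        key = (protein, peptide)
        best[key] = max(best[key], prob)

    # Probability that every distinct peptide of a protein is a false hit.
    all_wrong = defaultdict(lambda: 1.0)
    for (protein, _peptide), prob in best.items():
        all_wrong[protein] *= 1.0 - prob

    # Protein score = probability that at least one peptide is correct.
    return {protein: 1.0 - miss for protein, miss in all_wrong.items()}


# Hypothetical example data: two runs, two search engines.
example = [
    ("protA", "PEPTIDEK", "run1", "engineX", 0.9),
    ("protA", "PEPTIDEK", "run2", "engineY", 0.8),
    ("protA", "SAMPLER", "run1", "engineX", 0.5),
    ("protB", "ANOTHERK", "run2", "engineY", 0.3),
]
scores = combine_peptide_evidence(example)
```

A baseline like this tends to overstate confidence when peptide identifications are correlated or shared between proteins, which is one reason a dedicated statistical model is attractive for replicate datasets.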