Originally published In Press as doi:10.1074/mcp.M900317-MCP200 on July 16, 2009.
Molecular & Cellular Proteomics 8:2405-2417, 2009.
© 2009 by The American Society for Biochemistry and Molecular Biology, Inc.
Research
Protein Identification False Discovery Rates for Very Large Proteomics Data Sets Generated by Tandem Mass Spectrometry
Lukas Reiter (a,b,c,d,e),
Manfred Claassen (c,e,f,g),
Sabine P. Schrimpf (a,d),
Marko Jovanovic (a,b,d),
Alexander Schmidt (c),
Joachim M. Buhmann (f,g),
Michael O. Hengartner (a,b,d,h), and
Ruedi Aebersold (c,g,i,j,k)
From the (a) Institute of Molecular Biology,
(d) Center for Model Organism Proteomes, and
(j) Faculty of Science, University of Zurich, CH-8057 Zurich, Switzerland,
(b) Ph.D. Program in Molecular Life Sciences Zurich, University of Zurich and ETH Zurich, CH-8057 Zurich, Switzerland,
(c) Institute of Molecular Systems Biology, ETH Zurich, CH-8093 Zurich, Switzerland,
(f) Institute of Computational Science, ETH Zurich, CH-8092 Zurich, Switzerland,
(g) Competence Center for Systems Physiology and Metabolic Diseases, CH-8093 Zurich, Switzerland, and
(i) Institute for Systems Biology, Seattle, Washington 98103-8904
Comprehensive characterization of a proteome is a fundamental goal in proteomics. To achieve saturation coverage of a proteome or specific subproteome via tandem mass spectrometric identification of tryptic protein sample digests, proteomics data sets are growing dramatically in size and heterogeneity. The trend toward very large integrated data sets poses as yet unsolved challenges in controlling the uncertainty of protein identifications, challenges that go beyond the well established confidence measures for peptide-spectrum matches. We present MAYU, a novel strategy that reliably estimates false discovery rates for protein identifications in large scale data sets. We validated and applied MAYU using various large proteomics data sets. The data show that the size of the data set has an important and previously underestimated impact on the reliability of protein identifications. In particular, we found that protein false discovery rates are significantly elevated compared with those of peptide-spectrum matches. The function provided by MAYU is critical for controlling the quality of proteome data repositories and thereby for enhancing any study that relies on these data sources. The MAYU software is available as standalone software and is also integrated into the Trans-Proteomic Pipeline.
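The elevation of protein false discovery rates relative to those of peptide-spectrum matches (PSMs) can be illustrated with a small worked example. The sketch below is not MAYU's estimator, which is not reproduced here. It applies only a naive target-decoy count (decoy hits divided by target hits) at both levels. The `PSM` record, the function names, and the toy data are all hypothetical and invented for illustration. The underlying assumption is that correct PSMs cluster on a limited set of proteins, whereas false PSMs scatter across many different proteins.

```python
from collections import namedtuple

# Hypothetical record for one peptide-spectrum match; not a MAYU data structure.
PSM = namedtuple("PSM", ["protein", "is_decoy"])


def naive_fdr(n_decoy, n_target):
    """Naive target-decoy estimate: decoy hits stand in for false target hits."""
    return n_decoy / n_target if n_target else 0.0


def psm_and_protein_fdr(psms):
    """Return (PSM-level FDR, protein-level FDR) from plain target/decoy counts."""
    target_psms = sum(1 for p in psms if not p.is_decoy)
    decoy_psms = sum(1 for p in psms if p.is_decoy)
    # A protein counts once, no matter how many PSMs support it.
    target_proteins = {p.protein for p in psms if not p.is_decoy}
    decoy_proteins = {p.protein for p in psms if p.is_decoy}
    return (
        naive_fdr(decoy_psms, target_psms),
        naive_fdr(len(decoy_proteins), len(target_proteins)),
    )


# Invented toy data: 10 real proteins with 20 correct PSMs each,
# 4 false target PSMs that each hit a different protein,
# and 4 decoy PSMs that each hit a different decoy protein.
psms = [PSM(f"P{i}", False) for i in range(10) for _ in range(20)]
psms += [PSM(f"F{i}", False) for i in range(4)]
psms += [PSM(f"DECOY_{i}", True) for i in range(4)]

psm_fdr, protein_fdr = psm_and_protein_fdr(psms)
print(f"PSM FDR:     {psm_fdr:.3f}")      # 4 / 204
print(f"Protein FDR: {protein_fdr:.3f}")  # 4 / 14
```

In this toy set the same four decoy hits correspond to a PSM-level rate of about 2% but a protein-level rate of about 29%, because each scattered false PSM contributes a whole protein while the 200 correct PSMs collapse onto only 10 proteins. As more spectra are added, the pool of correctly identified proteins saturates while false hits keep reaching new proteins, which is consistent with the dependence on data set size described above.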
(k) To whom correspondence should be addressed: Inst. of Molecular Systems Biology, Wolfgang-Pauli-Strasse 16, HPT E78, ETH Zurich, CH-8093 Zurich, Switzerland. Tel.: 41-44-633-31-70; Fax: 41-44-633-10-51; E-mail: aebersold{at}imsb.biol.ethz.ch.
