
Reanalysis of ProteomicsDB Using an Accurate, Sensitive, and Scalable False Discovery Rate Estimation Approach for Protein Groups

Open Access. Published: October 31, 2022. DOI: https://doi.org/10.1016/j.mcpro.2022.100437

      Highlights

      • Evaluating protein group FDR estimation methods with entrapment and simulated data.
      • Accurate and sensitive protein group FDR method for databases containing protein isoforms.
      • Tool for combining multiple large-scale MaxQuant searches at the protein group level.
      • Analysis of ProteomicsDB identified >1200 human genes with multiple protein groups.

      Abstract

      Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry–based proteomics, particularly when analyzing very large datasets. One well-performing method for this purpose is the Picked Protein FDR approach, which applies a target-decoy competition strategy at the protein level so that FDR estimates scale to large datasets. Here, we present an extension of this method that can also handle protein groups, that is, proteins that share common peptides, such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas: the picked group target-decoy strategy and the rescued subset grouping strategy. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of dataset size. The validation analysis also revealed that applying the commonly used Occam’s razor principle leads to anticonservative FDR estimates for large datasets, whereas the Picked Protein Group FDR method does not. Reanalysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the reanalysis of the entire human section of ProteomicsDB led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1250 genes were represented by two or more identified protein groups. To make the method accessible to the proteomics community, we provide a software tool with a graphical user interface that merges results from multiple MaxQuant searches into a single list of identified and quantified protein groups.
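The picked target-decoy competition that the Picked Protein FDR approach builds on can be illustrated with a short Python sketch. It assumes that every protein (target or decoy) already carries a single score where higher is better, and that decoys are named by prefixing the target accession with `REV__` (MaxQuant's usual decoy prefix, used here as an assumption). For each target-decoy pair only the higher-scoring member is kept; the survivors are ranked, the FDR at each rank is estimated as #decoys / #targets, and these estimates are converted into q-values. This is the protein-level picked strategy (pT) in its simplest form, not the authors' implementation; the function name `picked_protein_fdr` is hypothetical.

```python
from __future__ import annotations


def picked_protein_fdr(
    scores: dict[str, float], decoy_prefix: str = "REV__"
) -> list[tuple[str, float]]:
    """Picked target-decoy competition at the protein level (sketch).

    Returns (target protein, q-value) for every target that wins its pair.
    """
    # Pair each decoy with its target by stripping the (assumed) decoy prefix,
    # keeping only the higher-scoring member of each pair.
    winners: dict[str, tuple[str, float, bool]] = {}
    for protein, score in scores.items():
        is_decoy = protein.startswith(decoy_prefix)
        base = protein[len(decoy_prefix):] if is_decoy else protein
        current = winners.get(base)
        # Ties go to the decoy so the estimate stays conservative.
        if current is None or score > current[1] or (score == current[1] and is_decoy):
            winners[base] = (protein, score, is_decoy)

    # Rank the surviving entries by score and estimate FDR = #decoys / #targets.
    ranked = sorted(winners.values(), key=lambda entry: entry[1], reverse=True)
    fdrs: list[float] = []
    n_targets = n_decoys = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            n_decoys += 1
        else:
            n_targets += 1
        fdrs.append(n_decoys / max(n_targets, 1))

    # q-value: the smallest FDR at this rank or at any lower-scoring rank.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min

    return [
        (protein, q)
        for (protein, _, is_decoy), q in zip(ranked, qvalues)
        if not is_decoy
    ]


if __name__ == "__main__":
    example = {"P1": 10.0, "REV__P1": 2.0, "P2": 8.0, "REV__P2": 9.0, "P3": 7.0}
    print(picked_protein_fdr(example))  # prints [('P1', 0.0), ('P3', 0.5)]
```

In the example, P2 loses to its own decoy and is dropped before FDR estimation, which is what keeps decoy counts from being diluted as the database grows. Targets with q ≤ 0.01 would be reported at 1% protein-level FDR.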

      Abbreviations:

      bpP (best percolator PEP), cT (classic TDS), dS (discarded shared peptides), FDR (false discovery rate), mmP (multiplication of MaxQuant PEPs), MS (mass spectrometry), PEP (posterior error probability), pgT (picked group TDS), PSM (peptide-spectrum match), pT (picked TDS), rS (razor peptide), rsG (rescued subset protein grouping), sG (subset protein grouping), TDS (target-decoy strategy)
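Protein groups arise because several database entries, such as isoforms of one gene, can be supported by the same peptides. The Python sketch below shows generic subset protein grouping (sG) under simplifying assumptions: proteins with identical peptide sets form one group, and a protein whose peptides are a strict subset of another group's peptides is folded into that group (into the first, largest such group if there are several). The function name `subset_grouping` and its input format (protein accession mapped to a set of peptide sequences) are hypothetical, and this is not the rescued subset grouping (rsG) strategy, whose exact rules are not given in this section.

```python
from __future__ import annotations


def subset_grouping(protein_peptides: dict[str, set[str]]) -> list[list[str]]:
    """Generic subset protein grouping (hypothetical helper, not rsG).

    Proteins with identical peptide sets share a group, and a protein whose
    peptide set is a strict subset of another group's set joins that group.
    """
    # Step 1: merge proteins that are indistinguishable (identical peptide sets).
    by_peptides: dict[frozenset[str], list[str]] = {}
    for protein, peptides in protein_peptides.items():
        by_peptides.setdefault(frozenset(peptides), []).append(protein)

    # Step 2: visit peptide sets from largest to smallest; a set that is a strict
    # subset of an existing group leader is absorbed into the first such leader.
    leaders: dict[frozenset[str], list[str]] = {}
    for peptide_set in sorted(by_peptides, key=len, reverse=True):
        parent = next((lead for lead in leaders if peptide_set < lead), None)
        if parent is None:
            leaders[peptide_set] = list(by_peptides[peptide_set])
        else:
            leaders[parent].extend(by_peptides[peptide_set])

    return [sorted(members) for members in leaders.values()]


if __name__ == "__main__":
    isoforms = {
        "A": {"p1", "p2", "p3"},
        "A-2": {"p1", "p2"},
        "B": {"p4"},
        "B-iso": {"p4"},
        "C": {"p2", "p5"},
    }
    print(subset_grouping(isoforms))  # prints [['A', 'A-2'], ['C'], ['B', 'B-iso']]
```

Because a strict subset is always smaller than its superset, processing sets in descending size guarantees that every candidate leader has been seen before its subsets, so checking leaders alone is sufficient. Any FDR step, such as a picked target-decoy competition, would then operate on these groups rather than on individual proteins.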