
Estimating the Contribution of Proteasomal Spliced Peptides to the HLA-I Ligandome*

Open Access. Published: August 31, 2018. DOI: https://doi.org/10.1074/mcp.RA118.000877
      Spliced peptides are short protein fragments spliced together in the proteasome by peptide bond formation. True estimation of the contribution of proteasome-spliced peptides (PSPs) to the global human leukocyte antigen (HLA) ligandome is critical. A recent study suggested that PSPs contribute up to 30% of the HLA ligandome. We performed a thorough reanalysis of the reported results using multiple computational tools and various validation steps and concluded that only a fraction of the proposed PSPs passes the quality filters. To better estimate the actual number of PSPs, we present an alternative workflow. We performed de novo sequencing of the HLA-peptide spectra and discarded all de novo sequences found in the UniProt database. We checked whether the remaining de novo sequences could match spliced peptides from human proteins. The spliced sequences were appended to the UniProt fasta file, which was searched by two search tools at a false discovery rate (FDR) of 1%. We find that 2–6% of the HLA ligandome could be explained as spliced protein fragments. The majority of these potential PSPs have good peptide-spectrum match properties and are predicted to bind the respective HLA molecules. However, it remains to be shown how many of these potential PSPs actually originate from proteasomal splicing events.


      The abbreviations used are: HLA, human leukocyte antigen; HLA-Ip, HLA class I binding peptides; FDR, false discovery rate; FP, false positive; AA, amino acid; LC, liquid chromatography; MS, mass spectrometry; MS/MS, tandem mass spectrometry; PSPs, proteasome-spliced peptides; DeNovo_spliced, spliced peptides identified by de novo sequencing; LM_spliced, spliced peptides identified by Liepe et al.; LM_UniProt, UniProt peptides identified by Liepe et al.; PSM, peptide spectrum match.
      The antigen processing and presentation machinery is responsible for the cell surface display of thousands of peptides in the context of the HLA class I (HLA-I) molecules. The proteasome is considered as the main protease that cleaves endogenous proteins. However, in addition to the proteasome, the antigen processing and presentation machinery comprises several other proteases, transporters, and chaperones that cooperatively digest the proteins in the cytoplasm, funnel the peptides into the endoplasmic reticulum (ER), further trim and edit them, load them on newly synthesized HLA-I, and finally direct the stable complexes to the cells' surface (
      • Neefjes J.
      • Jongsma M.L.
      • Paul P.
      • Bakke O.
      Towards a systems understanding of MHC class I and MHC class II antigen presentation.
      ). The selective interaction between the HLA-I complex and the peptides is the major factor that defines the presented repertoire and is often represented with binding motifs.
      Currently, the only unbiased methodology to comprehensively interrogate the repertoire of the HLA-I binding peptides (HLA-Ip) is based on mass spectrometry (MS). HLA complexes are immunoaffinity-purified from cells in culture or from tissues; the peptides are extracted and subjected to reverse-phase liquid chromatography (LC) coupled online to sensitive MS instruments. The acquired tandem mass spectrometry (MS/MS) data are normally searched against a database of protein sequences. Applying a stringent FDR of 1% using a comparable decoy database leads to the accurate identification of thousands of HLA-Ip. HLA-Ip are mainly 9–11 amino acids (AA) long, and usually about 95% of the peptides identified with this methodology fit the consensus binding motifs of the HLA-I molecules expressed in the samples (
      • Bassani-Sternberg M.
      • Pletscher-Frankild S.
      • Jensen L.J.
      • Mann M.
      Mass spectrometry of human leukocyte antigen class I peptidomes reveals strong effects of protein abundance and turnover on antigen presentation.
      ).
      In a recent MS-based HLA-I ligandomics study, a novel computational algorithm predicted that a surprisingly large fraction, up to 30%, of the ligands may be derived from transpeptidation of two noncontiguous fragments of a parental protein that are spliced together within the proteasome (
      • Liepe J.
      • Marino F.
      • Sidney J.
      • Jeko A.
      • Bunting D.E.
      • Sette A.
      • Kloetzel P.M.
      • Stumpf M.P.
      • Heck A.J.
      • Mishto M.
      A large fraction of HLA class I ligands are proteasome-generated spliced peptides.
      ). Earlier work showed several cases of such proteasomal spliced HLA-I peptides that were naturally presented and recognized by cytotoxic T cells (
      • Ebstein F.
      • Textoris-Taube K.
      • Keller C.
      • Golnik R.
      • Vigneron N.
      • Van den Eynde B.J.
      • Schuler-Thurner B.
      • Schadendorf D.
      • Lorenz F.K.M.
      • Uckert W.
      • Urban S.
      • Lehmann A.
      • Albrecht-Koepke N.
      • Janek K.
      • Henklein P.
      • Niewienda A.
      • Kloetzel P.M.
      • Mishto M.
      Proteasomes generate spliced epitopes by two different mechanisms and as efficiently as non-spliced epitopes.
      ,
      • Hanada K.
      • Yewdell J.W.
      • Yang J.C.
      Immune recognition of a human renal cancer antigen through post-translational protein splicing.
      ,
      • Vigneron N.
      • Stroobant V.
      • Chapiro J.
      • Ooms A.
      • Degiovanni G.
      • Morel S.
      • van der Bruggen P.
      • Boon T.
      • Van den Eynde B.J.
      An antigenic peptide produced by peptide splicing in the proteasome.
      ,
      • Warren E.H.
      • Vigneron N.J.
      • Gavin M.A.
      • Coulie P.G.
      • Stroobant V.
      • Dalet A.
      • Tykodi S.S.
      • Xuereb S.M.
      • Mito J.K.
      • Riddell S.R.
      • Van den Eynde B.J.
      An antigen produced by splicing of noncontiguous peptides in the reverse order.
      ,
      • Dalet A.
      • Robbins P.F.
      • Stroobant V.
      • Vigneron N.
      • Li Y.F.
      • El-Gamil M.
      • Hanada K.
      • Yang J.C.
      • Rosenberg S.A.
      • Van den Eynde B.J.
      An antigenic peptide produced by reverse splicing and double asparagine deamidation.
      ,
      • Michaux A.
      • Larrieu P.
      • Stroobant V.
      • Fonteneau J.F.
      • Jotereau F.
      • Van den Eynde B.J.
      • Moreau-Aubry A.
      • Vigneron N.
      A spliced antigenic peptide comprising a single spliced amino acid is produced in the proteasome by reverse splicing of a longer peptide fragment followed by trimming.
      ). Hence, these may be highly interesting therapeutic targets. However, the authors of (
      • Liepe J.
      • Marino F.
      • Sidney J.
      • Jeko A.
      • Bunting D.E.
      • Sette A.
      • Kloetzel P.M.
      • Stumpf M.P.
      • Heck A.J.
      • Mishto M.
      A large fraction of HLA class I ligands are proteasome-generated spliced peptides.
      ) noticed that unlike the nonspliced peptides, proteasome-spliced peptides (PSPs) had low HLA binding affinities and produced ambiguous binding motifs compared with normal HLA-Ip. HLA loading takes place after the peptides have exited the proteasome and entered the ER and hence lost the identity of their creation mechanism. Currently, there is no mechanism or biological process that could explain how the antigen processing and presentation machinery can prioritize loading of HLA-I molecules with low-affinity PSPs over high-affinity nonspliced peptides.
      Understanding the contribution of PSPs to the HLA ligandome is crucial, especially as they may indeed be highly interesting therapeutic targets in many diseases. Here we critically investigated PSPs reported in Liepe et al. (
      • Liepe J.
      • Marino F.
      • Sidney J.
      • Jeko A.
      • Bunting D.E.
      • Sette A.
      • Kloetzel P.M.
      • Stumpf M.P.
      • Heck A.J.
      • Mishto M.
      A large fraction of HLA class I ligands are proteasome-generated spliced peptides.
      ) and found that most of the spectra attributed to them could be assigned with higher scores to normal peptide sequences within the reviewed part of the UniProt database (with no isoforms) of the human proteome. We further describe an alternative computational pipeline to estimate the contribution of spliced peptides to the immunopeptidome. Our results suggest that at most 2–6% of the HLA-Ip may be spliced. As opposed to the spliced peptides reported in (
      • Liepe J.
      • Marino F.
      • Sidney J.
      • Jeko A.
      • Bunting D.E.
      • Sette A.
      • Kloetzel P.M.
      • Stumpf M.P.
      • Heck A.J.
      • Mishto M.
      A large fraction of HLA class I ligands are proteasome-generated spliced peptides.
      ), these peptides fit well to the relevant HLA binding motifs.

      EXPERIMENTAL PROCEDURES

      HLA Ligandomic Data

      We selected previously published MS HLA-Ip datasets of exceptionally high coverage representing a variety of binding specificities (supplemental Table 1). MS raw files of HLA-Ip isolated from two melanoma tissues, Mel15 (16 raw files) and Mel16 (12 raw files) (
      • Bassani-Sternberg M.
      • Bräunlein E.
      • Klar R.
      • Engleitner T.
      • Sinitcyn P.
      • Audehm S.
      • Straub M.
      • Weber J.
      • Slotta-Huspenina J.
      • Specht K.
      • Martignoni M.E.
      • Werner A.
      • Hein R.
      • D H.B.
      • Peschel C.
      • Rad R.
      • Cox J.
      • Mann M.
      • Krackhardt A.M.
      Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry.
      ), RA957 B cell line (four raw files) (
      • Bassani-Sternberg M.
      • Chong C.
      • Guillaume P.
      • Solleder M.
      • Pak H.
      • Gannon P.O.
      • Kandalaft L.E.
      • Coukos G.
      • Gfeller D.
      Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity.
      ), and fibroblast (Fib) cells (four raw files) (
      • Bassani-Sternberg M.
      • Pletscher-Frankild S.
      • Jensen L.J.
      • Mann M.
      Mass spectrometry of human leukocyte antigen class I peptidomes reveals strong effects of protein abundance and turnover on antigen presentation.
      ) were downloaded from the PRoteomics IDEntifications (PRIDE) repository (
      • Vizcaíno J.A.
      • Côté R.G.
      • Csordas A.
      • Dianes J.A.
      • Fabregat A.
      • Foster J.M.
      • Griss J.
      • Alpi E.
      • Birim M.
      • Contell J.
      • O'Kelly G.
      • Schoenegger A.
      • Ovelleiro D.
      • Pérez-Riverol Y.
      • Reisinger F.
      • Rios D.
      • Wang R.
      • Hermjakob H.
      The PRoteomics IDEntifications (PRIDE) database and associated tools: Status in 2013.
      ) datasets PXD004894, PXD005231, and PXD000394, respectively. One of the four raw MS files of the Fib cells (20130504_EXQ3_MiBa_SA_Fib-2.raw) was also used by Liepe et al. More details about these datasets can be found on PRIDE and in the respective manuscripts.

      Data Processing

      If not otherwise mentioned, data were processed with the R statistical scripting language (version 3.3.2) (https://www.r-project.org/).

      Experimental Design and Statistical Rationale

      Identification of HLA-Ip Using PEAKS

      Raw files were analyzed with the de novo sequencing software PEAKS Studio 8.0 (
      • Ma B.
      • Zhang K.
      • Hendrie C.
      • Liang C.
      • Li M.
      • Doherty-Kirby A.
      • Lajoie G.
      PEAKS: Powerful software for peptide de novo sequencing by tandem mass spectrometry.
      ). General parameters were set to “Ion Source”: electrospray ionization (ESI; nanospray), “Fragmentation Mode”: high energy collision-induced dissociation (CID) (y and b ions), “MS Scan Mode,” and “MS/MS Scan Mode”: Fourier-transform ion cyclotron resonance (FT-ICR)/Orbitrap. The different PEAKS modules were used in the following order with their default parameters, with special parameters indicated in parentheses: 1) DATA REFINE; 2) PEAKS DENOVO (“Parent Mass Error Tolerance”: 10 ppm, “Fragment Mass Error Tolerance”: 0.02 Da, “Enzyme”: None); and 3) PEAKS DB (“Parent Mass Error Tolerance”: 10 ppm, “Fragment Mass Error Tolerance”: 0.02 Da, “Variable Modifications”: Oxidation (M) 15.99, “Database”: Homo_sapiens_UP000005640_9606).
      A table containing the five best scoring de novo sequences for every spectrum, named “all de novo candidates” was exported from PEAKS DENOVO. A table containing peptides with a match to human proteins from UniProt (Homo_sapiens_UP000005640_9606), named “DB search psm,” was exported from PEAKS DB. For this export, all peptides with -10LogP > 15 (FDR around 1.5%) were considered having a match. Subsequent filtering and annotation of the “all de novo candidates” table was done using the R statistical scripting language (version 3.3.2).
      All peptides in “all de novo candidates” with a corresponding match in “DB search psm” were annotated using a new column “database peptide.” If a spectrum had a matching database peptide, then all other peptides corresponding to this spectrum were removed from “all de novo candidates”. An additional column “de novo only” was added and unmatched sequences were marked with a “+.” Only peptides with a length between 8 and 25 AA and a minimum local confidence score over 80 for every AA position were kept. Peptides containing post-translational modifications (PTMs) were removed in order to simplify association of peptides with their corresponding HLA alleles (supplemental Data 1–4). For the subsequent analysis we merely kept peptides marked as “de novo only” and named them “de novo only peptides,” and the final table is made available in the supplemental material (supplemental Data 5).
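      The filtering rules described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the `DeNovoCandidate` record, its field names, and the function `de_novo_only` are hypothetical, and the example peptides and scores are made up. Only the thresholds (length 8 to 25 AA, every local confidence score over 80, no PTMs, no database match for the spectrum) are taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class DeNovoCandidate:
    """One de novo sequence proposed for one spectrum (hypothetical record layout)."""
    scan: str                    # identifier of the MS/MS spectrum
    peptide: str                 # de novo sequence
    local_confidence: List[int]  # one local confidence score per residue
    has_ptm: bool = False        # True if the sequence carries a modification


def de_novo_only(candidates: Iterable[DeNovoCandidate],
                 db_matched_scans: Set[str],
                 min_len: int = 8,
                 max_len: int = 25,
                 min_local_conf: int = 80) -> List[DeNovoCandidate]:
    """Keep candidates whose spectrum has no database match and that pass the filters."""
    kept = []
    for cand in candidates:
        if cand.scan in db_matched_scans:
            continue  # spectrum already explained by a database peptide
        if cand.has_ptm:
            continue  # modified peptides are removed
        if not (min_len <= len(cand.peptide) <= max_len):
            continue  # outside the allowed length range
        if min(cand.local_confidence) <= min_local_conf:
            continue  # every position must score over the threshold
        kept.append(cand)
    return kept


if __name__ == "__main__":
    # Made-up candidates for illustration only.
    demo = [
        DeNovoCandidate("scan1", "SLYNTVATL", [95] * 9),
        DeNovoCandidate("scan2", "KVLEYVLKV", [95] * 9),           # has a database match
        DeNovoCandidate("scan3", "SHRT", [99] * 4),                # too short
        DeNovoCandidate("scan4", "ALDKATVLL", [95] * 8 + [60]),    # one weak position
    ]
    print([c.peptide for c in de_novo_only(demo, {"scan2"})])
```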

      Identification of Possible Spliced Peptides Using TagPep

      We checked whether the filtered list of sequences from the “de novo only peptides”, which did not match any UniProt sequence, could be spliced fragments from UniProt (Homo_sapiens_UP000005640_9606, the reviewed part of UniProt, with no isoforms, including 21,026 entries downloaded in March 2017) proteins. We applied an in-house software tool TagPep, which uses the index strategy described for fetchGWI (
      • Iseli C.
      • Ambrosini G.
      • Bucher P.
      • Jongeneel C.V.
      Indexing strategies for rapid searches of short words in genome sequences.
      ) adapted for AAs instead of nucleotides. TagPep first matches the whole peptide sequence to the database. If there is no complete hit, it looks for hits allowing for one splicing event, where both spliced fragments are from the same protein (supplemental Data 1). We excluded trans-spliced peptides where the fragments stem from two different proteins for three reasons. First, all spliced peptides reported in (
      • Mishto M.
      • Liepe J.
      Post-translational peptide splicing and T cell responses.
      ) are concatenated fragments from the same protein. Second, the huge number of possible trans-spliced peptides may lead to strongly increased false positive rates in subsequent bioinformatics analysis. Third, for trans-splicing to happen, the two source proteins need to be present in the same proteasome at the same time, which is unlikely to happen on a large scale (16). The spliced fragments can lie anywhere in the protein, but their sequences cannot overlap. Within a protein, TagPep prioritizes the spliced peptide with the smallest splicing gap and lists all possible splicing events matching different proteins. PEAKS de novo assigns the mass 113.08406 by default as leucine; however, TagPep alignment considered either leucine or isoleucine for possible matches. We named the resulting set of TagPep-matched peptides DeNovo_spliced; these are the subset of potentially proteasome-spliced sequences. For each sample, we provide the PSMs from PEAKS, including the UniProt hits and the “de novo only peptides”, and we flagged the potential DeNovo_spliced peptides (supplemental Data 5).
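      The matching rules described for TagPep can be sketched as follows. This is a brute-force illustration in Python, not the TagPep implementation, which relies on an index strategy and is not reproduced here. The function names and the toy protein are hypothetical. The sketch tries a whole-sequence match first, then a single cis-splicing event with two non-overlapping fragments in either order, treats leucine and isoleucine as equivalent, and keeps the event with the smallest gap.

```python
from typing import Iterator, Optional, Tuple


def normalize(seq: str) -> str:
    """Treat isoleucine and leucine as the same residue (identical mass)."""
    return seq.replace("I", "L")


def find_all(text: str, pattern: str) -> Iterator[int]:
    """Yield every start position of pattern in text, including overlapping hits."""
    start = text.find(pattern)
    while start != -1:
        yield start
        start = text.find(pattern, start + 1)


def best_cis_splice(peptide: str, protein: str) -> Tuple[str, Optional[Tuple[int, int, int, int]]]:
    """Classify a peptide against one protein.

    Returns ("unspliced", None) for a contiguous match, ("spliced", (gap, cut,
    left_start, right_start)) for the smallest-gap single splicing event, or
    ("no match", None).
    """
    pep, prot = normalize(peptide), normalize(protein)
    if pep in prot:
        return ("unspliced", None)
    best = None
    for cut in range(1, len(pep)):
        left, right = pep[:cut], pep[cut:]
        for i in find_all(prot, left):
            for j in find_all(prot, right):
                left_end, right_end = i + len(left), j + len(right)
                if left_end <= j:        # fragments in forward order
                    gap = j - left_end
                elif right_end <= i:     # fragments in reverse order
                    gap = i - right_end
                else:
                    continue             # fragments overlap: not allowed
                if best is None or gap < best[0]:
                    best = (gap, cut, i, j)
    return ("spliced", best) if best is not None else ("no match", None)


if __name__ == "__main__":
    toy_protein = "AAKRIGGGPLPTKKAA"  # made-up sequence for illustration
    print(best_cis_splice("KRIPLPTKK", toy_protein))
    print(best_cis_splice("PLPTKK", toy_protein))
```

      Running the search over every protein and reporting the best event per protein would mirror the behavior described in the text; the quadratic scan used here is only practical for small examples.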

      Confirmation of Identification of Spliced Peptides Using MaxQuant and Comet

      We employed the MaxQuant platform (
      • Cox J.
      • Mann M.
      MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification.
      ) version 1.5.5.1 and the Comet software release 2016.01 (
      • Eng J.K.
      • Jahan T.A.
      • Hoopmann M.R.
      Comet: An open-source MS/MS sequence database search tool.
      ) to search the peak lists against a fasta file containing the UniProt database (Homo_sapiens_UP000005640_9606, the reviewed part of UniProt, with no isoforms, including 21,026 entries downloaded in March 2017) and a list of 247 frequently observed contaminants. For each sample, we added to the fasta file the list of DeNovo_spliced peptide sequence candidates. For Fib, we also added the 1,154 spliced peptides identified by Liepe et al. (
      • Liepe J.
      • Marino F.
      • Sidney J.
      • Jeko A.
      • Bunting D.E.
      • Sette A.
      • Kloetzel P.M.
      • Stumpf M.P.
      • Heck A.J.
      • Mishto M.
      A large fraction of HLA class I ligands are proteasome-generated spliced peptides.
      ), which we named LM_spliced. The list of spliced peptides was kindly provided to us by the authors of (
      • Liepe J.
      • Marino F.
      • Sidney J.
      • Jeko A.
      • Bunting D.E.
      • Sette A.
      • Kloetzel P.M.
      • Stumpf M.P.
      • Heck A.J.
      • Mishto M.
      A large fraction of HLA class I ligands are proteasome-generated spliced peptides.
      ) (supplemental Data 6, type = “psp”). Peptides with a length between 8 and 25 AA were allowed. MaxQuant parameters: The second peptide identification option in Andromeda was enabled. The enzyme specificity was set as unspecific. An FDR of 1% was required for peptides and no protein FDR was set. The initial allowed mass deviation of the precursor ion was set to 6 ppm, and the maximum fragment mass deviation was set to 20 ppm. For Fib, methionine oxidation (15.994915 Da) was set as variable modification; however, modified peptides were removed at first for the direct comparison to the data of Liepe et al. For the additional modification searches in all samples, methionine oxidation, N-terminal acetylation, and glutamine/asparagine deamidation (+0.98402 Da) were set as variable modifications. Comet parameters: activation method: HCD; peptide mass tolerance: 0.02 Da; fragment mass tolerance: 0.02; fragments: b- and y-ions. Precursor tolerance and modifications were the same as in the MaxQuant settings. The output files summarizing MaxQuant and Comet result files are provided as supplemental Data 7–16, and explanations of the column headers are provided in supplemental Table 2.
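      Appending the spliced candidates to the search database, as described above, amounts to writing each candidate as its own FASTA entry. The sketch below is one way this could be done in Python; the function name and the header layout (`>DeNovo_spliced|sample|counter`) are assumptions made for illustration and are not taken from the paper.

```python
from typing import Iterable


def append_spliced_entries(fasta_text: str,
                           spliced_peptides: Iterable[str],
                           sample: str,
                           prefix: str = "DeNovo_spliced") -> str:
    """Return fasta_text with one extra FASTA entry per unique spliced peptide."""
    lines = [fasta_text.rstrip("\n")] if fasta_text.strip() else []
    seen = set()
    for peptide in spliced_peptides:
        if peptide in seen:
            continue  # write each candidate sequence only once
        seen.add(peptide)
        # Hypothetical header layout: prefix, sample name, running number.
        lines.append(f">{prefix}|{sample}|{len(seen)}")
        lines.append(peptide)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    base = ">sp|P00000|TOY_HUMAN made-up entry\nMKTAYIAKQR\n"
    print(append_spliced_entries(base, ["KRIPLPTKK", "KRIPLPTKK"], "Fib"))
```

      Keeping a recognizable prefix in the header makes it straightforward to separate spliced candidates from UniProt entries in the search output.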

      LC-MS/MS Analyses and Identification of Selected Synthetic HLA-Ip

      Synthetic peptides (PEPotech Heavy grade 3, Thermo Fisher Scientific) (listed in supplemental Table 3) corresponding to peptides identified from Fib data were mixed and desalted on a C-18 spin column (Harvard Apparatus, 74–4101) and measured at a total amount of 10 and 20 pmol. Synthetic peptides were separated by a nanoflow HPLC (Proxeon Biosystems, Thermo Fisher Scientific, Odense) coupled on-line to a Q Exactive HF mass spectrometer (Thermo Fisher Scientific, Bremen) with a nanoelectrospray ion source (Proxeon Biosystems). We packed a 20 cm long, 75 μm inner diameter column with ReproSil-Pur C18-AQ 1.9 μm resin (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany) in buffer A (0.5% acetic acid). Peptides were eluted with a linear gradient of 2–30% buffer B (80% acetonitrile and 0.5% acetic acid) at a flow rate of 250 nl/min over 90 min. Data were acquired using a data-dependent 'top 10' method. We acquired full scan MS spectra at a resolution of 70,000 at 200 m/z with an auto gain control target value of 3e6 ions. The ten most abundant ions were sequentially isolated, activated by higher-energy collisional dissociation, and accumulated to an auto gain control target value of 1e5 with a maximum injection time of 120 ms. In case of unassigned precursor ion charge states, or charge states of four and above, no fragmentation was performed. The peptide match option was disabled. MS/MS resolution was set to 17,500 at 200 m/z. Ions selected for fragmentation were dynamically excluded from further selection for 20 s. We employed the MaxQuant settings mentioned above for the identification of synthetic peptides. The raw files and MaxQuant output tables have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD010793.

      Comparison of MS/MS Annotations of Endogenous HLA-Ip and Their Synthetic Counterparts

      To investigate if spliced peptides match the MS/MS spectra better than possible alternative sequences we first compared the MS/MS scans identified by Liepe et al. as spliced peptides and those identified by MaxQuant. The mapping to the relevant MS scans and their Mascot ion scores were kindly provided to us by the authors of (
      • Liepe J.
      • Marino F.
      • Sidney J.
      • Jeko A.
      • Bunting D.E.
      • Sette A.
      • Kloetzel P.M.
      • Stumpf M.P.
      • Heck A.J.
      • Mishto M.
      A large fraction of HLA class I ligands are proteasome-generated spliced peptides.
      ) (supplemental Data 6). For the Fib data, we selected three MS/MS scans of LM_spliced peptides identified both by MaxQuant and by Liepe et al., and 21 MS/MS scans corresponding to LM_spliced peptides that MaxQuant identified instead as UniProt peptides. This selection was not biased and was not based on prior knowledge that would favor MaxQuant. We synthesized the 21 pairs of peptide sequences and the three LM_spliced peptide sequences and analyzed them by MS as mentioned above. For visual inspection, we printed the endogenous and synthetic spectra to PDF files. For each of the 21 pairs, we calculated the similarity between the spectrum of the eluted peptide from Fib, annotated as LM_spliced, and the spectrum of the synthetic LM_spliced peptide. We also calculated the similarity between the spectrum of the eluted peptide from Fib, annotated as a UniProt peptide, and the spectrum of the synthetic UniProt peptide. Similarly, we calculated the similarity between the three spectra of the identically identified spliced sequences and the spectra of their synthetic counterparts. The similarity was computed by the cosine score (value between 0 and 1, where a value of 0 corresponds to spectra with no peaks in common and a value of 1 to identical spectra) (
      • Stein S.E.
      • Scott D.R.
      Optimization and testing of mass spectral library search algorithms for compound identification.
      ). The MzJava class library (
      • Horlacher O.
      • Nikitin F.
      • Alocci D.
      • Mariethoz J.
      • Müller M.
      • Lisacek F.
      MzJava: An open source library for mass spectrometry data processing.
      ) was used to read the .mgf spectrum files and to calculate the similarity.
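      The cosine score between two spectra can be illustrated with a short Python sketch. It does not reproduce the MzJava implementation used in the study: peaks are simply placed into fixed-width m/z bins, and the bin width of 0.02 as well as the function names are assumptions made for this example.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def bin_spectrum(peaks: Iterable[Peak], bin_width: float = 0.02) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (assumed bin width)."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(round(mz / bin_width))] += intensity
    return binned


def cosine_score(peaks_a: Iterable[Peak], peaks_b: Iterable[Peak],
                 bin_width: float = 0.02) -> float:
    """Cosine of the angle between two binned intensity vectors (0 to 1)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum shares no peaks with anything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up peak lists for illustration only.
    endogenous = [(175.12, 40.0), (398.29, 100.0), (608.42, 55.0)]
    synthetic = [(175.12, 35.0), (398.29, 90.0), (705.48, 20.0)]
    print(round(cosine_score(endogenous, synthetic), 3))
```

      Because the score depends only on the angle between the intensity vectors, scaling all intensities of one spectrum by a constant leaves it unchanged, which is useful when endogenous and synthetic peptides are measured at different amounts.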

      Binding Affinity Prediction and Clustering of Peptides

      We used NetMHCpan (
      • Nielsen M.
      • Andreatta M.
      NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets.
      ) to predict binding affinity of 8–14-mer peptides to the respective HLA alleles expressed in the sample and assigned them based on maximum affinity. Hits with a rank <2% were considered as binders. Gibbscluster-1.1 (
      • Andreatta M.
      • Alvarez B.
      • Nielsen M.
      GibbsCluster: Unsupervised clustering and alignment of peptide sequences.
      ) was run independently for each list of peptides identified from the different samples, with the default settings except that the number of clusters was tested between 1 and 6, a trash cluster was enabled and alignment was disabled (
      • Andreatta M.
      • Lund O.
      • Nielsen M.
      Simultaneous alignment and clustering of peptide data using a Gibbs sampling approach.
      ). The MixMHCp tool (http://mixmhcp.org/) was used to cluster the peptides with default settings (
      • Bassani-Sternberg M.
      • Chong C.
      • Guillaume P.
      • Solleder M.
      • Pak H.
      • Gannon P.O.
      • Kandalaft L.E.
      • Coukos G.
      • Gfeller D.
      Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity.
      ,
      • Bassani-Sternberg M.
      • Gfeller D.
      Unsupervised HLA peptidome deconvolution improves ligand prediction accuracy and predicts cooperative effects in peptide-HLA interactions.
      ).
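      The allele assignment rule described above (assign each peptide to the allele with the strongest predicted binding and call it a binder if the rank is below 2%) can be sketched as follows. This Python example does not call NetMHCpan; it assumes the percentile ranks have already been computed and are supplied as a dictionary. The function name and the rank values in the example are made up.

```python
from typing import Dict, Optional, Tuple


def assign_allele(peptide: str,
                  rank_by_allele: Dict[str, float],
                  max_rank: float = 2.0,
                  min_len: int = 8,
                  max_len: int = 14) -> Tuple[Optional[str], Optional[float]]:
    """Return (allele, rank) for the best-ranked allele, or (None, rank) for a non-binder.

    A lower percentile rank means stronger predicted binding, so the allele
    with the smallest rank is the one with maximum predicted affinity.
    """
    if not rank_by_allele or not (min_len <= len(peptide) <= max_len):
        return (None, None)  # prediction was restricted to 8-14-mers
    allele, rank = min(rank_by_allele.items(), key=lambda item: item[1])
    if rank < max_rank:
        return (allele, rank)
    return (None, rank)


if __name__ == "__main__":
    # Made-up ranks for illustration only.
    ranks = {"HLA-A*03:01": 0.4, "HLA-B*07:02": 12.0}
    print(assign_allele("KRIPLPTKK", ranks))
```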

      DISCUSSION

      Examples of proteasomal-spliced peptides have been reported and some of them were shown to be immunogenic (
      • Ebstein F.
      • Textoris-Taube K.
      • Keller C.
      • Golnik R.
      • Vigneron N.
      • Van den Eynde B.J.
      • Schuler-Thurner B.
      • Schadendorf D.
      • Lorenz F.K.M.
      • Uckert W.
      • Urban S.
      • Lehmann A.
      • Albrecht-Koepke N.
      • Janek K.
      • Henklein P.
      • Niewienda A.
      • Kloetzel P.M.
      • Mishto M.
      Proteasomes generate spliced epitopes by two different mechanisms and as efficiently as non-spliced epitopes.
      ,
      • Hanada K.
      • Yewdell J.W.
      • Yang J.C.
      Immune recognition of a human renal cancer antigen through post-translational protein splicing.
      ,
      • Vigneron N.
      • Stroobant V.
      • Chapiro J.
      • Ooms A.
      • Degiovanni G.
      • Morel S.
      • van der Bruggen P.
      • Boon T.
      • Van den Eynde B.J.
      An antigenic peptide produced by peptide splicing in the proteasome.
      ,
      • Warren E.H.
      • Vigneron N.J.
      • Gavin M.A.
      • Coulie P.G.
      • Stroobant V.
      • Dalet A.
      • Tykodi S.S.
      • Xuereb S.M.
      • Mito J.K.
      • Riddell S.R.
      • Van den Eynde B.J.
      An antigen produced by splicing of noncontiguous peptides in the reverse order.
      ,
      • Dalet A.
      • Robbins P.F.
      • Stroobant V.
      • Vigneron N.
      • Li Y.F.
      • El-Gamil M.
      • Hanada K.
      • Yang J.C.
      • Rosenberg S.A.
      • Van den Eynde B.J.
      An antigenic peptide produced by reverse splicing and double asparagine deamidation.
      ,
      • Michaux A.
      • Larrieu P.
      • Stroobant V.
      • Fonteneau J.F.
      • Jotereau F.
      • Van den Eynde B.J.
      • Moreau-Aubry A.
      • Vigneron N.
      A spliced antigenic peptide comprising a single spliced amino acid is produced in the proteasome by reverse splicing of a longer peptide fragment followed by trimming.
      ,
      • Berkers C.R.
      • de Jong A.
      • Schuurman K.G.
      • Linnemann C.
      • Meiring H.D.
      • Janssen L.
      • Neefjes J.J.
      • Schumacher T.N.
      • Rodenko B.
      • Ovaa H.
      Definition of proteasomal peptide splicing rules for high-efficiency spliced peptide presentation by MHC class I molecules.
      ,
      • Berkers C.R.
      • de Jong A.
      • Schuurman K.G.
      • Linnemann C.
      • Geenevasen J.A.
      • Schumacher T.N.
      • Rodenko B.
      • Ovaa H.
      Peptide splicing in the proteasome creates a novel type of antigen with an isopeptide linkage.
      ,
      • Dalet A.
      • Vigneron N.
      • Stroobant V.
      • Hanada K.
      • Van den Eynde B.J.
      Splicing of distant peptide fragments occurs in the proteasome by transpeptidation and produces the spliced antigenic peptide derived from fibroblast growth factor-5.
      ,
      • Platteel A.C.M.
      • Liepe J.
      • Textoris-Taube K.
      • Keller C.
      • Henklein P.
      • Schalkwijk H.H.
      • Cardoso R.
      • Kloetzel P.M.
      • Mishto M.
      • Sijts A.J.A.M.
      Multi-level strategy for identifying proteasome-catalyzed spliced epitopes targeted by CD8(+) T cells during bacterial infection.
      ). A true estimation of the contribution of spliced peptides to the global immunopeptidome is critical in order to fundamentally understand the biological pathways involved. Consequently, advanced computational and experimental tools must be developed and benchmarked to facilitate their confident identification.
      Liepe et al. (
      • Liepe J.
      • Marino F.
      • Sidney J.
      • Jeko A.
      • Bunting D.E.
      • Sette A.
      • Kloetzel P.M.
      • Stumpf M.P.
      • Heck A.J.
      • Mishto M.
      A large fraction of HLA class I ligands are proteasome-generated spliced peptides.
      ) were the first to attempt to find PSPs by means of MS on a large scale. Their report concluded that about 30% of the HLA-Ip are produced by proteasomal splicing by transpeptidation of two noncontiguous fragments of a parental protein (cis-splicing). Our reanalysis of their results revealed that their approach led to the identification of PSP candidates that did not fit the consensus binding motifs, while the nonspliced UniProt HLA-Ip identified in the same experiment did. Additional parameters related to their MS/MS spectrum matches suggest that many of the spliced peptide matches reported in Liepe et al. are ambiguous and were ruled out when we used different search engines and included common PTMs or sequence variants in the search. We postulate that because of the huge search space of potential spliced peptides, the bioinformatics approach applied by Liepe et al. led to uncontrollable propagation of false positives. The effect of database size and the increased likelihood of false positive identifications in proteogenomics applications have been thoroughly reviewed in (
      • Nesvizhskii A.I.
      Proteogenomics: Concepts, applications and computational strategies.
      ), and these concepts are relevant here as well. Therefore, the true contribution of spliced peptides to the immunopeptidome has yet to be defined.
      In a typical peptidomics setting, we match MS/MS spectra against a large set of theoretical peptide spectra, most of which are not present in the sample. This endeavor produces two types of PSMs: true matches and false positives. False positives are very common, especially for spliced peptides, since these peptides can produce MS/MS spectra similar to those of UniProt peptides, with similar match scores. For example, the potential spliced peptide KRI-PLPTKK only differs from its UniProt counterpart RIKPLPTKK by a permutation of the first three AA. Because of the absence of fragmentation in the region of the first three AAs in this example, their order cannot be determined. Furthermore, when using a very large proteasomal spliced peptide database, there is an elevated chance that a potential spliced peptide will have a very similar spectrum to the UniProt peptide and produce a higher match score. Even if a spectrum has no match in the UniProt database (e.g. when it originates from a modified peptide, sequence variant, or contaminant not considered in the search), it may still match a spliced peptide with a score that is significant.
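      The KRI-PLPTKK versus RIKPLPTKK example can be checked numerically. The Python sketch below computes singly charged b- and y-ion masses for both sequences using standard monoisotopic residue masses; it is an illustration added here, not part of the original analysis, and the function names are hypothetical. The two peptides share every fragment ion except b1, b2, y7, and y8, which are exactly the ions that report on the order of the first three residues.

```python
from typing import List

# Standard monoisotopic residue masses (Da) for the residues used below.
MONO = {"K": 128.094963, "R": 156.101111, "I": 113.084064,
        "L": 113.084064, "P": 97.052764, "T": 101.047679}
WATER = 18.010565
PROTON = 1.007276


def b_ions(peptide: str) -> List[float]:
    """Singly charged b1..b(n-1) ion masses (N-terminal fragments)."""
    total, ions = 0.0, []
    for residue in peptide[:-1]:
        total += MONO[residue]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide: str) -> List[float]:
    """Singly charged y1..y(n-1) ion masses (C-terminal fragments)."""
    total, ions = 0.0, []
    for residue in reversed(peptide[1:]):
        total += MONO[residue]
        ions.append(total + WATER + PROTON)
    return ions


def same_masses(a: List[float], b: List[float], tol: float = 1e-6) -> bool:
    """True if both lists have equal length and pairwise equal masses."""
    return len(a) == len(b) and all(abs(x - y) < tol for x, y in zip(a, b))


if __name__ == "__main__":
    spliced, uniprot = "KRIPLPTKK", "RIKPLPTKK"
    print("b3..b8 identical:", same_masses(b_ions(spliced)[2:], b_ions(uniprot)[2:]))
    print("y1..y6 identical:", same_masses(y_ions(spliced)[:6], y_ions(uniprot)[:6]))
    print("b1..b2 identical:", same_masses(b_ions(spliced)[:2], b_ions(uniprot)[:2]))
    print("y7..y8 identical:", same_masses(y_ions(spliced)[6:], y_ions(uniprot)[6:]))
```

      If none of the four discriminating ions is observed in the spectrum, a search engine has no evidence to prefer one sequence over the other.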
      The error arising from multiple testing in MS/MS searches is controlled by using a decoy database to calculate the FDR (31, 32). One assumption behind this target-decoy approach is that the scores of the decoy peptides reflect the scores of wrongly assigned PSMs. When using decoys for spliced peptides, their similarity to the UniProt sequences may be lost, and one would have to carefully evaluate whether this assumption still holds. If it does not hold, the target-decoy approach might underestimate the FDR and lead to many false positives, especially for large spliced peptide databases.
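      The target-decoy estimate described above can be sketched as follows. This uses one common estimator (decoy PSMs divided by target PSMs above a score threshold) and hypothetical toy scores; it is not necessarily the exact procedure implemented by the search tools used in this study.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR as (#decoy PSMs) / (#target PSMs) at or above a score threshold.

    psms is a list of (score, is_decoy) tuples.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    # With no target PSMs, fall back to the decoy count so the estimate is never optimistic.
    return decoys / max(targets, 1)


def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR does not exceed max_fdr, or None."""
    passing = [score for score, _ in psms if estimate_fdr(psms, score) <= max_fdr]
    return min(passing) if passing else None


# Hypothetical toy data: six target PSMs and two decoy PSMs.
toy_psms = [(10, False), (9, False), (8, False), (7, False), (6.5, True),
            (6, False), (5, False), (4, True)]
print(score_cutoff(toy_psms, max_fdr=0.01))  # 7
```

      The estimate is only as good as the decoys: if wrongly assigned spliced-peptide PSMs tend to score higher than decoy PSMs, the decoy count in the numerator is too small and the reported FDR is too low.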
      Trans-splicing of fragments derived from two source proteins that happen to be present in the same proteasome complex at the same time is unlikely to happen on a large scale; hence, we focused our study on cis-splicing events. To overcome biases related to searching all possible cis-spliced peptides, we developed an alternative workflow based on de novo sequencing and subsequent verification with multiple search tools, including the most prevalent AA modifications and sequence variants detected by exome sequencing. We found that 1–3% of the high-quality PSMs originate from potential proteasome cis-spliced peptides. These peptides fitted the HLA consensus binding motifs and had good spectrum match properties. Given that our de novo sequencing approach finds about half of the peptides found by a UniProt sequence search, we estimate that the maximal proportion of spliced peptide candidates is 2–6%. This doubling is only a very rough estimate and does not mean that the number of spliced peptides would double as well. Our approach focuses on the high-quality spectra required for de novo sequencing. By including more low-quality spectra, we would increase not only the number of spliced peptides but also the ambiguity of the additional spliced peptide matches. However, MS/MS-based approaches cannot ultimately determine the creation mechanism of these peptides, and different sequence interpretations may also be possible. For example, a significant number of detected HLA-Ip originate from transcripts that do not fall into a UniProt protein-coding region (33), and these noncanonical peptides could be misinterpreted as PSPs. Other ambiguities may be due to post-translational or chemical peptide modifications not considered in the search. Therefore, we recommend considering the most prominent chemical or post-translational protein modifications in the MS/MS search. If these modifications are not known, open modification search tools (34, 35) could be applied. Overall, our results present an upper bound for the proportion of cis-spliced peptides, and the true contribution of such PSPs to the HLA-I ligandome may be much smaller. Extensive in vitro validation assays with purified proteasomes and controlled cellular assays are required to ensure that any of the proposed sequences are actually generated by splicing events in vivo.
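      The step from 1–3% to 2–6% in the paragraph above amounts to dividing the observed fraction by the relative sensitivity of de novo sequencing. A minimal sketch, assuming a relative sensitivity of 0.5 ("about half") and a hypothetical function name:

```python
def extrapolated_upper_bound(observed_fraction, relative_sensitivity=0.5):
    """Scale an observed fraction by the share of peptides the method recovers."""
    return observed_fraction / relative_sensitivity


low, high = (extrapolated_upper_bound(f) for f in (0.01, 0.03))
print(f"{low:.0%} to {high:.0%}")  # 2% to 6%
```

      As stated in the text, this is a rough upper bound and not a prediction that the number of spliced peptides doubles.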

      DATA AVAILABILITY

      MS raw files of HLA-Ip isolated from two melanoma tissues, Mel15 (16 raw files) and Mel16 (12 raw files) (10), the RA957 B cell line (four raw files) (11), and fibroblast (Fib) cells (four raw files) (2), included in datasets PXD004894, PXD005231 and PXD000394, respectively, were downloaded from the ProteomeXchange Consortium via the PRIDE partner repository (12). The raw files and MaxQuant output tables related to the analysis of synthetic peptides, used for the comparison of MS/MS annotations of endogenous HLA-Ip and their synthetic counterparts, have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD010793.

      REFERENCES

        1. Neefjes J., Jongsma M.L., Paul P., Bakke O. Towards a systems understanding of MHC class I and MHC class II antigen presentation. Nature Rev. Immunol. 2011; 11: 823-836
        2. Bassani-Sternberg M., Pletscher-Frankild S., Jensen L.J., Mann M. Mass spectrometry of human leukocyte antigen class I peptidomes reveals strong effects of protein abundance and turnover on antigen presentation. Mol. Cell. Proteomics. 2015; 14: 658-673
        3. Liepe J., Marino F., Sidney J., Jeko A., Bunting D.E., Sette A., Kloetzel P.M., Stumpf M.P., Heck A.J., Mishto M. A large fraction of HLA class I ligands are proteasome-generated spliced peptides. Science. 2016; 354: 354-358
        4. Ebstein F., Textoris-Taube K., Keller C., Golnik R., Vigneron N., Van den Eynde B.J., Schuler-Thurner B., Schadendorf D., Lorenz F.K.M., Uckert W., Urban S., Lehmann A., Albrecht-Koepke N., Janek K., Henklein P., Niewienda A., Kloetzel P.M., Mishto M. Proteasomes generate spliced epitopes by two different mechanisms and as efficiently as non-spliced epitopes. Sci. Rep. 2016; 6: 24032
        5. Hanada K., Yewdell J.W., Yang J.C. Immune recognition of a human renal cancer antigen through post-translational protein splicing. Nature. 2004; 427: 252-256
        6. Vigneron N., Stroobant V., Chapiro J., Ooms A., Degiovanni G., Morel S., van der Bruggen P., Boon T., Van den Eynde B.J. An antigenic peptide produced by peptide splicing in the proteasome. Science. 2004; 304: 587-590
        7. Warren E.H., Vigneron N.J., Gavin M.A., Coulie P.G., Stroobant V., Dalet A., Tykodi S.S., Xuereb S.M., Mito J.K., Riddell S.R., Van den Eynde B.J. An antigen produced by splicing of noncontiguous peptides in the reverse order. Science. 2006; 313: 1444-1447
        8. Dalet A., Robbins P.F., Stroobant V., Vigneron N., Li Y.F., El-Gamil M., Hanada K., Yang J.C., Rosenberg S.A., Van den Eynde B.J. An antigenic peptide produced by reverse splicing and double asparagine deamidation. Proc. Natl. Acad. Sci. U.S.A. 2011; 108: E323-E331
        9. Michaux A., Larrieu P., Stroobant V., Fonteneau J.F., Jotereau F., Van den Eynde B.J., Moreau-Aubry A., Vigneron N. A spliced antigenic peptide comprising a single spliced amino acid is produced in the proteasome by reverse splicing of a longer peptide fragment followed by trimming. J. Immunol. 2014; 192: 1962-1971
        10. Bassani-Sternberg M., Bräunlein E., Klar R., Engleitner T., Sinitcyn P., Audehm S., Straub M., Weber J., Slotta-Huspenina J., Specht K., Martignoni M.E., Werner A., Hein R., Busch D.H., Peschel C., Rad R., Cox J., Mann M., Krackhardt A.M. Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry. Nature Commun. 2016; 7: 13404
        11. Bassani-Sternberg M., Chong C., Guillaume P., Solleder M., Pak H., Gannon P.O., Kandalaft L.E., Coukos G., Gfeller D. Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity. PLoS Comput. Biol. 2017; 13: e1005725
        12. Vizcaíno J.A., Côté R.G., Csordas A., Dianes J.A., Fabregat A., Foster J.M., Griss J., Alpi E., Birim M., Contell J., O'Kelly G., Schoenegger A., Ovelleiro D., Pérez-Riverol Y., Reisinger F., Rios D., Wang R., Hermjakob H. The PRoteomics IDEntifications (PRIDE) database and associated tools: Status in 2013. Nucleic Acids Res. 2013; 41: D1063-D1069
        13. Ma B., Zhang K., Hendrie C., Liang C., Li M., Doherty-Kirby A., Lajoie G. PEAKS: Powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 2003; 17: 2337-2342
        14. Iseli C., Ambrosini G., Bucher P., Jongeneel C.V. Indexing strategies for rapid searches of short words in genome sequences. PLoS One. 2007; 2: e579
        15. Mishto M., Liepe J. Post-translational peptide splicing and T cell responses. Trends Immunol. 2017; 38: 904-915
        16. Berkers C.R., de Jong A., Schuurman K.G., Linnemann C., Meiring H.D., Janssen L., Neefjes J.J., Schumacher T.N., Rodenko B., Ovaa H. Definition of proteasomal peptide splicing rules for high-efficiency spliced peptide presentation by MHC class I molecules. J. Immunol. 2015; 195: 4085-4095
        17. Cox J., Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature Biotechnol. 2008; 26: 1367-1372
        18. Eng J.K., Jahan T.A., Hoopmann M.R. Comet: An open-source MS/MS sequence database search tool. Proteomics. 2013; 13: 22-24
        19. Stein S.E., Scott D.R. Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass Spectrom. 1994; 5: 859-866
        20. Horlacher O., Nikitin F., Alocci D., Mariethoz J., Müller M., Lisacek F. MzJava: An open source library for mass spectrometry data processing. J. Proteomics. 2015; 129: 63-70
        21. Nielsen M., Andreatta M. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets. Genome Med. 2016; 8: 33
        22. Andreatta M., Alvarez B., Nielsen M. GibbsCluster: Unsupervised clustering and alignment of peptide sequences. Nucleic Acids Res. 2017; 45: W458-W463
        23. Andreatta M., Lund O., Nielsen M. Simultaneous alignment and clustering of peptide data using a Gibbs sampling approach. Bioinformatics. 2013; 29: 8-14
        24. Bassani-Sternberg M., Gfeller D. Unsupervised HLA peptidome deconvolution improves ligand prediction accuracy and predicts cooperative effects in peptide-HLA interactions. J. Immunol. 2016; 197: 2492-2499
        25. Ma B., Johnson R. De novo sequencing and homology searching. Mol. Cell. Proteomics. 2012; 11: O111.014902
        26. Schaefer H., Chamrad D.C., Marcus K., Reidegeld K.A., Blüggel M., Meyer H.E. Tryptic transpeptidation products observed in proteome analysis by liquid chromatography-tandem mass spectrometry. Proteomics. 2005; 5: 846-852
        27. Berkers C.R., de Jong A., Schuurman K.G., Linnemann C., Geenevasen J.A., Schumacher T.N., Rodenko B., Ovaa H. Peptide splicing in the proteasome creates a novel type of antigen with an isopeptide linkage. J. Immunol. 2015; 195: 4075-4084
        28. Dalet A., Vigneron N., Stroobant V., Hanada K., Van den Eynde B.J. Splicing of distant peptide fragments occurs in the proteasome by transpeptidation and produces the spliced antigenic peptide derived from fibroblast growth factor-5. J. Immunol. 2010; 184: 3016-3024
        29. Platteel A.C.M., Liepe J., Textoris-Taube K., Keller C., Henklein P., Schalkwijk H.H., Cardoso R., Kloetzel P.M., Mishto M., Sijts A.J.A.M. Multi-level strategy for identifying proteasome-catalyzed spliced epitopes targeted by CD8(+) T cells during bacterial infection. Cell Rep. 2017; 20: 1242-1253
        30. Nesvizhskii A.I. Proteogenomics: Concepts, applications and computational strategies. Nat. Methods. 2014; 11: 1114-1125
        31. Benjamini Y., Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Royal Stat. Soc. Series B. 1995; 57: 289-300
        32. Choi H., Nesvizhskii A.I. False discovery rates and related statistical concepts in mass spectrometry-based proteomics. J. Proteome Res. 2008; 7: 47-50
        33. Pearson H., Daouda T., Granados D.P., Durette C., Bonneil E., Courcelles M., Rodenbrock A., Laverdure J.P., Côté C., Mader S., Lemieux S., Thibault P., Perreault C. MHC class I-associated peptides derive from selective regions of the human genome. J. Clin. Invest. 2016; 126: 4690-4701
        34. Horlacher O., Lisacek F., Müller M. Mining large scale tandem mass spectrometry data for protein modifications using spectral libraries. J. Proteome Res. 2016; 15: 721-731
        35. Kong A.T., Leprevost F.V., Avtonomov D.M., Mellacheruvu D., Nesvizhskii A.I. MSFragger: Ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods. 2017; 14: 513-520