Neutron-encoded Signatures Enable Product Ion Annotation From Tandem Mass Spectra*

  • Alicia L. Richards
    Affiliations
    Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706;

    Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706;
  • Catherine E. Vincent
    Affiliations
    Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706;

    Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706;
  • Adrian Guthals
    Affiliations
    Department of Computer Science and Engineering, University of California, San Diego, California 92093;
  • Christopher M. Rose
    Affiliations
    Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706;

    Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706;
  • Michael S. Westphall
    Affiliations
    Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706;
  • Nuno Bandeira
    Affiliations
    Department of Computer Science and Engineering, University of California, San Diego, California 92093;

    Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, California 92093;
  • Joshua J. Coon
    Correspondence
    To whom correspondence should be addressed: Dr. Joshua J. Coon, 425 Henry Mall, Madison, WI 53706. Tel.: 608-263-1718.
    Affiliations
    Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706;

    Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706;

    Department of Biomolecular Chemistry, University of Wisconsin, Madison, Wisconsin 53706
  • Author Footnotes
    * This work was supported by National Institutes of Health Grant GM080148 to J.J.C. A.L.R. was supported by an NHGRI training grant to the Genomic Sciences Training Program (5T32HG002760). C.E.V. was supported by an NLM training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM T15LM007359). C.M.R. was funded by an NSF Graduate Research Fellowship and NIH Traineeship (T32GM008505).
    This article contains supplemental material.
      We report the use of neutron-encoded (NeuCode) stable isotope labeling of amino acids in cell culture for the purpose of C-terminal product ion annotation. Two NeuCode labeling isotopologues of lysine, ¹³C₆¹⁵N₂ and ²H₈, which differ by 36 mDa, were metabolically embedded in a sample proteome, and the resultant labeled proteins were combined, digested, and analyzed via liquid chromatography and mass spectrometry. With MS/MS scan resolving powers of ∼50,000 or higher, product ions containing the C terminus (i.e. lysine) appear as a doublet spaced by exactly 36 mDa, whereas N-terminal fragments exist as a single m/z peak. Through theory and experiment, we demonstrate that over 90% of all y-type product ions have detectable doublets. We report on an algorithm that can extract these neutron signatures with high sensitivity and specificity. In other words, of 15,503 y-type product ion peaks, the y-type ion identification algorithm correctly identified 14,552 (93.2%) based on detection of the NeuCode doublet; 6.8% were misclassified (i.e. other ion types that were assigned as y-type products). Searching NeuCode labeled yeast with PepNovo+ resulted in a 34% increase in correct de novo identifications relative to searching through MS/MS only. We use this tool to simplify spectra prior to database searching, to sort unmatched tandem mass spectra for spectral richness, for correlation of co-fragmented ions to their parent precursor, and for de novo sequence identification.
      The ability to make de novo sequence identifications directly from tandem mass spectra has long been a holy grail of the proteomic community. Such a capability would wean the field from its reliance upon sequenced genome databases. Even for organisms with fully annotated genomes, events such as single nucleotide polymorphisms, alternative splicing, gene fusion, and a host of other genomic transformations can result in altered proteomes. These alterations can vary from cell to cell and individual to individual. Thus, one could argue that the most valuable proteomic information, the individual and cellular proteome variation from the genome, remains elusive (
      • Seidler J.
      • Zinn N.
      • Boehm M.E.
      • Lehmann W.D.
      De novo sequencing of peptides by MS/MS.
      ). This problem has received considerable attention; nevertheless, correlating spectrum to sequence de novo in a large-scale, automated fashion remains difficult (
      • Liska A.J.
      • Shevchenko A.
      Combining mass spectrometry with database interrogation strategies in proteomics.
      ,
      • Mo L.
      • Dutta D.
      • Wan Y.
      • Chen T.
      MSNovo: a dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry.
      ,
      • Pevtsov S.
      • Fedulova I.
      • Mirzaei H.
      • Buck C.
      • Zhang X.
      Performance evaluation of existing de novo sequencing algorithms.
      ,
      • Pitzer E.
      • Masselot A.
      • Colinge J.
      Assessing peptide de novo sequencing algorithms performance on large and diverse data sets.
      ,
      • Taylor J.A.
      • Johnson R.S.
      Implementation and uses of automated de novo peptide sequencing by tandem mass spectrometry.
      ). Improvements in mass accuracy have helped, but routine, reliable de novo sequencing without database assistance is still not standard practice (
      • Horn D.M.
      • Zubarev R.A.
      • McLafferty F.W.
      Automated de novo sequencing of proteins by tandem high-resolution mass spectrometry.
      ,
      • Frank A.M.
      • Savitski M.M.
      • Nielsen M.L.
      • Zubarev R.A.
      • Pevzner P.A.
      De novo peptide sequencing and identification with precision mass spectrometry.
      ,
      • Pan C.
      • Park B.H.
      • McDonald W.H.
      • Carey P.A.
      • Banfield J.F.
      • VerBerkmoes N.C.
      • Hettich R.L.
      • Samatova N.F.
      A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry.
      ,
      • Chi H.
      • Sun R.X.
      • Yang B.
      • Song C.Q.
      • Wang L.H.
      • Liu C.
      • Fu Y.
      • Yuan Z.F.
      • Wang H.P.
      • He S.M.
      • Dong M.Q.
      pNovo: de novo peptide sequencing and identification using HCD spectra.
      ).
      A primary means to facilitate de novo spectral interpretation is the simple annotation of m/z peaks in tandem mass spectra as either N- or C-terminal. We and others have investigated this seemingly simple first step. Real-world spectra, however, are complex. Difficulties often arise in determining the charge state of the fragment or in differentiating between fragment ions and peaks arising from neutral loss, internal fragmentation, or spectral noise, both electronic and chemical. Several strategies have focused on product ion annotation. These approaches have included manipulation of the N-terminus basicity combined with electron transfer dissociation (ETD)
      1 The abbreviations used are: ETD, electron transfer dissociation; FDR, false discovery rate; HCD, high-energy collision dissociation; MS, mass spectrometry; SILAC, stable isotope labeling with amino acids in cell culture.
      (
      • Taouatas N.
      • Drugan M.M.
      • Heck A.J.R.
      • Mohammed S.
      Straightforward ladder sequencing of peptides using a Lys-N metalloendopeptidase.
      ,
      • Altelaar A.F.
      • Navarro D.
      • Boekhorst J.
      • van Breukelen B.
      • Snel B.
      • Mohammed S.
      • Heck A.J.
      Database independent proteomics analysis of the ostrich and human proteome.
      ,
      • van Breukelen B.
      • Georgiou A.
      • Drugan M.M.
      • Taouatas N.
      • Mohammed S.
      • Heck A.J.
      LysNDeNovo: an algorithm enabling de novo sequencing of Lys-N generated peptides fragmented by electron transfer dissociation.
      ). This approach can yield mostly N-terminal fragments for peptides having only two charges. However, it requires both ETD and the protease LysN. Other methods have used differential labeling of N- and C-terminal peptides to shift either one or the other product ion series, by either metabolic or chemical means (
      • Gu S.
      • Pan S.Q.
      • Bradbury E.M.
      • Chen X.
      Use of deuterium-labeled lysine for efficient protein identification and peptide de novo sequencing.
      ,
      • Noga M.J.
      • Asperger A.
      • Silberring J.
      N-terminal H-3/D-3-acetylation for improved high-throughput peptide sequencing by matrix-assisted laser desorption/ionization mass spectrometry with a time-of-flight/time-of-flight analyzer.
      ,
      • Hennrich M.L.
      • Mohammed S.
      • Altelaar A.F.M.
      • Heck A.J.R.
      Dimethyl isotope labeling assisted de novo peptide sequencing.
      ,
      • Munchbach M.
      • Quadroni M.
      • Miotto G.
      • James P.
      Quantitation and facilitated de novo sequencing of proteins by isotopic N-terminal labeling of peptides with a fragmentation-directing moiety.
      ,
      • Madsen J.A.
      • Brodbelt J.S.
      Simplifying fragmentation patterns of multiply charged peptides by N-terminal derivatization and electron transfer collision activated dissociation.
      ). Metabolic incorporation of amino acids is an efficient method of introducing distinctive labels that eliminates in vitro labeling, but this method requires that the sample be amenable to cell culture (
      • Oda Y.
      • Huang K.
      • Cross F.R.
      • Cowburn D.
      • Chait B.T.
      Accurate quantitation of protein expression and site-specific phosphorylation.
      ,
      • Ong S.E.
      • Blagoev B.
      • Kratchmarova I.
      • Kristensen D.B.
      • Steen H.
      • Pandey A.
      • Mann M.
      Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics.
      ). Additionally, it may be difficult to achieve complete labeling in complex systems. Several other approaches used to introduce heavy isotopes onto one terminus have been investigated, including trypsin digestion in 18O water (
      • Schnolzer M.
      • Jedrzejewski P.
      • Lehmann W.D.
      Protease-catalyzed incorporation of O-18 into peptide fragments and its application for protein sequencing by electrospray and matrix-assisted laser desorption/ionization mass spectrometry.
      ,
      • Shevchenko A.
      • Chernushevich I.
      • Ens W.
      • Standing K.G.
      • Thomson B.
      • Wilm M.
      • Mann M.
      Rapid “de novo” peptide sequencing by a combination of nanoelectrospray, isotopic labeling and a quadrupole/time-of-flight mass spectrometer.
      ,
      • Uttenweiler-Joseph S.
      • Neubauer G.
      • Christoforidis A.
      • Zerial M.
      • Wilm M.
      Automated de novo sequencing of proteins using the differential scanning technique.
      ), differential isotopic esterification (
      • Goodlett D.R.
      • Keller A.
      • Watts J.D.
      • Newitt R.
      • Yi E.C.
      • Purvine S.
      • Eng J.K.
      • von Haller P.
      • Aebersold R.
      • Kolker E.
      Differential stable isotope labeling of peptides for quantitation and de novo sequence derivation.
      ,
      • Goodlett D.R.
      • Yi E.C.
      Stable isotopic labeling and mass spectrometry as a means to determine differences in protein expression.
      ), derivatization of the C-terminal carboxylate by p-bromophenethylamine (
      • Frank A.M.
      • Savitski M.M.
      • Nielsen M.L.
      • Zubarev R.A.
      • Pevzner P.A.
      De novo peptide sequencing and identification with precision mass spectrometry.
      ,
      • Kim J.-S.
      • Shin M.
      • Song J.-S.
      • An S.
      • Kim H.-J.
      C-terminal de novo sequencing of peptides using oxazolone-based derivatization with bromine signature.
      ), N-terminal derivatization with sulfonic acid groups (
      • Keough T.
      • Youngquist R.S.
      • Lacey M.P.
      Sulfonic acid derivatives for peptide sequencing.
      ,
      • Lee Y.H.
      • Han H.
      • Chang S.B.
      • Lee S.W.
      Isotope-coded N-terminal sulfonation of peptides allows quantitative proteomic analysis with increased de novo peptide sequencing capability.
      ), and formaldehyde labeling via reductive amination (
      • Hsu J.L.
      • Huang S.Y.
      • Chow N.H.
      • Chen S.H.
      Stable-isotope dimethyl labeling for quantitative proteomics.
      ,
      • Ji C.
      • Li L.
      Quantitative proteome analysis using differential stable isotopic labeling and microbore LC-MALDI MS and MS/MS.
      ). These chemical modifications are introduced after cell lysis, often immediately prior to analysis. Although chemical labeling strategies can be used with a variety of samples, difficulties can arise from differences in labeling efficiency between samples, and a clean-up step is often required following labeling, which may lead to sample loss. Regardless of the labeling method, the two precursors in this regime must be isolated, fragmented, and analyzed, either together or separately. Real-time recognition and selection of the broadly spaced doublet are also necessary. These requirements have limited the utility of these approaches. Our own laboratory discovered that the c- and z-type product ions generated from either electron capture dissociation or ETD have distinct chemical formulae and therefore can always be distinguished based on accurate mass alone (
      • Hubler S.L.
      • Jue A.
      • Keith J.
      • McAlister G.C.
      • Craciun G.
      • Coon J.J.
      Valence parity renders z(·)-type ions chemically distinct.
      ). The problem with this approach is that extremely high mass accuracy (<500 ppb) is required in order to distinguish these product ion types above ∼600 Da in mass. Thus, the majority of the product ions within a spectrum cannot be readily mapped to either terminus with high confidence.
      Despite these difficulties, we assert that robust de novo sequencing methodology would benefit greatly from a simple method that could be used to distinguish N- and C-terminal product ions with high accuracy and precision. Ideally, the approach would work regardless of the choice of proteolytic enzyme or dissociation method. Recently, we described a new technology for protein quantification called neutron encoding (NeuCode) (
      • Hebert A.S.
      • Merrill A.E.
      • Bailey D.J.
      • Still A.J.
      • Westphall M.S.
      • Strieter E.R.
      • Pagliarini D.J.
      • Coon J.J.
      Neutron-encoded mass signatures for multiplexed proteome quantification.
      ). NeuCode embeds millidalton mass differences into peptides and proteins by exploiting the mass defect induced by differences in the nuclear binding energies of the various stable isotopes of common elements such as C, N, H, and O. For example, consider a lysine isotopologue carrying eight additional neutrons (nominally +8 Da). One way to synthesize this amino acid is to incorporate six ¹³C atoms and two ¹⁵N atoms (+8.0142 Da). Another isotopologue can be constructed by incorporating eight ²H atoms (+8.0502 Da). These two isotopologues differ by only 36 mDa; peptide precursors containing these two amino acids appear as a single, unresolved precursor m/z peak at a mass resolving power of less than ∼100,000. However, under high resolving powers (i.e. greater than ∼100,000 at m/z 400), this doublet is resolved. We first developed this NeuCode concept in the context of metabolic labeling, akin to stable isotope labeling with amino acids in cell culture (SILAC), except that instead of the precursor partners being separated by 4 to 8 Da, they are separated by only 6 to 40 mDa. For quantitative purposes, NeuCode promises to deliver ultraplexed SILAC (>12) without increasing spectral complexity.
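      The two label masses quoted above can be checked from standard isotope masses. The short sketch below is illustrative only; the function name `label_shift` and the table layout are our own choices, and the isotope masses are standard atomic-mass table values rather than numbers taken from this article.

```python
# Monoisotopic masses in Da (standard atomic-mass table values).
MASS = {
    "1H": 1.00782503207, "2H": 2.0141017778,
    "12C": 12.0, "13C": 13.0033548378,
    "14N": 14.0030740048, "15N": 15.0001088982,
}

def label_shift(substitutions):
    """Mass added by swapping light isotopes for heavy ones.

    substitutions: list of (light, heavy, count) tuples.
    """
    return sum(n * (MASS[heavy] - MASS[light]) for light, heavy, n in substitutions)

# Lysine with six 13C and two 15N versus lysine with eight 2H.
shift_13c6_15n2 = label_shift([("12C", "13C", 6), ("14N", "15N", 2)])
shift_2h8 = label_shift([("1H", "2H", 8)])
spacing_mda = (shift_2h8 - shift_13c6_15n2) * 1000.0

print(f"13C6 15N2 lysine: +{shift_13c6_15n2:.4f} Da")
print(f"2H8 lysine:       +{shift_2h8:.4f} Da")
print(f"NeuCode spacing:  {spacing_mda:.1f} mDa")
```

      Both isotopologues add eight neutrons, yet the deuterated form is heavier by about 36 mDa because the per-neutron mass increment differs between elements.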
      We reasoned that the isotopologues of Lys that permit NeuCode SILAC would generate a distinct fingerprint on C-terminal product ions. Specifically, peptides that have been labeled with NeuCode SILAC and digested with LysC uniformly contain Lys at the C terminus. Upon MS/MS, all C-terminal product ions should present as doublets (with duplex NeuCode), whereas N-terminal products will be detected as a single m/z peak. The very close m/z spacing of the NeuCode SILAC partners will ensure that each partner is always co-isolated and that the signatures are visible only upon high-resolving-power mass analysis. Here we investigate the combination of NeuCode SILAC and high-resolving-power MS/MS analysis to allow the straightforward identification of C-terminal product ions.
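      To make the expected fingerprint concrete, the following sketch computes b- and y-type fragment m/z values for a LysC peptide carrying either lysine isotopologue. It is an illustration under stated assumptions: the peptide `GALEK` is hypothetical, the function `fragment_mz` is our own, and the residue masses are standard monoisotopic values for the few residues used.

```python
# Monoisotopic residue masses (Da) for the residues used in this example.
RESIDUE = {"G": 57.02146, "A": 71.03711, "L": 113.08406,
           "E": 129.04259, "K": 128.09496}
WATER, PROTON = 18.010565, 1.007276
HEAVY_K = {"13C6_15N2": 8.0142, "2H8": 8.0502}  # label shifts quoted in the text

def fragment_mz(peptide, k_shift, z=1):
    """Return (b_ions, y_ions) m/z lists for a peptide with labeled lysine."""
    masses = [RESIDUE[aa] + (k_shift if aa == "K" else 0.0) for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append((sum(masses[:i]) + z * PROTON) / z)          # N-terminal
        y_ions.append((sum(masses[i:]) + WATER + z * PROTON) / z)  # C-terminal
    return b_ions, y_ions

peptide = "GALEK"  # hypothetical LysC peptide: a single Lys at the C terminus
b_light, y_light = fragment_mz(peptide, HEAVY_K["13C6_15N2"])
b_heavy, y_heavy = fragment_mz(peptide, HEAVY_K["2H8"])

for n, (bl, bh, yl, yh) in enumerate(zip(b_light, b_heavy, y_light, y_heavy), 1):
    print(f"b{n}: {bl:.4f} / {bh:.4f}    y{len(peptide) - n}: {yl:.4f} / {yh:.4f}")
```

      Every b-type ion has the same m/z in both channels, whereas every y-type ion is split by 36 mDa (divided by the charge for multiply charged fragments).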

       Sample Preparation

      Saccharomyces cerevisiae strain BY4741 Lys1Δ was grown in defined synthetic complete (SC, Sunrise Science, San Diego, CA) drop-out media with either heavy 6C13/2N15 lysine (+8.0142 Da, Cambridge Isotopes, Tewksbury, MA) or heavy 8D lysine (+8.0502 Da, Cambridge Isotopes). Cells were propagated for a minimum of 10 doublings. At mid-log phase, cells were harvested via centrifugation at 3,000 × g for 3 min and then washed three times with chilled double distilled H2O. Cell pellets were resuspended in 5 ml lysis buffer (50 mM Tris pH 8, 8 M urea, 75 mM sodium chloride, 100 mM sodium butyrate, 1 mM sodium orthovanadate, protease and phosphatase inhibitor tablet), and protein was extracted via glass bead milling (Retsch, Haan, Germany). Protein concentration was measured via BCA (Pierce). Cysteines in the yeast lysate were reduced with 5 mM dithiothreitol at ambient temperature for 30 min, alkylated with 15 mM iodoacetamide in the dark at ambient temperature for 30 min, and then quenched with 5 mM dithiothreitol. 50 mM Tris (pH 8.0) was used to dilute the urea concentration to 4 M. Proteins were digested with LysC (1:50 enzyme:protein ratio) at ambient temperature for 16 h. The digestion was quenched with TFA and desalted with a tC18 Sep-Pak (Waters, Etten-Leur, The Netherlands). Samples were prepared by mixing 6C13/2N15 (+8.0142 Da) and 8D (+8.0502 Da) labeled peptides in 1:1 ratios by mass. For strong cation exchange fractionation, peptides were dissolved in 400 μl of strong cation exchange buffer A (5 mM KH2PO4 and 30% acetonitrile; pH 2.65) and injected onto a polysulfoethylaspartamide column (9.4 mm × 200 mm; PolyLC) attached to a Surveyor LC quaternary pump (Thermo Electron, West Chester, PA) operating at 3 ml/min. Peptides were detected by photodiode array detector (Thermo Electron, West Chester, PA). 
The following gradient was used: 0–2 min at 100% buffer A, 2–5 min at 0%–15% buffer B (5 mM KH2PO4, 30% acetonitrile, and 350 mM KCl (pH 2.65)), and 5–35 min at 15%–100% buffer B. Buffer B was held at 100% for 10 min. Finally, the column was washed with buffer C (50 mM KH2PO4 and 500 mM KCl (pH 7.5)) and water before recalibration. Fractions were collected by hand every 2 to 3 min starting at 10 min into the gradient and were lyophilized and desalted with a tC18 Sep-Pak (Waters).
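      For readers who want to reproduce the salt gradient, the program above can be written as a piecewise function. This is a minimal sketch that assumes each ramp is linear (the text gives only the end points); `percent_b` and `kcl_mm` are hypothetical helper names.

```python
# Gradient segments: (start_min, end_min, start_%B, end_%B).
# Linear ramps between the stated end points are an assumption.
SEGMENTS = [
    (0.0, 2.0, 0.0, 0.0),        # 100% buffer A
    (2.0, 5.0, 0.0, 15.0),       # 0% to 15% buffer B
    (5.0, 35.0, 15.0, 100.0),    # 15% to 100% buffer B
    (35.0, 45.0, 100.0, 100.0),  # hold at 100% buffer B for 10 min
]

def percent_b(t_min):
    """Percent buffer B at time t_min (minutes into the gradient)."""
    for t0, t1, b0, b1 in SEGMENTS:
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the 0-45 min gradient")

def kcl_mm(t_min, kcl_in_b_mm=350.0):
    """KCl concentration (mM) delivered at t_min; buffer A contains no KCl."""
    return kcl_in_b_mm * percent_b(t_min) / 100.0

for t in (1, 3.5, 10, 20, 40):
    print(f"{t:>5} min: {percent_b(t):5.1f}% B, {kcl_mm(t):6.1f} mM KCl")
```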

       LC-MS/MS

      Samples were loaded onto a 15-cm-long, 75-μm capillary column packed with 5 μm Magic C18 (Michrom, Auburn, CA) particles in mobile phase A (0.2% formic acid in water). Peptides were eluted with mobile phase B (0.2% formic acid in acetonitrile) over a 120-min gradient at a flow rate of 300 nl/min. Eluted peptides were analyzed by an Orbitrap Elite mass spectrometer. For the nonfractionated samples, mass spectrometer instrument methods comprised one MS1 scan followed by data-dependent MS2 scans of the five most intense precursors. A survey MS1 scan was performed by the Orbitrap at 30,000 resolving power to identify precursors to sample for tandem mass spectrometry, and this was followed by an additional MS1 scan at 480,000 resolving power (at m/z 400; actual mass resolving power of 470,700). Data-dependent tandem mass spectrometry was performed via beam-type collisional activated dissociation (HCD) in the Orbitrap at a resolving power of 15,000, 60,000, 120,000, or 240,000 and a collision energy of 30. Preview mode was enabled, and precursors of unknown charge or with a charge of +1 were excluded from MS2 sampling. For experiments comparing the duty cycle and resolving power required in order to distinguish y-ion doublets, MS1 and MS2 target ion accumulation values were set to 5 × 10⁵ and 5 × 10⁴, respectively. For all other experiments, MS1 target accumulation values were set to 1 × 10⁶ and MS2 accumulation values were set to 4 × 10⁵. Dynamic exclusion was set to 30 s for −0.55 m/z and +2.55 m/z of selected precursors. For ETD analysis, data-dependent top-five mass spectrometry was performed at a resolving power of 240,000 (m/z 400; actual MS2 mass resolving power of 271,000) (
      • McAlister G.C.
      • Phanstiel D.
      • Good D.M.
      • Berggren W.T.
      • Coon J.J.
      Implementation of electron-transfer dissociation on a hybrid linear ion trap-orbitrap mass spectrometer.
      ). ETD accumulation values were set to 1 × 10⁶ for MS1 target accumulation and 4 × 10⁵ for MS2 target accumulation. The fluoranthene reaction time was set to 100 ms. For the strong cation exchange fractions, data-dependent tandem mass spectrometry was performed via HCD at a resolving power of either 60,000 or 120,000 and a collision energy of 30. Preview mode was enabled, and precursors of unknown charge or with a charge of +1 were excluded from MS2 sampling. MS1 targets were set to 1 × 10⁶, and MS2 accumulation values were set to 4 × 10⁵. Dynamic exclusion was set to 45 s for −0.55 m/z and +2.55 m/z of selected precursors. Analysis by use of a wide isolation window was performed on an Orbitrap Fusion. MS1 analysis was performed at 450,000 resolving power (m/z 200), and MS2 analysis was performed at 120,000 resolving power (m/z 400). Data-dependent top-N mass spectrometry was performed, with precursors selected from sequential 25-Da windows. HCD was performed twice on the same precursor, first by use of a quadrupole isolation width of 0.7 m/z for peptide identification, and then using 25 m/z quadrupole isolation. Fragment ions were analyzed in the Orbitrap at a mass resolving power of 120,000 (m/z 400). MS1 and MS2 target accumulation values were set to 2 × 10⁵ and 5 × 10⁴, respectively.

       Data Analysis

      Thermo .raw files were converted to searchable DTA text files using the Coon OMSSA Proteomic Analysis Software Suite (COMPASS) (
      • Wenger C.D.
      • Phanstiel D.H.
      • Lee M.V.
      • Bailey D.J.
      • Coon J.J.
      COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA.
      ). DTA files containing exclusively y-ions were generated using an in-house algorithm. DTA files were searched against the UniProt yeast database (version 132) with Lys-C specificity using the Open Mass Spectrometry Search Algorithm (OMSSA), version 2.1.9 (
      • Geer L.Y.
      • Markey S.P.
      • Kowalak J.A.
      • Wagner L.
      • Xu M.
      • Maynard D.M.
      • Yang X.
      • Shi W.
      • Bryant S.H.
      Open mass spectrometry search algorithm.
      ). Methionine oxidation was searched as a variable modification. Cysteine carbamidomethylation and the mass shift imparted by the lysine isotopologues were searched as fixed modifications. For MS2 scans performed at a resolving power of 60,000, 120,000, or 240,000, a shift of +8.0142, representing the mass shift of the ¹³C₆¹⁵N₂ isotopologue, was searched. For MS2 scans performed at 15,000 resolving power, the average shift of the ¹³C₆¹⁵N₂ and ²H₈ isotopologues (+8.0322) was searched. For all analyses, the precursor mass was obtained from the 480,000 MS scan. The precursor mass tolerance was defined as 50 ppm, and the fragment ion mass tolerance was set to 0.01 Da. A histogram of precursor mass error at different search tolerances is presented in supplemental Fig. S1. Using the COMPASS software suite, obtained search results were filtered to 1% FDR based on E-values. y-ion doublets were extracted from raw files using an in-house algorithm explained in the supplemental information. Briefly, an ensemble of three different machine learning models was used to score each MS/MS spectral peak for C-terminal product ion prediction. To train our ensemble learner to correctly distinguish C-terminal product ion peaks from N-terminal product ion peaks and noise peaks within our experimental MS/MS spectra, we generated a representative training set of spectral data. Instances used for training and test sets were peaks acquired only from MS/MS spectra associated with a peptide identification. Peaks with a signal-to-noise value of less than 5 were not used. Feature information for each training/testing instance was extracted from raw spectral data. 
Seven MS/MS spectral features were selected to generate training and test set data: (1) “has doublet” (evaluated as “true” only if a spectral peak could be found at the predicted m/z of the peak's “heavy” partner), (2) “signal-to-noise” (discretized using a scale of 1–5 based on the peak's signal-to-noise value), (3) “is isotope,” (4) “is neutral loss,” (5) “number of isotopes,” (6) “number of doublet isotopes,” and (7) “has neutral loss.”
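      As an illustration of the first feature, "has doublet," the sketch below looks for a partner peak at the m/z predicted for the heavy isotopologue. It is not the in-house algorithm (which is described in the supplemental information); the function name, the example peak list, and the 3-mDa matching tolerance are all assumptions made for this example.

```python
from bisect import bisect_left

SPACING_DA = 0.0360  # mass difference between the two lysine isotopologues

def has_doublet(sorted_mz, mz, z, tol_da=0.003):
    """True if a peak lies near mz + SPACING_DA / z (the predicted heavy partner).

    sorted_mz must be sorted in ascending order; tol_da is an assumed tolerance.
    """
    target = mz + SPACING_DA / z
    i = bisect_left(sorted_mz, target - tol_da)
    return i < len(sorted_mz) and sorted_mz[i] <= target + tol_da

# Hypothetical centroided peak list (m/z values only).
peaks = [300.1500, 445.2500, 445.2860, 601.3300, 601.3480]

print(has_doublet(peaks, 445.2500, z=1))  # partner expected at 445.2860
print(has_doublet(peaks, 300.1500, z=1))  # no partner in the list
print(has_doublet(peaks, 601.3300, z=2))  # doubly charged: spacing is 18 mDa
```

      Because the partner position depends on the fragment charge, a real implementation would need the charge state of each peak, for example from its isotopic envelope.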
      To evaluate NeuCode SILAC labeling for automated de novo sequencing, PepNovo+ (8) was benchmarked on y-ion predicted spectra. First, a set of identified spectra from 13,832 unique peptides (>7,400 per precursor charge 2–3) was produced to train PepNovo+ so it could learn features such as the relative peak height ranks of b/y-ions and the probability of noise at each mass interval. These training spectra were acquired from the 11 NeuCode yeast strong cation exchange fractions prepared as described above. Thermo .raw files were converted into mzXML format using ProteoWizard v2.2.2828 (with peak-picking turned on) and identified by MS-GF+ v9358 (
      • Kim S.
      • Mischerikow N.
      • Bandeira N.
      • Navarro J.D.
      • Wich L.
      • Mohammed S.
      • Heck A.J.
      • Pevzner P.A.
      The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search.
      ) at a 1% spectrum-level FDR against the UniProt yeast database (plus isoforms), v20110729. A fixed modification of K+8.0142 was imposed along with variable modifications of oxidized Met and deamidated Asn/Gln. All MS/MS scans were searched with a 50-ppm precursor mass tolerance, the high-accuracy LTQ instrument setting, the HCD fragmentation setting, and one allowed missed Lys-C cleavage.
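      The 50-ppm precursor tolerance is a relative window, so its absolute width grows with m/z. A minimal sketch of the conversion follows; the helper names are hypothetical, and m/z 757 is used only because a precursor near that value is discussed later in the text.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(observed_mz, theoretical_mz, tol_ppm=50.0):
    """True if the observed m/z is within tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

half_window_da = 757.0 * 50.0 / 1e6  # half-width of a 50-ppm window at m/z 757
print(f"50 ppm at m/z 757 = +/-{half_window_da:.4f} m/z")
print(within_tolerance(757.03, 757.0))  # about 40 ppm
print(within_tolerance(757.05, 757.0))  # about 66 ppm
```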
      Thermo .raw files were also converted into DTA spectra as before, except the in-house algorithm for selecting y-ion doublets was slightly altered to boost the peak height of predicted y-ions above that of other peaks (the cumulative peak height was equal to the sum of the monoisotopic doublet peaks, all isotopic doublet peaks, and two times the peak height of the base peak) and to convert their m/z to charge one. Remaining peaks not predicted to be y-ions were converted to charge one by a previously described MS/MS deconvolution tool (
      • Guthals A.
      • Clauser K.R.
      • Bandeira N.
      Shotgun protein sequencing with meta-contig assembly.
      ). Deconvoluted DTA spectra that originated from identified MS/MS scans were then paired with the MS-GF+ peptide IDs and passed to PepNovo+ for training. The resulting PepNovo+ scoring model lacked the rank-boosting component (
      • Frank A.M.
      A ranking-based scoring function for peptide-spectrum matches.
      ), which requires identified spectra from >100,000 unique peptides per precursor charge state and extensive modification of the PepNovo+ source code to train. Still, the model was sufficient to perform de novo peptide sequencing on the y-ion predicted spectra. PepNovo+ was also run on the raw MS/MS scans (mzXML spectra converted to MGF with all MS/MS peaks converted to charge one) by use of a previously trained HCD scoring model that also lacks the rank-boosting component (
      • Guthals A.
      • Clauser K.R.
      • Frank A.M.
      • Bandeira N.
      Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides.
      ). The following PepNovo+ parameters were set at all stages of training and benchmarking: fixed modification of K+8.0142; variable modifications of oxidized Met and deamidated Asn; 0.01-Da fragment mass tolerance; use of spectrum precursor charge; and use of spectrum precursor m/z.
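      The two spectrum transformations applied to predicted y-ions before PepNovo+ training (conversion to charge one and peak height boosting) can be sketched as follows. The function names and example intensities are hypothetical; only the boosting rule itself (sum of the monoisotopic doublet peaks, all isotopic doublet peaks, and twice the base peak height) is taken from the text.

```python
PROTON = 1.007276  # Da

def to_charge_one(mz, z):
    """m/z the same ion would have if it carried a single proton."""
    return mz * z - (z - 1) * PROTON

def boosted_height(mono_doublet, isotope_doublets, base_peak):
    """Cumulative height assigned to a predicted y-ion.

    mono_doublet: heights of the two monoisotopic doublet peaks
    isotope_doublets: heights of all isotopic doublet peaks
    base_peak: height of the most intense peak in the spectrum
    """
    return sum(mono_doublet) + sum(isotope_doublets) + 2.0 * base_peak

print(f"{to_charge_one(400.5, 2):.6f}")            # doubly charged ion at m/z 400.5
print(boosted_height([10.0, 9.0], [4.0, 3.0], 50.0))
```

      Adding twice the base peak height guarantees that every predicted y-ion ranks above all unboosted peaks in the same spectrum.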

      RESULTS

       Theoretical Considerations

      To test our hypothesis that NeuCode SILAC labeling will permit the identification of C-terminal product ions, we sought to determine the MS/MS resolving power requirements. First, we examined the charge (z), mass (m), and m/z of all detected y-type ions from a library of 19,521 y-type ions extracted from 2,392 tandem mass spectra. We then determined the percentage of y-type ions that are resolved (full width at half-maximum) when labeled with lysine isotopologues differing by 36 mDa (Fig. 1). This calculation (see the supplemental information) takes into account the diversity of product ion m, z, and m/z that is typically observed in a shotgun experiment. These data demonstrate that at a resolving power of 60,000, ∼90% of detected y-type ions were resolvable, and hence identifiable. At resolving powers above 100,000, virtually all NeuCode product ion doublets were detectable. Today it is routine to collect MS/MS spectra at resolving powers between 15,000 and 30,000 using both TOF and Orbitrap mass analyzers. Cutting-edge TOF analyzers are capable of 60,000 resolving power (
      • Satoh T.
      • Sato T.
      • Tamura J.
      Development of a high-performance MALDI-TOF mass spectrometer utilizing a spiral ion trajectory.
      ,
      • Klitzke C.F.
      • Corilo Y.E.
      • Siek K.
      • Binkley J.
      • Patrick J.
      • Eberlin M.N.
      Petroleomics by ultrahigh-resolution time-of-flight mass spectrometry.
      ). Orbitrap systems can achieve resolving powers in excess of one million (
      • Denisov E.
      • Damoc E.
      • Lange O.
      • Makarov A.
      Orbitrap mass spectrometry with resolving powers above 1,000,000.
      ), but even low-end commercial systems offer up to 120,000 resolving power. Fourier transform ion cyclotron resonance analyzers, of course, offer the highest resolving powers (
      • Scigelova M.
      • Hornshaw M.
      • Giannakopulos A.
      • Makarov A.
      Fourier transform mass spectrometry.
      ,
      • Kaiser N.K.
      • Quinn J.P.
      • Blakney G.T.
      • Hendrickson C.L.
      • Marshall A.G.
      A novel 9.4 tesla FTICR mass spectrometer with improved sensitivity, mass resolution, and mass range.
      ,
      • Kaiser N.K.
      • McKenna A.M.
      • Savory J.J.
      • Hendrickson C.L.
      • Marshall A.G.
      Tailored ion radius distribution for increased dynamic range in FT-ICR mass analysis of complex mixtures.
      ). For the Fourier transform MS systems, increased resolving power is achieved by increasing the transient acquisition time. That said, such operation effectively reduces the tandem MS duty cycle for Orbitrap analyzers (∼300 ms/scan for 15,000 resolving power and ∼650 ms/scan for 120,000) (
      • Michalski A.
      • Damoc E.
      • Lange O.
      • Denisov E.
      • Nolting D.
      • Muller M.
      • Viner R.
      • Schwartz J.
      • Remes P.
      • Belford M.
      • Dunyach J.J.
      • Cox J.
      • Horning S.
      • Mann M.
      • Makarov A.
      Ultra high resolution linear ion trap Orbitrap mass spectrometer (Orbitrap Elite) facilitates top down LC MS/MS and versatile peptide fragmentation modes.
      ). Below we describe experiments that tested these theoretical considerations. Furthermore, we explore the effect of a reduced duty cycle on peptide identifications.
      Fig. 1. (A) Theoretical calculations depicting the number of y-type ions that can be resolved at various mass resolving powers. At 36 mDa NeuCode spacing, approximately 90% of y-type ions will be resolved, and hence distinguished, at a resolving power of 60,000. Note the specified R is at m/z 400. NeuCode SILAC labeling permits identification of C-terminal product ions. Peptides were labeled with two isotopologues of Lys (36 mDa difference) and digested with LysC. MS1 scan (B) with selected precursor shown in panel C. Quantitative data can be concealed or revealed depending on resolving power (15,000, concealed; 240,000, revealed). Likewise, during MS/MS scanning, product ions containing the C-terminus (y-type in this case) appear as a doublet when analyzed under higher resolving power settings (>120,000, panels D and E).
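      The resolvability criterion behind panel A can be sketched in a few lines. This is a minimal illustration, not the calculation given in the supplementary information: it assumes Orbitrap-like behavior, in which resolving power falls with the square root of m/z relative to the value specified at m/z 400, and it treats a doublet as resolved when the peak spacing (36 mDa divided by the charge) is at least one full width at half-maximum. The function names and the example ions are hypothetical.

```python
import math

NEUCODE_SPACING_DA = 0.036  # mass difference between the two lysine isotopologues


def fwhm_at(mz, r_at_400):
    """Peak width (FWHM, in m/z units), assuming R scales as sqrt(400 / mz)."""
    r_here = r_at_400 * math.sqrt(400.0 / mz)
    return mz / r_here


def is_resolved(mz, charge, r_at_400, n_lys=1):
    """True if the doublet spacing in m/z is at least one FWHM."""
    spacing = n_lys * NEUCODE_SPACING_DA / charge
    return spacing >= fwhm_at(mz, r_at_400)


def fraction_resolved(ions, r_at_400):
    """ions: iterable of (mz, charge) tuples; returns the resolved fraction."""
    ions = list(ions)
    hits = sum(is_resolved(mz, z, r_at_400) for mz, z in ions)
    return hits / len(ions)


# Hypothetical y-type ions as (m/z, charge); not taken from the 19,521-ion library.
example_ions = [(147.11, 1), (600.30, 1), (1000.50, 1), (1500.80, 1), (800.40, 2)]
for r in (15_000, 60_000, 120_000, 240_000):
    print(r, fraction_resolved(example_ions, r))
```

      Under these assumptions, larger and more highly charged product ions are the first to lose their doublet as the resolving power drops, which matches the trend described in the text.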

       Experimental Proof of Concept

      To test the premise that our NeuCode labeling strategy may facilitate direct product ion annotation, we labeled a yeast cell culture with isotopologues of lysine, one containing six 13C atoms and two 15N atoms (+8.0142 Da) and the other containing eight 2H atoms (+8.0502 Da), a 36-mDa difference. Following cell lysis, we digested the proteins with LysC and mixed the proteomes in a 1:1 ratio. The resulting peptides were analyzed via nHPLC-MS/MS on a quadrupole linear ion trap Orbitrap hybrid MS system (Orbitrap Elite). Fig. 1A presents a scan sequence beginning with an MS1 analysis. Expansion of a selected precursor m/z region (m/z 757; Fig. 1B) displays the isotopic cluster profile at resolving powers of either 30,000 (black trace) or 480,000 (red trace). Note that the NeuCode SILAC doublet is not detectable in the lower resolving power analysis but is easily distinguished upon high-resolving-power scanning. An example MS/MS spectrum, following beam-type collisional activated dissociation (HCD), is presented in Figs. 1C and 1D. This scan confirms our guiding supposition that only y-type product ions appear as doublets, indicating the presence of a NeuCode labeled lysine within the product ion sequence. Of course, a small percentage of doublet-containing fragment ions could also arise from enzymatic cleavage at adjacent lysines, resulting in an N-terminal lysine, or from missed cleavage at KP residues. To confirm that the approach is not affected by the dissociation method, we performed a separate analysis using ETD (Fig. 2). Again, only C-terminal fragments (z-type ions in this case) existed as doublets and were readily distinguished. Manual validation of a number of these tandem mass spectra confirmed that C-terminal fragments, if present, contained the NeuCode doublet.
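      The two mass shifts and their 36-mDa spacing follow directly from the isotope masses. The short calculation below is a worked check using standard monoisotopic masses for the light and heavy isotopes; the variable names are illustrative only.

```python
# Mass gained per substituted atom (heavy isotope minus light isotope), in Da.
DELTA_13C = 13.0033548378 - 12.0
DELTA_15N = 15.0001088982 - 14.0030740048
DELTA_2H = 2.0141017778 - 1.00782503207

shift_13c6_15n2 = 6 * DELTA_13C + 2 * DELTA_15N  # lysine with six 13C and two 15N
shift_2h8 = 8 * DELTA_2H                         # lysine with eight 2H

spacing_mda = (shift_2h8 - shift_13c6_15n2) * 1000.0

print(f"13C6 15N2 lysine: +{shift_13c6_15n2:.4f} Da")
print(f"2H8 lysine:       +{shift_2h8:.4f} Da")
print(f"NeuCode spacing:  {spacing_mda:.1f} mDa")
```

      The result reproduces the +8.0142 Da and +8.0502 Da shifts quoted above and a spacing of about 36 mDa.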
      Fig. 2. NeuCode SILAC labeling permits identification of C-terminal product ions and is indifferent to fragmentation method, in this case ETD. Peptides were labeled with two isotopologues of Lys (36 mDa difference) and digested with LysC. MS1 scan (A) with ETD MS/MS analysis of a selected precursor shown in panel B. Quantitative data can be concealed or revealed, depending on resolving power (15,000, concealed; 240,000, revealed). Product ions containing the C-terminus (z-type in this case) appear as a doublet when analyzed under higher resolving powers (>120,000, panels C and D).
      With these promising preliminary data, we sought to test the frequency of C-terminal fragment ions detected as doublets as a function of MS/MS resolving power ranging from 15,000 to 240,000. To accomplish this we analyzed the complex peptide mixture of NeuCode labeled yeast peptides using nHPLC-MS/MS with varied resolving powers. Table I summarizes the number of MS/MS scans that were acquired over the 120-min separation across the various resolving powers. We note only a subtle drop in total scans from resolving powers of 15,000 to 60,000 (18,654 versus 16,230, respectively). Increasing the resolving power to 120,000 and 240,000 further reduced the total number of scans to 13,126 and 9,083. Next we performed a standard database search of these spectra to map them to sequence. Having the sequences of each precursor in hand, we wrote a custom algorithm to inspect each MS/MS spectrum for the presence or absence of a NeuCode doublet at the m/z value of each predicted fragment ion. For the spectra collected at 15,000 resolving power, the majority of y-type ions did not appear as doublets. Specifically, 26% of y-type products showed some evidence of a doublet; this number is higher than expected and is likely an overestimation, as manual inspection revealed that only the y1 ion was consistently resolved. In contrast, 85% of y-type ions were detected as NeuCode doublets when analyzed at 60,000 resolving power. This result corresponds very well with our theoretical predictions (88.69%; Fig. 1). Again, as predicted by theory, over 90% of all y-type ions were detected as NeuCode doublets at resolving powers in excess of 120,000 (Table I). At all resolving powers, b-type ions were never detected as having a NeuCode doublet more than 2% of the time. From these data, we conclude that the vast majority of y-type ions can be detected with MS/MS resolving powers of ∼60,000. For the Orbitrap system used here, the extended scan duration caused by this mode of analysis incurs only a subtle penalty in peptide identifications relative to the standard 15,000 resolving power (2,546 versus 2,839 peptide spectral matches, respectively).
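      The doublet inspection described above, in which each predicted fragment m/z is checked for a partner peak, might look roughly like the following. This is a minimal sketch and not the custom algorithm itself; the function names, the tolerance, and the example peak list are assumptions made for illustration.

```python
def find_peak(peaks, target_mz, tol):
    """Return the most intense (mz, intensity) peak within tol of target_mz, or None."""
    near = [p for p in peaks if abs(p[0] - target_mz) <= tol]
    return max(near, key=lambda p: p[1]) if near else None


def has_neucode_doublet(peaks, light_mz, charge, n_lys=1,
                        spacing_da=0.036, tol=0.005):
    """Check for both members of a doublet at a predicted fragment m/z.

    light_mz is the predicted m/z of the fragment carrying the lighter
    isotopologue; its partner is expected n_lys * spacing_da / charge higher.
    """
    light = find_peak(peaks, light_mz, tol)
    heavy = find_peak(peaks, light_mz + n_lys * spacing_da / charge, tol)
    # Require two distinct peaks so one peak cannot satisfy both windows.
    return light is not None and heavy is not None and light != heavy


# Hypothetical centroided peaks: a y1 lysine doublet and a lone b-type peak.
peaks = [(155.1270, 1000.0), (155.1630, 950.0), (300.2000, 500.0)]
print(has_neucode_doublet(peaks, 155.1270, charge=1))  # doublet present
print(has_neucode_doublet(peaks, 300.2000, charge=1))  # single peak only
```

      Counting the fraction of predicted y-type and b-type fragments for which this check succeeds yields percentages of the kind reported in Table I.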
      Table I
      MS2 resolution                                15,000    60,000    120,000   240,000
      Number of scans in 120-min gradient           18,654    16,230    13,161    9,083
      Unique peptides                               2,028*    1,860     1,745     1,345
      Peptide spectral matches                      2,839*    2,546     2,466     1,831
      Percentage of spectral fragments appearing as doublets
        y-ions                                      26.27     85.42     93.29     95.23
        b-ions                                      0.45      1.35      1.79      2.04
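      The throughput cost of higher resolving power can be read off Table I with a little arithmetic. The snippet below uses the scan and spectral match counts transcribed from the table; the per-event time is simply the gradient length divided by the number of MS/MS scans, so it includes MS1 scans and other overhead and is not a measured scan duration.

```python
# Values transcribed from Table I (120-min gradient).
resolving_power = [15_000, 60_000, 120_000, 240_000]
scans = [18_654, 16_230, 13_161, 9_083]
psms = [2_839, 2_546, 2_466, 1_831]

GRADIENT_S = 120 * 60  # gradient length in seconds

for r, n_scans, n_psms in zip(resolving_power, scans, psms):
    rel_scans = n_scans / scans[0]   # fraction of scans relative to 15,000
    rel_psms = n_psms / psms[0]      # fraction of matches relative to 15,000
    avg_event_ms = 1000 * GRADIENT_S / n_scans
    print(f"R={r:>7,}: scans {rel_scans:.1%}, PSMs {rel_psms:.1%}, "
          f"~{avg_event_ms:.0f} ms per MS/MS event on average")
```

      At 60,000 resolving power, roughly 87% of the scans and 90% of the spectral matches obtained at 15,000 are retained.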

       Product Ion Annotation

      With confidence that our theoretical predictions were experimentally sound, we next sought to develop a methodology to annotate product ion type without knowledge of the peptide sequence. The premise was to search for NeuCode doublets quickly and with high precision. We developed a custom algorithm that considered NeuCode doublet m/z spacing and the total lysine count (determined from the MS1 scan) to determine the product ion type. To avoid overassignment, spectra were also searched for corresponding isotopes and neutral loss peaks whenever a doublet pair was identified. The receiver operating characteristic curves for y-type ion prediction by use of the various datasets of MS/MS resolving power are presented in Fig. 3. Note that we dismissed peptides that result from missed cleavage (∼5% to 10%), as they contain internal lysine residues. This knowledge is gained via inspection of the NeuCode doublet in the prior MS1 scan. The algorithm, which uses machine learning, achieved an overall accuracy of 94.3% with a specificity of 94.7% and a sensitivity of 93.3% from the highest resolving power dataset (240,000). At 120,000, that performance was almost identical (94.9% sensitivity and 92.1% specificity). At 60,000 resolving power, sensitivity slipped just a bit (88.9%), but specificity was comparable (93.1%). To put this into context, consider the 2,466 MS/MS spectra that were mapped to sequence with high confidence from the 120,000 resolving power dataset. In total, these spectra contained 15,503 y-type product ion peaks as detected by the search algorithm. Our y-type ion identification algorithm correctly identified 14,552 (93.2%) of these via detection of the NeuCode doublet. Also encouraging is the relatively low rate of false positive predictions: out of all remaining m/z peaks in the experimental spectra, only 6.8% were misclassified.
      Fig. 3. ROC curve showing the ability of the algorithm to identify C-terminal product ions for peptides containing one lysine at resolving powers of 240,000, 120,000, or 60,000.
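      The core of a sequence-independent annotation step, pairing peaks by the expected NeuCode spacing, can be sketched as follows. This is an illustrative simplification: it omits the isotope and neutral loss checks and the machine learning classifier described above, and the tolerance, intensity ratio limit, and function name are assumptions rather than the parameters actually used.

```python
def annotate_doublets(peaks, max_charge=3, n_lys=1, spacing_da=0.036,
                      tol=0.003, max_ratio=5.0):
    """Return a list of (light_mz, heavy_mz, charge) candidate doublets.

    peaks is a list of (mz, intensity). Two peaks are paired when their m/z
    difference matches n_lys * spacing_da / charge within tol and their
    intensities differ by no more than max_ratio-fold.
    """
    peaks = sorted(peaks)
    found = []
    for i, (mz_a, int_a) in enumerate(peaks):
        for mz_b, int_b in peaks[i + 1:]:
            gap = mz_b - mz_a
            if gap > n_lys * spacing_da + tol:
                break  # peaks are sorted, so later partners are even farther away
            for z in range(1, max_charge + 1):
                if abs(gap - n_lys * spacing_da / z) <= tol:
                    low, high = min(int_a, int_b), max(int_a, int_b)
                    if low > 0 and high / low <= max_ratio:
                        found.append((mz_a, mz_b, z))
                    break  # accept the lowest charge state that fits
    return found


# Hypothetical peaks: a 1+ doublet, a lone peak, and a 2+ doublet.
peaks = [(155.1270, 1000.0), (155.1630, 900.0), (300.2000, 500.0),
         (450.2500, 400.0), (450.2680, 420.0)]
for light, heavy, z in annotate_doublets(peaks):
    print(f"{light:.4f} / {heavy:.4f}  z={z}")
```

      Peaks that take part in a doublet would be labeled as candidate C-terminal (y-type) ions, and unpaired peaks would be left unannotated.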
      Next we reasoned that our fragment ion annotation method should be unaffected by the dissociation method—that is, it should detect C-terminal ions regardless of the ion type. To test this idea, we collected a tandem mass spectral dataset by use of ETD, a dissociation method that is complementary to collisional activated dissociation (
      • Guthals A.
      • Clauser K.R.
      • Frank A.M.
      • Bandeira N.
      Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides.
      ,
      • Frese C.K.
      • Altelaar A.F.
      • Hennrich M.L.
      • Nolting D.
      • Zeller M.
      • Griep-Raming J.
      • Heck A.J.
      • Mohammed S.
      Improved peptide identification by targeted fragmentation using CID, HCD and ETD on an LTQ-Orbitrap Velos.
      ,
      • Good D.M.
      • Wirtala M.
      • McAlister G.C.
      • Coon J.J.
      Performance characteristics of electron transfer dissociation mass spectrometry.
      ,
      • Swaney D.L.
      • McAlister G.C.
      • Coon J.J.
      Decision tree-driven tandem mass spectrometry for shotgun proteomics.
      ). Using an ETD dataset acquired at an MS2 resolving power of 240,000, our overall accuracy in correctly identifying z-type ions (92.70%) was comparable to that achieved with y-type ions. Although the specificity of 93.00% was similar to what is achieved with HCD, the sensitivity of ETD was slightly lower at 82.80% (supplemental Fig. S2). We suspect that a reason for this drop in sensitivity is that our learner was trained on HCD data and is more adept at predicting y-type than z-type ions. Given these strong results and the minimal effect of the dissociation type, we envision utilizing these data for a variety of applications, including the facilitation of spectral pre-processing and de novo annotation.

       Spectral Identification with Product Ion Annotation

      We reasoned that a direct means by which to confirm the quality of our product ion annotation method was to couple it with traditional database searching. Aside from testing our approach, product ion annotation could offer opportunities to improve traditional database spectral correlation algorithms by reducing search space and/or adding specificity. We provide here preliminary data to convey the promise of this approach. Briefly, we executed our annotation algorithm on a compendium of 23,442 tandem mass spectra that had been collected from peptides carrying the NeuCode SILAC labels (peak scoring and annotation for a typical experiment are provided in the supplemental material). We then removed every m/z peak in each of the 23,442 spectra that was not annotated as a y-type ion. On average, only 2.3% of the m/z peaks in a tandem mass spectrum were retained following filtering. In other words, 98 out of 100 product ion peaks were dismissed. Note that a large quantity of these m/z peaks represent spectral noise or isotopic clusters and are typically removed prior to database searching (
      • Geer L.Y.
      • Markey S.P.
      • Kowalak J.A.
      • Wagner L.
      • Xu M.
      • Maynard D.M.
      • Yang X.
      • Shi W.
      • Bryant S.H.
      Open mass spectrometry search algorithm.
      ,
      • Hoopmann M.R.
      • Finney G.L.
      • MacCoss M.J.
      High-speed data reduction, feature detection, and MS/MS spectrum quality assessment of shotgun proteomics data sets using high-resolution mass spectrometry.
      ). Nevertheless, this process provides us with a simplification of the dataset that will speed searching and analysis, as only the most relevant m/z peaks are retained. These abridged DTA files were then searched in the conventional way, but for only y-type products, increasing the speed of the search, and potentially the specificity. The DTAs were also generated without filtering, and these files were searched to provide a control using normal parameters. Comparable results were obtained for both searches (Fig. 4): 3,688 versus 3,823 identifications at a 1% FDR for the annotated and un-annotated spectra, respectively. Overall, the two searches jointly identified the vast majority (3,427) of the spectra, with each search identifying a handful of unique sequences. The majority of the spectra uniquely identified through database searching contained five or fewer y-type ions (supplemental Fig. S3). These data suggest that we correctly annotated y-type ions. Moving forward, we expect significant benefits to be achieved with this NeuCode-facilitated approach (see below).
      Fig. 4. Venn diagram illustrating the overlap between a database search of all fragment ions and a database search of only annotated y-type fragments.
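      The filtering step and the overlap between the two searches can be illustrated with a short sketch. The peak filter is a hypothetical stand-in for the in-house tool, and the Venn arithmetic assumes that the 3,427 spectra quoted in the text are those identified by both searches.

```python
def keep_annotated(peaks, annotated_mz, tol=0.003):
    """Keep only peaks whose m/z matches an annotated y-type ion within tol."""
    return [(mz, inten) for mz, inten in peaks
            if any(abs(mz - a) <= tol for a in annotated_mz)]


# Hypothetical spectrum with two annotated y-type peaks among five.
peaks = [(155.1270, 1000.0), (155.1630, 900.0), (210.1000, 50.0),
         (300.2000, 500.0), (512.3000, 20.0)]
filtered = keep_annotated(peaks, annotated_mz=[155.1270, 155.1630])
print(f"retained {len(filtered)} of {len(peaks)} peaks")

# Overlap between searches, using the identification counts from the text.
annotated_ids, unfiltered_ids, shared = 3_688, 3_823, 3_427
only_annotated = annotated_ids - shared
only_unfiltered = unfiltered_ids - shared
total_unique = shared + only_annotated + only_unfiltered
print(only_annotated, only_unfiltered, total_unique)
```

      Under that assumption, 261 spectra were identified only after filtering, 396 only without filtering, and 4,084 by at least one of the two searches.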

       Spectral Quality Assessment

      An immediate application for NeuCode product ion annotation is the assessment of tandem mass spectral quality, both pre- and post-database searching. Even with state-of-the-art MS instruments, often only 30% to 50% of MS/MS spectra are matched to sequence following a typical search. It is difficult, however, to know why these unmatched spectra did not correlate to a sequence in the database. Certainly, many of these are of dubious quality; more interesting, some might arise from precursors whose sequences either are not in the database or have been modified in ways that were not considered. It is likely all of these scenarios, among several others, are at play.
      We reason that the y-type ion count could be a metric used to help determine spectral quality, similar to existing methods for assessing MS/MS quality, including the use of peak intensity (
      • Sadygov R.G.
      • Cociorva D.
      • Yates 3rd, J.R.
      Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book.
      ) and sequence tags (
      • Frank A.
      • Tanner S.
      • Bafna V.
      • Pevzner P.
      Peptide sequence tags for fast database search in mass-spectrometry.
      ,
      • Tabb D.L.
      • Ma Z.Q.
      • Martin D.B.
      • Ham A.J.
      • Chambers M.C.
      DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring.
      ,
      • Mann M.
      • Wilm M.
      Error-tolerant identification of peptides in sequence databases by peptide sequence tags.
      ). More specifically, we assert that unmatched spectra containing a high count of NeuCode doublets (i.e. y-type ions) may be peptidic in origin, and potentially salvageable. In this way the “y-type ion count” could be used as a metric by which to sort the proteomic wheat from the chaff. To test this idea, we used the compendium of 23,442 tandem mass spectra described above and searched them by use of the standard database correlation method (
      • Geer L.Y.
      • Markey S.P.
      • Kowalak J.A.
      • Wagner L.
      • Xu M.
      • Maynard D.M.
      • Yang X.
      • Shi W.
      • Bryant S.H.
      Open mass spectrometry search algorithm.
      ). Afterward we annotated all spectra (matched and unmatched) for y-type ions by use of our algorithm, described above. Next, we plotted the y-type ion count, as measured by our algorithm, versus the spectral match quality score (e-value, assigned by the search algorithm) for all spectra (Fig. 5A). Spectra not identified by database searching were assigned an e-value of 0. From these data we observed a similar distribution of y-type ion counts irrespective of a positive identification following traditional database searching. Note that there is a significant fraction of unidentified tandem mass spectra with no detectable y-type product ions (Fig. 5B). The bulk of these presumably result from nonpeptidic precursors, ineffective dissociation parameters, or precursors of very low abundance. Of the 9,858 unidentified spectra, 4,599 (46.7%) contained 8 or more y-type product ions.
      Fig. 5. (A) Scatterplot showing the number of y ions present as a function of e-value. (B) Number of y-type ions present in the dataset.
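      Using the y-type ion count as a quality metric amounts to a simple filter over the unidentified spectra. The sketch below assumes a hypothetical record layout with 'scan', 'identified', and 'y_count' fields, and uses the threshold of 8 y-type ions from the text.

```python
def triage(spectra, min_y_ions=8):
    """Return scan IDs of unidentified spectra with at least min_y_ions y-type ions.

    spectra: iterable of dicts with keys 'scan', 'identified', and 'y_count'.
    """
    return [s["scan"] for s in spectra
            if not s["identified"] and s["y_count"] >= min_y_ions]


# Hypothetical records, for illustration only.
spectra = [
    {"scan": 101, "identified": True, "y_count": 11},
    {"scan": 102, "identified": False, "y_count": 9},
    {"scan": 103, "identified": False, "y_count": 0},
    {"scan": 104, "identified": False, "y_count": 8},
]
print(triage(spectra))

# Fraction reported in the text: 4,599 of 9,858 unidentified spectra.
fraction_rich = 4_599 / 9_858
print(f"{fraction_rich:.1%}")
```

      Spectra that pass such a filter are natural candidates for broader database searches or de novo sequencing.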
      These 4,599 spectra likely represent quality tandem mass spectra that contain considerable information but did not get mapped to sequence. We envisage triaging these spectra for broader searches or for de novo sequencing. To test this hypothesis, the applicability of NeuCode for de novo sequencing was benchmarked through the PepNovo+ algorithm, as described above. A training dataset comprising 11 NeuCode yeast fractions analyzed over 90-min LC gradients resulted in 13,832 unique peptide identifications at a 1% FDR when searched with OMSSA against a UniProt yeast database (version 2011 07) (
      • Bairoch A.
      • Apweiler R.
      • Wu C.H.
      • Barker W.C.
      • Boeckmann B.
      • Ferro S.
      • Gasteiger E.
      • Huang H.Z.
      • Lopez R.
      • Magrane M.
      • Martin M.J.
      • Natale D.A.
      • O'Donovan C.
      • Redaschi N.
      • Yeh L.S.L.
      The universal protein resource (UniProt).
      ). Raw files were converted into DTA spectra, with the in-house algorithm for selecting y-type ion doublets revised to increase the peak height of predicted y-type ions. This dataset was used to train PepNovo+ predicted intensities and spectral features and compared with output of a traditional database search to establish scoring parameters. To compare the performance of NeuCode for de novo sequencing, raw files were also searched without any alterations to the DTA files—that is, by use of MS/MS without knowledge of which ions were y-type. In this dataset, 51.1% of the top NeuCode candidate sequences predicted by PepNovo+ matched the database results. When the top five candidates were considered, this number increased to 67.9%. The identification of y-type ions provided a considerable boost in correct de novo identifications. These NeuCode results represent a 34% increase in correct identifications relative to the number obtained with MS/MS only, where 37.9% of spectra were correctly identified by PepNovo+ (Fig. 6). The percentage of correct sequences further increased when sequence tags were used rather than full de novo predictions, for this dataset peaking at 78.6% of NeuCode spectra mapped to the correct peptide sequence using amino acid tags of length 3 (supplemental Fig. S4). Once PepNovo+ was trained, it was used to analyze our dataset of 23,442 tandem mass spectra. To increase the accuracy of PepNovo+, only spectra with more than 15 fragment ions were considered. For this dataset, 61.6% of spectra were correctly ranked by PepNovo+. When sequence tags were considered, this number increased to a high of 86.4% (supplemental Fig. S5).
      Fig. 6. (A) Graph illustrating the percentage of correct de novo identifications with PepNovo+ by use of either MS/MS or MS/MS with NeuCode y-type ions. (B) Top de novo predictions by use of PepNovo+.
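      Two of the comparisons above lend themselves to a brief sketch: the relative gain in correct top-ranked de novo sequences, and the looser tag-based match in which a prediction counts as correct if it shares a short sequence tag with the database result. The tag comparison below is not PepNovo+ scoring; it is a hypothetical helper that treats the isobaric residues I and L as equivalent, which is an assumption of this example.

```python
def tags(seq, k=3):
    """All contiguous subsequences of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_tag(predicted, reference, k=3):
    """True if the two sequences share at least one tag of length k (I treated as L)."""
    def norm(s):
        return s.replace("I", "L")
    return bool(tags(norm(predicted), k) & tags(norm(reference), k))


# Relative gain from the percentages in the text (51.1% versus 37.9%).
relative_gain = (51.1 - 37.9) / 37.9
print(f"relative gain: {relative_gain:.3f}")

# Hypothetical de novo candidates compared with a database sequence.
reference = "VESNDIIK"
print(shares_tag("VESNDLLK", reference))  # differs only by I/L
print(shares_tag("EVSNDIIK", reference))  # first two residues swapped
print(shares_tag("AAAAAAAK", reference))  # unrelated
```

      A candidate with a local error, such as two swapped residues, still shares tags with the correct sequence, which is why tag-based matching recovers more spectra than full-length matching.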

       Correlation of Fragment to Precursor

      There has been a recent trend toward data-independent acquisition routines in which larger precursor isolation windows are applied to dissociate and mass analyze product ions from multiple precursors in parallel. SWATH is an example of one such approach (
      • Gillet L.C.
      • Navarro P.
      • Tate S.
      • Rost H.
      • Selevsek N.
      • Reiter L.
      • Bonner R.
      • Aebersold R.
      Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis.
      ,
      • Silva J.C.
      • Denny R.
      • Dorschel C.A.
      • Gorenstein M.
      • Kass I.J.
      • Li G.Z.
      • McKenna T.
      • Nold M.J.
      • Richardson K.
      • Young P.
      • Geromanos S.
      Quantitative proteomic analysis by accurate mass retention time pairs.
      ). These NeuCode doublets could likewise be used to group identified y-type ions from disparate precursors by matching the NeuCode ratios. To explore this idea, we performed an experiment in which we divided the MS1 into sequential 25 m/z windows. During data-dependent MS/MS scanning, each selected precursor was isolated first with a 0.7 m/z window, dissociated, and mass analyzed. Immediately following that scan, a second isolation was performed, this time with a 25 m/z window. That MS/MS scan, which simulated a SWATH experiment, was mass analyzed with high resolving power. Based on the differences in the relative abundances between the NeuCode peptides, we reasoned that product ions could be mapped back to their corresponding precursor displaying the same ratio. Fig. 7 demonstrates this concept. A 25 m/z isolation window was applied to the precursor ion at m/z 467.57. A list of identified peptides co-eluting during that scan was generated. We then searched for y-type ions corresponding to these peptides in the wide isolation MS2 scans. Low m/z y-ions were avoided, as they are more likely to be nonspecific to a given peptide, resulting in distorted ratios. It was determined that peptides RCIASDAELTEK, VESNDIIK, and IESPESIK had sufficient y-type ions for identification. NeuCode y-type ion intensities were extracted, with average peak height ratios between the two isotopologues of 1.88, 1.10, and 0.47, respectively (Fig. 7). Corresponding MS1 intensities are presented in supplemental Fig. S6. Based on these data, we anticipate the possible use of NeuCode SILAC to assist in the simplification of product ion spectra from co-fragmented species.
      Fig. 7. Identification of peptides (i) RCIASDAELTEK, (ii) VESNDIIK, and (iii) IESPESIK based on the relative abundances of lysine isotopologues with a precursor isolation window of 25 m/z.
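      Mapping product ions back to co-fragmented precursors by their NeuCode ratio can be sketched as a nearest-neighbor assignment in log-ratio space. The example below uses the three average ratios reported above as reference values; the function name and the error threshold are assumptions, and a real implementation would need to account for measurement noise and for precursors with similar ratios.

```python
import math

# Average isotopologue peak height ratios reported in the text.
precursor_ratios = {"RCIASDAELTEK": 1.88, "VESNDIIK": 1.10, "IESPESIK": 0.47}


def assign_precursor(fragment_ratio, precursor_ratios, max_log2_error=0.4):
    """Return the precursor whose ratio is nearest in log2 space, or None if too far."""
    best, best_err = None, float("inf")
    for peptide, ratio in precursor_ratios.items():
        err = abs(math.log2(fragment_ratio / ratio))
        if err < best_err:
            best, best_err = peptide, err
    return best if best_err <= max_log2_error else None


# Hypothetical doublet ratios measured for product ions in a wide-window scan.
for observed in (1.80, 1.00, 0.50, 10.0):
    print(observed, assign_precursor(observed, precursor_ratios))
```

      Working in log space treats a twofold excess and a twofold deficit symmetrically, and the threshold leaves ions with ratios far from every precursor unassigned.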

      DISCUSSION

      Here we have described a fresh approach that can be used to annotate product ion type. The primary advantage of the method is its use of closely spaced NeuCode amino acid isotopologues that ensure the co-isolation of labeled peptide precursors. Product ion annotation by use of NeuCode will require the collection of tandem mass spectra under higher resolving powers than typically used. That said, through theory and experiment we have demonstrated that a resolving power of ∼50,000 is sufficient, which puts the technology within reach of all Fourier transform MS systems and higher-end TOF analyzers. The major downside of this requirement is the added transient acquisition time needed to achieve the higher resolving power. Here we show that only subtle penalties in total scan number and identifications are incurred at the 60,000 resolving power needed here relative to the results obtained with a standard MS/MS resolving power of 15,000 (16,230 versus 18,654 scans and 2,546 versus 2,839 identifications, respectively). Despite the reduction in scans and, consequently, identifications, we are positioned to directly annotate product ion types with these improved data. Moving forward, we anticipate further improvements in instrument acquisition and resolving power so that this time penalty will likely become inconsequential. Note that just five years ago acquisitions of MS/MS spectra at resolving powers exceeding 20,000 were far from routine. Finally, we note that methods do exist for the analysis of multiple product ion sets in parallel, providing another avenue by which to eliminate the time penalty of high-resolving-power data collection.
      Through an in-house algorithm, we extracted NeuCode labeled y-type product ions from raw tandem mass spectra with excellent sensitivity and precision. We envision many avenues of research stemming from this core technology. First, with knowledge of C-terminal products, one can develop simple algorithms to extract N-terminal product information. Thus, we have a tractable method by which to eliminate the vast majority of spectral noise, both electronic and chemical. A second spoke likely leads to increased spectral searching rates. Specifically, with reductions in spectral complexity of over 90%, significant gains in spectral processing times should be achievable. We also imagine multiple routes to more rigorous methods by which to perform FDR filtering. Much as mass accuracy is used to filter decoy hits from true positive results, one could examine correct product ion assignment to candidate sequences as a filter to increase the number of spectral identifications at a fixed FDR. Yet another path is to use the added specificity of annotated product ions to increase the number of variable modifications considered.
      Though our current algorithm predicts C-terminal product ions with high accuracy, we will continue its optimization so as to achieve a sensitivity and specificity closer to 100%. We hypothesize that information is lost in the form of peaks that do not show doublet splitting, either because they are b-type ions or because they are y-type ions with an undetectable doublet. A more intensive algorithm would consider some peaks as y-type product ions only and others as b- or possibly b- and y-type product ions, as in a standard search. Further improvements include the determination of complementary product pairs on the basis of the precursor mass and y-type products identified by peak splitting. This could potentially yield very information-rich peak lists in which most fragments are known to be b- or y-type a priori and virtually all electronic and chemical noise is filtered out.
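      The complementary-pair idea rests on a simple mass relationship: for singly charged fragments, the m/z values of a b-type ion and its complementary y-type ion sum to the neutral peptide mass plus two protons. The sketch below illustrates that relationship only; it is not part of the current algorithm, the function name is hypothetical, and it assumes that the precursor and the y-type ion carry the same lysine isotopologue.

```python
PROTON = 1.007276  # mass of a proton, Da


def complementary_b_mz(precursor_mz, precursor_charge, y_mz):
    """m/z of the singly charged b-type ion complementary to a singly charged y-type ion."""
    neutral_mass = (precursor_mz - PROTON) * precursor_charge
    return neutral_mass + 2 * PROTON - y_mz


# Worked check with the unlabeled dipeptide GK (residue masses plus water).
neutral_gk = 57.02146 + 128.09496 + 18.010565
precursor_mz = (neutral_gk + 2 * PROTON) / 2      # doubly protonated precursor
y1_mz = 128.09496 + 18.010565 + PROTON            # y1 ion (K)
print(f"{complementary_b_mz(precursor_mz, 2, y1_mz):.4f}")  # expected b1 ion (G)
```

      With y-type ions flagged by peak splitting, the predicted complementary m/z values indicate where to look for b-type ions that show no doublet of their own.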
      The final, and perhaps most enticing, use of the NeuCode product ion annotation approach is to facilitate large-scale de novo sequence analysis. Here we have provided preliminary evidence that y-type product ion identification can easily distinguish good spectra from bad. Further, we use this information to feed automated de novo analysis algorithms already in existence.

      Acknowledgments

      We thank A. J. Bureta for assistance with figure illustrations and A. E. Merrill and A. S. Hebert for culturing the yeast cells. We appreciate the intellectual stimulation provided by an anonymous reviewer to apply our approach to the correlation of product and precursor ions when multiple precursors are co-fragmented.

      Supplementary Material

      REFERENCES

        • Seidler J.
        • Zinn N.
        • Boehm M.E.
        • Lehmann W.D.
        De novo sequencing of peptides by MS/MS.
        Proteomics. 2010; 10: 634-649
        • Liska A.J.
        • Shevchenko A.
        Combining mass spectrometry with database interrogation strategies in proteomics.
        Trends Analyt. Chem. 2003; 22: 291-298
        • Mo L.
        • Dutta D.
        • Wan Y.
        • Chen T.
        MSNovo: a dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry.
        Anal. Chem. 2007; 79: 4870-4878
        • Pevtsov S.
        • Fedulova I.
        • Mirzaei H.
        • Buck C.
        • Zhang X.
        Performance evaluation of existing de novo sequencing algorithms.
        J. Proteome Res. 2006; 5: 3018-3028
        • Pitzer E.
        • Masselot A.
        • Colinge J.
        Assessing peptide de novo sequencing algorithms performance on large and diverse data sets.
        Proteomics. 2007; 7: 3051-3054
        • Taylor J.A.
        • Johnson R.S.
        Implementation and uses of automated de novo peptide sequencing by tandem mass spectrometry.
        Anal. Chem. 2001; 73: 2594-2604
        • Horn D.M.
        • Zubarev R.A.
        • McLafferty F.W.
        Automated de novo sequencing of proteins by tandem high-resolution mass spectrometry.
        Proc. Natl. Acad. Sci. U.S.A. 2000; 97: 10313-10317
        • Frank A.M.
        • Savitski M.M.
        • Nielsen M.L.
        • Zubarev R.A.
        • Pevzner P.A.
        De novo peptide sequencing and identification with precision mass spectrometry.
        J. Proteome Res. 2007; 6: 114-123
        • Chongle P.
        • Park B.H.
        • McDonald W.H.
        • Carey P.A.
        • Banfield J.F.
        • VerBerkmoes N.C.
        • Hettich R.L.
        • Samatova N.F.
        A high-throughput de novo sequencing approach for shotgun proteomics using high-resolution tandem mass spectrometry.
        BMC Bioinformatics. 2010; 11: 118-131
        • Chi H.
        • Sun R.X.
        • Yang B.
        • Song C.Q.
        • Wang L.H.
        • Liu C.
        • Fu Y.
        • Yuan Z.F.
        • Wang H.P.
        • He S.M.
        • Dong M.Q.
        pNovo: de novo peptide sequencing and identification using HCD spectra.
        J. Proteome Res. 2010; 9: 2713-2724
        • Taouatas N.
        • Drugan M.M.
        • Heck A.J.R.
        • Mohammed S.
        Straightforward ladder sequencing of peptides using a Lys-N metalloendopeptidase.
        Nat. Methods. 2008; 5: 405-407
        • Altelaar A.F.
        • Navarro D.
        • Boekhorst J.
        • van Breukelen B.
        • Snel B.
        • Mohammed S.
        • Heck A.J.
        Database independent proteomics analysis of the ostrich and human proteome.
        Proc. Natl. Acad. Sci. U.S.A. 2012; 109: 407-412
        • van Breukelen B.
        • Georgiou A.
        • Drugan M.M.
        • Taouatas N.
        • Mohammed S.
        • Heck A.J.
        LysNDeNovo: an algorithm enabling de novo sequencing of Lys-N generated peptides fragmented by electron transfer dissociation.
        Proteomics. 2010; 10: 1196-1201
        • Gu S.
        • Pan S.Q.
        • Bradbury E.M.
        • Chen X.
        Use of deuterium-labeled lysine for efficient protein identification and peptide de novo sequencing.
        Anal. Chem. 2002; 74: 5774-5785
        • Noga M.J.
        • Asperger A.
        • Silberring J.
        N-terminal H-3/D-3-acetylation for improved high-throughput peptide sequencing by matrix-assisted laser desorption/ionization mass spectrometry with a time-of-flight/time-of-flight analyzer.
        Rapid Commun. Mass Spectrom. 2006; 20: 1823-1827
        • Hennrich M.L.
        • Mohammed S.
        • Altelaar A.F.M.
        • Heck A.J.R.
        Dimethyl isotope labeling assisted de novo peptide sequencing.
        J. Am. Soc. Mass Spectrom. 2010; 21: 1957-1965
        • Munchbach M.
        • Quadroni M.
        • Miotto G.
        • James P.
        Quantitation and facilitated de novo sequencing of proteins by isotopic N-terminal labeling of peptides with a fragmentation-directing moiety.
        Anal. Chem. 2000; 72: 4047-4057
        • Madsen J.A.
        • Brodbelt J.S.
        Simplifying fragmentation patterns of multiply charged peptides by N-terminal derivatization and electron transfer collision activated dissociation.
        Anal. Chem. 2009; 81: 3645-3653
        • Oda Y.
        • Huang K.
        • Cross F.R.
        • Cowburn D.
        • Chait B.T.
        Accurate quantitation of protein expression and site-specific phosphorylation.
        Proc. Natl. Acad. Sci. U.S.A. 1999; 96: 6591-6596
        • Ong S.E.
        • Blagoev B.
        • Kratchmarova I.
        • Kristensen D.B.
        • Steen H.
        • Pandey A.
        • Mann M.
        Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics.
        Mol. Cell. Proteomics. 2002; 1: 376-386
        • Schnolzer M.
        • Jedrzejewski P.
        • Lehmann W.D.
        Protease-catalyzed incorporation of O-18 into peptide fragments and its application for protein sequencing by electrospray and matrix-assisted laser desorption/ionization mass spectrometry.
        Electrophoresis. 1996; 17: 945-953
        • Shevchenko A.
        • Chernushevich I.
        • Ens W.
        • Standing K.G.
        • Thomson B.
        • Wilm M.
        • Mann M.
        Rapid “de novo” peptide sequencing by a combination of nanoelectrospray, isotopic labeling and a quadrupole/time-of-flight mass spectrometer.
        Rapid Commun. Mass Spectrom. 1997; 11: 1015-1024
        • Uttenweiler-Joseph S.
        • Neubauer G.
        • Christoforidis A.
        • Zerial M.
        • Wilm M.
        Automated de novo sequencing of proteins using the differential scanning technique.
        Proteomics. 2001; 1: 668-682
        • Goodlett D.R.
        • Keller A.
        • Watts J.D.
        • Newitt R.
        • Yi E.C.
        • Purvine S.
        • Eng J.K.
        • von Haller P.
        • Aebersold R.
        • Kolker E.
        Differential stable isotope labeling of peptides for quantitation and de novo sequence derivation.
        Rapid Commun. Mass Spectrom. 2001; 15: 1214-1221
        • Goodlett D.R.
        • Yi E.C.
        Stable isotopic labeling and mass spectrometry as a means to determine differences in protein expression.
        Trends Analyt. Chem. 2003; 22: 282-290
        • Kim J.-S.
        • Shin M.
        • Song J.-S.
        • An S.
        • Kim H.-J.
        C-terminal de novo sequencing of peptides using oxazolone-based derivatization with bromine signature.
        Anal. Biochem. 2011; 419: 211-216
        • Keough T.
        • Youngquist R.S.
        • Lacey M.P.
        Sulfonic acid derivatives for peptide sequencing.
        Anal. Chem. 2003; 75: 156A-165A
        • Lee Y.H.
        • Han H.
        • Chang S.B.
        • Lee S.W.
        Isotope-coded N-terminal sulfonation of peptides allows quantitative proteomic analysis with increased de novo peptide sequencing capability.
        Rapid Commun. Mass Spectrom. 2004; 18: 3019-3027
        • Hsu J.L.
        • Huang S.Y.
        • Chow N.H.
        • Chen S.H.
        Stable-isotope dimethyl labeling for quantitative proteomics.
        Anal. Chem. 2003; 75: 6843-6852
        • Ji C.
        • Li L.
        Quantitative proteome analysis using differential stable isotopic labeling and microbore LC-MALDI MS and MS/MS.
        J. Proteome Res. 2005; 4: 734-742
        • Hubler S.L.
        • Jue A.
        • Keith J.
        • McAlister G.C.
        • Craciun G.
        • Coon J.J.
        Valence parity renders z(·)-type ions chemically distinct.
        J. Am. Chem. Soc. 2008; 130: 6388-6394
        • Hebert A.S.
        • Merrill A.E.
        • Bailey D.J.
        • Still A.J.
        • Westphall M.S.
        • Strieter E.R.
        • Pagliarini D.J.
        • Coon J.J.
        Neutron-encoded mass signatures for multiplexed proteome quantification.
        Nat. Methods. 2013; 10: 332-334
        • McAlister G.C.
        • Phanstiel D.
        • Good D.M.
        • Berggren W.T.
        • Coon J.J.
        Implementation of electron-transfer dissociation on a hybrid linear ion trap-orbitrap mass spectrometer.
        Anal. Chem. 2007; 79: 3525-3534
        • Wenger C.D.
        • Phanstiel D.H.
        • Lee M.V.
        • Bailey D.J.
        • Coon J.J.
        COMPASS: a suite of pre- and post-search proteomics software tools for OMSSA.
        Proteomics. 2011; 11: 1064-1074
        • Geer L.Y.
        • Markey S.P.
        • Kowalak J.A.
        • Wagner L.
        • Xu M.
        • Maynard D.M.
        • Yang X.
        • Shi W.
        • Bryant S.H.
        Open mass spectrometry search algorithm.
        J. Proteome Res. 2004; 3: 958-964
        • Kim S.
        • Mischerikow N.
        • Bandeira N.
        • Navarro J.D.
        • Wich L.
        • Mohammed S.
        • Heck A.J.
        • Pevzner P.A.
        The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search.
        Mol. Cell. Proteomics. 2010; 9: 2840-2852
        • Guthals A.
        • Clauser K.R.
        • Bandeira N.
        Shotgun protein sequencing with meta-contig assembly.
        Mol. Cell. Proteomics. 2012; 11: 1084-1096
        • Frank A.M.
        A ranking-based scoring function for peptide-spectrum matches.
        J. Proteome Res. 2009; 8: 2241-2252
        • Guthals A.
        • Clauser K.R.
        • Frank A.M.
        • Bandeira N.
        Sequencing-grade de novo analysis of MS/MS triplets (CID/HCD/ETD) from overlapping peptides.
        J. Proteome Res. 2013; 12: 2846-2857
        • Satoh T.
        • Sato T.
        • Tamura J.
        Development of a high-performance MALDI-TOF mass spectrometer utilizing a spiral ion trajectory.
        J. Am. Soc. Mass Spectrom. 2007; 18: 1318-1323
        • Klitzke C.F.
        • Corilo Y.E.
        • Siek K.
        • Binkley J.
        • Patrick J.
        • Eberlin M.N.
        Petroleomics by ultrahigh-resolution time-of-flight mass spectrometry.
        Energy Fuels. 2012; 26: 5787-5794
        • Denisov E.
        • Damoc E.
        • Lange O.
        • Makarov A.
        Orbitrap mass spectrometry with resolving powers above 1,000,000.
        Int. J. Mass Spectrom. 2012; 325: 80-85
        • Scigelova M.
        • Hornshaw M.
        • Giannakopulos A.
        • Makarov A.
        Fourier transform mass spectrometry.
        Mol. Cell. Proteomics. 2011; 10 (M111.009431)
        • Kaiser N.K.
        • Quinn J.P.
        • Blakney G.T.
        • Hendrickson C.L.
        • Marshall A.G.
        A novel 9.4 tesla FTICR mass spectrometer with improved sensitivity, mass resolution, and mass range.
        J. Am. Soc. Mass Spectrom. 2011; 22: 1343-1351
        • Kaiser N.K.
        • McKenna A.M.
        • Savory J.J.
        • Hendrickson C.L.
        • Marshall A.G.
        Tailored ion radius distribution for increased dynamic range in FT-ICR mass analysis of complex mixtures.
        Anal. Chem. 2013; 85: 265-272
        • Michalski A.
        • Damoc E.
        • Lange O.
        • Denisov E.
        • Nolting D.
        • Muller M.
        • Viner R.
        • Schwartz J.
        • Remes P.
        • Belford M.
        • Dunyach J.J.
        • Cox J.
        • Horning S.
        • Mann M.
        • Makarov A.
        Ultra high resolution linear ion trap Orbitrap mass spectrometer (Orbitrap Elite) facilitates top down LC MS/MS and versatile peptide fragmentation modes.
        Mol. Cell. Proteomics. 2012; 11 (O111.013698)
        • Frese C.K.
        • Altelaar A.F.
        • Hennrich M.L.
        • Nolting D.
        • Zeller M.
        • Griep-Raming J.
        • Heck A.J.
        • Mohammed S.
        Improved peptide identification by targeted fragmentation using CID, HCD and ETD on an LTQ-Orbitrap Velos.
        J. Proteome Res. 2011; 10: 2377-2388
        • Good D.M.
        • Wirtala M.
        • McAlister G.C.
        • Coon J.J.
        Performance characteristics of electron transfer dissociation mass spectrometry.
        Mol. Cell. Proteomics. 2007; 6: 1942-1951
        • Swaney D.L.
        • McAlister G.C.
        • Coon J.J.
        Decision tree-driven tandem mass spectrometry for shotgun proteomics.
        Nat. Methods. 2008; 5: 959-964
        • Hoopmann M.R.
        • Finney G.L.
        • MacCoss M.J.
        High-speed data reduction, feature detection, and MS/MS spectrum quality assessment of shotgun proteomics data sets using high-resolution mass spectrometry.
        Anal. Chem. 2007; 79: 5620-5632
        • Sadygov R.G.
        • Cociorva D.
        • Yates 3rd, J.R.
        Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book.
        Nat. Methods. 2004; 1: 195-202
        • Frank A.
        • Tanner S.
        • Bafna V.
        • Pevzner P.
        Peptide sequence tags for fast database search in mass-spectrometry.
        J. Proteome Res. 2005; 4: 1287-1295
        • Tabb D.L.
        • Ma Z.Q.
        • Martin D.B.
        • Ham A.J.
        • Chambers M.C.
        DirecTag: accurate sequence tags from peptide MS/MS through statistical scoring.
        J. Proteome Res. 2008; 7: 3838-3846
        • Mann M.
        • Wilm M.
        Error-tolerant identification of peptides in sequence databases by peptide sequence tags.
        Anal. Chem. 1994; 66: 4390-4399
        • Bairoch A.
        • Apweiler R.
        • Wu C.H.
        • Barker W.C.
        • Boeckmann B.
        • Ferro S.
        • Gasteiger E.
        • Huang H.Z.
        • Lopez R.
        • Magrane M.
        • Martin M.J.
        • Natale D.A.
        • O'Donovan C.
        • Redaschi N.
        • Yeh L.S.L.
        The universal protein resource (UniProt).
        Nucleic Acids Res. 2005; 33: D154-D159
        • Gillet L.C.
        • Navarro P.
        • Tate S.
        • Rost H.
        • Selevsek N.
        • Reiter L.
        • Bonner R.
        • Aebersold R.
        Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis.
        Mol. Cell. Proteomics. 2012; 11 (O111.016717)
        • Silva J.C.
        • Denny R.
        • Dorschel C.A.
        • Gorenstein M.
        • Kass I.J.
        • Li G.Z.
        • McKenna T.
        • Nold M.J.
        • Richardson K.
        • Young P.
        • Geromanos S.
        Quantitative proteomic analysis by accurate mass retention time pairs.
        Anal. Chem. 2005; 77: 2187-2200