
Multiple Layers of Complexity in O-Glycosylation Illustrated with the Urinary Glycoproteome

Open Access. Published: November 02, 2022. DOI: https://doi.org/10.1016/j.mcpro.2022.100439

      Highlights

      • Urinary O-glycopeptides were enriched using lectin affinity chromatography
      • HCD and EThcD data were analyzed by four proteomic software packages
      • High misidentification rate in spite of strict probability-based acceptance criteria
      • Software development recommendations for more reliable O-glycopeptide analysis

      Abstract

      While N-glycopeptides are relatively easy to characterize, O-glycosylation analysis is more complex. In this paper we illustrate the multiple layers of O-glycopeptide characterization that make this task so challenging. We believe our carefully curated dataset represents perhaps the largest intact human glycopeptide mixture derived from individuals rather than from cell lines. The samples were collected from healthy individuals, patients with superficial or advanced bladder cancer (3 in each group) and a single bladder inflammation patient. The data were scrutinized manually and interpreted using three different search engines (Byonic, Protein Prospector and O-Pair) and the tool MS-Filter. Despite all the recent advances, the problem of reliable automatic O-glycopeptide assignment has not yet been solved. Our data reveal a diversity of site-specific O-glycosylation that has not been presented before. In addition to the potential biological implications, this dataset should be a valuable resource for software developers, in the same way that some of our previously released data have been used in the development of O-Pair and O-Glycoproteome Analyzer. Based on the manual evaluation of the performance of the existing tools with our data, we compiled a series of recommendations that, if implemented, could significantly improve the reliability of glycopeptide assignments.


      Abbreviations:

      ETD (Electron-Transfer Dissociation), EThcD (Electron-Transfer/Higher-Energy Collision Dissociation), HCD (Higher-Energy Collision Dissociation), NCE (Normalized collision energy), PTM (Post-translational modification), PSM (Peptide-to-spectrum match), WGA (Wheat germ agglutinin), PP (Protein Prospector), HexNAc (N-acetylhexosamine), Hex (Hexose), Gal (Galactose), Fuc (Fucose), NeuAc (N-acetylneuraminic acid), NeuAcAc (O-acetyl-N-acetylneuraminic acid)

      Introduction

      Mass spectrometry has been used for the characterization of glycoproteins ever since the advent of soft ionization techniques, and played a significant role in the discovery of novel O-linked modifications [
      • Harris R.J.
      • Leonard C.K.
      • Guzzetta A.W.
      • Spellman M.W.
      Tissue plasminogen activator has an O-linked fucose attached to threonine-61 in the epidermal growth factor domain.
      ,
      • Harris R.J.
      • van Halbeek H.
      • Glushka J.
      • Basa L.J.
      • Ling V.T.
      • Smith K.J.
      • Spellman M.W.
      Identification and structural analysis of the tetrasaccharide NeuAc alpha(2-->6)Gal beta(1-->4)GlcNAc beta(1-->3)Fuc alpha 1-->O-linked to serine 61 of human factor IX.
      ,
      • Hofsteenge J.
      • Müller D.R.
      • de Beer T.
      • Löffler A.
      • Richter W.J.
      • Vliegenthart J.F.
      New type of linkage between a carbohydrate and a protein: C-glycosylation of a specific tryptophan residue in human RNase Us.
      ]. With the growing importance of recombinant therapeutic proteins glycosylation analysis has become essential for the pharmacological industry [
      • Hashii N.
      • Suzuki J.
      • Hanamatsu H.
      • Furukawa J.I.
      • Ishii-Watabe A.
      In-depth site-specific O-Glycosylation analysis of therapeutic Fc-fusion protein by electron-transfer/higher-energy collisional dissociation mass spectrometry.
      ,
      • Stavenhagen K.
      • Gahoual R.
      • Dominguez Vega E.
      • Palmese A.
      • Ederveen A.L.H.
      • Cutillo F.
      • Palinsky W.
      • Bierau H.
      • Wuhrer M.
      Site-specific N- and O-glycosylation analysis of atacicept.
      ]. At the same time an improved tool set – including new enrichment methods, fast mass spectrometers with high mass accuracy and detection sensitivity, and the development of ETD and EThcD – has permitted the characterization of labile post-translational modifications (PTMs), and thus, enabled N- and recently O-glycosylation analysis of wild-type samples aimed at the better understanding of the biological roles of these diverse PTMs and also in the search of biomarkers [
      • Napoletano C.
      • Steentoff C.
      • Battisti F.
      • Ye Z.
      • Rahimi H.
      • Zizzari I.G.
      • Dionisi M.
      • Cerbelli B.
      • Tomao F.
      • French D.
      • d'Amati G.
      • Panici P.B.
      • Vakhrushev S.
      • Clausen H.
      • Nuti M.
      • Rughetti A.
      Investigating Patterns of Immune Interaction in Ovarian Cancer: Probing the O-glycoproteome by the Macrophage Galactose-Like C-type Lectin (MGL).
      ,
      • Pirro M.
      • Schoof E.
      • van Vliet S.J.
      • Rombouts Y.
      • Stella A.
      • de Ru A.
      • Mohammed Y.
      • Wuhrer M.
      • van Veelen P.A.
      • Hensbergen P.J.
      Glycoproteomic Analysis of MGL-Binding Proteins on Acute T-Cell Leukemia Cells.
      ,
      • Chernykh A.
      • Kawahara R.
      • Thaysen-Andersen M.
      Towards structure-focused glycoproteomics.
      ].
      For over a decade our research group has been engaged in method development for the enrichment and mass spectrometric characterization of mucin-type O-glycopeptides first from bovine and human serum [
      • Darula Z.
      • Medzihradszky K.F.
      Affinity enrichment and characterization of mucin core-1 type glycopeptides from bovine serum.
      ,
      • Darula Z.
      • Sherman J.
      • Medzihradszky K.F.
      How to dig deeper? Improved enrichment methods for mucin core-1 type glycopeptides.
      ,
      • Darula Z.
      • Sarnyai F.
      • Medzihradszky K.F.
      O-glycosylation sites identified from mucin core-1 type glycopeptides from human serum.
      ] and more recently from human urine. This latter study started out as a quest for elusive biomarkers. Urine was collected from healthy individuals, bladder cancer patients and one outpatient with bladder inflammation. We used affinity-chromatography with WGA to enrich glycopeptides, followed by LC/MS analysis with HCD-triggered EThcD activation for data acquisition. Thus, we obtained MS/MS spectra by two mechanistically different fragmentation methods that are essential for successful glycopeptide characterization, because these techniques deliver complementary information on the amino acid sequence and the glycan [
      • Darula Z.
      • Medzihradszky K.F.
      Analysis of Mammalian O-Glycopeptides-We Have Made a Good Start, but There is a Long Way to Go.
      ]. HCD activation is still popular even in high throughput glycosylation analysis of complex mixtures [
      • Kim J.
      • Ryu C.
      • Ha J.
      • Lee J.
      • Kim D.
      • Ji M.
      • Park C.S.
      • Lee J.
      • Kim D.K.
      • Kim H.H.
      Structural and Quantitative Characterization of Mucin-Type O-Glycans and the Identification of O-Glycosylation Sites in Bovine Submaxillary Mucin.
      ,
      • Wang S.
      • Qin H.
      • Mao J.
      • Fang Z.
      • Chen Y.
      • Zhang X.
      • Hu L.
      • Ye M.
      Profiling of Endogenously Intact N-Linked and O-Linked Glycopeptides from Human Serum Using an Integrated Platform.
      ,
      • Zhao X.
      • Zheng S.
      • Li Y.
      • Huang J.
      • Zhang W.
      • Xie Y.
      • Qin W.
      • Qian X.
      An Integrated Mass Spectroscopy Data Processing Strategy for Fast Identification, In-Depth, and Reproducible Quantification of Protein O-Glycosylation in a Large Cohort of Human Urine Samples.
      ,
      • Kawahara R.
      • Ortega F.
      • Rosa-Fernandes L.
      • Guimarães V.
      • Quina D.
      • Nahas W.
      • Schwämmle V.
      • Srougi M.
      • Leite K.R.M.
      • Thaysen-Andersen M.
      • Larsen M.R.
      • Palmisano G.
      Distinct urinary glycoprotein signatures in prostate cancer patients.
      ] because it affords efficient peptide identification. However, relying solely on HCD data prevents localization of glycosylation sites; moreover, only the total composition of the modifying glycans can be determined [
      • Darula Z.
      • Medzihradszky K.F.
      Analysis of Mammalian O-Glycopeptides-We Have Made a Good Start, but There is a Long Way to Go.
      ,
      • Riley N.M.
      • Malaker S.A.
      • Driessen M.D.
      • Bertozzi C.R.
      Optimal Dissociation Methods Differ for N- and O-Glycopeptides.
      ]. Besides site assignment, EThcD offers an additional, as yet untapped advantage: it provides information on the size and composition of the modifying glycans and on the connectivity of the different units. EThcD spectra acquired at minimal normalized collision energy (15% NCE) permitted us to distinguish isomeric glycoforms, revealing structural differences in the modifying glycans [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ]. Manual data interpretation was essential in the identification and characterization of the 36 glycan structures, 29 of which had never been reported in a site-specific manner. Evidently, the identity of the sugar units, and the stereochemistry and exact positions of the linkages, cannot be determined from these MS data. Thus, the structures were assigned based on known glycan biosynthesis pathways and on glycan structures described in mucin glycan studies [
      • Rossez Y.
      • Maes E.
      • Lefebvre Darroman T.
      • Gosset P.
      • Ecobichon C.
      • Joncquel Chevalier Curt M.
      • Boneca I.G.
      • Michalski J.C.
      • Robbe-Masselot C.
      Almost all human gastric mucin O-glycans harbor blood group A, B or H antigens and are potential binding sites for Helicobacter pylori.
      ,
      • Jin C.
      • Kenny D.T.
      • Skoog E.C.
      • Padra M.
      • Adamczyk B.
      • Vitizeva V.
      • Thorell A.
      • Venkatakrishnan V.
      • Lindén S.K.
      • Karlsson N.G.
      Structural Diversity of Human Gastric Mucin Glycans.
      ,
      • Tanaka-Okamoto M.
      • Hanzawa K.
      • Mukai M.
      • Takahashi H.
      • Ohue M.
      • Miyamoto Y.
      Identification of internally sialylated carbohydrate tumor marker candidates, including Sda/CAD antigens, by focused glycomic analyses utilizing the substrate specificity of neuraminidase.
      ]. Our results indicated that the urinary O-glycosylation landscape is more complex than expected [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ].
      In this study we have evaluated the performance of different software tools for automated interpretation of O-glycopeptide data. We used three different search engines: Byonic [
      • Bern M.
      • Kil Y.J.
      • Becker C.
      Byonic: advanced peptide and protein identification software.
      ], Protein Prospector (PP) [
      • Baker P.R.
      • Medzihradszky K.F.
      • Chalkley R.J.
      Improving software performance for peptide electron transfer dissociation data analysis by implementation of charge state- and sequence-dependent scoring.
      ] and O-Pair [
      • Lu L.
      • Riley N.M.
      • Shortreed M.R.
      • Bertozzi C.R.
      • Smith L.M.
      O-Pair Search with MetaMorpheus for O-glycopeptide characterization.
      ], and the MS-Filter program in Protein Prospector [
      • Chalkley R.J.
      • Medzihradszky K.F.
      • Darula Z.
      • Pap A.
      • Baker P.R.
      The effectiveness of filtering glycopeptide peak list files for Y ions.
      ]. Byonic and Protein Prospector have both been adapted for glycopeptide analysis. From our perspective their major difference is how glycan fragmentation is handled. Fragments formed via glycosidic bond cleavages are searched for and scored by Byonic in both HCD and EThcD spectra, while PP can annotate these fragments when prompted to do so, but only activation-dependent peptide fragments contribute to its score. Byonic considers the glycan(s) linked to the peptide even upon collisional activation, although it also permits gas-phase deglycosylation, whereas PP considers O-glycans as neutral losses by default. Byonic assigns the glycosylation site(s) even in HCD, and its Delta Mod score signals the reliability of the glycan placement. PP’s built-in site localization (SLIP) [
      • Baker P.R.
      • Trinidad J.C.
      • Chalkley R.J.
      Modification site localization scoring integrated into a search engine.
      ] score applies to glycosylation only in ET(hc)D, but it will also signal the lack of sufficient information. A rather limiting factor for both programs is that data acquired on the same precursor with different activation methods, e.g. HCD and EThcD, are not considered in concert; therefore, the complementary nature of these data is not exploited. O-Pair [
      • Lu L.
      • Riley N.M.
      • Shortreed M.R.
      • Bertozzi C.R.
      • Smith L.M.
      O-Pair Search with MetaMorpheus for O-glycopeptide characterization.
      ] is the only search engine that analyzes the two datasets combined, starting with the interpretation of HCD data to identify the peptide sequence and the additive mass of the glycan, and using the ETD data to determine the modification sites and further confirm the assignment. Site localization assessment also has been included in the output. Level 1 refers to complete confidence in both the glycan compositions and site localizations, level 2 indicates confidence about at least one glycan in multiply glycosylated peptides, while level 3 identifications deliver glycan composition assignment only. MS-Filter is the simplest approach developed for the identification of new glycoforms of glycopeptides confidently identified in preceding database searches. HCD spectra are searched for specific Y fragments (Y0 and Y1) (for nomenclature see [
      • Domon B.
      • Costello C.E.
      A systematic nomenclature for carbohydrate fragmentations in FAB-MS/MS spectra of glycoconjugates.
      ]) of the input peptide list, and the glycan composition is assigned based on the mass difference between the peptide and the precursor mass. Peptide backbone fragments, if there are any, are scored [
      • Chalkley R.J.
      • Medzihradszky K.F.
      • Darula Z.
      • Pap A.
      • Baker P.R.
      The effectiveness of filtering glycopeptide peak list files for Y ions.
      ].
      O-glycosylation analysis is a much more complex task than regular protein identification and even simple PTM analysis. A wide variety of glycoforms have to be considered during automated data interpretation, because of the lack of consensus sites, frequent occurrence of Ser/Thr residues, and the potential macro- and microheterogeneity at each site. Therefore, we tested all search engines with our data in an iterative manner. First glycoproteins modified with the three most common mucin-type glycans, the mono- or disialylated core-1 and the disialylated core-2 O-glycans (HexNAcHexNeuAc1-2 and HexNAc2Hex2NeuAc2) were identified. The follow-up searches were performed with an extended glycan list incorporating all glycans reported in our pilot studies [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ] using a protein database restricted to glycoproteins identified in the first round. Our reasoning was that glycoproteins bearing any mucin-type glycans will certainly feature the most common structures, and at higher levels than the others. We also identified non-modified sequences and N-glycopeptides by performing Byonic searches. The results were summarized for each donor.
      Careful and multifaceted investigation of all the O-glycopeptide assignments revealed that in spite of carefully chosen probability-based acceptance criteria the false identification rate is higher than expected. Thus, our focus shifted, and in this paper we will provide insights about the difficulties the research community faces when analyzing wild type O-glycopeptides. Based on this experience we have drawn up a list about the data interpretation changes desirable for more reliable assignments.
      In addition, as a result of the scrutiny invested in these data we are also able to show examples about the diversity such studies could reveal that, as far as we know, has not been presented yet by other large scale O-glycopeptide analyses.
      We hope that sharing and, consequently, scrutinizing this information will lead to a better understanding of the layers of complexity one has to tackle in intact O-glycopeptide analysis, and will help the development of better tools for data interpretation. In fact, some of our earlier urinary LC-MS/MS data have already been used for this purpose [
      • Lu L.
      • Riley N.M.
      • Shortreed M.R.
      • Bertozzi C.R.
      • Smith L.M.
      O-Pair Search with MetaMorpheus for O-glycopeptide characterization.
      ,
      • Park G.W.
      • Lee J.W.
      • Lee H.K.
      • Shin J.H.
      • Kim J.Y.
      • Yoo J.S.
      Classification of Mucin-Type O-Glycopeptides Using Higher-Energy Collisional Dissociation in Mass Spectrometry.
      ].

      Experimental procedures

      Experimental Design and Statistical Rationale

      Urine samples were collected from 10 donors (Table S1). The studies in this work abide by the Declaration of Helsinki principles. Consent forms were approved by the Hungarian Scientific and Research Ethics Committee (approval number: 1011/16). This was a discovery-phase study only; no statistically significant quantitative results were obtained. The whole experimental workflow is illustrated in Figure 1.
      Figure 1. Workflow for the MS characterization of the urinary proteome. HCD data derived from glycopeptides or from non-glycosylated sequences were separated based on the detection of the diagnostic HexNAc oxonium ion (m/z 204.0867 ± 10 ppm). O-glycopeptides were assigned in a two-round database search. Glycoproteins identified in the initial search served as a protein database for the second, restricted search. O-glycosylated sequences identified in the second search were used as the input list for the MS-Filter software. ‘CTRL’ indicates the control (healthy) group, ‘SBC’, ‘ABC’ and ‘BI’ stand for superficial bladder cancer, advanced bladder cancer and bladder inflammation, respectively.

      Sample preparation and mass spectrometry

      Ten random midstream urine samples were collected and stored at 4 °C before processing. Blood contamination was not observed for any of the samples. The previously published sample preparation protocol was followed [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ]. Briefly, the samples were centrifuged (5000 × g, 4 °C), then supernatants (50 ml per patient) were concentrated on 10K MWCO cellulose filters to 250 μl (5000 × g, 4 °C). Subsequently proteins were reduced, alkylated, and digested with trypsin, then subjected to a 2-round glycopeptide enrichment using a WGA affinity column, collecting 3 glycopeptide fractions: the end of the flow-through peak, its shoulder and a fraction eluted by GlcNAc (Figure S1). Fractions were analyzed separately by LC-MS/MS using a Waters M-Class nanoUPLC on-line coupled to an Orbitrap Fusion Lumos Tribrid (Thermo Scientific) mass spectrometer. Samples were desalted on a trap column (Waters Acquity UPLC M-Class Symmetry C18 180 μm × 20 mm column, 5-μm particle size, 100-Å pore size; flow rate 10 μl/min), and fractionated by a linear gradient of 10-30% B in 60 min (Waters Acquity UPLC M-Class BEH C18 75 μm × 250 mm column, 1.7-μm particle size, 130-Å pore size; solvent A: 0.1% formic acid/water; solvent B: 0.1% formic acid/ACN; flow rate: 300 nl/min; separating column temperature: 45 °C). MS/MS data were acquired using HCD product-ion dependent EThcD data acquisition: the presence of the diagnostic HexNAc-specific oxonium ion, m/z 204.0867, among the 20 most abundant HCD fragments triggered EThcD acquisition. HCD spectra were acquired at 28% NCE, while supplemental activation in EThcD was set to 15% NCE. 40% of the collected WGA fractions were injected, and each fraction was analyzed twice, selecting precursors of different charge states in the consecutive LC-MS/MS experiments (z=3-5 or z=2). The MS/MS intensity threshold was set to 10⁶ in a total cycle time of 3 s. All measurements were performed in the Orbitrap analyzer with a resolution of 60000 and 15000 for MS1 and MS/MS, respectively. Dynamic exclusion was set to 30 s.

      Data interpretation

      Separate HCD and EThcD peak lists were generated from the .raw files (Table S1) using Proteome Discoverer v2.4. Spectra with a minimum of 40 peaks were retained. The HCD data were further divided into two peak lists using MS-Filter, based on the presence of the HexNAc oxonium ion (m/z 204.087 ± 10 ppm) among the 20 most abundant peaks. The resulting EThcD and HCD peak lists were searched separately with Byonic (v3.7.4) and Protein Prospector (v6.2.1). For the O-Pair (v0.0.308) searches the original raw files were used. For MS-Filter the unfiltered HCD peak lists were used as input. Search parameters and acceptance criteria are detailed in Tables S2-S9 and Figure S2. All database searches used the human subset of the Swissprot protein database (release date: 2020.12.14).
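      The oxonium-ion-based division of the HCD peak lists can be illustrated with a minimal Python sketch. This is not the MS-Filter implementation; the function names and the representation of a spectrum as a list of (m/z, intensity) pairs are assumptions made for illustration, while the m/z value, the 10 ppm tolerance, the top-20 rule and the 40-peak minimum are taken from the text.

```python
# Illustrative sketch only; NOT the MS-Filter code.
# Assumption: a spectrum is a list of (mz, intensity) tuples.

HEXNAC_OXONIUM_MZ = 204.0867   # diagnostic HexNAc oxonium ion (from the text)
PPM_TOLERANCE = 10.0           # mass tolerance (from the text)
TOP_N = 20                     # ion must be among the 20 most abundant peaks
MIN_PEAKS = 40                 # spectra with fewer peaks are discarded


def has_hexnac_oxonium(peaks, target_mz=HEXNAC_OXONIUM_MZ,
                       ppm=PPM_TOLERANCE, top_n=TOP_N):
    """Return True if target_mz is matched within ppm among the top_n peaks."""
    most_abundant = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    tolerance = target_mz * ppm * 1e-6
    return any(abs(mz - target_mz) <= tolerance for mz, _ in most_abundant)


def split_hcd_spectra(spectra, min_peaks=MIN_PEAKS):
    """Split spectra (hypothetical dict: scan id -> peaks) into two id lists:
    putative glycopeptide spectra and the remaining spectra."""
    glyco, non_glyco = [], []
    for scan_id, peaks in spectra.items():
        if len(peaks) < min_peaks:
            continue  # too few peaks, not retained
        if has_hexnac_oxonium(peaks):
            glyco.append(scan_id)
        else:
            non_glyco.append(scan_id)
    return glyco, non_glyco
```

      At m/z 204.0867 a 10 ppm window corresponds to roughly ±0.002 Th, so a peak at m/z 204.0900 would not qualify.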

      Non-glycopeptide and N-glycopeptide searches using Byonic

      HCD data lacking the abundant HexNAc oxonium ion were searched for non-glycosylated peptides only (search parameters are listed in Table S2). Table S3 provides information on the N-glycopeptide searches from EThcD and 204-filtered HCD peak lists, while Table S4 contains the N-glycan database used that was created from Byonic’s built-in N-glycan database (N-glycan 57 human plasma.txt) by removing 15 entries representing Na-adducts and truncated N-glycan structures.

      O-glycopeptide searches using Byonic, Protein Prospector and O-Pair

      A two-round database search was implemented. The initial search was performed permitting only the three most common mucin-type structures (HexNAcHexNeuAc1-2 and HexNAc2Hex2NeuAc2) (Table S5, Table S7) with each search engine. The O-glycoproteins identified in the initial searches meeting the search engine–specific cutoff criteria formed the restricted database for the second round (Table S6) using an expanded O-glycan database with 42 additional glycans (Table S8) containing oligosaccharides identified in our previous investigations [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ].
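      For orientation, the additive masses of the three first-round glycan compositions can be computed from the monoisotopic residue masses of their building blocks. The short Python sketch below is only an illustration: the residue masses are standard textbook values, not values quoted in this paper, and the dictionary and function names are hypothetical.

```python
# Illustrative sketch; names are hypothetical.
# Assumption: standard monoisotopic residue masses (Da) of the sugar units.
RESIDUE_MASS = {
    "HexNAc": 203.07937,  # C8H13NO5
    "Hex": 162.05282,     # C6H10O5
    "NeuAc": 291.09542,   # C11H17NO8
    "Fuc": 146.05791,     # C6H10O4
}

# The three most common mucin-type compositions used in the initial search.
FIRST_ROUND_GLYCANS = {
    "HexNAcHexNeuAc": {"HexNAc": 1, "Hex": 1, "NeuAc": 1},
    "HexNAcHexNeuAc2": {"HexNAc": 1, "Hex": 1, "NeuAc": 2},
    "HexNAc2Hex2NeuAc2": {"HexNAc": 2, "Hex": 2, "NeuAc": 2},
}


def glycan_mass(composition):
    """Additive (residue) mass of a glycan given as {unit: count}."""
    return sum(RESIDUE_MASS[unit] * count for unit, count in composition.items())


if __name__ == "__main__":
    for name, composition in FIRST_ROUND_GLYCANS.items():
        print(f"{name}: {glycan_mass(composition):.4f} Da")
```

      A search engine adds such a mass to the peptide mass for every glycan on its list, which is why expanding the list in the second round enlarges the search space considerably.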

      HCD peak list processed with MS-Filter

      Peptide sequences generated from confident O-glycopeptide identifications (meeting the acceptance criteria) were used as input (Table S10) for the MS-Filter (Figure S2); only those HCD spectra were retained that featured the HexNAc oxonium ion (m/z 204.087) and a Y0, Y1 pair in a matching charge state among the top 15 peaks and within 10 ppm mass accuracy. Assigned HCD spectra had to feature an additive mass corresponding to a listed glycan in the selected glycan database (Table S9), and the precursor ion of this glycoform had to be measured within 10 ppm of the calculated value.
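      The two mass-based criteria described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the MS-Filter algorithm: Y0 is taken as the unmodified peptide and Y1 as the peptide carrying a single HexNAc, the proton and HexNAc masses are standard values, the function names and data layout are hypothetical, and the oxonium ion check and the scoring of backbone fragments are omitted.

```python
# Illustrative sketch; NOT the MS-Filter implementation.
PROTON = 1.007276    # assumption: standard proton mass (Da)
HEXNAC = 203.07937   # assumption: standard HexNAc residue mass (Da)


def mz(neutral_mass, charge):
    """m/z of a species of given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def within_ppm(observed, expected, ppm=10.0):
    return abs(observed - expected) <= expected * ppm * 1e-6


def find_y0_y1_charge(peaks, peptide_mass, charges=(1, 2, 3), top_n=15, ppm=10.0):
    """Return the first charge state at which both Y0 (peptide) and Y1
    (peptide + HexNAc) occur among the top_n peaks, or None.
    `peaks` is a list of (mz, intensity) tuples."""
    top_mz = [m for m, _ in sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]]
    for z in charges:
        y0 = mz(peptide_mass, z)
        y1 = mz(peptide_mass + HEXNAC, z)
        if (any(within_ppm(m, y0, ppm) for m in top_mz)
                and any(within_ppm(m, y1, ppm) for m in top_mz)):
            return z
    return None


def assign_glycan(precursor_mass, peptide_mass, glycan_db, ppm=10.0):
    """Return the name of the listed glycan whose mass explains the difference
    between precursor and peptide mass within ppm, or None.
    `glycan_db` maps a glycan name to its additive mass."""
    for name, glycan_mass in glycan_db.items():
        if within_ppm(precursor_mass, peptide_mass + glycan_mass, ppm):
            return name
    return None
```

      Because the glycan is assigned from a mass difference alone, this approach reports only the total glycan composition; it cannot tell how many glycans are present or where they are attached.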

      Results and Discussion

      The results presented here were obtained from urine samples collected from 10 male donors classified into four categories: healthy controls (3), superficial bladder cancer patients (3), advanced bladder cancer patients (3) and one patient with bladder inflammation. Sample preparation and mass spectrometric analysis were carried out identically for all samples. From each donor 50 ml urine was concentrated then digested with trypsin. LC-MS/MS data were generated from glycopeptide mixtures enriched by affinity chromatography using WGA, and HCD-fragment ion-triggered EThcD analysis ensured the acquisition of two MS/MS spectra produced by mechanistically different fragmentation processes for each glycopeptide. The resulting data set (60 files altogether) was then processed with three different search engines (Byonic, Protein Prospector and O-Pair) and with PP’s MS-Filter software in order to improve the success rate of O-glycopeptide assignments (Figure 1).
      O-glycopeptides were identified applying a two-round database search using Byonic, PP and O-Pair.
      Separate HCD and EThcD peak lists were generated for Byonic and PP searches. Peak lists representing the same donors were merged, and HCD data were further filtered for the presence of the HexNAc-related ion m/z 204.0867. O-Pair uses the raw data; each file was searched separately. In the first search, only the most common glycans (HexNAcHexNeuAc1-2 and HexNAc2Hex2NeuAc2) were permitted, while in the second round only O-glycoproteins identified reliably in the first round by the respective search engine either from HCD or EThcD data were searched using a larger glycan database with 40 additional glycan structures present on urinary proteins [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ] (Table S11-20, Sheets D-H). Finally, an input list was assembled from the O-glycopeptides confidently identified by any of the above search engines in the second round to find additional confirmation in the form of Y0 and Y1 ions using MS-Filter (Table S11-20, Sheets I-J). For further details see the Experimental section.
      N-glycopeptides were identified from EThcD or 204-filtered HCD data using Byonic (Table S11-20, Sheet C). The lists of unmodified sequences were compiled from three searches: Byonic PSMs from HCD spectra not featuring the HexNAc oxonium ion m/z 204 (Table S11-20, Sheet N), Byonic PSMs from N-glycopeptide searches (Table S11-20, Sheet O) and O-Pair results (Table S11-20, Sheet L).
      Approximately 115,000 MS/MS spectra featuring more than 40 peaks were recorded per patient (Table S21). The average identification rate was about 19%. Non-glycosylated sequences represent ∼81% of the assigned MS/MS data, and only ∼14% and ∼5% of the identifications belong to O- and N-glycopeptides, respectively. In contrast, based on the presence of the HexNAc oxonium ion (m/z 204.087), ∼40% of all spectra might represent glycopeptides. The assignments meeting the reported acceptance criteria are summed up for each patient (Table S11-22), listing identifications by search method separately as well as a compilation of all assignments. For some MS/MS data multiple assignments are reported. In certain instances, O-Pair identified 2 or 3 components from the same spectrum, a unique feature of this search engine. Although it is not obvious from its on-line manual, it may re-assign the precursor ion m/z, or identify multiple precursors from the raw data. Not surprisingly, the majority of these “extra” identifications represented non-glycosylated peptides, since these produce more backbone fragments and are easier to assign than glycopeptides. We estimate that these multiple assignments represent less than 10% of the overall O-Pair-related PSMs, although the phenomenon of mixture spectra must be a lot more widespread.

      O-glycopeptides

      From the identification rates it is already obvious that automated glycopeptide identification is still a challenging task. We used different methods to extract more information from this large dataset than can currently be achieved with a single tool. We identified O-glycopeptides from HCD spectra (Byonic, PP, MS-Filter), from EThcD data (Byonic, PP) and from the combination thereof (O-Pair). The acceptance criteria were set to minimize the decoy hits in Byonic and PP searches. For O-Pair results we followed the developers’ recommendation. For MS-Filter only the weaker scoring double assignments were eliminated. This acceptance strategy was chosen based on our prior experience with these software tools. Still, our results (see below) suggest that any cut-off threshold is currently a compromise. The O-glycopeptides identified are listed individually according to the interpretation methods (Table S11-20, Sheets D-J) and summed up (Table S11-20, Sheet B). Our efforts resulted in 43151 O-glycopeptide PSMs (compiled in Table S22, Sheet A). With all the duplicates removed, we can claim that data derived from 26205 precursor selections were assigned. Counterintuitively, this yielded 27065 unique identifications (Table S22, Sheet B), since 716 scans were assigned differently, mostly by 2 different methods, although 5 HCD spectra were interpreted very differently by 3 tools, while O-Pair reported two glycopeptides from the same scan 15 times (respective “Scan#” highlighted in yellow in Table S22, Sheet B).
      The majority of the O-glycopeptides (56%) were assigned by only one of the methods. The remaining identifications were delivered by 2, 3, 4, 5 or all 6 methods (22.8%, 13.1%, 6%, 1.9% and 0.2% of all identifications, respectively; Figure 2, Table S23, Sheet A). By overlapping assignments we mean identical peptide sequences and overall glycan composition. Determining the precise number and composition of the modifying glycans, and localizing the sites, represent additional layers of complexity and, as we will show later, these issues are far from being handled automatically.
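      The overlap statistic can be illustrated with a small Python sketch. It assumes that each assignment is available as a (method, precursor id, peptide sequence, overall glycan composition) tuple and that two assignments overlap when the last three elements are identical; this data layout and the function name are hypothetical and do not describe how the supplementary tables were compiled.

```python
# Illustrative sketch; the tuple layout is an assumption.
from collections import Counter, defaultdict


def overlap_distribution(assignments):
    """Return {number of supporting methods: fraction of unique identifications}.

    `assignments` is an iterable of
    (method, precursor_id, peptide, glycan_composition) tuples.
    Site localization and the number of glycans are deliberately ignored,
    mirroring the definition of overlap used in the text."""
    methods_per_identification = defaultdict(set)
    for method, precursor_id, peptide, composition in assignments:
        key = (precursor_id, peptide, composition)
        methods_per_identification[key].add(method)

    counts = Counter(len(methods) for methods in methods_per_identification.values())
    total = sum(counts.values())
    return {n: counts[n] / total for n in sorted(counts)}
```

      With six methods the keys of the returned dictionary range from 1 to 6, and the fractions sum to 1, as do the percentages reported above.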
      Figure 2. Overlap of O-glycopeptide assignments. Panel A shows the number of O-glycopeptide PSMs delivered by each search engine and the MS-Filter program. The colors represent the number of different methods identifying the same peptide sequence and overall glycan composition. Panels B and C illustrate the degree of PSM assignment overlap globally and in an UpSet plot [
      • Conway J.R.
      • Lex A.
      • Gehlenborg N.
      UpSetR: an R package for the visualization of intersecting sets and their properties.
      ], respectively. In Panel C BYO stands for Byonic. The plot shows only the overlap groups with at least 50 elements. The full plot can be seen in , Sheet A.

      Comparison of search engines regarding intact O-glycopeptide identification

      Approximately 82% of the assigned PSMs represented glycopeptides with a molecular mass between 1500 and 4500 Da, and the vast majority of these are peptides of 8-25 amino acid residues (Table S23, Sheet B). The average peptide length was 18-19 for all of the methods, except MS-Filter with 15 amino acids only. At the same time Byonic and O-Pair on average tend to assign components of ∼10% higher molecular masses, i.e. higher glycan/peptide ratios, than MS-Filter and PP, and the difference was more marked for the EThcD assignments (Table S23, Sheet C). Figure 3 displays the shared and unique assignments in a precursor ion, charge state and glycan size-dependent manner. The HCD-based approaches delivered significantly more results than EThcD alone (Figures 2 & 3, Table S23, Sheet A). Byonic-HCD searches were on the top, followed by O-Pair, MS-Filter and PP-HCD (∼71%, 60% and 47% of the Byonic’s numbers, respectively). The EThcD assignments were trailing with ∼25% and ∼13% (PP and Byonic, respectively). The HCD-based analyses yielded the most unique assignments. MS-Filter and Byonic produce similar numbers, O-Pair with some contribution from EThcD data yields 25% less unique IDs, while the number of unique hits delivered by PP searches corresponds to ∼30% of the Byonic results. The number of unique identifications derived from EThcD are even lower, ∼13 and 7%, by PP and Byonic respectively. When the same results were delivered by two different methods, only 685 assignments (∼12% of all hits shared by 2 methods) are supported by independent HCD (here O-Pair is considered as such) and EThcD assignments and in ∼8% of such pairs only MS-Filter could ‘interpret’ the HCD data. Similarly, among the 2833 (∼51%) exclusively HCD data-supported (i.e. O-Pair not included) ‘shared by 2’ assignments, MS-Filter delivered the support in ∼40% of the cases (1093/2833). 
From the HCD&EThcD-based O-Pair hits 1821 (∼33%) were supported by another HCD-based method; MS-Filter delivered 222 of these. Last but not least, Byonic and PP EThcD searches arrived at the same conclusions for 245 PSMs (∼4%). Among assignments shared by 3 methods, 556 identifications (17%) are still based solely on HCD spectra, and the majority, 1856 IDs (58%), were O-Pair hits also supported exclusively by other HCD search results. Approximately one quarter of the identifications are supported by data from both fragmentation methods individually. Obviously each assignment quartet is supported by data derived from both activation methods, but the biggest group, 502 assignments (34%), still consists of O-Pair hits supported by all the other HCD-based results (Table S23, Sheet A).
      Figure 3. Distribution of overlapping and individual O-glycopeptide assignments. Interpretation methods (listed on the right) and charge states (listed on the top) are depicted separately; green and orange spots indicate the individual (unique, i.e. identified only by the specified approach) and shared (i.e. the same peptide+glycan composition was identified by at least one other method) assignments, respectively. Glycan masses (in Da) are listed on the x axes, and precursor m/z values on the y axes.
      From these observations it seems evident that HCD data must be used in glycopeptide analysis. However, the discrepancy in spectrum interpretations by the different search engines also signals the need for significant improvements. Furthermore, we have to emphasize that only ET(hc)D enables the site localization and the differentiation between single and multiple glycosylation.
      After compiling all O-glycopeptide assignments we attempted to assess whether there are any significant differences in the urinary glycoproteome that indicate health or disease. Glycan- and peptide-level comparisons (Table S22, Sheets C and D, respectively) tabulating the PSM numbers indicated some potential differences between the different donor groups. However, the manual data evaluation revealed several misassignments. The seemingly increased number of Tn and T-antigens is frequently linked to N-glycosylation consensus motif-containing peptides (Table S22, Sheet D), where either just the GlcNAc is linked to the Asn or a fucosylated GlcNAc and a Met-sulfoxide are hiding behind the HexNAcHex assignments. Since separate N-glycopeptide searches were also performed, comparison of the results indicated that double assignments do occur (Figure S3, Table S11-20, Sheet M). Similarly, a uniquely high number of HexNAc2HexNeuAc modifications were linked to one of the bladder cancer patients, the majority of them to the Protein AMBP peptide GPVPTPPDNIQVQENFNISR (Table S22, Sheet D). Careful investigation of both HCD and EThcD data revealed that the glycan composition is a combination of a truncated N-glycan (GlcNAc) and a monosialo core 1 type O-glycan (GalNAcGalNeuAc) at Asn-36 and Thr-24, respectively (Figure S4). This finding illustrates that potential N-glycopeptides also qualify as candidates for O-glycosylation. These examples also demonstrate the often ignored presence of truncated N-glycans that cannot be removed from the peptides readily using PNGase F [
      • Chu F.K.
      Requirements of cleavage of high mannose oligosaccharides in glycoproteins by peptide N-glycosidase F.
      ] and do interfere with the O-glycopeptide characterization. We also tried to compare the presence of O-acetyl sialic acids in the different samples and consistent detection of blood-type antigens. Scrutinizing those data we encountered more complex discrepancies.

      Suggestions for improvement of automated data interpretation of O-glycopeptides

      After realizing that the false identification rate is most likely much higher than suggested by the strict, probability-based cut-off values, our focus shifted to a more careful evaluation of the search results, and as an outcome of this process we attempted to establish rules that ideally should be followed in the future.
      Most search engines were developed for the identification of tryptic peptides, and gradually grew into more complex packages, permitting and evaluating non-specific cleavages as well as a wide variety of covalent modifications. Glycosylation is among the most difficult posttranslational modifications to characterize, since it requires two different activation methods to achieve the basic structural assignment of a glycopeptide, i.e. to decipher the sequence of the peptide (beam-type CID (HCD)) and to establish the composition of the individual glycans and their attachment sites (ET(hc)D).
      Moreover, the search space to be negotiated by the data interpretation software is unusually large when it comes to in-depth characterization of native glycopeptides in body fluids, for multiple reasons. Firstly, proteolytic activity is rampant in urine, therefore nonspecific enzymatic cleavages have to be considered: among the most reliable assignments, i.e. those delivered by all 6, by 5 or by just 4 methods, semitryptic sequences were identified in approximately 27, 33, and 55% of the cases, respectively. Secondly, a multi-entry glycan database has to be considered as variable modification. Specifically for urine we have shown that in addition to the most common mucin type core-1 and core-2 glycans at least 40 other structures may also decorate the peptides [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ]. Thirdly, due to the frequent occurrence of Thr and Ser the majority of proteolytic peptides feature multiple potential glycosylation sites, therefore multiple glycans may modify a single peptide in all kinds of combinations. Finally, both N- and O-glycans might be attached to peptides, and telling apart these molecules is not as straightforward as might seem. The glycan databases for these modifications partly overlap, and distinct fragmentation properties are not considered currently by the search engines. Removing N-glycans by PNGase F prior to O-glycosylation analyses only partly addresses this issue as the enzyme is not efficient in the removal of small truncated N-glycans [
      • Chu F.K.
      Requirements of cleavage of high mannose oligosaccharides in glycoproteins by peptide N-glycosidase F.
      ], and there is no similar universal endoglycosidase for the removal of O-glycans (i.e. O-glycosylation also needs to be considered in N-glycosylation studies).
      In the current study we used software that was traditionally developed, and thus, the first step is finding the peptide that is glycosylated. This can be achieved successfully from either HCD or ET(hc)D spectra. We strongly feel these data should be used in concert. For O-glycopeptide assignments the presence of the gas-phase deglycosylated peptide ion, i.e. Y0 in the HCD spectra should be required or at least highly valued, even if the glycopeptide is assigned from ET(hc)D data. Examples for the necessary presence of Y0 are presented in Figure S5, and in the Evaluation/Comments column of Table S22. The majority of Y0 fragments were observed as singly or doubly charged, rather abundant ions. However, long, low charge-density peptide sequences might produce Y0 ions out of the monitored mass range, while highly glycosylated shorter peptides may yield less abundant Y0 fragments. Statistical assessment of the presence and the intensity of the Y0 ion as a function of peptide length or composition is out of the scope of the present study, however we encourage future software development in this direction.
      As the next step we have to agree on a minimum number of peptide fragments to accept an assignment as reliable. Furthermore, these fragment ions have to represent both ends of the peptide, and b-y pairs formed by the preferential N-terminal cleavage at Pro residues should not count as independent proof, since these do not convey additional information about the sequence. Covering both ends of the sequence is especially important for long peptides that frequently do not yield abundant Y0 fragments within the monitored mass range. The importance of this rule is illustrated with an example where a C-terminal sequence tag as well as Y0 and Y1 were detected in the HCD spectrum (Figure 4, upper panel), and O-Pair identified the glycopeptide as LTLSGLSK modified with HexNAc2Hex2NeuAc2 (see Table S22, Sheet A, file 18041707, scans 5632 and 5854), even though the EThcD interpretation by PP pointed to VATTVISK modified with two trisaccharides (Figure 4, lower panel). This misidentification was not a one-hit wonder: HCD data most likely derived from VATTVISK were assigned to a series of different sequences (LTLSGLSK, SILSALSK, GLTVTLSK, VLTTGLSK), each featuring the same accurate mass (MH+ 818.498) and, as I and L are isobaric, the same 3 C-terminal residues. HCD data with a more comprehensive fragment ion series enabled the unambiguous assignment of VATTVISK, as illustrated on the later eluting isomeric glycoform bearing the core-2 hexasaccharide (Figure S6).
      Figure 4. MS/MS data of VAT(NeuAcGalGalNAc)T(NeuAcGalGalNAc)VISK misidentified as LTLS(HexNAc2Hex2NeuAc2)GLSK. HCD data fit both sequences, while ions supporting only the first glycopeptide, and also providing site assignments are highlighted in blue. Ions related to glycan fragmentation (depicted in red) fit both glycopeptides. ‘pr’ labels the precursor ion, m/z 710.989(3+) and its charge-reduced form. The asterisk (*) indicates the singly charged form of a coeluting 2+ ion. ‘p’ as well as Y0 stands for the peptide. Sugar units still attached to it are listed (in red), oxonium ions are labeled with SNFG symbols.
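      The mass coincidence behind this misidentification can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: it sums standard monoisotopic residue masses for the five sequences named above and shows that all of them give the same singly protonated peptide mass. The helper name `mh_plus` is hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues occurring in the five sequences.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
           "T": 101.04768, "L": 113.08406, "I": 113.08406, "K": 128.09496}
WATER, PROTON = 18.01056, 1.00728

def mh_plus(sequence):
    """Singly protonated monoisotopic mass of an unmodified peptide (hypothetical helper)."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER + PROTON

# Correct sequence first, followed by the four sequences reported for the same HCD data.
candidates = ["VATTVISK", "LTLSGLSK", "SILSALSK", "GLTVTLSK", "VLTTGLSK"]
masses = {seq: mh_plus(seq) for seq in candidates}

for seq, mass in masses.items():
    # The last three residues are (I/L)SK in every case, so a short C-terminal
    # sequence tag plus Y0 cannot separate the candidates.
    print(f"{seq}  MH+ = {mass:.3f}  C-terminal tag = {seq[-3:]}")
```

      Because the five candidates are isomeric, neither the Y0 mass nor the precursor mass discriminates between them; only fragments covering the N-terminal part of the sequence can.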
      This example also illustrates that information from the spectrum obtained by the other activation method (e.g. confirmatory or contradictory information in the HCD and ET(hc)D data of the same precursor) could make or break the original assignment. However, presently O-Pair performs the primary identification from HCD spectra, and the corresponding EThcD data are considered for further support only and for glycan localization assignment(s). We believe that an alternative, reverse O-Pair workflow is also desirable, since we encountered several instances where the contribution of the HCD data was very limited, while EThcD provided sequence coverage as well as glycan localization information. As an example, from the HCD-EThcD spectra shown in Figure 5 (Table S22, file 18041713, scans 5552 & 5553) LNAT(HexNAc2Hex2NeuAc2Ac)LR was assigned by O-Pair. However, the HCD data do not feature the corresponding Y0 ion (m/z 687.415), and the peptide sequence ions on which the identification is based are of very weak intensity. Since the characteristic oxonium ions (316, 334) of N,O-diacetyl neuraminic acid are also absent, the glycan composition has to be incorrect. Starting with the EThcD data could have prevented this misidentification, as PP identified IPTNAR bearing the blood-type A antigen (HexNAc3Hex2FucNeuAc) with nearly complete sequence coverage, and the Y0 ion detected in HCD provided further support.
      Figure 5. HCD and EThcD data of m/z 681.300 (3+) identified as LNAT(HexNAc2Hex2NeuAcNeuAcAc)LR by O-Pair (score: 15.1, Q: 0.0025). A reverse O-Pair workflow, i.e. starting from the EThcD spectrum could have delivered the correct assignment as IPT(HexNAc3Hex2FucNeuAc)NAR (from PP search results). The size of the peptide is verified by the Y0 ion in the HCD. Fragments supporting the correct assignment are printed in blue, shared ions are in black, unique fragments for the original (incorrect) assignment are in red. In the EThcD spectrum the ion marked with * represents the charge-reduced form of a coeluting 2+ precursor.
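      The two competing interpretations of this precursor are indistinguishable at the MS1 level, which is why the Y0 ion carries so much weight. As an illustrative sketch only (standard monoisotopic masses; the function names are hypothetical and "Ac" denotes the extra O-acetyl group), the calculated 3+ precursor m/z and the singly charged Y0 m/z of both candidates can be compared as follows:

```python
RESIDUE = {"A": 71.03711, "P": 97.05276, "T": 101.04768, "L": 113.08406,
           "I": 113.08406, "N": 114.04293, "R": 156.10111}
SUGAR = {"HexNAc": 203.07937, "Hex": 162.05282, "Fuc": 146.05791,
         "NeuAc": 291.09542, "Ac": 42.01057}  # "Ac": O-acetyl on a sialic acid
WATER, PROTON = 18.01056, 1.00728

def peptide_mass(sequence):
    """Neutral monoisotopic mass of the unmodified peptide."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER

def glycan_mass(composition):
    """Additive (residue) mass of a glycan composition."""
    return sum(SUGAR[unit] * n for unit, n in composition.items())

def mz(neutral_mass, charge):
    return (neutral_mass + charge * PROTON) / charge

candidates = {
    "LNATLR": {"HexNAc": 2, "Hex": 2, "NeuAc": 2, "Ac": 1},    # O-Pair assignment
    "IPTNAR": {"HexNAc": 3, "Hex": 2, "Fuc": 1, "NeuAc": 1},   # PP assignment
}

precursor = {}
y0 = {}
for seq, comp in candidates.items():
    precursor[seq] = mz(peptide_mass(seq) + glycan_mass(comp), 3)
    y0[seq] = mz(peptide_mass(seq), 1)
    print(f"{seq}: precursor(3+) = {precursor[seq]:.3f}, Y0(1+) = {y0[seq]:.3f}")
```

      Both candidates give practically the same calculated precursor m/z, close to the observed value, whereas their Y0 ions differ by about 16 Da, so the presence or absence of the expected Y0 ion in HCD separates them directly.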
      Obviously, once the peptide is assigned the mass of the glycan composition is also revealed, usually unambiguously. The Y0 requirement eliminates interferences between the potential peptide modifications and glycan composition changes. For example, without knowing the peptide mass accurately a 16 Da mass difference may be translated into a Met oxidation or a Fuc-Hex or a NeuAc-NeuGc difference in the oligosaccharide as discussed above. Thus, the glycan composition calculated from the mass difference of Y0 and the measured mass of the molecule should be translated into individual glycans. Obviously, the higher the level of glycosylation the harder this job becomes. To make this process a bit easier, we could and should rely on diagnostic oxonium ions to ascertain the presence of certain building units in the modifying glycans. The HexNAc related m/z 204.087 ion has already become the hallmark of most glycosylation studies, enabling the most efficient HCD-product-ion-dependent ET(hc)D data acquisition approach [
      • Saba J.
      • Dutta S.
      • Hemenway E.
      • Viner R.
      Increasing the Productivity of Glycopeptides Analysis by Using Higher-Energy Collision Dissociation-Accurate Mass-Product-Dependent Electron Transfer Dissociation.
      ] that was successfully applied in both N- and O-glycosylation analyses. Sialic acids also produce abundant diagnostic ions that could also be exploited during automated data interpretation. Accordingly, when the diagnostic 292 & 274 ions are not present in the HCD spectra, Neu5Ac containing glycans should not be permitted in the assignments. Similarly glycolyl, O-acetyl or O,O-diacetyl sialic acids produce diagnostic fragments [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Medzihradszky K.F.
      • Kaasik K.
      • Chalkley R.J.
      Characterizing sialic acid variants at the glycopeptide level.
      ]. High mass accuracy measurement affords unambiguous detection of all these ions. We attempted to identify data derived from O-acetyl sialic acid containing glycopeptides by filtering for the presence of m/z 316.1027, formed via water loss from the Neu5,9Ac oxonium ion. ∼4% of the O-glycopeptide identifications featured this ion among the top twenty most abundant fragment ions in HCD. However, manual evaluation of a subset of spectra indicated that some correct identifications did not make the intensity cut-off (see Table S22, Sheet A, evaluation/comments). At the same time, in 47 assignments the identified glycan did not feature O-acetyl sialic acid, despite the strong presence of its diagnostic ions (Table S22, Sheet A). Obviously, in a complex mixture precursor ion interference might also be blamed for the detection of such “reporter ions”, but we believe it is worthwhile to have a second look at the original assignments. At the same time, the lack of the above mentioned diagnostic fragments in the HCD spectra is a good indication that such units are not part of the modifying glycans.
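      The diagnostic-ion requirement described above could be sketched as a simple gate on the candidate glycan compositions. The code below is only an illustration under stated assumptions: the 10 ppm tolerance, the function names and the example peak list are hypothetical, and a unit is treated as permitted when at least one of its diagnostic ions is detected. The oxonium ion m/z values are the calculated monoisotopic values for the ions named in the text.

```python
# Calculated oxonium ion m/z values (singly charged) per building unit.
DIAGNOSTIC = {
    "HexNAc": [204.0867],
    "NeuAc": [292.1027, 274.0921],     # Neu5Ac and its water-loss ion
    "NeuAcAc": [334.1133, 316.1027],   # O-acetyl Neu5Ac and its water-loss ion
}

def has_ion(peaks, target_mz, ppm=10.0, top_n=None):
    """True if a peak matches target_mz within ppm; optionally only among the top_n most intense peaks."""
    if top_n is not None:
        peaks = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    tolerance = target_mz * ppm * 1e-6
    return any(abs(mz - target_mz) <= tolerance for mz, _ in peaks)

def supported_units(peaks, ppm=10.0):
    """Building units with at least one diagnostic ion in the HCD peak list."""
    return {unit for unit, ions in DIAGNOSTIC.items()
            if any(has_ion(peaks, ion, ppm) for ion in ions)}

def composition_permitted(composition, peaks, ppm=10.0):
    """Reject compositions containing a unit whose diagnostic ions are all absent."""
    supported = supported_units(peaks, ppm)
    return all(unit in supported for unit in composition if unit in DIAGNOSTIC)

# Hypothetical HCD peak list as (m/z, intensity) pairs.
hcd_peaks = [(204.0867, 1.0e5), (292.1027, 8.0e4), (274.0921, 5.0e4), (366.1395, 3.0e4)]

print(composition_permitted({"HexNAc": 1, "Hex": 1, "NeuAc": 1}, hcd_peaks))
print(composition_permitted({"HexNAc": 2, "Hex": 2, "NeuAc": 1, "NeuAcAc": 1}, hcd_peaks))
```

      The `top_n` argument mirrors the top-twenty intensity filter used for m/z 316.1027; as noted above, such an intensity cut-off can exclude correct identifications, so it is left optional here.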
      Additional glycan fragment ions, though they might not be specific for any glycan structure, may further enhance the reliability of glycopeptide identifications when used in combination. It is again beneficial to use HCD and EThcD data in concert. Lower m/z glycan ions tend to be more abundant in HCD, while in EThcD single bond cleavages dominate and even larger glycans up to seven monosaccharide units might survive the mild activation (15% NCE) [
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ]. The fragment ions m/z 407 and 569 indicate the presence of a glycan with HexNAc2 and HexNAc2Hex connectivity, respectively, and may help to distinguish a core-2 glycan from two core-1 type modifications [
      • Park G.W.
      • Lee J.W.
      • Lee H.K.
      • Shin J.H.
      • Kim J.Y.
      • Yoo J.S.
      Classification of Mucin-Type O-Glycopeptides Using Higher-Energy Collisional Dissociation in Mass Spectrometry.
      ]. These ions tend to show up in HCD, while they are typically missing from EThcD as multiple glycosidic bond cleavages are necessary to generate them. These observations are illustrated with the earlier mentioned VATTVISK glycoforms: the peptide carrying two core-1 trisaccharides does not display the above ions (Figure 4), while the peptide decorated with the core-2 hexasaccharide, albeit weakly, does produce these ions in HCD (Figure S6). Fucose containing glycans also may produce characteristic ions. While the HexNAcHexFuc oxonium ion at m/z 512 can be observed both in HCD and EThcD, larger Fuc-containing B ions characteristic to the A and B antigens (m/z 715 for HexNAc2HexFuc and 674 for HexNAcHex2Fuc, respectively) can frequently be detected in EThcD (Figure 5 and Figure S7). Similarly, glycans with disialic acid units might yield oxonium ions at m/z 583 for Neu5Ac2 and at m/z 625 for Neu5AcNeu5,9Ac in EThcD [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ], while their respective water loss ions might be detected in HCD. Similarly, the Y ions, especially in EThcD, such as Fuc, HexNAc, Hex losses from the precursor ion, may reveal the identity of terminal structures (see Figure 5 scheme). While all these bits of information may help to correctly assign the glycan composition and even the individual glycans, our efforts might be undermined by the fact that Na- and K-adduct formation may occur, and/or monoisotopic precursor masses cannot always be determined unambiguously, i.e. the mass difference between the peptide and the precursor ion may be misleading (adduct formation) or may not be accurate (faulty peak-picking). We noticed that several abundant glycan fragments retained the metal ion in Na-adduct spectra; their usefulness in the assignments has to be evaluated further. Faulty peak-picking is harder to correct. The most typical example of this is the recurring Fuc2 vs NeuAc question. As mentioned earlier, Fuc-loss indicating Y-fragments may hold the answer.
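      The size of these mass ambiguities is easy to quantify. The sketch below is illustrative only and uses standard monoisotopic masses: it shows that the three "16 Da" alternatives mentioned earlier correspond to exactly the same mass shift (one oxygen), and that Fuc2 differs from NeuAc by almost exactly one 13C isotope spacing, so a precursor picked one isotope peak off can turn one composition into the other. The 2130 Da glycopeptide mass used for the ppm estimate roughly corresponds to the VATTVISK hexasaccharide glycoform and serves only as an example.

```python
MONO = {"Hex": 162.05282, "Fuc": 146.05791, "NeuAc": 291.09542, "NeuGc": 307.09033}
OXYGEN = 15.99491        # Met -> Met-sulfoxide
C13_SPACING = 1.00335    # mass difference between 13C and 12C

# Three interpretations of a +16 Da difference; each is the addition of one oxygen.
sixteen_da = {
    "Met oxidation": OXYGEN,
    "Fuc -> Hex": MONO["Hex"] - MONO["Fuc"],
    "NeuAc -> NeuGc": MONO["NeuGc"] - MONO["NeuAc"],
}

# Fuc2 vs NeuAc: about 1.02 Da apart, i.e. close to one isotope spacing.
fuc2_minus_neuac = 2 * MONO["Fuc"] - MONO["NeuAc"]
residual = fuc2_minus_neuac - C13_SPACING

def ppm(delta, mass):
    return delta / mass * 1e6

for label, delta in sixteen_da.items():
    print(f"{label}: {delta:.5f} Da")
print(f"Fuc2 - NeuAc = {fuc2_minus_neuac:.5f} Da, residual after a one-isotope shift = {residual:.5f} Da")
print(f"residual on a 2130 Da glycopeptide = {ppm(residual, 2130.0):.1f} ppm")
```

      The residual is in the low ppm range for typical glycopeptide masses, so precursor mass accuracy alone may not resolve the Fuc2 vs NeuAc question when the monoisotopic peak is uncertain; independent evidence such as Fuc-loss Y-fragments or sialic acid oxonium ions is needed.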
      Preferential single bond cleavages in EThcD may yield larger B ions that can confirm the identity of the glycans. For example, m/z 1313 confirms that a core-2 hexasaccharide decorates the peptide (see Figure S6). We did detect such B-fragments: core-1 tetra- and pentasaccharides (m/z 948, 990, 1032 for GalNAc(NeuAc)GalNeuAc with 0-2 Neu5,9Ac and 1239, 1281, 1323 for GalNAc(NeuAc2)GalNeuAc with 0-2 Neu5,9Ac) and core-2 hexasaccharides (m/z 1313, 1355, 1397 for GalNAc(GlcNAcGalNeuAc)GalNeuAc with 0-2 Neu5,9Ac and 1168 for GalNAc(GlcNAcGalFuc)GalNeuAc representing a H antigen capping unit), even a core-2 heptasaccharide decorated with the A antigen (m/z 1371 for GalNAc(GlcNAcGalNeuAc)Gal(Fuc)GlcNAc) could sporadically be detected [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ,
      • Pap A.
      • Klement E.
      • Hunyadi-Gulyas E.
      • Darula Z.
      • Medzihradszky K.F.
      Status Report on the High-Throughput Characterization of Complex Intact O-Glycopeptide Mixtures.
      ]. These ions are typically of low intensity hence it is unlikely that precursor ion interference would be responsible for their presence. Thus, their detection could be rewarded during automated data interpretation.
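      Rewarding such fragments during automated interpretation requires only a lookup of calculated B-ion masses. As a minimal sketch (standard monoisotopic residue masses; `b_ion_mz` is a hypothetical helper, and "Ac" denotes one O-acetyl group), the nominal m/z values quoted in this and the preceding paragraphs can be reproduced from the glycan compositions:

```python
import math

SUGAR = {"HexNAc": 203.07937, "Hex": 162.05282, "Fuc": 146.05791,
         "NeuAc": 291.09542, "Ac": 42.01057}
PROTON = 1.00728

def b_ion_mz(composition):
    """m/z of the singly charged B (oxonium) ion of a glycan composition."""
    return sum(SUGAR[unit] * n for unit, n in composition.items()) + PROTON

# Nominal m/z quoted in the text -> composition of the fragment.
QUOTED = {
    407: {"HexNAc": 2},
    569: {"HexNAc": 2, "Hex": 1},
    512: {"HexNAc": 1, "Hex": 1, "Fuc": 1},
    715: {"HexNAc": 2, "Hex": 1, "Fuc": 1},              # A antigen related
    674: {"HexNAc": 1, "Hex": 2, "Fuc": 1},              # B antigen related
    583: {"NeuAc": 2},
    625: {"NeuAc": 2, "Ac": 1},
    948: {"HexNAc": 1, "Hex": 1, "NeuAc": 2},            # core-1 tetrasaccharide
    1239: {"HexNAc": 1, "Hex": 1, "NeuAc": 3},           # core-1 pentasaccharide
    1313: {"HexNAc": 2, "Hex": 2, "NeuAc": 2},           # core-2 hexasaccharide
    1168: {"HexNAc": 2, "Hex": 2, "Fuc": 1, "NeuAc": 1},
    1371: {"HexNAc": 3, "Hex": 2, "Fuc": 1, "NeuAc": 1},
}

calculated = {nominal: b_ion_mz(comp) for nominal, comp in QUOTED.items()}
for nominal, exact in calculated.items():
    print(f"{nominal}: {exact:.4f}")
```

      Each additional O-acetyl group adds 42.0106 Da, which gives the 948/990/1032, 1239/1281/1323 and 1313/1355/1397 series listed above.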
      Preferential single bond cleavages in EThcD also can provide further insights into the glycan structure. Sialic acid related B ions showed that the O-acetyl sialic acid in core-1 type glycans was Gal-linked in the tetrasaccharide, but decorated the GalNAc when present as a terminal unit in disialic acid in the pentasaccharide [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ]. Furthermore, the single bond cleavages also enabled the assignment of the O-acetyl sialic acid position in chromatographically resolved glycoforms bearing core-2 hexasaccharides [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ] and characterization of isomeric oligosaccharides displaying the blood-type antigen A on different arms of the core-2 glycan [
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ]. Since in EThcD the Y-ion formation is also controlled by preferential single bond cleavages the terminal positions of single sugar units as well as multi unit assemblies can be verified from these fragments. We demonstrated that based on the fragmentation pattern isomeric core-2 glycans can be distinguished, for example, it can be determined whether the A blood-type determinant is located on the core GalNAc or on the GlcNAc linked to it [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ].
      In summary, HCD and EThcD data should be used in concert. We recommend the following for more reliable
      peptide identification:
      • 1) Y0 rule - fine-tuned, based on further statistical analysis (HCD)
      • 2) minimum 5 independent backbone fragments, covering both termini - data from both activation methods should be considered
      • 3) “reverse O-Pair workflow”, i.e. starting with the EThcD data
      glycan composition assignment:
      • 1) Y0 rule - this is the only way to exclude fortuitous peptide modifications (HCD)
      • 2) Diagnostic fragment ion requirement for the different sialic acids (HCD, EThcD)
      • 3) Reward for diagnostic building unit losses, such as HexNAc, Hex, Fuc loss from the precursor (EThcD).
      individual glycan assignment:
      • 1) Reward for the detection of some characteristic oligosaccharide fragments in HCD and in EThcD - should be fine-tuned by statistical analysis [
        • Park G.W.
        • Lee J.W.
        • Lee H.K.
        • Shin J.H.
        • Kim J.Y.
        • Yoo J.S.
        Classification of Mucin-Type O-Glycopeptides Using Higher-Energy Collisional Dissociation in Mass Spectrometry.
        ]
      • 2) EThcD Y and B fragment evaluation, considering the single bond cleavage rule (at NCE 15%)
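      To make the second peptide identification rule concrete, one possible reading of it is sketched below. This is an assumption-laden illustration, not an implementation used in this study: "covering both termini" is interpreted as at least one N-terminal (b or c) and one C-terminal (y or z) fragment, a complementary b/y pair from cleavage N-terminal to Pro is counted as a single piece of evidence, and the peptide and fragment lists are hypothetical.

```python
N_TERM_SERIES = {"b", "c"}
C_TERM_SERIES = {"y", "z"}

def count_independent(peptide, fragments):
    """Count matched backbone fragments, counting a b/y pair at an X-Pro bond only once.

    fragments: iterable of (series, index), e.g. ("b", 2) or ("y", 5).
    """
    n = len(peptide)
    by_site = {}  # cleavage site k = bond between residues k and k+1 (1-based)
    for series, index in set(fragments):
        if not 0 < index < n:
            continue  # ignore annotations that are not true backbone fragments
        site = index if series in N_TERM_SERIES else n - index
        by_site.setdefault(site, set()).add(series)
    total = 0
    for site, series in by_site.items():
        total += len(series)
        # peptide[site] (0-based) is the residue C-terminal to the cleaved bond
        if peptide[site] == "P" and {"b", "y"} <= series:
            total -= 1
    return total

def passes_backbone_rule(peptide, fragments, minimum=5):
    seen = {series for series, _ in fragments}
    both_termini = bool(seen & N_TERM_SERIES) and bool(seen & C_TERM_SERIES)
    return both_termini and count_independent(peptide, fragments) >= minimum

# Hypothetical peptide and fragment annotations, for illustration only.
peptide = "AVPTISK"
weak = [("b", 2), ("y", 5), ("y", 4), ("y", 3), ("y", 2)]   # b2/y5 pair at Val-Pro
better = weak + [("y", 1)]
one_sided = [("y", i) for i in range(1, 7)]

print(passes_backbone_rule(peptide, weak))
print(passes_backbone_rule(peptide, better))
print(passes_backbone_rule(peptide, one_sided))
```

      Fragments from HCD and EThcD spectra of the same precursor can be pooled into one `fragments` list, in line with the recommendation that data from both activation methods be considered.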
      Finally, we have some recommendations for telling apart data acquired on N- or O-linked glycopeptides. While larger glycans can frequently be assigned to the appropriate site type by knowing the glycan biosynthetic pathways, some smaller structures can easily be misinterpreted. For example, the truncated GlcNAc1-2 glycan can be misinterpreted as 1-2 Tn antigens (GalNAc), or the paucimannose GlcNAc2Man2 can be mistaken for two T antigens (GalNAcGal) or an asialo core-2 O-glycan (Gal(GalGlcNAc)GalNAc).
      The different relative intensities of Y ions observed in HCD could be exploited here. In HCD the Y1 ion is always more abundant than Y0 for N-linked glycopeptides, and respective peptide sequence ions typically carry the innermost GlcNAc. On the other hand, O-linked structures with up to 5-6 monosaccharide units tend to produce more abundant Y0 ions and peptide sequence ions are predominantly present fully deglycosylated. Therefore, glycopeptides with only N-linked structures or carrying both N- and O-glycans could be identified based on the Y1/Y0 ratio, and peptide sequence ions observed in HCD and EThcD can resolve the finer structural details. Furthermore, the oxonium ion intensity profile of GlcNAc and GalNAc is different [
      • Halim A.
      • Westerlind U.
      • Pett C.
      • Schorlemer M.
      • Rüetschi U.
      • Brinkmalm G.
      • Sihlbom C.
      • Lengqvist J.
      • Larson G.
      • Nilsson J.
      Assignment of saccharide identities through analysis of oxonium ion fragmentation profiles in LC-MS/MS of glycopeptides.
      ] and can be exploited to strengthen the identifications especially if only either GlcNAc or GalNAc is present in the glycan.
      Even without further software development, the presence of an N-glycosylation consensus motif and the ion intensity ratio of m/z 138 and 144 should be included in the results output of the software as already performed by O-Pair.
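      Such an output annotation is simple to generate. The following sketch is illustrative only: the function names and intensity values are hypothetical, the consensus motif is taken as N-X-S/T with X not Pro, and the flag merely marks candidates for a second look when a sequon is present and Y1 is more abundant than Y0 in HCD. The m/z 138/144 ratio is reported without interpretation.

```python
import re

# N-glycosylation consensus motif N-X-S/T (X != Pro); lookahead finds overlapping sites.
SEQUON = re.compile(r"(?=N[^P][ST])")

def sequon_positions(peptide):
    """1-based positions of Asn residues that sit in a consensus motif."""
    return [match.start() + 1 for match in SEQUON.finditer(peptide)]

def intensity_ratio(numerator, denominator):
    return float("inf") if denominator == 0 else numerator / denominator

def annotate(peptide, y0, y1, ox138=None, ox144=None):
    """Collect the N- vs O-glycosylation hints discussed above for one HCD spectrum."""
    report = {
        "sequon_positions": sequon_positions(peptide),
        "Y1/Y0": intensity_ratio(y1, y0),
    }
    if ox138 is not None and ox144 is not None:
        report["138/144"] = intensity_ratio(ox138, ox144)
    # Flag only; the final decision still needs the sequence ions from HCD and EThcD.
    report["check_for_N_glycan"] = bool(report["sequon_positions"]) and y1 > y0
    return report

# Peptide from the text; the intensities are made-up example values.
print(annotate("GPVPTPPDNIQVQENFNISR", y0=2.0e4, y1=9.0e4, ox138=5.0e4, ox144=1.0e4))
print(annotate("VATTVISK", y0=9.0e4, y1=2.0e4))
```

      For the Protein AMBP peptide the motif is found at position 17 of the peptide, whereas VATTVISK contains no consensus motif and is never flagged.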

      O-glycosylation landscape of selected proteins

      The rules drawn up above are based on our observations during the manual interpretation of hundreds of MS/MS spectra. Evidently, manual evaluation of all identifications is unattainable, therefore we decided to investigate a few proteins in detail, partly to further scrutinize the reliability of the assignments, and also to validate modification sites and characterize microheterogeneity. Probably the most exciting subset of proteins are those with blood group antigens, as currently our knowledge on the occurrence of these structures on individual proteins and sites is quite limited.
      Altogether, 683 PSMs (derived from 405 precursor selections), 97 sequences from 49 proteins, represented glycopeptides carrying ABO blood-group antigens on core-2 O-glycans (HexNAc2Hex2Fuc1NeuAc1, HexNAc3Hex2Fuc1NeuAc1 or HexNAc2Hex3Fuc1NeuAc1 for the H, A or B antigen, respectively). Strangely, over half of the PSMs (347/683) signaled the presence of B antigens, although seven of the ten donors were of blood groups A or O (the blood group of two donors was unknown, see Table S1). We also observed that the B antigen carrying glycoforms of ITIH4 peptides coeluted with glycoforms bearing HexNAc2Hex2NeuAc2. Glycoforms featuring more sialic acids usually elute later than less acidic ones when formic acid is used in the mobile phase. Thus, we believe the +1329 Da glycoform here represents an ammonium adduct of that hexasaccharide (additive mass: 1312.455+17.027=1329.482) instead of the assigned HexNAc2Hex3Fuc1NeuAc1 structure (additive mass: 1329.471). Incomplete removal of the ammonium salt used during the affinity chromatography makes such adduct formation feasible. We wanted to identify a reliable set of site-specific O-linked blood-type identifications, and since we have already encountered several dubious assignments associated with longer sequences, we have decided to remove peptides longer than 16 residues from the list. Furthermore, a candidate that featured the N-glycosylation sequon was also discarded along with those assignments that were not corroborated by both HCD and EThcD data. The remaining subset consists of 10 peptides representing 7 proteins (Fractalkine, Insulin-like growth factor II, Macrophage colony-stimulating factor 1, Protein HEG homolog 1, Protein YIPF3, SPARC-like protein 1 and TGF-beta receptor type 2) (Table S22, Sheet E). All identifications representing donors with known ABO blood groups matched the expected glycan structures. 
PSMs from the two patients of unknown blood-types (Patients 3 and 10, Table S1) unequivocally indicated that the blood group of these patients is B. Now if we introduce back all PSMs of the ten peptides with the blood-type antigens, the list contains 188 IDs, with only one MS-Filter hit indicating a B-antigen structure incorrectly. These results indicate that the corresponding sites in the above proteins are consistently carrying the ABO blood-group epitopes, and these glycoforms are of significant abundance. Thus, these structures also have to be considered during comprehensive characterization of glycoproteins and blood typing prior to biomarker studies is highly advisable.
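      The ammonium adduct ambiguity discussed above can be reproduced numerically. In this minimal sketch (standard monoisotopic masses; the 1500 Da peptide mass is a hypothetical value chosen only to express the gap in ppm), the additive mass of the B-antigen composition is compared with that of the disialo hexasaccharide plus NH3:

```python
SUGAR = {"HexNAc": 203.07937, "Hex": 162.05282, "Fuc": 146.05791, "NeuAc": 291.09542}
AMMONIA = 17.02655  # NH3, monoisotopic

def glycan_mass(composition):
    """Additive (residue) mass of a glycan composition."""
    return sum(SUGAR[unit] * n for unit, n in composition.items())

hexasaccharide = glycan_mass({"HexNAc": 2, "Hex": 2, "NeuAc": 2})
b_antigen = glycan_mass({"HexNAc": 2, "Hex": 3, "Fuc": 1, "NeuAc": 1})
ammonium_adduct = hexasaccharide + AMMONIA

gap = ammonium_adduct - b_antigen

def gap_ppm(peptide_mass):
    """Relative size of the gap for a glycopeptide with the given peptide mass."""
    return gap / (peptide_mass + b_antigen) * 1e6

print(f"HexNAc2Hex2NeuAc2          : {hexasaccharide:.3f}")
print(f"HexNAc2Hex2NeuAc2 + NH3    : {ammonium_adduct:.3f}")
print(f"HexNAc2Hex3Fuc1NeuAc1      : {b_antigen:.3f}")
print(f"gap: {gap:.4f} Da, {gap_ppm(1500.0):.1f} ppm for a hypothetical 1500 Da peptide")
```

      The two interpretations differ by only about 0.011 Da, which is within typical precursor mass tolerances, so coelution with the HexNAc2Hex2NeuAc2 glycoform and the fragment ions, rather than the precursor mass, have to decide between them.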
      Furthermore, we examined all O-glycopeptide PSMs related to five of the above proteins: SPARC-like Protein 1 (Uniprot Q14515), Insulin-like growth factor II (Uniprot P01344), Fractalkine (Uniprot P78423), TGF-beta receptor type 2 (Uniprot P37173), and Protein HEG homolog 1 (Uniprot Q9ULI3). Our observations are included in the evaluation/comments column of Table S22 sheet A. Please note that we considered all available data (primarily both HCD & EThcD spectra, and occasionally MS1 data and retention times as well) when deciding whether an assignment was correct.
      Earlier we have already demonstrated the microheterogeneity of the C-terminal peptide of YIPF3 (Uniprot Q9GZM5, [
      • Darula Z.
      • Pap Á.
      • Medzihradszky K.F.
      Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
      ,
      • Pap A.
      • Tasnadi E.
      • Medzihradszky K.F.
      • Darula Z.
      Novel O-linked sialoglycan structures in human urinary glycoproteins.
      ]), while Macrophage colony-stimulating factor 1 (Uniprot P09603) did not yield unambiguous sequence confirmation for the blood-group related glycopeptides.
      We compared our findings to data listed in the Uniprot database and in two previous studies that acquired site-specific O-glycosylation data. Uniprot entries for the five proteins selected were compiled from three reports [Nilsson et al., 2009; Halim et al., 2012; Halim et al., 2013], all applying the same capture-and-release workflow to enrich sialic acid-containing glycoproteins. Using CID and ECD activation, the authors characterized glycosylation in human CSF [Nilsson et al., 2009; Halim et al., 2013] and urine [Halim et al., 2012]. Recently, Zhao et al. [2020] also published data on the urinary O-glycoproteome. They enriched O-glycopeptides from tryptic digests using HILIC and identified sialic acid-containing glycoforms separately from glycoforms devoid of this unit, using HCD and EThcD data acquired on PNGase F-treated samples. King et al. [2017] analyzed O-glycopeptides isolated by lectin affinity chromatography from desialylated proteolytic digests of plasma, platelets, and endothelial cell samples. Although the sample source was different in this study, the authors reported on the modification of ∼650 proteins; therefore, we decided to include their results as a reference for the modification sites.
      In summary, the results from different sources showed limited overlap (Figure S8). For the 5 proteins selected, 49 glycosylation sites were identified, but only 2 were reported by all studies, and an additional 4 were found in at least 3 studies. King et al. [2017] and the current study contributed the most unique sites, while most of the sites listed in Uniprot, based on urine [Zhao et al., 2020; Halim et al., 2012] or CSF analysis [Nilsson et al., 2009; Halim et al., 2013], were confirmed by other studies. Figures 6 and 7 show the glycan structures assigned unambiguously to the listed glycosylation sites; Sheet F of Table S22 specifies those additional glycan compositions that were detected by us on certain sequence stretches but could not be resolved from the data.
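The kind of tally summarized above (sites reported by every source, by at least three sources, or by a single source) can be sketched as a simple set comparison. The site labels below are made-up placeholders, not values from Figure S8 or Table S22, and the function name is hypothetical; in practice each site key would also need to carry the protein accession.

```python
from collections import Counter

# Hypothetical site lists per source (placeholders, NOT data from the paper).
sites_by_source = {
    "UniProt": {"T10", "S12", "T25"},
    "Zhao": {"T10", "S12", "T40"},
    "King": {"T10", "T25", "S33", "T51"},
    "this_study": {"T10", "S12", "T25", "S60"},
}


def overlap_summary(sources):
    """Count in how many sources each site occurs and group the sites accordingly."""
    counts = Counter(site for sites in sources.values() for site in sites)
    n_sources = len(sources)
    return {
        "total": len(counts),
        "in_all": sorted(s for s, c in counts.items() if c == n_sources),
        "in_3_or_more_not_all": sorted(
            s for s, c in counts.items() if 3 <= c < n_sources
        ),
        "unique": {
            name: sorted(s for s in sites if counts[s] == 1)
            for name, sites in sources.items()
        },
    }


summary = overlap_summary(sites_by_source)
print(summary["total"], summary["in_all"], summary["in_3_or_more_not_all"])
print(summary["unique"])
```

With real site lists, the same counts could feed an UpSet-style plot of the intersections.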
      Figure 6. Comparison of the O-glycan landscape of Protein HEG homolog 1. O-glycan structures are illustrated following the SNFG recommendations. Glycosylation sites that are reported in the UniProt database are indicated with gray markers. O-glycosylation sites not reported in Uniprot are indicated with red markers. The green line at the N-terminus denotes the signal peptide. Domain- and region-specific information of the protein was collected from the UniProt database: A: EGF-like 1, B: EGF-like 2; calcium-binding.
      Figure 7. Comparison of the O-glycan landscape of four additional proteins identified with O-glycans containing ABO blood group antigens. O-glycan structures are illustrated following the SNFG recommendations. Glycosylation sites that are reported in the UniProt database are indicated with gray markers. O-glycosylation sites not reported in Uniprot are indicated with red markers. The green line at the proteins’ N-termini denotes the signal peptide. Domain- and region-specific information of the proteins was collected from the UniProt database: A) B, C, A and D (marked with a red asterisk) regions of the Insulin-like growth factor II. B) Regions of Fractalkine: A: Chemokine and involved in interaction with ITGAV:ITGB3 and ITGA4:ITGB1, B: Mucin-like stalk. C) Domain A of TGF-beta receptor type-2: Protein kinase. D) Domains of SPARC-like protein 1. A: Follistatin-like, B: Kazal-like, C: EF-hand. The A and B domains overlap. The vertical red line shows the start of the B domain.
      The O-glycosylation studies quoted above analyzed plasma, urine and CSF, and followed different protocols. Thus, it is not entirely surprising that the resulting findings also differed. With sialic acid capture, the release step yields only asialo structures. Thus, diversity is lost, but at the same time this simplification increases the sensitivity by combining the signals of originally different glycoforms, and the glycan database also shrinks accordingly. The sites listed in Uniprot were identified using this enrichment method and a less sensitive mass spectrometric methodology (CID-MS3, ECD), but the data interpretation was performed very carefully [Nilsson et al., 2009; Halim et al., 2012; Halim et al., 2013]. The King study [King et al., 2017] used a selective enrichment method, sacrificing diversity for more efficient glycosylation site determinations. The closest to our approach was the analysis of the urinary proteome by Zhao et al. [2020]. HILIC enrichment of the glycopeptides, followed by HCD and EThcD analysis, could have allowed the identification of most structures we found. However, the glycan database they built did not feature all the glycan structures we reported and searched for. In addition, the vast majority of their assignments are based solely on HCD data, which usually do not permit accurate assignment of the individual glycans and their sites. Furthermore, considering nonspecific cleavages would have been essential for the identification of certain glycosylation sites. For example, the N-terminal peptides of TGF-beta receptor type 2 and SPARC-like protein 1 are not tryptic, but the products of enzymatic processing.

      Summary and Conclusions

      WGA affinity chromatography fractions enriched in both N- and O-glycopeptides from the urine of 10 individuals were analyzed by acquiring HCD and HexNAc oxonium ion-triggered EThcD spectra. Using these data we established some rules about EThcD fragmentation and reported the presence of more than 30 unexpected sialo-glycans, among them even some isomer pairs [Darula et al., 2019; Pap et al., 2020]. These discoveries demonstrate that a preliminary ‘wild card’ or ‘open mass addition’ search can reveal novel or unexpected components that would otherwise be overlooked. For example, the O-acetylated neuraminic acids we reported would not survive oligosaccharide release; thus, their presence would not be suspected even after a glycan-pool analysis.
      In this manuscript we present an in-depth analysis of all the data files focusing on O-glycosylation. We share our raw files as well as our data interpretation methods and lists. We are aware that our compilations contain several incorrect assignments, as pointed out above, but these were not discarded because they illustrate the occurrence, frequency and potential reasons for misinterpretation. Unfortunately, these may occur regularly in any large-scale glycopeptide analysis. A community study on a much simpler, mostly N-glycopeptide mixture already demonstrated that even with the same software very different results can be achieved, and offered guidelines for setting up search parameters for different purposes [Kawahara et al., 2021], although the major question of how to estimate the FPR for glycopeptides has not been solved yet.
      In our study we have chosen rather limiting parameters for database searches and strict acceptance criteria. PTM analysis in general leads to significant search space expansion. O-glycosylation is even more problematic in this respect, since it involves multiple different structures, multiple potential sites within the same peptide sequence, no consensus sequences, and distinctive fragmentation that interferes with the assignment of the underlying amino acid sequence. Considering an enlarged glycan database and permitting each glycan as a modifier multiple times increases the search space exponentially, especially when other search space widening factors characterize the samples, such as non-specific cleavages, N- and O-glycosylation on the same sequence, glycan compositions that may represent single or multiple glycans, glycan combinations that may represent different individual glycans, positional isomers, glycan isomers, or combinations thereof. Thus, we used our newly gained knowledge about the glycan pool in an iterative fashion. Permitting the extended glycan list on a prefiltered protein database seems reasonable and speeds up the process. Limiting the number of modifications also helps but will eliminate correct identifications as well. We used Byonic for the interpretation of both HCD and EThcD data, aiming at the identification of unmodified sequences as well as N- and O-glycopeptides, as this is a commercially available, popular software package used by the glycoscience research community. We explored additional tools for the most comprehensive O-glycopeptide characterization. Protein Prospector was used to assign O-glycopeptides from both HCD and EThcD data. In addition, we used the MS-Filter program to identify potential glycoforms based on the presence of diagnostic Y0 and Y1 fragments. Lastly, we tested the recently developed O-Pair, which uses the data provided by the two activation methods in concert. 
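To give a feel for the search space expansion described above, the following sketch counts site-specific glycoform candidates for a single peptide. It is a deliberately simplified model, assuming that any glycan in the list may occupy any candidate Ser/Thr site and ignoring isomers, N-glycosylation and non-specific cleavages. The function name and the example numbers are illustrative only and do not reflect the search settings used in this study.

```python
from math import comb


def glycoform_candidates(n_sites, n_glycans, max_mods):
    """Number of site-specific glycoforms for one peptide sequence.

    For each i = 0..max_mods, choose i of the n_sites candidate residues
    and assign one of n_glycans glycans to each chosen site.
    """
    upper = min(n_sites, max_mods)
    return sum(comb(n_sites, i) * n_glycans ** i for i in range(upper + 1))


# A peptide with 6 Ser/Thr residues under three hypothetical search settings.
small_db = glycoform_candidates(n_sites=6, n_glycans=10, max_mods=2)
large_db = glycoform_candidates(n_sites=6, n_glycans=50, max_mods=2)
large_db_3 = glycoform_candidates(n_sites=6, n_glycans=50, max_mods=3)

print(small_db, large_db, large_db_3)
```

In this toy model, enlarging the glycan list from 10 to 50 entries multiplies the candidates roughly 24-fold, and allowing a third glycan per peptide adds nearly two further orders of magnitude. This illustrates why restricting either parameter speeds up searches at the cost of missing some correct identifications.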
We have to emphasize that despite the high number of glycopeptide identifications, HCD by itself is not suitable for O-glycopeptide characterization [Darula and Medzihradszky, 2018; Riley et al., J. Proteome Res. 2020], since the number, size, composition and site localization of the modifying glycans cannot be determined. This applies even when an endo-glycoprotease is used for the generation of O-glycopeptides [Riley et al., Anal. Chem. 2020; Vainauskas et al.], as all enzymes may miss cleavages, especially in densely glycosylated sequence stretches, and certain glycans may prevent the proteolysis. In addition, in certain cases partial digestion could be beneficial even with these enzymes, for example, to characterize macro-heterogeneity in combination with site-specific variations. Unfortunately, EThcD, which may deliver much more comprehensive information about glycopeptides, is a much less efficient activation method. Thus, we have to use both sets of fragmentation data together. O-Pair’s attempt at this job is a very promising one, but far from complete. We have already discussed the advantages and drawbacks of the individual methods. Here we would like to draw attention to some common issues. The assignment of long glycopeptides is definitely a challenge, and further investigation is required to determine which software tools or novel analytical methods could improve this situation. Similarly, the presence of both N- and O-glycosylation in a mixture or even on the same peptide represents an unresolved issue, although some initial steps have been taken to address the latter problem. Last but not least, we think that a scoring approach similar to the one Protein Prospector uses for evaluating cross-linked peptides could be more efficient. The amino acid sequence should be assigned based on the peptide fragments, but the glycan B and Y fragments also have to be evaluated. The assignment would be considered reliable only if both ‘halves’ of the molecule received a convincing score. As pointed out in the discussion, the presence of certain diagnostic glycan fragments should be required before even considering structures containing the corresponding sugar units. The retention times of the different glycoforms could also be used to strengthen or weaken certain assignments or, as shown above, to indicate potential non-covalent adduct formation. Obviously, further datasets and detailed investigations are necessary to establish the appropriate rules. 
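The two recommendations above, separate scores for the peptide and glycan ‘halves’ and mandatory diagnostic fragments, can be sketched as a simple acceptance rule. This is not how any existing search engine scores glycopeptides. The class, the function names, the score scale and the thresholds are hypothetical, and only the oxonium ion m/z values are standard.

```python
from dataclasses import dataclass

# Commonly used singly charged oxonium ions (m/z) for two sugar units.
DIAGNOSTIC_IONS = {
    "HexNAc": [204.0867],
    "NeuAc": [274.0921, 292.1027],
}


@dataclass
class Candidate:
    peptide_score: float   # hypothetical score from peptide backbone fragments
    glycan_score: float    # hypothetical score from glycan B and Y fragments
    composition: dict      # e.g. {"HexNAc": 1, "Hex": 1, "NeuAc": 2}


def has_ion(peaks, target, tol_ppm=20.0):
    """True if any observed m/z lies within tol_ppm of the target."""
    return any(abs(mz - target) / target * 1e6 <= tol_ppm for mz in peaks)


def diagnostic_ions_present(composition, peaks, tol_ppm=20.0):
    """Require at least one diagnostic ion for every listed unit in the glycan."""
    for unit, count in composition.items():
        if count > 0 and unit in DIAGNOSTIC_IONS:
            if not any(has_ion(peaks, mz, tol_ppm) for mz in DIAGNOSTIC_IONS[unit]):
                return False
    return True


def accept(candidate, peaks, min_peptide=20.0, min_glycan=10.0):
    """Accept only if both halves score convincingly and the required ions exist."""
    return (
        candidate.peptide_score >= min_peptide
        and candidate.glycan_score >= min_glycan
        and diagnostic_ions_present(candidate.composition, peaks)
    )


sialylated = Candidate(35.0, 18.0, {"HexNAc": 1, "Hex": 1, "NeuAc": 2})
with_neuac_ions = [204.0869, 274.0923, 292.1030, 366.1400]
without_neuac_ions = [204.0869, 366.1400]

print(accept(sialylated, with_neuac_ions), accept(sialylated, without_neuac_ions))
```

A candidate containing NeuAc is rejected here when neither m/z 274.09 nor 292.10 is observed, regardless of how well the peptide backbone fragments match.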
We feel that besides innovative computer programs human intervention is still necessary to assess the reliability of the new data interpretation methods. We hope that our data and our observations will aid the development of such new tools.

      Data availability

      LC-MS raw files, peak lists and the Byonic and O-Pair result files have been uploaded to MassIVE (https://massive.ucsd.edu/ProteoSAFe/static/massive.jsp). Project identifier: MSV000090536. Download via FTP: ftp://[email protected]
      Username: MSV000090536, Password: a
      O-Pair search results (.psmtsv) can be viewed using O-Pair. Byonic search results (.byrslt) can be viewed in Byonic Viewer.
      The search keys for the MS-Viewer files showing the results of Protein Prospector and MS-Filter searches are listed in Table S26.

      Supplemental data

      This article contains supplemental data [Zhao et al., 2020; Darula et al., 2019; Pap et al., 2020; Baker et al., 2010; Chalkley et al., 2020; Domon and Costello, 1988; Nilsson et al., 2009; Halim et al., 2012; Halim et al., 2013; King et al., 2017].

      Acknowledgements

      This work was supported by the following grants: the Economic Development and Innovation Operative Programmes GINOP-2.3.2-15-2016-00001 and GINOP-2.3.2-15-2016-00020. We thank the ELKH Cloud for housing our Protein Prospector server. HCEMM has received funding from the EU’s Horizon 2020 research and innovation program under grant agreement No. 739593.

      References

        • Harris R.J.
        • Leonard C.K.
        • Guzzetta A.W.
        • Spellman M.W.
        Tissue plasminogen activator has an O-linked fucose attached to threonine-61 in the epidermal growth factor domain.
        Biochemistry. 1991; 30: 2311-2314
        • Harris R.J.
        • van Halbeek H.
        • Glushka J.
        • Basa L.J.
        • Ling V.T.
        • Smith K.J.
        • Spellman M.W.
        Identification and structural analysis of the tetrasaccharide NeuAc alpha(2-->6)Gal beta(1-->4)GlcNAc beta(1-->3)Fuc alpha 1-->O-linked to serine 61 of human factor IX.
        Biochemistry. 1993; 32: 6539-6547
        • Hofsteenge J.
        • Müller D.R.
        • de Beer T.
        • Löffler A.
        • Richter W.J.
        • Vliegenthart J.F.
        New type of linkage between a carbohydrate and a protein: C-glycosylation of a specific tryptophan residue in human RNase Us.
        Biochemistry. 1994; 33: 13524-13530
        • Hashii N.
        • Suzuki J.
        • Hanamatsu H.
        • Furukawa J.I.
        • Ishii-Watabe A.
        In-depth site-specific O-Glycosylation analysis of therapeutic Fc-fusion protein by electron-transfer/higher-energy collisional dissociation mass spectrometry.
        Biologicals. 2019; 58: 35-43
        • Stavenhagen K.
        • Gahoual R.
        • Dominguez Vega E.
        • Palmese A.
        • Ederveen A.L.H.
        • Cutillo F.
        • Palinsky W.
        • Bierau H.
        • Wuhrer M.
        Site-specific N- and O-glycosylation analysis of atacicept.
        MAbs. 2019; 11: 1053-1063
        • Napoletano C.
        • Steentoff C.
        • Battisti F.
        • Ye Z.
        • Rahimi H.
        • Zizzari I.G.
        • Dionisi M.
        • Cerbelli B.
        • Tomao F.
        • French D.
        • d'Amati G.
        • Panici P.B.
        • Vakhrushev S.
        • Clausen H.
        • Nuti M.
        • Rughetti A.
        Investigating Patterns of Immune Interaction in Ovarian Cancer: Probing the O-glycoproteome by the Macrophage Galactose-Like C-type Lectin (MGL).
        Cancers. 2020; 12: E2841
        • Pirro M.
        • Schoof E.
        • van Vliet S.J.
        • Rombouts Y.
        • Stella A.
        • de Ru A.
        • Mohammed Y.
        • Wuhrer M.
        • van Veelen P.A.
        • Hensbergen P.J.
        Glycoproteomic Analysis of MGL-Binding Proteins on Acute T-Cell Leukemia Cells.
        J. Proteome Res. 2019; 18: 1125-1132
        • Chernykh A.
        • Kawahara R.
        • Thaysen-Andersen M.
        Towards structure-focused glycoproteomics.
        Biochemical Society transactions. 2021; 49: 161-186
        • Darula Z.
        • Medzihradszky K.F.
        Affinity enrichment and characterization of mucin core-1 type glycopeptides from bovine serum.
        Mol. Cell. Proteomics. 2009; 8: 2515-2526
        • Darula Z.
        • Sherman J.
        • Medzihradszky K.F.
        How to dig deeper? Improved enrichment methods for mucin core-1 type glycopeptides.
        Mol. Cell. Proteomics. 2012; 11 (O111.016774)https://doi.org/10.1074/mcp.O111.016774
        • Darula Z.
        • Sarnyai F.
        • Medzihradszky K.F.
        O-glycosylation sites identified from mucin core-1 type glycopeptides from human serum.
        Glycoconj. J. 2016; 33: 435-445
        • Darula Z.
        • Medzihradszky K.F.
        Analysis of Mammalian O-Glycopeptides-We Have Made a Good Start, but There is a Long Way to Go.
        Mol. Cell. Proteomics. 2018; 17: 2-17
        • Kim J.
        • Ryu C.
        • Ha J.
        • Lee J.
        • Kim D.
        • Ji M.
        • Park C.S.
        • Lee J.
        • Kim D.K.
        • Kim H.H.
        Structural and Quantitative Characterization of Mucin-Type O-Glycans and the Identification of O-Glycosylation Sites in Bovine Submaxillary Mucin.
        Biomolecules. 2020; 10: 636https://doi.org/10.3390/biom10040636
        • Wang S.
        • Qin H.
        • Mao J.
        • Fang Z.
        • Chen Y.
        • Zhang X.
        • Hu L.
        • Ye M.
        Profiling of Endogenously Intact N-Linked and O-Linked Glycopeptides from Human Serum Using an Integrated Platform.
        J. Proteome Res. 2020; 19: 1423-1434
        • Zhao X.
        • Zheng S.
        • Li Y.
        • Huang J.
        • Zhang W.
        • Xie Y.
        • Qin W.
        • Qian X.
        An Integrated Mass Spectroscopy Data Processing Strategy for Fast Identification, In-Depth, and Reproducible Quantification of Protein O-Glycosylation in a Large Cohort of Human Urine Samples.
        Anal. Chem. 2020; 92: 690-698
        • Kawahara R.
        • Ortega F.
        • Rosa-Fernandes L.
        • Guimarães V.
        • Quina D.
        • Nahas W.
        • Schwämmle V.
        • Srougi M.
        • Leite K.R.M.
        • Thaysen-Andersen M.
        • Larsen M.R.
        • Palmisano G.
        Distinct urinary glycoprotein signatures in prostate cancer patients.
        Oncotarget. 2018; 9: 33077-33097
        • Riley N.M.
        • Malaker S.A.
        • Driessen M.D.
        • Bertozzi C.R.
        Optimal Dissociation Methods Differ for N- and O-Glycopeptides.
        J. Proteome Res. 2020; 19: 3286-3301
        • Darula Z.
        • Pap Á.
        • Medzihradszky K.F.
        Extended Sialylated O-Glycan Repertoire of Human Urinary Glycoproteins Discovered and Characterized Using Electron-Transfer/Higher-Energy Collision Dissociation.
        J. Proteome Res. 2019; 18: 280-291
        • Pap A.
        • Tasnadi E.
        • Medzihradszky K.F.
        • Darula Z.
        Novel O-linked sialoglycan structures in human urinary glycoproteins.
        Mol. Omics. 2020; 16: 156-164
        • Rossez Y.
        • Maes E.
        • Lefebvre Darroman T.
        • Gosset P.
        • Ecobichon C.
        • Joncquel Chevalier Curt M.
        • Boneca I.G.
        • Michalski J.C.
        • Robbe-Masselot C.
        Almost all human gastric mucin O-glycans harbor blood group A, B or H antigens and are potential binding sites for Helicobacter pylori.
        Glycobiology. 2012; 22: 1193-1206
        • Jin C.
        • Kenny D.T.
        • Skoog E.C.
        • Padra M.
        • Adamczyk B.
        • Vitizeva V.
        • Thorell A.
        • Venkatakrishnan V.
        • Lindén S.K.
        • Karlsson N.G.
        Structural Diversity of Human Gastric Mucin Glycans.
        Mol. Cell. Proteomics. 2017; 16: 743-758
        • Tanaka-Okamoto M.
        • Hanzawa K.
        • Mukai M.
        • Takahashi H.
        • Ohue M.
        • Miyamoto Y.
        Identification of internally sialylated carbohydrate tumor marker candidates, including Sda/CAD antigens, by focused glycomic analyses utilizing the substrate specificity of neuraminidase.
        Glycobiology. 2018; 28: 247-260
        • Bern M.
        • Kil Y.J.
        • Becker C.
        Byonic: advanced peptide and protein identification software.
        Curr Protoc Bioinformatics. 2012; 13https://doi.org/10.1002/0471250953.bi1320s40
        • Baker P.R.
        • Medzihradszky K.F.
        • Chalkley R.J.
        Improving software performance for peptide electron transfer dissociation data analysis by implementation of charge state- and sequence-dependent scoring.
        Mol. Cell. Proteomics. 2010; 9: 1795-1803
        • Lu L.
        • Riley N.M.
        • Shortreed M.R.
        • Bertozzi C.R.
        • Smith L.M.
        O-Pair Search with MetaMorpheus for O-glycopeptide characterization.
        Nat. Methods. 2020; 17: 1133-1138
        • Chalkley R.J.
        • Medzihradszky K.F.
        • Darula Z.
        • Pap A.
        • Baker P.R.
        The effectiveness of filtering glycopeptide peak list files for Y ions.
        Mol. Omics. 2020; 16: 147-155
        • Baker P.R.
        • Trinidad J.C.
        • Chalkley R.J.
        Modification site localization scoring integrated into a search engine.
        Mol. Cell. Proteomics. 2011; 10 (M111.008078)https://doi.org/10.1074/mcp.M111.008078
        • Domon B.
        • Costello C.E.
        A systematic nomenclature for carbohydrate fragmentations in FAB-MS/MS spectra of glycoconjugates.
        Glycoconjugate J. 1988; 5: 397-409
        • Park G.W.
        • Lee J.W.
        • Lee H.K.
        • Shin J.H.
        • Kim J.Y.
        • Yoo J.S.
        Classification of Mucin-Type O-Glycopeptides Using Higher-Energy Collisional Dissociation in Mass Spectrometry.
        Anal. Chem. 2020; 92: 9772-9781
        • Conway J.R.
        • Lex A.
        • Gehlenborg N.
        UpSetR: an R package for the visualization of intersecting sets and their properties.
        Bioinformatics (Oxford, England). 2017; 33: 2938-2940
        • Chu F.K.
        Requirements of cleavage of high mannose oligosaccharides in glycoproteins by peptide N-glycosidase F.
        J. Biol. Chem. 1986; 261: 172-177
        • Saba J.
        • Dutta S.
        • Hemenway E.
        • Viner R.
        Increasing the Productivity of Glycopeptides Analysis by Using Higher-Energy Collision Dissociation-Accurate Mass-Product-Dependent Electron Transfer Dissociation.
        Int. J. Proteomics. 2012; 2012: 560391
        • Medzihradszky K.F.
        • Kaasik K.
        • Chalkley R.J.
        Characterizing sialic acid variants at the glycopeptide level.
        Anal. Chem. 2015; 87: 3064-3071
        • Pap A.
        • Klement E.
        • Hunyadi-Gulyas E.
        • Darula Z.
        • Medzihradszky K.F.
        Status Report on the High-Throughput Characterization of Complex Intact O-Glycopeptide Mixtures.
        J. Am. Soc. Mass Spectrom. 2018; 29: 1210-1220
        • Halim A.
        • Westerlind U.
        • Pett C.
        • Schorlemer M.
        • Rüetschi U.
        • Brinkmalm G.
        • Sihlbom C.
        • Lengqvist J.
        • Larson G.
        • Nilsson J.
        Assignment of saccharide identities through analysis of oxonium ion fragmentation profiles in LC-MS/MS of glycopeptides.
        J. Proteome Res. 2014; 13: 6024-6032
        • Nilsson J.
        • Rüetschi U.
        • Halim A.
        • Hesse C.
        • Carlsohn E.
        • Brinkmalm G.
        • Larson G.
        Enrichment of glycopeptides for glycan structure and attachment site identification.
        Nat. Methods. 2009; 6: 809-811
        • Halim A.
        • Nilsson J.
        • Rüetschi U.
        • Hesse C.
        • Larson G.
        Human urinary glycoproteomics; attachment site specific analysis of N- and O-linked glycosylations by CID and ECD.
        Mol. Cell. Proteomics. 2012; 11 (M111.013649)
        • Halim A.
        • Rüetschi U.
        • Larson G.
        • Nilsson J.
        LC-MS/MS characterization of O-glycosylation sites and glycan structures of human cerebrospinal fluid glycoproteins.
        J. Proteome Res. 2013; 12: 573-584
        • King S.L.
        • Joshi H.J.
        • Schjoldager K.T.
        • Halim A.
        • Madsen T.D.
        • Dziegiel M.H.
        • Woetmann A.
        • Vakhrushev S.Y.
        • Wandall H.H.
        Characterizing the O-glycosylation landscape of human plasma, platelets, and endothelial cells.
        Blood Adv. 2017; 1: 429-442
        • Kawahara R.
        • Chernykh A.
        • Alagesan K.
        • Bern M.
        • Cao W.
        • Chalkley R.J.
        • Cheng K.
        • Choo M.S.
        • Edwards N.
        • Goldman R.
        • Hoffmann M.
        • Hu Y.
        • Huang Y.
        • Kim J.Y.
        • Kletter D.
        • Liquet B.
        • Liu M.
        • Mechref Y.
        • Meng B.
        • Neelamegham S.
        • Nguyen-Khuong T.
        • Nilsson J.
        • Pap A.
        • Park G.W.
        • Parker B.L.
        • Pegg C.L.
        • Penninger J.M.
        • Phung T.K.
        • Pioch M.
        • Rapp E.
        • Sakalli E.
        • Sanda M.
        • Schulz B.L.
        • Scott N.E.
        • Sofronov G.
        • Stadlmann J.
        • Vakhrushev S.Y.
        • Woo C.M.
        • Wu H.Y.
        • Yang P.
        • Ying W.
        • Zhang H.
        • Zhang Y.
        • Zhao J.
        • Zaia J.
        • Haslam S.M.
        • Palmisano G.
        • Yoo J.S.
        • Larson G.
        • Khoo K.H.
        • Medzihradszky K.F.
        • Kolarich D.
        • Packer N.H.
        • Thaysen-Andersen M.
        Community evaluation of glycoproteomics informatics solutions reveals high-performance search strategies for serum glycopeptide analysis.
        Nat. Methods. 2021; 18: 1304-1316
        • Riley N.M.
        • Malaker S.A.
        • Bertozzi C.R.
        Electron-Based Dissociation Is Needed for O-Glycopeptides Derived from OpeRATOR Proteolysis.
        Anal. Chem. 2020; 92: 14878-14884
        • Vainauskas S.
        • Guntz H.
        • McLeod E.
        • McClung C.
        • Ruse C.
        • Shi X.
        • Taron C.H.
        A Broad-Specificity O-Glycoprotease That Enables Improved Analysis of Glycoproteins and Glycopeptides Containing Intact Complex O-Glycans.
        Anal. Chem. 2022; 94: 1060-1069