
Large-scale Identification of N-linked Intact Glycopeptides in Human Serum using HILIC Enrichment and Spectral Library Search*

  • Qingbo Shu
    Footnotes
    Affiliations
    Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China

    National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China

    Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
  • Mengjie Li
    Footnotes
    Affiliations
    Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China

    Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
  • Lian Shu
    Footnotes
    Affiliations
    Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China

    Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112

    University of Chinese Academy of Sciences, Beijing 100049, China
  • Zhiwu An
    Affiliations
    Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China

    Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
  • Jifeng Wang
    Affiliations
    Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
  • Hao Lv
    Affiliations
    Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China

    Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112

    Research Center for Basic Sciences of Medicine, Basic Medical College, Guizhou Medical University, Guiyang 550025, China
  • Ming Yang
    Affiliations
    Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China

    Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
  • Tanxi Cai
    Affiliations
    Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China

    Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
  • Tony Hu
    Affiliations
    National Center for Mathematics and Interdisciplinary Sciences, Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  • Yan Fu
    Correspondence
    To whom correspondence may be addressed
    Affiliations
    Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China

    Center for Cellular and Molecular Diagnostics, Department of Biochemistry and Molecular Biology, School of Medicine, Tulane University, New Orleans, Louisiana 70112
  • Fuquan Yang
    Correspondence
    To whom correspondence may be addressed
    Affiliations
    Laboratory of Protein and Peptide Pharmaceuticals & Proteomics Laboratory, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China

    University of Chinese Academy of Sciences, Beijing 100049, China
  • Author Footnotes
    * This work was supported by grants from the National Key R&D Program of China (2018YFA0507801 and 2018YFA0507103), the National Natural Science Foundation of China (Grant nos. 91640112, 21607170 and 31670185), the Strategic Priority Research Programs of the Chinese Academy of Sciences (XDA12030202), and the NCMIS CAS. The authors declare that they have no conflicts of interest with the contents of this article.
    This article contains supplemental Figures, Tables, Documents, and Discussion.
    ‖‖ These authors contributed equally to this work.
Open Access. Published: February 26, 2020. DOI: https://doi.org/10.1074/mcp.RA119.001791
      Large-scale identification of N-linked intact glycopeptides in human serum by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is challenging because of the wide dynamic range of serum protein abundances, the lack of a complete serum N-glycan database, and the existence of proteoforms. Here we present a spectral library search method for identifying N-linked intact glycopeptides from N-linked glycoproteins in human serum, with target-decoy and motif-specific false discovery rate (FDR) control. Serum proteins were first separated into low-abundance and high-abundance fractions by acetonitrile (ACN) precipitation. After digestion, the N-linked intact glycopeptides were enriched by hydrophilic interaction liquid chromatography (HILIC), and a portion of the enriched glycopeptides was treated with Peptide-N-Glycosidase F (PNGase F) to generate N-linked deglycopeptides. Both N-linked intact glycopeptides and deglycopeptides were analyzed by LC-MS/MS. From the N-linked deglycopeptide data sets, 764 N-linked glycoproteins, 1699 N-linked glycosites and 3328 unique N-linked deglycopeptides were identified. Four types of N-linked glycosylation motifs (NXS/T/C/V, X≠P) were used to recognize the N-linked deglycopeptides. The spectra of these deglycopeptides were used to construct an N-linked deglycopeptide spectral library and to identify N-linked intact glycopeptides. A database containing 739 N-glycan masses was constructed and used during spectral library search. In total, 526 N-linked glycoproteins, 1036 N-linked glycosites, 22,677 N-linked intact glycopeptides and 738 N-glycan masses were identified under 1% FDR, representing the most in-depth serum N-glycoproteome identified by LC-MS/MS at the N-linked intact glycopeptide level.


      N-linked glycoproteins in human serum have been used in disease diagnosis for decades, such as prostate specific antigen (PSA)
      The abbreviations used are: PSA, prostate specific antigen; ACN, acetonitrile; CA-125, cancer antigen 125; CD, cluster of differentiation; CDG, congenital disorders of glycosylation; CDT, carbohydrate-deficient transferrin; CFH, complement factor H; DDA, data-dependent acquisition; DIA, data-independent acquisition; DTT, 1,4-dithiothreitol; FA, formic acid; FDGP, fractionated deglycopeptides; FGP, fractionated glycopeptides; FDR, false discovery rate; GAG, glycosaminoglycan; GPSM, glycopeptide-spectrum match; HILIC, hydrophilic interaction liquid chromatography; IAM, 2-iodoacetamide; LC, liquid chromatography; LC-MS/MS, liquid chromatography coupled tandem mass spectrometry; MS, mass spectrometry; OSTs, oligosaccharyltransferases; PDGP, deglycopeptides from precipitate; PGP, glycopeptides from precipitate; PTMs, post-translational modifications; RPLC, reversed-phase liquid chromatography; SDGP, deglycopeptides from supernatant; SGP, glycopeptides from supernatant; TFA, trifluoroacetic acid; UDGP, deglycopeptides from un-fractionated serum sample; UGP, glycopeptides from un-fractionated serum sample.
      and cancer antigen 125 (CA-125) (
      • Kailemia M.J.
      • Park D.
      • Lebrilla C.B.
      Glycans and glycoproteins as specific biomarkers for cancer.
      ). Accurate identification of N-linked intact glycopeptides is thus a prerequisite for monitoring their changes across disease states. LC-MS/MS has been widely applied to identify and quantify N-linked glycopeptides, usually with data-dependent acquisition (DDA). In DDA mode, precursor ions are selected for MS/MS acquisition according to their relative intensities in the MS1 scan. To isolate a precursor ion, its m/z value serves as the center of an isolation window, and all ions in this window are fragmented together. Hence, the list of precursor m/z values recorded by the instrument will include the most intense isotopic peak of each precursor ion in its elution profile. Many software tools have been developed to extract MS/MS spectra from the various raw data formats produced by different mass spectrometers. They inspect the isolation window corresponding to each MS/MS spectrum and export the most probable precursor, such as MSConvert (
      • Kessner D.
      • Chambers M.
      • Burke R.
      • Agus D.
      • Mallick P.
      ProteoWizard: open source software for rapid proteomics tools development.
      ), and infer precursor m/z simply by exporting the m/z value recorded on each MS/MS spectrum. If the monoisotopic peak is selected for fragmentation during DDA and recorded as the precursor m/z of the MS/MS spectrum, this streamlined strategy finds the monoisotopic peak easily. However, the monoisotopic peak of an N-linked intact glycopeptide is rarely the most intense peak in its isotope cluster, so it is less likely to be selected as the precursor m/z in DDA than the other isotopic peaks. Therefore, a software tool that can precisely recognize the monoisotopic peaks of N-linked intact glycopeptides is usually needed in glycoproteomic studies. In addition, conjugation with a glycan increases the hydrophilicity of a peptide, so glycopeptides are more difficult to separate by commonly used reversed-phase liquid chromatography (RPLC) than non-glycosylated peptides. Hence, co-elution of N-linked intact glycopeptides in LC and co-fragmentation of their precursor ions in MS are not uncommon in LC-MS analysis of glycopeptides. This can distort the isotopic distribution of each precursor ion and make it difficult to determine its monoisotopic peak or precursor m/z. Software tools that accurately determine the precursor m/z of N-linked intact glycopeptides in MS1 spectra are still under development. To improve the accuracy of precursor m/z, a correction step called precursor ion selection was employed in a recent study after data conversion from vendor-specific binary files to open-format files (
      • Sun S.
      • Shah P.
      • Eshghi S.T.
      • Yang W.
      • Trikannad N.
      • Yang S.
      • Chen L.
      • Aiyetan P.
      • Hoti N.
      • Zhang Z.
      • Chan D.W.
      • Zhang H.
      Comprehensive analysis of protein glycosylation by solid-phase extraction of N-linked glycans and glycosite-containing peptides.
      ). In that study, a software tool called GPQuest, which uses a spectral library search strategy, was applied to N-linked intact glycopeptide identification in human cell lysates. Instead of correcting the precursor m/z afterwards, another tool, pParse, aims to determine the precursor m/z precisely and to separate co-eluted peptides using machine learning (
      • Yuan Z.F.
      • Liu C.
      • Wang H.P.
      • Sun R.X.
      • Fu Y.
      • Zhang J.F.
      • Wang L.H.
      • Chi H.
      • Li Y.
      • Xiu L.Y.
      • Wang W.P.
      • He S.M.
      pParse: a method for accurate determination of monoisotopic peaks in high-resolution mass spectra.
      ). It was embedded into pGlyco 2.0 and was successfully applied to mouse tissue N-linked intact glycopeptide identification through database search (
      • Liu M.Q.
      • Zeng W.F.
      • Fang P.
      • Cao W.Q.
      • Liu C.
      • Yan G.Q.
      • Zhang Y.
      • Peng C.
      • Wu J.Q.
      • Zhang X.J.
      • Tu H.J.
      • Chi H.
      • Sun R.X.
      • Cao Y.
      • Dong M.Q.
      • Jiang B.Y.
      • Huang J.M.
      • Shen H.L.
      • Wong C.C.L.
      • He S.M.
      • Yang P.Y.
      pGlyco 2.0 enables precision N-glycoproteomics with comprehensive quality control and one-step mass spectrometry for intact glycopeptide identification.
      ). Both studies achieved in-depth identification of N-linked intact glycopeptides in biological samples, although their identification strategies differed. Unlike the proteomes of cell and tissue lysates, the human serum proteome has a wider dynamic range in protein abundance. Hence, in-depth identification of N-linked intact glycopeptides in human serum remains a major challenge. To date, the largest spectral library of plasma N-glycopeptides contains 4347 N-linked deglycopeptides corresponding to 1151 plasma glycoproteins (
      • Sajic T.
      • Liu Y.
      • Arvaniti E.
      • Surinova S.
      • Williams E.G.
      • Schiess R.
      • Huttenhain R.
      • Sethi A.
      • Pan S.
      • Brentnall T.A.
      • Chen R.
      • Blattmann P.
      • Friedrich B.
      • Nimeus E.
      • Malander S.
      • Omlin A.
      • Gillessen S.
      • Claassen M.
      • Aebersold R.
      Similarities and Differences of Blood N-Glycoproteins in Five Solid Carcinomas at Localized Clinical Stage Analyzed by SWATH-MS.
      ). This was achieved by sophisticated fractionation of plasma proteins and hydrazide chemistry enrichment of N-linked intact glycopeptides, followed by release of the N-glycans with PNGase F to generate N-linked deglycopeptides and LC-MS/MS identification. Because the glycan binds covalently to the hydrazide beads, this enrichment strategy is better suited to large-scale glycosite mapping than to N-linked intact glycopeptide identification. Notably, spontaneous deamidation of asparagine causes some false positive N-linked glycosites. To reduce such false positives, a two-step enzymatic cleavage strategy was recently developed to specifically identify N-linked glycosites and deglycopeptides: carboxyl groups of aspartic acid (D) were modified with aniline before N-linked glycans were released by PNGase F digestion, so that the generated N-linked deglycopeptides contained new aspartic acid residues that could be cleaved by Asp-N (
      • Sun S.
      • Shah P.
      • Eshghi S.T.
      • Yang W.
      • Trikannad N.
      • Yang S.
      • Chen L.
      • Aiyetan P.
      • Hoti N.
      • Zhang Z.
      • Chan D.W.
      • Zhang H.
      Comprehensive analysis of protein glycosylation by solid-phase extraction of N-linked glycans and glycosite-containing peptides.
      ). Together with the software GPQuest, this two-step cleavage strategy was applied to human serum and 331 N-linked glycoproteins were identified (
      • Sun S.
      • Hu Y.
      • Jia L.
      • Eshghi S.T.
      • Liu Y.
      • Shah P.
      • Zhang H.
      Site-Specific Profiling of Serum Glycoproteins Using N-Linked Glycan and Glycosite Analysis Revealing Atypical N-Glycosylation Sites on Albumin and alpha-1B-Glycoprotein.
      ). If the difference between the plasma and serum proteomes is ignored, the coverage of the serum N-glycoproteome achieved by this two-step cleavage strategy was 28.8% (331/1151).
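      The claim above that the monoisotopic peak of an intact glycopeptide is rarely the most intense one can be illustrated with a deliberately simplified model. The sketch below considers carbon-13 only (other elements are ignored), and the carbon counts are hypothetical: about 50 carbons for a short peptide, plus 84 carbons for a Hex5HexNAc4NeuAc2 glycan (6, 8 and 11 carbons per Hex, HexNAc and NeuAc residue). It is not the method used by pParse or any other tool named here.

```python
from math import comb

C13_ABUNDANCE = 0.0107  # approximate natural abundance of carbon-13


def carbon_isotope_pattern(n_carbons, n_peaks=5):
    """Probability of the M, M+1, ... peaks, counting carbon-13 only (binomial model)."""
    p = C13_ABUNDANCE
    return [
        comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
        for k in range(n_peaks)
    ]


def most_intense_peak(n_carbons):
    """Index of the most intense isotopic peak (0 means the monoisotopic peak)."""
    pattern = carbon_isotope_pattern(n_carbons)
    return max(range(len(pattern)), key=pattern.__getitem__)


# Hypothetical carbon counts, for illustration only.
PEPTIDE_CARBONS = 50
GLYCAN_CARBONS = 5 * 6 + 4 * 8 + 2 * 11  # Hex5HexNAc4NeuAc2 -> 84 carbons

print(most_intense_peak(PEPTIDE_CARBONS))                   # bare peptide
print(most_intense_peak(PEPTIDE_CARBONS + GLYCAN_CARBONS))  # intact glycopeptide
```

      In this model the bare peptide peaks at index 0 and the glycopeptide at index 1, so an intensity-driven DDA selection would tend to record the M+1 peak as the precursor for the larger molecule.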
      In this study, we propose a complete LC-MS/MS workflow to increase the depth of serum N-glycoproteome coverage. First, serum proteins were separated into low-abundance and high-abundance fractions by ACN precipitation. After tryptic digestion, HILIC was used to enrich glycopeptides because of its high specificity in glycopeptide enrichment (
      • Sun S.
      • Hu Y.
      • Jia L.
      • Eshghi S.T.
      • Liu Y.
      • Shah P.
      • Zhang H.
      Site-Specific Profiling of Serum Glycoproteins Using N-Linked Glycan and Glycosite Analysis Revealing Atypical N-Glycosylation Sites on Albumin and alpha-1B-Glycoprotein.
      ,
      • Zhang C.
      • Ye Z.
      • Xue P.
      • Shu Q.
      • Zhou Y.
      • Ji Y.
      • Fu Y.
      • Wang J.
      • Yang F.
      Evaluation of Different N-Glycopeptide Enrichment Methods for N-Glycosylation Sites Mapping in Mouse Brain.
      ,
      • Palmisano G.
      • Larsen M.R.
      • Packer N.H.
      • Thaysen-Andersen M.
      Structural analysis of glycoprotein sialylation -part II: LC-MS based detection.
      ). Second, PNGase F was used to produce N-linked deglycopeptides from a portion of the enriched N-linked intact glycopeptide samples. The N-linked deglycopeptides were further fractionated through high-pH RPLC, analyzed using LC-MS/MS and identified through protein sequence database search using pFind 2.8.8 (
      • Wang L.H.
      • Li D.Q.
      • Fu Y.
      • Wang H.P.
      • Zhang J.F.
      • Yuan Z.F.
      • Sun R.X.
      • Zeng R.
      • He S.M.
      • Gao W.
      pFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry.
      ). From N-linked deglycopeptides data sets, 764 N-linked glycoproteins, 1699 N-linked glycosites and 3328 unique N-linked deglycopeptides were identified. The coverage of serum N-glycoproteome identified by this strategy was increased to 66.4% (764/1151). A spectral library of identified deglycopeptides with theoretical Y0–Y5 ions was generated by pMatchGlyco (version 1.2) (
      • An Z.W.
      • Shu Q.B.
      • Lv H.
      • Shu L.
      • Wang J.F.
      • Yang F.Q.
      • Fu Y.
      N-linked glycopeptide identification based on open mass spectral library search.
      ). N-linked intact glycopeptides were also analyzed using LC-MS/MS. To reduce co-elution of N-linked intact glycopeptides on the LC column, high-pH RPLC was used for their fractionation. Furthermore, the co-elution function of pParse was activated to accurately determine the precursor m/z of N-linked intact glycopeptides. Through open spectral library search and target-decoy FDR control, 526 N-linked glycoproteins were identified in serum using pMatchGlyco. In addition, 1036 N-linked glycosites, 22,677 N-linked intact glycopeptides and 738 N-glycan masses were identified under 1% FDR, representing the most in-depth serum N-glycoproteome identified by LC-MS/MS at both the N-linked intact glycopeptide level and the N-linked glycoprotein level.
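      The theoretical Y0–Y5 ions mentioned above are not defined in this text. As a hedged illustration, the sketch below assumes the common convention that Y0 is the intact peptide and Y1 to Y5 add the residues of the N-glycan core one at a time (HexNAc, HexNAc, Hex, Hex, Hex). The function name and the example peptide mass are hypothetical; the residue masses are standard monoisotopic values. This is not pMatchGlyco's code.

```python
HEXNAC = 203.079373  # monoisotopic residue mass of N-acetylhexosamine (Da)
HEX = 162.052824     # monoisotopic residue mass of hexose (Da)
PROTON = 1.007276    # mass of a proton (Da)

# Assumed order in which core residues are added to go from Y0 to Y5.
CORE_STEPS = [HEXNAC, HEXNAC, HEX, HEX, HEX]


def y_ion_series(peptide_mass, charge=1):
    """Return m/z values of Y0..Y5 for a neutral peptide mass (assumed convention)."""
    neutral = [peptide_mass]
    for residue in CORE_STEPS:
        neutral.append(neutral[-1] + residue)
    return [(m + charge * PROTON) / charge for m in neutral]


# Hypothetical deglycopeptide with a neutral mass of 1000.0 Da.
for i, mz in enumerate(y_ion_series(1000.0)):
    print(f"Y{i}: {mz:.4f}")
```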

      DISCUSSION

      Unlike published workflows for N-linked intact glycopeptide identification, the proposed workflow takes into account the widespread presence of truncated proteins and variable proteoforms in serum. Semi-tryptic digestion and multiple variable PTMs were set during database search. To the best of our knowledge, these settings have not been considered simultaneously in published workflows, because they enlarge the search space. We used a set of glycoprotein standards to validate the semi-tryptic setting, and the results showed that semi-tryptic digestion dramatically increased the number of confident GPSMs for the glycopeptides of these standards. In the case of OVAL, semi-tryptic digestion was necessary to identify its N-linked deglycopeptides. In addition, 16 kinds of variable PTMs were considered when setting tryptic digestion. A comparison of the GPSMs, glycosites and N-glycan masses obtained from these standards and the two human serum data sets under four different sets of search parameters showed that more than 80% of GPSMs were shared between the 16-variable-PTM and 3-variable-PTM settings (supplemental Fig. S7). Therefore, setting 16 variable PTMs did not introduce excessive false positives. We combined the database search results obtained with semi-tryptic digestion plus 3 variable PTMs and with tryptic digestion plus 16 variable PTMs, which were introduced as five subgroups. The combined results were filtered at 1% FDR at the peptide level based on the peptides identified with an Asn deamidation modification, and the remaining peptides were used to construct the serum N-linked deglycopeptide spectral library.
      In this way, the FDR of deglycopeptide identifications was effectively controlled, and tryptic peptides carrying some of the 16 PTMs are matched and scored in parallel with their unmodified versions. This reduces the chance of mismatches that would arise if peptides that truly carry these PTMs were missing from the search space.
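      The 1% peptide-level filter described above can be sketched as a generic target-decoy cutoff. The estimator used here (number of decoys divided by number of targets above a score threshold) is a common choice and is an assumption; the exact formula used by pFind or pMatchGlyco is not stated in this text. The function name and the toy scores are hypothetical.

```python
def filter_at_fdr(matches, max_fdr=0.01):
    """Keep target matches above the lowest score cutoff whose estimated FDR <= max_fdr.

    matches: iterable of (score, is_decoy) pairs; higher scores are better.
    The FDR at a cutoff is estimated as decoys / targets (an assumed estimator).
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    keep = 0  # number of top-ranked entries inside the accepted region
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            keep = i
    return [m for m in ranked[:keep] if not m[1]]


# Toy example: five target matches and one decoy match.
toy = [(10, False), (9, False), (8, False), (7, False), (6, True), (5, False)]
print(len(filter_at_fdr(toy, max_fdr=0.01)))  # the four targets ranked above the decoy
print(len(filter_at_fdr(toy, max_fdr=0.20)))  # all five targets (1 decoy / 5 targets)
```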
      It is critical to validate the GPSMs in large-scale N-linked intact glycopeptide identification. Filtering out the GPSMs whose precursor m/z was assigned to a non-monoisotopic isotopic peak removed most of the false positive GPSMs caused by inaccurate determination of precursor m/z during data format conversion. The accuracy of our method was validated by introducing entrapment glycan masses into the search space. The degree of separation between matched true and entrapment glycan masses in the scatter plot is a good indicator of data quality. It also helps to recognize low-confidence N-glycan masses and glycopeptides, which is critical for completing the human serum N-glycan database.
      The target-decoy method was used for FDR control at the peptide level for deglycopeptide identification. In the proposed method, the target-decoy strategy was also applied to the peptide identifications of GPSMs, so the peptides of GPSMs that pass FDR filtration should be reliable. If the peptide part (including sequence and PTMs) of a GPSM is correct, the mass of the glycan part should also be correct, provided that the precursor mass of the N-linked intact glycopeptide is accurate. However, three conditions can lead to incorrect identification of the glycan part of a GPSM. First, the identified peptide has the correct sequence but incorrect PTMs, because its PTMs are unusual and were not included in the database search. The mass shift introduced by these unusual PTMs may be incorrectly explained as part of the glycan mass, which leads to incorrect identification of the glycan. Because we detected 16 abundant PTMs and included them in the database search, the proportion of N-linked intact glycopeptides with unusual PTMs is expected to be small, and the incorrect glycan identifications caused in this way are negligible. Second, the precursor mass assignment is incorrect, and the mass error causes incorrect glycan identification. Finally, the glycan may carry modifications that change its mass, and the changed mass may happen to equal the mass of another glycan. To evaluate the rate of glycan mass errors caused by these three conditions, we generated entrapment glycan masses and incorporated them into the glycan database (see supplementary Discussion for more details).
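      The reasoning above, in which the glycan mass is whatever remains after subtracting the peptide mass from the precursor mass, can be sketched as a tolerance lookup. The function name, the 10 ppm tolerance, the example peptide mass and the two-entry glycan list are all hypothetical and are not taken from the study.

```python
def match_glycan_mass(precursor_mass, peptide_mass, glycan_masses, tol_ppm=10.0):
    """Return the database glycan mass closest to (precursor - peptide), or None.

    All masses are neutral monoisotopic masses in Da. The tolerance is applied
    relative to the precursor mass (an assumed convention).
    """
    delta = precursor_mass - peptide_mass
    tolerance = precursor_mass * tol_ppm / 1e6
    hits = [g for g in glycan_masses if abs(g - delta) <= tolerance]
    return min(hits, key=lambda g: abs(g - delta)) if hits else None


# Hypothetical mini-database: Hex5HexNAc2 and Hex5HexNAc4NeuAc2 residue masses.
GLYCANS = [1216.422866, 2204.772446]
PEPTIDE = 1000.5  # hypothetical deglycopeptide mass

correct = match_glycan_mass(PEPTIDE + 2204.772446, PEPTIDE, GLYCANS)
# A precursor assigned to the M+1 isotopic peak is about 1.00335 Da too heavy.
shifted = match_glycan_mass(PEPTIDE + 2204.772446 + 1.00335, PEPTIDE, GLYCANS)
print(correct, shifted)
```

      In this toy case the shifted precursor finds no match. With a database of several hundred glycan masses, the same shift could instead coincide with a different entry, which is the second error condition described above.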
      Because of incorrect precursor masses and unknown PTMs, the glycan mass may be erroneous even if the peptide sequence is correct. At the GPSM level, the proportion of incorrect glycan mass identifications was estimated to be less than 10%. Given the limited knowledge of the human serum N-glycome, this estimate is not strict, because the entrapment glycan masses may include glycan masses that genuinely exist. Therefore, the entrapment glycan masses were used for method validation rather than for FDR control.
      N-glycosylation at the non-canonical motif NXV (X≠P) was recently reported in human serum (
      • Sun S.
      • Hu Y.
      • Jia L.
      • Eshghi S.T.
      • Liu Y.
      • Shah P.
      • Zhang H.
      Site-Specific Profiling of Serum Glycoproteins Using N-Linked Glycan and Glycosite Analysis Revealing Atypical N-Glycosylation Sites on Albumin and alpha-1B-Glycoprotein.
      ). However, stringent FDR estimation is needed to limit the false positives caused by including this newly identified motif, an issue that database search strategies for N-linked intact glycopeptide identification have not addressed to date. Here we included this motif in N-linked intact glycopeptide identification and adopted a group-FDR strategy to estimate FDR separately in the different motif subclasses. In the UDGP and FDGP data sets, we also identified 143 deglycopeptides with the NXV motif. By including them in the spectral library, 26 N-linked glycoproteins carrying 29 N-linked glycosites with the NXV motif were identified (supplemental Table S27). They included the previously reported glycosite N63 of alpha-1B-glycoprotein and N68 of albumin (
      • Sun S.
      • Hu Y.
      • Jia L.
      • Eshghi S.T.
      • Liu Y.
      • Shah P.
      • Zhang H.
      Site-Specific Profiling of Serum Glycoproteins Using N-Linked Glycan and Glycosite Analysis Revealing Atypical N-Glycosylation Sites on Albumin and alpha-1B-Glycoprotein.
      ), suggesting that our method was reliable. However, it is unclear how this newly identified motif is different from the canonical N-glycosylation motifs in terms of its recognition and processing by OSTs.
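      The four motif subclasses (NXS, NXT, NXC and NXV, with X≠P) can be recognized with a short pattern match. The sketch below is only an illustration: the function name and the example sequence are hypothetical, and the sequence is assumed to be in uppercase one-letter code. The resulting motif label is the kind of key that a group-wise FDR estimate could be computed on.

```python
import re

# Asn followed by any residue except Pro, then Ser, Thr, Cys or Val.
# The lookahead lets overlapping sequons (e.g. "NNVT") both be reported.
SEQUON = re.compile(r"N(?=[^P]([STCV]))")


def find_sequons(sequence):
    """Return (1-based Asn position, motif class) for every N-X-S/T/C/V sequon."""
    return [(m.start() + 1, "NX" + m.group(1)) for m in SEQUON.finditer(sequence)]


# Hypothetical sequence: N2 is NXS, N5 is followed by Pro (rejected),
# N9 is NXV and N10 is NXT.
print(find_sequons("ANGSNPTKNNVTC"))
```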
      It is well established that rare genetic defects in the assembly, attachment, and processing of glycans will produce disorders that affect multiple systems and lead to congenital disorders of glycosylation (CDG). Many serum glycoproteins are affected in CDG. We suggest that the proposed method can be used for CDG diagnosis and phenotyping. Currently, serotransferrin is used as the analyte marker to diagnose CDG because of its relatively high abundance (
      • Babovic-Vuksanovic D.
      • O'Brien J.F.
      Laboratory diagnosis of congenital disorders of glycosylation type I by analysis of transferrin glycoforms.
      ). In the serum of patients with CDG-1a, the loss of an entire oligosaccharide moiety, i.e. Hex5HexNAc4NANA2, from serotransferrin was identified by MS. In our results, this glycan was identified on all five glycosites of serotransferrin. Compared with established methods based on immuno-capture and MS, our method avoids immuno-capture and provides more detail on serotransferrin glycoforms in a site-specific manner. With the glycopeptides identified here, a spectral library of serotransferrin glycopeptides can be constructed and applied to a DIA workflow for glycoform quantification. This will contribute to the precise diagnosis of CDG.
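      As a worked example of how a composition such as Hex5HexNAc4NANA2 maps to a glycan mass, the sketch below parses the composition string and sums standard monoisotopic residue masses. The parser and its token set are hypothetical; NANA follows the notation above for N-acetylneuraminic acid. The result is the mass the glycan adds to the peptide (residue masses, no extra water).

```python
import re

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "NANA": 291.095417,  # N-acetylneuraminic acid
    "Fuc": 146.057909,
}
# "HexNAc" must come before "Hex" so the longer name is matched first.
TOKEN = re.compile(r"(HexNAc|Hex|NANA|Fuc)(\d+)")


def glycan_mass(composition):
    """Sum residue masses for a composition string such as 'Hex5HexNAc4NANA2'."""
    position, total = 0, 0.0
    for match in TOKEN.finditer(composition):
        if match.start() != position:
            raise ValueError(f"unrecognized text in {composition!r}")
        total += RESIDUE_MASS[match.group(1)] * int(match.group(2))
        position = match.end()
    if position == 0 or position != len(composition):
        raise ValueError(f"unrecognized text in {composition!r}")
    return total


print(f"{glycan_mass('Hex5HexNAc4NANA2'):.4f}")  # about 2204.7724 Da
```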
      As part of the human innate immune system, serotransferrin sequesters iron and restricts the iron-dependent growth of pathogenic bacteria. However, the membrane protein TbpA of pathogenic Neisseria can bind to serotransferrin and obtain iron from it. The N-glycans on the surface of serotransferrin provide a binding surface for TbpA. As a result, distinct N-glycan structures at the interaction interface may differ in their binding affinity for TbpA. Further study of the role of serotransferrin N-glycans in the pathogenesis of Neisseria will help elucidate the biological role of N-glycosylation in host-pathogen interaction. This can be achieved by in-depth analysis of serotransferrin glycoforms and of the binding affinity of each glycan for TbpA.
      Accurate identification of each N-glycan composition is a prerequisite for functional investigation of the site-specific glycoforms of serum glycoproteins. The proposed LC-MS/MS method for N-linked glycopeptide identification and site-specific glycoform construction was validated in a human serum sample. It enables quantitative analysis of the site-specific glycoforms of serum glycoproteins, which will help to develop better approaches for the diagnosis and treatment of CDG as well as other human diseases.

      DATA AVAILABILITY

      The mass spectrometry proteomics data (UDGP, FDGP, UGP, FGP and data from commercial glycoproteins as well as serum spiked with HRP and Ovalbumin) and annotated N-linked intact glycopeptide spectra have been deposited to the ProteomeXchange Consortium via the PRIDE (
      • Vizcaino J.A.
      • Csordas A.
      • del-Toro N.
      • Dianes J.A.
      • Griss J.
      • Lavidas I.
      • Mayer G.
      • Perez-Riverol Y.
      • Reisinger F.
      • Ternent T.
      • Xu Q.W.
      • Wang R.
      • Hermjakob H.
      2016 update of the PRIDE database and its related tools.
      ) partner repository with the data set identifier PXD015622 [http://www.ebi.ac.uk/pride]. The data set OVCAR3 was previously published (
      • Sun S.
      • Shah P.
      • Eshghi S.T.
      • Yang W.
      • Trikannad N.
      • Yang S.
      • Chen L.
      • Aiyetan P.
      • Hoti N.
      • Zhang Z.
      • Chan D.W.
      • Zhang H.
      Comprehensive analysis of protein glycosylation by solid-phase extraction of N-linked glycans and glycosite-containing peptides.
      ) and can be found with the data set identifier PXD001571. The pMatchGlyco software (version 1.2) and the MATLAB code named “SeparateFDR_MiniMax” for separate-FDR estimation and precursor m/z related GPSM filtering can be downloaded at http://fugroup.amss.ac.cn/software/pMatchGlyco/pMatchGlyco.html.

      Acknowledgments

      We thank Wenfeng Zeng for his constructive suggestions during data analysis; Yanlong Ji and Yue Zhou for their help in HILIC enrichment and MS data acquisition; Shisheng Sun and Hui Zhang for kindly providing the GPQuest software and the OVCAR3 data set and for valuable discussion; and Junjie Hou, Tingyi Song and Hannu Peltoniemi for providing code and the software tools MAGIC and GlycopeptideID for testing.

      REFERENCES

        • Kailemia M.J.
        • Park D.
        • Lebrilla C.B.
        Glycans and glycoproteins as specific biomarkers for cancer.
        Anal. Bioanal. Chem. 2017; 409: 395-410
        • Kessner D.
        • Chambers M.
        • Burke R.
        • Agus D.
        • Mallick P.
        ProteoWizard: open source software for rapid proteomics tools development.
        Bioinformatics. 2008; 24: 2534-2536
        • Sun S.
        • Shah P.
        • Eshghi S.T.
        • Yang W.
        • Trikannad N.
        • Yang S.
        • Chen L.
        • Aiyetan P.
        • Hoti N.
        • Zhang Z.
        • Chan D.W.
        • Zhang H.
        Comprehensive analysis of protein glycosylation by solid-phase extraction of N-linked glycans and glycosite-containing peptides.
        Nat. Biotechnol. 2016; 34: 84-88
        • Yuan Z.F.
        • Liu C.
        • Wang H.P.
        • Sun R.X.
        • Fu Y.
        • Zhang J.F.
        • Wang L.H.
        • Chi H.
        • Li Y.
        • Xiu L.Y.
        • Wang W.P.
        • He S.M.
        pParse: a method for accurate determination of monoisotopic peaks in high-resolution mass spectra.
        Proteomics. 2012; 12: 226-235
        • Liu M.Q.
        • Zeng W.F.
        • Fang P.
        • Cao W.Q.
        • Liu C.
        • Yan G.Q.
        • Zhang Y.
        • Peng C.
        • Wu J.Q.
        • Zhang X.J.
        • Tu H.J.
        • Chi H.
        • Sun R.X.
        • Cao Y.
        • Dong M.Q.
        • Jiang B.Y.
        • Huang J.M.
        • Shen H.L.
        • Wong C.C.L.
        • He S.M.
        • Yang P.Y.
        pGlyco 2.0 enables precision N-glycoproteomics with comprehensive quality control and one-step mass spectrometry for intact glycopeptide identification.
        Nat. Commun. 2017; 8: 438
        • Sajic T.
        • Liu Y.
        • Arvaniti E.
        • Surinova S.
        • Williams E.G.
        • Schiess R.
        • Huttenhain R.
        • Sethi A.
        • Pan S.
        • Brentnall T.A.
        • Chen R.
        • Blattmann P.
        • Friedrich B.
        • Nimeus E.
        • Malander S.
        • Omlin A.
        • Gillessen S.
        • Claassen M.
        • Aebersold R.
        Similarities and Differences of Blood N-Glycoproteins in Five Solid Carcinomas at Localized Clinical Stage Analyzed by SWATH-MS.
        Cell Rep. 2018; 23: 2819-2831 e2815
        • Sun S.
        • Hu Y.
        • Jia L.
        • Eshghi S.T.
        • Liu Y.
        • Shah P.
        • Zhang H.
        Site-Specific Profiling of Serum Glycoproteins Using N-Linked Glycan and Glycosite Analysis Revealing Atypical N-Glycosylation Sites on Albumin and alpha-1B-Glycoprotein.
        Anal. Chem. 2018; 90: 6292-6299
        • Zhang C.
        • Ye Z.
        • Xue P.
        • Shu Q.
        • Zhou Y.
        • Ji Y.
        • Fu Y.
        • Wang J.
        • Yang F.
        Evaluation of Different N-Glycopeptide Enrichment Methods for N-Glycosylation Sites Mapping in Mouse Brain.
        J. Proteome Res. 2016; 15: 2960-2968
        • Palmisano G.
        • Larsen M.R.
        • Packer N.H.
        • Thaysen-Andersen M.
        Structural analysis of glycoprotein sialylation - part II: LC-MS based detection.
        RSC Adv. 2013; 3: 22706-22726
        • Wang L.H.
        • Li D.Q.
        • Fu Y.
        • Wang H.P.
        • Zhang J.F.
        • Yuan Z.F.
        • Sun R.X.
        • Zeng R.
        • He S.M.
        • Gao W.
        pFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry.
        Rapid Commun. Mass Spectrom. 2007; 21: 2985-2991
        • An Z.W.
        • Shu Q.B.
        • Lv H.
        • Shu L.
        • Wang J.F.
        • Yang F.Q.
        • Fu Y.
        N-linked glycopeptide identification based on open mass spectral library search.
        BioMed Res. Int. 2018; 2018: 1564136
        • Zielinska D.F.
        • Gnad F.
        • Wisniewski J.R.
        • Mann M.
        Precision mapping of an in vivo N-glycoproteome reveals rigid topological and sequence constraints.
        Cell. 2010; 141: 897-907
        • Skates S.J.
        • Horick N.K.
        • Moy J.M.
        • Minihan A.M.
        • Seiden M.V.
        • Marks J.R.
        • Sluss P.
        • Cramer D.W.
        Pooling of case specimens to create standard serum sets for screening cancer biomarkers.
        Cancer Epidemiol., Biomarkers Prev. 2007; 16: 334-341
        • Ji Y.
        • Wei S.
        • Hou J.
        • Zhang C.
        • Xue P.
        • Wang J.
        • Chen X.
        • Guo X.
        • Yang F.
        Integrated proteomic and N-glycoproteomic analyses of doxorubicin sensitive and resistant ovarian cancer cells reveal glycoprotein alteration in protein abundance and glycosylation.
        Oncotarget. 2017; 8: 13413-13427
        • Ranzinger R.
        • Herget S.
        • von der Lieth C.W.
        • Frank M.
        GlycomeDB–a unified database for carbohydrate structures.
        Nucleic Acids Res. 2011; 39: D373-D376
        • Yang B.Y.
        • Gray J.S.
        • Montgomery R.
        The glycans of horseradish peroxidase.
        Carbohydr. Res. 1996; 287: 203-212
        • Elias J.E.
        • Gygi S.P.
        Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry.
        Nat. Methods. 2007; 4: 207-214
        • Fu Y.
        • Qian X.
        Transferred subgroup false discovery rate for rare post-translational modifications detected by mass spectrometry.
        Mol. Cell. Proteomics. 2014; 13: 1359-1368
        • An Z.
        • Zhai L.
        • Ying W.
        • Qian X.
        • Gong F.
        • Tan M.
        • Fu Y.
        PTMiner: localization and quality control of protein modifications detected in an open search and its application to comprehensive post-translational modification characterization in human proteome.
        Mol. Cell. Proteomics. 2019; 18: 391-405
        • Baldwin M.A.
        • Falick A.M.
        • Gibson B.W.
        • Prusiner S.B.
        • Stahl N.
        • Burlingame A.L.
        Tandem mass-spectrometry of peptides with N-terminal glutamine - studies on a prion protein peptide.
        J. Am. Soc. Mass Spectrom. 1990; 1: 258-264
        • Wang T.
        • Cai Z.P.
        • Gu X.Q.
        • Ma H.Y.
        • Du Y.M.
        • Huang K.
        • Voglmeir J.
        • Liu L.
        Discovery and characterization of a novel extremely acidic bacterial N-glycanase with combined advantages of PNGase F and A.
        Biosci. Rep. 2014; 34: e00149
        • Remily-Wood E.
        • Dirscherl H.
        • Koomen J.M.
        Acid hydrolysis of proteins in matrix assisted laser desorption ionization matrices.
        J. Am. Soc. Mass Spectrom. 2009; 20: 2106-2115
        • Kanehisa M.
        • Furumichi M.
        • Tanabe M.
        • Sato Y.
        • Morishima K.
        KEGG: new perspectives on genomes, pathways, diseases and drugs.
        Nucleic Acids Res. 2017; 45: D353-D361
        • Szklarczyk D.
        • Morris J.H.
        • Cook H.
        • Kuhn M.
        • Wyder S.
        • Simonovic M.
        • Santos A.
        • Doncheva N.T.
        • Roth A.
        • Bork P.
        • Jensen L.J.
        • von Mering C.
        The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible.
        Nucleic Acids Res. 2017; 45: D362-D368
        • Fabregat A.
        • Sidiropoulos K.
        • Garapati P.
        • Gillespie M.
        • Hausmann K.
        • Haw R.
        • Jassal B.
        • Jupe S.
        • Korninger F.
        • McKay S.
        • Matthews L.
        • May B.
        • Milacic M.
        • Rothfels K.
        • Shamovsky V.
        • Webber M.
        • Weiser J.
        • Williams M.
        • Wu G.
        • Stein L.
        • Hermjakob H.
        • D'Eustachio P.
        The reactome pathway knowledgebase.
        Nucleic Acids Res. 2016; 44: D481-D487
        • Perkins S.J.
        • Fung K.W.
        • Khan S.
        Molecular interactions between complement factor H and its heparin and heparan sulfate ligands.
        Front. Immunol. 2014; 5: 126
        • Senbanjo L.T.
        • Chellaiah M.A.
        CD44: A multifunctional cell surface adhesion receptor is a regulator of progression and metastasis of cancer cells.
        Front. Cell Dev. Biol. 2017; 5: 18
        • Saldova R.
        • AsadiShehni A.
        • Haakensen V.D.
        • Steinfeld I.
        • Hilliard M.
        • Kifer I.
        • Helland A.
        • Yakhini Z.
        • Borresen-Dale A.L.
        • Rudd P.M.
        Association of N-glycosylation with breast carcinoma and systemic features using high-resolution quantitative UPLC.
        J. Proteome Res. 2014; 13: 2314-2327
        • Wang J.R.
        • Gao W.N.
        • Grimm R.
        • Jiang S.B.
        • Liang Y.
        • Ye H.
        • Li Z.G.
        • Yau L.F.
        • Huang H.
        • Liu J.
        • Jiang M.
        • Meng Q.
        • Tong T.T.
        • Huang H.H.
        • Lee S.
        • Zeng X.
        • Liu L.
        • Jiang Z.H.
        A method to identify trace sulfated IgG N-glycans as biomarkers for rheumatoid arthritis.
        Nat. Commun. 2017; 8: 631
        • Stanley P.
        • Taniguchi N.
        • Aebi M.
        N-glycans.
        in: Varki A., Cummings R.D., Esko J.D., Stanley P., Hart G.W., Aebi M., Darvill A.G., Kinoshita T., Packer N.H., Prestegard J.H., Schnaar R.L., Seeberger P.H. Essentials of Glycobiology. Cold Spring Harbor, NY. 2015: 99-111
        • Lowenthal M.S.
        • Davis K.S.
        • Formolo T.
        • Kilpatrick L.E.
        • Phinney K.W.
        Identification of novel N-glycosylation sites at noncanonical protein consensus motifs.
        J. Proteome Res. 2016; 15: 2087-2101
        • Noinaj N.
        • Easley N.C.
        • Oke M.
        • Mizuno N.
        • Gumbart J.
        • Boura E.
        • Steere A.N.
        • Zak O.
        • Aisen P.
        • Tajkhorshid E.
        • Evans R.W.
        • Gorringe A.R.
        • Mason A.B.
        • Steven A.C.
        • Buchanan S.K.
        Structural basis for iron piracy by pathogenic Neisseria.
        Nature. 2012; 483: 53-58
        • Helander A.
        • Eriksson G.
        • Stibler H.
        • Jeppsson J.O.
        Interference of transferrin isoform types with carbohydrate-deficient transferrin quantification in the identification of alcohol abuse.
        Clin. Chem. 2001; 47: 1225-1233
        • Chang I.J.
        • He M.
        • Lam C.T.
        Congenital disorders of glycosylation.
        Ann. Transl. Med. 2018; 6: 477
        • MacLean B.
        • Tomazela D.M.
        • Shulman N.
        • Chambers M.
        • Finney G.L.
        • Frewen B.
        • Kern R.
        • Tabb D.L.
        • Liebler D.C.
        • MacCoss M.J.
        Skyline: an open source document editor for creating and analyzing targeted proteomics experiments.
        Bioinformatics. 2010; 26: 966-968
        • Babovic-Vuksanovic D.
        • O'Brien J.F.
        Laboratory diagnosis of congenital disorders of glycosylation type I by analysis of transferrin glycoforms.
        Mol. Diagn. Ther. 2007; 11: 303-311
        • Vizcaino J.A.
        • Csordas A.
        • del-Toro N.
        • Dianes J.A.
        • Griss J.
        • Lavidas I.
        • Mayer G.
        • Perez-Riverol Y.
        • Reisinger F.
        • Ternent T.
        • Xu Q.W.
        • Wang R.
        • Hermjakob H.
        2016 update of the PRIDE database and its related tools.
        Nucleic Acids Res. 2016; 44: D447-D456