Simultaneous Characterization of Glyco- and Phosphoproteomes of Mouse Brain Membrane Proteome with Electrostatic Repulsion Hydrophilic Interaction Chromatography

Characterization of glyco- and phosphoproteins as well as their modification sites poses many challenges, the greatest being loss of their signals during mass spectrometric detection due to substoichiometric amounts and the ion suppression effect caused by peptides of high abundance. We report here an optimized protocol using electrostatic repulsion hydrophilic interaction chromatography for the simultaneous enrichment of glyco- and phosphopeptides from mouse brain membrane protein digest. With this protocol, we successfully identified 544 unique glycoproteins and 922 glycosylation sites, which were significantly higher than those from the commonly used hydrazide chemistry method (192 glycoproteins and 345 glycosylation sites). Moreover, a total of 383 phosphoproteins and 915 phosphorylation sites were recovered from the sample, suggesting that this protocol has the potential to enrich both glycopeptides and phosphopeptides simultaneously. Of the total 995 glycosylation sites identified from both methods, 96% were considered new as they were either annotated as putative or not documented in the newly released Swiss-Prot database. Thus, this study could be of significant value in complementing the current glycoprotein database and provides a unique opportunity to study the complex interaction of two different post-translational modifications in health and disease without being affected by interexperimental variations.

Protein glycosylation and phosphorylation are two important post-translational modifications. In mammals, it has been estimated that nearly 50% of all proteins are glycosylated (1) and at least one-third of all proteins are phosphorylated (2). These modifications play important roles in determining a protein's stability, activity, localization, and interactions with other proteins. For example, N-linked glycosylation of a protein enhances its stability and targets it mainly to extracellular locations such as the plasma membrane (3), whereas phosphorylation of tyrosine kinases initiates signal cascades in both proproliferative and antiapoptotic cellular responses (4).
Membrane proteins are essential for cells to maintain their structural integrity and to perform signal transduction in response to extracellular stimuli. For most membrane proteins, the receptor activities of the extracellular domains are often mediated via N-linked glycosylation, whereas the cytoplasmic domains can be reversibly phosphorylated and function as signal transducers. Alterations of these modifications correlate with cellular differentiation, implantation, and tissue development (5,6). Such alterations are also required for the induction of many forms of synaptic plasticity, such as long term potentiation and depression, in learning and memory formation. Moreover, aberrant glycosylation and abnormal phosphorylation have been associated with many neurological disorders, including Alzheimer disease (7-9), Parkinson disease (10,11), Guillain-Barré and Miller-Fisher syndromes (12), and muscle-eye-brain disease (13). Understanding the brain glyco- and phosphoproteomes is therefore essential not only for studying the biology of these diseases but also for aiding drug discovery and translational research, as many glycoproteins are potentially viable drug targets owing to their accessibility on the cell surface.
Study of the glyco- and phosphoproteomes of the cell membrane presents a number of challenges, mainly due to their low abundance, their large dynamic range, and the inherent hydrophobicity of membrane proteins. It has been estimated that less than 5% of the peptides in a typical complex protein digest are either N-linked glycosylated (14) or phosphorylated (15). Therefore, efficient protocols for the enrichment of glycopeptides and phosphopeptides are essential for their subsequent detection and identification.
Methods used for the enrichment of glycopeptides and glycoproteins have varied with time and among investigators. Lectin-based affinity enrichment is one of the earliest and most widely used approaches for the analysis of glycoproteins and their associated carbohydrates (16-19). Lectins are generally highly specific for a particular type of carbohydrate moiety; for example, wheat germ lectin mainly recognizes GlcNAc, whereas concanavalin A is specific for mannose (20). Combining two or more lectins with potentially complementary binding properties is therefore often required to recover a significant number of glycoproteins from a complex sample (18,21,22). Another commonly used method is hydrophilic affinity chromatography, which is based on hydrogen bonding between the hydroxyl groups of the carbohydrate moieties in glycopeptides and hydrophilic materials such as Sepharose or cellulose (23,24). Because this interaction does not depend on a particular glycan structure, the enrichment recovers a much wider range of glycoproteins than the lectin-based method. In addition, Ding et al. (25) demonstrated that tryptic glycopeptides can be eluted as a set after the tryptic non-glycopeptides in the pure hydrophilic interaction liquid chromatography mode by increasing the hydrophobicity of peptides with trifluoroacetate as an ion-pairing agent. More recently, a method utilizing hydrazide chemistry has gained increasing popularity for the study of the N-linked glycoproteome (26,27). It has been applied to high throughput global analysis of N-glycoproteins from various samples such as human serum (28), saliva (29), platelets (30), liver (31), and cancer tissue (32). Although this approach is theoretically able to capture all types of glycoproteins or glycopeptides, only 1522 unique glycosylation sites were collectively identified in nearly 10 studies, representing about 3% of the total predicted glycosylation sites in the human proteome (14).
In recent attempts to identify more glycoproteins in a sample and expand the coverage of the N-linked glycoproteome, several groups have analyzed the same sample with different methods in parallel. Cao et al. (33) used the hydrazide method and hydrophilic affinity chromatography to identify glycosylation sites in secreted proteins and reported a total of 300 glycosylation sites, 159 and 261 of which came from the two methods, respectively. Lee et al. (22) used the hydrazide method and three lectins to study rat liver glycoproteins and identified a total of 335 glycoproteins, 202 from the lectin method and 210 from the hydrazide method. These studies demonstrated that current methods are complementary; hence, their combined use can enhance glycoprotein recovery, although the overall efficiency remains relatively low.
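The complementarity described above can be made concrete with inclusion-exclusion. If two methods yield |A| and |B| identifications and |A ∪ B| in total, then |A| + |B| - |A ∪ B| were found by both. The Python sketch below uses only the counts quoted in the preceding paragraph and assumes that each reported total is the union of the two methods' results.

```python
def implied_overlap(n_a: int, n_b: int, n_union: int) -> int:
    """Identifications shared by two methods: |A| + |B| - |A union B|."""
    overlap = n_a + n_b - n_union
    # A valid overlap can be neither negative nor larger than the smaller set.
    if overlap < 0 or overlap > min(n_a, n_b):
        raise ValueError("counts are not consistent with two overlapping sets")
    return overlap


if __name__ == "__main__":
    # Cao et al. (33): hydrazide 159 sites, hydrophilic affinity 261, union 300.
    both = implied_overlap(159, 261, 300)
    print("Cao et al.: shared", both, "| hydrazide only", 159 - both,
          "| hydrophilic only", 261 - both)
    # Lee et al. (22): lectins 202 glycoproteins, hydrazide 210, union 335.
    both = implied_overlap(202, 210, 335)
    print("Lee et al.: shared", both, "| lectin only", 202 - both,
          "| hydrazide only", 210 - both)
```

For Cao et al., the two methods would share 120 sites. Each method therefore contributed sites the other missed (39 from hydrazide only and 141 from hydrophilic affinity only), which is what "complementary" means here.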
As with glycopeptide enrichment, many methods for phosphopeptide enrichment have been described. These include phosphoramidate chemistry (34), immunoprecipitation with phosphospecific antibodies (35), IMAC (36), strong cation exchange (SCX) chromatography (37), and titanium dioxide (TiO2) chromatography (38). Each method has its own advantages and shortcomings. Analyzing a sample either with different methods in parallel or with several methods combined into one workflow often enriches more phosphopeptides and therefore identifies more phosphoproteins. Indeed, Villén et al. (39) identified more than 5,600 non-redundant phosphorylation sites on 2,300 proteins from mouse liver using SCX chromatography followed by IMAC affinity purification. Similarly, by coupling SCX with TiO2 chromatography, Olsen et al. (40) reported a total of 6,600 phosphorylation sites on 2,200 HeLa cell proteins. In addition, the TiO2 and IMAC methods were found to be complementary (41), and using both in parallel to analyze a sample generated a combined data set that surpassed the outcome of either method alone. Nevertheless, because these workflows target one modification at a time, an effective method for the simultaneous enrichment of both glyco- and phosphopeptides is highly desirable.
Recently, a novel mode of chromatography termed electrostatic repulsion hydrophilic interaction chromatography (ERLIC) has been introduced for the enrichment of phosphopeptides based on both their electrostatic and hydrophilic properties (42). With the low pH and high organic content of the mobile phase, the carboxyl groups of aspartic acid and glutamic acid residues and of the C terminus are largely un-ionized. The majority of peptides are therefore poorly retained by the weak anion exchange (WAX) column, whereas phosphopeptides and highly hydrophilic peptides interact strongly with the column and are retained. A salt and aqueous gradient can then be used to gradually elute the phosphopeptides from the column. Typically, buffer A (10 mM sodium methyl phosphonate and 70% acetonitrile, pH 2.0) and buffer B (200 mM triethylamine phosphate with 60% acetonitrile, pH 2.0) are used to create a gradient for the enrichment and fractionation of phosphopeptides from a cell lysate digest (43). When a platelet digest was used as the starting material, this enrichment method was found to be comparable with the hydrazide method in the identification of glycoproteins (44). ERLIC enrichment of glycopeptides relies mainly on their negatively charged sialyl groups (44). However, it might also enrich other hydrophilic glycopeptides, because a large, hydrophilic oligosaccharide side chain can significantly increase the retention time of a peptide in the hydrophilic interaction liquid chromatography-based retention mode of ERLIC.
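The electrostatic part of the ERLIC argument can be illustrated with the Henderson-Hasselbalch relation, which gives the fraction of each acidic group that carries a negative charge at pH 2.0. The pKa values below are assumed, approximate textbook figures used only for illustration. Real values shift with sequence context and with the high-acetonitrile mobile phase. The sketch ignores basic groups, which are protonated at this pH and are repelled by the positively charged WAX phase.

```python
# ASSUMED approximate pKa values, for illustration only.
PKA = {
    "Asp side-chain carboxyl": 3.9,
    "Glu side-chain carboxyl": 4.1,
    "peptide C-terminal carboxyl": 3.1,
    "sialic acid carboxyl": 2.6,
    "phosphate, first ionization": 1.2,
}


def fraction_anionic(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of an acidic group that is deprotonated."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


if __name__ == "__main__":
    ph = 2.0  # pH of the ERLIC buffers
    for group, pka in PKA.items():
        print(f"{group:30s} {fraction_anionic(pka, ph):.3f}")
```

Under these assumed values, about 86% of phosphate groups but only about 1% of Asp and Glu carboxyls are negatively charged at pH 2.0. This is the contrast that lets the WAX column retain phosphopeptides. Sialic acid (about 20% ionized here) falls in between, which is consistent with sialylated glycopeptides also being retained.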
We report here an improved protocol using ERLIC for the simultaneous enrichment of glyco- and phosphopeptides from a mouse brain membrane preparation. With this protocol, the yields of glycoproteins (544) and glycosylation sites (922) were significantly higher than those from the hydrazide chemistry method (192 glycoproteins and 345 glycosylation sites). In total, 995 glycosylation sites were identified, 96% of which were considered new as they were either annotated as putative or not documented in the newly released Swiss-Prot database. Moreover, a total of 383 phosphoproteins carrying 915 phosphorylation sites were recovered with the ERLIC approach, suggesting that this protocol has potential for the simultaneous study of both glyco- and phosphoproteomes.

MATERIALS AND METHODS
Mice and Tissue Preparation-All animal procedures were performed according to the protocols approved by Nanyang Technological University for Biological Studies and Animal Care and Use Committees. A single C57BL/6J inbred strain mouse purchased from Centre for Animal Care, National University of Singapore was used in this study. At the age of 8 weeks, the mouse was euthanized with excess isoflurane anesthesia. The whole intact brain was collected, snap frozen in liquid nitrogen, and then stored at −80°C.
Membrane Protein Extraction and Digestion-Three frozen brains (~1.5 g, wet weight) were ground in liquid nitrogen in a prechilled mortar with pestle. The fine powder was transferred to a 2-ml Eppendorf tube, and HES buffer (20 mM HEPES, pH 7.4, 1 mM EDTA, 250 mM sucrose) supplemented with protease inhibitor (10 µl/mg of tissue; Roche Diagnostics) was added. The sample was sonicated three times on ice using a Vibra Cell™ high intensity ultrasonic processor (Jencon, Leighton Buzzard, Bedfordshire, UK). The remaining debris and unbroken cells were removed by centrifugation at 1000 × g at 4°C for 10 min. The supernatant was transferred to a new tube and centrifuged at 100,000 × g at 4°C for 45 min. The pellet containing membrane fractions was washed once with Na2CO3 (0.1 M, pH 11) and twice with Milli-Q water, with centrifugation at 100,000 × g at 4°C after each wash. The membrane pellet was then dissolved in 8 M urea solution, and protein content was determined with a 2-D Quant kit (GE Healthcare) according to the manufacturer's instructions. Approximately 6 mg of protein was divided into six aliquots for subsequent experiments. The proteins were reduced with 10 mM tris(carboxyethyl)phosphine hydrochloride for 30 min at 37°C and alkylated with 40 mM iodoacetamide for 1 h at room temperature. Proteins were then diluted 7-fold with 50 mM NH4HCO3 prior to digestion with trypsin (Promega, Madison, WI) overnight at 37°C at a 1:50 trypsin-to-protein mass ratio. The protein digests were desalted using Sep-Pak C18 cartridges (Waters) and dried in a SpeedVac (Thermo Electron, Waltham, MA).
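The digestion arithmetic implied by the protocol above can be checked quickly: how much urea remains after dilution, and how much trypsin a 1:50 (w/w) ratio requires for one ~1-mg aliquot. This is a minimal sketch. It assumes that "diluted 7-fold" means the final volume is 7 times the starting volume. The function name is hypothetical.

```python
def digestion_setup(protein_ug: float, urea_molar: float = 8.0,
                    dilution_fold: float = 7.0, protein_per_trypsin: float = 50.0):
    """Urea left after dilution and trypsin to add for a 1:50 (w/w) digest.

    Assumes "diluted 7-fold" means the final volume is 7x the starting volume.
    """
    return {
        "urea_after_dilution_M": urea_molar / dilution_fold,
        "trypsin_ug": protein_ug / protein_per_trypsin,
    }


if __name__ == "__main__":
    # One of the six ~1-mg aliquots described above
    r = digestion_setup(1000.0)
    print(f"urea {r['urea_after_dilution_M']:.2f} M, trypsin {r['trypsin_ug']:.0f} ug")
```

Under this reading, each aliquot is digested in about 1.14 M urea with 20 µg of trypsin. If the sample were instead mixed with 7 volumes of buffer (an 8-fold total dilution), the urea concentration would be 1.0 M.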
Enrichment of N-Linked Glycopeptides and Phosphopeptides Using ERLIC-ERLIC buffer A (10 mM sodium methyl phosphonate with 70% acetonitrile, pH 2.0) was prepared by adding NaOH to a solution of methylphosphonic acid (Sigma-Aldrich) in water, followed by addition of acetonitrile. Buffer B (200 mM triethylamine phosphate with 25% acetonitrile, pH 2.0) was prepared by adding triethylamine (Sigma-Aldrich) to a solution of phosphoric acid in water, followed by addition of acetonitrile. A total of 3 mg of protein digest was used for three replicate ERLIC runs. For each run, ~1 mg of digest reconstituted in 200 µl of buffer A was loaded onto a PolyWAX LP™ column (4.6 × 200 mm, 5-µm particle size, 300-Å pore size; PolyLC, Columbia, MD) on a Prominence™ HPLC unit (Shimadzu, Kyoto, Japan). The sample was fractionated at a constant flow rate of 1 ml/min with a 50-min program: 100% buffer A for 10 min, 0-30% buffer B over 25 min, 30-100% buffer B over 5 min, and finally 100% buffer B for 10 min. The eluate was monitored with a UV detector at 214 nm. Fractions were collected at 1-min intervals and dried in a SpeedVac. Consecutive fractions were combined to give a total of seven fractions, which were desalted using Sep-Pak C18 solid phase extraction cartridges (Waters). The samples were then dried, reconstituted in 30 µl of 50 mM NH4HCO3, and treated with 500 units/µl peptide-N-glycosidase F (New England Biolabs, Ipswich, MA) at 37°C overnight. All samples were acidified by addition of 1.5 µl of 100% formic acid and dried in a SpeedVac.
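Because buffer A contains 70% acetonitrile and buffer B contains 25%, the programmed %B translates directly into the organic content of the mobile phase. The sketch below models the gradient described above as piecewise-linear segments. It assumes ideal mixing and ignores the pump's dwell volume, so the times are nominal.

```python
# (time in min, %B) breakpoints of the ERLIC gradient described above;
# assumes ideal linear mixing with no pump dwell volume.
BREAKPOINTS = [(0.0, 0.0), (10.0, 0.0), (35.0, 30.0), (40.0, 100.0), (50.0, 100.0)]
ACN_A, ACN_B = 70.0, 25.0  # % acetonitrile in buffer A and buffer B


def percent_b(t: float) -> float:
    """Programmed %B at time t (min), by linear interpolation."""
    if not BREAKPOINTS[0][0] <= t <= BREAKPOINTS[-1][0]:
        raise ValueError("time outside the 50-min program")
    for (t0, b0), (t1, b1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole program")


def percent_acn(t: float) -> float:
    """Acetonitrile content of the A/B mixture delivered at time t."""
    b = percent_b(t)
    return ACN_A * (100.0 - b) / 100.0 + ACN_B * b / 100.0


if __name__ == "__main__":
    for t in (0, 10, 20, 35, 38, 40, 50):
        print(f"{t:>2} min: {percent_b(t):5.1f}% B, {percent_acn(t):4.1f}% acetonitrile")
```

In this idealized model, the acetonitrile content falls from 70% to 56.5% by the end of the shallow segment (35 min) and reaches 25% at 40 min. A mobile phase of about 30% organic is therefore reached only during the steep 30-100% B step, at roughly 39 min.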
Enrichment of N-Linked Glycopeptides Using Hydrazide Chemistry-As with the ERLIC enrichment, a total of 3 mg of protein digest was used for experiments in triplicate, each using 1 mg of digest. The glycopeptides were enriched using the published protocols (26,27) with some modifications. Briefly, the dried protein digests were dissolved in 200 µl of coupling buffer (100 mM sodium acetate, 150 mM sodium chloride, pH 5.5) and oxidized by incubation with sodium periodate (final concentration of 10 mM) in the dark for 30 min at room temperature. The reaction was stopped by incubation with sodium sulfite (final concentration of 20 mM) for 10 min. The oxidized peptides were then coupled to 50 µl of prewashed Affi-Gel hydrazide gel (Bio-Rad) overnight at room temperature with end-over-end rotation. After coupling, the gel was washed twice sequentially with 1 ml of H2O, 1.5 M sodium chloride, methanol, acetonitrile, and 50 mM NH4HCO3 to remove nonspecifically bound peptides. The glycopeptides were then enzymatically released from the gel by incubation with 200 µl of peptide-N-glycosidase F in 50 mM NH4HCO3 (500 units/µl) at 37°C overnight. After centrifugation at 500 × g, the supernatant was collected, and the gel was washed sequentially with 200 µl of 50 mM NH4HCO3, 50% acetonitrile, and 80% acetonitrile. The supernatant and all washings were combined, acidified with formic acid, and dried in a SpeedVac.
Mass Spectrometric Analysis-Each dried fraction was reconstituted in 100 µl of 0.1% formic acid and analyzed twice using an LTQ-FT Ultra mass spectrometer (Thermo Electron) coupled with a Prominence™ HPLC unit (Shimadzu). For each analysis, 50 µl of sample was injected from an autosampler (Shimadzu) and concentrated in a Zorbax peptide trap (Agilent, Palo Alto, CA). Peptide separation was performed in a capillary column (200-µm inner diameter × 10 cm) packed with C18 AQ (5-µm particles, 300-Å pore size; Michrom Bioresources, Auburn, CA). Mobile phase A (0.1% formic acid in H2O) and mobile phase B (0.1% formic acid in acetonitrile) were used to establish a 90-min gradient comprising 3 min of 0-5% B, 52 min of 5-25% B, 19 min of 25-80% B, 8 min at 80% B, and finally 8 min of re-equilibration at 5% B. The HPLC system was operated at a constant flow rate of 30 µl/min, and a splitter was used to reduce the flow rate to ~500 nl/min at the electrospray emitter (Michrom Bioresources). The sample was introduced into the LTQ-FT through an ADVANCE™ CaptiveSpray™ source (Michrom Bioresources) with an electrospray potential of 1.5 kV. The gas flow was set at 2, the ion transfer tube temperature was 180°C, and the collision gas pressure was 0.85 millitorr. The LTQ-FT was set to acquire data in the positive ion mode as described previously (43). Briefly, a full MS scan (m/z 350-2000) was acquired in the FT-ICR cell at a resolution of 100,000 with a maximum ion accumulation time of 1000 ms. The automatic gain control target for the FT-ICR cell was set at 1e+06, and precursor ion charge state screening was activated. The linear ion trap was used to collect peptides and to measure peptide fragments generated by CID. The default automatic gain control settings were used in the linear ion trap (full MS target at 3.0e+04, MSn at 1e+04).
The 10 most intense ions above a 500-count threshold were selected for CID fragmentation (MS2), which was performed concurrently with a maximum ion accumulation time of 200 ms. An MS3 scan was triggered after each MS2 scan in which a neutral loss of 98 Da for 1+, 49 Da for 2+, or 32.7 Da for 3+ ions (corresponding to loss of phosphoric acid) was detected. Dynamic exclusion was activated with a repeat count of 1, an exclusion duration of 20 s, and a mass tolerance of ±5 ppm. For CID, the activation Q was set at 0.25, the isolation width was 2.0 m/z, the activation time was 30 ms, and the normalized collision energy was 35%.
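The three neutral-loss values above are the same chemical loss seen at different charge states. Loss of H3PO4 (about 97.98 Da) shifts a z-charged ion by 97.98/z on the m/z axis. The sketch below reproduces those values and adds a hypothetical trigger check. The function names and the 0.5 m/z tolerance are assumptions for illustration, not settings reported for the instrument.

```python
from typing import Sequence

H3PO4_MONO = 97.9769  # monoisotopic mass of phosphoric acid, Da


def neutral_loss_mz_shift(charge: int, loss_da: float = H3PO4_MONO) -> float:
    """Spacing in m/z between a precursor and its neutral-loss product."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return loss_da / charge


def would_trigger_ms3(precursor_mz: float, charge: int,
                      ms2_peaks: Sequence[float], tol_mz: float = 0.5) -> bool:
    """Hypothetical trigger: is any MS2 peak at precursor_mz - loss/z?

    The 0.5 m/z tolerance is an assumption, not a reported setting.
    """
    target = precursor_mz - neutral_loss_mz_shift(charge)
    return any(abs(mz - target) <= tol_mz for mz in ms2_peaks)


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"{z}+: loss appears {neutral_loss_mz_shift(z):.2f} m/z below the precursor")
```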
Database Searching-All MS and MS/MS data were searched using both the Sequest (Bioworks Browser version 3.3, Thermo Fisher Scientific Inc.) and Mascot (version 2.2.04, Matrix Science, Boston, MA) search engines. The IPI mouse protein database (45) (version 3.55; 55,956 sequences) and its reversed complement were combined and used for the searches. For the Sequest search, peak lists (dta files) were first generated from the raw data by Bioworks Browser. For the Mascot search, the raw data were first converted into dta format using extract_msn (version 4.0) in Bioworks Browser, and these dta files were then converted into Mascot generic format using an in-house program, as described (46). In both searches, full tryptic cleavage at both ends was required; a maximum of two missed cleavages was allowed; a mass tolerance of 10 ppm was used for peptide precursors; carbamidomethylation (+57.02) of cysteine was set as a fixed modification; and oxidation (+15.99) of methionine, phosphorylation (+79.96) of serine, threonine, or tyrosine, and deamidation (+0.98) of asparagine or glutamine were set as variable modifications. A mass tolerance of 0.5 Da was set for fragment ions in both searches. To achieve high confidence identification, peptide matches were filtered with an expectation value of less than 0.05 in the Mascot search. In the Sequest search, they were filtered with a peptide probability cutoff of 0.05, ΔCn > 0.08, and XCorr cutoffs of 1.5, 2.0, and 2.5 for 1+, 2+, and 3+ charged peptides, respectively.
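The score filters above can be expressed as simple predicates. This is a minimal sketch, not the authors' script. The text does not say whether the XCorr and probability cutoffs are inclusive, nor how charges above 3+ were handled. Here XCorr is taken as at least the cutoff, probability as at most 0.05, and other charge states are rejected; all three choices are assumptions.

```python
from dataclasses import dataclass

# Sequest thresholds from the text; rejecting charges outside 1+ to 3+ is an
# assumption (the text gives no cutoff for them).
XCORR_MIN = {1: 1.5, 2: 2.0, 3: 2.5}
DELTA_CN_MIN = 0.08
PEPTIDE_PROB_MAX = 0.05
MASCOT_EXPECT_MAX = 0.05


@dataclass
class SequestMatch:
    charge: int
    xcorr: float
    delta_cn: float
    probability: float  # Bioworks peptide probability (lower is better)


def passes_sequest(m: SequestMatch) -> bool:
    min_xcorr = XCORR_MIN.get(m.charge)
    if min_xcorr is None:
        return False
    return (m.xcorr >= min_xcorr
            and m.delta_cn > DELTA_CN_MIN
            and m.probability <= PEPTIDE_PROB_MAX)


def passes_mascot(expect: float) -> bool:
    return expect < MASCOT_EXPECT_MAX
```

Note that all of the Sequest criteria must hold at once. For example, a 2+ match with a high XCorr of 3.29 is still rejected if its ΔCn is 0.029.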
After filtering, the false positive rates (FPR = 2 × Nrev/Ntotal, where Nrev is the number of peptides identified from the reversed sequences and Ntotal is the total number of peptides identified) were estimated using the target-decoy database search strategy (47). For the ERLIC and hydrazide data sets, respectively, the FPR was 0.82 and 1.3% via Mascot and 0.75 and 1.7% via Sequest. Peptides identified with the consensus sequence NX(S/T) (where X is not proline) and a deamidation at the Asn were regarded as N-linked glycopeptides. A deamidation (Asn to Asp, +0.98 Da) can be wrongly assigned by the database search if an isotopic peak of a precursor, which lies about 1 Da above the true monoisotopic peak, is incorrectly picked as the "monoisotopic peak." This occasionally occurs when a precursor ion signal is weak. To eliminate such false positive assignments, we integrated the area of the peak located immediately before the assigned monoisotopic peak. We then filtered out the assignment when the ratio of this area to that of the assigned monoisotopic peak was above 1%. Furthermore, when protein isoforms shared the same glycopeptides, only the isoform with the higher protein score assigned by the database search engines was included, giving a set of non-redundant glycoproteins. The peptide/protein lists obtained were either exported to Microsoft Excel or processed using in-house scripts for further analysis.
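The three post-search rules in the paragraph above are sketched below: the target-decoy FPR formula, the NX(S/T) sequon test for a deamidated Asn, and the preceding-isotope check. These are hedged illustrations, not the in-house scripts. The function names are hypothetical. The sequon test should be run on the protein sequence (or the peptide plus its C-terminal flanking residues), because the S/T can fall beyond the peptide's end. The 1.003355 Da spacing is the 13C-12C mass difference.

```python
ISOTOPE_SPACING = 1.003355  # 13C - 12C mass difference, Da


def fpr(n_rev: int, n_total: int) -> float:
    """Target-decoy estimate used in the text: FPR = 2 * Nrev / Ntotal."""
    return 2.0 * n_rev / n_total


def in_sequon(seq: str, i: int) -> bool:
    """True if seq[i] (0-based) is the Asn of an N-X-S/T sequon with X != Pro."""
    return (seq[i] == "N" and i + 2 < len(seq)
            and seq[i + 1] != "P" and seq[i + 2] in "ST")


def preceding_isotope_mz(mono_mz: float, charge: int) -> float:
    """m/z of the peak one isotope spacing below the assigned monoisotopic peak."""
    return mono_mz - ISOTOPE_SPACING / charge


def passes_isotope_check(area_preceding: float, area_mono: float,
                         max_ratio: float = 0.01) -> bool:
    """Keep the deamidation call only if the preceding peak is <= 1% of the assigned one."""
    return area_preceding <= max_ratio * area_mono
```

For instance, in the peptide NATLVNEADKLR, the first Asn sits in an N-A-T sequon, whereas the second (followed by E-A) does not.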
Protein Classification and Functional Annotation-Subcellular and functional categories were assigned based on Gene Ontology (GO) annotations using the MGI GO_Slim Chart Tool. Signal peptides were predicted using SignalP 3.0 (48). The number of transmembrane helices of all identified glycoproteins was predicted using TMHMM 2.0 (49). Cell surface, secreted, and transmembrane proteins were classified based on SignalP and TMHMM information and were grouped as extracellular proteins as described (14). Pathway analysis was performed based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway collection (50).
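The classification step combines two predictor outputs per protein. The exact rule of reference (14) is not given here, so the sketch below uses an assumed, simplified rule: any predicted transmembrane helix means transmembrane; otherwise, a signal peptide means secreted or cell surface. A production version would also need to handle signal peptides that TMHMM reports as a helix. The names and identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TopologyPrediction:
    accession: str            # hypothetical identifier
    signal_peptide: bool      # e.g. a SignalP 3.0 call
    tm_helices: int           # e.g. a TMHMM 2.0 helix count


def topology_class(p: TopologyPrediction) -> str:
    # ASSUMED rule for illustration: any predicted TM helix -> transmembrane;
    # otherwise a signal peptide -> secreted/cell surface; else unassigned.
    if p.tm_helices > 0:
        return "transmembrane"
    if p.signal_peptide:
        return "secreted/cell surface"
    return "unassigned"


def is_extracellular(p: TopologyPrediction) -> bool:
    """Grouped as 'extracellular' if transmembrane, secreted, or cell surface."""
    return topology_class(p) != "unassigned"
```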

RESULTS AND DISCUSSION
Experimental Design-Recovery of glyco- and phosphoproteins as well as their modification sites poses many challenges, the greatest being the loss of the low abundance glyco- and phosphopeptides during isolation and detection. To maximize the recovery of glyco- and phosphoproteins from the mouse brain membrane preparation, we combined several individually successful approaches. Subfractionation was used to obtain a purified membrane fraction (51), followed by the parallel use of two potentially high efficiency enrichment approaches (hydrazide chemistry and ERLIC). This workflow was coupled with duplicate MS analyses and two database search engines (Table I) to improve the consistency and coverage of identification.
The glycopeptide capture based on hydrazide chemistry has been standardized and widely used (26,27), whereas ERLIC is a relatively new method. When the ERLIC protocol for phosphopeptide binding and elution was first adapted to the capture of glycopeptides, it yielded a number of glycoproteins comparable with that of the hydrazide method (44). As phosphopeptides and glycopeptides differ in their charge and hydrophilicity, we hypothesized that the retention and optimal elution conditions for glycopeptides would differ from those for phosphopeptides. Indeed, we found that many glycopeptides did not elute until a low organic content (~30%) was reached. The broader elution window for glycopeptides (70-30% organic content) compared with phosphopeptides (70-60%) is probably due to the higher complexity and heterogeneity of the carbohydrate moieties. Thus, a new buffer system consisting of buffer A (10 mM sodium methyl phosphonate and 70% acetonitrile, pH 2.0) and buffer B (200 mM triethylamine phosphate with 25% acetonitrile, pH 2.0) was used for the subsequent glyco- and phosphopeptide enrichment and fractionation. Fig. 1 shows the ERLIC chromatograms for the fractionation of the mouse brain membrane digest. As this study focused on the identification of post-translational modifications, the bulk of peptides eluting in the first 7 min was not collected. In each ERLIC run, a total of 41 fractions from 7 to 48 min was collected. These were combined into only seven fractions for LC-MS/MS analysis by pooling five consecutive fractions, with the exception of the last fraction (Fig. 1, a and b), because we anticipated that the peptide complexity of the individual fractions would be reduced so substantially as to fall below the detection sensitivity of one-dimensional LC-MS/MS.
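The pooling scheme described above (41 one-minute fractions, pooled five at a time, with the last pool taking the remainder) can be sketched as follows. The exact pool boundaries are not stated in the text, so this layout is one assumed reading. The function name is hypothetical.

```python
def pool_fractions(first_start_min: int = 7, n_fractions: int = 41,
                   pool_size: int = 5, n_pools: int = 7):
    """Return the start times (min) of the 1-min fractions in each pool."""
    starts = [first_start_min + i for i in range(n_fractions)]
    pools = [starts[i * pool_size:(i + 1) * pool_size] for i in range(n_pools - 1)]
    pools.append(starts[(n_pools - 1) * pool_size:])  # last pool takes the remainder
    return pools


if __name__ == "__main__":
    for k, pool in enumerate(pool_fractions(), start=1):
        print(f"pool {k}: {pool[0]}-{pool[-1] + 1} min ({len(pool)} fractions)")
```

Under this assumption, six pools each span 5 min (7-37 min), and the seventh pool collects the remaining 11 fractions (37-48 min). This fits the expectation that late-eluting fractions, which are enriched in glycopeptides, are sparse.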
Identification of N-Linked Glycoproteins and Glycosylation Sites-Protein identification was achieved by using two search engines, Mascot and Sequest, to interpret the MS/MS spectra. The false positive rate of overall peptide assignment in both the Mascot and Sequest searches was controlled at approximately 1%. The numbers of total peptides, glycopeptides, and proteins identified by the two engines were comparable. Identifying the correct protein isoform is a common problem in shotgun proteomics studies. This poses a greater challenge in glycoproteomics, as a glycopeptide can be assigned to multiple proteins, each having supporting unique peptides. For example, Nrcam isoform 2 and isoform 3 share four glycopeptides but also have their own unique peptides (K.EDAHADPEIQPMKEDDGTFGEYSDAEDHKPLK.K in isoform 2 and K.EKEDAHADPEIQPMKEDDGTFGEYR.S in isoform 3). In such cases, we reported only the glycoprotein isoform with the higher protein score assigned by the database search engine to obtain a set of non-redundant glycoproteins. The statistics of identified proteins, peptides, and post-translational modification sites are summarized in supplemental Table 1.
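The isoform rule above amounts to grouping proteins that share the same glycopeptides and keeping the highest-scoring member of each group. The sketch below interprets "share the same glycopeptides" as identical glycopeptide sets; partially overlapping sets would need a more elaborate grouping. The protein names and scores in the example are hypothetical toy values, not data from this study.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def collapse_isoforms(glycopeptides: Dict[str, Set[str]],
                      scores: Dict[str, float]) -> List[str]:
    """Keep one protein per identical glycopeptide set: the highest-scoring one."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for protein, peptides in glycopeptides.items():
        groups[frozenset(peptides)].append(protein)
    return sorted(max(members, key=scores.__getitem__) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical toy input: two isoforms with the same four glycopeptides.
    peptides = {
        "isoform_A": {"gp1", "gp2", "gp3", "gp4"},
        "isoform_B": {"gp1", "gp2", "gp3", "gp4"},
        "protein_C": {"gp5"},
    }
    scores = {"isoform_A": 80.0, "isoform_B": 120.0, "protein_C": 45.0}
    print(collapse_isoforms(peptides, scores))  # ['isoform_B', 'protein_C']
```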
Using the Mascot search engine, a total of 738 unique glycosylation sites assigned to 446 glycoproteins was identified from the three ERLIC replicates (Fig. 2). In contrast, only 259 unique glycosylation sites and 153 glycoproteins were recovered from the three hydrazide replicates. The non-redundant MS/MS spectra of the glycopeptides enriched by ERLIC and by hydrazide chemistry are shown in Mascot peptide view format in supplemental Data 1 and 4. The numbers of glycosylation sites and glycoproteins identified per replicate in the ERLIC approach (mean ± S.D., 560 ± 70 sites; 352 ± 30 glycoproteins) were significantly higher than those from the hydrazide method (216 ± 9 sites; 131 ± 4 glycoproteins) (Fig. 2c). In the ERLIC approach, 52.4% of glycosylation sites were common to all three replicates, and an average of 65.5% overlapped between any two replicates. About 58.3% of the glycoproteins were identified in all three replicates, with an average of 70.1% overlapping between any two replicates (Fig. 2a). A slightly higher proportion of glycosylation sites (68.3%) and glycoproteins (72.7%) was observed in all three replicates of the hydrazide method (Fig. 2b). Clearly, repeated analyses of a single sample increased the number of glycosylation sites and glycoproteins identified, as is true of shotgun proteomics in general. Similar results were obtained via a Sequest search. A total of 827 glycosylation sites (528, 704, and 604 from each ERLIC replicate) and 495 glycoproteins (348, 430, and 373, respectively) were identified from the three ERLIC replicates. In comparison, 283 glycosylation sites (232, 219, and 220 from each hydrazide replicate) and 174 glycoproteins (148, 136, and 140, respectively) were recovered from the three hydrazide replicates. We found that the ERLIC and hydrazide results contained 12.1 and 9.5% non-NX(S/T) motif sites, respectively; these sites were likely generated by nonspecific deamidation during sample preparation.
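The per-replicate Mascot counts behind the mean ± S.D. values are not listed, but the Sequest counts are. The sketch below applies the same summary to those Sequest numbers. It assumes that S.D. denotes the sample standard deviation (n - 1 denominator).

```python
from statistics import mean, stdev

# Per-replicate Sequest counts quoted in the paragraph above.
SEQUEST_COUNTS = {
    "ERLIC glycosylation sites": [528, 704, 604],
    "ERLIC glycoproteins": [348, 430, 373],
    "hydrazide glycosylation sites": [232, 219, 220],
    "hydrazide glycoproteins": [148, 136, 140],
}


def summarize(counts):
    """Mean and sample standard deviation (n - 1 denominator) across replicates."""
    return mean(counts), stdev(counts)


if __name__ == "__main__":
    for label, counts in SEQUEST_COUNTS.items():
        m, sd = summarize(counts)
        print(f"{label:30s} {m:6.1f} ± {sd:5.1f}")
```

The ERLIC Sequest site counts, for example, summarize to about 612 ± 88. This value is higher and more variable than the hydrazide counts (about 224 ± 7), mirroring the Mascot trend.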
The advantage of using a combination of search engines is shown in Fig. 3. The use of a second search engine (Sequest) yielded 184 additional glycosylation sites and 98 additional glycoproteins in the ERLIC runs and an additional 92 sites and 38 proteins in the hydrazide runs. Fig. 4 shows representative spectra that were confidently assigned by one search engine but not the other. Aminopeptidase N (CD13), present in various brain structures (52), was identified by Mascot with three unique glycopeptides, with good scores of 61, 54, and 50, respectively. In Sequest, the corresponding values for the three peptides were as follows (gN denotes deamidated Asn):

- XCorr 3.29 (R.gNATLVNEADKLR.S, 2+, p = 4.8e-8, ΔCn 0.029)
- XCorr 3.969 (R.FTCgNQTTDVIIIHSK.K, 3+, p = 1.0e-9, ΔCn 0.030)
- XCorr 4.882 (K.SGQEDHYWLDVEKgNQSAK.F, 3+, p = 1.77e-06, ΔCn 0.011)

Although the three peptides had relatively high XCorr values, their ΔCn values all fell below the set threshold of 0.08, so Sequest failed to identify this protein. Conversely, CD63, which is abundantly expressed in neural tissue, was identified by Sequest with two significant peptides but not by Mascot, because the scores of both peptides fell below the Mascot identity score. The confident identification of a peptide by only one search engine probably reflects the different scoring and probability calculations used by the engines. These calculations are highly dependent on the charge state, residue composition, peptide length, and signal-to-noise ratio of the MS/MS spectra (53).

FIG. 1. ERLIC chromatograms. a, a blank run showing the LC gradient and the background UV absorption chromatogram at 214 nm. b, 1 mg of mouse brain tryptic digest. Forty fractions from 7 to 50 min were collected and then combined into seven final fractions, as shown, for LC-MS/MS analysis.
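Why can a peptide with a high XCorr still fail? ΔCn is commonly defined in Sequest as the normalized gap between the best and second-best XCorr, (XCorr1 - XCorr2)/XCorr1; that definition is assumed here. Inverting it for the three CD13 glycopeptides listed above shows that each had a runner-up candidate scoring almost as well, so the top match was not distinct enough to pass the 0.08 cutoff.

```python
DELTA_CN_MIN = 0.08  # Sequest threshold used in this study


def delta_cn(xcorr_best: float, xcorr_second: float) -> float:
    """Normalized difference between the best and second-best XCorr."""
    return (xcorr_best - xcorr_second) / xcorr_best


def implied_second_xcorr(xcorr_best: float, dcn: float) -> float:
    """Invert the definition to recover the runner-up XCorr."""
    return xcorr_best * (1.0 - dcn)


if __name__ == "__main__":
    # XCorr and delta-Cn values reported above for the three CD13 glycopeptides
    for xcorr, dcn in [(3.29, 0.029), (3.969, 0.030), (4.882, 0.011)]:
        second = implied_second_xcorr(xcorr, dcn)
        verdict = "pass" if delta_cn(xcorr, second) > DELTA_CN_MIN else "fail"
        print(f"XCorr {xcorr:.3f}  runner-up ~{second:.3f}  dCn {dcn:.3f}  {verdict}")
```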
In total, 995 glycosylation sites assigned to 562 non-redundant glycoproteins were identified in this study (Fig. 5). Information for each identified glycoprotein, including the protein accession number (IPI), protein description, identified glycopeptides, and unique glycosylation sites, is listed in supplemental Table 2. Of these, hydrazide chemistry followed by one-dimensional LC-MS/MS yielded 345 glycosylation sites and 192 glycoproteins, which is comparable with other studies. Lee et al. (22) reported 210 glycoproteins from a rat liver membrane preparation, and Zhang et al. (32) identified 445 glycosylation sites from prostate cancer tissue. Enrichment of glycopeptides from more starting material followed by two-dimensional LC-MS/MS could yield more glycoproteins and glycosylation sites. With this approach, Chen et al. (31) identified 622 glycosylation sites from 5 mg of tryptic human liver protein digest. Using two other enzymes, thermolysin and pepsin, in parallel experiments, they identified an additional 317 sites. In our ERLIC approach, enrichment and fractionation were achieved in a single step. Analysis of the fractions by one-dimensional LC-MS/MS yielded 922 glycosylation sites and 544 glycoproteins from a relatively small amount of sample (3 mg). This represents 92.7% of the total identified glycosylation sites and 96.8% of the total glycoproteins. The hydrazide method provided an additional 73 glycosylation sites (7.3%) and 18 glycoproteins (3.2%). The hydrazide method contributed a higher percentage of new glycosylation sites than of new glycoproteins. This suggests that combining the ERLIC and hydrazide methods could achieve higher glycosylation site coverage of each glycoprotein.
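The percentages above follow from the reported counts. The sketch below recomputes them, assuming that the combined totals are the union of the ERLIC and hydrazide results. Under that assumption, it also derives how many hydrazide identifications were shared with ERLIC.

```python
TOTAL = {"glycosylation sites": 995, "glycoproteins": 562}
ERLIC = {"glycosylation sites": 922, "glycoproteins": 544}
HYDRAZIDE = {"glycosylation sites": 345, "glycoproteins": 192}


def contributions(key: str):
    """ERLIC share (%), hydrazide-only count and share (%), and count shared by both."""
    erlic_share = 100.0 * ERLIC[key] / TOTAL[key]
    hydrazide_only = TOTAL[key] - ERLIC[key]          # assumes TOTAL is the union
    hydrazide_only_share = 100.0 * hydrazide_only / TOTAL[key]
    shared = HYDRAZIDE[key] - hydrazide_only
    return erlic_share, hydrazide_only, hydrazide_only_share, shared


if __name__ == "__main__":
    for key in TOTAL:
        e, n, s, shared = contributions(key)
        print(f"{key}: ERLIC {e:.1f}%, hydrazide only {n} ({s:.1f}%), shared {shared}")
```

This gives 272 of the 345 hydrazide sites and 174 of the 192 hydrazide glycoproteins shared with ERLIC. The smaller proportion of shared sites than of shared proteins is the basis for the coverage argument above.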
We next matched these 995 glycosylation sites against the newly released Swiss-Prot knowledgebase for rodent species (released on May 26, 2009). It is worth noting that, for some glycoproteins, the positions of the glycosylation sites identified from the IPI database differed from those in the Swiss-Prot database. For example, CD98 heavy chain (IPI00114641) was identified with four glycosylation sites at Asn-172, Asn-265, Asn-391, and Asn-405. In the Swiss-Prot database, these sites correspond to Asn-166, Asn-259, Asn-385, and Asn-399 because of differences in sequence cleavage, protein truncation, or isoform between the two databases. Such changes in site position were manually verified based on sequence matching and are included in supplemental Table 2. Of the 995 glycosylation sites, only 39 are documented as N-linked glycosylation sites with experimental evidence, two of which are of the high mannose type. A further 62.0% (617) are annotated as potential (601 sites), probable (five sites), or by similarity (11 sites), and 34.1% (339) are not documented as N-linked glycosylation sites (Fig. 6a). This shows that glycoproteins and glycosylation sites are largely underrepresented in current databases. It also shows that a glycoproteomics approach allows not only large scale validation of potential glycosylation sites but also identification of novel glycosylation sites (Fig. 6b).
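The manual verification of site positions described above can be approximated by locating the sequence window around each site in the other database's entry. A consistent offset (six residues for all four CD98 heavy chain sites) then appears automatically. The sketch below is a hypothetical helper, not the procedure used in this study. It returns None when the window is missing or ambiguous. The sequences in the example are toy strings, not real database entries.

```python
from typing import Optional


def remap_site(site: int, source_seq: str, target_seq: str,
               flank: int = 7) -> Optional[int]:
    """Map a 1-based residue position from source_seq onto target_seq.

    The residue and up to `flank` neighbours on each side form a search
    window; None is returned if the window is absent or not unique.
    """
    i = site - 1
    start = max(0, i - flank)
    window = source_seq[start:i + flank + 1]
    hit = target_seq.find(window)
    if hit == -1 or target_seq.find(window, hit + 1) != -1:
        return None
    return hit + (i - start) + 1


if __name__ == "__main__":
    # Hypothetical toy sequences: the target lacks a 6-residue N-terminal segment.
    target = "QWERTYNGTHIKLMPASDF"
    source = "MKLLVA" + target
    print(remap_site(13, source, target, flank=4))  # Asn-13 maps to Asn-7
```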
Identification of Membrane Phosphoproteins-Phosphorylation of membrane proteins often initiates signal transduction pathways or attenuates plasma membrane transport processes (54). However, studying these proteins is challenging because of the inherent hydrophobicity of membrane proteins and the low abundance of phosphopeptides. Here, we prepared protein digests from purified membrane proteins and enriched hydrophilic and negatively charged peptides using ERLIC. In addition to the large number of glycoproteins, a total of 383 phosphoproteins with 915 unique phosphorylation sites were identified (supplemental Table 3). As shown in the peptide view in Supplemental Data 3, the phosphorylation site of most phosphopeptides is bracketed by fragments that lack the site and fragments that still bear the modification, together with the corresponding neutral loss (−98 Da) fragments. For example, in the phosphopeptide QADVPAAVTDAAATpTPAAEDAATK (where pT is phosphothreonine) shown in MS/MS Spectrum 1, the phosphorylation site is sandwiched by fragments y4, y5, y6, y7, y8, y12, y12 − 98, y14, and y14 − 98. These fragments allow the reliability of the phosphorylation site assignment to be assessed.
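The bracketing logic can be made explicit: a fragment y_n contains the last n residues of the peptide, so for a site at residue s of an L-residue peptide it carries the phosphate only if n ≥ L − s + 1. The minimal Python sketch below uses only the y ions listed above; the helper names are hypothetical, and the check ignores b ions, mass tolerances, and scoring that a real site-localization tool would use.

```python
def y_ion_covers(peptide_length: int, site: int, n: int) -> bool:
    """True if fragment y_n contains residue `site` (1-based).

    y_n holds the C-terminal n residues, i.e. positions L-n+1 .. L.
    """
    return n >= peptide_length - site + 1


def site_is_bracketed(peptide: str, site: int,
                      unmodified_y: list[int], modified_y: list[int]) -> bool:
    """Check that every y ion observed without the modification lies below the
    site and every y ion observed with it (or with its -98 Da H3PO4 neutral
    loss) covers the site."""
    length = len(peptide)
    return (all(not y_ion_covers(length, site, n) for n in unmodified_y)
            and all(y_ion_covers(length, site, n) for n in modified_y))


# Peptide from MS/MS Spectrum 1 written without the "p" marker;
# the phosphothreonine is residue 15 of 24.
peptide = "QADVPAAVTDAAATTPAAEDAATK"
unmodified = [4, 5, 6, 7, 8]  # y4-y8 as listed in the text
modified = [12, 14]           # y12 and y14, also observed as y - 98

print(site_is_bracketed(peptide, 15, unmodified, modified))  # Thr-15
print(site_is_bracketed(peptide, 20, unmodified, modified))  # Asp-20, for contrast
```

The first call succeeds and the second fails because y5 to y8 would contain Asp-20. These y ions alone would be equally consistent with Thr-14, so distinguishing adjacent residues relies on the other fragments in the spectrum.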
Notably, 41.2% of these phosphoproteins were also identified as glycosylated, demonstrating the potential of our protocol for simultaneously identifying both post-translational modifications on a single protein. For example, L1cam (neural cell adhesion molecule (NCAM) L1 precursor, IPI00115762) is predicted to have 21 potential N-linked glycosylation sites and six phosphorylation sites, of which we identified 11 glycosylation sites and four phosphorylation sites. Glycosylation and phosphorylation may modulate protein function separately or cooperatively. For example, NCAM modulates L1 function through recognition of L1 carbohydrate, whereas phosphorylation of L1 tyrosine and serine residues regulates its cytoplasmic interactions, mobility, and internalization (55). Simultaneous monitoring of the two types of modifications might therefore provide more insight into the functions of a protein of interest.
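Counts of "potential" N-linked sites, such as the 21 predicted for L1cam, are typically derived from the N-X-S/T sequon, where X is any residue except proline. The minimal Python scan below is an assumption about how such predictions are commonly made, not the specific tool behind the L1cam figure. It ignores refinements some predictors apply, such as penalizing a proline immediately after the Ser/Thr. The example inputs are the two glycopeptide sequences from Fig. 6, written without modification markers.

```python
import re

# N-X-S/T with X != Pro; the lookahead lets overlapping sequons (e.g. "NNST") all match.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(protein_seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein_seq)]


print(sequon_positions("GETASLLCNISVR"))  # Asn in N-I-S
print(sequon_positions("DYGNYTCVATNK"))   # Asn in N-Y-T; the final Asn lacks S/T
print(sequon_positions("ANPSA"))          # proline at X blocks the sequon
```

Using a zero-width lookahead instead of consuming the three-residue motif matters for runs such as N-N-S-T, where both Asn residues start a valid sequon.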
Distribution of Glyco- and Phosphopeptides in ERLIC-Fig. 7a shows that the average number of peptides identified per fraction decreased slightly from fraction 1 to 7, whereas the ratio of glyco- and phosphopeptides to other peptides changed from about 20% to 80% (Fig. 7b), confirming that these two classes of modified peptides had an elution profile different from that of other peptides under our two-buffer elution system. In addition, phosphopeptides eluted earlier, with more of them in the earlier fractions and fewer in the later fractions. In contrast, glycopeptides eluted later, with fewer in the earlier fractions and more in the later fractions. This suggests that ERLIC could not only enrich glyco- and phosphopeptides simultaneously but also, with an optimized elution gradient, fractionate them differentially into glycopeptides and phosphopeptides.
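The per-fraction proportions plotted in Fig. 7b amount to counting modification markers in each fraction's list of identified peptides. The minimal Python sketch below uses hypothetical peptide lists. It assumes phosphosites are written with a lowercase "p" prefix, as for pT in this article, and that a glycosylated Asn carries a lowercase "g" prefix, as in the Fig. 6 sequences. The "g" convention is inferred from those examples rather than stated in the text.

```python
import re

PHOSPHO = re.compile(r"p[STY]")  # e.g. "pT" = phosphothreonine
GLYCO = re.compile(r"gN")        # assumed marker for a glycosylated Asn


def fraction_profile(fractions: dict[int, list[str]]) -> dict[int, dict[str, float]]:
    """Per fraction: number of phospho- and glycopeptides and % modified peptides."""
    profile = {}
    for number, peptides in sorted(fractions.items()):
        phospho = sum(1 for p in peptides if PHOSPHO.search(p))
        glyco = sum(1 for p in peptides if GLYCO.search(p))
        # a peptide carrying both markers is counted once here
        modified = sum(1 for p in peptides if PHOSPHO.search(p) or GLYCO.search(p))
        profile[number] = {
            "phospho": phospho,
            "glyco": glyco,
            "pct_modified": 100 * modified / len(peptides) if peptides else 0.0,
        }
    return profile


# Hypothetical identifications for an early and a late fraction.
toy = {
    1: ["QADVPAAVTDAAATpTPAAEDAATK", "PEPTIDEK", "SAMPLER", "LIGANDK"],
    7: ["GETASLLCgNISVR", "DYGgNYTCVATNK", "SAMPLER"],
}
for number, row in fraction_profile(toy).items():
    print(number, row)
```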
Functional Annotation of Identified Glycoproteins-Of the 562 identified glycoproteins, 329 have a predicted signal sequence, and 405 were predicted to contain transmembrane domains. Most of these contained one or two transmembrane domains, but as many as 198 proteins with ≥3 transmembrane domains were also identified. Based on these data, 91.1% of the proteins were further classified as extracellular proteins, comprising cell surface, secreted, and transmembrane proteins. This is consistent with the subcellular classification of these proteins using GO annotations from MGI. As shown in Fig. 8a, the most abundant groups are plasma membrane/other membrane proteins (71%), endoplasmic reticulum/Golgi proteins (9%), and extracellular matrix proteins (9%), whereas intracellular proteins account for only 11% of the identified glycoproteins, of which less than 1% are of cytosolic origin. The assignment of some N-glycoproteins to an intracellular origin might be due to GO annotation errors or to the existence of an extracellular form of the protein (14). Thus, glycopeptide enrichment is not only essential for studying protein glycosylation but is also an effective approach for identifying membrane proteins, which are underrepresented in proteomic databases because of their hydrophobicity.
Glycoproteins in the brain are involved in various processes such as cell migration, neurite outgrowth and fasciculation, synapse formation and stabilization, and modulation of synaptic efficacy (56). As expected, the identified proteins were mainly involved in transport (17%), developmental processes (15%), signal transduction (12%), or cell adhesion (12%) (Fig. 8b). Only a modest number of proteins were associated with processes related to the rare events of neural regeneration, such as cell death (4%), cell cycle and proliferation (3%), and RNA metabolism (1%). As shown in Fig. 8c, the major molecular functions of the identified glycoproteins were binding (29%), catalytic activity (19%), and transporter activity (17%). Further pathway analysis of the identified glycoproteins with KEGG mapped eight significant pathways (p < 0.05) (Fig. 8d). Because neurons are a core component of the brain, most of the identified proteins, as expected, belonged to pathways directly involved in neuronal activity, such as neuroactive ligand-receptor interaction, calcium signaling, axon guidance, and long term potentiation.
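The article does not state which statistic underlies the KEGG significance threshold (p < 0.05). A common choice for pathway over-representation is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The minimal Python sketch below works under that assumption and applies no multiple-testing correction. The counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: annotated background proteins; K: background proteins in the pathway;
    n: identified (annotated) proteins; k: identified proteins in the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)  # X cannot exceed either the draws or the pathway size
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until the final division


# Hypothetical counts: 12 of 300 identified proteins fall in a pathway
# that contains 80 of 6000 annotated background proteins.
p = enrichment_p_value(k=12, n=300, K=80, N=6000)
print(f"p = {p:.3g}")
```

Keeping the sums as exact integers and dividing once avoids the underflow that multiplying many small floating-point probabilities can cause.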
Identification of Disease-related Glycoproteins-Glycoproteins and their attached glycans have pivotal roles in nervous system development, regeneration, and synaptic plasticity. Alteration of N-glycans in humans results in many congenital and chronic neurological disorders such as epilepsy, ataxia, cerebellar and cerebral atrophy, and abnormal eye movements (56). The severity of these diseases is largely dependent on the extent of the N-glycan impairment.
FIG. 6. Annotation of identified glycosylation sites. a, matches of identified glycosylation sites with the Swiss-Prot knowledge database of rodent. b, spectrum for identification of a novel glycosylation site in R.GETASLLCgNISVR.G (m/z = 710.857, z = 2+) in Igsf8 (immunoglobulin superfamily member 8 precursor, IPI00321348). Igsf8 has three potential sites; we found all of them as well as a novel site, Asn-461 (shown). c, spectrum for identification of a glycosylation site in K.DYGgNYTCVATNK.L (m/z = 703.797, z = 2+) in Opcml (opioid-binding cell adhesion molecule, IPI00463489). This protein was not documented in the Swiss-Prot database.
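The precursor m/z values in the Fig. 6 caption can be checked from the peptide sequence. The minimal Python sketch below rests on two assumptions that this section does not state. First, cysteines are carbamidomethylated. Second, the glycan was removed with PNGase F before MS, so the marked Asn is read as Asp (+0.984 Da). Reading "g" as that marker is likewise an inference from the caption's notation. Under these assumptions, GETASLLCgNISVR at z = 2 gives about 710.859, close to the reported 710.857.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276
CARBAMIDOMETHYL = 57.021464  # assumed fixed modification on every Cys
DEAMIDATION = 0.984016       # assumed Asn -> Asp after PNGase F deglycosylation


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of [M + zH]z+ for a peptide written with 'g' before a deglycosylated Asn."""
    residues = [aa for aa in peptide if aa.isupper()]  # drop lowercase markers
    mass = sum(RESIDUE_MASS[aa] for aa in residues) + WATER
    mass += CARBAMIDOMETHYL * residues.count("C")
    mass += DEAMIDATION * peptide.count("gN")
    return (mass + charge * PROTON) / charge


mz = precursor_mz("GETASLLCgNISVR", 2)
print(f"computed m/z {mz:.3f} vs reported 710.857")
```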
The N-glycan profile of mouse brain glycoproteins is similar to that of human glycoproteins (57), which suggests that information derived from the study of mouse brain could be relevant to humans. In this study, we found a number of glycoproteins that are known to be related to various neurological disorders (supplemental Table 4). The major prion protein (Prnp) has two potential N-glycosylation sites, mutations in which have been associated with aging and prion diseases such as Creutzfeldt-Jakob disease (58,59). Tripeptidyl-peptidase 1 (TPP1) has a role in regulating intracellular lipopigment storage material; a defect in the N-glycosylation of Asn-286 results in accumulation of this material and is clinically characterized as neuronal ceroid lipofuscinosis (60,61). In Alzheimer disease, the pathological hallmark is the formation of extracellular amyloid plaques in the brain, which is putatively caused by mutations in the amyloid β precursor protein (APP) and/or its regulators (62). Recently, altered glycosylation of APP was implicated in Alzheimer disease (63). Here, we identified not only APP but also its regulators, such as membrane metalloendopeptidase (Mme) and protein phosphatase 3, catalytic subunit, α isoform (Ppp3ca). Mme has a role in the degradation of excess and misfolded APP peptide (64), and protein phosphatase activity tightly regulates APP secretion (65). In addition, we found six proteins related to diabetes, a disease characterized by insulin resistance and hyperinsulinemia. Insulin signaling has an important role in neuronal growth and differentiation (66), and a decrease in the number, or defects in the function, of insulin receptors in the brain has been implicated in the development of type 2 diabetes (67). As a result, type 2 diabetes has been proposed as a brain disorder (68). The identification of diabetes-related proteins in the mouse brain might support this classification.
Many other disease-related proteins such as Niemann-Pick C1 protein (NPC1) (Niemann-Pick type C disease), solute carrier family 18, member 2 (Slc18a2) (Parkinson disease), and L1 cell adhesion molecule (MASA (mental retardation, aphasia, shuffling gait, and adducted thumbs) syndrome) were also identified. Most of these diseases are caused not only by abnormal protein expression but also by aberrant glycosylation. Importantly, in the ERLIC approach, the glycopeptide and its glycan are recovered together, permitting a further detailed glycan analysis. This is in contrast to the hydrazide chemistry method where glycan information is lost during the recovery of glycopeptides.
Conclusion-The geriatric proportion of the population is growing rapidly worldwide, and it has been estimated that by the year 2020, over 70% of the global burden of disease in developing and newly industrialized countries will be attributable to degenerative illnesses apart from cardiovascular diseases, cancer, and others (69). Aberrant glycosylation and phosphorylation have been implicated in numerous acute and chronic neurological disorders (7-9), for which effective noninvasive diagnostic tools and successful therapies have yet to become available. Hence, understanding the molecular mechanisms at the level of proteins and their post-translational modifications could facilitate the identification of therapeutic targets or potential biomarkers for these disorders. Here, we studied the mouse brain glycoproteome using an optimized ERLIC approach as well as the hydrazide chemistry approach. We report a total of 562 glycoproteins and 995 glycosylation sites, the largest brain-derived glycoproteome data set documented so far. About 96% of the identified glycosylation sites are new in that they are either not recorded or annotated only as putative glycosylation sites in the curated Swiss-Prot database. Of the 562 glycoproteins, 13.7% (77) are known to be related to various neurological disorders, which could help in understanding the mechanisms of brain disease and identifying novel disease biomarkers.
We showed here that the ERLIC approach is more efficient than the hydrazide chemistry method in identifying glycopeptides and glycoproteins (Fig. 5). An additional advantage is that ERLIC enriches glycopeptides without destroying their attached glycans, which could permit further glycan analysis. Moreover, ERLIC, which has been shown to enrich phosphopeptides efficiently (42,43), can be adapted to simultaneously enrich glycopeptides and phosphopeptides, which together represent proteins with the two most important post-translational modifications. Some of the identified proteins carried both glycosylation and phosphorylation; for example, we identified 11 glycosylation sites and four phosphorylation sites in the NCAM L1 precursor. Simultaneous identification of two types of post-translational modifications is useful not only for studying the function of a single protein but also for elucidating systemic changes of the whole proteome in different cell states. In summary, we demonstrate that ERLIC is a simple and robust approach not only for recovering glycoproteins but also for simultaneously recovering glycoproteins and phosphoproteins. This approach enabled us to identify the largest set of glycoproteins from mouse brain so far, which could be of significant value in complementing the current glycoprotein database. We expect that ERLIC could help to uncover more glycoproteins and novel glycosylation sites in other samples such as body fluids, cells, or tissues.
Supplemental Data-Detailed information on all identified proteins is listed in supplemental Table 5 (from the Mascot search) and supplemental Table 6 (from the Sequest search). All MS2 spectra of glyco- and phosphopeptide assignments are shown in supplemental spectra hosted on https://proteomecommons.org/tranche/ (titled "Simultaneous Characterization of Glyco-and Phospho-proteomes of Mouse Brain Membrane Proteome Using ERLIC Chromatography") and can be downloaded using the following hash: LCcjHCB0H/g-7PpRF7Q5uAo0vh2MIEWGi56q8jy/dqQ3OBOk6JNsXXU-PoyUEgctUS+BmB1PpyRSOYRc54Mu46im1RxeoAAAAAA-AACwA==. Alternatively, the non-redundant MS2 spectra of glyco- and phosphopeptide assignments are provided in Supplemental Data 1-4.