A Strategy for Precise and Large Scale Identification of Core Fucosylated Glycoproteins

Core fucosylation (CF) patterns of some glycoproteins are more sensitive and specific than their total protein levels for the diagnosis of many diseases, such as cancers. Global profiling and quantitative characterization of CF glycoproteins may reveal potent biomarkers for clinical applications. However, current techniques are unable to reveal CF glycoproteins precisely on a large scale. Here we developed a robust strategy that integrates molecular weight cutoff, neutral loss-dependent MS3, database-independent candidate spectrum filtering, and spectrum optimization to effectively identify CF glycoproteins. The rationale for spectrum treatment was based on computation of the mass distribution in spectra of CF glycopeptides. The efficacy of this strategy was demonstrated by applying it to plasma from healthy subjects and subjects with hepatocellular carcinoma. Over 100 CF glycoproteins and CF sites were identified, and over 10,000 mass spectra of CF glycopeptides were found. The scale of these identifications indicates substantial progress toward finding biomarkers, and the candidate spectra will be a useful resource for the improvement of database searching methods for glycopeptides.

Glycoproteins are implicated in a wide range of biological processes such as fertilization, development, the immune response, cell signaling, and apoptosis. Altered glycosylation patterns can affect the conformations of glycoproteins and their functions and interactions with other molecules (1,2).
Abnormal glycosylation has been demonstrated in many pathological processes. Targeted glycosylation research is considered increasingly important as a way to find novel therapeutic approaches (2,3), and core fucosylation (CF) glycoproteomics has attracted particularly great attention (4,5). Previous reports show that CF glycoproteins are involved in many important physiological processes, such as transforming growth factor-β1 (6) and epidermal growth factor signaling pathways (7). They also play key roles in many pathological processes, such as hepatocellular carcinoma (HCC) (8,9), pancreatic cancer (10,11), lung cancer (6,12), ovarian cancer (13), and prostate cancer (14). Moreover, the CF patterns of several glycoproteins have been reported to serve as more sensitive and specific biomarkers than their total respective protein levels (8,9,15,16). A biomarker panel that combines several CF glycoproteins is expected to serve as a more reliable diagnostic standard (13).
Glycoproteomics research has been conducted for several years and has led to the generation of many effective evaluation methods. Most of these methods use lectin or the chemical reagent hydrazide to enrich glycopeptides. The oligosaccharide chains are then completely released by treatment of the glycopeptides with peptide-N-glycosidase F. Finally the deglycosylated peptides and the deglycosylation sites are identified by tandem mass spectrometric analysis (17,18). Although impressive results have been attained, this commonly used strategy is not an ideal choice for CF glycoprotein research. First, the enrichment specificity of lectin is not satisfactory (19), and hydrazide chemical reactions irreversibly destroy glycan structures, particularly fucose tags. Second, the deglycosylation site is determined by the 0.9840-Da mass shift caused by the conversion of asparagine to aspartic acid; its confidence can be compromised by spontaneous deamidation of Asn. In addition, after complete deglycosylation the CF site can no longer be distinguished from other glycosylation sites in the same glycoprotein. Thus, the ideal way to precisely identify CF glycoproteins on a large scale is to provide direct evidence for the existence of the CF modification. Traditional approaches, such as lectin blots, are not sufficiently powerful to meet this requirement. Instead, recent advances in high end MS-based techniques have raised the hope of reaching this challenging goal (20,21).
Our group has developed a systematic strategy for the precise and large scale identification of CF glycoproteins. Several steps were taken leading up to the development of our strategy. 1) We established a novel enrichment step for CF glycopeptides, combining the use of lectin for CF glycoprotein enrichment with ultrafiltration for further enrichment of glycopeptides. Glycopeptide enrichment by ultrafiltration based on molecular weight cutoff technology has the added merit of integrating enrichment, desalting, and concentration into a one-step operation. 2) We established a neutral loss-dependent MS3 scan method that specifically captures partially deglycosylated CF glycopeptides (with the fucosyl-N-acetylglucosamine residue retained). In MS3, the intensity distribution of the fragment peaks is much more homogeneous, and there are fewer theoretical fragment ions and interfering peaks than in MS2. 3) We established a novel database-independent candidate spectrum-filtering method for selecting partially deglycosylated CF glycopeptides and a spectrum optimization method. By introducing several strict and appropriate criteria into a scoring system, high quality candidate spectra can be selected before searching the database, which not only increases the database search efficiency but also improves the identification credibility. Furthermore, by statistically analyzing candidate spectra, some important glycan-related fragmentation patterns were revealed. Based on these observations, many kinds of interfering peaks due to glycan fragmentation, which are typically very intense and would decrease the accuracy of peptide scoring, can be localized and removed from the spectra. This treatment can effectively increase the number of identifications through database searching or de novo analysis.
The efficacy of this strategy was tested by applying it to both healthy and HCC plasma. Respectively, 105 and 106 CF sites were identified from 72 and 79 glycoproteins, including 19 annotated potential glycosylation sites and 25 novel ones. This study holds promise for the large scale determination of core fucosylated biomarker panels from clinical samples, either body fluids or tissue biopsies.

EXPERIMENTAL PROCEDURES
Materials-Apotransferrin, fetuin, ribonuclease B, endoglycosidase F3, formic acid, TFA, α-cyano-4-hydroxycinnamic acid, and Lens culinaris lectin (agarose conjugate, saline suspension) were purchased from Sigma; methyl-α-D-mannopyranoside was purchased from Fluka (St. Louis, MO); and sodium 3-[(2-methyl-2-undecyl-1,3-dioxolan-4-yl)methoxy]-1-propanesulfonate (RapiGest SF) was purchased from Waters. Sequencing grade porcine trypsin was purchased from Promega (Madison, WI); IgG was purified by use of a HiTrap Protein G HP column from GE Healthcare. The PD-10 desalting column was also from GE Healthcare. Deionized water was produced by a Milli-Q A10 system from Millipore (Bedford, MA). HPLC grade ACN was purchased from J. T. Baker Inc. Iodoacetamide and DTT were obtained from ACROS. The Handee mini spin column kit was purchased from Pierce. The C18 ZipTip and Microcon YM-3 were purchased from Millipore. Recombinant human erythropoietin (rhEPO) was a gift from the National Institute for the Control of Pharmaceutical and Biological Products. Healthy human plasma (0.8 ml for each experiment) was obtained from a healthy donor. The hepatocellular carcinoma plasma sample was pooled from eight patients, with 0.1 ml from each person.
IgG Extraction-Plasma was supplemented with IgG binding buffer (20 mM sodium phosphate, pH 7.0), and then IgG was depleted by trapping on a column of HiTrap Protein G. The unbound samples were desalted by a PD-10 column.
Lectin Affinity-Samples were supplemented with 1.6 ml of lectin binding buffer (20 mM Tris-buffered saline, 0.3 M NaCl, 1 mM MnCl2, 1 mM CaCl2, pH 7.4). The samples were incubated for 16 h at 4 °C with L. culinaris lectin in a spin column (about 300 μl of lectin-agarose and 400 μl of sample in each column). After unbound proteins were removed by washes with binding buffer, the CF glycoproteins were eluted with elution buffer (binding buffer supplemented with 200 mM methyl-α-D-mannopyranoside), then desalted (by PD-10 column), and lyophilized.
Reduction, Alkylation, and Trypsin Digestion-Samples were dissolved in 200 μl of solution that contained 8 M urea and 5 mM DTT and were reduced at 37 °C for 4 h. Then iodoacetamide was added to the solution (final concentration, 15 mM), which was then further incubated for 1 h in darkness at room temperature. Afterward 50 mM NH4HCO3 was added to reduce the concentration of urea below 1 M, and sequencing grade trypsin was added at a ratio of enzyme to protein of 1:50. The mixture was then vortexed and incubated at 37 °C overnight. 0.1% RapiGest SF was used instead of urea for protein denaturation in the repeat experiment of healthy and HCC plasma. TFA was added to the digested protein samples (final TFA concentration was 0.5%, pH < 2), and the samples were incubated at 37 °C for 45 min. Finally the acid-treated samples were centrifuged at 13,000 rpm for 10 min, and the supernatants were collected.
Enrichment, Desalting, and Concentration of Glycopeptides-Tryptic digests were pipetted into Microcon YM-3 centrifugal filter devices. The absolute amount of glycoprotein in the digests was between 200 and 300 μg for each filter device, and the sample volume was diluted to 500 μl for each filter device. The samples were centrifuged at 8000 × g to reduce the sample volume from 500 μl to about 20 μl; this required about 3 h. Then 450 μl of deionized water were added to the reservoir and centrifuged at 8000 × g for 3 h; this was repeated twice. After that, the retentate fraction was transferred to a vial, and the reservoir was washed three times with 20% ACN. All of the retentate fractions and wash solutions were pooled and lyophilized.
Endoglycosidase F3 Digestion-Glycopeptides were resuspended in 100 μl of sodium acetate solution (50 mM, pH 4.5) and then incubated with endoglycosidase F3 overnight at 37 °C. Ammonium acetate (50 mM, pH 4.5) was used instead of the sodium acetate in the repeat experiments of healthy and HCC plasma.
Strong Cation Exchange (SCX) Peptide Fractionation-10% of the enriched sample was directly analyzed twice with RP HPLC-MS. The remaining enriched CF glycopeptides were reconstituted with 300 μl of 5 mM ammonium chloride, pH 3.0, 25% acetonitrile and fractionated by SCX chromatography on a BioBasic SCX 250 × 4.6-mm column (Thermo Fisher). The particle size of the column was 5 μm, and the pore size was 300 Å. The separations were performed at a flow rate of 0.5 ml/min using the Elite HPLC system, and mobile phases consisted of 5 mM ammonium chloride, pH 3.0, 25% acetonitrile (A) and 500 mM ammonium chloride, pH 3.0, 25% acetonitrile (B). After loading 300 μl of sample onto the column, the gradient was maintained at 100% A for 10 min. Peptides were then separated using a gradient of 0-15% B over 1 min followed by a gradient of 15-50% B over 49 min. Then the gradient was changed to 50-100% B over 5 min and held at 100% B for 5 min. A total of 15 fractions were collected, and each fraction was dried under vacuum.
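The SCX gradient program described above can be written down as a list of breakpoints, which makes the 70-min total run time explicit. The following is only a minimal sketch: the breakpoint table is read directly from the text, whereas the function name `percent_b` and the assumption of strictly linear ramps between breakpoints are ours.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints read from the gradient description:
# 10 min at 100% A, 0-15% B over 1 min, 15-50% B over 49 min,
# 50-100% B over 5 min, then a 5-min hold at 100% B.
SCX_GRADIENT = [(0, 0), (10, 0), (11, 15), (60, 50), (65, 100), (70, 100)]


def percent_b(t_min, program=SCX_GRADIENT):
    """Return %B at time t_min, assuming linear ramps between breakpoints."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return float(program[0][1])
    if t_min >= times[-1]:
        return float(program[-1][1])
    i = bisect_right(times, t_min)          # first breakpoint after t_min
    (t0, b0), (t1, b1) = program[i - 1], program[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


total_run_time = SCX_GRADIENT[-1][0]        # 70 min
```

How the 15 fractions were distributed over this run is not stated in the text, so the sketch does not attempt to model fraction collection.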
FIG. 2. The neutral loss peaks in MS2 spectra of partially deglycosylated CF glycopeptides. In all of these ion trap MS2 spectra, the most intense peak, which results from loss of the fucose residue in CID, is several times more intense than the second most intense peak. a, b, and c are MS2 spectra from the same partially deglycosylated CF glycopeptide, EEQYJSTYR (from human IgG). Intensities of the base peaks were 1.86e5, 2.10e4, and 2.53e3, respectively. d and e are MS2 spectra of simplified CF glycopeptides GQALLVJSSQPWEPLQLHVDK (intensity, 3.21e4; from rhEPO) and QQQHLFGSJVTDC#SGNFC#LFR (intensity, 7.59e4; from apotransferrin). The MS2 spectra in FT-ICR were collected to check the identities of the strongest peaks: f for IgG, g for d, and h for e. J, CF site; #, carbamidomethylation.
RP HPLC-MSn Analysis-RP HPLC-MSn experiments were performed on an LTQ-FT mass spectrometer (Thermo Fisher) equipped with a nanospray source and an Agilent 1100 high performance liquid chromatography system (Agilent Technologies). Peptide mixtures were separated on a fused silica microcapillary column with an internal diameter of 75 μm and an in-house prepared needle tip with an internal diameter of ~15 μm. Columns were packed to a length of 10 cm with a C18 reversed phase resin (GEAgel C18 SP-300-ODS-AP; particle size, 5 μm; pore size, 300 Å; Jinouya, Beijing, China). Separation was achieved using a mobile phase of 1.95% ACN, 97.95% H2O, 0.1% FA (phase A) and 79.95% ACN, 19.95% H2O, 0.1% FA (phase B), and the linear gradient was from 5 to 50% buffer B for 80 min at a flow rate of 300 nl/min. The LTQ-FT mass spectrometer was operated in the data-dependent mode. A full-scan survey MS experiment (m/z range from 400 to 2000; automatic gain control target, 5e5 ions; resolution at 400 m/z, 100,000; maximum ion accumulation time, 750 ms) was acquired by the FT-ICR mass spectrometer, and the five most abundant ions detected in the full scan were analyzed by MS2 scan events (automatic gain control target, 1e4 ions; maximum ion accumulation time, 200 ms). The scan mode of MS2 was set to profile. An MS3 spectrum was automatically collected when one of the three most intense peaks from the MS2 spectrum corresponded to a neutral loss event of 73.0290 m/z, 48.6860 m/z, or 36.5145 m/z (charges of parent ions were not collected). The normalized collision energy was 35.
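The three neutral loss trigger values listed above are consistent with the monoisotopic mass of a fucose residue divided by charge states 2, 3, and 4. The short sketch below reproduces them; the residue composition C6H10O4 and the mapping of the three values to those charge states are our reading of the numbers, not something stated explicitly in the text, and the function names are illustrative.

```python
# Monoisotopic atomic masses in Da (standard values).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def residue_mass(composition):
    """Monoisotopic mass of a residue given as {element: count}."""
    return sum(MONO[element] * count for element, count in composition.items())


# Fucose (deoxyhexose) residue, C6H10O4; assumed composition, ~146.0579 Da.
FUCOSE = residue_mass({"C": 6, "H": 10, "O": 4})


def neutral_loss_mz(charge):
    """m/z shift seen in MS2 when an ion of this charge loses one fucose."""
    return FUCOSE / charge


# Assumed charge states 2+, 3+, and 4+.
triggers = {z: round(neutral_loss_mz(z), 4) for z in (2, 3, 4)}
print(triggers)
```

Under these assumptions the computed shifts agree with the 73.0290, 48.6860, and 36.5145 m/z settings to four decimal places.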
On-line Two-dimensional LC-MSn-The autosampler was used to inject samples onto the SCX column (BioX-SCX, 5 cm) after which they were eluted onto a trap column using a stepwise gradient of 0, 20, 30, 40, 50, 60, 70, 80, 90, and 100% SCX-B. Peptides on the trap column were desalted and then eluted onto the RP column and into the mass spectrometer (the same method as RP HPLC-MSn analysis, but the linear gradient was from 5 to 50% buffer B for 120 min). Mobile phase buffer for SCX-A was 10 mM citric ammonia buffer, pH 3.0, and mobile phase buffer for SCX-B was 50 mM citric ammonia buffer, pH 8.5. The HCC samples were analyzed by this system (Eksigent NanoLC-2D), and the analysis was repeated once.
Database Search and Analysis-Dta files were generated by Bioworks 3.2 with default parameters and then treated by the spectrum-filtering and spectrum optimization tools in pFind 2.1 Studio. The candidate MS3 spectra were searched against UniProt Knowledgebase Release 12.6 (human, 76,137 entries; UniProt Knowledgebase Release 12.6 consists of UniProtKB/Swiss-Prot Release 54.6 of December 4, 2007 and UniProtKB/TrEMBL Release 37.6 of December 4, 2007) using the pFind 2.1 search engine. The database was modified by substituting the letter N in the glycosylation sequence NX(S/T/C) with J, which was defined to have the same mass as Asn (21), and then the target and reversed decoy databases were combined for the search. Carbamidomethylation was considered for all Cys residues. Variable modifications included oxidation of Met residues, carbamidomethylation and carbamylation of peptide N termini and Lys residues (carbamylation was only considered as a variable modification in experiments that used urea as the protein denaturing reagent), and a 203.0794-Da variable addition to J residues. At most, two missed tryptic cleavage sites were allowed. Tolerance of parent ions was ±20 ppm, and tolerance of fragment ions was ±0.5 m/z for the primary search. The final identified results had a 1% false-positive rate (22), and the tolerance for parent ions was ±10 ppm.
FIG. 3. MS2 and MS3 spectra of fucosyl-GlcNAc-attached peptides. The peak intensity distribution of the MS3 spectrum is much more homogeneous than that of MS2, so better peptide sequence information can be obtained; the direct assignment of CF glycosites can be deduced from the b-type and y-type ion series attached with a GlcNAc residue in MS3. a and b are MS2 and MS3 spectra of GLC#VJASAVSR from insulin-like growth factor-binding protein 3, respectively. The peaks of b-type and y-type ions with or without GlcNAc residues appear synchronously and frequently, such as y7+ and b6+. c and d are MS2 and MS3 spectra of a candidate that was analyzed de novo, respectively. The resulting de novo sequence GVEIJR (because the m/z of ion b1 is too low to detect, the sequence of the first two residues can also be "VG," and "I" can also be "L" because of their same mass) was not found in the peptide database of tryptic digests (J located in the sequon NX(S/T/C) where X is any amino acid except proline).
MALDI-TOF MS Analysis-After desalting with the C18 ZipTip, all of the samples were mixed 1:9 with 5 mg/ml α-cyano-4-hydroxycinnamic acid in 50% acetonitrile supplemented with 0.1% TFA, and 0.5 μl of sample was applied to the MALDI target plate. The mass spectra were obtained using a 4800 Proteomics Analyzer MALDI-TOF/TOF instrument (Applied Biosystems). Prior to analysis, the mass spectrometer was externally calibrated with seven peptides obtained from a tryptic digest of myoglobin. The m/z range of the MS scan was from 600 to 4000. Mass spectra were acquired in the positive reflector mode.
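The database modification described under Database Search and Analysis (replacing N with J in every NX(S/T/C) sequon, then appending reversed decoy entries) can be sketched in a few lines. This is an illustration under stated assumptions rather than the pFind implementation: the function names and the `REV_` decoy prefix are hypothetical, X is taken to be any residue except proline, and decoys are assumed to be built by reversing the already modified sequences.

```python
import re

# N followed by any residue except P, then S, T, or C. The lookahead keeps
# the two following residues unconsumed so overlapping sequons are all marked.
SEQUON = re.compile(r"N(?=[^P][STC])")


def mark_sequons(sequence):
    """Replace the Asn of each NX(S/T/C) sequon with the placeholder J."""
    return SEQUON.sub("J", sequence)


def target_plus_reversed_decoy(proteins):
    """proteins: {name: sequence}. Returns target and reversed entries."""
    combined = {}
    for name, sequence in proteins.items():
        marked = mark_sequons(sequence)
        combined[name] = marked
        combined["REV_" + name] = marked[::-1]   # hypothetical decoy naming
    return combined


print(mark_sequons("EEQYNSTYR"))   # EEQYJSTYR
```

Because J is defined with the same mass as Asn, unmodified sequon peptides still match as before, while the 203.0794-Da variable addition can only be placed on sequon asparagines.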

RESULTS AND DISCUSSION
Core-fucosylated Glycopeptide Enrichment from Plasma-Robust and convenient operation procedures were established to obtain partially deglycosylated CF glycopeptides. After IgG depletion, plasma proteins were mixed with L. culinaris lectin to enrich for the CF glycoproteins. Binding proteins were digested by trypsin, and the resulting glycopeptides were enriched through a molecular weight cutoff technique. N-Linked glycopeptides usually have larger molecular weights than non-glycopeptides (19,23); therefore, an ultrafiltration membrane with a molecular mass limit of 3000 Da was utilized to enrich for glycopeptides. This step integrates enrichment, desalting, and concentration into one operation. Glycopeptides were then treated with endoglycosidase F3, which specifically cleaves the glycosidic bond between the two proximal N-acetylglucosamines (GlcNAc) and leaves the fucosyl-GlcNAc residues on the peptides. Endoglycosidase F3 was chosen here for treating CF glycoprotein because a large number of the glycans of plasma glycoproteins have biantennary structure, which is a more efficient substrate for endoglycosidase F3 (24). For other structures, such as tetraantennary and other bulky glycans, the reactivity of endoglycosidase F3 is poor, so there may need to be additional evaluation to choose the proper glycosidase for other kinds of samples like tissue biopsies.
A tryptic peptide mixture from four standard glycoproteins, apotransferrin, fetuin, rhEPO, and ribonuclease B, was used to illustrate the efficiency of the ultrafiltration method (Fig. 1). Half of this tryptic peptide mixture was directly treated with peptide-N-glycosidase F (untreated sample); the other half was separated by ultrafiltration into a retentate fraction (high molecular weight) and a filtrate fraction (low molecular weight), and then both fractions were treated with peptide-N-glycosidase F. The deglycosylated glycopeptides were detected by the +0.984-Da mass shift that accompanies the conversion of Asn to Asp.
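The +0.984-Da shift used above follows from the elemental difference between the two residues: converting Asn to Asp replaces an amide NH2 with OH, a net change of +O, -N, -H. A minimal check with standard monoisotopic atomic masses is shown below; the variable names are ours.

```python
# Monoisotopic atomic masses in Da (standard values).
H, N, O = 1.00782503, 14.0030740, 15.9949146

# Asn (C4H6N2O2) -> Asp (C4H5NO3): gain one O, lose one N and one H.
asn_to_asp_shift = O - N - H

print(round(asn_to_asp_shift, 4))   # 0.984
```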
In total, eight N-glycopeptides have been reported for the four glycoproteins. Six of these glycopeptides were directly found in the untreated sample by MALDI-TOF MS. In addition to these six glycopeptides, one more glycopeptide (CGLVPVLAENYN*K from apotransferrin; N* represents the annotated glycosite) was detected in the retentate fraction. The relative intensities of all deglycosylated glycopeptides were increased compared with the untreated sample. In the untreated sample, the failure to detect CGLVPVLAENYN*K is ascribed to suppression by a non-glycopeptide with similar mass. In the filtrate fraction, the relative intensity of deglycosylated glycopeptides decreased to a very low level, illustrating that few glycopeptides were lost. One reported glycopeptide was not detected in any of the three fractions (N*LTK from ribonuclease B). One possible reason is that its sequence is too short to detect.
Development of Neutral Loss-dependent MS3 Scan Method-A neutral loss-dependent MS3 method specifically designed for partially deglycosylated CF glycopeptides was developed. During CID, the glycosidic bond that links the two remaining sugars is prone to breakage compared with the other bonds (25). In our experiments on three partially deglycosylated CF glycopeptides, the highest peaks in the MS2 spectra all resulted from the loss of 146 Da (mass of the fucose residue) from the parent ions, with the same charge state as the corresponding parent ions (Fig. 2). Based on this trait, a neutral loss-dependent MS3 scan method was utilized as an automatic event in the LTQ-FT mass spectrometer: MS3 spectra were automatically collected when one of the three most intense peaks from the MS2 spectrum corresponded to a neutral loss event of the fucose residue mass. MS3 spectra were generated from fragmentation of the GlcNAc-attached peptides. Compared with the MS2 spectra, which were generated from fragmentation of the fucosyl-GlcNAc-attached peptides, the MS3 spectra have three remarkable advantages. 1) They have better spectrum quality: the peak intensity distribution of the MS3 spectrum is much more homogeneous. This is beneficial because there are more fragment ion signals with good signal to noise ratios. 2) They have simpler spectrum information: the number of theoretical fragment ions in the MS3 spectrum is smaller. This makes the algorithm for peak matching simpler. 3) They have clearer spectrum signals: two parent ion selections (from MS to MS2 and from MS2 to MS3) reduce the probability of collecting interference signals adjacent to parent ions in the full scan (Fig. 3). In addition, direct assignment of CF glycosites can be deduced from the b-type and y-type ion series attached with a GlcNAc residue, providing much higher confidence in glycosite assignment than the 0.984-Da mass shift method. It should be noted that the retained GlcNAc residue was also found to be lost intact from the b and y ions (Fig. 3); therefore, these special product ions must be considered in addition to GlcNAc-attached b and y ions when searching the database. This observation was taken into account for peptide scoring in the pFind 2.1 search engine (26-28). Compared with other popular software tools, pFind produced more identifications (supplemental Data 1).
FIG. 4. The process of the strategy for CF glycoprotein identification. CF glycoprotein identification was achieved through enrichment of CF glycopeptides, partial deglycosylation of CF glycopeptides, HPLC neutral loss-dependent MS3, candidate spectrum filtering, spectrum optimization, and database searching. F1 is the intensity ratio of the second strongest peak (abbreviated SSP, which excludes other states of S2, such as a different charge state or states of H2O and NH3 loss) to S2; F2 is the difference between the calculated and experimental m/z of S2; F3 is the intensity ratio of the second strongest peak (abbreviated SSP′) to S3 within the range of the S3 monoisotopic peak ±3 m/z. Information on different charge state ions of S3 is considered, and the better result is recorded. Additionally the absolute intensities of S2 and S3 are required to be higher than 500 and 50, respectively. As shown, different scores correspond to different signal qualities. The confidence of the spectrum is sorted into five ranks by total score. Figure symbols mark the fucose and GlcNAc residues. 2D, two-dimensional; Endo, endoglycosidase; LCH, L. culinaris lectin.
Development of Candidate Spectrum-filtering and Spectrum Optimization Methods-Because of the complexity of real samples and the very large number of spectra generated in these large scale glycopeptide analyses, specialized processing methods are necessary. Here a database-independent method for discovery of spectra of partially deglycosylated CF glycopeptides was developed. Two kinds of ions in MS2 were scrutinized and used to judge whether the precursor was a CF glycopeptide: ions of a peptide attached to a GlcNAc residue (symbol ion 2, abbreviated S2, produced by breakage of the glycosidic bond between the remaining two monosaccharide residues) and ions of a pure peptide (symbol ion 3, abbreviated S3, produced by fragmentation between the GlcNAc and the Asn residue of the peptide). Using the highly accurate parent ion mass from the full scan (recorded in FT-ICR), we can calculate the m/z of the symbol ions. Next, according to the quality of the symbol ions in MS2, several criteria were established to sort the spectra. First, the strongest peak in MS2 must be S2 (±0.5 m/z error) with the same charge state as the parent ion. Additional information on the symbol ions is then used to sort the spectra into five confidence ranks (Fig. 4). The spectra in the top two ranks are retained, and their corresponding MS3 spectra are regarded as candidates. This strict spectrum-filtering method greatly improved the credibility of identification. Furthermore, by statistically analyzing candidate spectra, many important neutral loss signals, which result from GlcNAc-related fragmentation, were revealed. These fragmentation patterns are always accompanied by very strong signals and had not been reported previously (Fig. 5). In addition, diagnostic ions of GlcNAc residues were observed in MS3 spectra (Fig. 3). Based upon these observations, the interfering peaks from GlcNAc fragmentation, which are very intense and would decrease the accuracy of peptide scoring, were localized and subtracted from the spectra. This novel optimization method can effectively increase the identification efficacy. Both the spectrum-filtering and the spectrum-optimizing processes are performed automatically in pFind Studio. In addition, the unidentified candidates can be analyzed de novo. This can supply novel information that is not in the database (Fig. 3).
FIG. 5. Frequency histogram of intact and partial GlcNAc loss peaks in candidate MS3 spectra of charge 2. The m/z values of S2 were set as 0 m/z. Offsets with high peak frequencies reveal potential masses of neutral losses that frequently occur on peptide-attached GlcNAc residues. The possible loss groups are shown in the table.
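The first filtering criterion can be illustrated with a small sketch that computes the expected m/z of the symbol ions S2 and S3 from the precursor and checks that the MS2 base peak is S2. This is a simplified illustration, not the pFind Studio implementation: the function names are hypothetical, S3 is only looked up at the precursor charge state and with the same ±0.5 m/z window (the text considers several S3 charge states), and the F1-F3 scoring that yields the five ranks is omitted because its score table is given only in Fig. 4.

```python
FUCOSE = 146.0579   # fucose residue mass in Da
GLCNAC = 203.0794   # GlcNAc residue mass in Da (the variable addition on J)


def symbol_ions(precursor_mz, charge):
    """Expected m/z of S2 (peptide + GlcNAc) and S3 (bare peptide)."""
    s2 = precursor_mz - FUCOSE / charge
    s3 = precursor_mz - (FUCOSE + GLCNAC) / charge
    return s2, s3


def passes_first_filter(peaks, precursor_mz, charge,
                        tol=0.5, min_s2=500.0, min_s3=50.0):
    """peaks: non-empty list of (mz, intensity) pairs from one MS2 spectrum.

    True if the base peak is S2 within tol and the absolute intensities
    of S2 and S3 exceed 500 and 50, respectively.
    """
    s2, s3 = symbol_ions(precursor_mz, charge)
    base_mz, base_intensity = max(peaks, key=lambda peak: peak[1])
    if abs(base_mz - s2) > tol or base_intensity <= min_s2:
        return False
    s3_intensity = max(
        (intensity for mz, intensity in peaks if abs(mz - s3) <= tol),
        default=0.0,
    )
    return s3_intensity > min_s3
```

For a hypothetical doubly charged precursor at 800.00 m/z, S2 and S3 would be expected near 726.97 and 625.43 m/z; a spectrum whose base peak lies elsewhere is rejected before any database search.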

Identified Results and Their Illumination for Further Clinical Research-The efficacy of our strategy was first demonstrated by applying it to healthy human plasma (IgG-extracted); 115 different CF glycopeptides (105 CF sites) from 72 glycoproteins were identified. To further demonstrate its feasibility for clinical samples, we applied this strategy to plasma from HCC patients; 108 different CF glycopeptides (106 CF sites) from 79 glycoproteins were identified. Altogether 25 novel glycosylation sites and 19 annotated potential sites were identified from these two experiments (Table I). The scale of our results shows that these methods represent a substantial advance in CF glycoproteomics research and may meet the needs of clinical medicine. Although the comparison between the two types of samples was not a designated outcome of this study, it was still informative in several respects. First, the CF sites of many glycoproteins whose CF levels have been reported as altered in patients with HCC were confirmed in our research, such as α1-antitrypsin (one site), α2-HS-glycoprotein (one site), α2-macroglobulin (two sites), apolipoprotein D (one site), β2-glycoprotein 1 (one site), ceruloplasmin (four sites), fibrinogen γ chain (one site), haptoglobin (three sites), histidine-rich glycoprotein (one site), Ig α-2 chain C region (one site), Ig γ-1 chain C region (one site), and serotransferrin (one site) (9,15). Direct evidence of a CF site by MS would not only help to enhance the reliability of the CF modification as a biomarker but may also lead to further clinical research at the level of the modification site instead of the protein. As shown previously, the CF patterns of some glycoproteins may be used as biomarkers because they are more sensitive and specific than evaluation of the respective total protein levels (19). The question of whether the specific CF site would be the more effective "marker" is interesting.
This question could not be answered previously because of the limitations of the traditional techniques, but it can be tackled by application of this strategy. Second, a specific marker, CF GP-73, was reported to be more sensitive and specific for HCC diagnosis than α-fetoprotein (15). This marker was specifically identified in the HCC samples in our research, whereas hemopexin (two CF sites identified), IgM (two sites), and kininogen (three sites) were identified in both of our two experiments. These glycoproteins have not previously been reported in healthy plasma (9). These results remind us that although CF glycoproteomics research has advanced significantly during recent years and impressive results have been obtained in clinical research, more extensive research is needed. This further research inevitably depends on the acquisition of large amounts of qualitative and quantitative data on CF glycoproteins and CF sites. Recently fucosylated haptoglobin was reported as a novel marker for pancreatic cancer, and site-specific increases in fucosylation were observed (29). However, the specificity of this marker is still not ideal for diagnosis; evaluation of the CF levels of a combination of glycoproteins would permit more reliable discrimination among different disease stages. In our research, all three tryptic CF glycopeptides of haptoglobin were identified. Moreover, our strategy has the merit that stable isotope labeling techniques can be incorporated for quantitative research. The relative abundance of CF glycoproteins in some diseases, such as pancreatic cancer, could be quantified with the strategy. It should be mentioned that because a lectin enrichment step was used early in the workflow, the quantitation information obtained would only represent the relative difference in CF glycoprotein abundance, whereas the ratios between glycans with and without core fucose could not be obtained as reported in other studies (8,9).
In conclusion, this study holds promise for the large scale identification of CF glycoproteins, which can serve as a tool for the discovery of novel biomarker panels from clinical samples, such as body fluids or tissue biopsies. In addition, it is our hope that both identified and unidentified candidate spectra (over 10,000) will be a useful resource for the improvement of database searching methods for glycopeptides. Spectra data sets of this sort are rare and should arouse the interest of scientists in both glycoproteomics and bioinformatics research fields.