Molecular & Cellular Proteomics 6:141-149, 2007.
© 2007 by The American Society for Biochemistry and Molecular Biology, Inc.


From the Institute for Systems Biology, Seattle, Washington 98103
| ABSTRACT |
|---|
α-fetoprotein in germ cell tumors; prostate-specific antigen in prostate cancer; and CA-125 in ovarian cancer (4, 5). As systems biology begins to revolutionize our understanding of biology and biomedical sciences (6, 7), the ability to efficiently and comprehensively profile glycoproteins in biological samples of interest (such as cell extracts and body fluids) is critical to many biological and clinical researchers. Tandem mass spectrometry, with its superior sensitivity, accuracy, and throughput in protein and peptide identification, is currently the most sophisticated and powerful tool for global proteomics studies, including glycoproteome analysis. Because the enormous dynamic range of protein concentrations in biological samples (10⁶ in mammalian cells and 10¹⁰ in blood) is far beyond the analysis range of most techniques, low abundance proteins are masked by dominant proteins in global proteomics analysis (6, 8). Indeed just 22 proteins constitute about 99% of the blood protein mass (albumin alone is more than 50% of the mass). Front end enrichment and fractionation methods prior to MS analysis are necessary to enhance the detection sensitivity for low abundance proteins, a category that holds promising diagnostic and biological information (9). An effective enrichment of glycosylated proteins is important to decrease sample complexity and helps to unfold the glycoproteome comprehensively (10). Two strategies have emerged to enrich glycoproteins and/or glycopeptides: one is the "top down" strategy in which glycoproteins are enriched at the protein level and then digested into peptides (e.g. the lectin affinity capture (11) and glycoprotein chemical capture (5) approaches); the other is the "bottom up" strategy in which glycoproteins are digested first into peptides and then enriched directly (e.g. glycopeptide enrichment by chromatography (12–16)).
Despite the versatility of current glycoenrichment approaches, it remains cumbersome to unravel the glycoproteome completely for complex biological samples such as sera and cell lysates. For instance, the top down strategy suffers from solubility problems and steric hindrance when capturing proteins in their native forms. Moreover proteolysis of complex protein mixtures with trypsin (a commonly used proteolytic enzyme for tandem MS analysis) typically produces 20 or more peptides per protein; this results in increased sample complexity and is therefore not suitable for the analysis of low abundance proteins in complex samples. Further enrichment of glycosylated peptides after glycoprotein capture has been studied by both the lectin affinity capture (17) and glycoprotein chemical capture (18) approaches. Although lectin affinity capture is the most widely used approach due to its ease of implementation, the binding selectivity of lectins to specific conformations of different carbohydrate moieties has limited the utility of lectins in global glycoprotein analysis (19, 20). The glycoprotein chemical capture approach developed by Zhang et al. (5) is generally applicable to all types of glycoproteins, but its complicated implementation (12) and relatively low yields (depicted in Fig. 1, A and B) have kept it from being used as widely as the lectin capture approach. In the bottom up strategy, proteins are digested into peptides, and glycosylated peptides are separated from their unglycosylated counterparts by chromatography (12–16). Although this approach is direct, simple, and rapid, separation based on physical and chemical properties usually results in only a modest enrichment (12, 14).
| MATERIALS AND METHODS |
|---|
Cell Culture and Microsomal Fraction Extraction
IGROV-1/CP cisplatin-resistant ovarian cancer cells were grown in RPMI 1640 medium (Invitrogen) containing 10% fetal bovine serum, 100 units/ml penicillin, and 100 units/ml streptomycin at 37 °C. A crude microsomal fraction of IGROV-1/CP was prepared as described previously (22, 23). The microsomal pellet contained plasma membranes, Golgi apparatus, endoplasmic reticulum, mitochondria, lysosomes, and all other membrane-bound vesicles separated from soluble cytosol. The Bradford protein assay was used to quantify the concentration of the extracted proteins. About 0.5–0.8 mg of crude microsomal membrane proteins was used for the glycopeptide capture.
Tryptic Digestion of Samples
In our glycopeptide capture approach, biological samples were first subjected to denaturation and trypsin digestion. In a typical procedure, a biological sample was reconstituted in a denaturing buffer of 5 mM EDTA, 40 mM Tris, 10 mM TCEP, and 0.5% Rapigest at pH 8.3 and heated at 100 °C for 10 min. After the sample had cooled to room temperature, urea was added to 8 M, and the solution was incubated at 37 °C for 30 min. To prevent disulfide bond formation, cysteine residues were modified by alkylation with iodoacetamide. Iodoacetamide was added to the sample solution in at least a 6-fold molar excess over the free sulfhydryls in the sample. For an unknown protein mixture, an estimate of 6 cysteines per protein was used for calculating the molar concentration of sulfhydryls. Cysteine derivatization was carried out for 30 min in the dark at room temperature with end-over-end rotation. The reaction was quenched for 10 min by the addition of DTT at half of the molar concentration of the iodoacetamide. After iodoacetamide deactivation, the sample solution was diluted 10-fold with 40 mM Tris buffer, pH 8.3, 1 mg of trypsin/20–50 mg of protein was added, and the sample mixture was digested at 37 °C overnight. To avoid a large volume for trypsin digestion, the denatured sample was kept at 4–6 mg/ml. Rapigest was degraded by acidifying the trypsinized sample mixture to pH 1 with HCl and incubating at 37 °C for 1 h. The hydrophobic residues of Rapigest were precipitated and removed from the sample by centrifugation, and the supernatant was passed over a C18 column to remove excess urea, DTT, and Tris. Tryptic peptides were eluted from the column with 80% acetonitrile in 0.1% TFA and dried in a SpeedVac® (Thermo Savant, Holbrook, NY) concentrator.
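The reagent amounts for the alkylation step follow from simple molar arithmetic. The sketch below is only an illustration and is not part of the published protocol: the function name and the 50-kDa average protein mass are assumptions introduced here to convert a protein mass into moles, whereas the 6 cysteines per protein, the 6-fold iodoacetamide excess, and the DTT quench at half the iodoacetamide amount are taken from the procedure above.

```python
IODOACETAMIDE_MW = 184.96  # g/mol, molecular weight of iodoacetamide (C2H4INO)


def alkylation_amounts(protein_mg, avg_protein_kda=50.0,
                       cys_per_protein=6, fold_excess=6.0):
    """Estimate reagent amounts for cysteine alkylation of a protein mixture.

    avg_protein_kda is an ASSUMED average protein mass (hypothetical value);
    it is needed only to convert milligrams of protein into moles.
    """
    if protein_mg <= 0 or avg_protein_kda <= 0:
        raise ValueError("protein mass and average protein size must be positive")
    # mg / kDa gives micromoles; multiply by 1000 to work in nanomoles.
    protein_nmol = protein_mg * 1000.0 / avg_protein_kda
    sulfhydryl_nmol = protein_nmol * cys_per_protein
    # "At least" a 6-fold molar excess over free sulfhydryls.
    iodoacetamide_nmol = sulfhydryl_nmol * fold_excess
    # DTT quench at half the molar amount of iodoacetamide.
    dtt_nmol = iodoacetamide_nmol / 2.0
    # nmol * g/mol = ng; divide by 1000 for micrograms.
    iodoacetamide_ug = iodoacetamide_nmol * IODOACETAMIDE_MW / 1000.0
    return {
        "sulfhydryl_nmol": sulfhydryl_nmol,
        "iodoacetamide_nmol": iodoacetamide_nmol,
        "iodoacetamide_ug": iodoacetamide_ug,
        "dtt_nmol": dtt_nmol,
    }


if __name__ == "__main__":
    for key, value in alkylation_amounts(1.0).items():
        print(f"{key}: {value:.1f}")
```

Because the average protein mass is a rough guess for an unknown mixture, the result is best read as a lower bound on the iodoacetamide to add rather than an exact quantity.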
Glycopeptide Capture
Dried tryptic peptides were dissolved in a coupling buffer (100 mM sodium acetate, 150 mM NaCl, pH 5.5) at a concentration of 2 mg/100 µl of buffer. The undissolved solids were removed by centrifugation, and the supernatant was used for the following reactions. First, to oxidize the cis-diol groups of carbohydrates to aldehydes, sodium periodate was added to the peptide solution at a final concentration of 10 mM, and the sample was incubated in the dark at room temperature for 30 min with end-over-end rotation. Second, sodium sulfite was added to a final concentration of 20 mM, and the sample was incubated for 10 min to deactivate the excess oxidant in the peptide solution. The coupling reaction was initiated by introducing hydrazide resin into the quenched peptide solution at 20 mg/ml resin, and extra coupling buffer was added to make a solid to liquid ratio of 1:5. The coupling reaction was performed at 37 °C overnight with end-over-end rotation.
After the coupling reaction, the resin was washed twice thoroughly and sequentially with deionized water, 1.5 N NaCl, methanol, and acetonitrile, followed by a buffer exchange step to 100 mM NH4HCO3 (made fresh), pH 8.0. Enzymatic cleavage of the N-linked peptides from the sugar moiety was carried out at 37 °C overnight by PNGase F at a concentration of 1 µl of PNGase F/2–6 mg of crude proteins. The supernatant, containing the released deglycosylated peptides, was collected by centrifugation and combined with the supernatant of an 80% acetonitrile wash. The peptide solution was dried, reconstituted with 1% acetonitrile in 0.1% formic acid, and subjected to MS analysis.
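The periodate oxidation and sulfite quench in this procedure are specified as final concentrations, so the volume of stock to add depends on the sample volume and on the stock strength. The following sketch shows that dilution arithmetic; the function name, the 100-µl sample volume, and both stock concentrations are hypothetical values chosen for illustration, and only the 10 mM and 20 mM targets come from the procedure.

```python
def spike_volume_ul(sample_ul, stock_mM, final_mM):
    """Volume of stock (µl) to add to sample_ul so the mixture reaches final_mM.

    Solves stock_mM * v = final_mM * (sample_ul + v) for v, i.e. it accounts
    for the volume increase caused by the addition itself.
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    if final_mM <= 0 or final_mM >= stock_mM:
        raise ValueError("final concentration must be positive and below the stock")
    return sample_ul * final_mM / (stock_mM - final_mM)


if __name__ == "__main__":
    volume = 100.0  # µl of peptide solution (hypothetical)

    # Step 1: sodium periodate to 10 mM final from an assumed 100 mM stock.
    periodate = spike_volume_ul(volume, stock_mM=100.0, final_mM=10.0)
    volume += periodate

    # Step 2: sodium sulfite to 20 mM final from an assumed 200 mM stock.
    sulfite = spike_volume_ul(volume, stock_mM=200.0, final_mM=20.0)
    volume += sulfite

    print(f"periodate stock: {periodate:.2f} µl")
    print(f"sulfite stock:   {sulfite:.2f} µl")
    print(f"final volume:    {volume:.2f} µl")
```

Adding the sulfite stock dilutes the periodate slightly below 10 mM; the sketch ignores this because the oxidant is being deactivated at that point anyway.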
Analysis of Peptides by Mass Spectrometry
Peptide samples were analyzed by either a MALDI-TOF/TOF tandem mass spectrometer (ABI 4700 Proteomics Analyzer, Applied Biosystems, Foster City, CA) or by nano-LC-ESI-MS/MS using an LTQ linear ion trap mass spectrometer (Thermo Finnigan, San Jose, CA). For MALDI-TOF/TOF analysis, the peptide sample was purified with a ZipTip (Millipore) and reconstituted with 0.4% acetic acid prior to analysis. A 1:1 dilution of peptide solution with MALDI matrix solution (Agilent Technologies) was used for MALDI spotting.
For LTQ mass spectrometry analysis, an in-house fabricated nanoelectrospray source and an HP1100 solvent delivery system (Agilent Technologies) were coupled to the LTQ. Samples were automatically delivered by a FAMOS autosampler (LC Packings, San Francisco, CA) to a 100-µm-internal diameter fused silica capillary precolumn packed with 2 cm of 200-Å pore size Magic C18AQ™ material (Michrom Bioresources, Auburn, CA) as described elsewhere (24).
The samples were washed with solvent A (5% acetonitrile in 0.1% formic acid) on the precolumn, eluted with a gradient of 10–35% solvent B (100% acetonitrile) over 30 min to a 75-µm x 10-cm fused silica capillary column packed with 100-Å pore size Magic C18AQ material (Michrom Bioresources), and then injected into the mass spectrometer at a constant column tip flow rate of 300 nl/min. Eluting peptides were analyzed by nano-LC-MS and data-dependent nano-LC-MS/MS acquisition, selecting the three most abundant precursor ions for MS/MS with a dynamic exclusion setting of 1 (25).
Database Search of Mass Spectra
Mass spectra were converted to mzXML format through in-house developed software, and the spectra having fewer than six ions with intensity less than 100 were discarded (26, 27). The converted mzXML files were searched against the appropriate databases (see below). The mass spectra derived from the five multiglycosylated proteins were searched against a customized database comprising the protein sequences of the five glycoproteins in addition to trypsin, keratins (a common contaminant of sample preparation), and a reversed yeast database. We took 218 entries of human keratins from the NCI non-redundant protein database released on December 13th, 2005 (distributed on the Internet via anonymous FTP from ftp.ncifcrf.gov under the auspices of the National Cancer Institute's Advanced Biomedical Computing Center), and the reversed protein sequences were from a yeast protein database with 7556 entries. The mass spectra of peptides from the ovarian cancer cell line membrane fraction were searched against the human International Protein Index (IPI) database (IPI human v3.16 fasta with 62,322 entries). SEQUEST™ (Thermo Finnigan) was used for database searches with search parameters containing the following modifications: carbamidomethylated cysteine (+57), oxidized methionine (+16), and the conversion of the asparagine in the consensus sequence to aspartic acid after PNGase F deglycosylation (+1) (28). PeptideProphet™ and Protein Prophet with single tryptic end and N-glycosylation constraints were used to evaluate the quality of peptide and protein identification (29). The single tryptic end constraint was used to account for incomplete trypsin digestion due to different digestion efficiencies at putative tryptic sites (30). The mass tolerance for precursor mass was ±3.0, and the mass tolerance for MS/MS was 0.5 (31).
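The customized database described above combines target sequences with reversed sequences that serve as known-false entries. A minimal sketch of that construction is shown below. It does not reproduce the in-house software: the function names, the `rev_` prefix, and the toy sequences are hypothetical, and the entries are held as simple (name, sequence) pairs rather than read from the actual FASTA files.

```python
def reverse_decoys(entries, prefix="rev_"):
    """Return reversed-sequence decoy entries for (name, sequence) pairs.

    The prefix marks decoy hits so they can be recognized after the search;
    "rev_" is an arbitrary label chosen for this sketch.
    """
    return [(prefix + name, sequence[::-1]) for name, sequence in entries]


def build_search_database(targets, decoy_source):
    """Concatenate target entries with reversed entries from decoy_source."""
    return list(targets) + reverse_decoys(decoy_source)


def to_fasta(entries, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for name, sequence in entries:
        lines.append(">" + name)
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequences for illustration only, not real database entries.
    targets = [("glycoprotein_1", "MKWTNDLGSNMTIGAVNSRG")]
    yeast = [("yeast_1", "MSTAYLK"), ("yeast_2", "MVNLTPEK")]
    print(to_fasta(build_search_database(targets, yeast)), end="")
```

Any spectrum assigned to a `rev_` entry is wrong by construction, which is what makes such entries useful for gauging how often the search produces incorrect matches.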
| RESULTS |
|---|
Application of the Glycopeptide Capture Approach to a Monoglycosylated Protein
To validate our glycopeptide capture approach, we first analyzed a mono-N-glycosylated protein, chicken avidin. Fig. 2A shows the MS spectrum of the enriched, deglycosylated avidin glycopeptide collected by a MALDI-TOF/TOF mass spectrometer (ABI 4700 Proteomics Analyzer, Applied Biosystems). The two major peaks were from the same peptide (K.WTNDLGSNMTIGAVNSR.G, where the periods indicate the peptide cleavage sites and NMT is the N-glycosylation consensus sequence) as determined by MS/MS fragmentation (please see Supplemental Fig. 1 for the CID spectra). The peak at a mass of 1852 arises from the peptide with methionine oxidized to methionine sulfoxide (a mass increase of 16 Da) (5) and the asparagine of the consensus sequence converted to aspartic acid (a mass increase of 1 Da) after PNGase F deglycosylation (28). The peak at a mass of 1791 is attributed to the same peptide, as shown by the CID spectrum, with a cleavage between the carbon and the sulfur of the methionine side chain, which most likely happened during the mass spectrometry detection (33–35). The successful identification of the formerly glycosylated peptide from chicken avidin indicated that the capture strategy we developed here was effective and that the sodium sulfite we introduced to reduce periodate did not interfere with the capture procedure. To evaluate the capture efficiency, we analyzed the non-captured avidin peptide mixture by MALDI-TOF/TOF as well. Because glycosylated peptides give low and complex signals in MS spectra due to the heterogeneity of the glycan structure, we used PNGase F to remove all linked glycans prior to MS analysis and focused on the presence or absence of the deglycosylated glycopeptides in the non-captured fraction of chicken avidin. Fig. 2B shows the MS spectrum of the non-captured avidin peptides after glycopeptide capture and PNGase F deglycosylation. The absence of glycopeptide signal indicates that the efficiency of our glycopeptide capture process is high based on the MALDI-TOF/TOF analysis.
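The assignment of the 1852 peak can be checked by adding up residue masses. The sketch below is an independent illustration rather than the software used in this work: it uses standard monoisotopic residue masses and modification shifts (the search parameters in this report are given only as nominal values of +16 and +1), and it assumes the peak is the singly protonated ion, as is typical for MALDI.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056        # added once per peptide for the free termini
PROTON = 1.00728        # charge carrier for the [M+H]+ ion
OXIDATION = 15.99491    # Met -> Met sulfoxide
DEAMIDATION = 0.98402   # Asn -> Asp after PNGase F release


def mh_plus(peptide, oxidized_met=0, deamidated_asn=0):
    """Monoisotopic [M+H]+ of a peptide with optional Met oxidation and
    Asn-to-Asp conversion (counts of modified residues)."""
    if oxidized_met > peptide.count("M"):
        raise ValueError("more oxidized Met than Met residues")
    if deamidated_asn > peptide.count("N"):
        raise ValueError("more deamidated Asn than Asn residues")
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    neutral += oxidized_met * OXIDATION + deamidated_asn * DEAMIDATION
    return neutral + PROTON


if __name__ == "__main__":
    avidin_peptide = "WTNDLGSNMTIGAVNSR"
    print(f"unmodified [M+H]+: {mh_plus(avidin_peptide):.2f}")
    print(f"oxidized + deamidated [M+H]+: {mh_plus(avidin_peptide, 1, 1):.2f}")
```

With one oxidized methionine and one asparagine converted to aspartic acid, the calculated value is about 1852.85, consistent with the 1852 peak described above.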
α1-antitrypsin (human), conalbumin (chicken), ribonuclease B (bovine), and ovalbumin (chicken) (all purchased from Sigma). Table I lists the representative N-linked glycopeptides captured and identified by our approach using a nano-LC-MS/MS analysis on an LTQ linear ion trap mass spectrometer. All of the proteins were identified with a Protein Prophet value of 1.0. Strikingly all the previously identified glycosylation sites within the five glycoproteins were captured and identified except for some sites from invertase (Table I). For invertase, which has 13 N-glycosylation sites, we identified a total of eight N-linked sites (shown in Table I). The remaining five N-glycosylation sites reside in large tryptic peptides with molecular masses above 3000 Da that were absent from our LTQ results. The absence of some of the large tryptic peptides with N-linked glycosylation sites is likely caused by insufficient ionization of these peptides.
Application of Glycopeptide Capture Approach to Ovarian Cell Microsomal Fractions
Analysis of membrane proteins by MS is challenging because the proteins easily aggregate and are difficult to dissolve in aqueous solutions (22, 37). To assess the applicability of the glycopeptide capture approach to membrane proteins, we analyzed the microsomal fraction from a cisplatin-resistant ovarian cancer cell line (IGROV-1/CP) that is rich in membrane proteins. The capture strategy was carried out on two microsomal fractions with 500 and 800 µg of crude protein, respectively, and one-fifth of the final captured peptides were analyzed by a single nano-LC-ESI-MS/MS analysis. Two MS analyses for each of the two capture procedures were performed. In a single MS analysis, we unambiguously identified 311 unique peptides that mapped to 156 unique proteins. Fig. 3 shows the Pep3D result (38, 39) of the identified peptides with peptide probability value greater than 0.9. Among the 156 identified proteins, 68 proteins were identified with more than one peptide; and among the 311 identified peptides, 286 peptides have the NX(T/S) consensus sequence. The glycopeptide selectivity of our approach is as high as 91% based on the number of peptides with the N-linked consensus sequence compared with the total number of identified peptides. Combining the results of all four MS runs (two biological replicates), we identified a total of 302 proteins from the microsomal fractions of IGROV-1/CP (see the supplemental table) with an average identification rate of 136 ± 19 (n = 4) proteins and glycopeptide selectivity (the number of peptides identified with the N-glycoconsensus sequence divided by the total number of identified peptides) of 91.0 ± 1.6% (n = 4) per MS analysis.
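Glycopeptide selectivity as defined above is the fraction of identified peptides that carry the N-linked consensus sequence. A minimal sketch of that calculation follows; the function names and the second, non-glycosylated peptide are hypothetical, and the exclusion of proline at the X position is the usual definition of the sequon, which this report writes simply as NX(T/S).

```python
import re

# N followed by any residue except proline, then S or T.
# The lookahead keeps overlapping sequons from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(peptide):
    """0-based positions of asparagines that sit in an N-X-(S/T) sequon.

    The peptide is given as its unmodified sequence, i.e. with N (not D)
    at formerly glycosylated sites.
    """
    return [match.start() for match in SEQUON.finditer(peptide)]


def glycopeptide_selectivity(peptides):
    """Fraction of peptides containing at least one sequon."""
    if not peptides:
        raise ValueError("no peptides supplied")
    with_sequon = sum(1 for peptide in peptides if sequon_positions(peptide))
    return with_sequon / len(peptides)


if __name__ == "__main__":
    identified = [
        "WTNDLGSNMTIGAVNSR",  # avidin glycopeptide from this study
        "LGSAVK",             # hypothetical peptide without a sequon
    ]
    for peptide in identified:
        print(peptide, sequon_positions(peptide))
    print(f"selectivity: {glycopeptide_selectivity(identified):.0%}")
```

A sequon that straddles a tryptic cleavage site would be missed by a per-peptide check like this one, so in practice the flanking protein sequence would also need to be consulted.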
To classify the identified proteins by cellular function and to explore the biological significance of the glycoproteins we identified, we annotated our data using GoMiner (discover.nci.nih.gov/gominer) (40). For the analysis, Entrez Gene names were retrieved from the European Molecular Biology Laboratory International Protein Index (IPI) number, and redundant identifications were removed before GoMiner analysis. Among 302 proteins with Entrez Gene names, 251 proteins have been annotated as cellular components, and 244 proteins have been annotated for molecular function. As expected, the majority of identified proteins are membrane proteins (170 of 251). The major molecular functions among the identified proteins include ligand binding, catalytic activity, signal transduction activity, transporter activity, etc. (shown in Fig. 4). The results of this analysis are concordant with our knowledge of the main cellular location and functions of glycoproteins in the microsomal fractions.
-1 chain of laminin protein (5), whereas using the glycopeptide capture approach, five additional glycosylation sites were identified (see the supplemental table).
| DISCUSSION |
|---|
First, digestion of proteins into peptides improves the solubility of large membrane proteins and exposes all of the glycosylation sites, ensuring equal accessibility to external capture reagents. Analyses of known structures of glycoproteins indicate that about one-third of glycosylation sites are buried inside proteins (41). Therefore, the steric hindrance arising from protein topology can diminish the capture efficiency of many glycosylation sites. Cleaving globular proteins into smaller peptides circumvents this shortcoming.
Second, capturing glycosylated peptides can effectively reduce the complexity of the sample and increase the confidence of MS-based protein identifications. Although the protein capture strategy can effectively enrich glycoproteins from complicated samples, the peptides (20 or more tryptic peptides per protein) generated from proteolysis prior to MS analysis increase the sample complexity again and counteract the enrichment effect at the protein level. Given that proteins can be identified by individual signature proteolytic peptides with MS and that identification from multiple peptides improves the confidence of protein assignment (21), it is ideal to use multiple peptides to identify a protein. Because glycoproteins are generally glycosylated at multiple sites (5, 17) and because the glycopeptides constitute only 25% of the full glycoprotein, enriching glycopeptides not only decreases sample complexity effectively but also provides multiple peptides for unambiguous protein identification. Using 0.9 as the protein probability cutoff score, on average the error rate was as small as 0.006 in all four MS runs, and the number of incorrectly identified peptides was 1 of 136 by statistical analysis (21).
Third, our capture approach using hydrazide chemistry provides good selectivity for glycopeptides over non-glycosylated peptides. To date, different chromatographic separation techniques have been reported to enrich glycopeptides based on their diverse physical and chemical properties (12–16). The selectivity, however, is very limited (12, 14). To overcome this problem, we take advantage of hydrazide chemistry, which allows us to selectively capture glycopeptides via a covalent bond formed between hydrazide and the aldehyde groups from oligosaccharides. Such chemistry is applicable to all glycan structures, and the covalent bond formed tolerates extreme wash conditions. Virtually all non-covalently attached peptides can be removed from the solid support. Using our chemical capture approach, we have achieved a glycopeptide selectivity of 91% on the microsomal fraction of the ovarian cancer cells.
Fourth, the use of sodium sulfite as a quencher in our capture approach, in place of the SPE step that removes the excess sodium periodate in the glycoprotein chemical capture approach, allows the overall capture procedure to be completed in a single vessel. This modification prevents sample loss and saves labor and time. Sample loss is a nontrivial problem when the proteomics research is focused on low abundance proteins such as biomarkers. The more than 3-fold increase in the number of identified proteins by our approach compared with the glycoprotein capture approach of Zhang et al. (5) may be due in part to the avoidance of sample losses. Another reason for the limited yield in the glycoprotein capture approach of Zhang et al. (5) is the incomplete capture of glycopeptides inherent in the protein capture approach itself, as illustrated in Fig. 1, A and B. For a multiglycosylated protein, it is highly unlikely that all of the glycosylated sites can form chemical bonds with the solid support due to the globular structure of proteins (Fig. 1A). After on-support proteolysis and a series of washes to remove non-bonded peptides, only a fraction of the glycopeptides remains on the support (Fig. 1B). For example, using glycoprotein capture only one glycosylation site was identified from -1 chain of laminin protein (5), whereas using the glycopeptide capture approach, five additional glycosylation sites were identified. As the PeptideProphet and Protein Prophet analyses penalize single hit identifications and reward multihit identifications (21), the glycoprotein capture approach is likely to result in a lower protein identification rate (64 proteins in total) compared with our glycopeptide capture approach (302 proteins in total).
Our glycopeptide capture approach is adaptable to high throughput and automation because of the completion of capture in a single vessel. The first step proteolysis in our peptide chemical capture approach is compatible with quantitative proteomics analyses. Moreover the glycopeptide capture approach is complementary to the widely used ICAT approach that labels and enriches cysteine-containing peptides. With only a small fraction of the peptides overlapping, the number of proteins identified by our glycopeptide capture approach is similar to that of the ICAT approach. A total of 569 proteins were identified from the microsomal fractions of IGROV-1/CP by combining the ICAT and glycopeptide capture results, indicating that the use of both strategies in concert provides a powerful approach to global proteomics profiling of a complex biological medium.
Although the biological significance of the proteins identified in this study is not the focus of this report, many of the glycoproteins we identified have been implicated in ovarian carcinoma and cisplatin resistance. For instance, the folate receptor (42, 43), the insulin-like growth factor receptor (44), and the epidermal growth factor receptor (45) are overexpressed in cancer cells and are used as drug delivery targets. The tumor-associated calcium signal transducer 1 (46), tumor necrosis factor receptor (46–48), metastasis suppressor protein 1 (46), heat shock protein HSP 90 (49), laminin (46, 50, 51), and reticulocalbin-1 (52) have been reported to be associated with ovarian carcinogenesis. Increased expression of disulfide isomerase (46, 53) and ADAM 10 (54) is strongly correlated with cisplatin resistance. These observations support the contention that the peptide glycocapture method we developed is a powerful approach to the discovery of potential biomarkers. Meanwhile CD proteins, which perform important immune functions in cells, are a class of membrane proteins that are often glycosylated and also make good drug targets and biomarkers. We compared our protein dataset with the PROW (Protein Reviews on the Web) database for CD proteins (mpr.nci.nih.gov/prow/) (361 CD proteins in total) and identified 74 CD proteins.
Although our approach can also capture O-linked glycosylated peptides, for ease of analysis we detected the N-linked glycopeptides only. With a proper combination of O-glycosidase or chemical cleavage such as β-elimination, the O-glycopeptides can also be released from the solid support and analyzed by MS. Due to technical limitations of MS analysis such as the ionization efficiency of peptides, sample complexity and dynamic range, and the mass accuracy and resolution of mass spectrometry itself (4, 8, 9), not all the tryptic glycopeptides can be detected. To use this approach to study all glycosylation sites of individual proteins for the purpose of investigating post-translational modifications, detection approaches other than MS would be necessary. Nonetheless, for the purpose of global glycoproteomics, our strategy provides a comprehensive and robust methodology with improved accuracy and sensitivity.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, October 29, 2006, DOI 10.1074/mcp.T600046-MCP200
1 The abbreviations used are: PNGase F, peptide-N-glycosidase F; n, number of experiments; TCEP, tris(2-carboxyethyl)phosphine hydrochloride; SPE, solid phase extraction; ADAM 10, a disintegrin and metalloproteinase domain 10; LTQ, linear quadrupole ion trap mass spectrometer; CD, cluster of differentiation.
* This work was supported by National Institutes of Health Grants 1U54DA021519 and 1U54CA119347 and NIGMS, National Institutes of Health Grant 1P50GM076547. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
To whom correspondence may be addressed: The Inst. for Systems Biology, 1441 N. 34th St., Seattle, WA 98103. Tel.: 206-732-1297; Fax: 206-732-1299; E-mail: blin{at}systemsbiology.org
To whom correspondence may be addressed: The Inst. for Systems Biology, 1441 N. 34th St., Seattle, WA 98103. Tel.: 206-732-1201; Fax: 206-732-1299; E-mail: lhood{at}systemsbiology.org