From the Department of Chemistry, Graduate School of Science and Engineering, Tokyo Metropolitan University, Minamiosawa 1-1, Hachioji, Tokyo 192-0397, Japan, ¶ Department of Applied Biological Science, Tokyo University of Agriculture and Technology, Saiwai-cho 3-5-8, Fuchu, Tokyo 183-8509, Japan, and || Core Research for Evolutional Science & Technology (CREST), Japan Science and Technology Agency, Honmachi 4-1-8, Kawaguchi, Saitama 332-0012, Japan
| ABSTRACT |
|---|
Among the approximately 200 different known PTMs (9), protein glycosylation is one of the most common in eukaryotes: on average there are potential targets in more than half of the genes encoded in eukaryotic genomes (10). Protein glycosylation plays a role in protein folding, subcellular localization, turnover, activity, protein-protein interactions, etc. and contributes significantly to physiology as evidenced by the growing number of human diseases with defects in glycoconjugate assembly and processing (11, 12). Thus, the analysis of protein glycosylation is important for both basic biology and clinical applications, including the discovery of protein biomarkers for diagnosis and drug discovery. Previous studies show that protein glycosylation is quite diverse because the oligosaccharide structure may vary widely between different proteins. In addition, a single protein can be glycosylated at multiple sites, and subsequent processing may differentially or partially modify an oligosaccharide attached at each site. These factors generate the observed complexity of glycoprotein structure and cause difficulties in characterizing protein glycosylation on a proteomic scale. At present, little is known about the final structure of most glycoproteins; however, the specific structure of each oligosaccharide and the rate of the modification(s) are often critical to individual glycoprotein function, and defects in these processes may cause disease (13). Thus, the mechanisms by which protein glycosylation is regulated remain a challenging problem for proteomics research.
Currently two methods allow large scale glycoprotein analysis directly from a complex biological mixture; both utilize MS-based shotgun technology but differ in the way glycopeptides are collected. One method captures glycopeptides, regardless of the glycan structure, on a solid support by chemical coupling between the cis-diol group of the glycan and hydrazide on the support, and then N-linked glycopeptides are released specifically from the support by peptide-N-glycanase (PNGase) digestion (14, 15). The other method captures a subset of glycopeptides by lectin affinity chromatography (16–18). The type of glycopeptides captured by this method depends on the specificity of the lectin used; however, comprehensive analysis of glycoproteins can be achieved by using multiple lectin columns with distinct binding specificity (e.g. non-reducing end oligosaccharides). This approach, termed isotope-coded glycosylation site-specific tagging (IGOT), includes a step to remove the glycan moiety of glycopeptides with PNGase in 18O-labeled water (16). When the enzyme releases N-linked glycans in H₂¹⁸O, the glycosylated Asn residue (in the consensus tripeptide sequence for N-linked glycosylation, Asn-Xaa-(Ser/Thr) where Xaa is any amino acid except Pro) is converted to Asp with concomitant incorporation of 18O from water (19). This PNGase-mediated incorporation of the 18O-tag distinguishes glycosylated peptides from non-glycosylated peptides in which an Asn residue has been deamidated to Asp non-enzymatically. The conversion of Asn to Asp via 18O incorporation in the glycosylation consensus sequence strongly indicates that the peptide was formerly N-glycosylated.
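The mass logic behind the 18O tag can be illustrated with a short calculation. The sketch below is not part of the published workflow; it uses standard monoisotopic atomic masses, and the function and variable names are hypothetical. It compares the mass change for Asn to Asp conversion in ordinary water with the change expected when the new carboxyl oxygen comes from 18O-labeled water.

```python
# Monoisotopic atomic masses in Da (standard values).
H = 1.007825
N = 14.003074
O16 = 15.994915
O18 = 17.999160


def deamidation_shift(oxygen_mass: float) -> float:
    """Mass change when the Asn side-chain amide (-CONH2) becomes the
    Asp carboxyl (-COOH): NH2 is lost and OH is gained from water."""
    gained = oxygen_mass + H      # OH from the solvent water
    lost = N + 2 * H              # NH2 released as ammonia
    return gained - lost


spontaneous = deamidation_shift(O16)   # deamidation in ordinary water
igot_tagged = deamidation_shift(O18)   # release by PNGase in 18O water

print(f"Asn -> Asp in H2(16)O: {spontaneous:+.4f} Da")
print(f"Asn -> Asp in H2(18)O: {igot_tagged:+.4f} Da")
```

The roughly 2 Da difference between the two cases is what lets a tagged, formerly glycosylated site be told apart from an Asn that deamidated before the labeling step.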
In this study, we paired IGOT with automated multidimensional liquid chromatography-MS technology and identified 1,465 N-glycosylated sites on 829 proteins expressed in Caenorhabditis elegans. We report here the diversity of protein glycosylation and the specificity of the oligosaccharyltransferase of C. elegans that incorporates an oligosaccharide moiety en bloc into nascent polypeptide chains. Based on the analysis of the relative positions of N-glycosylation sites and putative transmembrane segments of 257 potential integral membrane glycoproteins identified in this study, we also suggest that an atypical, non-cotranslational mechanism determines the topology of integral membrane glycoproteins.
| EXPERIMENTAL PROCEDURES |
|---|
0.8). After further cultivation for 3 h, E. coli cells were lysed by sonication at 4 °C in 50 mM sodium phosphate buffer, pH 7.5, and centrifuged at 10,000 x g for 30 min. The supernatant was then applied to an asialofetuin column (Toyopearl 650M, 2.5-cm inner diameter x 5 cm) equilibrated with 50 mM sodium phosphate buffer, pH 7.5, at a flow rate of 0.5 ml/min. After washing the column with the equilibration buffer, the adsorbed GaL6 was eluted with the same buffer containing 0.2 M lactose. The purified GaL6 (20 mg) was immobilized on TSK-GEL Tresyl-5PW (2 ml; TOSOH) according to the protocol provided by the supplier and was packed into a 4.6-mm-inner diameter x 10-cm column.
Preparation of Tryptic Digests of Soluble and Insoluble Protein Fractions of C. elegans—
C. elegans strain N2 was cultured in liquid medium at 20 °C as described previously (16, 21). A mixed growth phase culture of the worm (5–20 g, wet weight) was lysed by sonication in 5 volumes of TBS (50 mM Tris-HCl, pH 7.5, 150 mM NaCl) containing a protease inhibitor mixture (Sigma), and the homogenate was centrifuged at 1,000 x g for 10 min at 4 °C to remove cell debris. The soluble extract was then centrifuged at 100,000 x g for 30 min at 4 °C to separate the soluble and insoluble protein fractions. Each fraction was solubilized in 7 M guanidine HCl in 0.5 M Tris-HCl, pH 8.6, containing 50 mM EDTA, and the proteins were reduced with dithiothreitol and S-carbamoylmethylated with iodoacetamide (22). The S-carbamoylmethylated proteins were dialyzed against 10 mM HEPES-NaOH, pH 7.5, and digested with N-tosylphenylalanyl chloromethyl ketone-treated trypsin (Pierce) at an enzyme:substrate ratio of 1:50 at 37 °C. After 18 h, an aliquot of protease inhibitor mixture (Sigma) was added to the mixture to stop digestion and to protect the lectin columns.
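For readers who want to relate the tryptic digest to protein sequences, the following sketch performs an in silico digestion. It assumes the commonly used cleavage rule (cut after Lys or Arg unless the next residue is Pro) and complete digestion with no missed cleavages; neither assumption is stated in the text, and the function name and example sequence are hypothetical.

```python
import re


def tryptic_peptides(protein: str) -> list[str]:
    """Split a protein sequence after K or R, except when followed by P.

    The pattern is zero-width, so no residues are consumed by the split
    (this requires Python 3.7 or later).
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


# Hypothetical sequence used only for illustration.
example = "MKNGTRPLKAA"
print(tryptic_peptides(example))
```

In this toy sequence the Arg-Pro bond is left intact, so the Asn-Gly-Thr motif stays within a single peptide.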
Preparation of Lectin-bound Glycopeptides—
In our earlier attempts, we prepared an N-glycosylated protein fraction by lectin affinity chromatography of C. elegans crude extract and then obtained N-glycosylated peptides from a tryptic digest of the glycosylated protein fraction by a second round of lectin affinity chromatography (16). In this study, however, we modified the procedure to more efficiently identify the integral membrane glycoproteins; the crude protein extract was first digested with trypsin after S-carbamoylmethylation in 7 M guanidine HCl, and then the N-glycosylated peptides were recovered by lectin affinity chromatography. To increase the purity of glycopeptides, we incorporated an additional "hydrophilic interaction" chromatography step (23) before PNGase-mediated 18O labeling (described later).
To collect N-glycosylated peptides, the tryptic digests of soluble and insoluble protein fractions of C. elegans were subjected to affinity chromatography on three lectin columns, concanavalin A (Con A) (LA-Con A; 4.6-mm inner diameter x 15 cm; Seikagaku Corp., Tokyo, Japan), wheat germ agglutinin (WGA) (LA-WGA; 4.6-mm inner diameter x 15 cm; Seikagaku Corp.), or GaL6 (4.6-mm inner diameter x 10 cm). Approximately 50–200 mg of peptide mixture was applied to each column equilibrated with 10 mM HEPES-NaOH, pH 7.5. After washing the column with the equilibration buffer, adsorbed glycopeptides were recovered by elution with the buffer containing a cognate sugar: 0.2 M α-methyl mannopyranoside for the Con A column, 0.2 M N-acetyl-D-glucosamine (GlcNAc) for the WGA column, or 0.2 M lactose for the GaL6 column. To maximize the recovery of glycosylated peptides, the flow-through fraction of the first chromatography was applied again to the same lectin column, and the chromatography was repeated as described above. The glycopeptide fractions from individual lectin columns of the first and second rounds of chromatography were combined for subsequent steps.
Purification of Glycopeptides by Hydrophilic Interaction Chromatography—
The N-glycosylated peptide mixture recovered by lectin affinity chromatography (10–20 ml containing 200–500 µg of peptides) was added to an equal volume of ethanol (EtOH) and 4 volumes of 1-butanol (BuOH) and was applied immediately to a Sepharose CL-4B column (5-mm inner diameter x 50 mm) equilibrated with the solvent H2O:EtOH:BuOH = 1:1:4 (v/v/v). After washing the column with the same solvent, adsorbed glycopeptides were eluted with H2O:EtOH, 1:1 (v/v). The column eluent was monitored at 220 nm, and the recovered glycopeptides were quantitated fluorometrically after reaction with o-phthalaldehyde (24).
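The loading mixture described above follows directly from the 1:1:4 solvent ratio. As a minimal sketch, assuming the aqueous sample is counted as the water component and ignoring any volume change on mixing, the volumes can be computed as follows (the function name is hypothetical):

```python
def hilic_loading_volumes(sample_ml: float) -> dict[str, float]:
    """Volumes needed so that H2O:EtOH:BuOH is 1:1:4 (v/v/v)."""
    return {
        "sample_ml": sample_ml,
        "ethanol_ml": sample_ml * 1,   # equal volume of ethanol
        "butanol_ml": sample_ml * 4,   # four volumes of 1-butanol
        "total_ml": sample_ml * 6,     # nominal total, no contraction
    }


print(hilic_loading_volumes(10.0))
```

For the 10–20 ml lectin eluates mentioned in the text, this gives a nominal loading volume of 60–120 ml.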
PNGase-mediated 18O Labeling of Glycopeptides—
N-Glycosylated peptides were labeled specifically with 18O by IGOT as described previously (16). Briefly the sample glycopeptides were dried under vacuum to remove solvent containing H₂¹⁶O and then redissolved in 0.1 M Tris base prepared in H₂¹⁸O (99 atom % 18O; Taiyo Nippon Sanso Corp., Tokyo, Japan). The peptide solution was then adjusted to pH 8–9 with a minimal volume of acetic acid, and then PNGase-A (lyophilized; Seikagaku Corp.), dissolved in H₂¹⁸O, was added to a final concentration of 1 milliunit/10 µg of peptide. The reaction was incubated overnight at 37 °C in a sealed polypropylene tube.
Automated 2D Nano-LC-MS/MS Analysis of 18O-Tagged Peptides—
The deglycosylated 18O-tagged peptide mixture (approximately 5–10 µg) was analyzed by automated 2D LC-MS/MS. The instrument used was a miniaturized version of that described previously (25, 26) and was equipped with a first dimensional microscale cation-exchange column (1-mm inner diameter x 50 mm) of Bioassist-S (7-µm particles; TOSOH) and a second-dimensional direct nanoflow spray tip reversed phase column (150-µm inner diameter x 50 mm) of Mightysil-C18 (3-µm particles; Kanto Chemicals) connected in tandem through an electric column switching valve and an automated solvent desalting device. The chromatography was performed automatically under the time-dependent control program, and the eluate was directly sprayed into a high resolution Q-TOF hybrid mass spectrometer (Q-TOF Ultima; Waters-Micromass) at a flow rate of 100 nl/min. The spectrometer was operated in a data-dependent MS/MS mode where a full MS scan (1 s, m/z 400–1500) was followed by two MS/MS scans (1 s each, m/z 100–1500). The two most intense precursor ions with a charge state (z) of +2 or +3 were dynamically selected and subjected to collision-induced dissociation with a collision energy as recommended by the manufacturer and a dynamic exclusion duration of 30 s. The total analysis time for a single 2D nano-LC-MS/MS process was 24 h.
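The data-dependent selection described above (two most intense precursors, charge +2 or +3, 30-s dynamic exclusion) can be sketched as follows. This is an illustration of the general technique, not the instrument vendor's acquisition logic; the `Peak` record, the function name, and the m/z matching tolerance `mz_tol` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float
    charge: int


def select_precursors(peaks, now_s, excluded_until, top_n=2,
                      allowed_charges=(2, 3), exclusion_s=30.0, mz_tol=0.1):
    """Pick the top_n most intense eligible peaks from one full MS scan.

    excluded_until maps a previously fragmented m/z to the time (s) at
    which its exclusion expires; it is updated in place.
    """
    def is_excluded(mz):
        return any(abs(mz - ex_mz) <= mz_tol and now_s < until
                   for ex_mz, until in excluded_until.items())

    eligible = [p for p in peaks
                if p.charge in allowed_charges and not is_excluded(p.mz)]
    chosen = sorted(eligible, key=lambda p: p.intensity, reverse=True)[:top_n]
    for p in chosen:
        excluded_until[p.mz] = now_s + exclusion_s
    return chosen


# Hypothetical survey scan repeated at three time points.
scan = [Peak(500.3, 900, 2), Peak(620.8, 700, 3),
        Peak(710.4, 1200, 1), Peak(455.2, 300, 2)]
exclusion = {}
for t in (0.0, 3.0, 31.0):
    print(t, [p.mz for p in select_precursors(scan, t, exclusion)])
```

With the 3-s cycle given in the text (one 1-s survey scan plus two 1-s MS/MS scans), dynamic exclusion steers later cycles toward less abundant precursors that would otherwise never be fragmented.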
Protein Identification by Database Search—
The large volume of MS/MS data generated by the 2D nano-LC-MS/MS analysis was converted to text files using MassLynx software (version 4.0, Micromass). The peak list files were then created with smoothing by the Savitzky-Golay method (window channels, ±3) using the same software and processed by the Mascot algorithm (version 1.9, Matrix Science, Ltd.) to assign peptides on the C. elegans Wormpep 124 protein sequence database (22,259 entries, www.sanger.ac.uk/Projects/C_elegans/WORMBASE/current/wormpep.shtml). The database search was performed with the parameters as described previously (16, 21) except that we defined a custom modification, "deamidation with 18O (asparagine + 3 Da)," for the deamidation of Asn incorporating 18O. We first screened the candidate peptides with probability-based Mowse scores that exceeded their thresholds (p < 0.05) and with MS/MS signals for y- or b-ions >3; finally we selected "identified peptides" that contained one or more aspartic acid tagged with 18O atoms on the basis of their MS/MS spectra. If a prospective "identified peptide" did not contain the consensus tripeptide sequence for N-linked glycosylation (Asn-Xaa-(Ser/Thr)), the data were eliminated regardless of the match score. The resulting dataset was finally evaluated by in-house software STEM (27) to remove unreliable Mascot peptide identifications and redundant assignments and to integrate the results with key parameters of the experiment.
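The acceptance criteria in this paragraph can be summarized in a small filter. The sketch below is a simplified reading of those criteria, not the authors' STEM software or Mascot's output format: the `PeptideMatch` record and its field names are hypothetical, the sequence is taken to be the database sequence (with Asn at the tagged site), and consensus motifs that extend past the end of a peptide are not handled.

```python
import re
from dataclasses import dataclass

# Asn-Xaa-(Ser/Thr) with Xaa != Pro; the lookahead allows overlapping motifs.
SEQUON = re.compile(r"N(?=[^P][ST])")


@dataclass
class PeptideMatch:                 # hypothetical record for illustration
    sequence: str                   # database sequence of the peptide
    score: float                    # probability-based Mowse score
    threshold: float                # score threshold at p < 0.05
    matched_ions: int               # matched y- or b-ion signals
    tagged_positions: tuple = ()    # 0-based indices carrying the +3 Da tag


def sequon_positions(sequence: str) -> set[int]:
    """0-based indices of Asn residues that start a consensus tripeptide."""
    return {m.start() for m in SEQUON.finditer(sequence)}


def glycosylation_sites(match: PeptideMatch) -> list[int]:
    """Tagged positions that coincide with the Asn of a consensus tripeptide."""
    return sorted(set(match.tagged_positions) & sequon_positions(match.sequence))


def is_accepted(match: PeptideMatch) -> bool:
    """Score and ion-count screen, an 18O tag, and a consensus tripeptide."""
    passes_screen = match.score > match.threshold and match.matched_ions > 3
    return (passes_screen and bool(match.tagged_positions)
            and bool(sequon_positions(match.sequence)))


hit = PeptideMatch("AANGTK", score=52.0, threshold=38.0,
                   matched_ions=7, tagged_positions=(2,))
print(is_accepted(hit), glycosylation_sites(hit))
```

A stricter variant would accept a peptide only when `glycosylation_sites` is non-empty, that is, when the tag sits on the consensus Asn itself.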
Characterization of the Identified Glycoproteins—
The transmembrane segment and the signal peptide of proteins were predicted by SignalP 3.0 (28) and/or ConPredII (29) bioinformatics tools.
| RESULTS AND DISCUSSION |
|---|
25% of the 22,500 genes predicted from the genome sequence of C. elegans (Wormpep), suggesting that there are ~6,000 potential targets for N-linked glycosylation. To catalog N-glycosylated proteins expressed in C. elegans and to study details of protein glycosylation, we used IGOT coupled with MS-based proteomics. To increase the coverage, we used three types of lectin columns with different binding specificity for the oligosaccharide attached to the polypeptide chain; thus, the columns contained immobilized Con A, WGA, and GaL6 (20), which are specific for the non-reducing end of Man, GlcNAc, and Gal, respectively. In addition, the lectin affinity chromatography was performed with tryptic peptide mixtures derived from soluble and insoluble protein fractions of C. elegans crude extract (see "Experimental Procedures"). The glycopeptide mixtures were further purified by hydrophilic interaction chromatography on Sepharose CL-4B, subjected to IGOT (i.e. N-glycanase-mediated 18O labeling), and analyzed by automated 2D nano-LC-MS/MS shotgun technology to identify 18O-labeled formerly N-glycosylated peptides. To maximize the number of identifications, the shotgun analysis was repeated three times for each peptide mixture prepared by Con A, WGA, and GaL6 affinity chromatography of the soluble/insoluble fractions. Supplemental Table 1 lists all the candidate glycosylated peptides in C. elegans identified in this study, and all their MS/MS spectra are shown in Supplemental Fig. 1-1 to 1-9. Supplemental Table 2 lists the C. elegans N-glycosylated proteins and the number of glycosylation sites identified in this study. We identified 1,204 N-glycosylated sites on 686 proteins from Con A-captured glycopeptide mixtures and likewise 474 sites on 276 proteins from WGA- and 382 sites on 330 proteins from GaL6-captured glycopeptide mixtures. After eliminating redundant identifications, we had identified 1,465 N-glycosylated sites on 829 unique proteins (Fig. 1 and Supplemental Table 2). The number of glycosylated sites assigned on each protein ranged from 1 to 24 with an average of 1.5.
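Eliminating redundant identifications across the three lectin fractions amounts to taking a set union over (protein, site) pairs. The sketch below shows the idea with made-up identifiers; the function name and the toy data are hypothetical and do not come from the supplemental tables.

```python
def merge_identifications(per_lectin: dict[str, set[tuple[str, int]]]):
    """Combine per-lectin (protein, site) identifications.

    Returns the unique sites and the unique proteins they belong to.
    """
    sites = set().union(*per_lectin.values())
    proteins = {protein for protein, _ in sites}
    return sites, proteins


# Toy data: protein identifiers and residue numbers are invented.
toy = {
    "ConA": {("P1", 10), ("P1", 55), ("P2", 7)},
    "WGA": {("P1", 10), ("P3", 120)},
    "GaL6": {("P2", 7), ("P3", 120), ("P3", 300)},
}
sites, proteins = merge_identifications(toy)
print(len(sites), len(proteins))
```

In the reported data the per-lectin tallies sum to 2,060 site identifications (1,204 + 474 + 382), of which 1,465 are unique, so a substantial number of sites were recovered on more than one lectin.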
The glycoproteins we identified were quite diverse in terms of subcellular localization and function, etc., yet many (approximately 50%) were integral membrane proteins such as cell surface receptors, transporters, channels, extracellular matrix proteins, and proteases.
(F54G8.3), have relatively homogeneous glycans because multiple glycopeptides were identified only in the Con A-bound fraction, suggesting that the proteins contain a high mannose-type oligosaccharide chain(s). However, our argument, based on the binding specificity of different types of lectins, should certainly be confirmed by direct structural analysis of the oligosaccharides attached to each site of the polypeptide chain.
Amino Acid Residues Close to the Glycosylated Site
Although we identified ~6,000 potential targets for N-glycosylation, not all those proteins were found to be N-glycosylated, and as a matter of course, not all Asn residues in the consensus sequences were glycosylated. This suggests that local sequence elements may help determine the specificity of oligosaccharyltransferase (OST), an enzyme responsible for attachment of an oligosaccharide to the newly synthesized polypeptide. We reported previously the frequencies of the amino acid residues around the 400 glycosylated sites on Con A-bound soluble proteins in C. elegans (16). In the present study, the analysis was performed for 1,465 unique glycosylated sites on 829 proteins captured on the three types of lectin columns (Table I). Within the consensus tripeptide sequence for N-linked glycosylation, Thr occurs at position 3 more than twice as frequently as Ser (819 Thr versus 359 Ser). Pro does not occur at positions 2 or 4 with only one exception. We also found that Cys occurs at positions –3 to 6 at 1.5–2.5 times greater frequency than that expected from natural abundance of this residue. However, we could not detect other strong amino acid preferences around the glycosylation sites. This suggests that the nematode OST can introduce an oligosaccharide to almost any Asn-Xaa-(Ser/Thr) (Xaa ≠ Pro) sequence if the nascent polypeptide chain has a properly folded tertiary structure or meets some other criteria. The genome sequence implies that C. elegans may have a single OST-translocon (protein-conducting channel) complex in the ER lumen, whereas the yeast Saccharomyces cerevisiae has multiple OST-translocon complexes that might have different specificities for other sequences near consensus glycosylation sites (33–35). However, the factors that guide OST-mediated glycosylation remain unknown.
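A residue frequency analysis of this kind can be sketched as a position-specific count around each glycosylated Asn. The code below is an illustration only: the sequences and the background frequency are invented, the function names are hypothetical, and positions are expressed as offsets from the Asn (offset 0), which corresponds to position 1 in the numbering used above for the consensus tripeptide. How the paper numbers residues upstream of the Asn is not stated here, so the offsets should not be read as the exact positions of Table I.

```python
from collections import Counter, defaultdict


def offset_counts(entries, offsets=range(-3, 6)):
    """Count residues at each offset from the glycosylated Asn.

    entries: iterable of (protein_sequence, asn_index) with a 0-based index.
    Offsets that fall outside the sequence are skipped.
    """
    counts = defaultdict(Counter)
    for seq, asn in entries:
        for off in offsets:
            i = asn + off
            if 0 <= i < len(seq):
                counts[off][seq[i]] += 1
    return counts


def enrichment(counts, offset, residue, background_freq):
    """Observed frequency at an offset divided by a background frequency."""
    total = sum(counts[offset].values())
    if total == 0:
        return float("nan")
    return (counts[offset][residue] / total) / background_freq


# Invented sequences, each with a glycosylated Asn at index 2.
toy_sites = [("ACNGTC", 2), ("GGNCSA", 2), ("CANVTK", 2)]
counts = offset_counts(toy_sites)
print(counts[2])                           # offset 2 = Ser/Thr position
print(enrichment(counts, 1, "C", 0.02))    # 0.02 is a placeholder background
```

Applied to the figures quoted above, the Thr:Ser ratio at the third position of the tripeptide is 819/359, or about 2.3.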
Analysis of Integral Membrane Glycoproteins Containing a Signal Peptide—
Because the initial protein glycosylation takes place only in the ER lumen and the glycosylated segments do not cross the membrane bilayer, the structural segment around the glycosylated site must face toward the ER lumen or an equivalent topological space (e.g. the Golgi lumen). Our topological analysis of membrane glycoproteins was based on the positions of experimentally determined glycosylation sites and putative transmembrane segments on the polypeptide chains. To simplify our analysis, we focused on the 257 glycoproteins containing a single transmembrane segment. The translated polypeptide segment that follows the signal peptide is introduced into the ER lumen through the translocon. If the transmembrane segment is translated and recognized by the translocon, it exits laterally from the channel into the membrane lipid, and the C-terminal portion remains in the cytosol (36). Therefore, for single span transmembrane proteins containing a signal peptide, the portion N-terminal to the transmembrane segment resides in the ER lumen. We assigned 181 of these as Type I transmembrane proteins, and 160 of those were actually glycosylated only on the N-terminal side of the transmembrane segment (Fig. 2). Twelve proteins were glycosylated on the C-terminal side of the transmembrane segment, and nine were glycosylated on both sides. Thus, our ability to predict signal peptides and transmembrane segments was quite good (160/181 = 88%) for single span transmembrane proteins.
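The tallies in this paragraph can be cross-checked with a few lines of arithmetic. The counts are taken from the text; the dictionary keys are labels chosen for this sketch.

```python
# Counts of signal peptide-containing single span glycoproteins, by the
# side of the transmembrane segment on which glycosylation was observed.
type_i = {"n_side_only": 160, "c_side_only": 12, "both_sides": 9}

total = sum(type_i.values())
consistent_fraction = type_i["n_side_only"] / total

print(total, f"{consistent_fraction:.1%}")
```

The three categories account for all 181 proteins, and the fraction whose glycosylation pattern agrees with the predicted Type I topology matches the 88% quoted above.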
Analysis of Integral Membrane Glycoproteins Lacking a Signal Peptide—
We performed a similar analysis on the single span proteins lacking a signal peptide (Fig. 3). Of 76 proteins, 48 and 26 were assigned as Type II and Type III transmembrane proteins, respectively. Only two proteins were modified on both sides of the transmembrane segment. Fig. 3 shows the positions of the glycosylation sites and the putative transmembrane segment on the polypeptide chains of Type II and Type III proteins. In the Type II proteins, the putative transmembrane segment appeared immediately after a short N-terminal sequence. The N-terminal segments that preceded the transmembrane segment had an average length of 82 residues with 39 of 48 Type II proteins (~80%) containing segments of less than 100 residues. This suggests that the transmembrane segment, or internal signal anchor, may replace the signal peptide function when the nascent polypeptide is targeted to the ER. The Type III transmembrane proteins, however, are clearly different from the Type II proteins in the length of their N-terminal segments. Namely Type III proteins have an average of 520 residues between the N terminus and transmembrane segment with the length ranging from 36 to 2,086 residues. Unlike the Type II proteins, 22 of 26 (~85%) Type III proteins have long N-terminal segments often far exceeding 100 residues (Fig. 3). For example, UNC-5 (netrin receptor/axon guidance protein) has a ~350-residue N-terminal segment that consists of an extracellular Ig-like domain (Prosite document PDOC50835 in ExPASy, www.expasy.org) and two thrombospondin domains (Prosite PDOC50092). The C01G6.8 gene product (CAM-1) has an N-terminal ~450-residue segment consisting of the Ig-like, Frizzled (Prosite PDOC50038), and Kringle2 (Prosite PDOC50020) domains upstream of the putative transmembrane segment. Our prediction that UNC-5 and CAM-1 have a Type III topology is reasonable because both proteins have typical intracellular domains, ZU5 (Prosite PDOC51145) and the death (Prosite PDOC50017) domains (UNC-5) or a protein kinase domain (CAM-1), downstream of their putative transmembrane segments.
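The topology assignments discussed in this and the preceding section can be expressed as a simple decision rule. The sketch below is one reading of the reasoning in the text, namely that a glycosylated segment must be luminal; it is not the authors' procedure. The function name, the category labels for unexpected cases, and the example coordinates are hypothetical.

```python
def classify_single_span(has_signal_peptide: bool, tm_start: int,
                         tm_end: int, glyco_sites: list[int]) -> str:
    """Assign a topology type to a protein with one transmembrane segment.

    Positions are residue numbers; tm_start and tm_end bound the putative
    transmembrane segment. A glycosylated side is taken to be luminal.
    """
    n_side = any(site < tm_start for site in glyco_sites)
    c_side = any(site > tm_end for site in glyco_sites)

    if n_side and c_side:
        return "ambiguous (sites on both sides)"
    if has_signal_peptide:
        # Type I: the region after the signal peptide enters the lumen.
        return "Type I" if n_side else "Type I predicted, sites inconsistent"
    if c_side:
        return "Type II"    # N terminus cytosolic, C-terminal side luminal
    if n_side:
        return "Type III"   # N-terminal side luminal without a signal peptide
    return "unclassified"


# Invented coordinates for illustration.
print(classify_single_span(True, 300, 322, [45, 120]))
print(classify_single_span(False, 30, 52, [150]))
print(classify_single_span(False, 350, 372, [60, 210]))
```

Under this rule the length of the N-terminal segment plays no part in the assignment, which is why the long N-terminal segments of the Type III proteins stand out as an observation rather than a criterion.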
Based on the structural characteristics of putative single span transmembrane glycoproteins identified in this study, we propose an atypical translocation mechanism for Type III transmembrane proteins in which newly synthesized polypeptide is post-translationally translocated into the ER through the translocon (Fig. 3). The mechanism underlying transmembrane protein topogenesis remains controversial (38) mainly because definitive structural data on integral membrane proteins are limited. In particular, there have been few naturally occurring Type III proteins reported (e.g. synaptotagmin II (38), microsomal cytochrome P450 (39), and yeast Golgi ecto-ATPase Ynd1p (40)), and therefore Type III transmembrane proteins are thought to have unusual topologies. It is generally accepted that the topology of Type II/III proteins is determined by the interaction between the translocon channel and the transmembrane segment of each protein essentially according to the "positive-inside" rule (38, 41). It is also believed that Type III transmembrane proteins have a relatively short N-terminal segment preceding the signal anchor or transmembrane sequence. This is due to the fact that the translocation of this segment is a post-translational event, and thus it must retain an unfolded structure to pass through the translocon. Nevertheless our results indicate that Type III transmembrane proteins are distributed much more widely among eukaryotic cells than previously thought and that they have an apparently longer N-terminal segment compared with Type II proteins. The mechanism of the generation of Type III proteins must therefore involve either a chaperonin-like activity (to maintain the long N-terminal polypeptide segment in an unfolded structure) or the unfolding of the tentatively folded structure prior to translocation. It is also likely that an energy-dependent mechanism is required to insert the nascent protein into the translocon of the ER. 
Our analysis for the Type III proteins assigned here indicates that many have an N-terminal structure stabilized by disulfide bridges, such as Ig-like domain, Frizzled domain, Kringle2, FN3 (fibronectin type III), EGF3 (epidermal growth factor), Sushi, LDLRA2 (LDL-receptor class A), LDLRB (class B), and AMOP (adhesion-associated domain in MUC4 and other proteins) domains. We assume that these domains retain a relatively flexible, unfolded structure in the reducing environment of the cytosol that might be advantageous for the post-translational translocation of the polypeptide into the ER. Thus, our findings suggest that the Type III transmembrane protein architecture is widespread in eukaryotic cells. Further studies of the mechanism that determines the topogenesis of integral membrane proteins will assess the validity of these predictions.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, August 30, 2007, DOI 10.1074/mcp.M600392-MCP200
1 The abbreviations used are: PTM, post-translational modification; PNGase, peptide-N-glycanase; IGOT, isotope-coded glycosylation site-specific tagging; Con A, concanavalin A; WGA, wheat germ agglutinin; GaL6, galectin 6; ER, endoplasmic reticulum; 2D, two-dimensional; OST, oligosaccharyltransferase; SRP, signal recognition particle; LDL, low density lipoprotein.
* This work was supported in part by grants for the Structural Glycomics Project from the New Energy and Industrial Technology Development Organization (NEDO) of Japan and for the Integrated Proteomics Project, Pioneer Research on Genome the Frontier from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan, and by a grant-in-aid for scientific research from MEXT.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
To whom correspondence should be addressed: Research Center for Medical Glycoscience, National Inst. of Advanced Industrial Science and Technology (AIST), Central 2-12, Umezono 1-1-1, Tsukuba, Ibaraki 305-8568, Japan. Tel.: 81-29-861-3187; Fax: 81-29-861-3125; E-mail: kaji-rcmg{at}aist.go.jp
| REFERENCES |
|---|