Molecular & Cellular Proteomics 7:347-369, 2008.
| ABSTRACT |
|---|
Early studies of the cytosolic ribosome in plants revealed it was slightly smaller than that found in mammals, notably due to lower apparent mass of the 60 S subunit (10). Early biochemical analyses of the protein components of ribosomes have been undertaken using 2D gels in the monocotyledonous plants wheat, barley, and maize and the dicotyledonous plants soybean, tomato, and tobacco (11–16). Counting of distinct protein spots on these gels suggested that the 40 S subunit contained up to 40 proteins, whereas the 60 S subunit contained up to 59 proteins; however, without genomic sequences it was not possible to systematically assign these proteins to genes and gene families that would be required to resolve the composition of the complex in plants in a gene-specific manner.
The sequencing of the Arabidopsis genome provided the first opportunity to consider the number and arrangement of ribosomal protein-coding genes in plants. Using the strong sequence conservation of the eukaryotic r-proteins, Barakat et al. (17) undertook an analysis of expressed sequence tags and of the early annotation of the complete genomic sequence and identified 249 genes, including 22 apparent pseudogenes, that encoded 80 different types of r-proteins (32 small subunit and 48 large subunit proteins). The extra family of proteins not conserved in mammals was the plant-specific P3 component known to be in the 60 S subunit. Analysis of the r-protein gene families reveals that each family consists, on average, of three members. The sequences within these families can have very high conservation, leading to many paralogs with 97–100% sequence identity at the amino acid level, whereas other r-protein families contain significant sequence divergence. Based on public EST data and hybridization data on microarrays, most gene family members are expressed and could be present in the ribosome structure at different points of development, in different cell types, and under different conditions (18). Variation in ribosome composition by incorporation of different paralogs could be an important component in the regulation of transcript translation. This underpins the importance of a thorough understanding of the actual composition of ribosome protein complexes themselves and not just the set of genes that could encode ribosomes.
Several studies have sought to investigate the ribosomal protein complement of experimental samples from Arabidopsis to answer these questions. Chang et al. (19) performed a study using ribosomes purified from Arabidopsis cell culture. They undertook a MALDI-TOF analysis of spots from 2D gel-separated protein samples and also a limited tandem MS analysis of 1D SDS-PAGE-separated protein bands. Giavalisco et al. (20) undertook a similar analysis of ribosome samples extracted from whole Arabidopsis leaves using 2D gel-separated samples and MALDI-TOF MS analysis.
Chang et al. (19) identified proteins from 70 of the 80 gene families by their MALDI-TOF analysis and identified members from a further four gene families through their MS/MS analysis of 1D gel bands. In contrast Giavalisco et al. (20) could only identify 63 of the 80 gene families by their MALDI-TOF analysis. Both these studies made claims of products from particular genes within a large number of the gene families. This was often based on the presence of single MS ion masses, but often not even these could be found, and the proteins could only redundantly be linked back to gene families of two to seven members.
We used a combination of approaches to advance the MS/MS-based insight into Arabidopsis ribosomal composition and its post-translational modifications. First, we used an in silico digestion of all ribosomal proteins to define targets for data acquisition and to drive a strategy of data collection to maximize recognition of ribosomal protein-derived peptides. Second, we undertook an extensive MS/MS survey of the ribosome using trypsin and, when required, chymotrypsin and pepsin. We then used custom software to extract and filter peptide match information from Mascot result files and implement high confidence criteria for calling gene-specific identifications based on the highest quality unambiguous spectra matching exclusively to certain in silico predicted gene-specific peptides. This provided a much richer and more detailed analysis of the protein composition and also identified peptides from five gene families of r-proteins not identified in previous studies. Further we acquired strong MS/MS data to support gene-specific protein identification in 32 cases not revealed in the previous study by Chang et al. (19). In addition, we provide a wealth of information on co- and post-translational modification of r-proteins in Arabidopsis by initiator methionine removal, N-terminal acetylation, N-terminal methylation, lysine N-methylation, and phosphorylation.
| EXPERIMENTAL PROCEDURES |
|---|
Ribosomal Isolation—
All steps were carried out on ice or at 4 °C. Approximately 100 g fresh weight of Arabidopsis cell culture was homogenized in 300 ml of extraction buffer (0.45 M mannitol, 30 mM Tris, 0.5% (w/v) polyvinylpyrrolidone 40, 100 mM KCl, 20 mM MgCl2, 0.5% (w/v) bovine serum albumin, 20 mM cysteine) in a Waring blender for 2 min, and the homogenate was filtered through four layers of muslin. The filtrate was centrifuged for 5 min at 1500 × g, and the supernatant was again centrifuged at 16,000 × g for 15 min. The supernatant of this 16,000 × g sample was then centrifuged at 30,000 × g for 30 min. In 70-ml polycarbonate bottle assemblies (Beckman part number 355655), 50-ml portions of the 30,000 × g supernatant were then layered over 20-ml cushions of 1.5 M sucrose dissolved in Ribosome Buffer (30 mM Tris, 100 mM KCl, 20 mM MgCl2, 5 mM β-mercaptoethanol) and centrifuged at 180,000 × g for 14.5 h to form a crude ribosomal pellet. Each crude ribosomal pellet was resuspended in 1 ml of Ribosome Buffer and centrifuged at 20,800 × g for 30 min to pellet insoluble material. The supernatants were combined, brought to a volume of 50 ml with Ribosome Buffer, and ultracentrifuged through 1.5 M sucrose as above. The pure ribosomal pellet was resuspended to a protein concentration of 6.5 mg/ml in Ribosome Buffer, snap frozen in liquid nitrogen, and stored at –80 °C until use. This protocol yielded 2–3 mg of ribosomal protein from 100 g fresh weight of cells.
For the isolation of ribosomes from mitochondria, mitochondria were first purified from Arabidopsis by differential centrifugation and density gradient centrifugation essentially as described by Millar et al. (22). A concentrated suspension of mitochondria containing 40 mg of mitochondrial protein was resuspended to a total volume of 50 ml in Ribosome Buffer containing 2% (w/v) Triton X-100. The suspension was incubated on ice for 30 min with occasional gentle mixing. The suspension was then clarified by centrifugation at 30,000 × g for 30 min, and the supernatant was layered over a 20-ml cushion of 1.5 M sucrose in Ribosome Buffer and centrifuged at 180,000 × g for 20 h. After removal of the supernatant, the ribosomal pellet was resuspended in a minimal volume of Ribosome Buffer, snap frozen in liquid nitrogen, and stored at –80 °C until use.
Arabidopsis Cytosolic Ribosome Dissociation—
Ribosome dissociation was essentially carried out according to Lin and Key (23). Briefly 100 µg of purified cytosolic ribosomes were resuspended in 50 µl of either modified ribosome buffer (30 mM Tris, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, pH 7.5) or modified ribosome buffer containing 0.5 M KCl. Resuspended samples were loaded onto a 10-ml linear sucrose gradient (15–30%) in either modified ribosome buffer or modified ribosome buffer containing 0.5 M KCl and subjected to ultracentrifugation at 260,800 × g at 4 °C for 4 h using a Beckman SW41 Ti rotor. Fractions of 200 µl were collected directly into 96-well plates using a peristaltic pump. Absorbance readings (A260 and A280) were conducted on a POLARstar OPTIMA microplate reader (BMG Labtech) using 200 flashes/well.
Gel Electrophoresis—
SDS-PAGE gels used were 4% acrylamide stacking gels above 12% (w/v) acrylamide, 0.1% (w/v) SDS or 5–16% (w/v) acrylamide, 0.1% (w/v) SDS in a large gel format (0.1 × 16 × 16 cm) and were run with a Tris-glycine buffering system. Gel electrophoresis was performed at 25 mA/gel and completed in 3 h.
Pro-Q Phosphoprotein Detection—
Approximately 25 µg of ribosomal proteins were solubilized in SDS-PAGE sample buffer and loaded onto a 12% polyacrylamide gel (14 cm × 16 cm × 0.75 mm) overlaid with a 4% stacking gel. The sample underwent electrophoresis for 4 h at 30 mA and upon completion was fixed overnight in a solution consisting of 10% acetic acid and 50% methanol. Following three 10-min washes in ddH2O, gels were stained with 100 ml of Pro-Q Diamond (Invitrogen) for 90 min and destained using three successive 30-min washes with 100 ml of destain solution containing 50 mM sodium acetate (pH 4.0) and 20% acetonitrile. After two 5-min washes in 100 ml of ddH2O, fluorescent images of in-gel phosphorylated proteins were acquired using a Typhoon fluorescence scanner (GE Healthcare) with 532 nm excitation, a 580-nm band pass emission filter, and the photomultiplier tube set at 500 for optimal Pro-Q dye detection. ImageQuant™ software was used to view Pro-Q staining of the 1D PAGE gel. The gel was then stained with colloidal Coomassie overnight with gentle rocking. Following staining, the solution was discarded, and the gel was destained in 0.5% phosphoric acid for 4 h.
Phosphopeptide Enrichment Using Titanium Dioxide—
TiO2 tips (NuTip) were supplied by Glygen Inc., and phosphopeptide enrichment procedures were essentially those outlined by Larsen et al. (24) with some modifications. Briefly TiO2 tips were conditioned prior to binding of phosphopeptides by aspirating/expelling 10 µl of 0.1% TFA solution in ddH2O (pH 1.9) five times using a pipette and expelling all of the TFA solution from the tip. Before binding, the pH of a 5-µl sample containing 10 µg of trypsin-digested ribosomal peptide lysate mixture was adjusted to pH 1.9 by adding one part 1% TFA solution in ddH2O to four parts peptide mixture. The peptide mixture was aspirated/expelled 50 times using a pipette to ensure maximal binding of phosphorylated peptides to TiO2. The binding solution was then completely expelled from the tip. Bound samples were washed with 10 µl of 50% acetonitrile, 0.1% TFA, aspirating/expelling the wash solution 10 times through the tip. This step was repeated two more times for a total of three wash steps. At the end of each wash step, all of the wash solution was expelled from the tip. Finally the bound phosphopeptides were eluted from the TiO2 tip by aspirating/expelling 10 µl of 0.3 M NH4OH solution in ddH2O (pH 10.5) 10 times through the tip. Portions of this eluate were used for nano-ESI-MS/MS analysis.
Q-TOF MS—
For gel-arrayed proteins, samples to be analyzed were cut from the gels and destained twice for 45 min in 10 mM NH4HCO3 and 50% (v/v) acetonitrile. Samples were dehydrated at 50 °C in a dry block heater for 30 min and rehydrated with 15 µl of digestion solution, which for trypsin and chymotrypsin consisted of 25 mM NH4HCO3 and 25 µg/ml protease in 0.01% (v/v) trifluoroacetic acid and for pepsin consisted of 25 µg/ml pepsin in 7% (v/v) formic acid, and incubated overnight at 37 °C. Peptides were extracted by adding 15 µl of acetonitrile and vigorous shaking for 15 min, removing liquid, and adding 15 µl of 50% (v/v) acetonitrile and 5% (v/v) formic acid to the gel plugs followed by another 15 min of shaking (this step was repeated); washes were pooled after each extraction step. Samples were loaded onto self-packed Microsorb (Varian Inc.) C18 (5-µm, 100-Å) reverse phase columns (0.5 × 50 mm) using an Agilent Technologies 1100 series capillary liquid chromatography system and eluted into a QStar Pulsar i MS/MS system fitted with an IonSpray source (Applied Biosystems). Peptides were eluted from the C18 reverse phase column at 8 µl/min using a 9-min acetonitrile gradient (5–60%) in 0.1% (v/v) formic acid. Ions were selected automatically for the N2 collision cell utilizing the information-dependent acquisition capabilities of Analyst QS version 1.1 (Applied Biosystems) and the rolling collision energy feature for automated collision energy determination based on the m/z of the ions. The method used a 1-s TOF MS scan that automatically switched (using information-dependent acquisition) to a 2-s product ion scan (MS/MS) when a target ion reached an intensity of greater than 30 counts and its charge state was identified as 2+, 3+, or 4+. TOF MS scanning was undertaken over an m/z range of 200–900 using a Q1 transmission window of 180 amu (100%). Product ion scans were undertaken over an m/z range of 70–2000 at low resolution utilizing Q2 transmission windows of 50 (33%), 190 (33%), and 650 amu (34%).
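The switching rule for information-dependent acquisition described above can be illustrated with a minimal Python sketch. It is not the Analyst QS implementation; the class name `SurveyIon` and the function `triggers_msms` are hypothetical, and only the thresholds (more than 30 counts, charge 2+ to 4+, survey range 200–900 m/z) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurveyIon:
    """One ion observed in a TOF MS survey scan (hypothetical container)."""
    mz: float
    intensity: float          # detector counts
    charge: Optional[int]     # None when the charge state could not be assigned


def triggers_msms(ion, min_counts=30, allowed_charges=(2, 3, 4),
                  mz_range=(200.0, 900.0)):
    """Return True if the ion meets the stated criteria for a product ion scan."""
    in_range = mz_range[0] <= ion.mz <= mz_range[1]
    # The text says "greater than 30 counts", so the comparison is strict.
    intense_enough = ion.intensity > min_counts
    charge_ok = ion.charge in allowed_charges
    return in_range and intense_enough and charge_ok


if __name__ == "__main__":
    for ion in (SurveyIon(350.2, 45, 2), SurveyIon(350.2, 30, 2),
                SurveyIon(640.8, 120, 1), SurveyIon(950.0, 500, 3)):
        print(ion, triggers_msms(ion))
```

Under these assumptions a doubly charged ion at m/z 350.2 with 45 counts is selected, whereas singly charged ions, ions at exactly 30 counts, and ions outside the survey range are not.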
For ribosomal protein lysates, samples were digested with trypsin (1:10, w/w) in 10 mM NH4HCO3 overnight at 37 °C. For initial analyses digested samples of 1–10 µg were analyzed directly or after TiO2 selection as above except bound peptides were eluted over a 30-min period (5–60% acetonitrile in 0.1% (v/v) formic acid) at 10 µl/min. The analysis method used was similar to that described above except a TOF MS scan range of 400–1200 m/z was used with a Q1 transmission window of 380 amu (100%). For more detailed analyses samples of 0.2–1 µg were analyzed on a QStar Pulsar i MS/MS system (Applied Biosystems) fitted with a NanoES source (Protana Inc.) outfitted with a New Objective adapter (ADPT-PRO) to allow direct liquid chromatographic coupling to the source. Samples were loaded using an 1100 series capillary LC system (Agilent Technologies) onto a capillary sample trap column containing a 100-µm × 2.5-cm C18 insert (Upchurch Scientific) at 1 µl/min in 5% (v/v) acetonitrile and 0.1% (v/v) formic acid. Bound peptides were eluted at 500 nl/min into the mass spectrometer after automatically switching a retrofitted 10-port nanobore valve (Valco Inc.) on the QStar Pulsar i by an 1100 series Nano Pump (Agilent Technologies). Peptides were eluted over a 30-min period and analyzed using parameters similar to those outlined above for LC-MS/MS analysis of lysates.
Resulting MS/MS-derived spectra were analyzed against an in-house Arabidopsis database comprising ATH1.pep (release 6) from TAIR and the mitochondrial and plastid protein sets (TAIR). This sequence database contained a total of 30,700 protein sequences (12,656,682 residues). Mascot Generic Format (.MGF) MS/MS peak lists were generated from raw Sciex Analyst format (.WIFF) data files with Mascot Daemon using the "mascot.dll 1.6b21 for Analyst QS 1.1" data import filter available from Matrix Science. The settings used for MS/MS peak list generation were: centroid survey scan ions (TOF MS) at a height percentage of 50% and a merge distance of 0.1 amu (for charge state determination); centroid MS/MS data at a height percentage of 50% and a merge distance of 2 amu; reject a CID if less than 10 peaks or if precursor mass less than 50 Da or greater than 10,000 Da; precursor mass tolerance for grouping, 1 Da; maximum number of cycles between groups, 10; minimum number of cycles per group, 1; default precursor charge states, 2+ and 3+. Searches were conducted using Mascot Search Engine version 2.1.04 (Matrix Science) with the following parameters: mass error tolerances of ±75 ppm for MS and ±0.6 Da for MS/MS; "Max missed cleavages" set to 2; variable modification, oxidation (Met); instrument set to ESI-Q-TOF. Results were filtered using "Standard scoring," "Maximum number of hits" was set to 200, "Significance threshold" was set at p < 0.05. For initial searching, "Ions Score cut-off" was set at 0; however, peptide matches were later filtered to remove peptide matches with expect values above 0.05. To test the expected confidence levels of the analytical approach, false positive peptide identification rates under the matching criteria described above were estimated at 2.7% by searching against a randomized version of the sequence database used for real searches. 
After initial matching, searches were repeated with more extensive protein modification options, and the spectra of putatively modified peptides were interpreted manually to confirm each modification.
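The false positive estimate from a randomized database can be sketched as follows. This is an illustration only: the text does not state how the randomized database was built or how the rate was computed, so per-protein residue shuffling and the ratio of decoy to real matches are assumptions, and the function names are hypothetical.

```python
import random


def randomize_database(proteins, seed=0):
    """Shuffle the residues of each protein, preserving length and composition.

    proteins: dict mapping accession -> amino acid sequence.
    Per-protein shuffling is an assumed randomization scheme.
    """
    rng = random.Random(seed)
    decoys = {}
    for accession, sequence in proteins.items():
        residues = list(sequence)
        rng.shuffle(residues)
        decoys["RANDOM_" + accession] = "".join(residues)
    return decoys


def false_positive_rate(n_decoy_matches, n_target_matches):
    """Estimate the rate as decoy matches divided by matches to the real database.

    Both counts must be obtained under identical search and filtering criteria.
    """
    if n_target_matches <= 0:
        raise ValueError("need at least one match against the real database")
    return n_decoy_matches / n_target_matches


if __name__ == "__main__":
    toy = {"P1": "MAKGLRPEK"}  # toy sequence, not a real r-protein
    print(randomize_database(toy, seed=1))
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(f"{false_positive_rate(27, 1000):.1%}")
```

With the hypothetical counts of 27 decoy matches and 1000 real matches, the estimate is 2.7%, the same magnitude as the rate reported above.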
The Mascot peptide match reports were exported as comma-separated value text files and parsed with a custom PHP script. The script collected and filtered peptide match results, mapped peptide hits onto r-protein genes and r-protein families, and produced customized tab-delimited text and HTML (Hypertext Markup Language) reports quantitatively summarizing the MS/MS evidence for the detection of specific r-protein genes and r-protein families. It also automatically generated annotated MS/MS spectral diagrams in scalable vector graphics (.SVG) format for manual verification of peptide identifications in cases where a protein identification was supported by a single spectrum. The script filtered out peptide matches with expect values above a threshold of 0.05 and selected the highest scoring MS/MS match to each non-redundant peptide sequence (different charge or modification states of the same peptide sequence were regarded as the same peptide sequence). Using the tables of in silico predicted peptides described earlier under "In Silico Digestion Analysis," it then automatically assigned and counted "total" and "gene-specific" peptide hits to r-protein genes and total peptide hits to r-protein families, and it summed the Mascot "Ions Scores" for each of the resulting sets of unique top scoring hits to total and gene-specific peptides assigned to each r-protein and r-protein family. To avoid false positive claims of gene-specific r-protein identification based on ambiguous spectra, MS/MS spectra that gave significant matches to more than one peptide sequence were not allowed to contribute to gene-specific peptide counts and scores.
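The filtering and counting logic of this step can be sketched in Python. The original script was written in PHP and is not reproduced here; the record layout (keys `spectrum`, `peptide`, `score`, `expect`), the function name `summarize_hits`, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_hits(matches, peptide_to_genes, ambiguous_spectra, max_expect=0.05):
    """Count total and gene-specific peptide hits per gene and sum their scores.

    matches: list of dicts with keys 'spectrum', 'peptide', 'score', 'expect'.
    peptide_to_genes: dict peptide -> set of genes predicted to yield it in silico.
    ambiguous_spectra: set of spectrum ids that matched more than one peptide.
    """
    # Step 1: drop weak matches and keep the top scoring match per peptide.
    best = {}
    for match in matches:
        if match["expect"] > max_expect:
            continue
        current = best.get(match["peptide"])
        if current is None or match["score"] > current["score"]:
            best[match["peptide"]] = match

    # Step 2: assign each retained peptide to genes and accumulate evidence.
    summary = defaultdict(lambda: {"total": 0, "total_score": 0.0,
                                   "specific": 0, "specific_score": 0.0})
    for peptide, match in best.items():
        genes = peptide_to_genes.get(peptide, set())
        for gene in genes:
            summary[gene]["total"] += 1
            summary[gene]["total_score"] += match["score"]
        # Simplification: only the top scoring spectrum is checked for ambiguity.
        if len(genes) == 1 and match["spectrum"] not in ambiguous_spectra:
            gene = next(iter(genes))
            summary[gene]["specific"] += 1
            summary[gene]["specific_score"] += match["score"]
    return dict(summary)


if __name__ == "__main__":
    toy_matches = [
        {"spectrum": "s1", "peptide": "AAK", "score": 45.0, "expect": 0.01},
        {"spectrum": "s2", "peptide": "AAK", "score": 50.0, "expect": 0.02},
        {"spectrum": "s3", "peptide": "GGR", "score": 35.0, "expect": 0.20},
        {"spectrum": "s4", "peptide": "LLK", "score": 33.0, "expect": 0.03},
    ]
    toy_map = {"AAK": {"geneA"}, "GGR": {"geneA"}, "LLK": {"geneA", "geneB"}}
    print(summarize_hits(toy_matches, toy_map, ambiguous_spectra=set()))
```

In the toy data, the peptide with an expect value of 0.20 is discarded, the better of the two spectra for `AAK` is kept, and the shared peptide `LLK` adds to the total counts of both genes but to neither gene-specific count.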
Supplemental Material—
Size analyses of theoretical peptides derived from ribosomal proteins following cleavage by different enzymes are provided in supplemental Data 1. Generic format data files (.MGF) generated from the primary mass spectra based on the script explained above are provided as supplemental Data 2; these can be directly reanalyzed on line at Matrix Science by the reader. Detailed information of the mass spectra matching and individual peptide scores and sequence coverage information that form the basis of the identifications in Table I are provided in supplemental Data 3. Detailed information about the mass spectral matching and peptide scores for gene-specific matches to members of protein families among S, L, and P ribosomal proteins are in supplemental Data 4 together with annotated MS/MS spectra and MS/MS peak lists for gene-specific identifications and protein families supported by single peptide identifications and the protein families S29, L36a, S30, L39, and L29. Detailed information of the mass spectra matching and peptide scores for gene-specific matches to non-ribosomal proteins identified is supplied in supplemental Data 5. Detailed information on the spectra interpreted for different classes of post-translational modified peptides is presented in supplemental Data 6; annotated, centroided, and deisotoped spectra supporting these post-translational modification claims are provided in supplemental Data 7.
| RESULTS |
|---|
13,000 unique peptide sequences (Fig. 1). Approximately 9400 (72%) of these peptide sequences were predicted to come from only one of the 409 protein sequences (gene-specific peptides), whereas others could be derived from a number of different genes. Analysis of all these data showed that many low molecular mass peptides are produced by trypsinization of ribosomal proteins due to the high arginine and lysine content of these relatively basic protein sequences.
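The distinction between gene-specific and shared peptides can be illustrated with a small in silico digestion sketch in Python. The cleavage rule used here (after Lys or Arg, but not before Pro, with no missed cleavages) is a common convention assumed for illustration, not a rule stated in the text, and the two sequences are toy examples rather than real r-proteins.

```python
from collections import defaultdict


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def classify_peptides(proteins):
    """Split predicted peptides into gene-specific and shared sets.

    proteins: dict mapping gene identifier -> protein sequence.
    Returns (specific, shared): peptide -> gene, and peptide -> set of genes.
    """
    owners = defaultdict(set)
    for gene, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            owners[peptide].add(gene)
    specific = {p: next(iter(g)) for p, g in owners.items() if len(g) == 1}
    shared = {p: g for p, g in owners.items() if len(g) > 1}
    return specific, shared


if __name__ == "__main__":
    toy = {"geneA": "MAKGLR", "geneB": "MAKGIR"}
    specific, shared = classify_peptides(toy)
    print("gene-specific:", specific)
    print("shared:", shared)
```

For the toy pair, `MAK` is shared by both genes, whereas `GLR` and `GIR` are each predicted from a single gene.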
From this analysis of both theoretical digestion and published data on the ribosome, a strategy was developed to maximize coverage of the ribosomal analysis. First, it would be necessary to obtain a large number of unique MS/MS spectra to provide coverage of the likely 80–249 ribosomal proteins expected in the cytosolic proteome. Given that 72% of tryptic peptides were likely to be gene-specific (Fig. 1), 500–1500 peptide identifications would be required to provide, on average, three to four specific peptides for each protein match depending on the number of specific proteins in the ribosome. Second, it would be important to use a low m/z cutoff for the selection of MS ions for MS/MS analysis; an m/z range of 200–900 was chosen on the basis of the abundance of peptides below 800 amu (which would be doubly charged MS ions below 400 m/z from electrospray ionization) especially in the families of proteins so far not identified in Arabidopsis samples. Third, it would be valuable to use a simple protein display technology such as 1D SDS-PAGE to ensure that proteins were not lost for analysis due to being too basic or not dissolving in IEF buffers for 2D gel analysis or being lightly stained and not analyzed due to lack of visible spots in 2D gel analysis of ribosomes. This was especially important because these technical reasons had been used to explain the lack of coverage of certain families in previous reports on the ribosome composition (19, 20). Fourth, given the similarity of peptide sequences within and between ribosomal protein families it would be necessary to undertake an extensive analysis of the gene specificity of any given peptide sequence to ensure that only real gene-specific matches were used as evidence of specific proteins.
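The relationship between peptide mass and the m/z of a multiply protonated ion, which underlies the choice of a low m/z cutoff, can be checked with a short Python sketch. The function name is hypothetical and the proton mass of 1.007276 Da is a standard value supplied here, not taken from the text.

```python
PROTON_MASS = 1.007276  # Da, standard value (assumption, not from the text)


def mz_of_protonated_ion(neutral_mass, charge):
    """Return m/z for a peptide of the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    for mass in (400.0, 600.0, 800.0):
        print(mass, round(mz_of_protonated_ion(mass, 2), 3))
```

An 800-Da peptide carrying two protons appears at about m/z 401, so peptides lighter than 800 Da give doubly charged ions at roughly m/z 400 or below, as noted above; a survey scan that starts at m/z 400 would miss most of them.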
Isolation, Display, Digestion, and MS Analysis of Ribosomal Proteins—
The protein complement of ribosomes isolated from Arabidopsis cell culture was displayed by SDS-polyacrylamide gel electrophoresis. The gel separation of the successive steps in the purification of ribosomal proteins provides a view of the decreasing complexity of the protein sample and the refinement of the major bands of r-proteins. In Fig. 2, far right lane, the discrete banding pattern of the ribosomal samples is evident.
Peptide hit information mapped onto genetic loci was then processed by the PHP script to automatically count the number of unique peptide sequences detected per gene in the ribosomal gene set, select the highest scoring MS/MS spectrum for each unique detected peptide sequence, and add together the peptide scores for all the best unique peptide hits for each r-protein family (compiled in Table I). The same approach was used to calculate the number of gene-specific peptides found and the sums of the peptide scores for non-redundant gene-specific peptides found for each gene (see supplemental Data 4). Systematic analysis of the sequences of in silico predicted gene-specific peptides revealed that some of these peptides differed from each other only through isobaric Leu/Ile or Gln/Lys substitutions that would not be resolvable in our MS/MS analysis and would result in these spectra giving rise to search engine matches against more than one gene-specific peptide sequence. This finding highlighted the importance of using only unambiguously matched MS/MS spectra to support gene-specific protein identification. Hence spectra that matched significantly to more than one peptide sequence were not allowed to contribute gene-specific peptide identification counts or scores. Fig. 5 provides a flow diagram overview of the whole data analysis procedure.
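The Leu/Ile and Gln/Lys ambiguity can be made concrete with a short Python sketch that collapses peptide sequences into classes that MS/MS cannot separate under the conditions described. The function names and toy peptides are hypothetical; treating Gln and Lys as equivalent reflects the statement above that they were not resolvable in this analysis, although their masses differ by about 0.036 Da.

```python
from collections import defaultdict


def isobaric_key(peptide):
    """Map a peptide to a canonical form with I->L and K->Q substitutions."""
    return peptide.replace("I", "L").replace("K", "Q")


def unresolvable_groups(gene_specific):
    """Find predicted gene-specific peptides that collide after canonicalization.

    gene_specific: dict mapping peptide sequence -> gene identifier.
    Returns a dict: canonical key -> {peptide: gene} for keys hit by >1 gene.
    """
    groups = defaultdict(dict)
    for peptide, gene in gene_specific.items():
        groups[isobaric_key(peptide)][peptide] = gene
    return {key: members for key, members in groups.items()
            if len(set(members.values())) > 1}


if __name__ == "__main__":
    toy = {"GLR": "geneA", "GIR": "geneB", "AAQTK": "geneC"}
    print(unresolvable_groups(toy))
```

In the toy input, `GLR` and `GIR` fall into the same class even though each is predicted from a different gene, so a spectrum matching either one could not support a gene-specific identification.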
The newly identified proteins from our analysis were from the 40 S S29 and S30 families and the 60 S L29, L36a, and L39 families. These small proteins were identified through a combination of spectral matches from trypsin, chymotrypsin, and pepsin digestions of low molecular mass protein bands (Table II). Inspection of the matched peptides showed that 64% of the matches were to MS ions smaller than 400 m/z, justifying our strategy to include low m/z MS2+ and MS3+ ions in our analysis. All the presented spectra for these newly identified proteins were family-specific, but none were gene-specific. In the cases of the three members of S29, two members from L36a, and three members from S30, there are no specific peptides derived from theoretical digests. Among the three members of L39, a very small number of gene-specific peptides can theoretically be produced, but none were found in this analysis.
850 peptide identifications mainly from MS ions in MALDI-TOF with only 172 claims based on MS/MS spectra of which only 66 were gene-specific MS/MS matches. The Giavalisco et al. (20) study was entirely based on an unknown number of MS ions derived by MALDI-TOF. Claiming gene-specific identifications within conserved protein families relies on the high confidence matching of data derived from often only a small number of individual peptides. In this context, MS/MS spectra are much more powerful discrimination tools than the matching of MS ion masses from MALDI-TOF spectra. We used a high confidence cutoff for claiming gene-specific identification. For genes with only a single detected gene-specific peptide, the peptide had to have a peptide score equal to or greater than 40 and an expect value below 0.05, and the MS/MS spectrum had to be manually verified. The annotated MS/MS spectra for these cases are supplied in supplemental Data 4. For genes with more than one detected gene-specific peptide, all peptides were required to have an expect value below 0.05 and an average peptide score of at least 30. Our analysis of gene-specific peptides in silico revealed that within 10 gene families no theoretical gene-specific peptides exist: S18, S29, S30, L11, L21, L23, L36a, L38, L40, and L41. So for these proteins, no amount of spectral data will allow verification of heterogeneity of the protein population. For a further eight families the level of sequence identity is so high that few sequence-specific peptides exist, and hence it is rather unlikely they could be found (S4, S13, S17, S26, S27, L15, L35a, L37, and L29). But for the remaining 63 families enough theoretical gene-specific peptides exist to suggest that a high coverage MS/MS dataset could be able to define with clarity whether gene-specific data exist to justify the specific presence of gene products from different chromosomal loci.
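The confidence criteria for gene-specific identification stated above can be written as a small decision function. This Python sketch is an illustration; the function name and the input layout, a list of (peptide score, expect value) pairs for the gene-specific peptides of one gene, are assumptions.

```python
def gene_specific_call(peptides, manually_verified=False):
    """Apply the stated cutoffs to the gene-specific peptides found for one gene.

    peptides: list of (ions_score, expect_value) tuples, one per unique peptide.
    manually_verified: whether the spectrum was inspected (single-peptide case).
    """
    if not peptides:
        return False
    # Every supporting peptide must have an expect value below 0.05.
    if any(expect >= 0.05 for _, expect in peptides):
        return False
    if len(peptides) == 1:
        score, _ = peptides[0]
        # Single peptide: score of at least 40 plus manual verification.
        return score >= 40 and manually_verified
    # Several peptides: average score of at least 30.
    mean_score = sum(score for score, _ in peptides) / len(peptides)
    return mean_score >= 30


if __name__ == "__main__":
    print(gene_specific_call([(42, 0.01)], manually_verified=True))
    print(gene_specific_call([(42, 0.01)]))
    print(gene_specific_call([(35, 0.01), (28, 0.02)]))
```

A single peptide scoring 42 is accepted only once its spectrum has been verified manually, whereas two peptides scoring 35 and 28 pass because their average is 31.5.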
Several errors in the assignments made by Chang et al. (19) were apparent from our further analysis. In the family P3, Chang et al. (19) claimed specific identification of At4g25890, but the peptides they claimed as matches were actually not from this gene product but from the other member of the family, At5g57290. For this protein we also made the specific identification of a locus-specific peptide. Chang et al. (19) claimed a specific gene match to two L19 proteins from a family of four members, At1g02780 and At4g02230. However, on closer inspection the peptide claimed for At1g02780 (GPGGDVAPVAAPAPAATPAATPAPTAAVPK) is not present in the sequence of this protein. Our data strongly point to gene-specific data only for At4g02230. In the two-member L15 family, Chang et al. (19) claimed gene-specific identification of At4g16720 based on a single peptide, but on closer inspection this peptide is indistinguishable from the peptide from the other member of this family as it is a Leu/Ile difference that cannot be determined by MS. The two members of L15 are indistinguishable in our analysis; there is only one theoretical peptide that could distinguish this pair, but it remained undetected by MS even though we had MS/MS data on 12 peptides to this family. A similar story is evident in the S5 family of two members. Chang et al. (19) claimed a specific peptide to At2g37270, but it is again a Leu/Ile difference to At3g11940 and therefore indistinguishable by MS. We had six MS/MS spectra to this family but were unable to provide a gene-specific peptide; only three exist based on theoretical digests. In S25, Chang et al. (19) claimed specific peptides for At2g21580, but these peptides are no longer gene-specific because of the discovery of a new member of S25, At4g34555. S25 proteins are only 12 kDa (encoded in a single exon), and this gene was not annotated in the earlier versions of the Arabidopsis genome. The claims of Giavalisco et al. (20) on gene specificity were not based on gene-specific peptides per se but on selecting the top match based on the Molecular Weight Search score from a search of peptide mass fingerprinting data; thus we could not with confidence compare our data with the claims made in this study.
Small Subunit Protein Families—
We found good gene-specific matches for all three S3 members and also for both members in the two-member families S3a, S6, and S24 (supplemental Data 4). In each of these cases substantial EST evidence suggests that all members of these families are expressed and hence that there is a heterogeneous population of these isoforms in ribosomes. However, there are also families in the 40 S subunit where the evidence is much more in favor of single gene products or two gene products (from large families) dominating the expressed protein profile, for example in Sa, S28, S9, and S8 where the member-specific data match closely to the EST evidence for gene expression. There are still families where we are unable to provide substantial evidence for which specific protein products are found in the ribosome (e.g. S5, S15, S20, and S25).
Large Subunit Protein Families—
We found good gene-specific matches for three of four members for the four-member families L7, L13a, and L35 and for both members in the two-member families L4, L5, L7a, L14, L17, L23a, L26, L28, and L32 (supplemental Data 4). In all of these cases substantial EST evidence suggests that all members of these families are expressed, suggesting a heterogeneous population of these isoforms from the 60 S ribosome subunit. However, there are also families in the 60 S subunit where the evidence is much more in favor of single gene products or two gene products (from large families) dominating the expressed protein profile, for example in L3, L10a, L13, L24, and L37a where the member-specific data match closely to the EST evidence for gene expression. There are still families where we are unable to provide substantial evidence for which of the protein products are found in the ribosome (e.g. L12, L15, L19, L27a, L29, and L30).
Stalk P-protein Families—
In the four P r-protein families of acidic stalk proteins specific peptides were found for two of the four P0 and three of the five P2 protein isoforms (supplemental Data 4). A series of P1 family peptides were found, one each specific for the first three members of this family, but only one of these met our tight specifications of gene specificity. Only one gene-specific peptide was found for P3, and this had a high Mascot Ions Score and was specific for At5g57290, which is consistent with much higher EST contribution from this gene family member.
The S15a Family Cytosolic and Mitochondrial Members—
The S15a family was highlighted in a high profile analysis of ribosomal protein evolution as an example of a cytosolic gene family that evolved to direct specific members to be part of the mitochondrial ribosome and retain others in the cytosolic ribosome (25). Sequence comparisons showed two types of S15a sequences, type I (At1g07770, At2g35590, At3g46040, and At5g59850) and type II (At2g19720 and At4g29430). The type II proteins were shown to be translocated to the mitochondria, whereas the type I proteins remained in the cytosol (25). Chang et al. (19) claimed peptide-specific evidence for both type I and type II proteins in the cytosolic ribosome. In our analysis we were unable to find any evidence for peptides even redundantly matching the type II proteins (At2g19720 and At4g29430) in our cytosolic ribosome samples; all 11 peptides found were specific for type I protein members (Tables I and III). In parallel, we partially enriched ribosomes from mitochondrial samples and analyzed the analogous band on 1D gels that matched band 21 where S15a peptides were found. Here we found strong evidence of type II sequences, one peptide that redundantly matched to At2g19720 and At4g29430 and two peptides specific for At4g29430 (Table III), and no significant evidence of cytosolic type I sequences. Thus we propose that the presence of type II peptides in the study of Chang et al. (19) is likely due to mitochondrial ribosome contamination of their samples rather than the real presence of type II proteins in the cytosolic ribosome.
Covalent Modifications of Arabidopsis Cytosolic Ribosomal Proteins—
Covalent protein modifications such as acetylation, methylation, and phosphorylation have emerged as potentially being important factors contributing to ribosomal heterogeneity in both eukaryotes (4, 19, 26, 27) and prokaryotes (28, 29). In the ribosomes of higher plants, studying the role of covalent r-protein modification in ribosomal heterogeneity is complicated by the frequent expression of relatively high (compared with other classes of organism) numbers of different, yet often highly conserved isoforms of each ribosomal subunit. This situation adds another layer of complexity to the MS-based analysis of covalent r-protein modification in higher plants because very often the mass difference between two closely related peptides from two closely related r-protein isoforms is exactly or almost equal to the mass difference expected from a covalent modification. Hence in many cases, parent ion mass, as obtained by simple MS, cannot distinguish the modified peptide of one family member from a closely related, unmodified peptide derived from another family member. As we demonstrate here, this problem can be largely overcome through the use of LC-ESI-Q-TOF-MS/MS, which allows isobaric peptides of very similar structure to be distinguished on the basis of their MS/MS fragmentation patterns.
To identify sites of covalent modification on ribosomal proteins, candidate MS/MS spectra potentially corresponding to covalently modified peptides were identified in the complete MS/MS dataset through a series of Mascot searches considering various combinations of acetylation; formylation; mono-, di-, and trimethylation; and phosphorylation as variable modifications. MS/MS spectra for which the top Mascot match (across all the Mascot searches) was a covalently modified peptide with a peptide score greater than 30 were manually inspected to verify the identity of the peptide and determine the exact position(s) of the modified residue(s). This manual inspection process allowed near isobaric modifications such as trimethylation/acetylation and dimethylation/formylation (which each differ by 0.0364 amu) to be distinguished on the basis that for low m/z fragment ions carrying the modification the mass accuracy of the TOF analyzer was sufficient to resolve the two near isobaric modifications. This approach revealed strong MS/MS evidence for 30 unique covalently modified peptides corresponding to a total of 41 covalent modification events. Detected modification types included removal of N-terminal methionine (15 cases), N-terminal acetylation (12 cases), N-terminal dimethylation (one case), N-methylation of lysine side chains (three cases), and phosphorylation (nine cases). Of the 30 covalently modified peptides detected, 13 (43%) could be assigned to a specific genetic locus, whereas the remaining 57% could only be redundantly assigned to two to four members of a particular r-protein family due to sequence conservation between family members. Overall the results of this analysis (Table IV and supplemental Data 6) suggest extensive covalent modification of the ribosome with at least 23 (29%) of the 80 ribosomal protein families exhibiting at least one covalent modification.
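The near-isobaric pairs described above can be checked with a short calculation. The following is a minimal sketch, not part of the original analysis: it assumes standard monoisotopic atomic masses, the helper names are hypothetical, and the fragment masses of 200 and 1500 are arbitrary illustrative values chosen to show why a fixed 0.0364-amu gap is easier to resolve on low m/z fragment ions.

```python
# Monoisotopic atomic masses (u); standard values, not taken from the paper.
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# Net elemental gain of each modification relative to the unmodified amine.
MODS = {
    "acetyl": {"C": 2, "H": 2, "O": 1},
    "trimethyl": {"C": 3, "H": 6},
    "formyl": {"C": 1, "O": 1},
    "dimethyl": {"C": 2, "H": 4},
}


def delta_mass(name):
    """Mass added by a modification, summed from its net elemental gain."""
    return sum(MASS[element] * count for element, count in MODS[name].items())


def pair_gap(mod_a, mod_b):
    """Absolute mass difference between two near-isobaric modifications."""
    return abs(delta_mass(mod_a) - delta_mass(mod_b))


def gap_ppm(gap_da, fragment_mass):
    """Express a mass gap in ppm of the fragment ion carrying the modification."""
    return gap_da / fragment_mass * 1e6


if __name__ == "__main__":
    for pair in (("trimethyl", "acetyl"), ("dimethyl", "formyl")):
        print(pair, round(pair_gap(*pair), 4), "u")
    gap = pair_gap("trimethyl", "acetyl")
    # Hypothetical fragment masses: the same gap is a much larger relative
    # difference on a small fragment than on a large one.
    for fragment_mass in (200.0, 1500.0):
        print(fragment_mass, round(gap_ppm(gap, fragment_mass), 1), "ppm")
```

Because the gap is constant in absolute terms, its relative size shrinks as the fragment gets heavier, which is why the manual inspection relied on low m/z fragment ions carrying the modification.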
ε-amino group of the Lys-3 residue. First, in the raw spectrum, signals corresponding to singly charged, unmodified y1–y6 and y10 ions were clearly present, suggesting that the modification(s) adding 70 Da to the parent ion mass was located within the first five residues. This conclusion was supported by strong doubly charged b3–b5, b8, and b11–b14 series ions that were all 35 m/z (70 Da) higher than expected for the unmodified peptide. This confirmed that the modifications were localized to the first three residues, Pro-Pro-Lys-. Although a number of combinations of modifications that could occur at these residues could explain a nominal mass increase of 70 Da (e.g. formyl + acetyl, acetyl + dimethyl, formyl + trimethyl, or dimethyl + trimethyl), these modifications could be divided into three different accurate mass groups resolvable on the TOF mass analyzer: +C3O2H2 (+70.0055 Da), +C4OH6 (+70.0419 Da), and +C5H10 (+70.0783 Da). Calculation of the observed accurate mass shifts of the putatively modified doubly charged b3, b4, and b5 ions relative to the theoretical unmodified masses revealed that the observed signals for these ions in the raw spectrum corresponded to respective mass shifts of 70.0688, 70.0692, and 70.0782 Da. That is, the observed masses of the modified b3, b4, and b5 ions were, respectively, 44, 36, and 58 ppm closer to the masses expected for a +C5H10 modification (e.g. N-terminal dimethylation + lysine N-trimethylation) than to the next closest near-isobaric modification, +C4OH6 (e.g. N-terminal acetylation + lysine N-dimethylation). This observation strongly suggested that the modifications responsible for the 70-Da mass increase within the first three N-terminal residues of L12 were caused entirely by the addition of alkyl groups, most probably five methyl groups. Given that the proline N terminus is theoretically capable of accommodating up to two methyl groups and the ε-amino group of lysine is capable of accommodating up to three, we propose that the most likely explanation for the observed data is that the N-terminal region of the Arabidopsis cytosolic r-protein L12 is N,N-dimethylated at the N-terminal proline and N,N,N-trimethylated at the ε-amino group of the Lys-3 residue (see annotated spectra in Fig. 6B).
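The assignment of the observed mass shifts to one of the three accurate mass groups can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: it uses standard monoisotopic atomic masses and the three observed shifts quoted above, the helper names are hypothetical, and the simple nearest-group rule stands in for the manual interpretation described in the text.

```python
# Monoisotopic atomic masses (u); standard values, not taken from the paper.
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

# The three nominal +70 Da groups named in the text, as elemental gains.
GROUPS = {
    "C3O2H2": {"C": 3, "O": 2, "H": 2},
    "C4OH6": {"C": 4, "O": 1, "H": 6},
    "C5H10": {"C": 5, "H": 10},
}

# Observed mass shifts (Da) of the b3, b4, and b5 ions, as quoted in the text.
OBSERVED = {"b3": 70.0688, "b4": 70.0692, "b5": 70.0782}


def group_mass(name):
    """Accurate mass of one +70 Da group from its elemental composition."""
    return sum(MASS[element] * count for element, count in GROUPS[name].items())


def nearest_group(shift):
    """Name of the mass group whose accurate mass is closest to the shift."""
    return min(GROUPS, key=lambda name: abs(group_mass(name) - shift))


if __name__ == "__main__":
    for name in GROUPS:
        print(name, round(group_mass(name), 4))
    for ion, shift in OBSERVED.items():
        best = nearest_group(shift)
        error = abs(group_mass(best) - shift)
        print(ion, shift, "->", best, "(off by", round(error, 4), "Da)")
```

Under these assumptions all three observed shifts fall closest to the +C5H10 group, in line with the pentamethylation interpretation given above.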
N-Methylation of Lysine—
Mascot searches considering mono-, di-, and trimethylation of lysine as variable modifications matched 27 MS/MS spectra to a total of 20 unique lysine-methylated r-protein-derived peptides with a parent ion delta mass <0.1 Da and peptide score >30. However, on closer inspection it was found that the spectra matched to nine of the 20 peptides actually matched more strongly with unmodified, isobaric peptides of similar sequence from closely related family members. Manual interpretation of the highest scoring spectra for the 11 remaining peptides revealed strong y series evidence for lysine methylation in three peptides (Table IV). Two of these manually confirmed peptide matches, MGLENMDVESLKMe3K (Mr = 1566.76) and MGLSNMDVEALKMe3K (Mr = 1508.71) where KMe3 represents trimethylated lysine, matched specifically to two members of the L10a family (At1g08360/RPL10aA and At2g27530/RPL10aB, respectively) providing strong evidence for site-specific N-trimethylation of Lys-90, which resides in a conserved region of At1g08360 (RPL10aA) and At2g27530 (RPL10aB). The third manually confirmed hit to a lysine-methylated peptide was to QSGYGGQTKMePVFHK (Mr = 1546.79), which matches redundantly to the two completely homologous isoforms of the L36a r-protein family, At3g23390 and At4g14320. The annotated spectrum for this peptide is shown in Fig. 6C.
Phosphorylation—
Previous analyses of the eukaryotic ribosome from a variety of species have characterized a range of phosphoproteins from this complex. To obtain a more complete picture of the extent of protein phosphorylation of the Arabidopsis cytosolic ribosome, the in-gel phosphoprotein stain Pro-Q Diamond (Invitrogen) was used to obtain an in-gel phosphorylation profile (Fig. 4B). Significant staining was observed with the phosphostain in several regions of the 1D PAGE and cross-referenced with the Coomassie-stained version of the gel. Ribosomal proteins identified through mass spectrometry in these regions were highlighted as putative phosphorylation targets (Fig. 4B). Data from samples previously analyzed from these regions were reinterrogated with Mascot (Matrix Science) using the phosphorylation modification feature. Significant matches were only found for two peptides using this process, and both were derived from the two isoforms of RPS6 (Table IV). In an attempt to identify further phosphoproteins from the samples and further validate the phosphostaining, a phosphopeptide enrichment procedure using TiO2 was used on trypsin-digested ribosome lysates. A total of 10 µg of total ribosome protein was used for phosphoprotein enrichment with the enriched fraction analyzed by nano-LC-MS/MS. Resultant data were queried using Mascot (Matrix Science) with the phosphorylation modification enabled. Several significant matches were obtained to phosphopeptides derived from ribosomal proteins (Table IV), and all except the S6 peptides were shown to be highly enriched by the TiO2 procedure (data not shown). These included peptides derived from the entire family of acidic ribosomal proteins, P0, P1, P2, and P3, and a specific isoform of RPL13. The MS/MS spectra for all phosphopeptides were manually interrogated to ensure high levels of confidence for the match (supplemental Data 6), and the annotated MS/MS diagrams and MS/MS peak lists are provided in supplemental Data 7. 
The P3 spectrum is shown as an example in Fig. 6D.
| DISCUSSION |
|---|
Conservation of Covalent Modifications between Eukaryotic Ribosomes—
Comparisons of the covalent r-protein modifications identified by this study in Arabidopsis with previous reports from other organisms indicate a considerable degree of conservation across eukaryotes in terms of the types of modifications present and the orthology of the modified subunits (Table V). For example, in S. cerevisiae, mass spectrometry of intact proteins of the large ribosomal subunit has suggested methylation of the yeast r-protein families L1 (L10a in Arabidopsis), L3, L12, L23, L42 (L36a in Arabidopsis), and L43 (L37a in Arabidopsis), although the sites of methylation could not be determined by this approach (4). We identified methylated residues in members of three orthologous Arabidopsis r-protein families, L10a, L12, and L36a, with these accounting for all of our lysine-methylated peptide matches. Although we did not find evidence for methylation of Arabidopsis orthologues of yeast L3, L23, or L43 (Arabidopsis L3, L23, and L37a, respectively) it is possible that these r-proteins are also methylated and that the peptides carrying these modifications escaped detection in our study. Indeed the fact that we did not detect any cases of arginine methylation, as has been reported for yeast (8, 40) and mammalian (41) ribosomes, is not surprising due to the frequent occurrence of this modification within arginine- and glycine-rich protein regions (42) that, upon trypsinization, would be expected to have given rise to very small peptides giving low m/z peptide ions below the minimum of our parent ion scanning range. Patterns of initiator methionine removal, N-terminal acetylation, and phosphorylation also appear to be largely conserved between plants, animals, and fungi (Table V). Overall the high degree of conservation of covalent modifications between diverse eukaryotes suggests that many of these modifications are fundamentally important for ribosomal function.
ε-amino group of lysine 3. Interestingly L12 is the only Arabidopsis cytosolic ribosomal protein predicted to have the Pro-Pro-Lys- N-terminal sequence after methionine removal, and this sequence is conserved in almost every eukaryotic L12 protein sequence available. At the time of this publication, we could find in the literature relatively few reported examples of proteins modified by N-terminal proline methylation: the histone H2B from the starfish Asterias rubens that was confirmed by NMR to have an N,N-dimethylproline structure at its N terminus (43), the orthologous histone H2B of Drosophila (44), the cytochrome c557 of the protozoan Crithidia oncopelti (45, 46), and the cytosolic ribosomal protein S25 of S. cerevisiae (47). Remarkably all of these proteins were shown to have the same Pro-Pro-Lys- N-terminal structure as we have shown here for ribosomal protein L12 in Arabidopsis. This finding provides further support for previous suggestions that the N-terminal Pro-Pro-Lys motif may be recognized by an as yet unknown N-terminal N-methyltransferase enzyme that is widely distributed in nature (48) and, to the best of our knowledge, provides the first evidence that such an enzyme occurs in plants.
Evidence has very recently emerged that the N-terminal region of the S. cerevisiae L12 orthologue is also pentamethylated with the detection, by LC-MS/MS, of a pentamethylated form of the N-terminal peptide, PPKFDPNEVKYLYLR, in a proteolytic digest of HPLC-purified rpL12ab (37). The authors of that study interpreted their MS/MS spectrum of this peptide to suggest the presence of a mixture consisting of two isomeric forms: one with dimethylation at Lys-3 and trimethylation at Lys-10 and another with trimethylation at Lys-3 and dimethylation at Lys-10. Given the similarity of our findings with those of Porras-Yakushi et al. (37) and the presence of the putative motif for N-terminal proline dimethylation, Pro-Pro-Lys-, in the yeast rpL12 protein, we decided to examine the presented yeast MS/MS spectrum for evidence of combined N-terminal proline dimethylation and Lys-3 trimethylation. Indeed we found strong evidence to suggest that a significant proportion of the yeast peptide is modified in the same way as the equivalent peptide from Arabidopsis (see, in particular, the high intensity doubly charged b5 fragment at m/z 328.19 exhibiting the proline effect enhanced by the adjacent Asp residue, the doubly charged b3 ion at m/z 197.15, the singly charged y7 ion at m/z 954.53, and the singly charged y10 ion at m/z 1294.63 also exhibiting the Asp/Pro effect-induced enhancement of intensity). Although Porras-Yakushi et al. (37) identified MS/MS signals consistent with their proposed peptide structures (which include di- or trimethylation at Lys-10), the relative intensities of the above signals in the presented MS/MS spectrum suggest that at least a significant proportion of the peptide population responsible for the spectrum consists of a form that is not methylated at Lys-10 but carries the five methyl groups within the first three residues. Porras-Yakushi et al. (37) observed that deletion of the S. cerevisiae gene ydr198c, encoding a putative suppressor of variegation, Enhancer-of-zeste, Trithorax domain-containing lysine N-methyltransferase, led to the loss of three methyl groups and the detection of only two methyl groups on the N-terminal peptide of rpL12ab. Interestingly examination of the presented MS/MS spectrum for this dimethylated peptide derived from the ydr198c mutant strain revealed the disappearance of the putatively pentamethylated b5 ion (present at m/z 328.19 in the wild-type spectrum) and the appearance of a peak with similar relative intensity at m/z 307.15. The m/z difference between these putative doubly charged b5 ion signals (21.034 amu) is consistent with a loss of 42.07 Da (equivalent to three methyl groups) from the first five residues of the peptide. Similarly the putative pentamethylated, doubly charged b3 ion (observed at m/z 197.15 in the wild-type spectrum) is apparently missing in the ydr198c spectrum, whereas the relative intensity of a putative dimethylated, singly charged b3 ion is greatly increased at m/z 351.23. In addition to the 42.06-Da mass shift, the shift in charge state from z = 2 to z = 1, observed for the b3 signal, is consistent with the loss of constitutive positive charge associated with the quaternary ammonium group of the trimethyllysine residue. Together these observations strongly suggest that deletion of the ydr198c gene from S. cerevisiae has no effect on the dimethylation of the N-terminal proline residue but leads to a loss of trimethylation from Lys-3 in rpL12ab. It should be noted that our interpretations of MS/MS data presented by Porras-Yakushi et al. (37) are not mutually exclusive with those of the authors and that the existence of multiple isomeric methylation states for rpL12ab in S. cerevisiae is an interesting possibility worthy of further investigation. We could find no convincing evidence for such a phenomenon in Arabidopsis rpL12 or any other Arabidopsis ribosomal protein for which we detected lysine-methylated peptides while manually interpreting the MS/MS spectra supporting our claims of covalent protein modification. The A. thaliana genome contains at least 29 actively transcribed genes encoding suppressor of variegation, Enhancer-of-zeste, Trithorax domain-containing proteins (49). However, a role for any of these proteins in the methylation of Arabidopsis ribosomal proteins has yet to be established.
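The fragment assignments discussed above can be cross-checked against theoretical m/z values for the yeast peptide PPKFDPNEVKYLYLR. The sketch below is an illustration rather than the procedure used in either study. It assumes standard monoisotopic residue masses, treats each methyl group as a net gain of CH2, and counts the fixed positive charge of a trimethyllysine as one of the ion's charges, which gives the same m/z as the usual protonated-ion formula. The function names are hypothetical.

```python
# Monoisotopic residue masses (u) for the residues in the peptide;
# standard values, not taken from the paper.
RESIDUE = {
    "P": 97.05276, "K": 128.09496, "F": 147.06841, "D": 115.02694,
    "N": 114.04293, "E": 129.04259, "V": 99.06841, "Y": 163.06333,
    "L": 113.08406, "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565
CH2 = 14.01565  # net gain per methyl group

PEPTIDE = "PPKFDPNEVKYLYLR"  # N-terminal peptide of yeast rpL12ab (from the text)


def b_ion(seq, n, charge, methyls=0):
    """m/z of the b_n ion (first n residues) carrying `methyls` methyl groups."""
    mass = sum(RESIDUE[aa] for aa in seq[:n]) + methyls * CH2
    return (mass + charge * PROTON) / charge


def y_ion(seq, n, charge, methyls=0):
    """m/z of the y_n ion (last n residues) carrying `methyls` methyl groups."""
    mass = sum(RESIDUE[aa] for aa in seq[-n:]) + WATER + methyls * CH2
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    # Wild type, assuming five methyl groups on Pro-Pro-Lys and none on Lys-10.
    print("b3 2+ (text: 197.15)  ", round(b_ion(PEPTIDE, 3, 2, methyls=5), 2))
    print("b5 2+ (text: 328.19)  ", round(b_ion(PEPTIDE, 5, 2, methyls=5), 2))
    print("y7 1+ (text: 954.53)  ", round(y_ion(PEPTIDE, 7, 1), 2))
    print("y10 1+ (text: 1294.63)", round(y_ion(PEPTIDE, 10, 1), 2))
    # Mutant, assuming only the two N-terminal proline methyl groups remain.
    print("b3 1+ (text: 351.23)  ", round(b_ion(PEPTIDE, 3, 1, methyls=2), 2))
```

Under these assumptions the theoretical values agree with the m/z values quoted above to within about 0.1, which is consistent with the reading that the five methyl groups sit within the first three residues and that Lys-10 is unmethylated in this form.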
Phosphoproteins of the Arabidopsis Cytosolic Ribosome—
The cytosolic ribosome represents a well characterized system for the study of protein phosphorylation. The most widely studied components, RPS6 and the acidic P-proteins (P0–3), actively modulate ribosomal function through their phosphorylation and are involved in protein synthesis (50, 51).
The RPS6 protein is a part of the small head region of the 40 S subunit, which is a central location for interactions with mRNA, tRNA, and factors that drive initiation of translation (52). The C terminus of RPS6 is sequentially phosphorylated on five serine residues in response to a variety of stimuli and developmental cues (53, 54). This C-terminal region is highly conserved across eukaryotic lineages, and in plants its multiphosphorylation sites have been well characterized in maize and Arabidopsis (26, 27). Initial characterization of this region in mammals has demonstrated an ambiguity in phosphorylation site determination (54). Previous analysis of this region in Arabidopsis has only utilized MALDI-TOF MS analysis, limiting the determination of the specific site(s) of phosphorylation (19). Single phosphorylation sites for the two Arabidopsis isoforms of RPS6 were confirmed here for the first time using MS/MS interrogation, at Ser-240 for both RPS6A and RPS6B.
The acidic P-proteins form a multimeric complex that comprises the lateral stalk of the 60 S ribosomal subunit (55). This complex is highly conserved across eukaryotes and is directly involved in translational regulation through interactions with associated factors. In addition to P-proteins P0, P1, and P2, plants possess an additional P1/P2 type protein termed P3 (56). The P-proteins were initially identified as phosphoproteins in the yeast ribosome (57), and their phosphorylation states likely play a direct role in protein synthesis and translational responses to external stimuli (26, 58). Phosphorylation sites in the yeast P-proteins occur in the C-terminal region of the proteins (58), a feature that has now been confirmed for all members of the Arabidopsis family of ribosomal P-proteins. Although recent technical reports using phosphoproteomics technologies have reported the presence of several of these phosphopeptides from Arabidopsis (RPP1A and RPP2B (59) and RPP0B (60)), this study has demonstrated that all families of acidic P-proteins in Arabidopsis (P0, P1, P2, and P3) contain phosphorylated C-terminal regions.
Conclusions—
The complexity of the plant cytosolic ribosome in terms of its size, number of protein constituents, and the evolutionary history, divergence, and multiplication of the genes encoding its subunits makes it a major challenge for biochemical analysis. Within this complexity are undoubtedly key elements in understanding ribosome structure-function relationships and the regulation of translation in plants that remain undiscovered. Analysis of peptides derived from proteolysis of ribosomes provides both protein composition and post-translational modification data. Although the biological roles of most of these covalent modifications are presently unknown, the specific knowledge generated by this proteomics characterization study will enable the immediate design of targeted genetics experiments to explore the roles of these covalent modifications and the mechanisms through which they occur.
| FOOTNOTES |
|---|
Published, MCP Papers in Press, October 13, 2007, DOI 10.1074/mcp.M700052-MCP200
1 The abbreviations used are: r-protein, ribosomal protein; 2D, two-dimensional; 1D, one-dimensional; AGI, Arabidopsis Gene Index; EST, expressed sequence tag; RP, ribosomal protein; TAIR, The Arabidopsis Information Resource; ddH2O, double deionized water (MilliQ-purified); MGF, Mascot Generic Format; PHP, hypertext preprocessor.
* This work was supported in part by grants from the Australian Research Council (ARC) through the Centres of Excellence Program and the Discovery Program. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
Supported by a Ph.D. scholarship from the Grains Research and Development Corp.
An ARC Australian postdoctoral fellow.
¶ Supported by an Australian postgraduate award scholarship.
|| An ARC Australian professorial fellow. To whom correspondence should be addressed. Tel.: 61-8-6488-7245; Fax: 61-8-6488-4401; E-mail: hmillar{at}cyllene.uwa.edu.au