From the Department of Plant Biology and the Computational Biology Service Unit, Cornell Theory Center, Cornell University, Ithaca, New York 14853
| ABSTRACT |
|---|
In recent years, proteomic studies together with many detailed "one-protein-at-a-time" studies have collectively begun to provide good insight into the chloroplast proteome. The thylakoid and envelope proteomes of chloroplasts from Arabidopsis thaliana have been analyzed in a number of studies, which were reviewed recently (6–9). No in-depth analysis of the highly purified stromal proteome of A. thaliana has been carried out to date, but one is urgently needed to complete the overview of the chloroplast proteome from A. thaliana rosette leaves. Manual data mining of the A. thaliana chloroplast literature, such as is done at The Arabidopsis Information Resource (TAIR)1 (www.arabidopsis.org/) and for the Plastid Proteome Database (PPDB; ppdb.tc.cornell.edu/) (10), will further help to provide an accurate overview of the chloroplast proteome.
Several chloroplast-localized biosynthetic pathways are linked to each other with intermediates from one pathway being used in other pathways. In some cases, biosynthetic pathways branch into two different pathways such as heme and chlorophyll biosynthesis at the level of protoporphyrin IX (11). In other cases, several enzymes are shared by different pathways, such as enzymes in the Calvin cycle and oxidative pentose phosphate pathway (OPPP) (12) or in the Calvin cycle and glycolysis (13). It has been demonstrated for several enzymes that specific protein isoforms or functional paralogues specialize in different functions or pathways often located in different subcellular localizations (e.g. cytosol versus chloroplast) or tissues (e.g. root versus shoot) (see e.g. Ref. 14). Thus to understand the regulation of metabolic activity it is important to distinguish between such functional paralogues. Tandem mass spectrometry with high mass accuracy will typically allow distinguishing between such closely related paralogues.
Assembly and disassembly of multisubunit complexes as well as suborganellar localization of enzymes (e.g. thylakoid membrane versus envelope membrane) have been shown to influence flux through different pathways. In some cases, this can lead to so-called "metabolic channeling" (15). Protein-protein interactions can also help to stabilize proteins and to protect against denaturation and proteolysis. These interactions can be homomeric (between identical proteins) or heteromeric (between different proteins). Identifying these protein interactions is needed to truly understand plastid protein functions and plastid metabolic pathways. Given the complexity of the stromal proteome, only a small number of stromal protein complexes in A. thaliana have been characterized. Examples are the heteromeric Clp protease complex of 325–350 kDa with 11 different proteins (16), the 200–240-kDa heterotetrameric ADP-glucose pyrophosphorylase (or glucose-1-phosphate adenylyltransferase) (17), the stromal signal recognition particle (18), and the ~150-kDa heterotetrameric tryptophan synthase (19). Extensive searches of the published literature (this study) did identify a significant number of plastid complexes from a large variety of plant species other than Arabidopsis (spinach, pea, Brassica rapa, potato, barley, etc.), such as the 30 and 50 S ribosomal subunits from spinach chloroplasts (20, 21), plastid pyruvate dehydrogenase (22), a plastid hetero-oligomer of acetyl-CoA carboxylase (ACCase) of 600–800 kDa in soybean (23), and homo-octameric porphobilinogen synthase from pea plastids (24). Currently there is no centralized data deposit of these protein-protein interactions in A. thaliana or other plant species.
Another important aspect of understanding chloroplast function is to determine protein expression levels and molar ratios between different chloroplast proteins. Currently there is little knowledge of the relative accumulation levels of stromal proteins even for the best studied chloroplast pathways. Quantification of molar ratios between proteins in complex proteomes is generally difficult and has not been attempted at any large scale. As will be demonstrated here, we found that native gels, such as colorless native (CN)-PAGE (or blue native (BN)-PAGE) followed by SDS-PAGE, currently provide the most convenient semiquantitative comparison of different protein species.
In this study, we set out to (i) experimentally identify the Arabidopsis stromal proteome with emphasis on distinguishing between paralogues, (ii) determine the approximate and relative accumulation levels of stromal proteins, (iii) determine the native masses of stromal proteins and, where possible, resolve protein interactions, (iv) collect information on plastid protein-protein interactions from A. thaliana or other plant species, and (v) expand the PPDB as a plastid proteome resource for the plant community.
| EXPERIMENTAL PROCEDURES |
|---|
Image Analysis and Quantification
Coomassie-stained gels were scanned with a desktop scanner or high resolution scanner (Amersham Biosciences), and SYPRO Ruby-stained gels were scanned using a charge-coupled device camera (FluorS, Bio-Rad). Spot volumes were determined, corrected for background, and normalized to total spot volume using Phoretix software version 2004 (Non-linear, Newcastle, UK). We always verified the correlation between predicted processed molecular mass and experimentally observed mass. In a limited number of cases, more than one protein was identified per gel spot. We first verified whether the identities in a spot could be explained by any background signals (from streaking) from abundant proteins in this gel area. Such identities from background were not quantified. If it appeared that the identities of the proteins in a spot were truly the results of two co-migrating proteins and if the MOWSE scores were in a similar range (within 3-fold difference), then each protein was assigned half the spot volume. If spots contained more than one protein with very different MOWSE scores (at least a factor of 3), we removed the protein identification based on the lowest MOWSE score. This was a reasonable strategy because co-migrating proteins were typically of similar molecular mass. In case a protein was identified more than once, we summed all corrected spot volumes for that accession. To facilitate comparison of abundance of different proteins, the spot volume(s) for each accession was divided by the denatured molecular mass for each accession. This resulted in a rough approximation of relative concentration. We do point out that protein abundance was calculated from SYPRO Ruby-stained spot intensity. Because SYPRO Ruby binds preferentially to charged residues (lysine, arginine, and histidine) protein abundance is underestimated or overestimated if the proteins contain few or many charged residues, respectively.
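The quantification rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Phoretix workflow itself: the function name `quantify_spots`, the input layout, and the accessions `A` to `D` are hypothetical, background identifications are assumed to have been removed already, MOWSE scores are assumed to be positive, and the 3-fold rule stated for two co-migrating proteins is generalized here to any number of identifications per spot.

```python
from collections import defaultdict

def quantify_spots(spots, mass_kda, fold_cutoff=3.0):
    """Return (normalized volume, relative concentration) per accession.

    spots: list of dicts with "volume" (background-corrected spot volume) and
           "ids" (list of (accession, MOWSE score) pairs, background hits removed).
    mass_kda: dict mapping accession -> denatured molecular mass in kDa.
    """
    total = sum(spot["volume"] for spot in spots)  # includes unidentified spots
    volume = defaultdict(float)
    for spot in spots:
        ids = spot["ids"]
        if not ids:
            continue
        best = max(score for _, score in ids)
        # Keep identifications scoring within fold_cutoff of the best one;
        # a difference of at least fold_cutoff drops the lower-scoring protein.
        kept = [acc for acc, score in ids if score * fold_cutoff > best]
        share = spot["volume"] / total / len(kept)  # equal split among kept IDs
        for acc in kept:
            volume[acc] += share  # sums repeated identifications of an accession
    concentration = {acc: vol / mass_kda[acc] for acc, vol in volume.items()}
    return dict(volume), concentration

# Hypothetical example data (not taken from the study).
spots = [
    {"volume": 60.0, "ids": [("A", 100)]},
    {"volume": 30.0, "ids": [("B", 90), ("C", 60)]},   # similar scores: split
    {"volume": 10.0, "ids": [("A", 200), ("D", 50)]},  # 4-fold apart: drop D
]
mass_kda = {"A": 50.0, "B": 30.0, "C": 15.0, "D": 20.0}
volume, concentration = quantify_spots(spots, mass_kda)
```

Dividing by the denatured mass is what turns a mass-based signal into the rough relative concentration used for the abundance levels, so the smaller protein `C` ends up with twice the relative concentration of `B` despite sharing the same spot volume.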
Protein Identification by Mass Spectrometry, (Un)ambiguous Identification, and Bioinformatics
Stained protein spots were excised, washed, and digested with modified trypsin, and peptides were extracted automatically (Progest, Genomic Solutions, Ann Arbor, MI). Proteins were identified by peptide mass fingerprinting using a MALDI-TOF mass spectrometer (Voyager DE-STR, Applied Biosystems) and/or by tandem mass spectrometry using a capillary LC-ESI-MS/MS (Q-TOF, Waters) as described in Ref. 25. MS or MS/MS spectra were used to search the predicted A. thaliana proteome downloaded from the TAIR database using an in-house installation of Mascot (www.matrixscience.com). Criteria for positive identification from peptide mass fingerprinting and from MS/MS data are described in Ref. 25. In the analysis of the MS data, an effort was made to distinguish between members of the same gene family or otherwise related gene products. In some cases, peptides were identified that match to more than one protein. Uniquely matching peptides (diagnostic peptides) are then needed to determine which protein is expressed. In case the mass spectrometry data did match ambiguously to more than one protein, these protein identifications are automatically linked within the PPDB database and reported in the tables.
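The role of diagnostic peptides can be illustrated with a small Python sketch. It is not the Mascot or PPDB implementation; the function name, the peptide labels, and the accessions `X`, `Y`, and `Z` are hypothetical, and the input is assumed to be a mapping from each identified peptide to the set of proteins it matches.

```python
def classify_identifications(peptide_matches):
    """Split accessions into unambiguous and ambiguous identifications.

    peptide_matches: dict mapping peptide -> set of accessions it matches.
    Returns (unambiguous, ambiguous): unambiguous maps an accession to its
    diagnostic peptides; ambiguous maps an accession without any diagnostic
    peptide to the other accessions that share its peptides.
    """
    unambiguous = {}
    for peptide, accessions in peptide_matches.items():
        if len(accessions) == 1:  # uniquely matching (diagnostic) peptide
            (accession,) = accessions
            unambiguous.setdefault(accession, []).append(peptide)

    ambiguous = {}
    for accessions in peptide_matches.values():
        for accession in accessions:
            if accession not in unambiguous:
                ambiguous.setdefault(accession, set()).update(accessions - {accession})
    return unambiguous, ambiguous

# Hypothetical example: X has a diagnostic peptide, Y and Z do not.
matches = {"pepA": {"X"}, "pepB": {"X", "Y"}, "pepC": {"Y", "Z"}}
unambiguous, ambiguous = classify_identifications(matches)
```

In this sketch the `ambiguous` mapping plays the part of the linked identifications: every protein that cannot be confirmed on its own is reported together with the relatives it could be confused with.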
Plastid Proteome Database and New Interface
The construction of the PPDB (ppdb.tc.cornell.edu/) was originally described in Ref. 25. The PPDB interface was improved, and search functions were expanded since its inception in 2004. Also more detailed curated information can now be accessed. The annotated CN-PAGE gel image and associated experimental, predicted, and other data presented in this study can be accessed via PPDB. PPDB also contains the theoretical analysis of all Arabidopsis entries (currently release 5.0 of ATH1.pep with 29,161 nuclear encoded Arabidopsis proteins as well as the mitochondrial and plastid genomes). Mascot scores, number of matching peptides, and highest peptide score for each identification as well as functional classification are listed.
Miscellaneous
Chlorophyll concentrations were determined spectrophotometrically in 80% acetone (26), and protein determinations were done with the bicinchoninic acid assay (27).
| RESULTS |
|---|
76% of the total stroma mass as determined from the relative spot intensities. Enzymes involved in starch synthesis and degradation (Bin 2) were also well represented. The function of 11% of the identified proteins was unknown (see the supplemental table and PPDB).
Relation to Other Chloroplast Proteomic Studies
About 40% (over 100 proteins) were not observed in earlier A. thaliana chloroplast proteomic studies on the thylakoid and envelope membrane and their associated (stromal) proteins (25, 39–46) (for cross-correlation, see the supplemental table and PPDB). Predicted functions of these newly identified proteins include biosynthesis of amino acids, nucleotides, and proteins as well as secondary metabolism (e.g. tetrapyrrole, thiamine, isoprenoids, and hormones), and numerous proteins were without any predicted function.
A significant number of proteins were listed in the chloroplast analysis described in Ref. 47. However, this dataset is problematic because it appears to contain a large percentage (>40%) of non-chloroplast proteins, as also reflected in the low percentage of chloroplast predicted proteins. This is a consequence of experimental contaminants and/or of high rates of "false identifications" during the search with mass spectrometry data (for discussion see Ref. 6).
Relative Abundance of the Identified Proteome
An important aspect of understanding plastid function is to determine the expression level and molar ratios between different plastid proteins in addition to their identification. Currently there is very little knowledge of the relative accumulation levels of the proteins in the chloroplast stroma. To this end, we quantified all 287 protein spots from analytical SYPRO Ruby-stained CN-PAGE gels of two independent chloroplast purifications and normalized those to total spot intensity (spot volume) of each gel. Proteins were identified in 251 spots, representing ~99% of the total spot intensity (spot volume). As is apparent from the gel images in Fig. 1, just a handful of protein spots represent a large percentage of the total protein biomass of the stroma. The 23 most abundant stromal proteins (based on normalized spot volume(s)) or based on "relative normalized concentration" (calculated from normalized spot volume divided by the experimental molecular mass of the protein) are listed in Table II. Together they represent ~85% of the total stromal protein mass, and their expression covers a dynamic range of 2 orders of magnitude. These proteins were also prominent from independent "shotgun" nano-LC-ESI-MS/MS analysis of the stroma.3 We will discuss these relative abundances further below, in connection with protein interactions and protein function. We do point out, however, that protein abundance was calculated from SYPRO Ruby-stained spot intensity. Because SYPRO Ruby binds preferentially to charged residues (lysine, arginine, and histidine) protein abundance is underestimated or overestimated if the proteins contain few or many charged residues, respectively.
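As a rough way to judge which proteins are most exposed to this staining bias, one could compute the fraction of lysine, arginine, and histidine residues in each processed sequence. The helper below is a hypothetical sketch; the function name and the example sequence are illustrative and no correction factor of this kind was applied in the study.

```python
def basic_residue_fraction(sequence):
    """Fraction of lysine (K), arginine (R), and histidine (H) residues."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return sum(sequence.count(residue) for residue in "KRH") / len(sequence)

# Hypothetical 10-residue sequence with three basic residues.
fraction = basic_residue_fraction("KRHAAAAAAA")
```

Proteins with an unusually low or high fraction relative to the rest of the dataset would be the ones whose spot-based abundance deserves the most caution.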
~20 to ~950 kDa with ribosomes and other large protein complexes accumulating in the stacking gel in the first (native) dimension.
To compare these native mass data with existing data, we extensively searched the literature for mass information on closely related and more distant orthologues in and outside of plastids. Mass information from various species was found for 140 proteins of the 241 identified on our CN-PAGE gels, together representing 82 different monomeric proteins or protein complexes (supplemental table). For ~70% of those 140 identified proteins the native mass deduced from the CN-PAGE gels is in approximate agreement with data from the literature. The native mass and estimated oligomeric state and references for selected proteins are listed in Table III. The complete dataset is available in the supplemental table. Thus our comprehensive experimental native data provide a resource and starting point when searching for potential stromal protein partners. One of the biggest surprises when searching the literature was that about 60% of the proteins were identified as homo-oligomeric complexes or monomers rather than heteromeric complexes. The potential significance will be discussed.
Calvin Cycle, OPPP, Glycolysis, and Respiration (Bins 1.3, 1.4, and 1.7)
We experimentally identified 27 proteins associated with the Calvin cycle, glycolysis, OPPP, and photorespiration, representing about 76% of the total stromal protein mass (Table I).
The relative concentrations of these 27 proteins ranged from level 1 to 5 with the small and large subunits of Rubisco (RBCS and RBCL, respectively) at level 1, the other enzymes of the Calvin cycle mostly at levels 2 and 3, and specific enzymes of glycolysis and OPPP at levels 3–5. We also identified 2-phosphoglycolate phosphatases 1 and 2 involved in photorespiration at level 3 as a dimer. The Rubisco complex needs to be activated by the reversible carbamylation of a lysine residue in RBCL (Lys-201) followed by rapid binding of magnesium. This process is regulated by Rubisco activase (RCA) (48). The relative concentration of RCA (At1g73110) was about 70 times less than that of RBCL and RBCS. Rubisco N-methyltransferase catalyzing N-methylation of RBCL (needed for an active enzyme) was ~14,000 times less abundant than Rubisco.
Many of the enzymes in these three pathways form hetero- or homo-oligomeric complexes. The 550-kDa Rubisco complex is the most abundant and well studied example and forms a hetero-oligomer with eight small and eight large subunits. RCA was found in different native mass complexes (330, 480, 600, and 770 kDa and as a >950-kDa complex in the stacking gel) and is known to transiently associate with the Rubisco complex (for a review, see Ref. 49). Sedoheptulose/fructose-bisphosphate aldolase 1 and 2 (SFBA-1 or -2) (At2g21330 and At4g38970) were both found at 178 kDa, suggesting a homo- or heterotetrameric state. These aldolases are reported to form stromal homotetramers, whereas aldolases outside the chloroplast form dimers (50, 51).
Minor and Major Carbohydrate Metabolism (Bins 2 and 3)
Starch and minor carbohydrate metabolism contribute to the carbohydrate storage and sugar diversity in plastids. We identified 12 proteins in this functional category with relative expression levels ranging from levels 3 to 5. These included two ADP-dependent pfkB carbohydrate kinases and Sex1 involved in starch phosphorylation control (52, 53). Glucan phosphorylase-1 (At3g29320) (~110 kDa), involved in phosphorolytic starch breakdown, forms a homodimer or heterodimer (54, 55). We identified it on the stroma native gels at 233 kDa, corresponding to a dimer. Isoamylase, a starch debranching enzyme (At4g09020) of the hydrolytic or amylolytic pathway, was found in the stroma as a monomer. Aldose 1-epimerase (At5g66530) catalyzes the interconversion of the α- and β-anomers of hexose sugars such as glucose and galactose. Aldose 1-epimerase (35 kDa) acts as a dimer in Aspergillus niger (56) and was indeed found on the CN-PAGE gel at 61 kDa.
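The oligomeric states quoted above follow from a simple ratio of native mass to subunit mass. A minimal sketch, assuming a homo-oligomer and using the approximate masses given in the text (the function name is hypothetical, and migration on native gels is only an approximate measure of mass):

```python
def estimate_copy_number(native_kda, subunit_kda):
    """Return (nearest whole number of subunits, raw mass ratio)."""
    ratio = native_kda / subunit_kda
    return round(ratio), ratio

# Masses quoted in the text: glucan phosphorylase-1 and aldose 1-epimerase.
phosphorylase = estimate_copy_number(233, 110)
epimerase = estimate_copy_number(61, 35)
```

Both ratios (about 2.1 and 1.7) round to two subunits, consistent with the dimers described. A ratio that falls midway between two integers should be read as inconclusive rather than rounded.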
Tricarboxylic Acid Cycle and Organic Transformation (Bin 8)
This functional class was represented by just five proteins in agreement with the fact that these are mostly mitochondrial functions (tricarboxylic acid cycle) or present in different subcellular compartments (carbonic anhydrases). We identified two types of malate dehydrogenases (MDHs), differing in the choice of cofactor (NADP or NAD) and activation mechanism. Chloroplast NADP-MDH (At5g58330) catalyzes the reduction of oxaloacetate into L-malate and is involved in the export of reducing power from the chloroplast to the cytosol (the malate valve). We identified it predominantly at 323 kDa and to a lesser extent at 154 kDa. Chloroplast NAD-MDH (At3g47520; 37 kDa) was suggested to form a homodimer (57), and we identified it at 96 kDa. The accumulation level of the NADP form was higher than that of the NAD form in agreement with enzyme activity assays on purified Arabidopsis chloroplasts (58). The Arabidopsis genome encodes some 15 different carbonic anhydrases. We identified an abundant ("at expression level 2") carbonic anhydrase (CA1; At3g01500) at multiple native masses between ~100 and 360 kDa in agreement with observations of homo-, octa-, and decameric states (59, 60). A second paralogue, CA2 (At5g14740), was identified ambiguously with CA1 at low levels (level 5).
Nitrogen and Sulfur Assimilation (Bins 12 and 14)
Plastids play a vital functional role in sulfur and nitrogen assimilation with both elements used in amino acid biosynthesis. Chloroplasts import nitrite, which is converted by Fd-dependent nitrite reductase (NiR) into (toxic) ammonia followed by the glutamate- and ATP-dependent conversion into glutamine by glutamine synthetase 2 (GS2) and subsequent conversion into glutamate by Fd- and α-ketoglutarate-dependent glutamate synthase (Glu1 or Fd-GOGAT-1). We identified and quantified each of these three key enzymes (At2g15620, At5g35630, and At5g04140) as a monomer (NiR) and at different native masses (GS2 and GOGAT). GS2 appears to have a high relative concentration in particular as compared with NiR. This is logical because 90% of the glutamine synthesized in leaf chloroplasts is derived from photorespiration rather than by chloroplast import of nitrate and subsequent reduction to nitrite (for discussion, see Ref. 61).
ATP sulfurylase (ATPS1) catalyzes the formation of adenosine 5'-phosphosulfate from inorganic sulfate and ATP. Four paralogues (ATPS1–4) are predicted to localize to non-green and/or green plastids in A. thaliana. Cytosolic and chloroplast isoforms were purified from spinach leaves, and their native mass was about 170 kDa as determined by gel filtration subsequent to other fractionation steps (62). We detected ATPS1 (At3g22890; ~49 kDa) at low levels (level 5) in the stroma with a native mass between 129 and 147 kDa.
Amino Acid Synthesis and Degradation (Bin 13)
We identified 17 proteins involved in amino acid metabolism, representing about 0.8% of the stromal protein mass. They were typically expressed at levels 3 and 4. Literature searches for their oligomeric state suggest that most of these accumulate as dimers and trimers (between ~100 and 150 kDa) in different plant species, and indeed these 17 proteins were found in this mass range (supplemental table). As an example, we mention the carbamoylphosphate synthetase large (At1g29900; 130 kDa) and small (At3g27740; 45 kDa) subunits. In Aquifex aeolicus and in Escherichia coli, they were reported to form a heterodimer of 171 kDa (63). It appears that this oligomeric state is conserved in A. thaliana chloroplasts because we found both subunits with a native mass between 162 and 173 kDa, corresponding to a heterodimer.
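The heterodimer assignment can be checked by enumerating subunit combinations whose summed mass falls near the observed native mass range. The sketch below is illustrative only: the function name, the limit of four copies per subunit, and the 10% tolerance are assumptions, not parameters used in the study.

```python
from itertools import product

def candidate_compositions(subunit_kda, native_range_kda, max_copies=4, tolerance=0.10):
    """List subunit compositions compatible with an observed native mass range.

    subunit_kda: dict mapping subunit name -> mass in kDa.
    native_range_kda: (low, high) observed native mass in kDa.
    Returns (composition, mass) pairs sorted by distance to the range midpoint.
    """
    low, high = native_range_kda
    midpoint = (low + high) / 2
    names = list(subunit_kda)
    hits = []
    for counts in product(range(max_copies + 1), repeat=len(names)):
        if sum(counts) == 0:
            continue  # skip the empty complex
        mass = sum(n * subunit_kda[name] for n, name in zip(counts, names))
        if low * (1 - tolerance) <= mass <= high * (1 + tolerance):
            hits.append((abs(mass - midpoint), dict(zip(names, counts)), mass))
    hits.sort(key=lambda hit: hit[0])
    return [(composition, mass) for _, composition, mass in hits]

# Carbamoylphosphate synthetase subunit masses and native range from the text.
candidates = candidate_compositions({"large": 130, "small": 45}, (162, 173))
```

Under these assumptions the closest match is one large plus one small subunit (175 kDa). A tetramer of small subunits alone (180 kDa) also falls within the tolerance, which shows why finding both subunits in the same spot matters for the interpretation.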
Synthesis of Lipids (Bin 11), Hormones (Bin 16), Isoprenoids (Bin 17), and Cofactors and Vitamins (Bin 18)
We identified six proteins involved in lipid/fatty acid biosynthesis, all of them at low levels (mostly level 5), clearly less abundant than proteins involved in e.g. amino acid biosynthesis. These include acetyl-CoA synthetase (acetate-CoA ligase) generating acetyl-CoA from acetate (typically produced from glycolysis in mitochondria), ACCase, and one of the three ketoacyl-ACP synthases, KAS1, as well as two desaturases, stearoyl-ACP desaturases 1 and 2. KAS1 (At5g46290; 50 kDa) is an essential enzyme involved in the construction of unsaturated fatty acid carbon skeletons, and we identified it at 130 kDa. ACCase, catalyzing the first committed reaction of de novo fatty acid biosynthesis, forms a heterotetrameric enzyme with plastid-coded subunit β-carboxyltransferase, biotin carboxy carrier, biotin carboxylase, and α-carboxyltransferase. We did not identify these additional subunits most likely because they are primarily associated with the inner envelope membrane (64).
We identified two proteins in plastid isoprenoid biosynthesis from the non-mevalonic acid or methyl-D-erythritol 4-phosphate pathway. These were 4-diphosphocytidyl-2C-methyl-D-erythritol synthase (ISPD or CMS) and GcpE/IspG/Clb4 immediately downstream in the pathway. Both accumulated at "level 4." GcpE/IspG was only recently localized to plastids (65). ISPD (At2g02500; ~30 kDa) was identified at 68 kDa on the CN-PAGE gel, probably forming a dimer.
Plant hormones such as jasmonic acid, gibberellic acid, and abscisic acid and products of the terpenoid pathway are, in part, synthesized in plastids. We identified lipoxygenase LOX2 (At3g45140; 102 kDa) involved in jasmonic acid synthesis with a predominant native mass of 116 kDa, corresponding to a monomer. It is surprisingly abundant, here quantified with a relative concentration of level 3. We also identified it with high Mascot scores in an earlier study as a thylakoid-associated protein (43). It was shown that chloroplast-localized LOX2 is required for the wound-induced synthesis of the plant growth regulator jasmonic acid in leaves (66).
Cofactors and vitamins are synthesized in plastids. We identified two plastid enzymes involved in thiamine/vitamin B1 biosynthesis, Thi1 (level 3) and ThiC (level 4), both accumulating in complexes of 200 and 157 kDa, respectively (supplemental table). To our knowledge, ThiC has not been identified earlier in chloroplasts. Interestingly we found that Thi1 was heavily oxidized as evidenced by high levels of methionine oxidation in the mass spectrometer. This high level of methionine oxidation was clearly specific for Thi1. It should be noted that Thi1 is dually targeted to both plastids and mitochondria, using two different translation initiation sites (67).
We also identified 6,7-dimethyl-8-ribityllumazine (DMRL) synthase (At2g44050) involved in vitamin B2 synthesis (riboflavin). It was shown for the isoforms in E. coli and spinach that DMRL synthase is a 60-mer forming an icosahedron of 12 pentamers. In E. coli, each subunit is about 13–17 kDa, and the complex migrated around 850 kDa (68–70). We found Arabidopsis lumazine synthase at 738 kDa, corresponding quite well with observations for the bacterial orthologue.
Tetrapyrrole Synthesis and Degradation (Bin 19)
We identified nine proteins involved in tetrapyrrole synthesis (at "level 3") and interestingly also the red chlorophyll catabolite reductase (also named "accelerated cell death 2" or ACD2) involved in chlorophyll degradation ("level 4"). Together they totaled about 0.5% of the stromal protein mass. Porphobilinogen synthase (At1g69740; ~45 kDa) and coproporphyrinogen III oxidase (At1g03475; ~40 kDa) were reported to form a homo-octamer (24) and a homodimer (71), respectively. We identified them on the CN-PAGE gel at ~340 and 82 kDa, respectively, which corresponds to these reported oligomeric states. The A and B subunits of glutamyl-tRNA amidotransferase (At3g25660, 52 kDa; and At1g48520, 60 kDa) were identified at 190–199 kDa on the CN-PAGE gel, strongly suggesting heterotetrameric interactions. An α2 homodimer of 120 kDa has been identified in Chlamydomonas reinhardtii (72).
Stress Responses (Bin 20) and Redox Regulation (Bin 21)
Many enzymes in plastids are activated and deactivated through oxidation/reduction reactions via the thioredoxin system. The thioredoxin family consists of nine proteins grouped in four clusters (m1,2,3,4; x; y1,2; and f1,2) (73–75). We identified five thioredoxins on the CN-PAGE gels; TrXm1 was found at around 90 kDa, TrXm2 and TrXx were found at 154 kDa, and TrXf1 was found at both 154 and 212 kDa. Most likely they associate transiently with one or more different enzymes, but given the proteome complexity and the resolution of the native gels, we could not identify their respective partners. This is not surprising because recent affinity studies using modified thioredoxins as bait have shown that the thioredoxins interact with 50 or more chloroplast stromal proteins covering a wide range of functions (76).
We identified 14 proteins involved in oxidative stress responses. Some of them were quite abundant, such as peroxiredoxins A and B at level 2. Peroxiredoxin IIE (At3g52960) was identified at a native mass of 84 kDa, whereas the abundant peroxiredoxins A/B were found at multiple native masses in agreement with reports from the literature (77–79). We also identified several members of the ascorbate and glutathione defense systems, such as monodehydroascorbate reductase and dehydroascorbate reductase-2, involved in recycling oxidized ascorbate, and glutathione reductase. The native data suggest that they might interact with each other in an 80–100-kDa complex, corresponding to a heterodimer.
Nucleotide Synthesis and Degradation (Bin 23)
Plastids are a major site for pyrimidine and purine nucleotide synthesis, and indeed we identified 12 enzymes in these pathways (all at levels 3 and 4), corresponding to about 0.5% of stromal protein mass. Two of them are shared with amino acid biosynthesis. NDPK2 (At5g63310) is a nucleotide-diphosphate kinase involved in nucleotide metabolism (transfer of phosphate from NTP to NDP). NDPK2 (~20 kDa) was reported to form a homohexamer (80, 81) in agreement with an observed native mass of 133 kDa on the CN-PAGE gels. NDPK2 was predicted by TargetP and Predotar to be plastid-localized and was purified from spinach chloroplasts (81). Curiously NDPK2 is proposed to be a signal transducer in phytochrome-mediated light signaling, co-localizing with phytochrome in the nucleus (80, 82). In light of the purification from spinach chloroplasts and its significant accumulation level in A. thaliana chloroplasts in this study, it seems less likely that NDPK2 is localized in the nucleus.
RNA (Bin 27) and Protein Synthesis and Protein Fate (Bin 29)
We were quite surprised to identify so many proteins (67) assigned to Bins 27 and 29, together representing about 7% of the stromal mass. The relative concentration among these proteins spanned 3–4 orders of magnitude with ROC4 (At3g62030; 20 kDa) as the most abundant protein. ROC4 is an abundant stromal peptidylprolyl isomerase with demonstrated in vitro rotamase activity, but its role is unclear (83). ROC4 was found predominantly around 110 kDa and may form a hexamer.
Spinach plastid 70 S ribosomes are composed of more than 60 proteins and have a native mass of around 2 MDa (20, 21). The stromal 70 S ribosomes migrated just into the CN-PAGE gel with low amounts of other large complexes (Fig. 1, A and B). We analyzed 21 protein spots in this gel area, but the analysis was not exhaustive. Nevertheless we identified some 11 proteins of the 30 S subunit, 10 proteins of the 50 S subunit, and a plastid-specific ribosome-associated protein (PSRP2). Ribosome-associated proteins RAP41 (At3g63140) and RAP38 (At1g09340), originally identified in C. reinhardtii ribosomes (84), were each found at three different locations of the stromal CN-PAGE native gels: (i) in a complex larger than 950 kDa most likely associated with 70 S ribosomes, (ii) at 224 kDa, and (iii) at 106–126 kDa. At 224 kDa, the only obvious potential partners are ribosomal protein L5 (At4g01310) and ribosomal protein L31 (At1g75350) (Fig. 2A). At 106–126 kDa, no obvious partners were found, suggesting the possibility that RAP38 and RAP41 form a heterotrimer.
We identified several proteins involved in mRNA binding and processing/degradation. Polyribonucleotide nucleotidyltransferase (At3g03710; ~95 kDa) acts as a 3'→5' exoribonuclease. We identified it in a stromal complex of 410 kDa. This protein shows sequence homology to the polynucleotide phosphorylase (PNPase) of the E. coli degradosome. It possibly acts as a homotetramer (88–90) and not as a hetero-oligomer as shown in other non-plant eukaryotes. We also identified At5g48960 (~65 kDa) encoding a putative 5'-nucleotidase with unknown function; it was detected at 166 kDa in the stroma. At5g26742 encodes a putative DEAD box RNA helicase (RH3) and was identified on the CN-PAGE stromal gel in a molecular mass complex of >950 kDa. A plastid-localized RH3 was identified in tobacco, and a loss of function mutant resulted in variegated leaves and abnormal roots and flowers (91). In E. coli, a DEAD-RNA helicase is part of the "degradosome" along with the PNPase, the endoribonuclease RNase E, and the glycolytic enzyme enolase (92). RH3 in chloroplasts does not seem to be associated with the PNPase or At5g48960 mentioned above.
Chaperones cpHSP70-1 (At4g24280) and cpHSP70-2 (At5g49910) are most likely in a complex of ~200 kDa with GrpE1 (At5g17710) and GrpE2 (At1g36390) (Fig. 2B). In addition, GrpE1 and GrpE2 seem to form a complex at 150 kDa without cpHSP70, but a potential pitfall is that transketolase is a major spot possibly masking cpHSP70. Indeed when analyzing the CN-PAGE gel of the peripheral thylakoid proteins, where transketolase is less abundant, cpHSP70 was detected close to transketolase at ~150 kDa and most likely forms a second type of complex with GrpE1/2 (not shown). cpHSP70-1 and cpHSP70-2 (mostly the -1 isoform) also form a complex at 123 kDa, but we did not detect any GrpE in this native mass range. GrpE and cpHSP70 have been shown to form complexes of ~120 and 230 kDa in chloroplasts of C. reinhardtii, and GrpE also forms homo-oligomers (93).
We identified a well resolved CPN60 complex at 806 kDa with α (At2g28000) and β subunits (At5g56500, At3g13470, and At1g55490) in an approximate 1:1 ratio. CPN10 (At2g44650) or CPN20 (At5g20720) were not part of this 800-kDa complex but were found in complexes of 150–170 kDa. A trigger factor-like protein (At5g55220; ~60 kDa) was identified most likely as a dimer. Trigger factor in the cytosol of E. coli prevents misfolding and aggregation of nascent proteins emerging from the 70 S ribosome (94). The structure of "free" trigger factor (without ribosomes) of Vibrio cholerae was determined at 2.5-Å resolution as a dimer (95).
The ClpP/R core protease complex was detected at 325–350 kDa. It is a tetradecamer composed of two rings of seven subunits with ClpP1,3,4,5,6 and ClpR1,2,3,4 in one or more copies associated with two chaperone-like proteins named ClpS1 and -S2 (16). Oligopeptidase A in the M3 peptidase family (At5g65620; 88 kDa) and a zinc-metalloprotease of the M16 family (At3g19170; 120 kDa) both have unknown function and accumulated as monomers.
Miscellaneous Function and "Unknown" Proteins (Bins 24, 26, 28, and 35)
A significant number of the identified A. thaliana proteins currently have no obvious function (supplemental table). Here we highlight just a few and comment on their native mass, abundance, and functions in non-plant species.
Dienelactone hydrolase is a monomeric enzyme in bacteria involved in the degradation of aromatic compounds such as chlorocatechol and has been crystallized from Pseudomonas (96). Nothing is known about its substrate in plants. We found a dienelactone hydrolase at 27.5 kDa (At1g35420; ~30 kDa) as a monomer. At1g21440 and At1g77060 are ~33-kDa paralogues and encode a putative plastid-localized mutase likely involved in the formation of C–P bonds. In Streptomyces hygroscopicus it is involved in the biosynthesis of phosphonates (97, 98). We identified both proteins in the same spot at a native mass of 119 kDa; it is quite possible that they form trimers. At1g77060 was also identified at 147 kDa. The biosynthesis of phosphonate has not been investigated in plants, and the only report found in the literature concerns the accumulation of synthase mRNA in senescing carnation (Dianthus caryophyllus) flower petals in response to ethylene (99).
The DAG gene (At1g11430) encodes a 20-kDa protein with no obvious functional domains involved in the development of chloroplasts (100). We identified the DAG protein in a high molecular mass complex of >950 kDa. The protein THF1 (At2g20890) involved in thylakoid biogenesis was shown to be present in chloroplasts as well as in plastid-associated stromules (101). We originally identified the 25-kDa THF1 protein under denaturing conditions in the stripped thylakoid proteome (43) and in this study as a 270-kDa complex in the stroma. Interacting protein partners are unknown.
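The oligomeric states suggested in this section rest on a simple comparison of native mass with monomer mass. The following is a minimal sketch of that arithmetic, not the procedure used in the study: the function name `copies_per_complex` is hypothetical, the monomer masses are the approximate values quoted above, and the ratio equals a subunit count only if the complex is purely homo-oligomeric. Because migration on native gels also depends on shape and charge, a ratio such as 3.6 is compatible with the suggested trimer rather than a precise count, and for THF1 (interacting partners unknown) the ratio is at most an upper bound.

```python
def copies_per_complex(native_kda: float, monomer_kda: float) -> float:
    """Return the native-to-monomer mass ratio.

    This equals the subunit copy number only under the assumption that
    the complex contains a single gene product (a homo-oligomer).
    """
    if native_kda <= 0 or monomer_kda <= 0:
        raise ValueError("masses must be positive")
    return native_kda / monomer_kda


# (native mass, approximate monomer mass) in kDa, as quoted in the text above
EXAMPLES = {
    "dienelactone hydrolase (At1g35420)": (27.5, 30.0),
    "putative mutase (At1g21440/At1g77060)": (119.0, 33.0),
    "THF1 (At2g20890)": (270.0, 25.0),
}

for name, (native, monomer) in EXAMPLES.items():
    ratio = copies_per_complex(native, monomer)
    print(f"{name}: native/monomer = {ratio:.1f}")
```

A ratio close to 1, as for dienelactone hydrolase, indicates a monomer; larger ratios need independent evidence before a specific stoichiometry is assigned.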
| DISCUSSION |
|---|
Protein Identification and Quantification: Paralogue Specialization, Novel Chloroplast Functions, and Investments
In this study we identified 241 proteins from the isolated stroma, representing about 99% of the stromal protein mass. Very few non-plastid contaminants are present in the dataset, indicative of the purity of the experimental sample and the quality of mass spectrometry-based identification. Indeed 88% of the proteome was predicted by TargetP to be plastid-localized. This also shows that most chloroplast proteins have typical N-terminal transit peptides. This stromal dataset complements the existing chloroplast membrane proteome datasets (25, 39–46): about 40% of the identified stromal proteins were not present in these published studies.
The analysis covers most known chloroplast functions, ranging from protein biogenesis and protein fate to primary and secondary metabolism. A number of new components were identified that have not yet been described for chloroplasts; examples are elongation factor typA/bipA, peptidases of the M1 and M3 families, and a homologue of E. coli trigger factor, an isomerase involved in biogenesis of nascent chains (102). At least 25 proteins without any obvious function were also identified, and their native masses were determined. Importantly because false-positive paralogue identification was avoided, this dataset will help to assign individual gene family members to subcellular locations and tissues. This will then allow coupling of these protein accumulation data to the available A. thaliana microarray data sets (in public databases) for specific tissues and conditions.
Quantification of molar ratios between different proteins in complex proteomes is generally difficult and has not been attempted at any large scale. Quantitative techniques to determine changes in protein accumulation levels of the same protein in a complex mixture between samples are now quite powerful: excellent quantifications can be obtained using stable isotope labeling techniques introduced into proteins during growth of the organism (stable isotope labeling by amino acids in cell culture and 15N/14N labeling) or after purification of the proteome (cleavable ICAT and iTRAQ) (for reviews, see Refs. 103–106). However, these techniques cannot be used to compare accumulation levels of different proteins within a sample and are thus not applicable to the questions raised in the current study. A new approach has been introduced recently in which isotope-labeled peptides matching individual proteins are added as internal standards into complex mixtures. This technique (designated AQUA) is most appropriate for comparing expression levels of a small number of known proteins, and it is not suitable for quantification of unidentified proteins (107). Denaturing IEF using commercial IPG strips is not quantitative when comparing stoichiometries of different protein species because of underestimation of high molecular mass proteins and proteins with significant hydrophobicity or extreme pI. Here we show that native gels, although not perfect, currently provide the most convenient semiquantitative comparison of larger sets of protein species.
This study provides an overview of the steady state accumulation levels of stromal proteins and provides insight into the relative concentrations of different metabolic pathways. For instance, the data clearly suggest that components of protein synthesis, folding, and proteolysis are generally more abundant than enzymes involved in amino acid synthesis. Enzymes in fatty acid biosynthesis are generally of lower abundance in the stroma than those of amino acid synthesis, possibly because the former are typically associated with the inner chloroplast envelope. The Calvin cycle, OPPP, and glycolysis together represent some 75% of the total mass with protein synthesis, folding, and biogenesis representing about 6.5% and nitrogen and sulfur assimilation representing an additional 7.5%. This leaves less than 10% of the total stromal mass for all other stromal functions. The large investments in carbon metabolism and the Calvin cycle in particular reflect the inefficiency of Rubisco as well as the high metabolic flux of reduced carbohydrates. The large protein mass dedicated to protein synthesis (ribosomes, RNA-binding proteins, and elongation factors), protein folding, and proteolysis is likely a reflection of the abundance of the chloroplast-encoded large subunit of Rubisco and chloroplast-encoded proteins of the thylakoid photosynthetic apparatus. The abundance of the chaperones is likely also a reflection of the need for folding and assembly of the thousands of nuclear encoded proteins imported into the stroma. The identified proteins span an expression range of at least 5 orders of magnitude.
It is interesting to contrast the proteome of chloroplasts with those of non-green plastids. Non-green plastids exist in roots, non-green flowers, fruits, and seeds. Two shotgun proteomic studies were published concerning (non-green) wheat amyloplasts and tobacco (colorless) BY-2 cell culture plastids (108, 109). It is hard to compare these identified proteomes to the current chloroplast proteome analysis, not only because these studies did not quantify protein accumulation but also because wheat, tobacco, and A. thaliana are quite distant in evolution with relatively low levels of sequence homology. Nevertheless the most striking difference between the BY-2 plastid proteome and the chloroplast proteome is that a significantly higher percentage of identified proteins is involved in amino acid metabolism in the BY-2 cells (25 versus 7%).
Determination of Native Mass and Deduction of Oligomeric State of Chloroplast Proteins
In this study we present a "snapshot" of the oligomeric chloroplast proteome of A. thaliana, isolated during the first half of the 10-h light period from complete rosettes of 6-week-old plants. The CN-PAGE gels were reproducible across different stromal preparations, suggesting that the native state of the stromal proteome visualized on the gel was relatively stable under the conditions used. We chose to use CN gels originally developed by Schägger et al. (28) rather than the more popular BN gels (110) because the high concentrations of Coomassie in the BN-PAGE sample buffer destabilized a significant fraction of the soluble complexes (not shown). Although BN-PAGE has been successful for separation of membrane complexes (111–114), it is less suited for native mass determination of the soluble stroma. The effective native mass range on the CN-PAGE gels was about 10–950 kDa. It should be noted that very little destabilization of complexes took place during separation in the first dimension as evidenced by the near absence of "streaking" in the first dimension. The most significant difference with studies reported in the literature is that the native stromal proteome was separated under low salt conditions while keeping the protein concentrations high until the point of entering the native gel. Most other studies analyzed "one protein (complex) at a time" and typically involved multiple chromatography steps (ion exchange and gel filtration) often combined with ammonium salt precipitations and sometimes co-immunoprecipitations followed by N-terminal Edman sequencing or Western blots.
About 10% of the 241 proteins identified in this study were found in a monomeric state. More than 85% of the proteins were essentially (within a ~10-kDa mass range) found at a single native mass. The remaining 15% were found at multiple native masses either because they were abundant and trace amounts of streaking were enough to identify the protein by the sensitive mass spectrometry measurements or because they likely interacted with different partners or formed different homo-oligomeric states. Examples of the latter explanation include thioredoxin f at 155 and 212 kDa, uridylyltransferase-related identified as a ~30-kDa monomer and at 287 and 721 kDa, 3-β-hydroxy-Δ5-steroid dehydrogenase as a monomer and at 112 kDa, and chloroplast NAD malate dehydrogenase at 323 and 154 kDa.
Searching the literature for native mass information for each of the identified stromal proteins or their homologues, we were surprised to find that ~60% of the complexes reported in the literature are homomeric rather than heteromeric. These complexes are involved in a wide variety of metabolic activity without any particular bias for metabolic function. In a few cases, it appeared that these complexes are homomeric in chloroplasts but heteromeric in non-plant species. For example, aspartate transcarbamoylase in pea chloroplasts is a homotrimer, whereas in other eukaryotes it is part of a multifunctional enzyme containing glutamine-dependent carbamoylphosphate synthetase and dihydroorotase activities (115). It is not clear whether there is a more general trend for a higher degree of homo-oligomerization in plastids than elsewhere in the plant or in other non-plant species.
The prevalence of homomeric complexes over heteromeric complexes in the published literature is striking and warrants examination. Does this reflect a bias due to the methodology used (mostly non-denaturing chromatography involving salt) and proteins studied, or does this truly reflect the protein interaction network in plastids? If indeed homo-oligomers are prevailing over heteromers, what could be the benefit for plastid function? It might be easier to regulate protein complex concentrations if they consist of only one gene product. Maybe many interactions in vivo are in fact between different homo-oligomers forming larger hetero-oligomers, but these interactions are more dynamic and less stable than those between homomers themselves. Currently it is difficult to evaluate any of these explanations. Different types of protein-protein interaction studies are required as well as in vivo monitoring of protein-protein interactions using techniques such as fluorescence resonance energy transfer and bioluminescence resonance energy transfer (116–118).
In recent years, several large scale protein interaction studies in yeast (119, 120) and E. coli (121) using tandem affinity purification (TAP) tags and mass spectrometry were carried out. These experiments were focused on identifying hetero-oligomers, whereas homo-oligomers were not recognized because the identification step involved only denaturing SDS-PAGE. No native masses of the tagged complex were measured in these studies. It would be interesting to determine how many of these TAP-tagged proteins in yeast and E. coli (for which no heteromeric interaction was found) in fact form homo-oligomeric structures.
Protein Stoichiometry, Native Mass, and Paralogue Identification of Enzymes in the Calvin Cycle: Data Integration and Interactive Figure
As an example of how the experimental data, prediction data, and published literature can be integrated, we generated a diagram for the Calvin cycle, one of the most studied plant pathways (Fig. 3). The experimental data concerning the relative concentrations and native state of the Calvin cycle enzymes and the regulators CP12, Rubisco activase, and Rubisco methyltransferase were integrated in Fig. 3. Because this figure contains a fair amount of detail, we also incorporated the figure into the PPDB, allowing retrieval of associated information for each accession. All A. thaliana proteins and paralogues associated with the Calvin cycle were collected using information from two metabolic databases, Aracyc (www.arabidopsis.org/tools/aracyc/) and MapMan, as well as published literature. The presence of chloroplast transit peptides was predicted using TargetP. Accessions in purple have no predicted cTP, whereas those in green have predicted cTPs. Proteins identified on the CN-PAGE gel of the stromal proteome are underlined. Enzymes boxed in orange and in blue are shared with OPPP and glycolysis, respectively.
As is indicated in Fig. 3, all but three of the Calvin cycle enzymes are shared with glycolysis or with the OPPP. A number of paralogues for several of these enzymes show specificity to one pathway. Examples are GAPDH with GAPDH-Cp-1 (At1g16300) involved in glycolysis and GAPDH-A1,A2 and -B (At1g12900, At3g26650, and At1g42970) involved in the Calvin cycle (for a review, see Ref. 122).
Conclusions
This study provides a first semiquantitative overview of the oligomeric state of the stromal proteome of A. thaliana chloroplasts and relates this information to plastid functions. This collective dataset is a major step forward in the analysis of the stromal proteome. These experimental observations were compared with published literature on stromal chloroplast complexes. Data are accessible via the PPDB. The integration of literature, experimental paralogue identification, and protein quantification for the Calvin cycle highlights the value of the information accumulated in this study and the challenge for understanding metabolic pathways. To truly understand chloroplast metabolic function, these protein measurements need to be integrated with information on their post-translational modifications, binding of substrate molecules, and measurements of metabolic flux and metabolite levels, as is eloquently discussed in a recent review (123). Systematic TAP tagging of chloroplast proteins and measurements of protein interactions under different conditions are needed to truly establish stromal protein interaction networks, in particular for proteins of low abundance. New technologies are now in place to advance each of these different aspects.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, October 4, 2005, DOI 10.1074/mcp.M500180-MCP200
1 The abbreviations used are: TAIR, The Arabidopsis Information Resource; PPDB, Plastid Proteome Database; OPPP, oxidative pentose phosphate pathway; ACCase, acetyl-CoA carboxylase; CN, colorless native; BN, blue native; LOX, lipoxygenase; EF, elongation factor; GS2, glutamine synthase 2; SFBA, sedoheptulose/fructose-1,6-biphosphate aldolase; TPI, triose-phosphate isomerase; ROC4, rotamase CyP; CPN, chaperonin; CA, carbonic anhydrase; Rubisco, ribulose-1,5-bisphosphate carboxylase/oxygenase; RBCS, ribulose-1,5-bisphosphate carboxylase/oxygenase small subunit; RBCL, ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; RCA, ribulose-1,5-bisphosphate carboxylase/oxygenase activase; Fd, ferredoxin; Fd-GOGAT, ferredoxin-dependent glutamate synthase; Tricine, N-[2-hydroxy-1,1-bis(hydroxymethyl)ethyl]glycine; MDH, malate dehydrogenase; NiR, nitrite reductase; ACP, acyl carrier protein; DMRL, 6,7-dimethyl-8-ribityllumazine; PNPase, polynucleotide phosphorylase; RH, RNA helicase; TAP, tandem affinity purification; RAP, ribosome-associated protein; CTP, chloroplast transit peptide.
2 Q. Sun and K. J. van Wijk, unpublished data.
3 G. Friso, A. Rudella, H. Rutschow, and K. J. van Wijk, unpublished data.
4 J.-B. Peltier, G. Friso, and K. J. van Wijk, unpublished data.
* This work was supported by the National Science Foundation, division of Molecular and Cellular Biochemistry (Grant 090942), the United States Department of Agriculture (Biochemistry) (Grant 2003-35100-13579), and New York State Office of Science, Technology, and Research. Large scale data collection at Cornell University was conducted using the resources of the Cornell Theory Center, which receives funding from Cornell University, New York State, federal agencies, foundations, and corporate partners.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
Present address: Laboratoire de protéomique 2, Place Viala 34060, Montpellier cedex 1, France.
¶ Present address: The Research Inst. for Children, Children's Hospital, 200 Henry Clay Ave., New Orleans, LA 70118.
** Present address: Thermo Electron Corp., 355 River Oaks Parkway, San Jose, CA 95134.
To whom correspondence should be addressed: Dept. of Plant Biology, Emerson Hall 3rd floor, Tower Rd., Cornell University, Ithaca, NY 14853. Tel.: 607-254-1211; Fax: 607-255-5407; E-mail: kv35{at}cornell.edu
| REFERENCES |
|---|