The Chloroplast Grana Proteome Defined by Intact Mass Measurements from Liquid Chromatography Mass Spectrometry*

Proteomics seeks to address the entire complement of protein gene products of an organism, but experimental analysis of such complex mixtures is biased against low abundance and membrane proteins. Electrospray-ionization mass spectrometry coupled with reverse-phase chromatography was used to separate and catalogue all detectable proteins in samples of photosystem II-enriched thylakoid membrane subdomains (grana) from pea and spinach. Around 90 intact mass tags were detected corresponding to approximately 40 gene products with variable post-translational covalent modifications. Provisional identity of 30 of these gene products was proposed based upon coincidence of measured mass with that calculated from genomic sequence. Analysis of isolated photosystem II complexes allowed detection and resolution of a minor population of D1 (PsbA) that was apparently palmitoylated and not detected in less purified preparations. Based upon observed +80-Da adducts, D1, D2 (PsbD), CP43 (PsbC), two Lhcbs, and PsbH were confirmed to be phosphorylated, and a new phosphoprotein was proposed to be the product of psbT. The appearance of a second +80-Da adduct on PsbH provides direct evidence for a second phosphorylation site on PsbH, complicating interpretation of its role in regulation of thylakoid membrane organization and function, including light-state transitions. Adducts of +32 Da, presumably arising from oxidative modification during illumination, were associated with more highly phosphorylated forms of PsbH, implying a relationship between the two phenomena. Intact mass proteomics of organellar subfractions and more highly purified protein complexes provides increasingly detailed insights into functional genomics of photosynthetic membranes.


Molecular & Cellular Proteomics 1:46-59, 2002.
With many genomes completed and many more in the pipeline, it is clear that the post-genomic era has arrived. Considerable attention is now being directed toward defining the function of individual gene products and the interrelationships between them (functional genomics). Proteomics seeks to catalogue the full complement of the gene products of a cell and the effect of development, environment, and disease upon their expression. Mass spectrometry is driving proteomics, most commonly as a tool to identify proteins separated and visualized on two-dimensional gels. However, such strategies are insensitive to low-abundance proteins (1,2), to proteins that are not fully represented on two-dimensional gels (for example, some classes of intrinsic membrane proteins), and to subtle changes in covalent modifications that do not appreciably alter isoelectric point or electrophoretic mobility. To address some of these shortcomings, intact mass proteomics has been proposed (3-8).
The ideal analysis of any protein includes a mass spectrum of the intact molecule to define its native covalent state and heterogeneity (3,4). A versatile procedure has been developed for effective electrospray-ionization mass spectrometry (MS) 1 of intact intrinsic membrane proteins purified by reverse-phase chromatography in aqueous formic acid/isopropanol (3). An advantage of this technique is the ability to measure accurately the masses of proteins larger than 30 kDa, thereby allowing analysis of the majority of the gene products from any genome. The mass measurements of spinach PS II D1 (38,022 Da; 5 membrane-spanning α-helices (MSH)), D2 (39,419 Da; 5 MSH), and Halobacterium halobium bacteriorhodopsin holoprotein (27,052 Da; 7 MSH) were within 0.01% of calculated theoretical values, setting a benchmark standard for analysis of intrinsic membrane proteins (3). The high mass accuracy of this technique for larger hydrophobic proteins has been demonstrated by analysis of a His6-tagged Escherichia coli lactose permease (lacY; 47,357 Da; 12 MSH) (5,7), His6-tagged Vibrio parahaemolyticus Na+/galactose cotransporters (sglT; 60,676 and 90,544 Da; 14/15 MSH) (9), and the α-subunit of the rat Na+/K+-ATPase (atn2; 112,344 Da; 10 MSH). 2 Strict translation of a published gene sequence is usually not sufficient to match a mass to a gene, and the translated sequence must usually be adjusted before it agrees with the measured mass. Varietal differences, DNA sequencing errors, post-transcriptional and post-translational modifications, as well as protein damage, must all be considered. Spectra frequently reveal lesser quantities of other molecular species that can usually be equated with covalently modified subpopulations of the dominant proteins (3,4).
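The sequence adjustments described above (removal of the initiator Met, C-terminal trimming, N-acetylation, phosphorylation) can be illustrated with a minimal average-mass calculation. This Python sketch is not the PeptideMass tool used in this study; it assumes the standard average residue masses as commonly tabulated (for example by ExPASy) and average increments of +42.04 Da for acetylation and +79.98 Da for phosphorylation. The function and parameter names are hypothetical.

```python
# Average residue masses (Da), i.e. amino acid minus H2O, as commonly tabulated (e.g. ExPASy).
AVG_RESIDUE = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524   # H2O accounting for the free N- and C-termini
ACETYL = 42.0367   # average mass change for N-acetylation (+C2H2O)
PHOSPHO = 79.9799  # average mass change per phosphorylation (+HPO3)


def processed_average_mass(seq, remove_init_met=False, c_trim=0,
                           n_acetyl=False, n_phospho=0):
    """Average mass (Da) of a mature polypeptide derived from a translated sequence.

    Hypothetical helper: applies common processing steps before summing residues.
    """
    seq = seq.upper()
    if remove_init_met and seq.startswith("M"):
        seq = seq[1:]          # initiator (f)Met removed
    if c_trim:
        seq = seq[:-c_trim]    # C-terminal processing removes c_trim residues
    mass = sum(AVG_RESIDUE[aa] for aa in seq) + WATER
    if n_acetyl:
        mass += ACETYL
    mass += n_phospho * PHOSPHO
    return mass


# Hypothetical peptide, not a real thylakoid sequence.
example = processed_average_mass("MTAILERR", remove_init_met=True, n_acetyl=True)
```

Covalent changes that are not modeled here (formylation, palmitoylation, oxidation, RNA editing, sequence errors) would be handled by further adjustments of the same kind.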
Toward a complete description of the thylakoid proteome, the closely appressed membranes of the granal stacks were prepared using their known resistance to solubilization by Triton X-100, resulting in a subfraction highly enriched in the polypeptides of PS II. With this simplified starting material, LC-MS made it possible to record the masses of all detectable polypeptides in PS II-enriched membrane preparations from spinach and pea. Proteins were identified provisionally based upon coincidence of measured mass with that calculated from sequences in the data base and upon their predicted elution from the HPLC column based upon calculated hydrophobicity. The heterogeneity of the larger PS II subunits was revealed, illustrating the accuracy and resolution afforded by electrospray-ionization MS. The observation of light-dependent double phosphorylation of PsbH demonstrates potential advantages and pitfalls of the approach for examining relative expression and steady-state native modifications. Analysis of subfractions of the proteome using this technique decreased the genomic coverage but allowed resolution of several otherwise unrecognized native modifications that may be functionally significant.

EXPERIMENTAL PROCEDURES
Leaves from 3-week-old greenhouse-grown pea (Pisum sativum cv. Alaska) plants, 6-month-old tobacco (Nicotiana tabacum cv. Samsung) plants, and spinach obtained from local market sources were used for preparation of PS II-enriched membranes (10). Samples (80 μg of chlorophyll or 250 μg of protein; pea, n = 3; spinach, n = 7; tobacco, n = 2) were prepared by acetone precipitation prior to dissolution in 60% HCOOH (diluted from 90% ACS grade; Fisher). Primary reverse-phase chromatography was performed as described previously (3). The poly(styrene-divinylbenzene) copolymer (Polymer Labs PLRP/S; 5 μm × 300 Å; 2.1 × 150 mm) stationary-phase column was eluted at a flow rate of 100 μl/min at 40°C. The primary gradient (Buffer A, 0.1% trifluoroacetic acid/water; Buffer B, 0.1% trifluoroacetic acid/acetonitrile) eluted extrinsic polypeptides and predominantly small to moderately sized intrinsic proteins. The column was equilibrated in 5% Buffer B, followed by a stepped linear gradient: 5 to 25% Buffer B between 5 and 10 min after injection, 25 to 75% Buffer B between 10 and 130 min, and 75 to 100% Buffer B between 130 and 150 min.
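The stepped primary gradient can be written as a list of (time, %B) breakpoints and interpolated linearly to give the programmed composition at any time. In this minimal Python sketch, the breakpoint list and the function name `percent_b` are hypothetical; holding 5% Buffer B from injection to 5 min and 100% after 150 min are assumptions, and the sketch gives only the programmed composition, not any delayed composition actually reaching the column.

```python
# Breakpoints (min after injection, % Buffer B) for the primary gradient described above.
# The 5% hold before 5 min and the 100% hold after 150 min are assumptions.
PRIMARY_GRADIENT = [(0.0, 5.0), (5.0, 5.0), (10.0, 25.0), (130.0, 75.0), (150.0, 100.0)]


def percent_b(t_min, program=PRIMARY_GRADIENT):
    """Linearly interpolate the programmed %B at time t_min."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return program[-1][1]  # hold the final composition


print(percent_b(70.0))  # midpoint of the 25-75% segment
```

The same function can describe the secondary gradient (% Buffer E) with the program `[(0.0, 5.0), (55.0, 100.0)]`, assuming that gradient starts at injection.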
The spinach D1 spectrum (see Fig. 2A) was obtained by acetone precipitation of PS II reaction centers (100 μg of protein, prepared according to Ref. 11), which were then dissolved in 60% formic acid and subjected to reverse-phase chromatography with the gradient described above, except using Buffer A and Buffer C (0.05% trifluoroacetic acid in 1:1 acetonitrile/2-propanol, v/v). The addition of 2-propanol improves the elution efficiency for larger intrinsic membrane proteins.
A secondary gradient (Buffer D, 60% HCOOH; Buffer E, 2-propanol) was used to elute the very hydrophobic PS II polypeptides that remained bound to the column after the primary chromatographic elution in the AB buffer system described above (3,11). The column was equilibrated in 95% Buffer D, 5% Buffer E, prior to linear gradient elution to 100% Buffer E over 55 min. Six primary runs were used to accumulate material for each secondary elution.
Mass spectra were recorded on a PerkinElmer Life Sciences Sciex API III+ triple-quadrupole mass spectrometer with an Ionspray™ source (Applied Biosystems, Foster City, CA) as described (3). The instrument was scanned from 600 to 2300 m/z with a step size of 0.3 m/z and a dwell of 1 ms, giving a total scan time of 6 s. An orifice potential of 65 V was used in all experiments. The computations of measured protein molecular mass were made using MacSpec 3.3 and zero-charge molecular mass reconstructions using BioMultiView 1.3.1 software (Applied Biosystems, Foster City, CA). Calculated average molecular masses were generated from translated published gene sequences (GenBank™) or published protein sequences (PIR or Swiss-Prot) using PeptideMass (expasy.cbr.nrc.ca/tools/peptide-mass.html). Post-translational modifications of thylakoid proteins were predicted by comparison of published modifications of orthologs.
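The quoted scan time follows from the scan parameters: the number of m/z steps multiplied by the dwell per step. This small Python sketch (hypothetical function name) counts both end points and ignores any settling or inter-scan overhead of the instrument, which is an assumption; it gives about 5.7 s of dwell, which rounds to the 6 s stated.

```python
def scan_time_s(mz_start, mz_end, step, dwell_ms):
    """Dwell-limited duration (s) of one stepped quadrupole scan.

    Counts both end points; instrument overhead is not modeled (assumption).
    """
    n_steps = round((mz_end - mz_start) / step) + 1  # m/z points sampled
    return n_steps, n_steps * dwell_ms / 1000.0


n, t = scan_time_s(600, 2300, 0.3, 1.0)
print(n, round(t, 2))  # 5668 5.67
```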
Predictions of the HPLC retention time of proteins under 40 kDa were made by searching the ARATH (Arabidopsis thaliana) subset of the Swiss-Prot and TrEMBL data bases with TagIdent (ca.expasy.org/tools/tagident.html). The ProtParam tool at ExPASy was used to calculate a predicted hydrophobicity (GRAVY; grand average of hydropathicity) for each of the proteins found in the TagIdent search. The GRAVY results from A. thaliana were plotted against the predicted masses and compared with the observed pea HPLC retention times. The results of the TagIdent search (performed in July 2001) represent a crude data set because of missing or incorrect annotation of the A. thaliana genomes, especially of nuclear-encoded chloroplast proteins that are proteolytically trimmed of a substantial N-terminal leader sequence prior to thylakoid insertion in vivo.
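ProtParam computes GRAVY as the sum of the Kyte-Doolittle hydropathy values of all residues divided by the number of residues; more positive values indicate greater hydrophobicity and, in this study, later reverse-phase elution. The minimal Python sketch below reproduces that definition. The function name is hypothetical, and, given the annotation caveat above, any transit peptide would have to be removed by the caller before the calculation.

```python
# Kyte-Doolittle hydropathy scale, the basis of the GRAVY index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(gravy("IV"), gravy("ARND"))  # arbitrary illustrative sequences
```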

RESULTS
Thylakoid membrane proteins were stripped of lipids, chlorophyll, and other pigments by precipitation with acetone and dissolved in 60% formic acid for the primary reverse-phase HPLC analysis coupled with electrospray-ionization mass spectrometry (LC-MS; see Ref. 3). Many polypeptides elute efficiently under standard conditions of 0.1% trifluoroacetic acid with increasing acetonitrile concentration. Averaging the elution profiles of several experiments allows a generalized elution map to be generated (Fig. 1). The trace shown follows total ion production across the scanned mass range such that peaks appear as molecules of different masses elute, rather like a protein UV elution profile except that relative abundance can be biased by the potential for different proteins to have different ionization efficiencies. Generally, the abundance of smaller proteins is exaggerated, because they ionize more efficiently than larger ones.
The very hydrophobic proteins elute from the column with lowered efficiency, such that a substantial proportion remains bound to the column. Thus the more hydrophobic PS II core polypeptides, including larger ones such as D2, CP43, and CP47 and small ones such as PsbM, are variably detected in the primary elution profile for appressed membrane fractions. The addition of isopropanol to acetonitrile (Buffer C) improved elution efficiency, though optimal spectra of D1 and D2 were obtained only by loading larger quantities of highly purified PS II reaction-center preparations (a subfraction of the appressed thylakoids used for the primary elution described above) to reduce the total number of proteins in the sample (Fig. 2A). Efficient elution of the more hydrophobic PS II polypeptides was achieved in 60% formic acid with elution by isopropanol (Buffers D and E), but chromatographic resolution is impaired, and less complex mixtures are preferable (3). The generalized primary elution profile for appressed membranes (Fig. 1) was annotated with peak identities assigned by comparison of measured intact masses to values calculated from gene sequences and available information on post-transcriptional and post-translational modifications. There are few proteins for which all this information is available. However, specific genes in related species (orthologs) are generally highly conserved in sequence, and known modifications in one ortholog were useful for predicting the same modification in another.
Extrinsic polypeptides, including those of the oxygen-evolving complex, elute early, whereas more hydrophobic intrinsic polypeptides elute later in the primary elution profile. The masses of detected proteins and their proposed identities are presented in Table I in order of elution. Each significant mass spectrometric signal yielding a molecular weight is considered an intact mass tag (IMT) and compared with the data base. Some IMTs are easily assigned because they correspond closely to well-studied proteins whose mass can be calculated with great confidence based upon previous characterization. The native masses of some of the proteins that have been previously identified match the IMT very well and are so indicated in Table I. Others are assigned solely on the basis of coincidence between measured and calculated masses, with confidence modulated by the number of other proteins of similar mass and similar hydrophobicity within the sample. In special cases it was necessary to compare all known orthologs of a particular protein and use phylogenetic conservation of sequence to identify probable sequence errors. 3 The calculated masses presented here are based upon sequences from data base entries without such sequence correction. Of the 90 IMTs recognized in the primary/secondary elution profiles (see Tables I and II), 40 genes were assigned, leaving 30 IMTs unassigned. Some of these likely correspond to the products of genes whose function remains unidentified or to proteins for which there are no gene sequence data in pea or spinach.
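Assignment by mass coincidence, as described above, amounts to a tolerance search of each measured IMT against calculated masses. The minimal Python sketch below assumes that Δ is expressed relative to the calculated mass, borrows its 0.07% default window from the widest agreement class in the table footnotes, and uses as candidates the calculated masses reported in this paper for the processed pea PS II core subunits. It does not weigh hydrophobicity or elution time, which the assignments here also rely on; the function name is hypothetical.

```python
def candidate_assignments(measured, candidates, max_delta_pct=0.07):
    """Return (name, calculated_mass, delta_pct) for candidates within max_delta_pct.

    delta_pct = |measured - calculated| / calculated * 100; closest match first.
    """
    hits = []
    for name, calc in candidates.items():
        delta = abs(measured - calc) / calc * 100.0
        if delta <= max_delta_pct:
            hits.append((name, calc, delta))
    return sorted(hits, key=lambda h: h[2])


# Calculated masses (Da) of processed pea PS II core subunits as reported in the Results.
CALCULATED = {"D1": 38033.5, "D2": 39437.5, "CP43": 50204.9, "CP47": 55902.0}

hits = candidate_assignments(38033.0, CALCULATED)  # closest (and only) match: D1
```

The number of candidates returned within the window is a rough measure of how ambiguous an assignment is; widening the window to a few percent would return D2 as well as D1 for the same IMT.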
The majority of proteins found in the preparations were the membrane-bound PS II proteins, but extrinsic PS I and PS II proteins were also observed. The abundance of extrinsic proteins depended on whether the appressed membranes were NaBr-washed after Triton treatment. Detectable quantities of some of the extrinsic components of PS I were measured even in washed preparations. Varying proportions of the more hydrophobic membrane-bound PS II proteins remain immobilized on the column under the primary elution gradient conditions used to generate the data in Table I, but these can be eluted efficiently with a second gradient. Table II shows the IMTs obtained from experiments using the formic acid-isopropanol system (solvents D and E) for the secondary elution profile applied to pea appressed-membrane preparations. Although several intrinsic membrane-bound proteins observed in the primary elution profile are also evident in the secondary elution, no extrinsic PS I or PS II proteins were observed in the secondary elution.
Some IMTs were observed only when larger quantities of more highly purified preparations were analyzed. A segment of the spinach D1 (PsbA) ion series (Fig. 2A) with charge states 31-28 is shown to illustrate the signal-to-noise ratio in the mass spectrum for the major and minor isoforms. Molecular weight is obtained by multiplying the observed m/z by the charge state and subtracting the mass of the charging species (the protons), a process that is automated for presentation of averaged molecular weight spectra (Fig. 2B). Significantly, the minor signals (m/z 1234.9, 1276.0, 1320.4, and 1367.2; see Fig. 2A) in the mass spectrum, corresponding to the palmitoylated isoform of D1 described previously (12), were not discernible when the more complex appressed-membrane grana subfractions were analyzed. Thus, by working with more highly purified complexes we detect otherwise unseen minor components of potential functional significance. The predominant mass peak in the reconstructed molecular weight spectrum of spinach D1 is in good agreement with the sequence translated from the gene, with removal of fMet and acetylation of Thr 2 (13), as well as C-terminal processing (14) (38025.0 Da observed; 38020.6 Da calculated; difference between observed and calculated masses, Δ = 0.012%). The molecular weight reconstruction also shows the phosphorylated species (38104.0; 38100.6; 0.009%) and a palmitoylated species (38259.0; 38254.6; 0.012%), as well as other signals (38059.0; 38132.2) that probably arise from protein formylation during sample preparation. The data obtained with the acetonitrile/isopropanol system (solvents A and C) provide improved overall isoform resolution compared with the originally presented spectrum of spinach D1 obtained with the formic acid/isopropanol system (solvents D and E; compare with Ref. 3).
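The calculation of neutral mass from an ion series can be sketched in Python. Besides applying M = z(m/z − m_H+) to each peak, this sketch infers the charges from adjacent peaks of consecutive charge state: for neighbors m_lo < m_hi, the higher-m/z peak carries z = (m_lo − m_H+)/(m_hi − m_lo). This is a standard approach but not necessarily what Hypermass or BioMultiView do internally, and the function names are hypothetical. Applied to the four minor D1 signals quoted above, it assigns charges 31 to 28 and gives a mean of about 38254 Da, within about 5 Da of the reconstructed 38259.0 Da. The residual may reflect the limited precision of the quoted m/z values and the averaging over all signals in the full spectrum.

```python
PROTON = 1.007276  # proton mass (Da)


def infer_charges(mz_values):
    """Assign charges to one ion series assuming consecutive charge states.

    For neighbours m_lo < m_hi, the higher-m/z peak has z = (m_lo - PROTON) / (m_hi - m_lo);
    the lowest-m/z peak carries one more charge than its neighbour.
    """
    mz = sorted(mz_values)
    zs = [round((lo - PROTON) / (hi - lo)) for lo, hi in zip(mz, mz[1:])]
    return mz, [zs[0] + 1] + zs


def neutral_mass(mz_values):
    """Mean neutral mass, M = z * (m/z - PROTON), over all peaks (needs >= 2 peaks)."""
    mz, zs = infer_charges(mz_values)
    masses = [z * (m - PROTON) for m, z in zip(mz, zs)]
    return sum(masses) / len(masses), zs


mass, charges = neutral_mass([1234.9, 1276.0, 1320.4, 1367.2])
print(charges)         # [31, 30, 29, 28]
print(round(mass, 1))  # 38254.1
```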
FIG. 2. Electrospray-ionization mass spectrometry of minor thylakoid membrane protein isoforms. A, mass spectrum of spinach D1 protein. A segment of the mass spectrum showing four multiply charged ions derived from D1 is shown to illustrate signal to noise and the detection of a minor palmitoylated species. Each ion is labeled with its charge state and the measured m/z of the minor species. The molecular weight of the uncharged molecule is calculated by multiplying m/z by the charge state and subtracting the mass of the charges (protons). Measured molecular weight is averaged from all detectable signals across the complete mass spectrum (Hypermass, PE Sciex; Applied Biosystems, Foster City, CA). B, zero-charge molecular weight spectrum of spinach D1, reconstructed from the corresponding mass spectrum by computation (BioMultiView, PE Sciex; Applied Biosystems, Foster City, CA).
Table I footnotes: a IMTs assembled in order of elution during the primary reverse-phase chromatographic elution. The mean ± S.D. of n experiments is presented. b Calculated average mass of the uncharged assigned gene product shown in ID column. c Δ, % difference between expected and observed masses; X, Δ < 0.01%; Y, Δ = 0.01-0.02%; Z, Δ = 0.02-0.07%. 1, IMT 26532.1 is in the mass range and retention time for PsbO, except that the mass does not match the published sequence. The mass difference of 129 Da could reasonably be expected to result from one or more amino acid changes; for example, removal of a single Q from the C terminus (26533.7 Da calculated) results in excellent agreement. 2, the pea PsbH sequence has at least two amino acid changes from the published sequence. We have identified one (A18V) by MS/MS (not shown). 3, the pea petD sequence does not include the first exon and erroneously starts with an AUG codon in the second exon. A chimeric protein translated from the first exon of the spinach sequence and the second exon from pea gives a calculated mass of 17369.7 Da. The mass difference is consistent with changing either Pro 8 or Pro 12 encoded in exon 1 of spinach petD to Ser. 4, the published pea psbT gene sequence appears to have an in-frame deletion that is currently being confirmed. d Protein IDs in quotes indicate assignments made by retention times consistent with membrane localization and similar masses from published sequences from other plants. IDs in italic type indicate that a tobacco IMT (not shown) was correlated with a published tobacco gene sequence and that the protein eluted at a similar time to the pea and spinach orthologs. O, oxygen; P, phosphate; Ac, acetyl; F, formyl; fM, N-formylmethionine. Five unassigned post-translationally modified masses are labeled in lower case letters (a-e) in order of elution. All accession numbers are from GenBank™, unless otherwise noted.
Table II footnotes: a IMTs assembled in order of elution during the secondary reverse-phase chromatographic elution. Grouped IMTs indicate co-elution. IMTs without a range were observed in only one of the experiments. b Calculated average mass (Da) of the uncharged assigned gene product shown in ID column. *, mass observed in Table I but with no published gene sequence from which to predict a mass. $, modification of a (*) mass observed in Table I. c Δ, % difference between expected and observed masses; X, Δ < 0.01%; Y, Δ = 0.01-0.02%; Z, Δ = 0.02-0.07%. 1, the mass and relative elution time of IMT 24347.4 Da are consistent with Lhcb3. The difference may be because of one of several possible single amino acid substitutions. 2, the published pea PsbT gene sequence appears to have an in-frame deletion (see text). 3, the IMT 6641.5 Da elutes at a time consistent with that predicted for Ycf9 (a conserved protein in plant chloroplast genomes of unknown function) and has similar mass. The discrepancy may be because of published errors in the DNA sequence or because of allelic differences in the cultivars used (Alaska, this study; Rosakrona, M27309 (15). Both isoforms are consistent with removal of fMet, acetylation of residue 2, and C-terminal removal of the last nine amino acids, and both have phosphorylated subpopulations. There are several single amino acid substitutions that could account for the 7-Da difference in isoform masses. Further experiments using various pea cultivars are necessary to confirm this observation.
Based upon results from the secondary gradient elution of appressed thylakoids from pea, electrospray-ionization spectra are reported for the four large core subunits of PS II. Pea D1 (PsbA) is apparently processed as in spinach, with N- and C-terminal trimming and N-acetylation (Fig. 3A; 38033.0 Da measured; 38033.5 Da calculated; Δ = 0.002%), and a lesser subpopulation is apparently phosphorylated (38118.0; 38113.5; 0.012%). As we observed in spinach under the same solvent conditions (3), no evidence was found for the palmitoylated isoform after elution with formic acid/isopropanol. The minor species at 38057.7 Da may correspond to a singly formylated form of D1 (38061.5 Da calculated; 0.011%). D2 (PsbD) is apparently predominantly non-acetylated after fMet removal (Fig. 3B; 39441.0; 39437.5; 0.009%) and has a phosphorylated isoform (39517.0; 39517.5; 0.001%). Two other minor species potentially correspond to the acetylated isoform (39487.0; 39479.5; 0.019%) and the acetylated phosphorylated isoform (39550.0; 39559.5; 0.024%) observed previously in spinach (13). The poorer mass agreement for the latter two species is presumably a consequence of the weaker ion currents resulting from the low abundance of these isoforms in the sample. The measured masses of pea D1 and D2 reported herein are lower than values reported previously (3,15), possibly because of allelic differences between the pea cultivars used (Table II). The other two large PS II core subunits are CP43 (PsbC) (Fig. 3C) and CP47 (PsbB) (Fig. 3D). Pea CP43 is trimmed by removal of 14 amino acids at the N terminus and is acetylated (50207.0; 50204.9; 0.004%), and a population is phosphorylated (50285.0; 50284.9; 0.0002%), in agreement with MS-MS data from N-terminal tryptic phosphopeptides of spinach CP43 (13). The major pea CP47 species is apparently N-acetylated after fMet removal (55897.0; 55902.0; 0.009%). Other heterogeneity may arise from oxidation, formylation, allelic sequence differences, or the existence of multiple copies of the gene in pea or spinach.
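The X, Y, and Z agreement classes defined in the table footnotes can be applied mechanically once Δ is computed. This short Python sketch assumes that Δ is expressed relative to the calculated mass and that a Δ of exactly 0.02% counts as Y and exactly 0.07% as Z, since the footnotes specify neither; the function names are hypothetical. The example pairs are pea D1 and D2 values quoted above.

```python
def delta_pct(observed, calculated):
    """Percent mass difference relative to the calculated mass (denominator is an assumption)."""
    return abs(observed - calculated) / calculated * 100.0


def agreement_class(observed, calculated):
    """Map a mass difference onto the X/Y/Z classes of the table footnotes."""
    d = delta_pct(observed, calculated)
    if d < 0.01:
        return "X"
    if d <= 0.02:
        return "Y"
    if d <= 0.07:
        return "Z"
    return None  # outside the ranges used for assignment


# (observed, calculated) pairs: pea D1, phosphorylated D1, acetylated phosphorylated D2
for obs, calc in [(38033.0, 38033.5), (38118.0, 38113.5), (39550.0, 39559.5)]:
    print(obs, calc, agreement_class(obs, calc))
```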
Post-translational modifications can provide useful functional information, because these are often regulatory. Scanning the intact mass tags for characteristic mass differences allows the identification of specific modifications. Phosphorylation increases mass by 80 Da, and several tags were indeed separated by this increment. IMTs identified as D1, D2, CP43, two Lhcbs, and PsbH were observed with the +80-Da tag, confirming their partial phosphorylation in vivo. Furthermore, a tag provisionally assigned as PsbT also exhibited the +80 tag, suggesting the presence of a previously unidentified thylakoid phosphoprotein. Moreover, PsbH had a second +80 tag, providing convincing evidence for a second phosphorylation site on spinach and pea PsbH in vivo, as matrix-assisted laser desorption/ionization time-of-flight analysis of spinach (16) and tobacco 4 has shown previously. Mass spectrometric sequencing provided evidence for phosphorylation of Thr 2 and Thr 4 on a tryptic peptide recovered from proteolyzed A. thaliana thylakoids matching the N-terminal sequence predicted for PsbH (17), consistent with dual phosphorylation of the intact PsbH protein. Molecular weight reconstructions of spinach PsbH are shown in Fig. 4, demonstrating the utility of LC-MS in examining physiological changes to the native covalent state of the protein under different light conditions. Labeling studies provided compelling evidence for phosphorylation of thylakoid polypeptides smaller than PsbH (18). Other common modifications that are observed include covalent oxidation of methionine (+16 Da) and formylation as a result of formic acid treatment (+28 Da), as well as non-covalent adducts such as trifluoroacetic acid (+113 Da). IMTs arising from these adducts were ignored unless considered relevant to the in vivo state, i.e. reproducible oxidation.
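Scanning IMTs for characteristic mass differences can be sketched as a pairwise comparison against a table of increments. This minimal Python sketch uses the nominal increments named in the text (average-mass values differ slightly, e.g. about +79.98 Da for phosphate), adds the nominal +42 Da for acetylation, checks single and double additions so that doubly phosphorylated forms such as those of PsbH are caught, and uses an absolute tolerance in daltons; the tolerance value and function name are assumptions. At around 40 kDa, a 0.01% error is already about 4 Da, so wide windows would confuse +28 with +32 and the hits are candidates only.

```python
# Nominal mass increments (Da) for modifications and adducts discussed in the text.
INCREMENTS = {
    "phosphorylation": 80.0,
    "Met oxidation": 16.0,
    "formylation": 28.0,
    "+32 oxidative adduct": 32.0,
    "acetylation": 42.0,
    "TFA adduct": 113.0,
}


def find_mass_shifts(masses, tol=2.0, increments=INCREMENTS):
    """List (lower, higher, label) for IMT pairs whose difference matches k x increment.

    k is 1 or 2 (single or double addition); matches are within tol Da.
    """
    masses = sorted(masses)
    hits = []
    for i, lo in enumerate(masses):
        for hi in masses[i + 1:]:
            diff = hi - lo
            for name, inc in increments.items():
                for k in (1, 2):
                    if abs(diff - k * inc) <= tol:
                        hits.append((lo, hi, f"{k} x {name}"))
    return hits


# Spinach D1 species quoted in the Results: unmodified, phosphorylated, palmitoylated.
print(find_mass_shifts([38025.0, 38104.0, 38259.0]))
# [(38025.0, 38104.0, '1 x phosphorylation')]
```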
DISCUSSION
Proteomics seeks to reconcile protein data with genomic data to provide functional insights across the entire complement of expressed information. Most proteomics studies have focused upon identification of proteins after their fragmentation into small pieces that are subsequently matched to segments of genes. For example, the proteins of the thylakoid lumen have been separated by one- or two-dimensional gel electrophoresis, and stained spots were identified by use of peptide-mass tags, sequence tags, and N-terminal Edman sequence data (19-21).
FIG. 3. Large PS II membrane proteins elute efficiently during secondary elution of the reverse-phase column. Membrane proteins that elute with low efficiency during the primary elution (Buffers A and B; see "Experimental Procedures") accumulate on the column and are eluted with a secondary buffer system (Buffers D and E; see "Experimental Procedures"). This subfractionation technique allowed electrospray-ionization mass spectrometry of the larger PS II subunits from the cruder appressed membrane preparation, avoiding the need to prepare reaction-center subfractions as used in Fig. 2. Note, however, that a palmitoylated isoform of pea D1 was not resolved. Zero-charge molecular weight spectra were prepared as for Fig. 2. A, pea D1; B, pea D2; C, pea CP43; D, pea CP47.
4 Unpublished data.
Using an alternative approach, we have compiled a list of intact proteins detected in Triton-resistant thylakoid-membrane subfractions enriched in PS II (10) using LC-MS. The list of IMTs defines the appressed thylakoid proteome, at least for proteins of sufficient abundance to permit detection. Each IMT has been examined carefully by comparison of measured mass to masses calculated from published gene sequences and predicted elution time from the HPLC column based on predicted hydrophobicity. Based upon these comparisons, the identity of many of the IMTs has been proposed thereby allowing consideration of proteome coverage with respect to that predicted based upon genomic information and the body of the current knowledge of protein modification.
Because complete genome information is not available for spinach or pea, the A. thaliana genome was examined (23). The predicted total number of A. thaliana chloroplast proteins less than 40 kDa is presented in Table III. The TagIdent program identified more than 700 proteins that are predicted to be less than 40 kDa. The chloroplast subset of these proteins contains 133 members, 51 of which are known to be associated with the thylakoid membranes (either intrinsically or extrinsically). The search performed in July 2001 is an underestimate of the number of proteins in A. thaliana under 40 kDa because of incomplete annotation of the A. thaliana genomes. Several chloroplast proteins encoded in the nucleus (i.e. PsaH, PsbT, PsbW, and several light-harvesting proteins) were not returned by the TagIdent searches, because the chloroplast-targeting peptide is incorrectly annotated in a large number of the genes in the genomes. The mass range of several of these incorrectly annotated proteins is greater than 40 kDa, even though the processed protein is substantially less than 40 kDa. No attempt was made to correct the annotations that we knew to be wrong, because it would bias the results in Table III toward thylakoid proteins. The range of hydrophobicities of the proteins in Table III is illustrated in Fig. 5. The generalized order of elution from the reverse-phase HPLC column shown in Fig. 1 is presented below the GRAVY index (grand average of hydropathicity) to compare the relative elution time with the predicted hydrophobicity. The GRAVY statistic provides a reasonable indication of HPLC retention but tends to underestimate hydrophobicity if the protein has many buried charged residues, as in the case of the light-harvesting proteins, which have 15 charged amino acid residues within the bilayer (24). Retention of elements of secondary structure could also lead to higher retention times than predicted as polypeptide backbone polarity is minimized.
Not all IMTs are easy to assign; some are ambiguous and others remain unassigned. In some cases modest alterations to a published gene sequence must be made to permit assignment. Such changes are based upon extreme conservation of specific residues across a wide range of sequences published for a variety of species. 3 Of course, all assignments rest on the coincidence of an intact mass with a calculated value, so rigorous identification may still be required. This can be accomplished by splitting the effluent line between the UV detector and the mass spectrometer and directing a proportion of the flow to a fraction collector for analysis by peptide mass fingerprints or sequence tags (LC-MS+; see Refs. 31 and 32). For example, LC-MS+ was used to identify a ninth component of chloroplast thylakoid cytochrome b6f complexes as ferredoxin:NADP+ oxidoreductase (33,34). In another experiment, the spinach PsbF protein was sequenced by MS-MS 5 confirming that RNA editing results in a change of residue 26 from Ser to Phe (35), in agreement with MS studies of intact spinach and pea PsbF (3,36). In the case of the IMT assigned to pea PsbH (7697.3 ± 1 Da), divergence from the calculated mass (7727.0 Da) is evident. Although peptide sequence obtained from MS-MS (not shown) shows that residue 18 is actually Val, rather than Ala as translated from the published DNA sequence (37), this change is insufficient to account for the difference between the observed and calculated masses, and thus there are more changes to be found. Acetylation and phosphorylation of the N termini of several thylakoid membrane proteins were identified in peptides isolated by immobilized metal ion chromatography using MS-MS (13,38). Clearly, complete annotation of genomic data will require detailed structural analysis by mass spectrometry.
One approach to monitoring the thylakoid "phosphorylome" involves treatment of thylakoid membranes with trypsin, enrichment of phosphopeptides by immobilized metal ion chromatography, and their quantification by LC-MS-MS (17). However, the yield of these peptides can be influenced by trypsin accessibility, and thylakoid protein conformation is known to be affected by phosphorylation (39,40); furthermore, there is uncertainty about the efficiency of recovery of low-abundance phosphopeptides by the immobilized metal ion chromatography procedure. 6 Therefore, care must be exercised in the interpretation of results obtained by this experimental method. An improvement would be to place such methodology downstream of the MS profile of the intact protein. For example, the oxidation of PsbH (see Fig. 4) would alter the mass of a peptide such that it may no longer be counted (unless procedures specifically designed to identify such species are incorporated). Of course, these detailed characterizations will be essential to clarify the relationship between membrane protein oxidation and phosphorylation.
The appearance of +32-Da adducts on phosphorylated forms of PsbH (Fig. 4B) suggests oxidative addition of dioxygen, for example via sulfone or endoperoxide formation. Adducts of +32 Da were also observed when peptides derived from light-treated PS II reaction centers were analyzed (41), and they contrast with the +16-Da adducts that typically result from oxidation of methionine to its sulfoxide in aqueous buffers (6). Furthermore, the most highly phosphorylated forms of PsbH appear also to be the most highly modified with +32 adducts, suggesting a link between the two phenomena. Detailed investigations of oxidative damage to proteins in biological membranes under physiological conditions are necessary.
CONCLUSION
Electrospray-ionization mass spectrometry, coupled with liquid chromatography, is being used (3,4,32-34,41-44) to analyze intact thylakoid membrane proteins. LC-MS experiments provide an attractive means to monitor physiological changes in covalent status across the entire complement of thylakoid proteins, and in subfractions from different membrane domains, as a function of light and other stresses, providing significant benefits to functional genomics. Although it has been possible to propose identities of many thylakoid membrane proteins based upon coincidence of measured with calculated masses, it may be impractical to do so where there is a much greater variety of gene products (humans, for example) with more extensive modifications, especially glycosylation. However, the incorporation of fraction collection concomitant with intact mass measurements (LC-MS+) provides a convenient solution to the problem.