Molecular & Cellular Proteomics 2:1253-1260, 2003.
© 2003 by The American Society for Biochemistry and Molecular Biology, Inc.



From the Department of Chemistry and Chemical Biology and the Department of Plant Biology, Cornell University, Ithaca, New York 14853
| ABSTRACT |
|---|
Of the 3000 proteins predicted for the chloroplast of Arabidopsis, 97 have been identified as the precursors of isolated proteins in two studies (5, 6) utilizing the bottom-up mass spectrometry (MS)1 technique (8–10); the study in the laboratory of one of the present authors (K. J. vW.) identified 81 proteins (5). However, this technique is far less effective for characterization of the primary structure of an isolated protein, such as determining the cleavage site for loss of the signal peptide and locating posttranslational modifications, as well as distinguishing highly similar proteins. For this, the recently developed technique of "top-down" MS has uniquely valuable attributes (11–18). Here we directly compare the two MS methods for both the identification and characterization of seven chloroplast proteins.
In the common bottom-up approach the protein is first purified and cleaved into peptides (e.g. with trypsin), whose relative molecular weight (Mr) values are measured by MS (8–10). The resulting "peptide mass fingerprint" is matched against those expected from the DNA-predicted proteins to identify the precursor. MS/MS of individual peptides can provide more specific information for identifications of higher confidence, especially valuable for proteins of lower purity (19). However, characterizing the structure of the isolated protein, i.e. identifying all differences between the predicted and actual protein sequence, requires Mr or MS/MS data from peptides representing all of the protein, which is made difficult by the typical sequence coverage of 15–40% (5, 8, 9, 20). Further, peptide mass differences thought to be due to RNA editing, alternative splicing, signal peptide cleavage, and posttranslational modifications could also be due to impurities or self-proteolysis (1, 2, 5). The signal peptide cleavage site may be determined by an additional step of Edman degradation (5) or predicted theoretically (21).
However, the high identification value of the well established bottom-up method has led to its convenient automation for routine samples.
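The peptide mass fingerprint matching described above can be sketched in Python. This is a minimal illustration and not the search software used in the cited studies: the function names, the 0.5-Da tolerance, and the test sequences are hypothetical. It assumes the common trypsin rule (cleavage after Lys or Arg, except before Pro) and standard monoisotopic residue masses.

```python
import re

# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per free peptide (H on N terminus, OH on C terminus)


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (needs Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def fingerprint_score(observed, sequence, tol=0.5):
    """Count observed masses that match any predicted tryptic peptide mass."""
    predicted = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - p) <= tol for p in predicted) for m in observed)


def best_match(observed, database, tol=0.5):
    """Return the name of the database entry with the most matching peptides."""
    return max(database, key=lambda name: fingerprint_score(observed, database[name], tol))
```

A score of this kind only counts matching masses; it says nothing about the parts of the protein not covered by any peptide, which is the coverage limitation noted above.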
In the alternative top-down approach (11–18), careful purification is not required, and the protein mixture without digestion is introduced directly into the Fourier transform MS instrument (22) using electrospray ionization (ESI) (23, 24). Application to trace level proteins is illustrated here with a 5% mixture component; top-down characterization of 1% mixture components has been reported (17, 18). The resulting mass spectrum shows accurate Mr values for the proteins present in the mixture. After high resolution separation of the molecular ions of an individual protein, these are dissociated (25–32), and the resulting fragment masses are matched against the DNA-predicted sequence to identify the protein. If its Mr value, minus that of the predicted signal peptide lost, does not match the measured value, such discrepancies in the fragment masses can then locate multiple modifications or sequence alterations (11–18). This approach is applied here to find accurate (±1 Da) Mr values for 22 proteins and to identify and characterize seven proteins, all from the three soluble proteomes (thylakoid peripheral, thylakoid lumen, and stroma) of the chloroplast of A. thaliana identified previously by bottom-up MS (5, 6).
| EXPERIMENTAL PROCEDURES |
|---|
Size Exclusion Chromatography (SEC) and Ion Exchange Chromatography
Thylakoid lumen proteins were separated by SEC on a 50 x 0.7-cm column (BioRad) packed with Sephacryl 200 (Amersham Biosciences) using 50 mM Tris-HCl (pH 7.8) with 0.1 M NaCl as a mobile phase at 55 µl·min-1 at 4 °C. Fractions were concentrated to 35 mg of protein·ml-1. Thylakoid peripheral proteins were fractionated by ion exchange chromatography on a Q Hi-trap bullet column (Amersham Biosciences) and eluted with a linear gradient of NaCl (0–1 M). Native stroma extract was loaded on a 100 x 1.2-cm SEC column (BioRad) packed with Sephacryl 500 (Amersham Biosciences) and separated using 50 mM Tris-HCl (pH 7.8) as a mobile phase at 0.5 ml·min-1 (4 °C). The rubisco-containing fraction, eluting as a 550-kDa complex, was denatured in 6 M urea and 20 mM dithiothreitol, and the resulting small and large subunits were separated by SEC as described for thylakoid lumen.
Proteolysis
Protein digestions with Lys-C, Glu-C, and Asp-N (Roche Applied Science) were performed according to the manufacturer's instructions. Digestion with Glu-C was carried out for 30 min at 25 °C, Arg-C for 2 h at 37 °C, and Asp-N for 4 h at 37 °C.
MS Analysis
Samples were desalted on a reverse-phase protein trap (Michrom Bioresources Inc.), washed with 2 ml of 0.1:99:0.5 MeCN/H2O/CH3COOH, and eluted with 150 µl of 50:45:5 MeCN/H2O/CH3COOH. This eluent was loaded into a nanospray ESI emitter with a 24-µm inner diameter tip with 1.0–1.5 kV versus the MS inlet, producing a flow rate of 20–300 nl·min-1. The resulting ions were guided into the ion cell (10^-9 torr) of a modified 6 T Finnigan FTMS device (22). Fragmentation was achieved by "in-beam"-activated ion electron capture dissociation (ECD) (31), plasma ECD (32), or infrared multiphoton dissociation (IRMPD) (26) for ions entering the FTMS cell or by isolating specific ions in the cell using stored waveform inverse Fourier transform (SWIFT) (33) followed by IRMPD or collisionally activated dissociation (CAD) (25, 27). Short sequence tags deduced from the fragment masses were used to search the public database of A. thaliana (34). Alternatively, the unprocessed spectral data were searched using the ProSight search engine,2 specifically developed for top-down FTMS by Kelleher and co-workers (16, 35). Assignments of fragment masses were made with the computer program THRASH (36). The mass difference (in units of 1.00235 Da) between the most abundant isotopic peak and the monoisotopic peak is denoted in italics after each Mr value.
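The Mr notation defined in the last sentence can be unpacked with a short Python helper. This is a sketch under one stated assumption: that the quoted Mr value (for example 16,309.7-10) is the mass of the most abundant isotopic peak, so that subtracting the offset times 1.00235 Da gives the monoisotopic mass. The function names are hypothetical.

```python
ISOTOPE_SPACING = 1.00235  # Da per isotopic unit, as stated in the text


def parse_mr(label):
    """Split a label such as '16,309.7-10' into (mass, isotopic offset)."""
    mass_text, offset_text = label.rsplit("-", 1)
    return float(mass_text.replace(",", "")), int(offset_text)


def monoisotopic_mass(label):
    """Monoisotopic mass implied by the label, assuming the quoted mass
    is that of the most abundant isotopic peak."""
    mass, offset = parse_mr(label)
    return mass - offset * ISOTOPE_SPACING
```

For the label 16,309.7-10, the helper subtracts 10 x 1.00235 Da from 16,309.7 Da.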
| RESULTS |
|---|
[Figure legend fragments recovered from the Results section: "... 20,212 appear in all three samples."; "B ... C represent MS3."; "... HR- cleavages from ECD (28–30), allowing assignment of the protein terminus (N or C) contained by fragment ions from dual cleavages (38)."]
| DISCUSSION |
|---|
Visualization of Proteins in Crude Samples
Electrophoretic separations are conventionally employed for the qualitative assessment of the protein content of the sample. For this the ESI mass spectrum is quite complementary. For example, the Mr values from the thylakoid peripheral proteins of 11,701-7, 16,309-10, 20,211-12, and 26,567-16 correspond to electrophoretic bands (Fig. 1) and provide accurate (usually ±1 Da) molecular weight information. However, with the broad range of Mr values in this sample, the heavy (>45 kDa) proteins shown by electrophoresis do not appear in the ESI spectrum.
Direct Component Identification
For the thylakoid peripheral sample, the SDS-PAGE analysis indicates an approximately 16-kDa protein in possibly 5% concentration (Fig. 1A). SWIFT (33) ejection of all other ions from the cell isolated the corresponding 16,309.7-10 isotopic cluster with fair signal/noise (Fig. 1). CAD of these ions gave detectable high mass products corresponding to sequential mass losses of 75.0 (16,310.0 - 16,235.0), 113.1, 128.1, and 71.0 Da (Fig. 1B). These differences are indicative of C-terminal Gly (57.0) + H2O, Leu/Ile (113.1), Lys (128.095) or Gln (128.06), and Ala (71.0). Searching the DNA-predicted protein sequences (34) for one with this C-terminal sequence found correspondence only for At4g21280 (34) (photosystem II oxygen-evolving complex) with its C-terminal sequence HO-Gly-Leu-Lys-Ala. However, this protein would have instead Mr = 16,123.4-10 after the predicted removal of the signal peptide (5), substantially lower than the measured value of 16,310-10 (±1 Da). To check this discrepancy, the dominant 16,122-10-Da isotopic peaks in the CAD MS/MS spectrum were of sufficient abundance to be fragmented again by CAD (MS3, Fig. 1C); this gave sequential terminal losses of 186.0, 113.2, 87.1, 113.2, and 128.1. These must be from the N terminus, as the next predicted losses from the C terminus of this fragment ion would be Lys (128 Da) and Ala (71 Da) (Fig. 1B). The predicted N terminus of At4g21280, after loss of the signal peptide, matches only part of the observed sequence tag, Ile (113.1 Da)-Ser (87.0)-Ile-Lys (128.1); thus, the protein N terminus is actually longer by a 186-Da fragment. The signal peptide loss did not include the next two predicted residues, Ala-Asp, whose mass sum of 186.1 Da corresponds to the 186-Da difference of the MS3 spectrum. Inclusion of these residues gives a predicted Mr value of 16,309.5-10, in close agreement with the measured value of 16,309.7-10.
Although the molecular ion C terminus showed a high tendency for amino acid loss in MS2, the new C-terminal group of this product ion (39) has apparently stabilized this end so that further MS3 dissociation occurs predominantly at the N terminus. Thus, MS2 correctly identified the protein from among those predicted, but the accurate Mr value showed that the predicted sequence was incorrect, with the complementary sequence information of MS3 pinpointing the discrepancy to the N-terminal signal peptide cleavage.
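The two reasoning steps used for At4g21280 (reading residues from successive mass losses, then testing whether extra upstream residues explain the Mr discrepancy) can be sketched in Python. This is an illustration only, not the ProSight or THRASH procedure: the function names and tolerances are hypothetical, residue masses are rounded standard values, and the order in which the upstream residues are added (Asp before Ala) is an assumption based on the Ala-Asp order given in the text.

```python
# Rounded residue masses (Da); Leu and Ile are isomers and cannot be told apart by mass.
RESIDUE = {
    "Gly": 57.02, "Ala": 71.04, "Ser": 87.03, "Pro": 97.05, "Val": 99.07,
    "Thr": 101.05, "Cys": 103.01, "Leu/Ile": 113.08, "Asn": 114.04,
    "Asp": 115.03, "Gln": 128.06, "Lys": 128.09, "Glu": 129.04,
    "Met": 131.04, "His": 137.06, "Phe": 147.07, "Arg": 156.10,
    "Tyr": 163.06, "Trp": 186.08,
}
WATER = 18.01


def candidates(delta, tol=0.15):
    """Residues whose mass matches a measured mass difference within tol."""
    return [name for name, mass in RESIDUE.items() if abs(delta - mass) <= tol]


def extend_n_terminus(predicted_mr, measured_mr, upstream, tol=1.0):
    """Add upstream residues (nearest the predicted cleavage site first)
    until the predicted Mr agrees with the measured Mr within tol."""
    total = predicted_mr
    for count, residue in enumerate(upstream, start=1):
        total += RESIDUE[residue]
        if abs(total - measured_mr) <= tol:
            return count, round(total, 1)
    return None


# MS2 losses quoted in the text; the first loss also carries the C-terminal water.
ms2_losses = [75.0 - WATER, 113.1, 128.1, 71.0]
tag = [candidates(loss) for loss in ms2_losses]

# Predicted Mr after signal peptide removal versus the measured value.
extension = extend_n_terminus(16123.4, 16309.7, ["Asp", "Ala"])
```

At 0.1-Da precision the 128.1-Da loss remains ambiguous between Gln and Lys, which is why the tag alone lists both possibilities.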
Identification in SEC Fractions
Further concentration of the protein components can increase the amount of MS/MS sequence data and make such top-down identification more straightforward. In SEC fraction 15 from the thylakoid lumen sample, only six protein molecular ions (each in multiple charge states and m/z values) are significant (Fig. 2). CAD of the isolated 16+ ions of Mr = 16,348.8-10 shows a sequence tag of mass differences (Fig. 4) of 97.4 (Pro), 113.0 (Leu/Ile), 128.1 (Lys/Gln), 98.8 (Val), 114.0 (Gly2 or Asn), 96.9 (Pro), and 210.3 Da (Leu/Ile + Pro). Of the predicted proteins, the only match found in the database was At4g05180 (34) (also important for oxygen evolution) with the residues 3–11 sequence of Pro-Ile-Lys-Val-Gly-Gly-Pro-Leu-Pro but with Mr = 16,149.3-10 after the predicted removal of the signal peptide (5). The Mr value difference of 200.5 Da corresponds to the sum of the masses of the next amino acid residues in the DNA-predicted sequence (before signal peptide loss), Glu and Ala (129.0 + 71.0 Da), showing that the mature protein also contains these N-terminal amino acids. In total, CAD produced four N-terminal b ions and 19 C-terminal y ions, confirming the predicted sequence and the absence of posttranslational modifications. Although the bottom-up methodology correctly identified the DNA-predicted precursor protein, top-down fully characterized the primary structure of the isolated protein.
For the Mr = 20,211.3-10 protein, IRMPD of the 18+ ions (Fig. 2) and CAD of the 18+ and 19+ ions yielded extensive sequence information (Fig. 5) that not only included a "tag" region (Ser, Glu)-Gly-Gly-Phe-(Asp, Asn2, Ala)-Val-Ala that matched the predicted protein At1g06680 (34) (oxygen evolution) but also gave a total of 27 fragment ions fully supporting this sequence without posttranslational modifications. Although this is also indicated by the match with the predicted Mr value of 20,212.4-10, cases are known in which an exact (±1 Da) Mr match gave an incorrect retrieval (15, 18). Here the bottom-up identification has been confirmed by top-down, but with far higher reliability.
Characterization of Proteins with Overlapping Molecular Ions
An important component of the thylakoid peripheral sample gives isotopic clusters corresponding to Mr = 26,567.9-16 in the ESI spectrum (Fig. 1), although the measured isotopic abundance distribution is significantly broader than that expected theoretically (Fig. 6, top left) for a single component (see other predicted/experimental matches in Figs. 1 and 2). IRMPD spectra of the isolated 26+, 27+, and 28+ ions gave a mass-difference sequence tag of (Asp, Tyr)-Ala-Ala-Val-Thr-Val-(Gln, Leu) that matched (Fig. 6) regions of two proteins, At5g66570 (34) (Mr = 26,565.3-16) and At3g50820 (Mr = 26,571.3-16); these paralogues of OEC33 are important for oxygen evolution. However, the actual mass values of the five b and eight y ions defining the sequence tag only matched those expected for Mr = 26,565.3-16. Using also the complementary MS/MS technique of electron capture dissociation, CAD and ECD (in-beam and plasma) spectra of the 26,567.9-16 isotopic cluster ions gave a total of 40 fragment ions that could be formed only from the 26,565.3-16 protein (Fig. 6), plus 33 more that could come from both proteins. However, nine fragment ions were also found that would be formed only by the 26,571.3-16 protein, clearly showing the presence of both paralogues. The bottom-up study found only one two-dimensional gel spot, whose proteolysis gave peptides consistent with either of the DNA-predicted precursors (5).
To compare the capabilities of the bottom-up methodology to identify these highly similar proteins using the higher FTMS performance, the protein mixture was concentrated further and subjected to partial proteolysis, and the ESI/FTMS spectrum was measured for the resulting peptide mixture. Glu-C digestion after SEC separation gave 72 peptide Mr values, average mass of 9695 Da, of which eight are assignable to 26,565.3-16 protein and four to 26,571.3-16 protein. Arg-C digestion after separation by ion exchange chromatography gave 110 peptide Mr values, average mass of 13,472 Da, of which the only assignable were two for 26,565.3-16 protein. Asp-N digestion after SEC separation gave 92 Mr values, average mass of 4365 Da, of which 14 matched the 26,565.3-16 protein and four matched the 26,571.3-16 protein. Thus, in total this bottom-up approach would have given eight Mr values that indicated the presence of the second protein, but with >80% unassignable Mr values as well as requiring further sample purification and three sample-consuming proteolytic procedures.
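The totals quoted in this paragraph can be checked with a few lines of Python. The counts are taken directly from the text; the dictionary layout and the function name are hypothetical.

```python
# Peptide Mr values per digest, and how many were assignable to each paralogue.
# "first" is the 26,565.3-16 protein, "second" the 26,571.3-16 protein.
DIGESTS = {
    "Glu-C": {"total": 72, "first": 8, "second": 4},
    "Arg-C": {"total": 110, "first": 2, "second": 0},
    "Asp-N": {"total": 92, "first": 14, "second": 4},
}


def tally(digests):
    """Return (total peptides, assigned to first, assigned to second,
    fraction of peptide Mr values left unassigned)."""
    total = sum(d["total"] for d in digests.values())
    first = sum(d["first"] for d in digests.values())
    second = sum(d["second"] for d in digests.values())
    unassigned_fraction = (total - first - second) / total
    return total, first, second, unassigned_fraction


total, first, second, unassigned_fraction = tally(DIGESTS)
```

The sums reproduce the eight Mr values for the second protein and an unassigned fraction above 80%.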
Relative Amounts of Proteins with Overlapping Molecular Ions
With MS/MS characterization of these unmodified paralogues with Mr = 26,565.3 and 26,571.3, the observed abundances of the isotopic peak cluster can now be deconvoluted into two predicted abundance distributions around these Mr values whose sum fits the observed distribution. A 3:1 ratio of the paralogues gives the fit shown in Fig. 6, upper right. In qualitative agreement, MS/MS gave a 40:9 ratio of the number of fragment ions uniquely originating from the paralogues, whereas proteolysis with three enzymes gave a 24:8 ratio of peptides uniquely originating from them.
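The deconvolution idea can be illustrated with a two-component least-squares fit in Python. This is a sketch under loud assumptions: each isotopic envelope is replaced by a Gaussian of hypothetical width (4 Da) rather than the theoretical isotopic distribution the authors would have used, and the "observed" data are synthetic, generated with a 3:1 ratio to show that the fit recovers it.

```python
import math


def envelope(center, sigma, masses):
    """Gaussian stand-in for an isotopic abundance distribution (an assumption)."""
    return [math.exp(-0.5 * ((m - center) / sigma) ** 2) for m in masses]


def fit_two_components(observed, shape_a, shape_b):
    """Least-squares weights (a, b) such that observed ~ a*shape_a + b*shape_b,
    obtained by solving the 2 x 2 normal equations directly."""
    saa = sum(x * x for x in shape_a)
    sbb = sum(y * y for y in shape_b)
    sab = sum(x * y for x, y in zip(shape_a, shape_b))
    sao = sum(x * o for x, o in zip(shape_a, observed))
    sbo = sum(y * o for y, o in zip(shape_b, observed))
    det = saa * sbb - sab * sab
    a = (sao * sbb - sbo * sab) / det
    b = (sbo * saa - sao * sab) / det
    return a, b


# Synthetic example: two envelopes centred on the Mr values from the text.
masses = [26550 + i for i in range(40)]
shape_a = envelope(26565.3, 4.0, masses)  # hypothetical width
shape_b = envelope(26571.3, 4.0, masses)
observed = [3.0 * x + 1.0 * y for x, y in zip(shape_a, shape_b)]
a, b = fit_two_components(observed, shape_a, shape_b)
```

With real spectra the fitted weights would carry noise, and negative weights would signal that the assumed pair of components does not describe the cluster.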
Characterization of Posttranslational Modifications
MS indicated the stromal protein sample to be the most complex; of 14 Mr values, seven are between 13,182-8 and 14,810-9 (Fig. 3), a challenge for the protein purification used in the bottom-up approach. Of these proteins, the two most abundant show Mr = 14,712.2-9 and 14,810.4-9. Separate CAD spectra of their isolated 14+ molecular ions gave identical extensive sequence tags Cys-Leu/Ile-Ser-Phe-Leu/Ile-Ala-Tyr-Gln/Lys, although these identical mass differences were derived from non-identical fragment ion mass values. Matching of this tag versus the DNA-predicted sequence possibilities identified two paralogues, At5g38410 and At1g67090 (34), of the small subunit of rubisco that is involved in the CO2 fixation. The bottom-up studies only identified At1g67090 (5, 6).
However, for these closely related proteins, the predicted Mr values after signal peptide loss are 14 units low. To examine these discrepancies with a more concentrated sample, the 550-kDa rubisco complex (40) was isolated by SEC with SDS-PAGE identification and denatured, and the corresponding small subunit was isolated by SEC. Its ESI mass spectrum (Fig. 7A) is now dominated by these two components, and dissociation of these ions with IRMPD (Fig. 7B) and in-beam ECD gave product ions representing cleavage of 47 of 125 bonds in At5g38410 and 54 of 124 bonds in At1g67090 (Fig. 3). All of the N-terminal fragment ion masses are 14 Da higher than predicted; the smallest of these in both proteins limits the +14-Da modification to the first four residues. All of the C-terminal fragment masses agree with the predicted values; the +14-Da modification is limited in At5g38410 to the first two N-terminal residues by the y124 ions and to the first six residues in At1g67090 by the z119 ions.
The b18 ions from the two precursors, as well as the other N-terminal fragments of <22 residues, are of the same nominal mass (isobaric); the Lys-2 (128.095 Da) of At5g38410 is replaced by Gln (128.06 Da) in At1g67090 (Fig. 3). IRMPD of the mixed Fig. 7A ions produced the mixed b18 (2161.2 Da) in sufficient abundance so that MS3 dissociation gave a significant abundance of 2016.2-Da fragment ions (Fig. 7C). The difference of 145.0 Da demonstrates that the N-terminal Met (131.0 Da) has the 14-Da modification. Other peaks representing further losses of 128 (Lys or Gln), 99 (Val), and 186 Da (Trp) confirm the presence of the extra 14 Da. It is conceivable that these fragments originated from only one of the two b18 precursors; if the other contained no extra 14-Da group in its first four N-terminal residues, any corresponding fragment ions would be 14 Da higher in mass (vertical arrows, Fig. 7C). The absence of these peaks, as well as the expected similar fragmentation behavior of the two b18 ions, makes it highly probable that both have the extra N-terminal 14 Da. An extra methyl group is the most logical explanation, with N-methylation far more probable than elsewhere on the terminal Met. This rare posttranslational modification was recently found for the first time in plants, also in small rubisco subunits, with the N-terminal methylation characterized by electron ionization MS (41) of the first residue from Edman degradation. Again, MS3 has provided valuable sequence information at the fragment ion terminus farthest from the original cleavage that formed the MS2 fragment ion.
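The arithmetic behind the assignment of the extra 14 Da to the N-terminal Met can be sketched in Python. The fragment masses are those quoted in the text; the table of modification shifts (standard monoisotopic values for a few common modifications), the tolerance, and the function name are illustrative assumptions, and the list is far from complete.

```python
# Monoisotopic mass shifts (Da) of a few common modifications.
MOD_SHIFTS = {
    "methylation": 14.016,
    "oxidation": 15.995,
    "acetylation": 42.011,
    "phosphorylation": 79.966,
}
MET_RESIDUE = 131.040  # monoisotopic Met residue mass (Da)


def explain_loss(parent, fragment, residue_mass, tol=0.1):
    """Return the modifications whose mass shift accounts for the part of the
    observed loss (parent - fragment) that exceeds the bare residue mass."""
    excess = (parent - fragment) - residue_mass
    return [name for name, shift in MOD_SHIFTS.items() if abs(excess - shift) <= tol]


# b18 ion (2161.2 Da) and its MS3 product (2016.2 Da), as quoted in the text.
matches = explain_loss(2161.2, 2016.2, MET_RESIDUE)
```

A mass match of this kind identifies the shift but not its position on the residue; the text's preference for N-methylation rests on chemical plausibility and the cited precedent.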
Conclusions
For the present bottom-up and top-down methodologies, the automation of the former makes it the better choice for the first identifications of the precursor proteins from a genome such as Arabidopsis. Bottom-up identified 97 proteins, whereas this study found 22 different protein molecular ions.
However, for the seven of these proteins studied further, the top-down approach has shown unique capabilities for characterization of a complex eukaryotic proteome. Extensive protein purification and enzymatic digestion are not required; an approximately 5% component of the thylakoid peripheral sample was directly characterized by MS2 and MS3. The accurate mass values, ±1 Da or better, of the protein fragment ions can provide sequence tags for direct identification. Even higher mass accuracy has been demonstrated with FTMS (42), such as that sufficient for distinguishing the isobaric fragment ions containing Lys-2 (128.095 Da) of At5g38410 versus Gln-2 (128.06 Da) of At1g67090. Agreement between the Mr value of a predicted protein and the measured Mr value is a strong indication of the absence of posttranslational modifications. However, of the seven intact proteins here characterized directly, all but one, At1g06680, showed different experimental and predicted Mr values. For proteins At4g21280 and At4g05180, MS/MS and three-stage MS product ions corrected the predicted cleavage site for the loss of the signal peptide. Such data also identified two highly similar proteins differing in mass by only 6 Da (for which bottom-up (5, 6) only indicated that one or both were present); deconvolution of their isotopic peaks indicated relative amounts of 3:1. Finally, two similar proteins (of which only one was identified by bottom-up (5, 6)) were found to be posttranslationally methylated, and the unusual N-terminal site of the modification was pinpointed by three-stage MS. Although none of these successful primary characterizations required >50% sequence coverage in the MSn experiment, more extensive efforts with ECD, CAD, and IRMPD have cleaved 250 of 258 interresidual bonds in bovine carbonic anhydrase (43) with a single plasma ECD spectrum showing 183 different cleavages (32). Of particular importance for the future routine application of this top-down methodology to plant proteomes is the development of automated methods for sample separation, MSn, and data analysis, as pioneered by the Kelleher laboratory (16, 35).
| FOOTNOTES |
|---|
Published, MCP Papers in Press, September 22, 2003, DOI 10.1074/mcp.M300069-MCP200
1 The abbreviations used are: MS, mass spectrometry; MS/MS, tandem mass spectrometry; MS2, tandem mass spectrometry; MS3, three-stage mass spectrometry; MSn, multistage mass spectrometry; ESI, electrospray ionization; FTMS, Fourier transform mass spectrometry; SWIFT, stored waveform inverse Fourier transform; CAD, collisionally activated dissociation; IRMPD, infrared multiphoton dissociation; ECD, electron capture dissociation; SEC, size exclusion chromatography; rubisco, ribulose-bisphosphate carboxylase/oxygenase.
2 The Kelleher Group (Neil L. Kelleher and colleagues), ProSight PTM at prosightptm.scs.uiuc.edu/.
* This work was supported by Grant MCB 0090942 from the National Science Foundation (to K. J. vW.) and Grant GM16609 from the National Institutes of Health (to F. W. M.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¶ To whom correspondence should be addressed. Tel.: 607-255-4699; Fax: 607-255-7880; E-mail: fredwmcl{at}aol.com