Characterization of Protein Variants and Post-Translational Modifications: ESI-MSn Analyses of Intact Proteins Eluted from Polyacrylamide Gels*

We have developed a strategy to characterize protein isoforms, resulting from single-point mutations and post-translational modifications. This strategy is based on polyacrylamide gel electrophoresis separation of protein isoforms, mass spectrometry (MS) and MSn analyses of intact proteins, and tandem MS analyses of proteolytic peptides. We extracted protein isoforms from polyacrylamide gels by passive elution using SDS, followed by nanoscale hydrophilic phase chromatography for SDS removal. We performed electrospray ionization MS analyses of the intact proteins to determine their molecular mass, allowing us to draw hypotheses on the nature of the modification. In the case of labile post-translational modifications, like phosphorylations and glycosylations, we conducted electrospray ionization MSn analyses of the intact proteins to confirm their presence. Finally, after digestion of the proteins in solution, we performed tandem MS analyses of the modified peptides to locate the modifications. Using this strategy, we have determined the molecular mass of 5–10 pmol of a protein up to circa 50 kDa loaded on a gel with a 0.01% mass accuracy. The efficiency of this approach for the characterization of protein variants and post-translational modifications is illustrated with the study of a mixture of κ-casein isoforms, for which we were able to identify the two major variants and their phosphorylation site and glycosylation motif. We believe that this strategy, which combines two-dimensional gel electrophoresis and mass spectrometric analyses of gel-eluted intact proteins using a benchtop ion trap mass spectrometer, represents a promising approach in proteomics.


Molecular & Cellular Proteomics 2:483-493, 2003.
The most widely used approach in proteomics using mass spectrometry (MS) is referred to as the "bottom up" strategy. In this approach, protein identification is achieved after protein separation by one- or two-dimensional (1D or 2D) polyacrylamide gel electrophoresis, followed by protein digestion (usually with trypsin), analysis of the resulting peptide mixture using various MS approaches, and database search. To overcome the limitations of 2D gel electrophoresis, alternative techniques based on multidimensional liquid chromatography have recently been developed (1,2). The "bottom up" strategy has allowed the identification of thousands of proteins in complex mixtures (3). However, the low protein sequence coverage often obtained with this approach hinders the characterization of post-translational modifications, single-point mutations, and truncated forms of a protein. Moreover, the molecular mass of the intact protein is not directly accessible. Recently, "top down" approaches have been described, which are based on mass spectrometric analyses of intact proteins (4). Protein identification is obtained after database search using the measured protein molecular mass and protein sequence tags determined from MS and tandem MS (MS/MS) analyses of the intact protein, respectively. The critical step in this strategy is the correct charge state assignment of the protein fragment ions to obtain sequence tags. This is achieved using either high-resolution Fourier transform-ion cyclotron resonance mass spectrometers (5-7) or low-resolution ion trap mass spectrometers enabling fragment ion charge state manipulation through ion/ion reactions, which convert multiply charged ions into singly charged ions (8-10). Even though the molecular mass of the intact protein is determined using this approach, the characterization of protein modifications remains challenging.
Recently, the use of a moderate resolution routine quadrupole time-of-flight mass spectrometer was described as an efficient approach for the analysis and identification of intact proteins in complex mixtures (11).
Protein isoforms may originate from alternative splicing of mRNA, single-point mutations, and post-translational modifications including proteolytic cleavages. These modifications often introduce a variation in the molecular mass and net charge of the protein. The most efficient technique to separate protein isoforms thus remains 2D gel electrophoresis. The identification and localization of the modification can then be obtained using MS/MS analyses of the peptide bearing the modification after proteolysis of the protein. Depending on the nature of the modification to be identified, different strategies to detect the modified peptide can be employed. The choice of strategy would be facilitated if the molecular mass of the intact protein isoform were known. Indeed, the mass difference between two isoforms, or between the isoform and the expected theoretical molecular mass, would support a solid hypothesis on the nature of the protein modification. Efforts have been made to extract proteins from 2D polyacrylamide gels using various methods, including electroelution (12-15), electrotransfer onto membranes (16-18), and passive elution (19-21). These methods, however, are not well suited to the subsequent determination of the molecular mass of picomoles of protein isoforms using MS. For example, electroelution uses large elution volumes, leading to significant losses of material. Electrotransfer onto nitrocellulose and polyvinylidene difluoride membranes is compatible with matrix-assisted laser desorption/ionization time-of-flight analysis of intact proteins, but the mass accuracy of this technique is not always sufficient for the identification of protein modifications. Finally, passive elution has been described using formic acid, which may lead to formylation of serine and threonine residues (20), thus excluding the use of this approach for studies of post-translational modifications. Passive elution has also been described using SDS (22). Because this detergent is detrimental to MS analyses, a further step to remove SDS is necessary before the eluted proteins can be analyzed.
Here we report on the usefulness of a strategy combining the "bottom up" and "top down" approaches to characterize protein isoforms resulting from single-point mutations and post-translational modifications. This strategy is based on polyacrylamide gel electrophoresis separation of protein isoforms, MS and MSn analyses of intact proteins, and MS/MS analyses of proteolytic peptides. Protein isoforms are extracted from polyacrylamide gels by passive elution using SDS, followed by nanoscale hydrophilic phase chromatography for SDS removal. Electrospray ionization (ESI)-MS analyses of the intact proteins are performed to determine their molecular mass, allowing hypotheses on the nature of the modification to be drawn. In the case of labile post-translational modifications, like phosphorylations and glycosylations, ESI-MSn analyses of the intact proteins are conducted to confirm their presence. Finally, proteins are digested in solution, and MS/MS analyses of the modified peptides are performed to locate the modifications. This strategy allows the determination of the molecular mass of 5-10 pmol of a protein up to circa 50 kDa loaded on a gel. The efficiency of this approach for the characterization of protein variants and post-translational modifications is illustrated with the study of κ-casein isoforms.

MATERIALS AND METHODS
1D SDS-PAGE. Myoglobin, protein A, enolase, and bovine serum albumin were purchased from Sigma Aldrich, and carbonic anhydrase was obtained from Amersham Pharmacia Biotech. SDS-PAGE separations were conducted on a MultiPhor II system (Amersham Pharmacia Biotech) using ExcelGel® SDS homogeneous 12.5% precast gels. Gels were stained with Coomassie Brilliant Blue, and spot intensities were measured using the Quantity One® software (Bio-Rad, Hercules, CA).
2D Gel Electrophoresis. κ-Casein (Sigma Aldrich) was desalted by trichloroacetic acid precipitation and resuspended in the rehydration buffer (9 M urea, 2.2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid, 65 mM dithiothreitol). The first dimension was performed on the IPGphor system (Amersham Pharmacia Biotech). Briefly, the 7-cm pH 4-7 immobilized pH gradient (IPG) strip (Amersham Pharmacia Biotech) was rehydrated with the sample solution for 11 h under 50 V at 20°C. The sample was then focused for 30 min at 200 V, 30 min at 500 V, and 1 h at 2000 V, followed by 8000 V for a total of 80 kVh.
Immediately prior to the second dimension, the IPG strip was incubated in 5 ml of 50 mM Tris-HCl, 6 M urea, 2% SDS, 30% glycerol, and 65 mM dithiothreitol under rotary shaking for 15 min, followed by 15 min in the same solution where dithiothreitol was replaced by 2.5% iodoacetamide.
SDS-PAGE was performed on the MultiPhor II system using a 12.5% acrylamide precast gel (Amersham Pharmacia Biotech). Complete migration was achieved at 15°C after 1 h at 200 V followed by 600 V for a total of 500 Vh. Spots were visualized by Coomassie Brilliant Blue staining.
Passive Elution of Proteins from Polyacrylamide Gels. Gel pieces were excised and washed with H2O for 2 h, and proteins were allowed to diffuse out of the gel overnight at 37°C by incubation in 30 µl of 0.1 M sodium acetate, 0.1% SDS, pH 8.
Nano-ESI-MS of Intact Proteins. Nano-ESI-MS analyses were performed on an ion trap mass spectrometer (LCQ Deca, ThermoFinnigan, San Jose, CA). Mass spectra were acquired by scanning an m/z range from 370 to 2000 at 70 ms/scan. Eluates (0.5 µl) were injected into the mass spectrometer using the Famos® autosampler (LC Packings, Dionex, Amsterdam, The Netherlands) and the Ultimate® pumping device (LC Packings) through a fused silica capillary (ID 75 µm) at a flow rate of ~250 nl/min (carrier solvent: H2O/MeOH/acetic acid 50/50/0.5). Electrospray needles were obtained from New Objective (Woburn, MA). Spray voltage was set at 1.8 kV, and capillary temperature was set to 180°C. Mass spectra were deconvoluted using the Biomass® program included in the Xcalibur® software (ThermoFinnigan).
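The principle behind the deconvolution of an ESI charge-state envelope can be sketched with the standard relation m/z = (M + z x 1.00728)/z for positive ions. The sketch below is not the Biomass algorithm; it only assumes that two adjacent peaks belong to the same protein and differ by one charge. The function names and the 17000-Da example mass are hypothetical.

```python
PROTON = 1.00728  # mass of a proton in Da

def charge_from_adjacent(mz_low, mz_high):
    """Charge of the peak at mz_high, given its neighbour at mz_low,
    which carries one additional charge."""
    # mz_high - mz_low = M / (z * (z + 1)) and mz_low - PROTON = M / (z + 1),
    # so their ratio gives z directly.
    return round((mz_low - PROTON) / (mz_high - mz_low))

def neutral_mass(mz, z):
    """Neutral molecular mass from an m/z value and its charge."""
    return z * (mz - PROTON)

# Hypothetical pair of peaks simulated for a 17000-Da protein (z = 11 and z = 10)
true_mass = 17000.0
mz_11 = (true_mass + 11 * PROTON) / 11
mz_10 = (true_mass + 10 * PROTON) / 10

z = charge_from_adjacent(mz_11, mz_10)
mass = neutral_mass(mz_10, z)
print(z, round(mass, 1))
```

A full deconvolution averages the mass over every charge state of the envelope, which is what reduces the error on the final molecular mass.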
For MSn experiments, various mass-to-charge ratios were selected using a 1 m/z unit ion isolation window and fragmented by collisions using a 20% relative collision energy and a Q activation parameter of 0.25 for 30 ms.
On-line Capillary High-pressure Liquid Chromatography-Nanospray MS/MS Analyses of Protein Digests. After extraction of the protein from the gel spot and desalting, acetonitrile was evaporated from the eluate by vacuum centrifugation. Five microliters of trypsin (Promega, Madison, WI) at 12.5 ng/µl were added, and the sample was incubated at 37°C for 4 h. The peptide mixture was directly analyzed by on-line capillary high-pressure liquid chromatography (LC Packings) coupled to a nanospray ion trap mass spectrometer (LCQ Deca, ThermoFinnigan). Peptides were separated on a 75 µm ID × 15 cm C18 PepMap™ column (LC Packings). The flow rate was set at 150 nl/min. Peptides were eluted using a 0 to 40% linear gradient of solvent B in 40 min (solvent A was 0.1% formic acid in 2% acetonitrile, and solvent B was 0.1% formic acid in 90% acetonitrile). The mass spectrometer was operated in positive ion mode at a 1.8-kV needle voltage and a 30-V capillary voltage. Data acquisition was performed in a data-dependent mode that alternated, within a single run, a full scan MS over the range m/z 370-2000, a zoom scan of the ion selected in dynamic exclusion mode (the most intense ion is selected and then excluded from further selection for 0.5 min), and a full scan MS/MS of the selected ion. MS/MS data were acquired using a 3 m/z unit ion isolation window and a 35% relative collision energy.
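The dynamic exclusion logic described above can be sketched as follows. This is a simplified illustration, not the instrument's acquisition software: ions are matched by exact m/z rather than within a tolerance window, and the function name, scan times, m/z values, and intensities are all hypothetical.

```python
def select_precursors(scans, exclusion_min=0.5):
    """For each survey scan (time_min, {mz: intensity}), pick the most intense
    ion that is not currently excluded, then exclude it for exclusion_min."""
    excluded_until = {}  # m/z -> time (min) at which its exclusion expires
    picks = []
    for time_min, peaks in scans:
        candidates = {mz: inten for mz, inten in peaks.items()
                      if excluded_until.get(mz, -1.0) <= time_min}
        if not candidates:
            picks.append((time_min, None))  # nothing eligible in this scan
            continue
        best = max(candidates, key=candidates.get)
        excluded_until[best] = time_min + exclusion_min
        picks.append((time_min, best))
    return picks

# Hypothetical survey scans: the same two ions co-elute over 0.6 min
scans = [
    (10.0, {612.3: 9e5, 845.9: 4e5}),
    (10.2, {612.3: 8e5, 845.9: 5e5}),
    (10.6, {612.3: 7e5, 845.9: 6e5}),
]
picks = select_precursors(scans)
print(picks)
```

In the second scan the less intense ion is selected because the dominant ion is still excluded, which is how this mode gives lower-abundance peptides a chance to be fragmented.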

RESULTS
Protein Recovery After Elution from Polyacrylamide Gels and SDS Removal. The efficiency of protein elution from polyacrylamide gels prior to ESI-MS analysis was assessed by calculating the recovery of standard proteins of different sizes after passive elution and after further SDS removal.
Myoglobin (17 kDa), carbonic anhydrase (29 kDa), protein A (45 kDa), and bovine serum albumin (64 kDa) were used as standard proteins. One microgram of each protein was loaded onto a polyacrylamide gel and submitted to electrophoresis. After Coomassie Blue staining, spots were excised from the gel and submitted either to passive diffusion alone or to passive diffusion followed by SDS removal. Each sample was then submitted to a second SDS-PAGE migration, and protein quantities were calculated from the corresponding band intensities after Coomassie Blue staining. Protein elution from the gel was performed in the presence of 0.1% SDS. Subsequent SDS removal from the protein samples is a critical step to allow ESI-MS analysis. It was achieved using ZipTip HPL chromatography, a subset of normal phase chromatography in which separation depends on hydrophilic interactions between the solutes and the hydrophilic stationary phase. Originally designed to remove detergent and Coomassie Blue stain (which are not retained) from peptide mixtures, ZipTip HPL tips are well suited to sample amounts in the microgram range. The protocol was optimized for protein samples by using trifluoroacetic acid instead of formic acid in the elution buffer.
As shown in Fig. 1, proteins up to 45 kDa efficiently diffused out of the gel and were recovered from the ZipTip HPL. In contrast, bovine serum albumin (64 kDa) could not be extracted from the polyacrylamide gel. Based on spot intensities after Coomassie Blue staining, the yield of protein recovery for myoglobin, carbonic anhydrase, and protein A reached ~60% after passive elution and 30% after the additional SDS removal step.
ESI-MS Detection Limit and Mass Accuracy for Protein Molecular Mass Determination. The detection limit and mass accuracy for ESI-MS analysis of proteins eluted from polyacrylamide gels were determined using decreasing amounts of the standard proteins. The protein solutions eluted from the ZipTip HPL were directly analyzed by ESI-MS using an ion trap mass spectrometer equipped with a nanospray source. Fig. 2 shows the mass spectra obtained for 1, 0.5, 0.2, and 0.1 µg of myoglobin loaded on the gel, along with the corresponding deconvoluted spectra. Signal-to-noise ratios above 5 were observed in all cases, and a mass accuracy of 0.01% was obtained after deconvolution. Considering that the protein yield is about 30% after passive diffusion followed by SDS removal and that only 0.5 µl out of 4 µl of the protein eluate was injected into the mass spectrometer for the analysis, mass spectra were acquired from as little as circa 200 fmol of myoglobin. It is noteworthy that the presence of trifluoroacetic acid in the protein samples did not hamper mass spectrometric analyses.
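The estimate of circa 200 fmol follows from the figures given in this paragraph. A minimal sketch of the arithmetic, assuming the nominal 17-kDa mass for myoglobin and a hypothetical helper name:

```python
def injected_fmol(loaded_ug, mass_da, recovery, injected_ul, eluate_ul):
    """Amount of protein (fmol) reaching the mass spectrometer."""
    loaded_pmol = loaded_ug * 1e6 / mass_da   # micrograms -> picomoles
    recovered_pmol = loaded_pmol * recovery    # losses during elution and SDS removal
    return recovered_pmol * (injected_ul / eluate_ul) * 1000.0

# 0.1 ug of myoglobin (17 kDa), 30% overall recovery, 0.5 ul injected out of 4 ul
amount = injected_fmol(0.1, 17000.0, 0.30, 0.5, 4.0)
print(round(amount))
```

The result is about 220 fmol, consistent with the value quoted in the text.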
The results obtained with higher molecular mass proteins are summarized in Table I. The detection limit and mass accuracy are similar to the ones observed with myoglobin. Molecular masses can be determined with 0.01% accuracy for picomoles of proteins up to 50 kDa loaded on a polyacrylamide gel.

Molecular Mass-based Identification of κ-Casein Isoforms Separated by 2D Gel Electrophoresis. The approach described above can be applied to the study of protein variants and post-translational modifications. To evaluate its efficiency for this purpose, bovine κ-casein, a 19-kDa protein for which several variants and post-translational modifications are known, was chosen as a model protein.
Forty micrograms of a κ-casein isoform mixture were separated by 1D and 2D SDS-PAGE. As shown in Fig. 3, the intense band detected around 22 kDa after 1D SDS-PAGE splits into several spots (numbered from 1 to 6) with similar apparent molecular mass but with isoelectric points varying from 5 to 6.3 after 2D SDS-PAGE. These spots were all identified as κ-casein by the "bottom-up" proteomic approach. Two additional spots of lower molecular mass were also observed and identified as β-lactoglobulin (spots 7 and 8).
Each spot was excised, and proteins were eluted from the gel, desalted, and analyzed by ESI-MS. Fig. 4 shows the mass spectra and the corresponding deconvoluted peaks obtained from spots 1 to 6. Mass spectra of spots 1 to 4 (Fig. 4, A-D) showed a unique envelope, which was deconvoluted into a single peak, suggesting the presence of a single protein in each spot. In each case the measured molecular mass corresponded to a κ-casein isoform. In Fig. 4E, the deconvoluted mass spectrum from spot 5 displayed two peaks indicating a mixture of two electrophoretically unresolved isoforms of κ-casein. Despite a weaker signal-to-noise ratio and the presence of several isoforms in the spot, the mass spectrum from spot 6 (Fig. 4F) presented a major envelope, which led to a molecular mass of 20099 Da after deconvolution.

FIG. 3. 1D and 2D SDS-PAGE of κ-casein isoform mixture. Trichloroacetic acid-precipitated κ-casein isoform mixture was separated by 1D and 2D electrophoresis as described in "Materials and Methods." The use of a narrow pH range (pH 4-7) in the first dimension revealed six major isoforms identified as κ-casein (spots 1-6) under the single band observed on the 1D gel. Two additional spots were identified as β-lactoglobulin (spots 7 and 8).

TABLE I. Detection limits and mass accuracy of ESI-MS analyses of polyacrylamide gel-eluted standard proteins. Molecular masses have been determined from the deconvolution of mass spectra in the m/z range 700-1900 and represent the average of three independent analyses.
κ-Casein is a well described protein that occurs in several forms (variants A, B, B2, E, F, G, and H) in bovine milk (23). This protein has also been shown to harbor two phosphorylation sites (24) and five glycosylation sites (25,26). Finally, the mature protein results from the cleavage of a signal peptide and from the cyclization of the N-terminal glutamic acid into a pyroglutamic acid. Given the sequence of each variant and considering the known post-translational modifications, the theoretical molecular mass of each isoform can be calculated. Comparison of the theoretical molecular masses with the experimental ones allowed the identification of each isoform in spots 1 to 5 (Table II). The measured molecular masses for spot 1 (19039 Da) and spot 2 (19120 Da) were in perfect agreement with the molecular masses calculated for variant B with no modification and monophosphorylated, respectively. Spots 3 and 4 both corresponded to a protein with a molecular mass of 19151 Da, thus identifying these isoforms as a monophosphorylated isoform of variant A. The two molecular masses determined in spot 5 corresponded to variant A carrying one (19151 Da) and two (19231 Da) phosphorylations. Finally, the molecular mass determined for spot 6 could not be readily attributed to any of the κ-casein variants at this stage. However, the large mass difference (1028 Da) between the measured and the theoretical molecular masses of variant A strongly suggested the presence of a polysaccharide chain on this isoform.
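The reasoning from mass differences to phosphorylation states can be illustrated with the measured masses quoted above. In this sketch the 79.98-Da increment is the standard average mass of HPO3; the 2-Da tolerance and the function name are assumptions made for illustration and are not taken from the original analysis.

```python
PHOSPHO = 79.98  # average mass added by one phosphorylation (HPO3), in Da

def phosphate_count(reference_mass, measured_mass, tol_da=2.0):
    """Number of phosphate groups that explains the mass difference,
    or None if no whole number of phosphates fits within tol_da."""
    delta = measured_mass - reference_mass
    n = round(delta / PHOSPHO)
    if n >= 0 and abs(delta - n * PHOSPHO) <= tol_da:
        return n
    return None

# Measured masses (Da) from spots 1, 2, 5, and 6
print(phosphate_count(19039, 19120))  # variant B: spot 1 vs spot 2
print(phosphate_count(19151, 19231))  # variant A: the two masses found in spot 5
print(phosphate_count(19151, 20099))  # spot 6 vs monophosphorylated variant A
```

The first two comparisons each fit one additional phosphate, whereas the 948-Da difference for spot 6 fits no whole number of phosphates, which is why another type of modification had to be considered for that isoform.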
The molecular masses of the two isoforms of β-lactoglobulin identified in spots 7 and 8 were also determined. The isoform from spot 7 could be identified as variant A of β-lactoglobulin, while the identification from spot 8 remained uncertain. The mass accuracy did not allow variant B to be distinguished either from variant W, in which an isoleucine is replaced by a leucine (+0 Da), or from variant D, in which a glutamic acid is replaced by a glutamine (-1 Da).
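The limit imposed by a 0.01% mass accuracy can be made explicit. The sketch below uses a deliberately simple criterion, assumed here for illustration: two masses are considered distinguishable only if they differ by more than the absolute tolerance at that mass. The function names are hypothetical, and the 18563/18564 pair is simply an example of two masses 1 Da apart in the β-lactoglobulin range.

```python
def mass_tolerance_da(mass_da, accuracy_percent=0.01):
    """Absolute mass tolerance (Da) for a given relative accuracy."""
    return mass_da * accuracy_percent / 100.0

def distinguishable(mass_a, mass_b, accuracy_percent=0.01):
    """True if the two masses differ by more than the tolerance."""
    tol = mass_tolerance_da(max(mass_a, mass_b), accuracy_percent)
    return abs(mass_a - mass_b) > tol

print(round(mass_tolerance_da(18563.0), 2))   # tolerance near 18.5 kDa
print(distinguishable(18563.0, 18564.0))      # variants 1 Da apart
print(distinguishable(19120.0, 19151.0))      # kappa-casein B vs A, monophosphorylated
```

At about 18.5 kDa the tolerance is close to 1.9 Da, so variants separated by 0 or 1 Da fall within the measurement error, while the 31-Da difference between the κ-casein variants does not.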
It is noteworthy that the measured mass of an intact protein may not necessarily correlate with the mass predicted from the sequence database, which may contain errors in protein sequences and does not take post-translational modifications into account. The mass difference observed between two protein isoforms may nevertheless allow the identification of a modification without complete sequence information.
TABLE II (final entry and footnotes). 18563 β-Lactoglobulin D. a Theoretical molecular masses were calculated according to the sequence of the variants and considering both the known post-translational modifications (i.e. removal of the signal peptide 1-22 and cyclisation of Glu22 for κ-casein, removal of signal peptide 1-16 for β-lactoglobulin) and the carbamidomethylation of the cysteine residues (2 for κ-casein and 5 for β-lactoglobulin). b ND, not determined at this stage.

Post-translational Modification Characterization Using ESI-MSn Analyses of Intact κ-Casein Isoforms. The determination of the molecular mass may not always be sufficient to identify protein modifications, in particular when several modifications are combined and when the difference from the theoretical molecular mass is large, as is the case for the κ-casein isoform from spot 6. Moreover, the minute amounts of protein present in the spot eluates did not allow the use of biochemical studies to confirm the presence of phosphorylations and to elucidate the saccharidic motif suspected in spot 6. To demonstrate the presence of such post-translational modifications, we took advantage of the capabilities of an ion trap mass spectrometer and investigated the efficiency of ESI-MSn analyses of gel-eluted proteins. MS/MS experiments were initially conducted on κ-casein eluates from the previously analyzed spots 1 to 6. Fig. 5 presents the MS/MS spectrum of the 10-charge precursor ion from spot 2 (Fig. 4B, m/z 1912.8, 10+), for which the molecular mass measurement indicated the presence of a phosphorylation. This spectrum is dominated by peaks close to the precursor ion at m/z 1910.9, 1909.2, and 1902.9. Assuming that these ions carry 10 charges, the major peak at m/z 1902.9 corresponds to the loss of one phosphate group from the precursor protein (-98 Da, i.e. H3PO4). The other peaks (m/z 1910.9 and m/z 1909.2) correspond to successive losses of water molecules from the precursor ion. MS/MS experiments conducted on other charge states of the protein led to identical results (data not shown). These MS/MS analyses thus confirm that the κ-casein isoform from spot 2 is monophosphorylated and show the ability of ESI-MS/MS experiments on gel-eluted proteins to reveal the presence of a phosphorylation.
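The conversion of an m/z shift into a neutral loss relies on the assumption, stated above, that the fragment keeps the charge of the precursor. A short sketch of that calculation for the spot 2 spectrum (the function name is hypothetical):

```python
def neutral_loss(precursor_mz, fragment_mz, charge):
    """Mass (Da) of a neutral fragment lost without any change in charge state."""
    return (precursor_mz - fragment_mz) * charge

# Fragment peaks from the MS/MS spectrum of the 10+ precursor at m/z 1912.8
losses = {mz: neutral_loss(1912.8, mz, 10) for mz in (1910.9, 1909.2, 1902.9)}
for mz, loss in losses.items():
    print(mz, round(loss, 1))
```

The three shifts correspond to about 19, 36, and 99 Da, which, within the accuracy of the m/z readings, match one H2O (18 Da), two H2O (36 Da), and one H3PO4 (98 Da).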
Similar experiments were conducted on κ-casein isoforms eluted from spots 3, 4, and 5. In each case, the MS/MS spectrum showed the loss of 98 Da from the protein precursor ion (data not shown). These data confirmed the presence of a phosphate group and the molecular mass-based identification. For spot 5, one of the measured molecular masses suggested the presence of two phosphorylations. An MS3 experiment was therefore conducted, in which the fragment ion having lost one phosphate group in the MS/MS spectrum was further fragmented. This MS3 spectrum showed the loss of a second phosphate group, thus confirming the identification of this isoform as the diphosphorylated variant A (data not shown).
The approach was then applied to the κ-casein isoform from spot 6, for which molecular mass measurement did not readily identify the modifications. The ESI-MS2 spectrum of the 12-charge precursor ion (m/z 1675.8, 12+) of the 20099-Da unidentified isoform from spot 6 is presented in Fig. 6A. The spectrum displays several major peaks at m/z 1674.2, 1651.1, 1650.1, 1626.9, and 1626.0. Considering that all of these ions carry 12 charges, they correspond to increasing losses of 19, 296, 308, 587, and 598 Da, respectively, from the precursor protein. These losses can be attributed to neutral losses of one H2O molecule, one N-acetyl neuraminic acid moiety (NeuAc), one NeuAc and one H2O, two NeuAc, and two NeuAc and one H2O, respectively. These data thus show the presence of two NeuAc moieties in the saccharidic motif of the κ-casein isoform present in spot 6 (Fig. 3). These results are in good agreement with the NeuAcα(2-3)Galβ(1-3)[NeuAcα(2-6)]GalNAc tetrasaccharidic moiety previously described as the major glycosylated moiety in κ-casein (27). The protein mass increment due to such a tetrasaccharide is 947 Da, thus bringing the molecular mass of the non-glycosylated form down to 19152 Da. This molecular mass corresponds to the monophosphorylated variant A. To verify the presence of a phosphorylation in addition to the tetrasaccharidic moiety, MS3 experiments were conducted. The fragment ion having lost the two NeuAc groups and one H2O molecule (m/z 1626.0, 12+, in the MS2 spectrum from Fig. 6A) was further fragmented (Fig. 6B). Several fragment ions were observed. Assuming that all fragment ions have the same charge state as the precursor ion, the fragment ion at m/z 1617.3 corresponds to the loss of one phosphate group from the precursor ion. Other fragment ions correspond to losses of one hexose and of one hexose plus an N-acetyl hexosamine. Experiments conducted on other charge states led to identical assignments. These results thus confirm the proposed saccharidic moiety and the presence of an additional phosphorylation of κ-casein variant A.
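The mass increment of the proposed tetrasaccharide can be checked from its composition (two NeuAc, one hexose, one N-acetyl hexosamine). The residue masses below are standard average values for glycosidically linked residues; they are supplied here as assumptions and do not come from the original analysis, and the dictionary and function names are hypothetical.

```python
# Average residue masses (Da) of glycosidically linked monosaccharides
RESIDUE_MASS = {"NeuAc": 291.26, "Hex": 162.14, "HexNAc": 203.19}

def glycan_increment(composition):
    """Mass added to a protein by a glycan of the given residue composition."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())

tetrasaccharide = {"NeuAc": 2, "Hex": 1, "HexNAc": 1}
increment = glycan_increment(tetrasaccharide)
aglycone = 20099 - increment  # measured spot 6 mass minus the glycan

print(round(increment, 2), round(aglycone, 1))
```

With these values the increment is about 948 Da, close to the 947 Da quoted in the text, and the resulting non-glycosylated mass agrees within the 0.01% accuracy with the 19151 Da measured for monophosphorylated variant A.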

Post-translational Modification Localization Using Liquid Chromatography-MS/MS Analyses of Eluted κ-Casein Tryptic Digest. Elution of proteins from 2D SDS-PAGE gels followed by ESI-MS and ESI-MSn analyses of the intact proteins allowed the identification of protein variants and the detection of phosphorylations and glycosylations. In order to localize the modifications, the eluted proteins were subjected to subsequent proteolysis, and the resulting peptides were analyzed using MS/MS. κ-Casein isoforms eluted from the 2D gel were digested in solution with trypsin, and the peptide mixtures were analyzed by on-line capillary high-pressure liquid chromatography-nanospray MS/MS with an ion trap mass spectrometer. This strategy is illustrated here with the isoform from spot 2. As shown in Fig. 7A, the sequence coverage obtained for this isoform reached 98%. Indeed, except for the peptide IAK (residues 43-45), which fell outside the observed range of m/z 370-2000, the complete sequence of κ-casein variant B was obtained. This sequence coverage included the N-terminal peptide carrying a pyroglutamic acid at its N terminus and the 53-residue C-terminal peptide carrying the phosphorylation. To localize the phosphorylation site, MS/MS analysis of the triply charged precursor ion at m/z 1834.6 was performed (spectrum shown in Fig. 7C). The observed b-series of fragment ions allowed the assignment of the phosphorylation to Ser170. The other putative phosphorylation site, Ser148, was not phosphorylated in this isoform.

DISCUSSION
Among the various approaches developed in proteomics, the identification of proteins based on mass spectrometric analyses of intact proteins, i.e. the "top-down" strategy, is emerging as an alternative to the more widely used protein identification based on peptide analyses, i.e. the "bottom-up" strategy. The strategy presented here combines both approaches to characterize protein variants and post-translational modifications using a benchtop ion trap mass spectrometer and taking advantage of its MSn capabilities.
After separation of the protein isoforms by 2D SDS-PAGE, passive diffusion out of the gel using SDS, and SDS removal by hydrophilic phase chromatography, ESI-MS and MSn analyses are performed. This approach allows i) the determination of the molecular mass of intact protein isoforms by ESI-MS analysis, ii) the identification of known protein variants, iii) the confirmation of the presence and number of labile post-translational modifications such as phosphorylations and glycosylations by ESI-MSn analyses, and iv) the localization of the post-translational modification by LC-MS/MS analysis of the tryptic digest.
A key step in the described approach is the passive diffusion of proteins out of the polyacrylamide gel. The aim was to develop a protein elution procedure applicable after Coomassie Blue staining, which remains the most widely used detection system for proteins separated on SDS-PAGE gels. Optimization of the overall diffusion conditions included the removal of SDS prior to MS analyses. This was achieved using hydrophilic phase ZipTip HPL chromatography. Using this procedure, 5-10 pmol of proteins up to 50 kDa were successfully recovered from the polyacrylamide gel and their molecular masses determined by mass spectrometry. However, higher amounts of material would be necessary for further characterization of the putative modifications. Proteins up to 100 kDa were previously reported to diffuse out of polyacrylamide gels with recovery yields reaching 80% (28). The gels in that case were reverse-stained with the imidazole-zinc system, and the elution procedure included more steps. Recently, this approach was applied to the identification of truncated forms of α-crystallins (29). The sensitivity of the method presented here and its upper limit in molecular mass may be improved by using less reticulated polyacrylamide gels (<12.5%) and an ultrasonic-assisted diffusion step (19,30), while remaining fast and simple. The sensitivity may also be improved by adding a chromatographic step using a reverse phase C4 column to concentrate the protein sample prior to MS analysis. Recently, an analog of SDS, the acid-labile surfactant, has been shown to facilitate protein identification following polyacrylamide gel electrophoresis, compared with the widely used SDS, by enhancing peptide recovery (31). The use of this novel surfactant may also improve the recovery of proteins eluted from the polyacrylamide gel.
A mass accuracy of 0.01% was obtained in all cases for the molecular mass determination of intact proteins by ESI-MS using an ion trap mass spectrometer. This mass accuracy was sufficient to distinguish mutation-based variants of κ-casein and to detect the presence of post-translational modifications such as glycosylation and phosphorylation. All 2D gel spots of κ-casein analyzed were thus identified based on mass spectrometric analyses of intact proteins and their molecular mass measurement. In the case of β-lactoglobulin, however, the variants differed by only 1 Da and could not be distinguished. A higher resolution analyzer would be required in this case.
The attribution of labile post-translational modifications to κ-casein variants A and B was ascertained by MS/MS analysis of the intact proteins. The MS/MS parameters were chosen to favor neutral losses corresponding to phosphate groups and glycosidic moieties without fragmenting the protein backbone. Under these conditions, the loss of 98 Da, corresponding to the loss of one phosphate group (H3PO4) from a serine residue, was observed for κ-casein isoforms in spots 2, 3, and 4. These analyses thus confirm that these isoforms are monophosphorylated and validate the molecular mass-based identification. MS/MS analysis of the isoform from spot 5 also showed the loss of one phosphate group, while the measured molecular mass suggested a diphosphorylated isoform. Because of the low collision energy used in the MS/MS experiment to avoid backbone fragmentation, a further MS step was necessary to observe the loss of the second phosphate group from this protein. These MS3 analyses, readily performed on the ion trap mass spectrometer, could also be performed on the new generations of hybrid instruments such as a Q-trap mass spectrometer.
Linkages engaged by glycosidic moieties, and especially by sialic acids, are more labile than those engaged by phosphates. This explains the preferential loss of two neutral sialic acids in the MS2 spectrum of the 20099 Da protein in spot 6. The presence of NeuAc has already been evidenced in κ-casein (32). A disialylated species was previously detected as the major subcomponent of κ-casein (27). Moreover, the NeuAcα(2-3)Galβ(1-3)[NeuAcα(2-6)]GalNAc tetrasaccharide, which contains two NeuAc residues, was shown to be the most dominant sugar chain. To confirm the suggested glycosidic moiety as well as the identity of the κ-casein variant in spot 6, an MS3 spectrum was acquired by fragmenting the fragment that had lost both sialic acids in the MS2 spectrum. The observed losses of a phosphoric acid, a hexose, and an acetyl-hexosamine both confirmed the structure proposed for the sugar chain and allowed the identification of the protein as the glycosylated form of the monophosphorylated variant A, in perfect agreement with the molecular mass measured on the intact protein.
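The mass bookkeeping for these losses can be sketched as below. This is an illustration under stated assumptions: the residue masses are standard average values for the anhydro residues, each loss is assumed to correspond to one whole residue, and the function names are hypothetical. Only the 20099 Da starting mass and the nature of the losses come from the text.

```python
# Illustrative sketch only; function names are hypothetical.

# Average masses (Da) of monosaccharide residues (free sugar minus water)
RESIDUE_MASS = {
    "NeuAc": 291.26,   # N-acetylneuraminic acid (sialic acid)
    "Hex": 162.14,     # hexose, e.g. Gal
    "HexNAc": 203.19,  # N-acetylhexosamine, e.g. GalNAc
}
H3PO4_MASS = 97.995    # Da, average mass of phosphoric acid


def glycan_mass(composition: dict) -> float:
    """Mass added to a protein by a glycan of the given residue composition."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def mass_after_losses(start_mass: float, losses: list) -> float:
    """Neutral mass remaining after the listed neutral losses."""
    return start_mass - sum(losses)


if __name__ == "__main__":
    tetrasaccharide = {"NeuAc": 2, "Hex": 1, "HexNAc": 1}
    print(f"tetrasaccharide: {glycan_mass(tetrasaccharide):.2f} Da")

    intact = 20099.0  # Da, spot 6 protein mass from the text
    # MS2: preferential loss of the two sialic acids
    desialylated = mass_after_losses(intact, [RESIDUE_MASS["NeuAc"]] * 2)
    print(f"after loss of two NeuAc: {desialylated:.2f} Da")

    # MS3: individual losses reported from the desialylated species
    for label, loss in [("H3PO4", H3PO4_MASS),
                        ("Hex", RESIDUE_MASS["Hex"]),
                        ("HexNAc", RESIDUE_MASS["HexNAc"])]:
        print(f"  loss of {label}: {mass_after_losses(desialylated, [loss]):.2f} Da")
```

The losses are computed individually from the desialylated species rather than as an ordered sequence, since the text does not specify the order in which they occur.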
In agreement with previously reported data, we confirmed that variants A and B were the major isoforms of κ-casein and were present in the mixture under various phosphorylation and glycosylation states (24). These results highlight the possibilities offered by the coupling of 2D-PAGE separation with MS and MSn analyses of intact proteins. Indeed, because of the often high heterogeneity of glycosylation, separation of the various isoforms is a key step for the success of their characterization.
To go further in the identification of protein variants and to localize the modifications, MS/MS analysis of the tryptic digest of the protein was performed. After diffusion of the protein out of the polyacrylamide gel and MS analysis of the intact protein, the tryptic digestion was performed in solution. This may represent an advantage over in-gel digestion for an efficient recovery of all types of tryptic peptides. This strategy was illustrated with the κ-casein isoform from spot 2, for which sequence coverage reached 98%. This sequence coverage included the 53-residue peptide bearing the κ-casein phosphorylation. For all monophosphorylated isoforms of κ-casein, the MS/MS spectrum of this peptide allowed the localization of the phosphorylation at the same residue, Ser170. These results are consistent with previous studies, which showed that κ-casein was fully phosphorylated at Ser170 and only partially at Ser148 (33). Surprisingly, monophosphorylated variant A isoforms appeared at different isoelectric point values on the 2D gel, which could not be explained by differences in the measured molecular masses or by the phosphorylation localization. One hypothesis is that the κ-casein mixture contains deamidated isoforms. This modification would explain the isoelectric point shift toward the acidic pH. Moreover, as shown with β-lactoglobulin, the 1-Da mass increment due to this modification would not be detected by ESI-MS with a 0.01% mass accuracy.
The strategy reported here aims at characterizing protein variants and post-translational modifications of proteins separated on SDS-PAGE gels by determining the molecular mass of the intact protein. The analyses then take advantage of the MSn capabilities of ion trap mass spectrometers to identify labile modifications, like phosphorylations and glycosylations, directly on the intact protein. Finally, the strategy allows the digestion of the protein in solution and the analysis of the resulting peptides for the localization of the modification. The molecular mass of the intact protein is a valuable piece of information for choosing the appropriate strategy for the localization of the modification. The coupling of 2D gel electrophoresis separation to mass spectrometric analyses of gel-eluted intact proteins as described should represent a promising strategy in proteomics.
Acknowledgments-We are grateful to Dr. Michel Rivière for helpful discussion.
* This work was supported in part by grants from the Centre National de la Recherche Scientifique (ACI program Protéomique et Génie des Protéines), the Association pour la Recherche contre le Cancer (Réseau ARECA, Protéomique et Cancer), the Conseil Régional Midi-Pyrénées, the Génopole Toulouse Midi-Pyrénées, and the Consortium Pierre Fabre Médicament (Program ASG, Protéomique et nouvelles cibles thérapeutiques). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.