Molecular & Cellular Proteomics 6:1274-1286, 2007.
| ABSTRACT |
|---|
In recent years, alternatives to the classical two-dimensional gel electrophoresis approaches have been developed that are based on the chromatographic separation of complex mixtures of protease-generated peptides, also referred to as "shotgun proteomics." Peptides eluting from an RP-HPLC column are sprayed into a mass spectrometer that detects, isolates, and fragments specific peptide ions to obtain structural information, which is correlated against protein databases to identify the peptide sequence. Recently these approaches have incorporated stable isotopic labeling techniques to produce relative quantitative information in addition to protein identification (1). When a peptide sample labeled with a heavy isotope is mixed in equimolar amounts with an unlabeled sample, peptide ions appear as doublets of equal intensity separated by a mass-to-charge offset determined by the mass of the label and the charge state of the peptide. Peptides differentially represented in each sample show departures from the expected 1:1 ratio, allowing the quantification of changes in the corresponding protein levels. Stable isotopic labeling can be achieved chemically by modifying reactive peptide groups (for example, cysteine side chains) with site-specific reactants (2, 3), metabolically by growing cells in culture media enriched in heavy isotopes such as 15N- or 13C-labeled amino acids (4–6), or enzymatically by incorporating oxygen atoms from [18O]water into peptide C termini during or after proteolytic cleavage (7–10). Among these techniques, 18O labeling is emerging as a powerful labeling strategy for quantitative proteomics applications. Unlike other strategies that are restricted to peptides containing specific amino acids, any peptide generated by proteolytic digestion can be labeled with H₂¹⁸O (9–12).
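The dependence of the doublet spacing on label mass and charge state can be illustrated with a short calculation. This is only a sketch: the function name `doublet_offset` is hypothetical, and the isotope masses are standard atomic masses rather than values taken from this article.

```python
# Mass difference between one 18O and one 16O atom, in Da (standard atomic masses).
DELTA_18O_16O = 17.999160 - 15.994915  # about 2.004245 Da


def doublet_offset(n_labels: int, charge: int) -> float:
    """Return the m/z spacing between the unlabeled and the labeled peptide ion.

    n_labels: number of 18O atoms incorporated (1 or 2 for C-terminal labeling).
    charge:   charge state z of the peptide ion.
    """
    if n_labels < 0 or charge < 1:
        raise ValueError("n_labels must be >= 0 and charge must be >= 1")
    return n_labels * DELTA_18O_16O / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"z={z}: one label {doublet_offset(1, z):.4f}, "
              f"two labels {doublet_offset(2, z):.4f} m/z units")
```

For a doubly charged, doubly labeled peptide the two members of the doublet are therefore roughly 2 m/z units apart, which is why the resolution of the survey scan matters for this labeling method.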
Peptide labeling with 18O tags is performed by digesting the proteins with a suitable endoprotease, originally trypsin, in the presence of [18O]water; this incorporates two 18O atoms at the C-terminal end of peptides. It has been shown that proteolytic 18O labeling can be decoupled from protein digestion so that the endoprotease can be used in a separate step to label peptides after proteolysis has taken place. This procedure has the advantage that proteins can be kept in solution prior to digestion using adequate chaotropic buffers or surfactants, that labeling can be performed with a limited volume of H₂¹⁸O, and that digestion and labeling conditions can be optimized separately (10). This postlabeling strategy is becoming very popular, mainly because of the simplicity of the exchange reaction and the availability of fast and efficient peptide desalting procedures, and has been adopted by most researchers using this technique for the relative quantification of complex protein mixtures in solution.
The 18O labeling process introduces either 2- or 4-Da mass tags depending on the number of C-terminal oxygen atoms exchanged. Because the exchange reaction does not always go to completion for all peptides, a rather complex isotopic envelope pattern is usually obtained due to the overlap of the natural envelopes of unlabeled, singly labeled, and doubly labeled peptides. This has limited the use of three-dimensional ion trap mass spectrometers, which produce low resolution survey scans, for peptide quantification by this labeling method. However, ion traps can acquire high resolution mass spectra (or "ZoomScans") of selected ions over a limited m/z range, and we have demonstrated previously that by using these scanning modes in a linear ion trap and taking advantage of its high scanning speed it is possible to make accurate quantitative measurements without compromising the ability of this machine to perform high throughput peptide identification (13).
Described software applications for quantification from 18O labeling data either use simple schemes assuming complete exchange to the doubly labeled species so that the 16O/18O ratio is directly computed as the intensity ratio of the monoisotopic peak and the peak located at +4 Da (14, 15) or implement quantification algorithms that basically follow the formulation of Yao et al. (8) and Zang et al. (16). These algorithms compute the 16O/18O ratio either by sequentially subtracting the contributions of the peak heights of the three overlapping isotopic clusters (17, 18) or by directly measuring the heights of the first, third, and fifth isotopic peaks and then applying an analytical formula that corrects the intensities for the cluster overlap (17, 19). Here we present an advanced quantification algorithm specifically designed to deal with peptides labeled by postdigestion 18O exchange. Our method fits the entire isotopic envelope to parameters related with a kinetic exchange model, allowing at the same time an accurate calculation of the relative proportion of peptides in the original samples and of the specific labeling efficiency of each one of the peptides. By improving the quantification procedure and allowing a full control over potential artifacts, we think that our method is a significant step toward the complete automation of the method for relative protein quantification using this popular labeling strategy.
| EXPERIMENTAL PROCEDURES |
|---|
α-lactalbumin (0.5 pmol/µl). 5 µl of this mixture were used for enzymatic digestion.
Preparation of Endothelial Cell Extracts
Cells were grown at 37 °C at 5% CO2. The EA.hy926 cell line (kindly provided by Dr. Antonio Martinez-Ruiz) was cultured in Dulbecco's modified Eagle's medium with hypoxanthine, aminopterin, and thymidine supplement, 20% fetal bovine serum, 100 units/ml penicillin, 100 µg/ml streptomycin, and 5 µg/ml gentamicin. For preparing cell protein extracts, confluent cells were scraped and resuspended in nondenaturing lysis solution (50 mM Tris-HCl, pH 7.4, 300 mM NaCl, 5 mM EDTA, 0.1 mM neocuproine, and 1% Triton X-100 plus protease inhibitor mixture), incubated on ice for 15 min, and centrifuged at 10,000 × g for 15 min at 4 °C. Supernatants were collected, and the protein content was quantified using the Bradford reagent (Bio-Rad).
Preparation of T Cell Extracts
These extracts were kindly provided by Drs. Montse Carrascal and Joaquín Abián and were prepared as follows. Mononuclear cells were purified from PBS-diluted blood using the Ficoll gradient centrifugation method as described by Boyum (20). To this end, Ficoll-Paque (Amersham Biosciences) was used according to the instructions of the manufacturer. The purity of the mononuclear fraction was then verified by checking under the microscope the fraction of mononuclear cells among all cells observed; it was around the expected value of 95%. Activation was carried out according to the following procedure: either 5 ml of PBS (negative control) or 5 ml of 10 µg/µl PBS-diluted α-CD3 antibody (BD Biosciences) were added to T75 flasks to coat the surface with antibody. After a 5-h incubation, flasks were washed twice with PBS, and 55 × 10⁶ cells, diluted in 10% FCS-supplemented RPMI 1640 medium, were added to each flask. After 15 min of incubation, cells were washed twice with PBS and centrifuged at 500 × g for 10 min. Cell pellets were stored at −80 °C until use. To extract proteins, lysis buffer (6 M urea, 50 mM Tris, 10 mM DTT) was then added. After alkylation with iodoacetamide (55 mM), protein extracts were acetone-precipitated prior to digestion.
Protein Digestion and Postdigestion Labeling
Total cell extracts from endothelial cells were subjected to cold acetone precipitation, lyophilized to dryness, and suspended in 8 M urea, 25 mM ammonium bicarbonate. 200 µg of starting protein material were used for cellular extracts, and 5 µl were used for the model mixture of proteins. Proteins were reduced for 1 h in 10 mM DTT, 25 mM ammonium bicarbonate, pH 8.25, at 37 °C and then alkylated for 45 min in 50 mM iodoacetamide. Digestion was carried out overnight at 37 °C with trypsin (1:50 protease/protein). In the case of the T cell extract, it was digested upon reconstitution in 25 mM ammonium bicarbonate, 2 M urea without further treatment. After acidification with 1% TFA, the digest was lyophilized. The dried pellet was then resuspended in 50 µl of 0.1% TFA and desalted in a Vydac C8 RP column (2.1 × 25 mm), using a Beckman Gold HPLC system, by one-step elution with 80% acetonitrile containing 0.1% TFA. The clean peptide pool was then dried down and dissolved in 40 µl of 20% acetonitrile (v/v); 40 µl of 100 mM ammonium acetate, pH 6.75, was added together with 40 µl of 50 mM calcium chloride and 10 µg of sequence grade trypsin; and the resulting mixture was dried in a vacuum centrifuge. The sample was then resuspended in 40 µl of either H₂¹⁶O or H₂¹⁸O (95%, Sigma-Aldrich) containing 20% acetonitrile and incubated at 37 °C for 48 h. Labeling was stopped with 1% formic acid.
Mass Spectrometry
Protein digests were analyzed by LC-ESI-linear ion trap-MS/MS using a Surveyor LC system coupled to a linear ion trap mass spectrometer model LTQ (Thermo-Finnigan, San Jose, CA). The separation column was a 0.18 x 150-mm Biobasic RP column (ThermoHypersil-Keystone) operating at 1.5 µl/min. Peptides were eluted using 300-min gradients from 5 to 40% solvent B (solvent A: 0.1% formic acid; solvent B: 0.1% formic acid, 80% acetonitrile). The linear ion trap was operated in data-dependent ZoomScan and MS/MS switching mode using the three most intense precursors detected in a survey scan from 400 to 1600 amu (three microscans). ZoomScan mass windows were set to 12 Da to allow monitoring of the entire 16O/18O isotopic envelope of doubly and triply charged peptides irrespective of which isotopic peak was chosen as the precursor. Singly charged ions were excluded for MS/MS analysis. ZoomScan settings (maximum injection time, 50 ms; zoom target parameter, 3000 ions; and the number of microscans, 10) were exactly as optimized in a previous work (13). Normalized collision energy was set to 35%, and dynamic exclusion was applied during 3-min periods to avoid fragmenting each ion more than twice.
Database Search
Protein identification in human International Protein Index (IPI) version 3.12 database, downloaded from the European Bioinformatics Institute website, was carried out as described using SEQUEST. Database search parameters included as variable modifications Met oxidation and Lys and Arg + 4-Da labeling and a fixed carbamidomethylation of Cys. Calculation of error rates of peptide identification was carried out as we have proposed earlier (21). Only ZoomScan spectra corresponding to peptide matches with a false discovery rate lower than 5% were used for quantification.
Kinetic Model for Trypsin-catalyzed 18O Labeling
Let's consider a general case where peptides are subjected to a certain labeling reaction where two labeled species are produced using a reagent whose concentration is in large excess. Let's also assume that the labeled species may return to the unlabeled state and that the labeling reaction takes place according to the following kinetic mechanism,
$$
\mathrm{B_0} \; \underset{\alpha k (1-p)}{\overset{k p}{\rightleftharpoons}} \; \mathrm{B_1} \; \underset{k (1-p)}{\overset{\alpha k p}{\rightleftharpoons}} \; \mathrm{B_2}
$$
where B0 stands for the concentration of non-labeled peptide; B1 and B2 stand for the concentration of mono- and dilabeled peptide, respectively; k is the kinetic constant; and α and p are constants. A kinetic analysis of this mechanism (see the supplemental information) reveals that the concentrations of the three species along the labeling reaction are mathematically related in a manner that does not depend on the time or the kinetic constant.
In the case of trypsin-catalyzed H₂¹⁸O incorporation into peptides, the parameter p is equal to the fraction of water molecules containing 18O, i.e. to the purity of H₂¹⁸O. Besides, each exchange event affects only one of the two oxygen atoms at the C terminus; hence only half the fraction of monolabeled species will give rise to a dilabeled species or return to the non-labeled state. Therefore, α = 0.5, and the relation between the three species also becomes independent of the constant p (see the supplemental information). This relation may be expressed as a function of the total peptide concentration (B = B0 + B1 + B2) and the labeling efficiency f.
$$B_0 = B\,(1-f)^2 \tag{1}$$
$$B_1 = 2\,B\,f\,(1-f) \tag{2}$$
$$B_2 = B\,f^2 \tag{3}$$
Note that it would also be possible to derive Equations 1–3 by assuming that the exchange of the two oxygen atoms is independent of each other. In other words, when α = 0.5, the labeled oxygen atoms become homogeneously distributed among the mono- and dilabeled peptide species even though a true equilibrium between these species has not actually been reached.
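The time independence of the relation between the three species can be checked numerically. The following is a minimal sketch, not the analysis given in the supplemental information. It assumes the rate scheme implied by the text (B0 to B1 at rate k·p, B1 to B0 at α·k·(1−p), B1 to B2 at α·k·p, B2 to B1 at k·(1−p)), defines the labeling efficiency as the fraction of C-terminal oxygen atoms that are 18O, and uses hypothetical function names throughout.

```python
import math


def simulate_labeling(k=1.0, p=0.95, alpha=0.5, t_end=3.0, steps=3000):
    """Integrate the assumed B0 <-> B1 <-> B2 scheme with a fixed-step RK4 solver.

    Starts from fully unlabeled peptide (B0 = 1) and returns a list of
    (t, B0, B1, B2) tuples.
    """
    def rates(b):
        b0, b1, b2 = b
        d0 = -k * p * b0 + alpha * k * (1 - p) * b1
        d2 = alpha * k * p * b1 - k * (1 - p) * b2
        return (d0, -(d0 + d2), d2)  # total peptide is conserved

    b = (1.0, 0.0, 0.0)
    dt = t_end / steps
    out = [(0.0, *b)]
    for i in range(1, steps + 1):
        k1 = rates(b)
        k2 = rates(tuple(x + 0.5 * dt * d for x, d in zip(b, k1)))
        k3 = rates(tuple(x + 0.5 * dt * d for x, d in zip(b, k2)))
        k4 = rates(tuple(x + dt * d for x, d in zip(b, k3)))
        b = tuple(x + dt / 6.0 * (a + 2 * c + 2 * e + g)
                  for x, a, c, e, g in zip(b, k1, k2, k3, k4))
        out.append((i * dt, *b))
    return out


def species_from_efficiency(total, f):
    """Binomial split of `total` into (B0, B1, B2) for labeling efficiency f."""
    return (total * (1 - f) ** 2, 2 * total * f * (1 - f), total * f ** 2)


def expected_efficiency(k, p, t):
    """Closed-form f(t) for alpha = 0.5 under the assumed scheme."""
    return p * (1.0 - math.exp(-k * t / 2.0))


if __name__ == "__main__":
    for t, b0, b1, b2 in simulate_labeling()[::500]:
        f = (b1 + 2 * b2) / 2  # fraction of C-terminal oxygens carrying 18O
        print(f"t={t:.2f} f={f:.3f} B0={b0:.4f} B1={b1:.4f} B2={b2:.4f}")
```

At every time point the simulated B0, B1, and B2 coincide with the binomial split computed from f alone, whatever values of k and p are chosen, which is the property that makes a per-peptide correction possible.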
ZoomScan Spectra Preprocessing and Initial Quantification Step
ZoomScan spectra are baseline-corrected by subtracting from all points the median intensity of all local minima. A smoothed spectrum is then obtained by applying a moving average window, which is weighted using a tricubic function to better preserve peak heights. An initial estimate of relative isotope amounts is carried out on the basis of the formulation described by Zang et al. (16). The amounts of non-labeled peptide (A) and of mono- (B1) and dilabeled peptide (B2), expressed in the same units as the ion intensities determined by the MS detector, are determined from
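$$A = \frac{I_0}{M_0} \tag{4}$$
$$B_1 = \frac{I_2 - A\,M_2}{M_0} \tag{5}$$
$$B_2 = \frac{I_4 - A\,M_4 - B_1\,M_2}{M_0} \tag{6}$$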
where I0, I2, and I4 are the raw intensities of the first, third, and fifth peak maxima of the isotopic cluster and M0, M2, and M4 are the relative proportions of the first (monoisotopic), third, and fifth peaks in the natural isotopic envelope. They are calculated from the elemental composition using a combinatorial subroutine determined from the peptide sequence. Isotopic abundance values are those reported by Chapman (22). The monoisotopic mass is also determined from the peptide sequence. Peptide sequence and charge state are taken from SEQUEST outputs once they are statistically analyzed to determine true peptide assignations.
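The preprocessing and the initial estimate can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the window half-width, the handling of spectrum edges, and all function names are hypothetical, and the deconvolution assumes that the first peak contains only the non-labeled cluster, the third peak contains the non-labeled and monolabeled clusters, and the fifth peak contains all three.

```python
from statistics import median


def baseline_correct(y):
    """Subtract the median intensity of all local minima from every point."""
    minima = [y[i] for i in range(1, len(y) - 1)
              if y[i] <= y[i - 1] and y[i] <= y[i + 1]]
    base = median(minima) if minima else 0.0
    return [v - base for v in y]


def tricube_smooth(y, half_width=2):
    """Moving average weighted with the tricube kernel (1 - |u|^3)^3.

    The window is truncated at the spectrum edges and the weights are
    renormalized there (an assumption; the text does not specify this).
    """
    offsets = range(-half_width, half_width + 1)
    weights = [(1 - abs(j / (half_width + 1)) ** 3) ** 3 for j in offsets]
    out = []
    for i in range(len(y)):
        num = den = 0.0
        for j, w in zip(offsets, weights):
            if 0 <= i + j < len(y):
                num += w * y[i + j]
                den += w
        out.append(num / den)
    return out


def initial_estimate(i0, i2, i4, m0, m2, m4):
    """Estimate A, B1, B2 from the first, third, and fifth peak heights.

    m0, m2, m4 are the relative proportions of the first, third, and fifth
    peaks of the natural isotopic envelope of the peptide.
    """
    a = i0 / m0
    b1 = (i2 - a * m2) / m0
    b2 = (i4 - a * m4 - b1 * m2) / m0
    return a, b1, b2


if __name__ == "__main__":
    # Synthetic peak heights built from A=10, B1=2, B2=8 and a made-up envelope.
    m0, m2, m4 = 0.5, 0.1, 0.01
    i0 = 10 * m0
    i2 = 10 * m2 + 2 * m0
    i4 = 10 * m4 + 2 * m2 + 8 * m0
    print(initial_estimate(i0, i2, i4, m0, m2, m4))
```

Because the non-labeled fraction of the labeled sample (B0) falls on the same peaks as A, this initial estimate cannot separate the two; it treats all first-cluster signal as A.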
Refinement of Quantification Using an Envelope Shape Model and Calculation of Standard Errors
The initial estimates of the relative content of the three isotopic species mentioned above (A, B1, and B2) are used as starting parameters for a procedure where the isotopic envelope profile of the ZoomScan spectrum is fitted to a theoretical curve. Fitting is performed by a nonlinear, Newton-Gauss unweighted least squares iterative method (23). The fitted function is the area covered by the first eight peaks of the signal doublet, which is modeled as a sum of three peptide isotopic envelopes, each one composed of a set of peaks evenly spaced 1/z units. To model peptide peaks, we observed that Gaussian distributions did not explain adequately the behavior of ions upon resonance ejection, which showed a variable degree of leptokurtosis. For this reason, peaks were modeled using mixed Gaussian/double exponential distributions
![]() |
where I(x) is the intensity at m/z x for a peak whose area is the unity, µ is the location parameter (the mean in the Gaussian distribution), σ is the scale parameter (the standard deviation in the Gaussian distribution), and β evaluates the proportion of double exponential component within the mixed distribution; the two last parameters are individually fitted for each spectrum. Five variable parameters are used to fit the function to the isotope envelope: A, B, f, σ, and β. Standard errors for these parameters were calculated from the matrix of partial derivatives as described elsewhere (23).
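A unit-area peak of this mixed type, and an envelope built from it, might be written as below. This is a hedged sketch: the exact parametrization of the peak equation is not recoverable from the text, so the code assumes that the Gaussian and double exponential (Laplace) components share the same location µ and the same scale σ, that isotope peaks are spaced exactly 1/z apart, and that the three clusters start 0, 2, and 4 isotope positions above the monoisotopic peak. The names `mixed_peak` and `envelope` are hypothetical.

```python
import math


def mixed_peak(x, mu, sigma, beta):
    """Unit-area mixture of a Gaussian and a double exponential peak.

    beta is the proportion of the double exponential component (0 <= beta <= 1).
    """
    gauss = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    laplace = math.exp(-abs(x - mu) / sigma) / (2 * sigma)
    return (1 - beta) * gauss + beta * laplace


def envelope(x, mono_mz, z, cluster_amounts, natural, sigma, beta):
    """Model intensity at m/z x as a sum of three overlapping isotopic clusters.

    cluster_amounts: amounts of the clusters starting at isotope positions 0, 2, 4.
    natural:         relative proportions of the natural isotopic peaks.
    """
    total = 0.0
    for shift, amount in zip((0, 2, 4), cluster_amounts):
        for n, rel in enumerate(natural):
            mu = mono_mz + (shift + n) / z
            total += amount * rel * mixed_peak(x, mu, sigma, beta)
    return total


if __name__ == "__main__":
    # Doubly charged peptide with a made-up natural envelope.
    natural = [0.55, 0.30, 0.11, 0.04]
    for x in (500.0, 500.5, 501.0, 502.0):
        print(x, round(envelope(x, 500.0, 2, (1.0, 0.2, 0.8), natural, 0.05, 0.3), 3))
```

In a fit, the residuals between such a model curve and the smoothed ZoomScan spectrum would be minimized over the free parameters; the least squares iteration itself is not shown here.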
A Modified Algorithm Including a Correction for Incomplete Labeling Efficiency
The corrected method was identical to the original one except that instead of fitting the parameters A, B1, and B2 the procedure fits the parameters A, B, and f. ZoomScan spectra were modeled, as in the original protocol, as a sum of three isotopic envelopes; the first one was computed as the sum of A and B0 species (see Fig. 1), and B0, B1, and B2 were calculated from B and f using Equations 1–3. The modified algorithm used f = 1 as starting value for the curve fitting. This procedure allows a direct determination of corrected A and B proportions and of the labeling efficiency of each one of the peptides.
|
![]() |
![]() |
Applying Equations 1–3 gives Equation 10.
![]() |
Using this equation it is possible to estimate the effect of labeling efficiency on the ratio determined using the standard method.
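The size of this effect can be illustrated numerically. Equation 10 itself is not reproduced here; the sketch below assumes that the standard method attributes all first-cluster signal to the non-labeled sample (apparent A = A + B0) and only the mono- and dilabeled signal to the labeled sample (apparent B = B1 + B2), with B0 given by the binomial relation B0 = B(1 − f)². The function name is hypothetical.

```python
def apparent_ratio(a, b, f):
    """Ratio A/B that an uncorrected analysis would report for efficiency f.

    a, b: true amounts of the non-labeled and labeled samples.
    f:    labeling efficiency, 0 < f <= 1.
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("f must be in (0, 1]")
    b0 = b * (1 - f) ** 2          # labeled-sample peptide left without any 18O
    return (a + b0) / (b - b0)     # b - b0 equals B1 + B2


if __name__ == "__main__":
    for f in (1.0, 0.9, 0.75, 0.5, 0.3):
        print(f"f={f:.2f} apparent ratio for a true 1:1 mixture: "
              f"{apparent_ratio(1.0, 1.0, f):.3f}")
```

Under these assumptions a true 1:1 mixture is reported with a ratio above 1 whenever f is below 1, and the overestimation grows quickly as f decreases.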
Statistical Analysis of Differential Expression Events
Expression ratios were statistically analyzed by a method similar to that described by Li et al. (24). Briefly a ratio histogram in a log2 scale was constructed, and the distribution was fitted by least squares to a Gaussian curve, determining the mean and the S.D. A standard two-tailed z test was then used to determine probability associated to peptides showing an expression ratio significantly higher or lower than the mean, which is usually, but not always, centered on zero. By introducing a sample-dependent normalization, this method corrects for any systematic errors introduced during sample handling (24). To calculate the final p values, the standard error of the fitted parameters (i.e. the experimental error performed in the calculation of the ratio) was also taken into account by using error propagation theory (25, 26); this compensated for the fact that a noticeable departure from the center of distribution may not be statistically significant if the error made during the determination of the ratio (the fitting procedure) is sufficiently high.
The final statistical significance of potential differential expression events was assessed by determining the false discovery rates (FDRs) (27, 28), defined as the proportion of peptides expected to pass the p threshold, calculated from the fitted Gaussian distribution and the total number of determinations, among the observed number of peptides actually passing this threshold.
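The two statistical steps can be sketched as follows. The exact error propagation formula is not given in the text, so this example assumes that the variance of the fitted Gaussian and the squared standard error of each log2 ratio are added in quadrature; function names are hypothetical.

```python
import math


def two_tailed_p(log2_ratio, mean, sd, se=0.0):
    """Two-tailed z-test p value for one log2 ratio.

    mean, sd: parameters of the Gaussian fitted to the log2 ratio histogram.
    se:       standard error of this ratio from the curve fitting (assumed to
              combine with sd in quadrature).
    """
    z = (log2_ratio - mean) / math.sqrt(sd ** 2 + se ** 2)
    return math.erfc(abs(z) / math.sqrt(2))


def false_discovery_rate(p_values, threshold):
    """Expected number of peptides passing the threshold over the observed number."""
    observed = sum(1 for p in p_values if p <= threshold)
    if observed == 0:
        return 0.0
    expected = threshold * len(p_values)
    return min(1.0, expected / observed)


if __name__ == "__main__":
    ratios = [0.05, -0.10, 0.02, 1.60, -0.04, 0.08]
    errors = [0.05, 0.04, 0.06, 0.10, 0.05, 0.05]
    pvals = [two_tailed_p(r, 0.0, 0.2, e) for r, e in zip(ratios, errors)]
    print([round(p, 4) for p in pvals])
    print(false_discovery_rate(pvals, 0.01))
```

A ratio far from the center of the distribution thus loses significance when its own fitting error is large, which is the behavior described above.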
| RESULTS |
|---|
In this work we developed an improved algorithm that determines the same parameters by curve fitting of the isotopic envelope obtained in ZoomScan spectra. This method is based on the same assumption, uses the results obtained by our previous method as initial estimates, and fits a curve to the peak envelope using a peak shape model, as explained under "Experimental Procedures" and exemplified in Fig. 1, A–C. Among other advantages, this algorithm, to which we will refer as the standard method, allowed a more accurate quantification of proportions and also the calculation of errors of the estimates from curve-fitting residuals.
Quantification Bias Associated to Incomplete 18O Exchange
To test the performance of the standard method in practice, a negative control was prepared by making a comparative expression analysis of two identical aliquots of a peptide digest from a crude proteome extract from endothelial cells. The two samples were labeled in H₂¹⁶O and H₂¹⁸O, respectively, and mixed in equal proportion, and their relative proportions were analyzed. The distribution of ratio values was analyzed using a base 2 logarithmic scale, which is a common practice for gene expression data and is expected to produce a symmetric distribution tightly centered on zero. We observed that the distribution of log2 ratios was not correctly centered on the expected value but was significantly biased toward the non-labeled sample. Besides, it was significantly asymmetric and showed an extended right tail (Fig. 2A).
This was an unexpected observation given the reported good labeling behavior of tryptic peptide digests of BSA and other widely used purified proteins (10, 13, 29). In further experiments with different proteomes, we obtained similar results, corroborating that the performance of the labeling process cannot be extrapolated from simple protein mixtures to very complex samples, such as those derived from the analysis of whole proteomes. Using higher enzyme/substrate ratios or higher concentrations of organic solvents, or diluting the peptides in these solvents before adding water and enzyme, did not eliminate residual amounts of peptides labeled to a low extent. These results suggest that this resistance toward C-terminal oxygen exchange depends more strongly on an unknown set of peptide-specific structural constraints, which are difficult for the user to control, than on the particular labeling conditions. This is particularly relevant in the case of complex samples where, among a large number of structural patterns, there is a certain probability of finding a subset displaying very slow exchange kinetics. Incomplete labeling of these particular peptides may produce deviations from the expected mean ratio that can be mistaken for true differential expression events.
A Kinetic Model for 18O Exchange
In an attempt to overcome this problem, we analyzed the kinetic behavior of the labeling process. As described under "Experimental Procedures," a mathematical analysis of the kinetic process revealed that the relative proportions of non-labeled and mono- and dilabeled species from any peptide and at any time along the labeling reaction are expected to relate to each other according to Equations 1–3 even when the purity of [18O]water is low. According to these equations, it would be possible to predict the relative proportion of the three isotopic species for any peptide if the labeling efficiency f is known. Conversely it would be possible to estimate the proportion of non-labeled species (B0) from that of the other two species (B1 and B2) and hence to correct for the lack of a complete incorporation of at least one 18O atom.
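The converse calculation can be made explicit. Assuming the binomial relation between the species (B0 = B(1 − f)², B1 = 2Bf(1 − f), B2 = Bf²), the ratio B1/B2 fixes f, and B0 follows without knowing B in advance. The sketch below uses hypothetical function names and requires a nonzero amount of dilabeled species.

```python
def efficiency_from_labeled(b1, b2):
    """Labeling efficiency f from the mono- and dilabeled amounts.

    Follows from B1 / B2 = 2 (1 - f) / f.
    """
    if b2 <= 0:
        raise ValueError("b2 must be positive")
    return 2 * b2 / (b1 + 2 * b2)


def unlabeled_from_labeled(b1, b2):
    """Predicted non-labeled amount B0 of the labeled sample: B1^2 / (4 B2)."""
    if b2 <= 0:
        raise ValueError("b2 must be positive")
    return b1 ** 2 / (4 * b2)


if __name__ == "__main__":
    # A peptide with total amount 10 labeled at f = 0.6 gives B1 = 4.8, B2 = 3.6.
    b1, b2 = 4.8, 3.6
    print(efficiency_from_labeled(b1, b2))
    print(unlabeled_from_labeled(b1, b2))
```

Subtracting the predicted B0 from the first isotopic cluster and adding it to the labeled sample is the essence of the correction; the fitting procedure achieves the same result by estimating A, B, and f directly.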
To check the validity of the kinetic model and the predictions of Equations 1–3, a digested proteome extract from a preparation of T cells was dried down and subjected to trypsin-catalyzed 18O labeling under incomplete labeling conditions. This was done by performing the labeling incubation for 24 h only, a time that we have observed previously not to be sufficient to achieve a complete exchange for many peptides (13). The labeled peptides were analyzed by linear ion trap mass spectrometry, and the ZoomScan spectra were used for quantification using the standard method. Because in this experiment no non-labeled sample was mixed (i.e. A = 0), this analysis allowed us to determine the relative amounts of the three isotopic species (B0, B1, and B2) and from these the labeling efficiency f for each one of the peptides. In Fig. 3, the observed proportions of the different species (circles) were compared with the theoretical predictions given by Equations 1–3 (solid lines). As shown, the majority of the peptides had a low proportion of non-labeled species, around or less than 10% (open circles); the mean labeling efficiency was around 0.7. As expected from our previous observations, a noticeable number of peptides showed a significant proportion of non-labeled species, which in some cases reached 70% (Fig. 3). The relative proportion of the three species was in perfect agreement with the predictions of Equations 1–3 in all the peptides without any exception. The same results have been obtained in our laboratory after the analysis of other samples (not shown), strongly suggesting that the kinetic model is universally valid for this kind of labeling process, and hence predictions given by Equations 1–3 may be potentially used to correct for an incomplete labeling efficiency.
Performance of the Corrected Method for Equimolar Protein Mixtures
The performance of the corrected method was tested in practice by applying it to the analysis of simple, equimolar protein mixtures subjected to incomplete labeling conditions. A tryptic digest of a mixture containing five different proteins was divided into two identical aliquots, and they were subjected to trypsin-catalyzed enzymatic labeling in either H₂¹⁶O or H₂¹⁸O for 24 h. The two samples were then mixed and analyzed by linear ion trap mass spectrometry. The resulting spectra were then used for relative quantification using the standard and corrected methods. In Fig. 4A, the ratios determined from peptides originating from the same protein were averaged and compared. As shown, by using the standard method only two proteins could be accurately quantified, and the ratios of the other three were clearly overestimated; besides, these ratios, when considered at the peptide level, showed a large dispersion. In clear contrast, the corrected method took into account the bias produced by low labeling efficiencies, yielding ratios that in all cases were very close to unity; this was reflected in the experimental errors of the average ratios, which were much lower than those obtained with the standard method.
The corrected algorithm was then applied to the analysis of results obtained with the endothelial cell peptide extract, which was labeled using the optimized conditions and was analyzed previously using the standard method (Fig. 2A). As shown in Fig. 2B, the corrected algorithm eliminated the bias produced by incomplete incorporation, generating a symmetric distribution of ratios centered on the 1:1 value, which was satisfactorily fitted by a Gaussian envelope. Statistical analysis of these results failed to provide any peptide ratio showing a significant deviation from the average, thus confirming the internal consistency of the method. The distribution of labeling efficiencies calculated by the corrected method together with a plot of efficiencies versus log2 ratios is shown in Fig. 2C.
Performance of the Corrected Method When Peptide Ratios Take Extreme Values
In a further set of experiments we tested the performance of the method when peptide ratios take extreme values. For this purpose we used the same dataset used to test the kinetic model (Fig. 3), containing labeled peptides only (i.e. A = 0 and B = 1), and analyzed the results obtained by subjecting this sample to quantification using the corrected method. This served to determine to what extent the modified algorithm was able to correctly assign the signal coming from the first isotopic cluster to the B0 species. As shown in Fig. 5A, where the estimated labeling efficiencies (i.e. the efficiencies computed by the method) are plotted as a function of the real ones (i.e. the efficiencies calculated in Fig. 3), the parameter f was calculated with good accuracy but only when it was higher than 0.4; below this value, it was overestimated so that part of the intensity of the first isotopic cluster was erroneously assigned to the A component. Consistently, when the estimated labeling efficiencies were plotted as a function of the ratios calculated using the standard method, the shift toward higher values was more marked than that predicted according to Equation 10 when labeling efficiencies were lower than 0.4 (Fig. 5B, compare empty circles with the curve). Similarly the corrected method was only able to make an efficient correction of the effect of labeling efficiency when it was higher than 0.4 (Fig. 5B, filled circles). These results indicated that in these conditions a minimum labeling efficiency was actually needed in practice to make an efficient correction; when a corrected peptide ratio is low, it should only be trusted provided that the labeling efficiency estimated by the algorithm is higher than 0.5.
Although the labeling efficiency has no significant effect on the calculation of corrected ratios when A ≫ B, it should be noted that in the extreme case of a completely non-labeled peptide it is mathematically impossible to distinguish between a very high ratio (A ≈ 1) and a very low labeling efficiency (f ≈ 0). In this situation, these two solutions produce the same curve fitting to the isotope envelope. The reason that the corrected algorithm produces consistent and reproducible results with A ≈ 1 is that the starting parameters for the curve fitting are estimated from a model that implicitly assumes that there are no non-labeled B species (B0 ≈ 0), i.e. that labeling efficiency is high; hence the corrected method converges to the first solution. Although this may in theory confound the corrected algorithm, in practice it only happens when the labeling efficiency is very close to zero, and this is a situation that we have never observed experimentally even when using incomplete labeling conditions. Besides, this situation is easily avoided in practice by checking that labeling efficiencies do not take values that are too low. In a situation like that of Fig. 4B, for instance, it would be enough to keep the efficiencies above 0.2, whereas in conditions with more extreme ratios, like those of Fig. 5, B and C, labeling efficiencies must be cautiously kept above 0.5. We conclude that our analysis gives solid evidence that the corrected method makes very efficient corrections for labeling efficiency in any range of ratios provided that a reasonable level of labeling is experimentally achieved.
Application of the Corrected Method to the Analysis of Differential Expression Events Induced by Stimulation of T Cells
After the encouraging results obtained with controlled experiments where the peptide ratios were known a priori, we applied this method to analyze changes in the protein pattern of small amounts of a crude proteome from a preparation of T cells when they were stimulated with anti-CD3 antibody. To this end, samples from control and stimulated preparations were adjusted for the total protein content (5 µg each), trypsin-digested, desalted, and subjected to enzymatic labeling for 48 h. The stimulated sample was labeled with H₂¹⁸O, whereas the control was incubated with non-labeled water. The two samples were then mixed and analyzed by linear ion trap mass spectrometry in only one HPLC run using a long gradient. MS/MS spectra were then subjected to a database search using SEQUEST, and the FDR of peptide identification was calculated using a previously published empirical method (21). 849 MS/MS spectra were found at an FDR of 5%; they allowed the identification of 570 unique peptides belonging to 235 unique proteins. Among the ZoomScan spectra corresponding to these peptides, 495 were of sufficient quality to allow quantification; these spectra corresponded to 317 unique peptides belonging to 119 different proteins.
When quantification data obtained using the standard method was analyzed, despite the long labeling times used, the distribution of log2 ratio values was again found to be significantly biased toward the control sample and showed an extended right tail (Fig. 6A). Analyzing the data using the corrected method, it was possible to estimate the labeling efficiency f of each one of the quantified peptides. A semilog plot of f versus the ratios obtained by the standard method (Fig. 6C) revealed that the vast majority of peptides were labeled with an efficiency around 0.75; this is a typical result using the labeling conditions optimized in a previous study (13), and at this efficiency the proportion of non-labeled sample (B0) only accounts for 6% of the total amount of B. However, Fig. 6C also revealed the presence of a subpopulation of peptides labeled with a lower efficiency whose ratios are shifted toward higher values. These peptides were responsible for the bias of the distribution toward higher ratios and the presence of a right tail. In contrast, when the corrected method was used, the bias in the distribution was efficiently corrected and appeared centered at the 1:1 ratio, indicating that the two samples were in fact correctly adjusted for their protein content, and the log2 ratio histogram could be fitted to a symmetric Gaussian distribution (Fig. 6, B and D) with no evidence of a right tail, indicating that it was produced by peptides with a low labeling efficiency and not by true differential expression events.
The expression changes detected by this method appear to reflect true differential expression events because the three proteins identified have been implicated previously in T cell activation and other functions. Thus, thrombospondin-1, an adhesive secreted glycoprotein that mediates cell-to-cell and cell-to-matrix interactions, has been shown to play a role in T cell activation by anti-CD3 (30); interaction of this protein with CD47 has also been shown to mediate expansion of inflammatory T cells (31). Histone H4 gene activation has been related to cell cycle activation (32), and it has also been implicated in chromatin remodeling in response to T cell activation (33). Similarly, actin cytoskeletal rearrangement is known to occur upon T cell stimulation (34). In this regard, the non-tryptic actin peptide is probably the result of a proteolytic event. In addition, actin has a rather large number of isoforms in databases, and the four identified peptides are not homogeneously distributed among them, making it impossible to assign the observed changes unequivocally at the protein level; moreover, each of the peptides is present in at least one isoform that does not contain the others. Cytoskeletal rearrangements probably affect the several actin types differently, which would explain the differences in expression ratios observed at the peptide level that were not observed in any other protein in this experiment (see supplemental information). In conclusion, the method was able to detect specific expression changes in three proteins, which are consistent with the process of T cell activation, among a pool of more than 100 proteins that did not exhibit significant variation in their cellular content.
| DISCUSSION |
|---|
In previous work we reported an optimized labeling protocol and demonstrated that this method may be used for peptide quantification by using linear ion trap mass spectrometry and performing high resolution scans in narrow mass ranges (ZoomScan) (13). In the present work, we sought to further automate this method by developing a peak-fitting algorithm that takes into account all the information contained in the whole isotopic envelope and not just the heights of the first, third, and fifth peaks (Fig. 1, left panels). This algorithm was more robust than previous ones, and by assessing the goodness of fit, it allowed the accuracy of each fit to be evaluated and hence statistically relevant information about the estimation of ratios to be introduced. However, when this improved algorithm was applied in practice to the analysis of a complex mixture of peptides derived from real proteomes, we noticed that a small but significant proportion of the peptides did not completely incorporate at least one 18O atom even when optimized protocols were used, producing a significant bias in the ratio distribution and false positive changes in expression ratios. Apparently, when very complex peptide pools are labeled, the probability of finding specific peptides with a particularly low rate of 16O/18O exchange increases with the number of different peptides present in the sample, so that some of them may be found not to be completely labeled with at least one 18O atom.
To deal with this situation, we considered that the labeling process should actually be treated as a kinetic reaction in which each peptide has its own specific rate and reaches a certain labeling efficiency after the exchange reaction step. Accordingly, the peptide population is expected to display a certain distribution of labeling efficiencies, so that the proportion of completely non-labeled species should never be neglected; it may be more or less close to zero depending on the labeling efficiency of each peptide. When peptide labeling behavior was analyzed in practice, we found that labeling of all peptides analyzed followed precisely a kinetic exchange model in which the fractions of non-labeled, monolabeled, and dilabeled species could be accurately predicted as a function of the labeling efficiency. This relation, given by Equations 1–3, was independent of kinetic exchange rates and hence of peptide sequence or catalytic rate of the enzyme. The existence of a fixed relation between the three B species (B0, B1, and B2) allowed the development of a modified algorithm that, by direct curve fitting, simultaneously determined the correct proportion of the A and B species and the labeling efficiency f that best explained the isotope distribution (Fig. 1). This procedure maintains the degrees of freedom and overcomes the limitation of current methods; in theory it is applicable at any point of the exchange reaction process, even when low purity [18O]water is used.
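The idea of fitting A, B, and f simultaneously can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' algorithm: Equations 1–3 are not reproduced in this section, so the sketch assumes the binomial form B0 = (1 - f)^2, B1 = 2f(1 - f), B2 = f^2, which agrees with the values quoted in the text (6% non-labeled at f = 0.75 and 36% at f = 0.4). It works on peak heights only, uses a plain grid search over f in place of whatever optimizer the authors used, and the natural isotopic envelope and all function names are hypothetical.

```python
def species_fractions(f):
    """Fractions of B with zero, one and two 18O atoms (assumed binomial)."""
    return (1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2


def b_envelope(natural, f):
    """Envelope of sample B: natural envelope shifted by 0, 2 and 4 peaks (0, 2, 4 Da)."""
    out = [0.0] * (len(natural) + 4)
    for shift, frac in zip((0, 2, 4), species_fractions(f)):
        for i, height in enumerate(natural):
            out[i + shift] += frac * height
    return out


def fit_envelope(observed, natural, steps=100):
    """Grid-search f; for each f solve the 2x2 least-squares problem for A and B.

    Returns (A, B, f, residual_sum_of_squares) for the best grid point.
    """
    a_vec = list(natural) + [0.0] * 4
    best = None
    for k in range(1, steps + 1):
        f = k / steps
        b_vec = b_envelope(natural, f)
        aa = sum(x * x for x in a_vec)
        bb = sum(y * y for y in b_vec)
        ab = sum(x * y for x, y in zip(a_vec, b_vec))
        ao = sum(x * o for x, o in zip(a_vec, observed))
        bo = sum(y * o for y, o in zip(b_vec, observed))
        det = aa * bb - ab * ab
        if abs(det) < 1e-12:
            continue  # A and B envelopes are indistinguishable at this f
        a_amt = (ao * bb - bo * ab) / det
        b_amt = (aa * bo - ab * ao) / det
        rss = sum((o - a_amt * x - b_amt * y) ** 2
                  for o, x, y in zip(observed, a_vec, b_vec))
        if best is None or rss < best[3]:
            best = (a_amt, b_amt, f, rss)
    return best


# Hypothetical natural envelope (relative peak heights, monoisotopic peak first).
natural = [0.55, 0.30, 0.11, 0.03, 0.01]
# Synthetic spectrum: 1 part A plus 2 parts B labeled with f = 0.75.
observed = [1.0 * a + 2.0 * b
            for a, b in zip(natural + [0.0] * 4, b_envelope(natural, 0.75))]
a_amt, b_amt, f, rss = fit_envelope(observed, natural)
print(f"A = {a_amt:.3f}, B = {b_amt:.3f}, f = {f:.2f}, A/B = {a_amt / b_amt:.3f}")
```

On this noise-free synthetic spectrum the search recovers the amounts and efficiency used to build it. Because the three B species are tied together through f, the model has three free parameters (A, B, f) rather than four, which is one way to read the statement that the procedure maintains the degrees of freedom.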
Our results also indicate that the modified method is accurate in practice even when peptide ratios take extreme values, provided that a minimum labeling efficiency, as calculated by the algorithm itself, is reached. Although peptides labeled with lower efficiencies could be accurately quantified, a conservative criterion is to require a minimum f value of about 0.4 for reliable quantification. This means that the modified algorithm is able to efficiently compensate for a proportion of non-labeled peptide species as high as 36%. Expression ratios for peptides with lower f values should be considered with caution or even rejected. Direct calculation of labeling efficiency is also extremely useful in practice for judging the quality of the experiment and the general confidence with which the quantification of peptide pairs can be interpreted. For this purpose the efficiencies estimated by the algorithm itself may be used to ascertain the degree of labeling achieved in the experiment; the plots of efficiency versus log2 ratio are particularly informative for inspecting the validity of the corrected ratios. Some examples of these plots, obtained under different labeling conditions, are presented in this work; they show that experimental bias related to labeling efficiency is effectively removed and that there is a complete absence of false positives when complex peptide samples are quantified against themselves.
The validity of the modified method was tested in practice by determining significant expression changes produced by the activation of T cells with only 5 µg of protein extracts. More than 300 peptide pairs, corresponding to 100 proteins, could be quantified, yielding a symmetric Gaussian distribution of ratios closely centered around 1:1. Although the width of the ratio distribution suggested that statistically significant changes in expression ratios even lower than 2-fold could be accurately detected at the 95% confidence level, we used a more rigorous statistical criterion. First, we took advantage of the curve-fitting algorithm, which allowed estimation of the error associated with the calculation of the ratio in each determination; this error was taken into account in the calculation of the p value with respect to the null hypothesis. Second, we considered all the expression data obtained in the experiment as a whole and calculated the FDR, or proportion of peptides that were expected to display just by chance a similar or higher change in expression ratio, at the same p threshold. The importance of taking all these factors into account becomes evident from the following considerations. Using the conventional statistical significance level of 95%, i.e. a p threshold of 0.05, 43 peptide pairs would have been detected as having a significant deviation in the ratio, corresponding to the 19 unique peptides shown in Table I. However, 5% of the total population of quantified peptide pairs used to construct the Gaussian distribution amounts to 25 peptides; this means that 58% of the observed changes are expected to occur by chance alone (Table I). In addition, if the propagation of fit errors and the labeling efficiency correction are also omitted, the number of spectra in the set of differential expression candidates rises to 87, among which 66.3% are expected to occur by chance alone.
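The arithmetic behind the 58% figure can be checked directly. The sketch below assumes that the "total population of quantified peptide pairs" is the 495 quantifiable ZoomScan spectra reported in the results, which is consistent with the 25 chance hits quoted in the text; the function names are hypothetical.

```python
def expected_by_chance(n_quantified, p_threshold):
    """Number of measurements expected to pass the threshold under the null hypothesis."""
    return n_quantified * p_threshold


def error_rate(n_quantified, p_threshold, n_detected):
    """Expected proportion of detected changes that arise by chance alone."""
    if n_detected <= 0:
        raise ValueError("n_detected must be positive")
    return expected_by_chance(n_quantified, p_threshold) / n_detected


n_quantified = 495  # quantifiable spectra (from the text; assumed to be the population)
n_detected = 43     # peptide pairs passing the 0.05 threshold (from the text)
chance = expected_by_chance(n_quantified, 0.05)
rate = error_rate(n_quantified, 0.05, n_detected)
print(f"expected by chance: about {round(chance)} of {n_detected}")
print(f"estimated error rate: about {round(100 * rate)}%")
```

With these inputs the calculation gives about 25 chance hits and an error rate of about 58%, matching the values stated in the text.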
Although it is widely used in this kind of quantitative experiment, a fixed p value threshold is clearly not an acceptable criterion by itself; it gives a false impression of high sensitivity at the expense of error rates that may become very large. In this particular experiment, we found it necessary to lower the p threshold to 0.003, i.e. to use a 99.7% confidence level, to achieve a satisfactory maximum error rate of 5%. The validity of this procedure is supported by the fact that all the expression changes detected at the peptide level are coherent among peptides belonging to the same proteins and are assigned to proteins that are known to be associated with T cell activation.
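The procedure of tightening the p threshold until the error rate becomes acceptable can be sketched as follows. This is a hedged illustration, not the authors' implementation: the p values are synthetic and were constructed only to resemble the situation described (a few strong changes among several hundred unchanged peptides), the candidate thresholds are arbitrary, and the function names are hypothetical.

```python
def error_rate(p_values, threshold):
    """Expected chance hits divided by observed hits at a p threshold."""
    observed = sum(1 for p in p_values if p <= threshold)
    if observed == 0:
        return None  # nothing detected, so the error rate is undefined
    return threshold * len(p_values) / observed


def largest_acceptable_threshold(p_values, candidates, max_error_rate):
    """Largest candidate threshold whose estimated error rate is acceptable."""
    for threshold in sorted(candidates, reverse=True):
        rate = error_rate(p_values, threshold)
        if rate is not None and rate <= max_error_rate:
            return threshold
    return None


# Synthetic p values, NOT the paper's data: 30 strong changes plus
# 465 null measurements spread evenly over (0, 1].
p_values = [1e-4] * 30 + [(i + 1) / 465 for i in range(465)]
candidates = [0.05, 0.01, 0.005, 0.003, 0.001]
for t in candidates:
    print(f"p <= {t}: estimated error rate {error_rate(p_values, t):.3f}")
print("chosen threshold:", largest_acceptable_threshold(p_values, candidates, 0.05))
```

In this synthetic example the conventional 0.05 threshold gives an error rate above 40%, and the threshold has to be lowered considerably before the estimated error rate falls to 5% or below, which mirrors the reasoning in the text.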
In conclusion, our results show that by introducing a correction for labeling efficiency in the quantification algorithm, 18O labeling is a good quantitative method for shotgun proteomics analysis of highly complex peptide mixtures. The method described here is not only more robust and accurate than others described previously, but it also provides a means to determine the labeling performance obtained in the experiment. We think that this is an important practical advantage because stable isotope dilution techniques for quantitative proteomics are still not widely implemented, and skilled users with experience in controlling all the experimental factors related to labeling incorporation and quantification software are required. We also think that our approach makes the 18O labeling method particularly attractive because of its simplicity, its almost universal applicability, the reasonable tradeoff between performance and cost, and the fact that it can be performed on linear ion traps without high resolution instruments.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, February 23, 2007, DOI 10.1074/mcp.T600029-MCP200
1 The abbreviations used are: RP, reverse phase; FDR, false discovery rate.
* This work was supported by Comisión Interministerial de Ciencia y Tecnología Grants BIO2003-01805 and BIO2006-10085, Comunidad Autonoma de Madrid Grants GR/SAL/0141/2004 and S2006/BIO-0194, by the Spanish Cardiovascular Research Network (RECAVA), and by an institutional grant by Fundación Ramón Areces to Centro de Biología Molecular Severo Ochoa. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
Present address: Centro Nacional de Biotecnología, Universidad Autónoma de Madrid, 28049 Cantoblanco, Madrid, Spain.
Present address: Biological Sciences Division and Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99352.
¶ To whom correspondence should be addressed: Centro de Biología Molecular Severo Ochoa, Universidad Autónoma de Madrid, 28049 Cantoblanco, Madrid, Spain. Tel.: 34-91-497-8276; Fax: 34-91-497-8087; E-mail: jvazquez{at}cbm.uam.es
| REFERENCES |
|---|