Molecular & Cellular Proteomics 6:860-881, 2007.
© 2007 by The American Society for Biochemistry and Molecular Biology, Inc.
From the Department of Biochemistry and Biotechnology Center, University of Wisconsin, Madison, Wisconsin 53706
| ABSTRACT |
|---|
In general, strategies for isotope-assisted quantitative proteomics can be separated into two groups, depending on whether the isotopic tag is incorporated in vitro during sample preparation or in vivo during growth of the model organisms. Each approach has its relative strengths and weaknesses.
Some of the most widely applicable techniques for quantitative proteomics introduce an isotopic tag in vitro by chemical or enzymatic means. Although these approaches are highly versatile and can be applied to proteins from virtually any source, they are particularly susceptible to sample handling error: because the isotopic tag is introduced relatively late, each sample must be processed independently through highly variable steps such as protein extraction before the samples can be mixed. Common chemical isotopic labeling strategies include isotope-coded affinity tags (ICAT) (2) and isobaric tags for relative and absolute quantification (iTRAQ) (3), which differentially label sulfhydryls (cysteine) and primary amines (lysine and N termini), respectively. The most common enzymatic approach introduces an isotopic label during tryptic digestion of protein samples in either H₂¹⁶O or H₂¹⁸O prior to mixing (4).
In contrast, in vivo isotopic labeling involves the introduction of an isotopic tag into an organism through its food or medium. As the organism grows it consumes the label, naturally incorporating it into its proteins. To date, no obvious deleterious effects have been observed following isotopic labeling of any model organism with 15N or 13C even when high isotope abundance is achieved. However, introduction of an isotopic label in this way can be challenging for some larger organisms and is best suited to small model organisms that can be grown quickly on a defined medium or tightly controlled diet. These techniques are especially useful for experiments involving extensive sample preparation. Because experimental and control samples are labeled with heavy or light tags during growth, they may be combined immediately at harvest or sacrifice, before every step of sample preparation, including tissue homogenization, protein extraction, protein digestion, and any fractionation, as well as mass spectrometric analysis. Thus in vivo isotopic labeling provides an ideal internal control for all steps in a proteomics experiment. Cells grown in culture are often labeled specifically with essential amino acids provided in their medium, a technique called stable isotope labeling by amino acids in cell culture (SILAC) (5). Alternatively in cases where use of a labeled amino acid is impractical, organisms can be labeled through growth on medium containing 15N or 13C in place of their natural abundance counterparts via full metabolic labeling.
Full metabolic labeling, most often with 15N, has been successfully used for quantitative proteomics characterization of numerous prokaryotic species as well as many eukaryotes including Saccharomyces cerevisiae (6), Caenorhabditis elegans (7), Drosophila melanogaster (7), Arabidopsis thaliana (8), and Rattus norvegicus (9). For a recent review describing the application of metabolic labeling with stable isotopes to a variety of experimental systems, see Beynon and Pratt (10). This review describes not only the use of static metabolic labeling for comparison of protein abundances as considered here but also details the use of stable isotopes to monitor protein turnover via changes in incorporation of selected isotopes over time.
Several variations on metabolic labeling have been published in recent years. In one case, researchers combined the use of both 13C and 15N labeling to allow simultaneous comparison of three biological samples while also providing elemental constraints to aid peptide identification (11). Additionally full 15N metabolic labeling has been applied for top-down proteomics (12) and has been adapted for use in conjunction with two-dimensional gel electrophoresis (13). Finally as an indication of its reliability, metabolic labeling has served as the standard by which other proposed quantitative proteomics techniques including DIGE (14) and spectral counting (15) have been evaluated.
A key challenge for any quantitative proteomics experiment involving stable isotopes is the development of an effective automated system for determination of peptide ratios from LC-MS analyses. Because the mass difference between labeled and unlabeled peptides is dependent on the number of labeled atoms in the heavy peptide and thus varies with amino acid sequence, the automated analysis of data incorporating full metabolic labeling can be especially challenging. An algorithm for automated analysis of quantitative proteomics data involving stable isotope labeling has been published by MacCoss et al. (16) that addresses many of these challenges and specifically allows interpretation of 15N-labeled samples. The key features of this algorithm are detailed in Fig. 1 as adapted for our own metabolic labeling experiments.
In contrast, labeling of larger eukaryotes can be considerably more difficult, requiring either unnatural growth conditions or considerable expense to achieve complete 15N labeling. For example, the model plant Arabidopsis has recently been labeled with 15N for quantitative proteomics to >98% incorporation (8). However, to achieve complete labeling the plants were grown in liquid culture, essentially submerged in medium. Because these growth conditions differ significantly from the natural growth conditions of the plant, they may limit to some extent the biological questions to which complete metabolic labeling may be applied in Arabidopsis. Similarly metabolic labeling has been achieved in mammals including rats by feeding a 15N-enriched diet. Relatively long growth times and large amounts of labeled food are required for labeling of mammals, thus leading to considerable expense. Even after extended periods, the extent of 15N incorporation varies greatly from tissue to tissue in rats (9); this can preclude use of metabolic labeling for study of some biological questions. These challenges would likely be far greater for labeling of other larger mammals. Clearly the requirement for complete replacement of a selected atom with its heavy isotope makes full metabolic labeling difficult to apply in many biological experiments.
Recently Whitelegge et al. (17) presented an alternative strategy to use isotopic labeling for quantitative proteomics. They observed that when the relative abundance of a selected heavy isotope such as 15N or 13C is increased slightly over natural abundance, the result is an isotopic envelope with distinct shape (see Fig. 2). Although the isotopic envelopes for both natural abundance and partially labeled forms of the same peptide are easily distinguished, they should both be compatible with existing MS/MS data acquisition and sequencing software if the change in 15N or 13C abundance is small enough. Their key observation was that when natural abundance and partially labeled forms of the same peptide are combined, the shape of the resulting composite isotopic envelope can be used to determine the relative amounts of each peptide form that are present in the mixed sample. Similar types of measurement approaches involving overlapping envelopes have been used in the analysis of biopolymerization via mass isotopomer distribution analysis (MIDA) (18). Thus, quantitative proteomics information can be extracted through partial rather than full metabolic labeling.
Despite these potential benefits, a number of challenges currently prevent the use of partial metabolic labeling in quantitative proteomics experiments. First no approaches for automated interpretation of data from partial labeling experiments have been developed. More fundamentally, the effectiveness of partial metabolic labeling as a quantitative technique has not been evaluated. Whereas full metabolic labeling involves comparison of two distinct isotopic envelopes, partial metabolic labeling requires interpretation of much more subtle changes in the shape of a single composite envelope. Because it relies upon such subtle measurements, partial metabolic labeling might be expected to be less precise and thus to provide a semiquantitative rather than quantitative indication of changes in protein abundance.
We present a novel algorithm for the automated interpretation of individual isotopic envelopes to extract quantitative information via partial metabolic labeling. Furthermore we compare this new approach with standard full 15N metabolic labeling. First we analyze mixtures of labeled and unlabeled Arabidopsis peptides using both techniques to assess the consistency, dynamic range, and reproducibility of each technique under controlled conditions. Additionally we use both full and partial metabolic labeling to compare protein expression in light- versus dark-grown Arabidopsis. Thus we evaluate the performance of both techniques under conditions more closely resembling those of a typical biological experiment. Finally we explore the potential of partial metabolic labeling for use in future quantitative proteomics experiments.
| EXPERIMENTAL PROCEDURES |
|---|
Full and Partial Incorporation Control Experiments: Plant Growth and Harvest
Arabidopsis seedlings (Columbia ecotype; Lehle) were grown in liquid culture using medium containing Murashige and Skoog salts and MES (2.5 mM; pH 5.7) with 1% sucrose. Labeled media were prepared to 6% 15N ("partial" labeling) or >98% 15N ("full" labeling) by substituting ¹⁵NH₄¹⁵NO₃ and K¹⁵NO₃ for the natural abundance salts in the appropriate proportions. Plants were grown at room temperature with constant shaking. All plants used for the control experiments were grown under continuous light.
After 10 days of growth plant tissue was recovered, spun to remove excess water, and weighed. Natural abundance, partial, and full 15N-labeled plants were harvested and prepared separately to allow subsequent combination of samples at a range of ratios. All subsequent steps up to digestion were performed on ice or in a cold room (4–8 °C). Each sample was combined with grinding buffer (250 mM Tris-HCl, pH 7.6, 290 mM sucrose, 25 mM EDTA, 1 mM DTT, 1 mM PMSF, 1 µg/ml pepstatin, 1 µg/ml E64, 100 µM 1,10-phenanthroline) at a ratio of 3 ml/g of tissue. Tissue was homogenized using a mortar and pestle followed by a Polytron (Brinkman). Following filtration through four layers of Miracloth (Calbiochem), the homogenate was separated into organellar (1500 × g pellet), microsomal (100,000 × g pellet), and soluble (final supernatant) protein fractions by sequential centrifugation. All pellets were resuspended in grinding buffer as described above. Only the soluble protein fractions were used for subsequent characterization of the full and partial incorporation quantification techniques.
Protein Digestion and Preparation of Control Mixtures
Protein concentrations of the natural abundance, partial, and full 15N-labeled soluble fractions were measured (4.5, 6.0, and 4.2 mg/ml, respectively) using a BCA protein assay kit (Pierce). Protein (800 µg) was precipitated by addition of acetone to 80% with incubation on ice for 30 min and recovered by centrifugation. The air-dried pellets were dissolved in 8 M urea, 8 mM DTT to a protein concentration of 8 mg/ml prior to an 8-fold dilution with 50 mM ammonium bicarbonate. Proteolysis was initiated by addition of 16 µg of sequencing grade trypsin (Promega) to each 800-µl reaction containing 1 mg/ml protein, 1 M urea, 1 mM DTT for a final trypsin to protein ratio of 1:50. Proteolysis continued overnight (14 h) at 22–23 °C with gentle rocking. The reactions were quenched by addition of formic acid to 5% (v/v), and the resulting peptides were purified by C18 solid phase extraction using SPEC·PT·C18 units (Varian) according to the manufacturer's recommended protocol.
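The reagent amounts in this paragraph (and in the membrane digests described below) follow from simple dilution arithmetic. The Python sketch below checks them under the assumption of ideal, additive volumes and ignores the small volume of added trypsin solution; `digestion_setup` is a hypothetical helper, not part of the original workflow.

```python
def digestion_setup(protein_ug, stock_mg_per_ml, dilution, urea_M, dtt_mM, trypsin_ug):
    """Volumes and final concentrations for dissolving a pellet and diluting it n-fold.

    Hypothetical helper; assumes additive volumes and ignores the trypsin solution volume.
    """
    dissolve_ul = protein_ug / stock_mg_per_ml   # 1 mg/ml == 1 ug/ul, so ug / (ug/ul) = ul
    final_ul = dissolve_ul * dilution            # n-fold dilution of the dissolved pellet
    return {
        "dissolve_ul": dissolve_ul,
        "final_ul": final_ul,
        "protein_mg_per_ml": protein_ug / final_ul,
        "urea_M": urea_M / dilution,
        "dtt_mM": dtt_mM / dilution,
        "protein_per_trypsin": protein_ug / trypsin_ug,   # 50 means 1:50 trypsin:protein
    }

# Soluble digest: 800 ug at 8 mg/ml in 8 M urea, 8 mM DTT, diluted 8-fold, 16 ug trypsin
soluble = digestion_setup(800, 8, 8, urea_M=8, dtt_mM=8, trypsin_ug=16)
# Membrane digest: 400 ug at 2 mg/ml with 2 mM DTT, plus 1 volume MeOH (2-fold), 8 ug trypsin
membrane = digestion_setup(400, 2, 2, urea_M=0, dtt_mM=2, trypsin_ug=8)
print(soluble)
print(membrane)
```

Both reactions come out at 1 mg/ml protein, 1 mM DTT, and a 1:50 trypsin to protein ratio, matching the values stated in the text.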
Light and Dark Growth Comparison: Plant Growth and Harvest
Plants were grown in liquid culture as described above. "Light-grown" plants were grown under continuous light, whereas "dark-grown" plants were wrapped in aluminum foil and kept in complete darkness throughout growth. After 10 days of growth plants were harvested. Natural abundance and either full or partial 15N-labeled plants were combined at harvest in known ratios by dry mass and processed together. The partial incorporation samples were homogenized as described above to produce soluble, organellar, and microsomal protein fractions. Because less material was available, the full incorporation samples were prepared using a slightly different protocol. Following tissue homogenization in grinding buffer with a mortar and pestle, the full incorporation protein samples were spun in a microcentrifuge for 2 min at 500 × g to pellet debris. Samples were then spun for 5 min at 1500 × g to pellet organellar material. Microsomal proteins were separated from soluble proteins by centrifugation for 120 min at 16,000 × g.
Digestion of Combined Light- and Dark-treated Samples
Soluble protein fractions were processed and digested as described for the preparation of control mixtures above. Membranous protein fractions derived from the 1500 × g (organellar) and 100,000 × g (microsomal) pellets were processed identically as follows. Insoluble material was pelleted by centrifugation in a microcentrifuge at 16,000 × g for 90 min. The supernatant was removed, and the pellet was resuspended in 50 mM ammonium bicarbonate and homogenized using a Potter-Elvehjem grinder. Protein concentrations were then measured using a BCA protein assay kit (Pierce), and 0.4 mg of each membrane homogenate was diluted to a protein concentration of 2 mg/ml with 50 mM ammonium bicarbonate. The samples were adjusted to 2 mM DTT, heated to 90 °C for 5 min, and allowed to cool to below 50 °C prior to addition of 1 volume of MeOH. Proteolysis was started by addition of 8 µg of sequencing grade trypsin (Promega) to each 400-µl reaction containing 1 mg/ml protein, 50% MeOH (v/v), 1 mM DTT for a final trypsin to protein ratio of 1:50. Proteolysis was allowed to proceed overnight (14 h) at 22–23 °C with gentle rocking and was terminated by addition of formic acid to 5% (v/v); the resulting peptides were purified by C18 solid phase extraction using SPEC·PT·C18 units (Varian) according to the manufacturer's protocol.
Mass Spectrometric Analysis
After digestion and desalting, samples containing 10 µg of peptides were analyzed using a QTOF-2 mass spectrometer (Micromass) coupled to an HP 1100 HPLC instrument (Agilent). Peptides were separated by reversed phase chromatography on home-packed fused silica columns (100 µm × 11 cm) containing Eclipse C18 resin (Agilent) using buffers A (0.1% (v/v) formic acid in water) and B (0.1% (v/v) formic acid in 100% acetonitrile) at a constant flow rate of 500 nl/min. After loading each sample in 5% B, peptides were eluted with the following gradient: 5–12% B over 10 min, 12–50% B over 105 min, 50–60% B over 5 min, 60–100% B over 5 min, hold at 100% B for 5 min, and return to 5% B over 5 min. MS and MS/MS spectra were collected in data-dependent mode with one 5.5-s sequencing attempt following every MS spectrum; this maintained regular MS sampling over the chromatographic time frame and thus provided the information needed for quantification. Precursor ions were selected for MS/MS fragmentation within a ±3 m/z window. Dynamic exclusion was applied to peaks within 1.2 Da for 120 s following each sequencing attempt.
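The elution program can be represented as (time, %B) breakpoints with linear interpolation between them. This Python sketch assumes that t = 0 marks the start of the 5 to 12% B segment after loading and that every segment, including the final return to 5% B, is linear; the breakpoints are transcribed from the text, while `GRADIENT` and `percent_b` are hypothetical names.

```python
from bisect import bisect_right

# (minutes, %B) breakpoints transcribed from the gradient described above.
# Assumption: t = 0 is the start of elution after loading in 5% B.
GRADIENT = [(0, 5), (10, 12), (115, 50), (120, 60), (125, 100), (130, 100), (135, 5)]

def percent_b(t_min, program=GRADIENT):
    """Linearly interpolated %B at time t_min, clamped to the ends of the program."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t_min) - 1          # index of the segment containing t_min
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

for t in (0, 10, 62.5, 127, 135):
    print(t, percent_b(t))
```

For example, `percent_b(62.5)` falls halfway through the 105-min segment and gives 31% B; the whole program lasts 135 min.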
Protein Identification
Peak lists were generated from each LC-MS analysis using Protein Lynx Global Server 2.1.5 (Micromass). A seven-point Savitzky-Golay smooth was applied twice to each MS spectrum, followed by background subtraction using a 35% threshold with a first order polynomial. MS/MS product ion spectra were similarly smoothed twice with the seven-point Savitzky-Golay method, although the adaptive algorithm was used for background subtraction. No deisotoping was performed.
Peptides were identified using Mascot 2.0 (19) with the following search parameters: tryptic digestion with up to two missed cleavages, variable methionine oxidation, and variable N-terminal acetylation of proteins. Whereas Mascot searches of full 15N incorporation data were performed using a 0.2-Da mass tolerance for both MS and MS/MS, searches of partial incorporation data used a 1.2-Da mass tolerance for both MS and MS/MS to accommodate occasional errors made by our peak list generation software when interpreting the unusual isotopic envelopes. All partial incorporation identifications were subsequently filtered to remove any peptide whose mass error was not within 0.2 Da of either the monoisotopic peak or the first isotope of the predicted envelope. Searches of full incorporation sample data were performed twice, first using natural abundance mass definitions and second using masses calculated for 100% 15N. Mascot .DAT files were parsed with Java and Perl scripts that used tools provided in the msParser 1.22 Toolkit (Matrix Science).
All Mascot searches were performed against a composite database containing the forward and reversed sequences for all predicted proteins in the Arabidopsis genome (Version 4, The Institute for Genomic Research, www.tigr.org, released June 2003) as well as the sequences of common contaminant proteins including porcine trypsin and several human keratins (final database size, 57,973 entries). To ensure comparable and defined levels of confidence in peptide identifications for both partial and full incorporation datasets, a reversed database strategy was used to determine 1% false positive thresholds separately for each peptide charge state. For peptides identified from the full incorporation samples, minimum Mascot scores of 44, 24, 13, and 10 were required for singly, doubly, triply, and quadruply charged peptides, respectively. Minimum thresholds for partial incorporation samples were 58, 44, 38, and 37 for singly, doubly, triply, and quadruply charged peptides, respectively. The much higher score thresholds for the partial incorporation dataset result from the wider MS mass tolerance used for partial incorporation searches. When the size of each dataset and the number of associated reversed database hits are considered, the expected false positive rates (average ± S.D.) are 1.42 ± 0.3% for full incorporation and 1.33 ± 0.3% for partial incorporation (20). Because these expected false positive rates are low and similar for both datasets, the peptide identifications they contain can be compared directly.
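The charge-state-specific thresholds come from a target-decoy comparison. The Python sketch below assumes one simple convention: the false positive rate at a score cutoff is estimated as the number of reversed-sequence hits divided by the number of forward hits at or above that cutoff, and the threshold is the lowest cutoff reached, scanning downward, before that estimate first exceeds 1%. The estimator actually used (20) may differ, and the input format and function name are hypothetical.

```python
from collections import defaultdict

def score_thresholds(psms, max_fpr=0.01):
    """Lowest score cutoff per charge state whose estimated FPR stays <= max_fpr.

    psms: iterable of (charge, score, is_reversed) tuples (hypothetical format).
    Estimated FPR at a cutoff = #reversed hits / #forward hits scoring >= cutoff.
    """
    by_charge = defaultdict(list)
    for charge, score, is_rev in psms:
        by_charge[charge].append((score, is_rev))
    thresholds = {}
    for charge, hits in by_charge.items():
        best = None
        # Walk cutoffs from high to low and stop at the first one whose estimate
        # exceeds max_fpr (a conservative choice; other conventions exist).
        for cutoff in sorted({s for s, _ in hits}, reverse=True):
            n_fwd = sum(1 for s, rev in hits if s >= cutoff and not rev)
            n_rev = sum(1 for s, rev in hits if s >= cutoff and rev)
            if n_fwd and n_rev / n_fwd <= max_fpr:
                best = cutoff
            else:
                break
        thresholds[charge] = best
    return thresholds

# Toy data: 100 forward doubly charged hits scoring 30-129 and three reversed hits
psms = [(2, s, False) for s in range(30, 130)] + [(2, 35, True), (2, 20, True), (2, 10, True)]
print(score_thresholds(psms))  # {2: 36}
```

At a cutoff of 35 the first reversed hit enters with 95 forward hits (an estimated 1.05%), so the scan stops and the cutoff of 36 is kept.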
Calculation of 15N Incorporation
Full Metabolic Labeling
The mean level of 15N incorporation achieved in the fully labeled plants was calculated from the incorporations observed for those peptides identified by Mascot in the 1:1 control mixture of labeled and unlabeled Arabidopsis. Incorporations for each individual peptide were calculated in an automated fashion using an adaptation of an algorithm from MacCoss et al. (21). All calculations were performed with a series of scripts written in Mathematica 5.2 (Wolfram Research). Briefly the shape of the observed isotopic envelope for each heavy peptide was compared with predicted isotopic envelopes for 15N incorporations ranging from 1 to 99% in 1% increments, assuming all other elements were present at natural abundance. The best matching predicted envelope was identified by regression analysis and provided an estimate of the true isotopic incorporation. A minimum correlation coefficient of 0.85 was required for acceptance. An average incorporation of 98.2% was observed (range, 97–99%; n = 29).
Natural Abundance and Partial Metabolic Labeling
A similar approach was used to calculate the 15N enrichment in both the natural abundance and 6% samples. A portion of each sample was analyzed individually, and mean levels of incorporation were determined based on the individual incorporations of those peptides identified by Mascot following single LC-MS experiments. However, predicted envelopes were calculated for incorporations ranging from 0.1 to 10% in 0.1% increments to allow greater precision. An average incorporation of 0.367% was calculated for natural abundance peptides (n = 251) in excellent agreement with the expected value of 0.368% (22). Similarly an average incorporation of 5.2% was calculated for the partially labeled peptides (n = 117). Although this is somewhat lower than the target incorporation of 6%, it is still within a range where discrimination between light and heavy envelopes should be reasonable for tryptic peptides.
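Both incorporation estimates are one-parameter grid searches: predict the envelope for each candidate 15N incorporation and keep the candidate whose prediction correlates best with the observed envelope. The Python sketch below is a simplified stand-in for the Mathematica scripts, under stated assumptions: only carbon and nitrogen isotopes are modeled (H, O, and S are ignored), natural 13C abundance is taken as 1.07%, peaks are indexed by nominal mass offset (M, M + 1, ...), and Pearson's r serves as the match score. All names are hypothetical.

```python
from math import comb, sqrt

C13 = 0.0107  # assumed natural 13C abundance (fraction)

def binomial_dist(n, p, k_max):
    """P(k heavy atoms) for k = 0..k_max among n atoms, each heavy with probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) if k <= n else 0.0
            for k in range(k_max + 1)]

def envelope(n_c, n_n, i_15n, peaks=6):
    """Relative heights of M, M+1, ... from C and N isotopes only (H, O, S ignored)."""
    c = binomial_dist(n_c, C13, peaks - 1)
    n = binomial_dist(n_n, i_15n, peaks - 1)
    # Convolve the carbon and nitrogen distributions, truncated to `peaks` entries
    return [sum(c[j] * n[k - j] for j in range(k + 1)) for k in range(peaks)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def estimate_incorporation(observed, n_c, n_n, grid=None, min_r=0.85):
    """Return (best incorporation, r), or None if the best r is below min_r."""
    if grid is None:
        grid = [k / 1000 for k in range(1, 101)]   # 0.1% to 10% in 0.1% steps
    scored = [(pearson(observed, envelope(n_c, n_n, i, len(observed))), i) for i in grid]
    r, best = max(scored)
    return (best, r) if r >= min_r else None

# Synthetic check: a hypothetical peptide with 60 C and 16 N at 5.2% 15N
obs = envelope(60, 16, 0.052)
print(estimate_incorporation(obs, 60, 16))
```

Because an exact synthetic envelope is used, the search returns 5.2% with r close to 1; real envelopes would include noise and the isotopes omitted here.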
Quantitative Analysis via Full 15N Metabolic Labeling
Ratios describing the relative abundance of heavy and light forms of each peptide were calculated for all peptides matched to forward protein sequences with scores above the specified Mascot 1% false positive thresholds. First, extracted ion chromatograms (EICs) were generated for the monoisotopic peaks of both the light and heavy forms of each identified peptide; the m/z of the unsequenced partner was calculated from the observed m/z of the sequenced peptide and its predicted nitrogen count. The EICs included values within ±0.125 Da of the predicted m/z and spanned 2.5 min on either side of the MS/MS sequencing event from which that peptide was identified. The EIC corresponding to the monoisotopic peak of the sequenced form (either heavy or light) was then fitted to a Gaussian distribution to determine its mean and S.D. Only MS scans within 1 S.D. of the mean were included for peptide quantification.
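Three small calculations underlie this step: extracting an EIC within the stated m/z and retention time windows, locating the unsequenced partner (whose m/z differs from the sequenced form by the mass difference between 15N and 14N, about 0.99703 Da, per nitrogen divided by the charge), and restricting quantification to scans within 1 S.D. of the chromatographic peak. The Python sketch below assumes that every nitrogen is exchanged when computing the partner m/z and, for brevity, replaces the least-squares Gaussian fit with intensity-weighted moments (mean and S.D.), which approximate the Gaussian parameters for a clean, well-sampled peak. The scan format and function names are hypothetical.

```python
from math import sqrt

N15_MINUS_N14 = 0.9970349  # Da, mass difference between 15N and 14N

def partner_mz(mz, n_nitrogen, charge, sequenced_is_light=True):
    """m/z of the other labeling state, assuming every nitrogen is exchanged."""
    shift = n_nitrogen * N15_MINUS_N14 / charge
    return mz + shift if sequenced_is_light else mz - shift

def extract_eic(scans, target_mz, rt_center, mz_tol=0.125, rt_half_window=2.5):
    """(rt, summed intensity within +/- mz_tol) for each scan inside the RT window.

    scans: list of (retention_time_min, [(mz, intensity), ...]) (hypothetical format).
    """
    eic = []
    for rt, peaks in scans:
        if abs(rt - rt_center) <= rt_half_window:
            eic.append((rt, sum(i for mz, i in peaks if abs(mz - target_mz) <= mz_tol)))
    return eic

def scans_within_one_sd(times, intensities):
    """Indices of scans within 1 S.D. of the peak center.

    Intensity-weighted moments stand in for a least-squares Gaussian fit.
    """
    total = sum(intensities)
    mean = sum(t * w for t, w in zip(times, intensities)) / total
    var = sum(w * (t - mean) ** 2 for t, w in zip(times, intensities)) / total
    sd = sqrt(var)
    return [k for k, t in enumerate(times) if abs(t - mean) <= sd]

print(partner_mz(500.0, 10, 2))                               # heavy partner of a 2+ ion
print(scans_within_one_sd([0, 1, 2, 3, 4], [1, 4, 6, 4, 1]))  # [1, 2, 3]
```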
Relative quantification of each peptide pair was performed by least squares regression analysis as in MacCoss et al. (16). When the intensity of the heavy monoisotopic peak is plotted against the light monoisotopic peak intensity, all points should fall on a line whose slope indicates the ratio of heavy to light monoisotopic peak intensities and whose intercept is related to the relative noise at the heavy and light m/z values. The slope and intercept for each peptide were determined by linear least squares regression, with the correlation coefficient providing an indication of the quality of the fit. Ratios reflecting the relative abundance of heavy and light forms of each peptide were then derived from these slopes by applying a correction factor that accounts for differences in the proportion of each isotopic envelope represented by the heavy and light monoisotopic peaks. These ratios were accepted only if the correlation coefficient exceeded 0.8. Most calculations were performed using scripts written in Mathematica 5.2 (Wolfram Research), although EIC extraction was performed with a program written in Visual C++ (Microsoft Visual Studio 6.0).
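The regression and correction can be sketched as follows. The Python example assumes paired monoisotopic intensities from the retained scans and fits heavy = slope × light + intercept by ordinary least squares. Because the ratio of monoisotopic intensities equals (H/L)·(f_heavy/f_light), where f is the fraction of each envelope carried by its monoisotopic peak, the abundance ratio is slope·f_light/f_heavy; here the f values are supplied as inputs rather than computed from predicted envelopes. Names are hypothetical.

```python
from math import sqrt

def ols(x, y):
    """Ordinary least squares y = slope*x + intercept; also returns Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)

def heavy_to_light_ratio(light_mono, heavy_mono, f_light, f_heavy, min_r=0.8):
    """Heavy:light abundance ratio from per-scan monoisotopic intensities.

    f_light, f_heavy: fraction of the light / heavy envelope carried by the traced
    monoisotopic peak (assumed known). Returns None unless r exceeds min_r.
    """
    slope, intercept, r = ols(light_mono, heavy_mono)
    if r <= min_r:
        return None
    # heavy_mono / light_mono = (H/L) * (f_heavy / f_light), so undo that factor
    return slope * f_light / f_heavy

light = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
heavy = [0.5 * v + 10 for v in light]          # slope 0.5 plus a constant offset
print(heavy_to_light_ratio(light, heavy, f_light=0.5, f_heavy=0.4))
```

Fitting an intercept rather than forcing the line through the origin lets a constant background at one m/z shift the line without biasing the slope.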
Quantitative Analysis via Partial 15N Metabolic Labeling
Ratios indicating the relative contributions of light and partially labeled peptides to each composite envelope were derived via a novel algorithm that incorporates elements of both the algorithm for calculating 15N incorporation and the algorithm for determining ratios in full incorporation samples (described above). Ratios were calculated for all peptides identified by Mascot that were matched with forward protein sequences and had scores above the minimum 1% false positive thresholds as described earlier.
Description of the Observed Composite Envelope
First extracted ion chromatograms were produced for the monoisotopic peak and the first eight isotopes of each observed composite envelope. Each chromatogram included values within ±0.125 Da of the predicted m/z and included all scans within 2.5 min of the sequencing event from which that peptide was identified. The monoisotopic EIC was then fitted to a Gaussian distribution to identify the mean and S.D. of the chromatographic peak. Only those points within 1 S.D. of the mean were considered in further calculations. In cases where the monoisotopic EIC could not be fitted to a Gaussian distribution, only points within ±1.0 min of the observed MS/MS sequencing event were used for the analysis.
Next the intensity of each peak within the isotopic envelope was compared with that of the monoisotopic peak by linear regression, in a manner analogous to the comparison of heavy and light monoisotopic peaks within paired envelopes in the full incorporation case. The slopes of these regressions give the height of each isotope relative to the monoisotopic peak. A minimum correlation of 0.7 was required to accept the ratio for each peak; peaks not meeting this criterion were ignored in further analysis. The result is a description of all peaks in the envelope normalized with respect to the monoisotopic peak.
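A Python sketch of this normalization, assuming the EIC intensities for each isotope are aligned scan by scan with `traces[0]` as the monoisotopic trace. Each isotope trace is regressed on the monoisotopic trace by ordinary least squares; the slope becomes that isotope's relative height, and peaks whose correlation falls below 0.7 are returned as `None` so that later steps can ignore them. Function names are hypothetical.

```python
from math import sqrt

def ols_slope_r(x, y):
    """Least squares slope of y on x (with intercept) and Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None, 0.0            # flat trace: no usable fit
    return sxy / sxx, sxy / sqrt(sxx * syy)

def describe_envelope(traces, min_r=0.7):
    """Heights of M, M+1, ... relative to the monoisotopic peak.

    traces: equal-length intensity lists, traces[0] = monoisotopic EIC.
    Returns [1.0, h1, h2, ...] with None for peaks failing the correlation cut.
    """
    mono = traces[0]
    heights = [1.0]
    for trace in traces[1:]:
        slope, r = ols_slope_r(mono, trace)
        heights.append(slope if slope is not None and r >= min_r else None)
    return heights

mono = [100.0, 300.0, 500.0, 300.0, 100.0]
m1 = [0.8 * v for v in mono]
m2 = [0.3 * v for v in mono]
m3 = [50.0, 10.0, 40.0, 20.0, 45.0]          # noise-dominated trace
print(describe_envelope([mono, m1, m2, m3]))
```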
Calculation of Predicted Composite Envelopes
Because each peptide has been sequenced, its elemental formula is known. Furthermore having measured the 15N incorporations achieved for both labeled and unlabeled samples, the shape of the isotopic envelope that would be produced by the labeled or unlabeled peptides individually can be calculated by binomial expansion (23). These predicted light and heavy spectra were combined at a range of ratios (1:99, 2:98, 3:97, ..., 97:3, 98:2, 99:1), spanning roughly 4 orders of magnitude in heavy:light ratio, to form a library of composite spectra of known heavy:light ratio. Distributions representing only natural abundance or only partially labeled peptides were also included for cases in which the contribution of one sample was minimal. Each of these predicted spectra was normalized with respect to the monoisotopic peak to match the description of the observed spectrum obtained previously.
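The library construction can be sketched in Python as below. Assumptions: only C and N isotopes are modeled (H, O, and S are ignored), natural 13C abundance is taken as 1.07%, nine peaks (M through M + 8) are kept to match the observed description, and the two 15N incorporations are inputs (the measured 0.367% and 5.2% are used in the example). Each envelope is a probability distribution, so mixing h parts heavy with (100 minus h) parts light corresponds to molar amounts; each composite is then normalized to its M peak. Library keys are heavy:light ratios, from 0 (light only) through 1/99 ... 99 to infinity (heavy only). Names are hypothetical.

```python
from math import comb

C13 = 0.0107  # assumed natural 13C abundance (fraction)

def envelope(n_c, n_n, i_15n, peaks=9):
    """Probabilities of M, M+1, ... from C and N isotopes only (H, O, S ignored)."""
    c = [comb(n_c, k) * C13**k * (1 - C13) ** (n_c - k) if k <= n_c else 0.0
         for k in range(peaks)]
    n = [comb(n_n, k) * i_15n**k * (1 - i_15n) ** (n_n - k) if k <= n_n else 0.0
         for k in range(peaks)]
    return [sum(c[j] * n[k - j] for j in range(k + 1)) for k in range(peaks)]

def composite_library(n_c, n_n, i_light, i_heavy, peaks=9):
    """Map heavy:light ratio -> composite envelope normalized to its M peak."""
    light = envelope(n_c, n_n, i_light, peaks)
    heavy = envelope(n_c, n_n, i_heavy, peaks)
    library = {}
    for h in range(101):  # percent heavy: 0 (light only) ... 100 (heavy only)
        mix = [(h * hv + (100 - h) * lv) / 100 for hv, lv in zip(heavy, light)]
        ratio = float("inf") if h == 100 else h / (100 - h)
        library[ratio] = [v / mix[0] for v in mix]
    return library

lib = composite_library(60, 16, 0.00367, 0.052)   # hypothetical 60 C, 16 N peptide
for key in (0.0, 1.0, float("inf")):
    print(key, [round(v, 3) for v in lib[key][:4]])
```

Printing the light-only, 1:1, and heavy-only entries shows the M + 1 and M + 2 heights rising with the heavy fraction, the effect illustrated in Fig. 3.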
Quantification by Identifying the Best Match
To determine the relative contributions of heavy to light envelopes, each observed composite spectrum was compared with the entire library of normalized predicted spectra using linear regression. The predicted spectrum that best matched the observed spectrum as judged by correlation was used to estimate the ratio of heavy to light peptides in the original sample. A minimum correlation of 0.8 was required to accept each ratio. This approach for finding the value of some variable based on comparison of an observed spectrum to a library of predicted spectra encompassing a range of values for the variable in question is similar to the approach used to estimate 15N incorporation as discussed previously. Calculations were performed via a series of scripts written in Mathematica 5.2 (Wolfram Research), and EIC extraction was performed using a program written in Visual C++ (Microsoft Visual Studio 6.0).
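Given an observed description (relative heights, with `None` for rejected peaks) and a library of normalized predicted envelopes like the one described in the previous section, the best match is the entry with the highest correlation. The Python sketch below assumes that rejected peaks are simply dropped from both vectors before Pearson's r is computed; the library format (ratio mapped to a list of heights) and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0

def best_ratio(observed, library, min_r=0.8):
    """Return (heavy:light ratio, r) of the best-correlated entry, or None.

    observed: normalized heights [1.0, h1, ...] with None for rejected peaks.
    library: {ratio: normalized predicted heights} (hypothetical format).
    """
    keep = [k for k, v in enumerate(observed) if v is not None]
    obs = [observed[k] for k in keep]
    best = None
    for ratio, predicted in library.items():
        r = pearson(obs, [predicted[k] for k in keep])
        if best is None or r > best[1]:
            best = (ratio, r)
    return best if best is not None and best[1] >= min_r else None

toy_library = {0.25: [1.0, 0.8, 0.4], 1.0: [1.0, 1.2, 0.9], 4.0: [1.0, 1.8, 1.6]}
print(best_ratio([1.0, 1.19, 0.91], toy_library))
```

The toy observation is nearly proportional to the 1:1 entry, so that entry wins with r close to 1, while the other two correlate at roughly 0.4 to 0.5.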
Light and Dark Experiment: Identification and Quantification of Proteins
The full and partial labeling quantification methods were compared in a simple, highly perturbed biological system (plants grown under continuous light versus continuous darkness) to evaluate both methodologies at the intact protein level. A reciprocal labeling scheme was used for both full and partial labeling approaches to control for label-specific effects arising from differences in either isotope or reagent quality. Mascot database searches were performed as described under "Protein Identification" prior to quantification because both techniques require a defined molecular formula for each peptide quantified. Relative abundance ratios (labeled over unlabeled) were derived as described above for each MS/MS event in two reciprocally labeled datasets for each of the full and partial labeling techniques (four datasets total). The following data manipulation steps were then carried out using a combination of Mathematica scripts and Excel spreadsheets. First, reversed database and keratin peptides were removed from each dataset, and the remaining peptide sequences were filtered to include only amino acid sequences unique to a single protein species. Second, ratios with correlation coefficients less than 0.8 were excluded from the analysis, and values outside the apparent linear range of the technique were sorted into bins at each extreme, designated greater than 10-fold and less than 0.1-fold. Third, the ratio measurements in each dataset were normalized to the median of the remaining ratios in that dataset to correct for deviations from the intended 1:1 (1.0) mixing ratio of labeled to unlabeled plant material. Finally, the filtered and normalized peptide ratios were combined by calculating their mean value for each protein species.
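The filtering, binning, normalization, and averaging steps can be sketched in Python as follows. This is one reading of the procedure under stated assumptions: decoy, keratin, and shared peptides are assumed to have been removed already, the median is taken over in-range ratios that pass the correlation filter, and out-of-range ratios are only counted in the greater than 10-fold and less than 0.1-fold bins rather than averaged. The record format and function name are hypothetical.

```python
from collections import defaultdict
from statistics import median

def summarize_proteins(records, min_r=0.8, low=0.1, high=10.0):
    """Per-protein mean of median-normalized peptide ratios, plus out-of-range counts.

    records: iterable of (protein, ratio, r) for unique, forward-database peptides
    (hypothetical format).
    """
    lo_key, hi_key = f"<{low:g}", f">{high:g}"
    kept = [(p, x) for p, x, r in records if r >= min_r]      # correlation filter
    in_range = [(p, x) for p, x in kept if low <= x <= high]
    bins = defaultdict(lambda: {lo_key: 0, hi_key: 0})
    for p, x in kept:                                          # extreme-value bins
        if x < low:
            bins[p][lo_key] += 1
        elif x > high:
            bins[p][hi_key] += 1
    center = median(x for _, x in in_range)   # corrects the nominal 1:1 mixing ratio
    per_protein = defaultdict(list)
    for p, x in in_range:
        per_protein[p].append(x / center)
    means = {p: sum(v) / len(v) for p, v in per_protein.items()}
    return means, dict(bins)

records = [("P1", 2.0, 0.9), ("P1", 2.2, 0.95), ("P2", 1.0, 0.9), ("P2", 1.0, 0.85),
           ("P3", 0.5, 0.9), ("P3", 20.0, 0.9), ("P1", 3.0, 0.5)]
means, bins = summarize_proteins(records)
print(means)
print(bins)
```

In this toy example the P1 ratio of 3.0 is discarded for its low correlation and the P3 ratio of 20.0 is counted in the greater than 10-fold bin, leaving a median of 1.0.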
Positive identification of proteins from extracts of plants grown with and without illumination required a minimum of two unique peptide sequences with acceptable Mascot scores as discussed above (using a 1% false positive rate score cutoff for each charge state via decoy database strategy).
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
Previously Whitelegge et al. (17) suggested that raising the 13C incorporation to around 0.75–1.5% above natural abundance would be useful for relative quantification of tryptic peptides based on subtle differences in envelope shape. However, controlled metabolic labeling of intact plants such as Arabidopsis with 13C may be problematic given their ability to assimilate carbon from atmospheric CO2. Furthermore having demonstrated an easy and efficient approach for complete 15N labeling of Arabidopsis, we opted to apply the concept to 15N labeling instead. The extent to which a given change in isotopic enrichment alters the envelope shape of a peptide depends on how many atoms of that element it contains. Because peptides contain fewer nitrogen atoms than carbon atoms, a larger change in 15N enrichment is required to achieve a similar change in envelope shape. Our working assumption is that the optimal 15N incorporation for quantification by partial metabolic labeling is the incorporation at which the height of the M + 1 ion is greatest. At this incorporation, labeled and unlabeled forms of each peptide should have distinctly different isotope envelope shapes, yet both forms should remain compatible with existing peak extraction and peptide sequencing software. This optimal incorporation can be derived analytically.
The height of the M + 1 peak in an isotopic distribution depends predominantly on the numbers of carbons and nitrogens in a peptide. Because the 13C contribution is fixed in our labeling regime, we need only consider the nitrogen contribution to the M + 1 peak, which arises from molecules containing exactly one 15N. Derived from the binomial distribution, this contribution is given by Equation 1 in terms of the total number of nitrogens in the peptide (N) and the 15N incorporation (I).
$$P_{M+1}(I) = N\,I\,(1 - I)^{N-1} \tag{1}$$
To find the 15N incorporation that maximizes this function, we take the derivative of the expression with respect to 15N incorporation (Equation 2). Then we set the derivative equal to 0 and solve for I (Equation 3).
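$$\frac{d}{dI}\left[N\,I\,(1 - I)^{N-1}\right] = N(1 - I)^{N-1} - N(N-1)\,I\,(1 - I)^{N-2} = N(1 - I)^{N-2}\,(1 - N I) \tag{2}$$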
$$N(1 - I)^{N-2}\,(1 - N I) = 0 \quad\Longrightarrow\quad I = \frac{1}{N} \quad (0 < I < 1) \tag{3}$$
Evaluating the second derivative at I = 1/N confirms that this critical point is in fact a maximum (not shown). Now we need an estimate of the typical number of nitrogens in the peptides we are likely to identify. Using data from an extensive survey of the Arabidopsis proteome that we completed recently (8), we determined that the median nitrogen count among identifiable peptides in a tryptic digest of Arabidopsis is 18, corresponding to an optimal 15N incorporation of 1/18, or 5.6%. Given the observed variability of nitrogen counts among tryptic peptides, a 15N incorporation between 5 and 6% should be appropriate. Although a range of 15N incorporation values may be suitable for this kind of experiment, it is essential that the observed 15N incorporation be consistent for all labeled proteins within a particular sample. Thus labeling must be continued long enough to reach a steady-state, equilibrium level of incorporation if quantitative measurements are to be made.
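The analytic optimum I = 1/N can be checked numerically by evaluating the single-15N binomial term, N·I·(1 - I)^(N - 1), on a fine grid. The Python sketch below does this for the median nitrogen count of 18 and for two neighboring counts; the grid resolution and the neighboring values are arbitrary illustrative choices, and the function names are hypothetical.

```python
def m_plus_1_from_15n(n, i):
    """Probability of exactly one 15N among n nitrogens at incorporation i (binomial)."""
    return n * i * (1 - i) ** (n - 1)

def optimal_incorporation(n, steps=100_000):
    """Grid-search the incorporation in (0, 1) that maximizes the single-15N term."""
    grid = (k / steps for k in range(1, steps))
    return max(grid, key=lambda i: m_plus_1_from_15n(n, i))

i_opt = optimal_incorporation(18)
print(f"N = 18: grid optimum {i_opt:.3%}, analytic 1/N {1 / 18:.3%}")
for n in (12, 24):
    print(f"N = {n}: grid optimum {optimal_incorporation(n):.2%}, analytic {1 / n:.2%}")
```

Because the derivative has the sign of (1 - N·I), the term rises for I < 1/N and falls for I > 1/N, which is why the grid search and the analytic result agree.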
Plotted in Fig. 3 are the expected isotopic distributions for a typical tryptic peptide after combination of its natural abundance and 6% 15N-labeled envelopes at various proportions. After normalization of the distributions to the monoisotopic peaks, large differences in the M + 1, M + 2, etc. peaks become obvious. These significant changes suggest that consideration of the entire peptide envelope will be important for distinguishing varying combinations of light and heavy envelopes.
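Curves of the kind shown in Fig. 3 can be approximated with a short binomial calculation, sketched below. This is a simplified model: it assumes that only carbon and nitrogen contribute to the envelope (hydrogen, oxygen, and sulfur isotopes are ignored), uses approximate natural abundances of 1.07% 13C and 0.364% 15N, and takes a hypothetical peptide with 50 carbons and 18 nitrogens. All function names are illustrative.

```python
from math import comb

C13_NATURAL = 0.0107   # approximate natural abundance of 13C (assumption)
N15_NATURAL = 0.00364  # approximate natural abundance of 15N (assumption)


def binomial_pmf(n: int, p: float, max_k: int) -> list[float]:
    """P(k heavy atoms among n) for k = 0..max_k, zero when k > n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) if k <= n else 0.0
            for k in range(max_k + 1)]


def convolve(a: list[float], b: list[float], max_k: int) -> list[float]:
    """Combine two independent isotope distributions, keeping peaks up to M + max_k."""
    out = [0.0] * (max_k + 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j <= max_k:
                out[i + j] += x * y
    return out


def envelope(n_c: int, n_n: int, n15: float, max_k: int = 6) -> list[float]:
    """Per-molecule abundances of M, M + 1, ... (carbon and nitrogen only)."""
    return convolve(binomial_pmf(n_c, C13_NATURAL, max_k),
                    binomial_pmf(n_n, n15, max_k), max_k)


def composite(n_c: int, n_n: int, frac_labeled: float,
              labeled_n15: float = 0.06, max_k: int = 6) -> list[float]:
    """Mix light and labeled envelopes by molar fraction, then normalize to M."""
    light = envelope(n_c, n_n, N15_NATURAL, max_k)
    heavy = envelope(n_c, n_n, labeled_n15, max_k)
    mix = [(1 - frac_labeled) * lt + frac_labeled * hv for lt, hv in zip(light, heavy)]
    return [x / mix[0] for x in mix]


if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        peaks = composite(50, 18, frac)
        print(f"{frac:.2f} labeled:", " ".join(f"{x:.3f}" for x in peaks))
```

Because the 6% labeled form still has a substantial all-14N population, every mixture keeps a usable monoisotopic peak, and the normalized M + 1 and M + 2 heights rise steadily with the labeled fraction.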
Next the observed composite distribution for each peptide is defined. Extracted ion chromatograms are generated corresponding to all isotopes within the composite envelope for each peptide (Fig. 4, part 2). Each of these EICs is then compared with the monoisotopic EIC via linear regression, similar to MacCoss et al. (16) (Fig. 4, part 3). These regression analyses return lines whose slopes reflect the intensities of each peak in the isotopic envelope normalized with respect to the monoisotopic peak. Furthermore the correlation coefficients for each regression provide an indication of the quality of the fit and allow exclusion of particular isotopes that are disrupted by noise.
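The regression step above can be sketched as follows: each isotope's extracted ion chromatogram is regressed on the monoisotopic chromatogram, the slope is taken as that isotope's height relative to M, and isotopes with a poor fit are dropped. This is a minimal sketch that assumes ordinary least squares with an intercept and a hypothetical exclusion cutoff of r = 0.9; the paper does not state its threshold, and the names are illustrative.

```python
from math import sqrt
from typing import Optional


def regress(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope of y on x, and the Pearson correlation r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, r


def observed_envelope(eics: list[list[float]],
                      min_r: float = 0.9) -> list[Optional[float]]:
    """eics[0] is the monoisotopic EIC; eics[k] is the EIC for M + k.

    Returns relative peak heights (M = 1.0); isotopes whose regression
    correlation falls below the hypothetical cutoff min_r are set to None.
    """
    mono = eics[0]
    env: list[Optional[float]] = [1.0]
    for eic in eics[1:]:
        slope, r = regress(mono, eic)
        env.append(slope if r >= min_r else None)
    return env


if __name__ == "__main__":
    mono = [0.0, 2.0, 5.0, 9.0, 5.0, 2.0, 0.0]
    m1 = [0.0, 1.2, 3.0, 5.4, 3.0, 1.2, 0.0]      # co-eluting, 60% of M
    noisy = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]   # interference, not co-eluting
    print(observed_envelope([mono, m1, noisy]))
```

Using the slope rather than a single-scan intensity ratio lets every scan across the elution profile contribute, which is the source of the noise tolerance noted below.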
Because we know the elemental composition of each peptide and the 15N incorporations for both labeled and unlabeled samples, we can predict the shape of each of these envelopes through binomial expansion. We can then combine these distributions in a range of different proportions, creating a library of predicted envelopes of known heavy to light ratios (Fig. 4, part 4). Note that these composite distributions differ in shape from the distributions that result when 15N is uniformly distributed throughout all forms of the same peptide in a sample. In practice we included ratios across 4 orders of magnitude in our library for each peptide, ranging from 100:1 to 1:100. We also included the envelopes expected from the labeled or unlabeled peptides alone to account for cases where the peptide was only detectable in one sample or the other. All isotopic peaks within each distribution in the library are then normalized with respect to the monoisotopic peak for ease of comparison with the observed envelope described earlier.
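A library like the one described above can be assembled from a pair of predicted envelopes, as in the sketch below. The 100:1 to 1:100 range comes from the text, but the spacing of 10 ratios per decade is a hypothetical choice, since the paper does not state it. The inputs `light` and `heavy` are assumed to be per-molecule isotope abundances (not yet normalized to M), so a labeled:unlabeled ratio r corresponds to a labeled molar fraction of r / (1 + r).

```python
from math import log10
from typing import Sequence


def log_ratios(low: float = 0.01, high: float = 100.0,
               steps_per_decade: int = 10) -> list[float]:
    """Labeled:unlabeled ratios evenly spaced in log10 (spacing is an assumption)."""
    start, stop = log10(low), log10(high)
    n_steps = round((stop - start) * steps_per_decade)
    return [10 ** (start + k / steps_per_decade) for k in range(n_steps + 1)]


def build_library(light: Sequence[float], heavy: Sequence[float],
                  ratios: Sequence[float]) -> list[tuple[float, list[float]]]:
    """Return (ratio, envelope) pairs, each envelope normalized to its M peak."""
    library = []
    for r in ratios:
        f = r / (1.0 + r)  # labeled molar fraction
        mix = [(1.0 - f) * a + f * b for a, b in zip(light, heavy)]
        library.append((r, [x / mix[0] for x in mix]))
    # Pure unlabeled (ratio 0) and pure labeled (ratio infinity) envelopes.
    library.append((0.0, [x / light[0] for x in light]))
    library.append((float("inf"), [x / heavy[0] for x in heavy]))
    return library


if __name__ == "__main__":
    light = [0.50, 0.30, 0.20]  # hypothetical predicted envelopes
    heavy = [0.30, 0.40, 0.30]
    lib = build_library(light, heavy, log_ratios())
    print(len(lib), "library entries")
```

Mixing before normalizing matters: normalizing each envelope to M first and then averaging would weight the two forms by their M heights rather than by their molar amounts.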
Finally the observed distribution is compared with each composite envelope in the library via linear regression (Fig. 4, part 5). The best match within the library of predicted envelopes is then selected based on the resulting correlation coefficient and is used as an estimate of the ratio between labeled and unlabeled peptides in the original sample. This comparison of an observed isotopic envelope with a library of predicted envelopes is conceptually similar to an approach used previously for determination of 15N incorporation in metabolically labeled peptides (21).
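The matching step can be sketched as a search for the library envelope with the highest correlation to the observed envelope. This is a minimal illustration under the assumption that "correlation coefficient" means the Pearson r across isotope peaks; isotopes excluded during the chromatogram regression (marked `None`) are skipped. Names and the toy library are hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when it is undefined."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def best_match(observed: Sequence[Optional[float]],
               library: Sequence[tuple[float, Sequence[float]]]) -> tuple[float, float]:
    """Return (ratio, r) for the library envelope most correlated with observed."""
    best_ratio, best_r = float("nan"), -2.0
    for ratio, predicted in library:
        pairs = [(o, p) for o, p in zip(observed, predicted) if o is not None]
        r = pearson([o for o, _ in pairs], [p for _, p in pairs])
        if r > best_r:
            best_ratio, best_r = ratio, r
    return best_ratio, best_r


if __name__ == "__main__":
    toy_library = [(0.5, [1.0, 0.5, 0.2]), (1.0, [1.0, 0.8, 0.4]), (2.0, [1.0, 1.2, 0.7])]
    print(best_match([1.0, 0.82, 0.41], toy_library))
```

Because every returned value is one of the library ratios, the result is discrete by construction, which is the property discussed in the comparison with full labeling.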
In practice a strong correlation is generally seen for the best match between the theoretical and experimental distributions, and these correlations fall quickly for the other theoretical spectra. However, the sharpness of this decline varies depending on the relative contributions of the heavy and light envelopes to the composite distribution. Plotted in Supplemental Fig. 1 are the distributions of correlation coefficients resulting from quantification of a selected peptide at a variety of ratios of heavy to light forms. When the mixing ratio is close to 1:1, the correlation distributions tend to be smooth with a peak at the best matching theoretical spectrum. However, as the ratio approaches all unlabeled or all labeled forms, the distributions of correlation coefficients then assume a sigmoidal shape with the correlation approaching 1.0 asymptotically as the mixing ratio approaches 0 or infinity, respectively.
Although computationally quite involved, this approach has some positive features. First because the observed isotopic envelopes are defined via linear regression, they are relatively tolerant of noise. Additionally because several peaks within each isotopic envelope are used for quantification, our ability to determine relative contributions of labeled and unlabeled peptides should be maximized.
Comparison of Full Versus Partial Metabolic Labeling
Although both full metabolic labeling and partial metabolic labeling are intended to provide a relative quantitative comparison of two biological samples, they differ in a number of key respects. Typical MS spectra from combined labeled and unlabeled forms of a selected peptide are plotted in Fig. 5, part A, either using partial metabolic labeling (top) or full metabolic labeling (bottom). Most fundamentally, whereas full metabolic labeling involves the comparison of two completely separate isotopic envelopes for quantification of each peptide, partial incorporation involves interpretation of the shape of a single composite envelope. As a result spectral complexity is dramatically reduced when partial metabolic labeling is used in the sense that each peptide is represented by only a single envelope rather than two. This is especially clearly illustrated in Fig. 5, part B, where MS spectra from analysis of similar biological samples are compared with either partial or full metabolic labeling. Note that envelopes from at least four chemical species are clearly visible in the partial incorporation sample. As expected, twice as many envelopes are visible in the full incorporation example with brackets to indicate labeled and unlabeled forms of each peptide. Note that peptide iv is almost completely obscured.
The use of full versus partial metabolic labeling also has implications for peptide database searching using Mascot or other similar search engines. First, whereas full incorporation requires two independent searches, one against 14N-amino acid mass definitions and one against 15N-amino acid mass definitions, partial incorporation requires only a single search against 14N-amino acid mass definitions. This eliminates the possibility of duplicate assignments of particular MS/MS spectra and avoids the subtle differences in confidence of peptide identifications that have been documented between natural abundance and fully 15N-labeled samples (8). These labeling regimes also have implications for peak extraction prior to peptide database searching. In our experience full metabolic labeling has generally been compatible with peak extraction algorithms designed for natural abundance samples as long as complete 15N labeling is achieved. However, pilot experiments indicated occasional problems with monoisotopic peak identification in the partial incorporation samples. These were addressed after the fact by using wider mass tolerances for database searching, followed by filtering steps to find misidentified peaks.
Partial and full metabolic labeling may also be expected to differ with respect to quantification. First, because full metabolic labeling involves the comparison of two independent envelopes whereas partial metabolic labeling requires the detection of subtle changes in shape for a single envelope, one might expect quantification via full metabolic labeling to be more accurate and to display a greater dynamic range. Second, whereas full metabolic labeling returns peptide ratios as continuous values, our current implementation for partial metabolic labeling returns discrete ratios. Depending on the spacing between ratios, this could introduce additional error in relative quantification. However, provided the spacing between composite spectra in the theoretical library considered for each peptide is small compared with the magnitude of other sources of error, this should not seriously compromise this approach. It is worth noting that statistical approaches must be adapted to accommodate the discrete data from partial metabolic labeling experiments. Finally these two approaches are likely to differ with respect to how they deal with extreme changes where a peptide is present in only labeled or unlabeled form. For full metabolic labeling these cases involve comparison of a defined chemical species with a signal that is often obscured by noise, resulting in poor quality quantification that is often thrown out based on poor correlation. Salvaging such peptides can often require significant visual inspection of MS spectra from peptides whose quantification is suspect; this can be an onerous task. In contrast, our current algorithm for partial metabolic labeling naturally handles these extreme cases by considering the envelopes for each peptide present exclusively in labeled or unlabeled form. Thus partial metabolic labeling as currently implemented may have some advantage when characterizing large changes in peptide abundance.
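The error introduced by returning discrete ratios can be bounded with simple arithmetic. If, hypothetically, the library ratios are evenly spaced in log10 with k steps per decade, any true ratio lies within half a step of its nearest library entry, so the worst-case rounding error is a factor of 10^(1/(2k)), or about 12% for k = 10. The paper does not state its spacing; k and the function names below are assumptions, and the sketch models only rounding to the nearest grid point, not the correlation-based selection itself.

```python
import math


def worst_case_factor(steps_per_decade: int) -> float:
    """Largest ratio error from rounding to the nearest log-spaced library entry."""
    return 10 ** (1 / (2 * steps_per_decade))


def snap_to_library(true_ratio: float, steps_per_decade: int) -> float:
    """Nearest library ratio in log10 space (library bounds ignored here)."""
    k = round(math.log10(true_ratio) * steps_per_decade)
    return 10 ** (k / steps_per_decade)


if __name__ == "__main__":
    for k in (5, 10, 20):
        print(f"{k:>2} steps/decade: worst case x{worst_case_factor(k):.3f}")
    print(snap_to_library(3.0, 10))
```

Halving the bound requires doubling the number of library entries per peptide, so the spacing trades computation against a quantization error that should stay well below other sources of error.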
There may therefore be significant differences in the performance of full versus partial metabolic labeling for quantitative proteomics. Our intention is to evaluate each of these approaches with respect to accuracy, dynamic range, and usefulness in a biological setting. First, each technique will be used to characterize control samples of labeled and unlabeled Arabidopsis peptides combined at known ratios over several orders of magnitude. These characterizations will allow us to evaluate the quantitative accuracy of each technique over a wide dynamic range and to examine the distributions of error that result. Second, we will use both partial and full metabolic labeling to characterize differences in protein abundance between light- and dark-grown Arabidopsis. This characterization will allow us to evaluate the performance of each technique under more realistic experimental conditions, where labeled and unlabeled samples are mixed at harvest. Additionally, by incorporating reciprocal labeling into our experimental design, we will evaluate the extent of any possible side effects of each labeling procedure, including nutritional differences between labeled and unlabeled food preparations as well as possible isotope effects. Finally, we will evaluate both partial and full metabolic labeling with respect to reproducibility and total numbers of peptide and protein identifications using a biological comparison that is well understood and can be expected to produce changes in protein abundance over a wide dynamic range.
Quantitative Analysis of Controlled Peptide Mixtures: Overview
To evaluate the quantitative accuracy of both partial and full metabolic labeling across a wide dynamic range, Arabidopsis plants were grown in liquid culture containing either natural abundance, partially labeled, or fully labeled media. Each population of plants was then harvested, homogenized, and digested separately. The resulting peptides were then combined at known ratios based on total protein concentration covering a range from 100:1 to 1:100 (labeled:unlabeled), and the resulting mixed samples were characterized on a Q-TOF mass spectrometer. A summary of peptide identifications from these experiments is given in Table I, while the median ratios for all peptide mixtures are plotted in Fig. 6. A complete list of all peptides identified in these LC-MS analyses including observed ratios is provided in Supplemental Table 1.
Several interesting trends emerge when we consider the total numbers of peptides identified and quantified by each technique. First, the numbers of peptides and scans that are successfully quantified vary across the range of peptide ratios. In general, the greatest percentage of peptides is successfully quantified via full metabolic labeling at ratios close to 1:1. As the ratios become more extreme, the proportion of successfully quantified peptides drops dramatically. In contrast, the percentage of peptides that are successfully quantified by partial incorporation does not vary as the peptide ratio is changed and remains well above 90% across the entire dynamic range. Even near 1:1, where full metabolic labeling is at its best, partial metabolic labeling still outperforms it by more than 10%. Ultimately, although full metabolic labeling produces more peptide identifications, partial incorporation performs better overall by successfully identifying and quantifying a larger number of peptides across the entire dynamic range.
Overview of Quantification Accuracy: Median Ratios
Having considered the total numbers of peptides identified and quantified by each technique, we now need to consider the accuracy of quantification. As an initial examination of accuracy, we consider the median ratios observed for all peptides in each peptide mixture. These values are plotted in Fig. 6. Both approaches return median ratios that roughly correspond to the intended mixing ratios. Both generally do well for ratios near 1:1, although full incorporation may perform slightly better overall. Partial incorporation, however, starts to overestimate ratios at more extreme values, whereas full incorporation may underestimate large changes.
Based on these results it appears that both partial and full metabolic labeling allow reasonable quantification within the range between 12:1 and 1:12 (labeled:unlabeled). Outside this range both approaches provide values that are of more qualitative than quantitative value. Although these observations would still be very useful in a biological sense, providing evidence for large changes in protein abundance, they are of less use as quantitative measurements. Thus, we will limit our further quantitative evaluation of each technique to those measurements falling within the range between 12:1 and 1:12.
It is worth noting that this quantitative range is consistent with the distributions of correlation coefficients produced via the partial incorporation approach when an observed distribution is compared with a library of predicted spectra at varying ratios of labeled to unlabeled forms (see Supplemental Fig. 1). Between 12:1 and 1:12 these distributions reach smooth peaks at the appropriate ratio. However, outside this range these distributions begin to level off, eventually approaching correlations of 1.0 asymptotically as the mixing ratio approaches either 0 or infinity. This is consistent with our observation that partial metabolic labeling provides relatively consistent ratios near 1:1 but that these values are of more qualitative than quantitative value at extreme ratios.
Evaluation of Quantification for Individual Peptides
The preceding quantitative comparison of full versus partial metabolic labeling has been based on median ratio measurements that combine multiple observations of many different peptides to give a general indication of the performance of each algorithm across multiple samples. However, we are less interested in the average performance of each algorithm than in the consistency of its measurements across repeated observations of individual peptides. We anticipate significant variation between the labeled and unlabeled peptide populations mixed for these experiments because they were handled separately through growth, harvest, protein extraction, and digestion. For this reason, the precision of each technique is best characterized by following the ratios obtained for individual peptides across the different mixtures. Because every mixture was prepared from the same labeled and unlabeled digests, these peptide-level comparisons are independent of variation in sample preparation and reflect only errors in quantification and in the preparation of the controlled mixtures. In this way one can evaluate the distribution of error associated with each approach across the full dynamic range.
We can use regression analysis on a peptide by peptide basis to evaluate the performance of both partial and full metabolic labeling with respect to accuracy and consistency of quantification. Considering the two techniques separately, we plot the observed ratios for each peptide versus the ratio at which the labeled and unlabeled samples were combined. If measurements for that particular peptide are consistent across multiple ratios, these points will fall on a line whose slope is the ratio of labeled to unlabeled peptide in the 1:1 mixture. The resulting correlation coefficients will provide an indication of how consistent individual peptide measurements are across varying ratios, whereas the residuals observed will provide an indication of any deviation in calculated ratios across the dynamic range associated with each technique.
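The per-peptide regression described above can be sketched as follows. Because the model is observed ratio = slope × mixing ratio, this sketch fits a line through the origin on a linear scale; that is an assumption, and the original analysis may have included an intercept or used log-transformed ratios. The function name and example values are hypothetical.

```python
from math import sqrt


def fit_peptide(mixing: list[float], observed: list[float]):
    """Fit observed = slope * mixing through the origin for one peptide.

    Returns the slope (the peptide's labeled:unlabeled ratio in the 1:1
    mixture), the Pearson r between mixing and observed ratios, and the
    residuals observed - slope * mixing.
    """
    slope = sum(x * y for x, y in zip(mixing, observed)) / sum(x * x for x in mixing)
    residuals = [y - slope * x for x, y in zip(mixing, observed)]
    n = len(mixing)
    mx, my = sum(mixing) / n, sum(observed) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(mixing, observed))
    cxx = sum((x - mx) ** 2 for x in mixing)
    cyy = sum((y - my) ** 2 for y in observed)
    r = cxy / sqrt(cxx * cyy)
    return slope, r, residuals


if __name__ == "__main__":
    mixing = [1 / 12, 1 / 4, 1 / 2, 1, 2, 4, 12]        # labeled:unlabeled
    observed = [0.11, 0.33, 0.62, 1.28, 2.7, 5.1, 15.9]  # hypothetical values
    slope, r, residuals = fit_peptide(mixing, observed)
    print(f"slope={slope:.3f} r={r:.4f}")
    print([round(e, 3) for e in residuals])
```

A slope far from 1 then reflects only a difference between the original labeled and unlabeled digests for that peptide, while r and the residuals describe how consistently the technique tracks the change in mixing ratio.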
Example plots of observed versus predicted ratios obtained via either partial or full metabolic labeling are given for selected peptides in Fig. 7, part A. These plots illustrate the very clear linear response of both full and partial metabolic labeling across a range of peptide ratios between 12:1 and 1:12 (labeled:unlabeled). As expected, the slopes observed for each peptide vary significantly. The slope deviations are most likely due to differences in sample handling. When the standard peptide mixtures were prepared, the labeled and unlabeled peptides were processed separately until immediately prior to MS analysis. This allowed easy generation of consistent mixtures of peptides across a range of ratios. However, differences in any prior step in sample preparation including protein extraction, digestion, or desalting could lead to peptide- or protein-specific variation between labeled and unlabeled samples. Although these factors may increase the variability when median peptide measurements at each mixing ratio are considered (as in Fig. 6), we may still judge consistency of measurements for specific peptides across multiple mixing ratios. Because the same labeled and unlabeled samples were used to prepare all control peptide mixtures, differences in ratio from sample to sample for each peptide should still be proportional to the anticipated change in ratio. The slope of the resulting regression line simply reflects the variation between the original samples for each peptide prior to mixing. By using linear regression to analyze the results for each peptide individually, we can evaluate the performance of these algorithms independently of any biases in the original labeled and unlabeled peptide populations that were used.
Although the distribution of correlation coefficients provides some indication of the overall variability of measurements on a peptide by peptide basis, it does not allow us to observe how these errors are distributed across varying ratios. To address this issue we examine the residuals from the regression analysis for each peptide to observe how individual measurements vary from their expected linear trend (see Fig. 8, part A). Note that in general both partial and full metabolic labeling produce similar distributions of errors across the quantitative range for these techniques.
Although the previous two error assessments show little difference between the two techniques, when the distribution of error is plotted as a cumulative percentage with respect to the relative error observed, a subtle yet significant difference is revealed (see Fig. 8, part B, inset). The curve for partial incorporation consistently lags behind the curve for full incorporation. For full metabolic labeling we find that 95% of measurements give errors less than about 0.7. However, for partial incorporation we must go as high as a relative error of 1.3 to include 95% of all measurements. Thus it appears that partial metabolic labeling is associated with somewhat greater relative error than full incorporation, although the difference is subtle.
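A cumulative error curve of this kind, and the error threshold that covers 95% of measurements, can be computed as sketched below. The text does not define relative error precisely, so this sketch assumes |observed - fitted| / fitted, with the fitted value taken from each peptide's regression line, and uses a nearest-rank percentile. The function names are hypothetical.

```python
import math


def relative_errors(fitted: list[float], observed: list[float]) -> list[float]:
    """|observed - fitted| / fitted for each measurement (assumed definition)."""
    return [abs(o - f) / f for f, o in zip(fitted, observed)]


def cumulative_curve(errors: list[float]) -> list[tuple[float, float]]:
    """(error, cumulative % of measurements with relative error <= that value)."""
    ordered = sorted(errors)
    n = len(ordered)
    return [(e, 100.0 * (i + 1) / n) for i, e in enumerate(ordered)]


def error_at_coverage(errors: list[float], coverage: float = 0.95) -> float:
    """Smallest observed error covering the requested fraction (nearest rank)."""
    ordered = sorted(errors)
    rank = max(1, math.ceil(coverage * len(ordered) - 1e-9))
    return ordered[rank - 1]


if __name__ == "__main__":
    errs = relative_errors([1.0, 2.0, 4.0, 8.0], [1.1, 1.5, 4.4, 12.0])
    print(cumulative_curve(errs))
    print(error_at_coverage(errs, 0.95))
```

On such a curve, the technique whose curve reaches 95% at a lower error value (here, full metabolic labeling at about 0.7 versus about 1.3 for partial labeling) has the tighter error distribution.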
Conclusions: Quantitative Evaluations of Partial and Full Metabolic Labeling
Overall our characterization of full and partial metabolic labeling thus far indicates co