Advertisement
MCP
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


Originally published In Press as doi:10.1074/mcp.M600148-MCP200 on October 26, 2006.
This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Supplemental Data
Right arrow All Versions of this Article:
M600148-MCP200v1
6/2/305    most recent
Right arrow Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when eLetters are posted
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Glossary
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Mason, C. J.
Right arrow Articles by Bergen, H. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Mason, C. J.
Right arrow Articles by Bergen, H. R., III
Social Bookmarking
 Add to CiteULike   Add to Complore   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati  
What's this?

Molecular & Cellular Proteomics 6:305-318, 2007.
© 2007 by The American Society for Biochemistry and Molecular Biology, Inc.


Research

A Method for Automatically Interpreting Mass Spectra of 18O-Labeled Isotopic Clusters*,S

Christopher J. Mason{ddagger}, Terry M. Therneau§, Jeanette E. Eckel-Passow§, Kenneth L. Johnson{ddagger}, Ann L. Oberg§, Janet E. Olson, K. Sreekumaran Nair||, David C. Muddiman** and H. Robert Bergen, III{ddagger},{ddagger}{ddagger}

From the {ddagger} Mayo Proteomics Research Center, Divisions of § Biostatistics and Epidemiology, and || Department of Endocrinology, Mayo Clinic College of Medicine, Rochester, Minnesota 55905 and ** Department of Chemistry, W. M. Keck FT-ICR Mass Spectrometry Laboratory, North Carolina State University, Raleigh, North Carolina 27695


    ABSTRACT
 TOP
 ABSTRACT
 MATERIALS AND METHODS
 COMPUTATIONAL DATA...
 RESULTS
 DISCUSSION
 CONCLUSIONS
 REFERENCES
 
16O/18O labeling is one differential proteomics technology among many that promises diagnostic and prognostic biomarkers of disease. Although the incorporation of 18O in the C-terminal carboxyl group during endoproteinase digestion in the presence of H218O makes the process of labeling facile, the ease and effectiveness of label incorporation have in some regards been outweighed by the difficulties in interpreting the resulting spectra. Complex isotope patterns result from the composition of unlabeled (18O0), singly labeled (18O1), and doubly labeled species (18O2) as well as contributions from the naturally occurring isotopes (e.g. 13C and 15N). Moreover because labeling is enzymatic, the number of 18O atoms incorporated can vary from peptide to peptide. Finally it is difficult to distinguish highly up-regulated from highly down-regulated or C-terminal peptides. We have developed an algorithm entitled regression analysis applied to mass spectrometry (RAAMS) that automatically, rapidly, and confidently interprets spectra of 18O-labeled peptides without requiring chemical composition information derived from product ion spectra. The algorithm is able to measure the effective 18O incorporation rate due to variable enzyme substrate specificity of the pseudosubstrate during the isotope exchange reaction and corrects for the 18O0 abundance that remains in the labeled sample when using a two-step digestion/labeling procedure. We have also incorporated a method for distinguishing pure 18O0 from pure 18O2 peptides utilizing impure H218O. The algorithm operates on centroided peak lists and is therefore very fast: nine chromatograms of, on average, 1,168 spectra and containing, on average, 6,761 isotopic clusters were interpreted in, on average, 45 s per chromatogram. RAAMS is fast enough (average, 38 ms/spectrum) to allow the possibility of performing information-dependent MS/MS on a chromatographic time scale on species exceeding predetermined ratio thresholds. We describe in detail the operation of the algorithm and demonstrate its use on datasets with known and unknown ratios.


Differential proteomics technologies promise diagnostic and prognostic biomarkers to reduce the burden of disease (14). 16O/18O stable isotope labeling is one tool among many with which to quantify the relative expression differences of proteins between two biological samples (5). 16O/18O labeling, in contrast to other popular differential labeling methodologies such as CAT, isobaric tag for relative and absolute quantitation (iTRAQTM)1, and stable isotope labeling of amino acids in cell culture (SILAC), (a) does not select for peptides containing certain amino acids (a limitation of ICAT), (b) does not require fragmentation spectra (tandem MS) to record an abundance ratio (a limitation of iTRAQ), and (c) is applicable to biological specimens such as serum, plasma, and cerebrospinal fluid (a limitation of SILAC) (68). In a typical 18O experiment, two samples (or two pools of samples) of a biological matrix such as serum are collected with one typically representing a normal or control population and the other representing a disease state, although other arrangements are possible. One sample is digested with an endoproteinase, typically trypsin (9), in the presence of H216O, whereas the other sample is digested with the same enzyme in H218O. The trypsin-catalyzed exchange of up to two 16O atoms for two 18O atoms at the C-terminal carboxyl group of the peptide provides a mass shift of 2 (18O1) or 4 (18O2) daltons. The two digests are then mixed and subjected to mass spectrometry where the isotopic clusters for each peptide appear as a set of peaks corresponding to the unlabeled (18O0) and labeled (18O1 and 18O2) species. Measuring the heights of these peaks gives information about the relative abundances of the various peptides in the two samples.

Interpreting the resulting spectra, however, is not completely straightforward. Contributions from the artificially enriched 18O isotopes and naturally occurring isotopes (particularly 13C and 34S) combine to form complex, overlapping isotopic distributions especially for high molecular weight peptides where the monoisotopic peak is no longer the most abundant isotope. Schnolzer et al. (10) demonstrated that once the initial cleavage and exchange has occurred the proteolytic peptide forms a pseudosubstrate for the enzyme, causing the incorporation of a second 18O atom. However, different peptide pseudosubstrates have different reaction rates, Km, with trypsin, which causes peptide-to-peptide variability in the incorporation of 18O. The reaction rate is determined by such factors as peptide length (10) and sequence composition (11).

Recent digestion protocols (11, 12) introduce a further complication because two digestion steps are utilized: a first digestion using solution-based trypsin in H216O and a second digestion using immobilized trypsin in H218O. For peptides with high incorporation rates, the two-step procedure is effectively the same as the single step digestion; however, with low incorporation rate peptides, the possibility arises that a peptide molecule cleaved in H216O may never reassociate with trypsin in H218O. Bantscheff et al. (13) observed that, when using the two-step digestion procedure, it is important to account for the peptide molecules with no 18O atoms at their C termini that remain in the labeled sample. In addition to these complications, other factors that can affect the labeling process include back exchange and pH (14).

Several different methods are described in the literature for calculating 16O/18O ratios while accounting for variable incorporation. Yao et al. (5) used a ratio combining the experimental abundances of the 18O0, 18O1, and 18O2 peaks along with the theoretical isotopic abundances of those peaks; chemical composition information derived from known peptide sequence or from product ion spectra was used to obtain the theoretical isotopic abundances. Johnson and Muddiman (15) removed the requirement for sequence identification/product ion spectra by utilizing the average amino acid averagine (16) to calculate approximate chemical compositions, by modeling the contributions of 13C and 34S with a power function, and by including this in the ratio calculation. Both of these calculations assume a single step digestion and do not account for 18O0 abundance in the labeled sample.

At least two software implementations exist that perform ratio calculations; both have limitations. Halligan et al. (17) automated the process of combining the peptide sequence information from Sequest (18) with either the Yao et al. (5) or Johnson and Muddiman (15) ratio calculations. They required zoom scans, utilized the protein identification data to determine the monoisotopic mass, and assumed that this mass appeared experimentally. Qian et al. (19) utilized THRASH (20) to detect peptide pairs and then applied the ratio equations of Johnson and Muddiman (15). THRASH, however, uses a single theoretical isotopic distribution and must separately detect the 18O0 and 18O2 species; therefore it will have difficulty with more massive peptides or those with incomplete incorporation.

Our current work describes a completely automated method for locating and interpreting 18O-labeled isotopic clusters in parent ion chromatograms without requiring product ion or selected ion monitoring/zoom spectra. Central to our algorithm is the use of linear regression to simultaneously fit all peaks in the isotope cluster rather than just the peaks representing the 18O0, 18O1, and 18O2 species. We use the residuals from the regression to compute a fitting score between the theoretical and experimental isotopic distributions that can be used to rank potential candidate biomarkers. Mirgorodskaya et al. (21) also used linear regression with 18O-labeled internal standards but measured the relative abundances of the 18O1 and 18O2 species for a given peptide in a separate experiment instead of allowing these abundances to vary in the regression.

We also describe a method of distinguishing highly up-regulated peptides from highly down-regulated or C-terminal peptides. When using highly enriched H218O, peptides that have fully incorporated two 18O atoms and that appear only in the labeled sample appear identical to peptides that have incorporated no 18O atoms either because they were C-terminal (and thus not a substrate for trypsin) or because they were significantly down-regulated. By lowering the enrichment level of the H218O to, for example, 90%, a unique peak "signature" is generated that the algorithm can recognize to distinguish these two cases. Furthermore our algorithm accounts for the residual 18O0 abundance that remains in the labeled sample from peptides with low incorporation rates; this is particularly relevant when utilizing the two-step digestion procedure.

In a manner similar to THRASH, the algorithm performs automated "reduction" of spectra. No assumption is made regarding monoisotopic mass, which is determined using an alignment procedure similar to that described by Senko et al. (16) and utilized by THRASH. However, THRASH operates directly on the raw spectral data; this allows it to detect species at very low signal to noise ratios but also makes it very slow. Instead we have focused on centroided peak data supplied by the vendor software from parent ion mass spectra for reasons of speed and because such data were readily available. The algorithm is indeed quite fast, interpreting a chromatogram of ~1,300 spectra in less than 1 min; THRASH might take several hours to process such data. The algorithm was developed using data from FT-ICR instrumentation but should be generally applicable to any instrument with sufficient resolving power.

We will describe in detail the operation of our algorithm, entitled regression analysis applied to mass spectrometry (RAAMS), and evaluate its performance using datasets with both known and unknown ratios.


    MATERIALS AND METHODS
 TOP
 ABSTRACT
 MATERIALS AND METHODS
 COMPUTATIONAL DATA...
 RESULTS
 DISCUSSION
 CONCLUSIONS
 REFERENCES
 
Datasets—
Three datasets were used to demonstrate the effectiveness of the algorithm: two with known 18O/16O ratios and one with unknown ratios; these datasets are summarized in Fig. 1. The first two datasets consisted of purified proteins at known 18O/16O ratios but digested in H218O of different purities. In one dataset human transferrin and bovine serum albumin were trypsin-digested together in either 99% H218O or 100% H216O and then mixed in seven 18O/16O ratios (20, 10, 3, 1, 0.3, 0.1, and 0.05). Digests of the two proteins in either 99% H218O or 100% H216O were used to represent ratios of 0 and {infty}, respectively. The second dataset was comprised of the same two proteins digested in 90% H218O or 100% H216O and mixed in the same ratios. Each sample was then analyzed by LC-FT-ICR. Thus, a dataset was comprised of nine chromatograms at nine different 18O/16O ratios. Note that the two pure proteins were present at approximately equal amounts in both the labeled and unlabeled samples; only the ratios of labeled to unlabeled digests of these proteins were varied. It is important to understand that these known ratios were generated for verification purposes only; a typical 18O experiment would consist of a single mixed sample with a different (and unknown) 18O/16O ratio for each protein. These two datasets are available upon request (see below) in mzXML format.


Figure 1
View larger version (25K):
[in this window]
[in a new window]

 
FIG. 1. Experimental methodology. Three datasets were prepared: the first two datasets consisted of purified proteins (human transferrin and serum albumin) mixed at known 18O/16O ratios. The third dataset consisted of human plasma from disease and control pools.

 
To verify that the algorithm was able to detect differences in complex samples a third dataset was prepared. This set consisted of two pools of human plasma each composed of five disease or five control patients. The disease pool was trypsin-digested in 90% H218O, whereas the control pool was digested in H216O. These pools were then combined, and strong cation exchange (SCX) chromatography was performed, collecting 40 1-min fractions. We examined by nano-LC-FT-ICR that single fraction with the highest UV absorbance response because it was more likely to provide a complex sample.

Sample Preparation—
Human albumin, human apotransferrin, DTT, and iodoacetamide were purchased from Sigma. Sequencing grade trypsin for solution-based trypsin digestion was purchased from Promega (Madison, WI). Trypsin immobilized on agarose beads was purchased from Pierce. Disposable size exclusion columns (Micro Bio-Spin, P6 in Tris) were purchased from Bio-Rad. 18O-Enriched water (99 atom % enrichment) was purchased from Isotec (Miamisburg, OH). Solvents for LC-MS were purchased from Burdick and Jackson. Formic acid was from Fluka.

Two sample tubes, each containing 400 µg each of albumin and transferrin, were initially reduced with DTT and alkylated with iodoacetamide. Samples were digested with solution-based trypsin in H216O, 10% acetonitrile overnight at 37 °C. After this initial digestion, a second digestion using immobilized trypsin was used to exchange respective isotope labels into each sample. A 45-µl volume of immobilized trypsin beads, previously washed with 50 mM Tris, was added to each sample, and samples were concentrated to dryness on a Savant vacuum centrifuge. 50 µl of either H216O or the 18O-enriched water were added to their respective samples and were again concentrated to dryness by vacuum centrifuge. For the isotope exchange digestions, 100 µl of either H216O, 99% 18O-enriched water, or 90% 18O-enriched water were added to samples followed by 20 µl of acetonitrile that had been dried by storing it over a bed of sodium sulfate. 90% 18O-enriched water was prepared by adding a 10% volume of HPLC grade H216O to 99% 18O-enriched water. Samples were then vortexed for 7 h at room temperature on a Genie Vortex 2. After the exchange digestion, 2 µl of neat formic acid and 2 µl of 5% heptafluorobutyric acid were added to each sample, and samples were vortexed briefly and centrifuged for 5 min at 6,000 x g. 95 µl of supernatant from each sample were transferred to new microcentrifuge tubes for use in preparing prescribed ratios between the two samples. Two steps of 10-fold dilution (50 µl diluted to 500 µl) were performed for each albumin/transferrin sample using HPLC grade water containing 2% acetonitrile, 0.1% heptafluorobutyric acid, and 0.0005% Zwittergent 3-16. 18O/16O samples with ratios ranging from 100% 18O to 100% 16O were prepared from these individual standards. Sample mixtures were prepared with 18O/16O ratios of 20, 10, 3, 2, 1, 0.5, 0.33, 0.1, and 0.05.

Plasma samples were first depleted of the six most abundant proteins using an Agilent multiaffinity column. The non-retained fraction from each sample was assayed for total protein content using a Bradford assay and was denatured with urea, reduced, alkylated, digested, and labeled using a protocol from ProlyticaTM (Stratagene, La Jolla, CA). The labeled digests were then mixed based upon their total protein content, desalted on a reversed phase cartridge, and fractionated on a 4.6-mm-inner diameter PolySULFOETHYL aspartamide SCX column using a potassium phosphate/potassium chloride buffer system at pH 3.0 with 20% acetonitrile. SCX aliquots were vacuum concentrated to dryness and reconstituted in water containing 2% acetonitrile, 0.1% heptafluorobutyric acid, and 0.0005% Zwittergent 3-16.

Nanoscale LC-MS Analyses—
Nanoscale LC-MS analyses were performed on a Thermo LTQ-FT mass spectrometer (Thermo Electron, San Jose, CA) interfaced to an Eksigent NanoLC-1D system (Eksigent Technologies, Livermore, CA). An Eksigent AS1 autosampler and an auxiliary isocratic pump flowing at 10 µl/min were used to load 10-µl sample injections onto a 0.25-µl Opti-Pak precolumn (Optimize Technologies, Oregon City, OR) packed with Michrom Magic C8, 5 µm, 200 Å (Michrom BioResources, Inc., Auburn, CA). Peptide separations were performed on a 15-cm by 75-µm-inner diameter column packed with Magic C18, 5 µm, 200 Å (also Michrom BioResources, Inc.). A gradient from 5 to 40% B over 60 min was used with a column flow rate of 400 nl/min where mobile phase A was water/acetonitrile/n-propyl alcohol/formic acid (98:1:1:0.2 by volume), and mobile phase B was acetonitrile/n-propyl alcohol/water/formic acid (80:10:10:0.2 by volume).

FT-ICR-MS data and data-dependent linear ion trap MS/MS data were collected in parallel on an LTQ-FT mass spectrometer (Thermo Electron Corp., Bremen, Germany). FT-ICR-MS survey data were collected in full profile mode from m/z 350 to 1,900 at 100,000 resolving power (full width, half-maximum at m/z 400) using two microscans and a target population of 1 x 106 charges for automatic gain control (AGC) in the ICR cell. The linear ion trap performed MS/MS experiments on the three most abundant ions from the FT-ICR survey scans using 2 microscans, a precursor isolation width of 2 m/z units, 35% normalized collision energy, and 30 ms activation time. An automatic gain control target value of 1 x 104 was used for MS/MS experiments in the LTQ mass analyzer.

Software Implementation—
Software was implemented in C++ on Linux and MacOS X. Performance measurements were made on a computer with a 3-GHz Xeon processor running Linux 2.6. Software and datasets are available as open source upon request.


    COMPUTATIONAL DATA INTERPRETATION
 TOP
 ABSTRACT
 MATERIALS AND METHODS
 COMPUTATIONAL DATA...
 RESULTS
 DISCUSSION
 CONCLUSIONS
 REFERENCES
 
The algorithm takes as input the centroided peaks from a single spectrum as provided by the instrument vendor software and produces as output a list of isotope clusters detected therein along with their monoisotopic, neutral masses, their centroid retention times, and the ion abundances of both the unlabeled (16O, sample A) and labeled (18O, sample B) species. These two abundances, denoted {theta}A and {theta}B,2 respectively, can be used to find differentially expressed peptides in either an absolute ({theta}A{theta}B) or relative ({theta}B/{theta}A) scale. The estimated variance-covariance matrix of {theta} is also returned, allowing confidence intervals to be formed.

The algorithm operates in three steps. First, a list of all potential isotope clusters is generated based on peak spacing consistent with any allowed charge state. Some isotope peaks will belong to multiple potential isotope clusters. Second, the ion abundances of each of the potential clusters are fit to a theoretical isotopic model using linear regression, giving a best fit to each in terms of the relative amounts of 18O0, 18O1, and 18O2 for that peptide. Finally the algorithm decides between potential isotope clusters that share peaks based on their residuals from the regression fits. Each step is described in further detail below and summarized in Table I.


View this table:
[in this window]
[in a new window]

 
TABLE I RAAMS algorithm overview

 
Step One: Examine Peak Spacing—
In the first step of the algorithm, potential isotope clusters are found with spacing approximately equal to 1/z. To accomplish this, the data are divided into disjoint groups of peaks where a "group" is defined by a leading and trailing peak-free region of at least 3 m/z units. A distance matrix is computed between all pairs of peaks in this group. Charge states up to 6 are considered; elements in the distance matrix with values of 1.00235 ± error, 1.00235/2 ± error, ..., 1.00235/6 ± error are of interest. The average charge state spacing of 1.00235 is from Horn et al. (20). Error in charge state spacing is allowed to be as high as 40 ppm to account for partially or unresolved peaks. For each of the possible charge states from highest to lowest, the algorithm identifies adjacent pairs of peaks that have the requisite spacing. For each such pair, in order of abundance, the cluster is extended by adding those peaks that best fit the spacing of the original pair. The algorithm stops walking outward when one of two conditions is met: (a) the next peak with appropriate spacing is greater than 3 m/z units away or (b) the average error in peak spacing for the whole run exceeds 40 ppm. In this way, a potential isotope cluster is found that has m/z spacings of approximately 1/z for a given charge state z. At least three peaks must be found for a potential isotope cluster to be passed to later stages of the algorithm. Isotope clusters resulting from low abundance species with mass less than ~500 Da may legitimately have only two peaks (the A (18O0) and A + 1 (13C118O1) peaks with negligible 13C contribution) above the limit of detection, but these are very difficult to distinguish from noise spikes.

Three special cases converge to make it difficult to determine, based only on peak spacing, the correct charge state for a given run of peaks. (a) Noise peaks can interdigitate between real peaks. (b) Isotope clusters can legitimately overlap and share one or more peaks. (c) Low abundance peaks can fall below the signal to noise threshold of the peak detection algorithm. Examples of these are shown in Supplemental Fig. S1. To address the issue of noise peaks, legitimate peak sharing, and missing peaks, the algorithm attempts to fit all charge states that are consistent with the peak spacing data. The algorithm also tolerates up to two missing peaks internal to a candidate isotope cluster. In each of the three cases, the result is one or more isotope clusters with correct charge state assignments and one or more incorrect assignments that share peaks with the correct assignments. We resolve between these correct and incorrect isotope clusters in step three of the algorithm (described below).

Step Two: Examine Peak Abundances—
In step two of the algorithm, the candidate isotope clusters from step one are evaluated based on their abundances. The purpose of step two is to determine the contributions of the three label states (exchange of zero, one, or two oxygen atoms at the C terminus of the peptide) to the experimentally observed abundances of the isotope cluster by least squares fitting of a linear combination of expected isotopic abundances for each of these states. For each candidate isotope cluster, its charge state (determined in step one) and the mass of its first peak are used to compute an approximate neutral mass, and then an approximate chemical composition is determined using the average amino acid averagine, which has the chemical formula C4.9384H7.7583N1.3577O1.4773S0.0417 (16). Optionally the algorithm can adjust up or down the number of sulfurs predicted by averagine and perform the steps below for these additional expected chemical compositions.

As shown in Fig. 2a, for each chemical composition, three theoretical isotopic distributions are generated using Mercury (22) that correspond to the three label states. The first theoretical isotopic distribution is that which would result if all the molecules of peptide had not exchanged either of their C-terminal oxygen atoms (the "18O0 label state") and is shown as black bars in Fig. 2a. The second theoretical distribution is that which would result if all of the molecules of peptide exchanged exactly one of their C-terminal oxygen atoms for an oxygen atom from the H218O used for labeling (the 18O1 label state). If the purity of the H218O were 100%, the 18O1 distribution would be exactly the same as that of the 18O0 label state except shifted to the right by 2 Da and thus would be indistinguishable. As will be explained later, H218O of 90% purity is used instead, resulting in the 18O1 distribution shown with light gray bars in Fig. 2a. Similarly the third theoretical distribution (18O2 label state) would result if all of the peptides exchanged both of their C-terminal oxygen atoms for two oxygen atoms drawn from the 90% pure H218O. The three theoretical isotopic distributions are arranged in the columns of a matrix, X, as shown in Fig. 2b (note that in this figure, the matrices are shown transposed). The X matrix is scaled so that each column sums to 1. A column of ones is also added to represent the intercept. For a further discussion of the regression model, see Eckel-Passow et al. (23).


Figure 2
View larger version (25K):
[in this window]
[in a new window]

 
FIG. 2. Theoretical isotopic distributions and resulting matrices. a, for each potential isotope cluster, the algorithm generates three theoretical isotopic distributions corresponding to the complete exchange of zero (black bars), one (light gray bars), or two (dark gray bars) of the C-terminal oxygen atoms with atoms drawn from the 90% H218O. b, these three distributions are arranged in the columns of a matrix X, shown here transposed. A column of ones is added for the intercept. Four rows of zeros are added before the monoisotopic mass and after the relative isotopic abundances are all less than 1% to prevent problematic alignments at the tails of the distribution. The experimental peak heights are arranged in a vector y, also shown transposed. c, an example isotope cluster with an expected 18O/16O ratio of 5. The algorithm utilizes the peak data (indicated by gray bars), not the spectral data (black line). The predicted isotope abundances (y) are indicated by circles.

 
The experimental peak abundances are arranged in a column vector, y, also shown in Fig. 2b. Because it is not known beforehand which experimental peak is the monoisotopic mass (all 12C, all 14N, all 16O, etc.), a number of different y vectors, corresponding to different alignments of theoretical isotopes to experimental peaks, are attempted. Indeed the monoisotopic mass may not even be detected experimentally as in the case of species highly up-regulated in the labeled sample. Fig. 2b shows only the y vector of the correct alignment with the monoisotopic mass marked A; different alignments can be imagined by shifting this vector to the left or right. Where an experimental peak is not detected for a given theoretical isotope, 0 is inserted for the corresponding element of y.

The goal then is to determine the linear combination of the columns of X that most closely resembles all the experimental peaks. This is done using non-negative least squares regression (24, 25), which defines the best fit as that with the lowest sum-of-squares error. The result of this regression is a column vector ß with elements representing the amount of experimental abundance that can be accounted for by exchange of zero (ß1), one (ß2), or two (ß3) of the C-terminal oxygen atoms or by an abundance offset (ß0). The abundance parameters ß1–3 are constrained to be non-negative because they represent the amount of abundance in each of the three label states. By using a least squares fit, the abundances of all peaks in the isotope cluster are used in determining the abundances from sample A and sample B ({theta}A and {theta}B; described below) instead of just the abundances of A, A + 2, and A + 4 peaks as in previous work (5, 15). Mirgorodskaya et al. (21) also used regression analysis for quantification with 18O-labeled internal standards but measured the relative abundances of the 18O1 and 18O2 species in a separate experiment rather than allowing them to vary in the regression.

The regression is performed for each possible alignment of theoretical distribution to experimental peaks. The alignment with the lowest residual sum-of-squares error in the fit is chosen as correct; this is similar in concept to the alignment procedure described by Senko et al. (16). In choosing the correct alignment, it is important to account for the fact that the isotopic distribution of complete incorporation of two 18O atoms closely resembles that of the natural abundance distribution and that the two distributions are thus indistinguishable. This means that very low residuals can be obtained by a partial match to only the rightmost peaks of a cluster (incorrectly assuming, for instance, that the experimental 18O2 is the theoretical 16O2) because this fit will have fewer peaks and therefore less error. To prevent this, four zeros are added to each end of the X matrix when determining the correct alignment; these zeros (shown in gray in Fig. 2b) introduce high residuals for those problematic alignments at the tails of the distribution that would otherwise effectively ignore a large amount of experimental abundance. However, they also bias the parameter estimates. Therefore, once the correct alignment is found, a version of X lacking these extra "isotopes" is fit to the y vector for the correct alignment, and the resulting ß is used for all further calculations.

The result of this final non-negative least squares fitting is an estimate of the vector ß where elements 1–3 represent the abundances contributed by the exchange of zero, one, or two of the C-terminal oxygen atoms, respectively, and element 0 represents an intercept term. These values take into account and correct for the presence of both the naturally occurring isotopes (13C, 34S, etc.) and also for the e.g. 90% impurity of the H218O. However, to convert these values into the original sample abundances {theta}A (unlabeled; 16O) and {theta}B (labeled; 18O), we must take into account the fact that trypsin has different affinities for different peptide pseudosubstrates. Two events must occur for a peptide molecule to incorporate an 18O atom. 1) That peptide must react (exchange) with trypsin, and 2) an 18O atom must be chosen from the 90% H218O. The probability of a given peptide molecule having at least one interaction with trypsin is equal to (1 – eKt) where K is a reaction constant that varies from peptide to peptide and t is reaction time; we denote this probability pc. If a given peptide molecule reacts at least once with trypsin, the probability of obtaining an 18O atom is equal to the purity of the H218O, denoted p. Thus, for a given peptide molecule c, the probability of incorporating a C-terminal 18O atom is as follows.

Formula 1(Eq. 1)

Conversely the probability of not incorporating an 18O atom is (1 – sc). The probability sc can be thought of as an incorporation rate or "effective purity."

There are four sets of molecules we must take into consideration: 1) molecules from sample A, which never contain 18O, 2) molecules with no exchange events in sample B, 3) molecules where exactly one of the C-terminal oxygen atoms has had at least one exchange event in sample B, and 4) molecules where both C-terminal oxygen atoms have had at least one exchange in sample B. In the single step digestion procedure, the probability of a molecule having no exchange events in sample B is zero. This is because, in the single step procedure, peptide cleavage and labeling both occur simultaneously in H218O. However, in the two-step digestion, there is the possibility that a peptide may be cleaved in H216O but then never (re)react with trypsin in H218O. In this procedure, the observed abundance in the A peak will be a combination of abundances from unlabeled molecules in both sample A and sample B. Thus, ß1 is equal to the abundance of molecules from sample A plus those molecules from sample B where neither of the C-terminal oxygen atoms has exchanged. ß2 is equal to the abundance of molecules from sample B where only one of the C-terminal oxygen atoms has exchanged. The probability of this occurring is equal to the probability of having one 18O atom exchange and the other not exchange; there are two ways this could be done, and thus the final probability is 2pc(1 pc). ß3 is equal to the abundance of molecules from sample B where both C-terminal oxygen atoms have exchanged as follows.

Formula 2(Eq. 2)


Formula 3(Eq. 3)


Formula 4(Eq. 4)

Solving sequentially for pc, {theta}B, and {theta}A yields the following.

Formula 5(Eq. 5)


Formula 6(Eq. 6)


Formula 7(Eq. 7)

Thus, by assuming that the rate of incorporation of an 18O atom at the C terminus of a peptide is not influenced by the presence there of another 18O atom, we can use a function of the ratio of one to two incorporations of 18O (visible experimentally as the ratio ß23) to estimate the incorporation rate sc. Then the amounts of material in control ({theta}A) and disease ({theta}B) samples can be corrected for the incorporation rate using our estimate of sc. When the three parameters ß1–3 are non-zero, we calculate the ratio of material in sample B (labeled) to that in sample A (unlabeled) as {theta}B/{theta}A, making use of the correction for sc as shown above. This requires that we detect at least the 18O0, 18O1, and 18O2 peaks to obtain estimates for ß1, ß2, and ß3 and that these estimates are not constrained to zero by the non-negative regression. In cases where at least one of the parameter estimates is constrained to zero, we directly use the ratio (ß2 + ß3)/ß1. In both cases we constrain to a minimum or maximum ratio to avoid dividing by zero. We could also use sc to filter out those peptides that, due to a low incorporation rate, do not accurately report the true ratio of amounts. Fig. 2c demonstrates that the regression also provides predicted isotopic abundances, denoted y and shown as circles.

To evaluate the goodness of fit between the experimental and predicted isotopic abundances, we compute a fitting "score" as in Equation 8,

Formula 8(Eq. 8)

where the sum of the squared residuals from the fit is divided by the sum of experimental peak abundances. The denominator serves to normalize the error from the regression for comparison across isotope clusters with varying abundances and numbers of peaks. Lower scores are better in that clusters that more closely agree with their predicted abundances will have lower scores. Optimal scoring is an area of active investigation for our group.

Step Three: Decide between Charge Assignments—
The third and final step of the algorithm further examines the candidates from step one that share common peaks. Recall that missing peaks, noise peaks, and overlapping isotope clusters introduce complications into the charge assignment process. Additional fits are attempted to decide which of these assignments are correct, and these additional fits result in isotope clusters that share peaks. Furthermore this peak sharing can be either legitimate (when two real isotope clusters overlap) or artifactual as in the case of noise peaks or missing peaks (see examples in Supplemental Fig. S1). A strict rule might be to never allow a peak to participate in more than one isotope cluster, but this would incorrectly eliminate some clusters that legitimately shared peaks.

Two competing factors must be accounted for in deciding between isotope clusters that share peaks. Isotope clusters with better (lower) fitting scores should be preferred over those with worse fitting scores. However, fitting score is influenced by the number of peaks in the isotope cluster: clusters with fewer peaks will have less error. This means that, for some isotope clusters, an incorrect assignment with lower charge state (and therefore fewer peaks) will have a better fitting score than a correct assignment with higher charge state. Therefore, we must also evaluate the amount of abundance that the isotope cluster accounts for out of the surrounding peaks. Obviously isotope clusters that use more of the surrounding peaks (i.e. a higher charge state) should be preferred over those that use fewer of them (i.e. a lower charge state).

We combine these two competing factors by computing a charge assignment score for each isotope cluster in the transitive closure of isotope clusters that share peaks (if an isotope cluster A shares peaks with B and B shares peaks with C, then ABC constitutes the transitive closure of isotope clusters that share peaks). We compute a charge assignment score for each isotope cluster in the transitive closure as the squared error from the fitting for this isotope cluster plus the squared abundances for all the other peaks in the transitive closure that were not used by that cluster (denoted y') divided by the sum of abundances of all peaks in the transitive closure as follows.

Formula 9(Eq. 9)

We then rank each isotope cluster by charge assignment score from lowest to highest and examine each in turn in this order. Isotope clusters that share peaks and have charge states that are even multiples of each other are not allowed to co-exist. The algorithm decides among clusters that violate this sharing rule by picking the one with the best (i.e. lowest) charge assignment score. In this way, legitimate peak sharing is allowed (Supplemental Fig. S1b has overlapping clusters with z = 2 and z = 3, which are allowed), but the incorrect, duplicate clusters introduced by noise or missing peaks are removed.

Evaluation of Alignment Correctness—
A key factor influencing the correctness of the ratios produced by the algorithm is whether it determines the correct monoisotopic mass, that is to say whether it determines the correct alignment between the theoretical and experimental isotopic distributions. When the algorithm incorrectly assigns the monoisotopic mass, it generally assigns either the 18O1 or 18O2 (A + 2 or A + 4) isotope as the monoisotopic mass, introducing a shift in mass of 2 or 4 Da. Although there are indeed some species that differ by exactly a small, integer number of daltons, the most likely explanation for such a mass error is that the algorithm has incorrectly assigned the monoisotopic mass. Because peptides with masses less than ~4,000 daltons have readily detectable monoisotopic masses, determining the correct alignment is trivial using the chromatogram from the sample digested in H216O. We used these two pieces of knowledge (±4-Da shift and correct monoisotopic mass from 16O sample) to verify the algorithm when fitting variable ratios.

We ran the algorithm, using a single theoretical isotopic distribution corresponding to the natural isotopic abundances for the chemical composition predicted by averagine, against the 16O chromatogram and then combined the appearance of species at multiple charge states, in multiple spectra chromatographically, and across each of the samples by using mass and retention time thresholds of 20 ppm and 120 s, respectively, as described previously for our label-free approach (26). (We also used thresholds in retention time to remove artifact species, such as polymer introduced by the chromatography, as described by Mason et al. (27).) For each species we determined a consensus monoisotopic mass across all isotopic clusters for that species as their abundance weighted average of monoisotopic mass.

Next we applied the algorithm to all nine chromatograms each of the 90 and 99% enrichment datasets but allowed the algorithm to fit unknown 18O/16O ratios. We then used the same grouping procedure on this "unknown" data with one exception: we allowed two isotope clusters A and B to be grouped across multiple spectra and multiple charge states if mass(A) = mass(B) + n x 1.00235 Da ± 20 ppm where n varies from 0 to 4. We then rated the correctness of the algorithm in determining the monoisotopic mass for each cluster in the unknown data as whether its monoisotopic mass is the same (within 20 ppm) as the consensus monoisotopic mass from the 16O data.


    RESULTS
 TOP
 ABSTRACT
 MATERIALS AND METHODS
 COMPUTATIONAL DATA...
 RESULTS
 DISCUSSION
 CONCLUSIONS
 REFERENCES
 
Fig. 3 characterizes the ratios found in the chromatograms (see "Materials and Methods" for description of datasets) of purified proteins digested in 90% H218O. There is one row for each of seven different, known ratios with all 18O at the top and all 16O at the bottom; the 10 and 0.1 ratios were omitted for clarity (the complete figure is available upon request). Fig. 3a shows scatter plots of the isotope clusters found, with fitting score (Equation 8) on the x axis (plotted on a log base 10 scale) versus predicted 18O/16O ratio on the y axis (also log base 10 scale). The ratio expected in each row is printed in the left-hand margin and is indicated by the solid line; the dashed line indicates a ratio of 1. Lower scores represent better agreement between theoretical and experimental isotopic distributions. Points are shaded by the abundance of the most abundant peak in each isotope cluster. Between scores of ~0.1 and 0.5, very few isotope clusters of high abundance are observed, and their ratios vary widely from the expected. Manual inspection of 50 isotope clusters with a range of abundances and with scores between 0.1 and 0.105 revealed that 42 of 50 were correctly assigned. However, at scores between 0.2 and 0.205, only 5 of 50 were correct. Based on this we have used a threshold value of 0.1, shown by a solid vertical line. Note that at very high score (~0.5) more abundant peaks are again observed; manual inspection revealed that many of these occur when noise peaks are separated (by random chance) from a highly abundant peak by exactly 1/z. This results in very high residuals for these incorrect fits.


Figure 3
View larger version (31K):
[in this window]
[in a new window]

 
FIG. 3. Observed 18O/16O ratios from nine chromatograms with known ratios. a, scatter plots of isotope clusters detected by the algorithm in seven LC-FT-ICR chromatograms of purified proteins trypsin digested in H216O and in 90% H218O and then mixed to form the expected 18O/16O ratios indicated in the left margin (from all 18O at the top to all 16O at the bottom; note that the 10 and 0.1 ratios are omitted for clarity). Fitting score is plotted on the x axis, and the calculated 18O/16O ratio is on the y axis (both log10 scale). Lower scores indicate better fit; the vertical line indicates the threshold level of 0.1 used in this work. Points are colored according to whether the algorithm correctly determined the alignment between theoretical and experimental isotopic distributions (see text for automated procedure used to evaluate correctness): green indicates that the alignment was correct, red indicates incorrect, and black indicates that correctness could not be determined. The abundance of the most abundant peak in the isotope cluster is indicated by shade (log10 scale shown at bottom). The very poor fits at the right-hand edge of the plot occur when noise peaks are separated (by random chance) from a highly abundant peak by exactly 1/z. b, histograms of the 18O/16O ratio of those isotope clusters from a with fitting score <0.1. The y axis is the same as in a, and the stacked bars are colored to indicate alignment correctness as in a. Truncated bars (indicated by slashes) are one-third their correct length. Inf, infinity.

 
One factor in the ability of the algorithm to determine the correct ratio is whether it can determine which of the peaks of an isotope cluster is the monoisotopic mass, particularly for peptides highly up-regulated in the labeled sample; put another way, it is important for the algorithm to determine the correct "alignment" between the theoretical and experimental isotopic distributions. The isotope clusters in Fig. 3a are colored green if the alignment was determined to be correct using the procedure described above and red if it was incorrect; isotope clusters whose correct alignment could not be determined because their mass and retention time could not be matched to the 16O sample are colored black.

Fig. 3b shows a histogram, for each of the nine expected ratios, of the number of isotope clusters (horizontal axis) that have a given ratio (vertical axis, which has the same scale as in a). Only those clusters with scores less than the threshold of 0.1 are shown. Stacked bars are colored red, green, or black to indicate alignment correctness (note: truncated bars are one-third the correct length). In general, the experimental ratios correspond well with the expected ones out to expected ratios of about 20 and 0.05; the distributions do broaden, however, at the more extreme ratios. Some bias toward 16O is observed in the ratios; reverse labeling experiments and/or normalization may be useful in correcting this.

Fig. 4 shows FT-ICR mass spectra of a single tryptic peptide of BSA, having been digested in (a) normal H216O, (b) H218O reported by the vendor as >99% pure, and (c) 90% pure H218O. In Fig. 4b the majority of the peptide molecules have incorporated two 18O atoms, producing the expected 4-Da mass shift. As can be seen from comparing Fig. 4, a and b, the isotopic abundance distributions of 16O- and that of 18O-labeled peptides have a very similar shape (imagine shifting the isotopic distribution in Fig. 4a 4 daltons higher). Thus, when a peptide exists in high abundance in one sample but is at a level below the detection limit in the other, it is not possible to tell from which sample it originated. This determination depends in large part on the alignment step in the algorithm that attempts to determine which peak is the monoisotopic peak; if the wrong peak is decided upon, the algorithm will compute incorrect ratios. Moreover the C-terminal peptide of each protein in the sample will not incorporate 18O label because it is (typically) not a substrate for trypsin. A spectrum such as in Fig. 4, a or b, could then be the result of a peptide that is highly up-regulated, is highly down-regulated, or exists only in one of the samples; it could also result from a C-terminal peptide.


Figure 4
View larger version (7K):
[in this window]
[in a new window]

 
FIG. 4. Examples of the same peptide labeled in H216O (a), >99% H218O (b), and ~90% H218O (c). Note that the small peaks at m/z ~654.4 and 654.9 are due to incomplete exchange and correspond to the 18O1 and 18O113C1 species, respectively.

 
Comparison of Fig. 4, a and c, shows that this ambiguity can be partially resolved by using H218O of less than 100% purity, resulting in a distinct pattern of peaks. Fig. 4c shows the same peptide digested in 90% H218O. In this example, ~1/3 of the peptide molecules have incorporated only a single 18O atom, and a small fraction, ~4%, have not incorporated any. Thus, utilization of 90% H218O incorporates a signature to discriminate, for example, a C-terminal peptide that is unlabeled from a highly up-regulated, labeled (18O2) peptide.

Fig. 5 shows the advantage of using impure H218O in distinguishing extreme ratios by depicting histograms of the number of correct versus incorrect alignments for each of the nine expected 18O/16O ratios in both 99 and 90% H218O. Across a range of abundance bins (reported as the base 10 logarithm of the abundance of the most abundant peak in each isotope cluster), the number of isotope clusters where the alignment was determined to be correct is shown in green, whereas the number with incorrect alignments are shown in red. The number of isotope clusters where the correct alignment could not be determined (see above for procedure) is shown in black. Only isotope clusters with scores less than the threshold of 0.1 are included. In Fig. 5a the digest was performed in 99% pure H218O. At lower ratios (more 16O), the algorithm has difficulty determining the correct alignment, as shown by the large number of isotope clusters with incorrect alignments, because the expected distributions are indistinguishable to the algorithm. However, when the purity of the H218O is dropped to 90%, as shown in Fig. 5b, the algorithm does better as indicated by the lower number of incorrect alignments at low 18O/16O ratios. This is because the additional impurity causes a distinct pattern of peaks in the extreme 18O case where most of the abundance in the combined sample comes from the labeled sample (where the 18O/16O ratio is high). The algorithm uses this fact to distinguish the two extremes of labeling during the alignment process in step two. At abundances levels where the small (~1%) peaks that distinguish the pure 18O case fall below the specific signal to noise level of the instrument, the algorithm again struggles to determine the correct alignment as indicated by the increasing fraction of red at the lower abundances in both a and b at all ratios. Ratios closer to 1 are easier to detect at lower abundance because they are more likely to have all three of the A, A + 2, and A + 4 peaks. Work to determine the optimum level of impurity is ongoing. Note that the total number of isotope clusters is not equal between Fig. 5, a and b, due to differences in chromatography.


Figure 5
View larger version (17K):
[in this window]
[in a new window]

 
FIG. 5. Effect of impure H218O on alignment correctness. a, histograms of abundances of the most abundant peak from isotope clusters found in nine LC-FT-ICR chromatograms of purified proteins trypsin-digested in H216O and in 99% H218O and then mixed to form the expected 18O/16O ratios indicated in the left margin (from all 18O at the top to all 16O at the bottom). Stacked bars are colored by the number of isotope clusters with correct (green), incorrect (red), and unknown (black) alignment between theoretical and experimental isotopic distributions. b, as in a but with 90% H218O. By decreasing the purity of the water the performance of the algorithm in finding the correct alignment is improved at low 18O/16O ratios. Note that due to differences in chromatography, the total number of clusters is not comparable between a and b. Inf, infinity.

 
The variable incorporation rate of 18O into individual peptides is another factor that can influence the correctness of the ratios computed by the algorithm. Different peptides can have different incorporation rates because trypsin has different affinities for them. A peptide with a low incorporation rate will show an artificially low 18O/16O ratio because the labeled sample will contain an artificially high amount of 18O0 and 18O1 and an artificially low amount of 18O2. The effect of this is the same as if the purity of the H218O used for labeling were lower. Various efforts have been made to drive the reaction further toward complete incorporation of two 18O atoms (11, 28, 29). Increasing the concentration of trypsin in the labeling reaction will also increase the reaction rate. One commercially available kit (12) (Prolytica, Stratagene) achieves this by utilizing two digestion steps: a longer, high volume digestion in H216O with solution-based trypsin followed by a second, shorter digestion in H218O with immobilized trypsin. The use of immobilized trypsin allows for higher trypsin concentrations without increasing the number of autolytic fragments resulting from simply increasing the trypsin concentration. However, in the single step digestion procedure, if a peptide is seen, it must have had at least one opportunity for exchange; this is not true in the two-step digestion procedure. The algorithm attempts to correct for low incorporation rates by computing an effective purity, denoted s, using a function of the 18O2/18O1 ratio. It then subtracts some abundance from the unlabeled sample and adds it to the labeled sample to account for peptide molecules with no 18O atoms that remain in the labeled sample (Equations 57).

This approach is demonstrated in Fig. 6, which shows a histogram of the distribution of the rate of incorporation of 18O, indicated by the effective purity sc, across isotope clusters found in the nine chromatograms with known ratios. Higher sc values indicate more efficient labeling, which would suggest higher affinity of trypsin for a given peptide. Three representative peptides are shown in a, b, and c in both the 16O sample (top) used to determine the correct monoisotopic mass and the all 18O sample (bottom), which should show complete incorporation; the predicted abundance distribution (y) is indicated by circles. Moving from right to left, Fig. 6c shows a peptide with a high sc value, indicating substantial incorporation of 18O. Fig. 6b shows a peptide at intermediate sc where less than half the molecules have incorporated one 18O atom and almost none have incorporated two. Finally in Fig. 6a a peptide with a low sc value is shown, indicating poor incorporation; indeed very few molecules have incorporated one 18O atom, and essentially none have incorporated two 18O atoms. Peptides with such a low incorporation rate are problematic because it is difficult to correctly estimate the incorporation rate when so little 18O2 is seen. In general, these peptides are simply not amenable to quantification.


Figure 6
View larger version (29K):
[in this window]
[in a new window]

 
FIG. 6. Histogram of the number of isotope clusters with a given incorporation rate, indicated by the effective purity (sc). Data from all nine chromatograms of purified proteins digested in H216O and 90% H218O and mixed in varying ratios are shown. Higher incorporation rates indicate more of the peptide molecules have had both C-terminal 16O atoms exchanged for 18O atoms. The algorithm calculates the effective purity (sc) using a function of the ratio of abundance resulting from incorporation of two 18O atoms (ß3) over the abundance resulting from one 18O incorporation (ß2). Insets show examples of peptides with low (a), intermediate (b), and high (c) incorporation rates digested in both 100% H216O (top) and 90% H218O (bottom).

 
Table II gives, for each of the nine chromatograms at different expected ratios, information about the number of clusters found and the number of species (peptides) identified by grouping those clusters using tolerances in mass and retention time. Only those isotope clusters with scores less than 0.2 are considered. Over 500 species were found even in a digest containing only two proteins; we believe that large amounts of chymotryptic cleavage, nonspecific cleavage, post-translational modification, and salt adduction account for this high number of species (data not shown). We report the percent of total ion current (TIC) that could be determined to belong to peaks involved in isotope clusters detected by the algorithm both before and after grouping. On average, 61% of the ion current could be accounted for by the algorithm. The percentage after grouping reflects the fact that we require at least three isotope clusters to find a species; on average about 53% of TIC could be accounted for by the algorithm after grouping. Finally the algorithm took on average 45 s to process an 80-min chromatogram (of about 1,150 spectra) or about 26 spectra/s.


View this table:
[in this window]
[in a new window]

 
TABLE II Performance of RAAMS on 9 chromatograms with known ratios

 
To give some sense of how the algorithm performs on more complex mixtures, Fig. 7 shows peptides found in a single SCX fraction of pooled human plasma that had been depleted of the six most abundant proteins. Isotope clusters detected by the algorithm that represented occurrences of peptides at multiple charge states and in multiple spectra chromatographically were combined using tolerances of 20 ppm and 120 s as described by Mason et al. (27). Fig. 7a plots the 18O/16O ratio of the most abundant ion for each peptide versus the fitting score of that ion. Lower scores indicate better fit, and larger ratios indicate species up-regulated in the disease sample. An example of a highly up-regulated peptide with an 18O/16O ratio of ~50 is shown in Fig. 7b where circles indicate the predicted isotopic abundances (y). Fig. 7c shows a slightly down-regulated peptide (18O/16O {approx} 0.4) with a very good fit (score, 0.007). Fig. 7d shows a peptide with a neutral mass of 4,224 Da; the algorithm calculates a 18O/16O ratio of 2.8. This isotopic distribution would be challenging to interpret manually or with previous ratio calculations. It should be noted that, as the mass of peptides increase, their isotopic distributions become dominated by 13C isotopes, and they broaden considerably. Above ~2,500 Da, the naturally occurring 13C makes it more difficult to detect the presence of the 18O introduced by the labeling.


Figure 7
View larger version (25K):
[in this window]
[in a new window]

 
FIG. 7. Ratios observed in a complex sample. a, scatter plot of peptides detected in a single SCX fraction from two pools of depleted human plasma, one trypsin-digested in H216O and the other in 90% H218O (dataset 3, Fig. 1). Isotope clusters from the algorithm were grouped into species using mass (20 ppm) and retention time tolerances (120 s) as described in the text; for each peptide, the fitting score (x axis) and 18O/16O ratio (y axis) of the most abundant isotope cluster in that peptide are plotted. Several examples are highlighted on the plot and shown in b–d where circles indicate the expected isotopic abundances (y). A highly up-regulated peptide (18O/16O {approx} 50; b) and a slightly down-regulated peptide (c) both have extremely good fitting scores. d, a peptide with neutral mass 4,233.9 would be difficult to correctly interpret either manually or using previous ratio metrics. As the mass of the peptide increases, it becomes increasingly difficult for the algorithm to both find the correct alignment and accurately calculate the ratio.

 
Parenthetically we note that averagine yields only an approximate chemical composition for a given neutral mass; the predicted chemical composition can be wrong, particularly in the number of sulfurs. We examined the effect of allowing the algorithm to vary the number of sulfurs up or down from that predicted by averagine and determined that the effect of this, although improving scores for some peptides, was minimal (data not shown).


    DISCUSSION
 TOP
 ABSTRACT
 MATERIALS AND METHODS
 COMPUTATIONAL DATA...
 RESULTS
 DISCUSSION
 CONCLUSIONS
 REFERENCES
 
We borrow many concepts from the THRASH algorithm (20), which we have used previously as part of a label-free biomarker discovery strategy (15). The chief distinction between this work and THRASH is that this work is amenable to 18O-labeled data where complex overlapping isotopic distributions preclude the single theoretical isotopic distribution used by THRASH. However there are other, possibly more subtle, differences we wish to point out.

Our algorithm makes use of centroided peak lists. This is in sharp contrast to THRASH, which does not perform a priori peak detection and instead directly examines the full-profile spectral data. Whereas we only attempt alignments that correspond to the apexes of peaks (typically 10–12 alignments per cluster), THRASH, in contrast, tries a series of several thousand alignments in very small m/z increments assuming each of the several thousand samples in the spectral region under examination may contain the monoisotopic mass. This makes THRASH much more robust in its ability to detect low abundance clusters with peaks below the signal to noise threshold of a peak picking algorithm but also makes it orders of magnitude slower. Also although the present work uses linear regression to find the best fit in a least squares manner in both m/z and abundance, THRASH iterates an overall abundance parameter to scale the theoretical isotopic distribution being aligned.

Our algorithm attempts all allowed charge states that are consistent with the peak data; THRASH, in contrast, uses a Fourier-Patterson charge state determination algorithm (30) to determine the prevailing charge state in a spectral region. THRASH only falls back to fitting all allowed charge states if the fit resulting from the charge state determined by the Fourier-Patterson is poor. Our method relies on the fact that performing the regression is very fast and that far fewer possible alignments are attempted. THRASH also depends on a subtraction step in which the predicted isotopic abundances are subtracted from the original spectrum. Experience has shown this subtraction to be problematic in that it often leaves residuals behind that may have spectral characteristics, resulting in the detection of artifactual isotope clusters with acceptable fitting scores. We avoid the subtraction by simply fitting the two clusters separately.


    CONCLUSIONS
 TOP
 ABSTRACT
 MATERIALS AND METHODS
 COMPUTATIONAL DATA...
 RESULTS
 DISCUSSION
 CONCLUSIONS
 REFERENCES
 
In conclusion, we have described and demonstrated the effectiveness of an algorithm that automatically interprets spectra of 18O-labeled isotope clusters. It utilizes linear regression to model the complex, overlapping isotopic distributions that result in these spectra. The algorithm allows for correction of peptide to peptide variability of incorporation rate, particularly when using a common two-step digestion/labeling procedure. It also provides for the use of impure H218O to distinguish strongly up-regulated peptides from strongly down-regulated and C-terminal peptides. The algorithm operates on centroided peak data and does not require fragmentation spectra; it is fast, interpreting an entire 1,500-spectrum chromatogram in less than a minute. This provides for the possibility that the algorithm could be used on line during acquisition to direct the instrument to perform MS/MS on peptides with interesting ratios. This could help ameliorate the fact that in a typical MS/MS experiment one must extensively separate complex samples or risk missing low abundance peptides due to the limited capacity of the instrument to perform fragmentation.

The algorithm was applied here to FT-ICR data but should be generally applicable to any instrument with sufficient resolving power. Also although it was designed for 18O data, it handles unlabeled isotopic distributions as a special case and could be used to interpret label-free experiments as well. It could also be easily adapted for other labeling moieties. In the future we hope to improve our scoring scheme, to incorporate knowledge of the expected distribution of the incorporation rate (sc), and to incorporate protein identification data when available.


    ACKNOWLEDGMENTS
 
We thank Douglas Mahoney for helpful discussions and Keith Purrington for early work on a Javascript spectrum viewer used in examining the output of the algorithm. Rudi Chiarito refined the viewer and contributed to the software library (libmprc) that underlies our work.


   FOOTNOTES
 
Received, April 19, 2006, and in revised form, October 20, 2006.

Published, MCP Papers in Press, October 26, 2006, DOI 10.1074/mcp.M600148-MCP200

1 The abbreviations used are: iTRAQ, isobaric tag for relative and absolute quantitation; SILAC, stable isotope labeling of amino acids in cell culture; THRASH, thorough high resolution analysis of spectra by Horn; RAAMS, regression analysis applied to mass spectrometry; SCX, strong cation exchange; TIC, total ion current. Back

2 The variables used are: {theta}A, predicted amount of material in sample A (16O, unlabeled sample); {theta}B, predicted amount of material in sample B (18O, labeled sample); X, matrix of theoretical isotopic abundances, see Fig. 2; y, column vector of experimentally observed peak heights, see Fig. 2; y, column vector of predicted peak heights; ß, column vector of parameter estimates with ß1, ß2, and ß3 corresponding to the amounts of 18O0, 18O1, and 18O2 (zero, one, or two incorporations), respectively, and ß0 corresponding to the intercept; p, fractional purity of H218O, in this work 90% H218O (p = 0.9) is prepared by mixing H216O and the vendor’s >99% H218O; s, effective purity of 18O in the isotopic distribution of an individual peptide, higher s values indicate more incorporation of 18O. Back

* This work was supported by NIDDK Grant R01DK 70179-2 and NCI Grants R33 CA105295 (to D. C. M.) and R25 CA92049 (to J. E. E.-P.) from the National Institutes of Health, the Fraternal Order of Eagles Cancer Research Fund, the Lustgarten Foundation for Pancreatic Cancer Research, the James M. Kemper Biomarker Research Fund, the Gordon C. and Elizabeth W. Gilroy Fellowship in Proteomic Research, and the Mayo Clinic College of Medicine. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. Back

S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material. Back

{ddagger}{ddagger} To whom correspondence should be addressed: 3-115 Medical Sciences Building, Mayo Clinic College of Medicine, 200 1st St. S. W., Rochester, MN 55902. Tel.: 507-538-0381; Fax: 507-284-8433; E-mail: bergen.bob{at}mayo.edu


    REFERENCES
 TOP
 ABSTRACT
 MATERIALS AND METHODS
 COMPUTATIONAL DATA...
 RESULTS
 DISCUSSION
 CONCLUSIONS
 REFERENCES
 

  1. Baker, M. (2005) In biomarkers we trust? Nat. Biotechnol. 23, 297 –304[CrossRef][Medline]

  2. Diamandis, E. P. (2004) Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations. Mol. Cell. Proteomics 3, 367 –378[Abstract/Free Full Text]

  3. Ransohoff, D. F. (2005) Lessons from controversy: ovarian cancer screening and serum proteomics. J. Natl. Cancer Inst. 97, 315 –319[Abstract/Free Full Text]

  4. Anderson, N. L., and Anderson, N. G. (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845 –867[Abstract/Free Full Text]

  5. Yao, X., Freas, A., Ramirez, J., Demirev, P. A., and Fenselau, C. (2001) Proteolytic 18O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal. Chem. 73, 2836 –2842[Medline]

  6. Gygi, S. P., Rist, B., Gerber, S. A., Turecek, F., Gelb, M. H., and Aebersold, R. (1999) Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17, 994 –999[CrossRef][Medline]

  7. Ross, P. L., Huang, Y. N., Marchese, J. N., Williamson, B., Parker, K., Hattan, S., Khainovski, N., Pillai, S., Dey, S., Daniels, S., Purkayastha, S., Juhasz, P., Martin, S., Bartlet-Jones, M., He, F., Jacobson, A., and Pappin, D. J. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 3, 1154 –1169[Abstract/Free Full Text]

  8. Ong, S. E., Blagoev, B., Kratchmarova, I., Kristensen, D. B., Steen, H., Pandey, A., and Mann, M. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1, 376 –386[Abstract/Free Full Text]

  9. Hedstrom, L. (2002) Serine protease mechanism and specificity. Chem. Rev. 102, 4501 –4524[CrossRef][Medline]

  10. Schnolzer, M., Jedrzejewski, P., and Lehmann, W. D. (1996) Protease-catalyzed incorporation of 18O into peptide fragments and its application for protein sequencing by electrospray and matrix-assisted laser desorption/ionization mass spectrometry. Electrophoresis 17, 945 –953[CrossRef][Medline]

  11. Yao, X., Afonso, C., and Fenselau, C. (2003) Dissection of proteolytic 18O labeling: endoprotease-catalyzed 16O-to-18O exchange of truncated peptide substrates. J. Proteome Res. 2, 147 –152[CrossRef][Medline]

  12. Stratagene (2004) ProlyticaTM 18O Labeling Kit Instruction Manual, Stratagene, La Jolla, CA

  13. Bantscheff, M., Dumpelfeld, B., and Kuster, B. (2004) Femtomole sensitivity post-digest 18O labeling for relative quantification of differential protein complex composition. Rapid Commun. Mass Spectrom. 18, 869 –876[CrossRef][Medline]

  14. Stewart, I. I., Thomson, T., and Figeys, D. (2001) 18O labeling: a tool for proteomics. Rapid Commun. Mass Spectrom. 15, 2456 –2465[CrossRef][Medline]

  15. Johnson, K. L., and Muddiman, D. C. (2004) A method for calculating 16O/18O peptide ion ratios for the relative quantification of proteomes. J. Am. Soc. Mass Spectrom. 15, 437 –445[CrossRef][Medline]

  16. Senko, M. W., Beu, S. C., and McLafferty, F. W. (1995) Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J. Am. Soc. Mass Spectrom. 6, 229 –233

  17. Halligan, B. D., Slyper, R. Y., Twigger, S. N., Hicks, W., Olivier, M., and Greene, A. S. (2005) ZoomQuant: an application for the quantitation of stable isotope labeled peptides. J. Am. Soc. Mass Spectrom. 16, 302 –306[CrossRef][Medline]

  18. Eng, J. K., McCormack, A. L., and Yates, J. R., III (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976 –989[CrossRef]

  19. Qian, W. J., Monroe, M. E., Liu, T., Jacobs, J. M., Anderson, G. A., Shen, Y., Moore, R. J., Anderson, D. J., Zhang, R., Calvano, S. E., Lowry, S. F., Xiao, W., Moldawer, L. L., Davis, R. W., Tompkins, R. G., Camp, D. G., II, and Smith, R. D. (2005) Quantitative proteome analysis of human plasma following in vivo lipopolysaccharide administration using 16O/18O labeling and the accurate mass and time tag approach. Mol. Cell. Proteomics 4, 700 –709[Abstract/Free Full Text]

  20. Horn, D. M., Zubarev, R. A., and McLafferty, F. W. (2000) Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J. Am. Soc. Mass Spectrom. 11, 320 –332[CrossRef][Medline]

  21. Mirgorodskaya, O. A., Kozmin, Y. P., Titov, M. I., Korner, R., Sonksen, C. P., and Roepstorff, P. (2000) Quantitation of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry using 18O-labeled internal standards. Rapid Commun. Mass Spectrom. 14, 1226 –1232[CrossRef][Medline]

  22. Rockwood, A. L., and Van Orden, S. L. (1996) Ultrahigh-speed calculation of isotope distributions. Anal. Chem. 68, 2027 –2030[CrossRef]

  23. Eckel-Passow, J. E., Oberg, A. L., Therneau, T. M., Mason, C. J., Mahoney, D. W., Johnson, K. L., Olson, J. E., and Bergen, H. R., III (2006) Regression analysis for comparing protein samples with 16O/18O stable-isotope labeled mass spectrometry. Bioinformatics 22, 2739 –2745[Abstract/Free Full Text]

  24. Mantel, N. (1969) Restricted least-squares regression and convex quadratic programming. Technometrics 11, 763 –773[CrossRef]

  25. Lawson, C. L., and Hanson, R. J. (1974). Solving Least Square Problems , Prentice Hall, Englewood Cliffs, NJ

  26. Johnson, K. L., Mason, C. J., Muddiman, D. C., and Eckel, J. E. (2004) Analysis of the low molecular weight fraction of serum by LC-dual ESI-FT-ICR mass spectrometry: precision of retention time, mass, and ion abundance. Anal. Chem. 76, 5097 –5103[Medline]

  27. Mason, C. J., Johnson, K. L., and Muddiman, D. C. (2005) Reproducibility of retention time using a splitless nanoLC coupled to an ESI-FTICR mass spectrometer. J Biomol. Tech . 16, 414 –422[Medline]

  28. Liu, P., and Regnier, F. E. (2002) An isotope coding strategy for proteomics involving both amine and carboxyl group labeling. J. Proteome Res. 1, 443 –450[CrossRef][Medline]

  29. Reynolds, K. J., Yao, X., and Fenselau, C. (2002) Proteolytic 18O labeling for comparative proteomics: evaluation of endoprotease Glu-C as the catalytic agent. J. Proteome Res. 1, 27 –33[CrossRef][Medline]

  30. Senko, M. W., Beu, S. C., and McLafferty, F. W. (1995) Automated assignment of charge states from resolved isotopic peaks for multiply charged ions. J. Am. Soc. Mass Spectrom. 6, 52 –56


Add to CiteULike CiteULike   Add to Complore Complore   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us   Add to Digg Digg   Add to Reddit Reddit   Add to Technorati Technorati    What's this?


This article has been cited by other articles:


Home page
Brief Funct Genomic ProteomicHome page
X. Ye, B. Luke, T. Andresson, and J. Blonder
18O Stable Isotope Labeling in MS-based Proteomics
Brief Funct Genomic Proteomic, March 1, 2009; 8(2): 136 - 144.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Full Text (PDF)
Right arrow Supplemental Data
Right arrow All Versions of this Article:
M600148-MCP200v1
6/2/305    most recent
Right arrow Submit a response
Right arrow Alert me when this article is cited
Right arrow Alert me when eLetters are posted
Right arrow Alert me if a correction is posted
Right arrow Citation Map
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrowRequest Permissions
Right arrow Glossary
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Mason, C. J.
Right arrow Articles by Bergen, H. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Mason, C. J.
Right arrow Articles by Bergen, H. R., III
Social Bookmarking
 Add to CiteULike   Add to Complore   Add to Connotea   Add to Del.icio.us   Add to Digg   Add to Reddit   Add to Technorati  
What's this?


HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 All ASBMB Journals   Journal of Biological Chemistry 
 Journal of Lipid Research   ASBMB Today 
Advertisement
spacer
Advertisement
Advertisement