Abstract
MALDI-TOF MS has been applied by several groups to relative quantitative measurements. At the same time, the nonquantitative character of this method has been widely reported. We conducted experiments to test the reliability of this technique for quantitation using the statistical method of inverse confidence limit calculation for the first time in this context. The relationship between relative intensities of known amounts of standard peptides and their concentration ratios was investigated. We found that the concentration ratios determined by relative intensity measurements were highly inaccurate and strongly influenced by the molecular milieu of the sample analyzed. Thus, we emphasize the necessity of using the sample itself for calibration. We also performed experiments using an isotope-labeled derivative of the analyte as an internal standard for calibration line generation. As expected, the use of such a standard led to a dramatic increase in precision and a less pronounced improvement in accuracy. We recommend performing a similar statistical analysis as a demonstration of reliability for every system where MALDI-TOF MS is used for quantitative measurements.
MALDI-TOF MS is a widely used analytical technique because of its excellent sensitivity, relatively high speed, and simplicity. As a consequence of producing mostly singly charged ions (1, 2) MALDI is especially suitable for mixture analysis. It is commonly used in proteomic studies (3) frequently without any prefractionation (4) as well as in the analysis of other biomolecules. Lately MALDI-TOF MS has become a popular method for quantitative analysis of biomolecules (oligonucleotides, proteins, glycoproteins, etc.) originating from various sample types (5–22) or even for imaging tissue sections (13–18, 23–26). At the same time, several studies discuss the nonquantitative nature of MALDI-TOF MS (7, 8, 13, 19, 22, 26–30). The main problem of MALDI is its poor reproducibility (sample to sample and shot to shot), which makes most quantitative measurements quite difficult at best (19, 27, 31). This is especially true for complex samples. The presence of so-called hot spots or sweet spots is the main cause of the poor reproducibility and is therefore a factor strongly increasing measurement times and complicating automated measurements. Furthermore this phenomenon is one of the major factors hampering the use of MALDI MS for spatially resolved imaging (27). The reasons for sweet spot formation are not completely understood. It could be caused by variation in the amount and the crystallization of the matrix, the latter being a function of local analyte concentration. Dependence of signal intensities on the orientation of the matrix-analyte crystals relative to the spectrometer axis is also possible (27). The extent of hot spot formation depends strongly on the matrix used. For example, preparations with α-cyano-4-hydroxycinnamic acid (CHCA)^{1} deliver relatively homogenous samples (27) (which is why CHCA was used as the matrix in our experiments and in most other quantitative studies).
On the other hand, dramatically nonlinear relationships of relative signal intensities and concentrations have been described despite the preparation of homogenous samples (32, 33). The most common attempt to overcome the problem of poor reproducibility is the use of internal standards (5, 6, 8, 10, 11, 19–22, 34). Using an isotopically labeled analogue of the analyte as the internal standard seems to be the best solution (6, 8). A variety of stable isotope labeling methods have recently been developed for relative quantitation, including isotope-coded affinity tags (ICAT), isobaric tags for relative and absolute quantitation (iTRAQ), stable isotope labeling with amino acids in cell culture (SILAC) (35–39), ^{18}O (40–42), and absolute quantitation of abundance (AQUA) (43). iTRAQ labeling is unique in this list because the differently labeled components yield the same nominal masses, and thus the quantitation can be accomplished only from MS/MS data.
Another phenomenon hindering the use of MALDI for quantitative measurements is the suppression effect, which has also been discussed in a number of publications (19, 29, 30, 32, 33, 44). Suppression can especially distort the data when complex biological samples are analyzed that contain thousands of components in a wide concentration range. Other disadvantages of MALDI are the differential ionization efficiencies of different compounds of even similar chemical nature and the limited dynamic range due to saturation of the mass spectrometer detector (7, 8, 21, 22). Fluctuation of laser power also contributes to the variable signal intensities of analytes. Finally peaks of compounds with a similar mass can overlap.
Recently various factors have been proposed to account for the variability of detected signals (29, 44–49), such as the hydrophobicity and basicity of the peptides, the applied matrix, the amount of various components, or the complexity of the mixture. Several publications claim that peptides that are more hydrophobic or more basic are more detectable (44, 50). Promising results were reported when signal reproducibility was improved by using a matrix-comatrix system (21, 48, 51) or using a “sandwich” (52) or other improved sample preparation methodology (1, 53–55). A single component matrix and solvent-free sample separations (28, 56) as well as ionic liquid matrices, also called room temperature ionic liquids (6, 7, 12, 27, 57, 58), were also used to improve the detectability of the analytes and to acquire calibration curves with good linearity and accuracy for quantitation. Because current MALDI “theory” and “models” are still crude and far from having predictive power (59), a number of studies of the MALDI mechanism have been published with the goal of widening the understanding and improving the reproducibility and application of MALDI (2, 29, 45, 46, 59–66). The most current representation is the energy transfer-induced disproportionation model that considers that a pseudoproton transfer occurs during the crystallization process in MALDI sample preparation (65). Other publications claim that the final, detected mass spectrum is in many cases predominantly determined by secondary reactions that occur in the MALDI plume (29, 47, 59).
In this study, we used statistical procedures for the quantitative analysis of data sets acquired by MALDI-TOF MS using the most frequently applied matrix and sample preparation method, CHCA and dried droplets. Relative quantitation based on these measurements proved to be of low accuracy, but we identified possibilities to improve these measures and introduce a novel way to address the reliability (or the lack of it) of the calculated results.
EXPERIMENTAL PROCEDURES
Materials—
The CHCA solution was purchased from Agilent Technologies (Santa Clara, CA). Angiotensin II (MH^{+} = 1046.54 Da), α-melanocyte-stimulating hormone (MH^{+} = 1664.80), adrenocorticotropic hormone (18–39) (MH^{+} = 2465.20), and bovine insulin β-chain (MH^{+} = 3494.7), used for instrument calibration, were obtained from Sigma-Aldrich. BSA tryptic digest and fetuin tryptic digest were prepared in-house; synthetic peptides RTAYSQSAY (Arg-peptide, MH^{+} at m/z 1046.49) and LTAYSQSAY (Leu-peptide, MH^{+} at m/z 1003.47) were kind gifts from C. Turck. Cytochrome c peptide, TGPNLHGLFGR, MH^{+} at m/z 1168.62 (P peptide) and its stable isotope-labeled counterpart, TGPNL*HGLFGR, MH^{+} at m/z 1175.64 (isotope composition: Leu*^{13}C_{6}, 98%; ^{15}N, 98%) (P* peptide) were obtained from Thermo (AQUA Demo kit, catalogue number 300100/lot number 0703_05). Concentrations of the stock solutions of the synthetic peptides were determined by amino acid analysis.
Mass Spectrometry—
A Reflex III MALDI-TOF mass spectrometer (Bruker Daltonics, Karlsruhe, Germany) equipped with a SCOUT 384 probe ion source and pulsed nitrogen laser (337 nm) was used in our experiments. Spectra were acquired with delayed extraction in reflectron, positive ion mode. In all experiments the matrix deflection high voltage (“matrix suppressor”) was set at 800 Da, and the upper mass limit considered was m/z 4000. The laser was operated slightly above the threshold level. Manual measurements were performed. 1000 laser shots were summed for each spectrum, and the spots were randomly but evenly sampled. The quality of the spectra was controlled during the measurements. Saturated spectra and spectra with base peak absolute intensity <1200 were discarded. Sodium adducts were taken into consideration in all displayed data except where stated otherwise.
Data Analysis—
Data processing was done using Bruker Daltonics FlexAnalysis Version 2.0 (Build 21) software. The Sophisticated Numerical Annotation Procedure (SNAP) peak detection algorithm was applied for processing of the spectrum; this included internal baseline correction and noise determination (signal to noise threshold, 3; quality factor threshold, 30). The relative signal intensities (I_{rel}) were also determined by the FlexAnalysis software.
Plotting of graphs, calculation of coefficients of determination (R^{2}), and F and t tests were done using Microsoft Excel. The concentration ratio (X′) was calculated from relative signal intensity ratios (y′) with Equation 1 (Equation 26 of Ref. 67). The 95% confidence limits of this value (the inverse confidence limits) were calculated with Equation 2 (Equation 25 of Ref. 67). In these equations, x is the concentration ratio; y is the relative intensity ratio; ȳ is the mean of the relative intensity ratios of the original n measurements; x̄ is the mean of the concentration ratios of the original n measurements; ȳ′ is the mean of the relative intensity ratios of the additional n′ measurements corresponding to the unknown concentration ratio (X′); Q_{y} = Σ(y − ȳ)^{2}, Q_{x} = Σ(x − x̄)^{2}, and Q_{xy} = Σ(y − ȳ)(x − x̄) are the sums of squares and products of the original relative intensity ratio (y) and concentration ratio (x) values; b is the regression coefficient, b = Q_{xy}/Q_{x}; s_{yx} is the standard deviation of the original relative intensity ratios (y) about the regression and is based on n − 2 degrees of freedom, s_{yx} = √[(Q_{y} − Q_{xy}^{2}/Q_{x})/(n − 2)]; t_{0.05} = two sided 5% significance level of t for n − 2 degrees of freedom; and C = b^{2} − t_{0.05}^{2}·s_{yx}·B.
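The inverse calculation can be illustrated with a short Python sketch. It assumes the standard textbook form of inverse prediction for a fitted line, X′ = x̄ + (ȳ′ − ȳ)/b, with limits x̄ + [b(ȳ′ − ȳ) ± t·s_{yx}·√(C(1/n′ + 1/n) + (ȳ′ − ȳ)^{2}/Q_{x})]/C and C = b^{2} − t^{2}·s_{yx}^{2}/Q_{x}. Whether this agrees term by term with Equations 25 and 26 of Ref. 67 is an assumption, the function names are hypothetical, and the critical t value is supplied by the caller from a t table.

```python
import math


def fit_calibration(x, y):
    """Least-squares line of intensity ratio (y) on concentration ratio (x)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    q_x = sum((xi - x_bar) ** 2 for xi in x)
    q_y = sum((yi - y_bar) ** 2 for yi in y)
    q_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = q_xy / q_x
    # Residual standard deviation about the line, n - 2 degrees of freedom.
    s_yx = math.sqrt(max(q_y - b * q_xy, 0.0) / (n - 2))
    return {"n": n, "x_bar": x_bar, "y_bar": y_bar, "q_x": q_x, "b": b, "s_yx": s_yx}


def inverse_limits(fit, y_new, t_crit):
    """Return (X', lower limit, upper limit) from n' new intensity ratios.

    t_crit is the two-sided critical t for n - 2 degrees of freedom,
    looked up by the caller (for example 2.306 for 8 df at the 5% level).
    """
    n_new = len(y_new)
    d = sum(y_new) / n_new - fit["y_bar"]          # y-bar' minus y-bar
    b, s_yx, q_x = fit["b"], fit["s_yx"], fit["q_x"]
    x_hat = fit["x_bar"] + d / b                   # point estimate of X'
    c = b ** 2 - t_crit ** 2 * s_yx ** 2 / q_x     # assumed form of C
    if c <= 0:
        raise ValueError("slope not significant at this level; limits are undefined")
    half = t_crit * s_yx * math.sqrt(c * (1 / n_new + 1 / fit["n"]) + d ** 2 / q_x)
    centre = fit["x_bar"] + b * d / c
    return x_hat, centre - half / c, centre + half / c
```

Note that the limits are not symmetric about X′: the interval is centred on x̄ + b(ȳ′ − ȳ)/C, which moves away from x̄ as the slope becomes less well determined.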
Sample Preparation—
Fetuin tryptic digest was mixed with known amounts of synthetic peptides or BSA tryptic digest or cytochrome c peptides (Table I). Low absorbance plastic vials (Costar, Corning, NY) were used to minimize adsorption of the peptides. A mixture of 1 μl of sample solution and 0.5 μl of the matrix was loaded on the target. To assess reproducibility each sample was prepared in duplicate, and each sample was measured five times, yielding 10 measurements for each data point. Freshly grown Escherichia coli cell suspension was pelleted and resuspended in water three times. Randomly selected spectra are shown in supplemental Fig. S19 for the demonstration of the background and the signal to noise ratio.
RESULTS AND DISCUSSION
To investigate the reliability and reproducibility of MALDI-TOF MS-based quantitation, we studied complex mixtures of known composition and subjected the data to statistical procedures that so far have not been used in this context.
Studying the Linearity of a Series of Single Measurements—
Sample series (Table I) were measured under carefully controlled conditions. The observed relative intensity ratios of selected components were plotted against their concentration ratios. Supplemental Fig. S1 shows an example of mixing the tryptic digests of two proteins; Fig. 1 displays the mixture of two synthetic peptides and a tryptic digest. Fetuin and BSA tryptic peptides chosen for the analyses and for normalization are listed in supplemental Tables S1 and S2, respectively.
These data show that relative intensity-based quantitation strongly depends on the peptide of interest and on the reference molecule and is highly inaccurate. For example, although the Arg-peptide concentration was constant in one of the experiments (Fig. 1a) its relative intensity significantly increased when the N-terminal fetuin peptide (m/z 1072.60) was selected as reference.
Investigating the Reproducibility of Quantitative Measurements—
Further complex samples were examined. Regression lines were fitted to multiple data points, and confidence limits were displayed as a measure of variation and reproducibility. Twenty-six such graphs were generated (Fig. 2a and supplemental Figs. S2 through S5) by using various peptides of the fetuin tryptic digest (supplemental Table S1) for normalization.
The significance of the regression and its linearity was calculated using F_{(1,81)} and F_{(7,81)} tests, respectively. Afterward we calculated the significance of the slope using a t_{(88)} test. Finally the R^{2} was determined. In all cases, the regression was highly significant (p < 0.001), whereas the divergence from linearity was not significant (p > 0.05), meaning the regression was linear. The slopes of the lines for the Leu-peptide were quite low (0.0099–0.1117), but the slopes for the Arg-peptide were higher (0.36–2.94). Nevertheless the slope of every line analyzed was significantly different from 0 (p < 0.001). The R^{2} value ranged from 0.2241 to 0.868 (supplemental Table S3).
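The three tests named here can be sketched as follows. This is a minimal Python illustration, assuming that both F ratios use the within-group (replicate) mean square as the denominator, which is consistent with the degrees of freedom quoted (nine concentration levels with ten measurements each give 1 and 81, 7 and 81, and 88). The function name and the returned keys are hypothetical, and the resulting statistics still have to be compared with tabulated critical values.

```python
import math


def regression_anova(groups):
    """groups maps each concentration ratio x to its replicate intensity ratios."""
    xs = [x for x, reps in groups.items() for _ in reps]
    ys = [y for reps in groups.values() for y in reps]
    n, k = len(xs), len(groups)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    q_x = sum((x - x_bar) ** 2 for x in xs)
    q_y = sum((y - y_bar) ** 2 for y in ys)
    q_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = q_xy / q_x

    ss_reg = b * q_xy                      # explained by the line, 1 df
    ss_res = q_y - ss_reg                  # about the line, n - 2 df
    ss_within = sum(                       # replicate scatter, n - k df
        sum((y - sum(reps) / len(reps)) ** 2 for y in reps)
        for reps in groups.values()
    )
    ss_lof = max(ss_res - ss_within, 0.0)  # deviation from linearity, k - 2 df

    ms_within = ss_within / (n - k)
    s_b = math.sqrt(ss_res / (n - 2) / q_x)  # standard error of the slope
    return {
        "slope": b,
        "f_regression": ss_reg / ms_within,
        "df_regression": (1, n - k),
        "f_linearity": (ss_lof / (k - 2)) / ms_within,
        "df_linearity": (k - 2, n - k),
        "t_slope": b / s_b,
        "df_slope": n - 2,
    }
```

A significant `f_regression` together with a nonsignificant `f_linearity` corresponds to the situation reported in the text: the line explains a real trend, and the level means do not deviate from it by more than the replicate scatter allows.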
The observed 95% confidence limits were relatively wide along with a low R^{2} value. This demonstrates the large variation and low reproducibility of the data despite the significance of the regression.
Examining the Quality of Calibration Lines Used for Quantitative Measurements—
For MALDI-TOF MS-based quantitation, most groups generate calibration lines considering only the mean values of several relative intensity ratio measurements (6–10, 12). Many groups emphasize the precision of their calibration curves by referring to the high R^{2} values calculated in this manner (6, 8–10, 12). Indeed following this path we also obtained much higher R^{2} values (Fig. 2, b versus a, and supplemental Figs. S6 to S9 versus S2 to S5) ranging from 0.845 to 0.9885 (supplemental Table S3). However, this approach leads to an erroneous overestimation of precision by ignoring the large variation of relative intensities. In our opinion, the confidence limits of the calculated values are far more informative about the reliability of the measurements. In our case, this means calculating the 95% confidence limits of the concentration ratio corresponding to the measured relative intensity ratio values. From here on, we refer to such limits as inverse confidence limits. The uncertainty of the calculated concentration ratio values derives from the uncertainty of the regression line and the variation of the measured relative intensity ratio values around their predicted value. The inverse type of confidence limit calculation displayed above (Equation 2) takes both sources into account (67). It differs from, and is slightly more tedious than, calculating the confidence limits of the “y” (relative intensity ratio) values shown in Fig. 2a.
Displaying the inverse 95% confidence limits allows the investigator to rate the trustworthiness of the calculated value for each individual estimate of concentration ratio. We thus determined the inverse 95% confidence limits of the Arg-peptide/fetuin concentration ratios calculated from 10 relative intensity ratio measurements in each case (Fig. 3 and supplemental Fig. S10). Apart from one result (1.03 versus 1), the calculated ratio fell far from the actual value. Also the wide confidence limits observed in every case would most probably prohibit the use of these results for further calculations. Altogether neither the accuracy nor the reproducibility of the calculated concentration ratios was found acceptable in the system examined.
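The inflation of R^{2} by averaging can be reproduced with a small Python sketch. The numbers are invented for illustration only and are not taken from the measurements reported here; `r_squared` is a hypothetical helper that computes the coefficient of determination of a straight-line fit.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    q_x = sum((xi - x_bar) ** 2 for xi in x)
    q_y = sum((yi - y_bar) ** 2 for yi in y)
    q_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return q_xy ** 2 / (q_x * q_y)


# Invented example: five concentration ratios, four replicates each.
levels = [1.0, 2.0, 3.0, 4.0, 5.0]
level_means = [1.2, 1.9, 3.1, 3.9, 5.1]
offsets = [-1.5, -0.5, 0.5, 1.5]          # replicate scatter around each mean

x_all = [x for x in levels for _ in offsets]
y_all = [m + off for m in level_means for off in offsets]
y_means = [sum(y_all[i * 4:(i + 1) * 4]) / 4 for i in range(len(levels))]

r2_all = r_squared(x_all, y_all)          # every single measurement
r2_means = r_squared(levels, y_means)     # only the per-level means

print(f"R^2 from individual points: {r2_all:.3f}")
print(f"R^2 from level means:       {r2_means:.3f}")
```

Both fits have nearly the same slope, yet the fit through the means reports a much higher R^{2} because the replicate scatter has been averaged away before the statistic is computed.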
Examining the Correlation between the Relative Signal Intensities of Two Peptides of Constant Concentrations within a Mixture—
To test the assumption that the signal of the analyte is proportional to that of the standard, we plotted the relative signal intensities of two peptides against each other with both peptide concentrations kept constant (supplemental Fig. S11). The R value (coefficient of correlation) was not significant in six of the nine cases examined, indicating that the factors affecting the signal intensities of the peptides do not act on them to an equal extent.
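A correlation check of this kind can be sketched in Python. The sketch assumes the usual significance test for a correlation coefficient, t = r·√((n − 2)/(1 − r^{2})) with n − 2 degrees of freedom; the text does not state which test was applied, so this choice is an assumption. The function name is hypothetical and the critical t value is passed in by the caller.

```python
import math


def correlation_test(a, b, t_crit):
    """Pearson r between two intensity series and its t statistic.

    Returns (r, t, significant), where significant means |t| > t_crit
    for n - 2 degrees of freedom.
    """
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    q_a = sum((ai - a_bar) ** 2 for ai in a)
    q_b = sum((bi - b_bar) ** 2 for bi in b)
    q_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    r = q_ab / math.sqrt(q_a * q_b)
    if abs(r) >= 1.0:
        # Perfect correlation: the t statistic is unbounded.
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t, abs(t) > t_crit
```

With ten spectra per sample, `t_crit` would be 2.306 (two-sided 5% level, 8 degrees of freedom). A nonsignificant result means that the two signals do not rise and fall together from spectrum to spectrum, so dividing one by the other cannot cancel the shot-to-shot variation.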
This means that in our system the use of a standard did not fulfill its purpose and marks one of the basic reasons leading to the low reproducibility of data. In the cases examined, the two peptides were of similar molecular mass (1385.56 versus 1046.48 Da) but of different composition. Several studies reported the use of structural analogues (8, 22), derivatives (8, 10, 11, 19), functional analogues (9), molecules of similar hydrophobicity (12), or isotopically labeled derivatives (5, 6, 8, 11) of the analyte as internal standards. Some of these have indeed published calibration lines with quite small mean relative standard deviations (6, 12). However, direct statistical comparison of the use of different internal standards was still needed to provide evidence for the advantage of using structural analogues, derivatives, or isotopically labeled molecules for such purposes (see below).
Comparing the Slopes of Regression Lines Obtained for the Same Peptide Pair under Different Conditions—
Several studies use external calibration to calculate unknown analyte concentrations. However, the molecular composition of the calibration mixtures is different from that of the samples analyzed. To assess the effect of the molecular milieu on the slope of the calibration line, slopes of standard lines obtained in the presence or absence of E. coli cells were compared. Table II displays the slope of the relative intensity ratio plotted versus the concentration ratio of the Arg-peptide with different fetuin peptides used as standards (lines “a”). Below each are the data of the same calibration line acquired in the presence of E. coli cells (lines “b”). Their pairwise comparison was performed with a t test, the result of which is indicated as t_{ab} values in the last column. Similarly calibration lines were made for the Arg-peptide using various fetuin peptides as standards but with a simultaneously increasing concentration of Leu-peptide within the mixture (lines “c”). The slopes of these lines were also compared with those obtained without Leu-peptide using t tests, marked with t_{ac}.
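A pairwise slope comparison can be sketched as follows. The text does not give the formula used for t_{ab} and t_{ac}, so this Python sketch assumes the common form based on a pooled residual variance, t = (b_{1} − b_{2})/√(s_{p}^{2}·(1/Q_{x1} + 1/Q_{x2})) with n_{1} + n_{2} − 4 degrees of freedom. The function names are hypothetical.

```python
import math


def _line(x, y):
    """Slope, sum of squares of x, and residual sum of squares of a fitted line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    q_x = sum((xi - x_bar) ** 2 for xi in x)
    q_y = sum((yi - y_bar) ** 2 for yi in y)
    q_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = q_xy / q_x
    return b, q_x, max(q_y - b * q_xy, 0.0)


def compare_slopes(x1, y1, x2, y2):
    """t statistic and degrees of freedom for the difference of two slopes."""
    b1, q1, ss1 = _line(x1, y1)
    b2, q2, ss2 = _line(x2, y2)
    df = len(x1) + len(x2) - 4
    s2_pooled = (ss1 + ss2) / df                    # pooled residual variance
    se_diff = math.sqrt(s2_pooled * (1 / q1 + 1 / q2))
    return (b1 - b2) / se_diff, df
```

Here `x1, y1` would hold the calibration series of one condition (for example without E. coli cells) and `x2, y2` the series of the other; the returned t is compared with the tabulated critical value for the returned degrees of freedom.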
In all cases, the change of the molecular milieu resulted in a highly significant alteration of the slopes of the calibration lines, indicating the error of using an external calibration for concentration calculations. This stands in good agreement with the observations of Tholey et al. (7). To circumvent this problem, several groups use the sample itself for calibration after adding an “internal standard” molecule. However, these studies neglect the endogenous presence of the analyte, leading to a decrease in accuracy of the calibration line. The obvious solution is to use two isotopically labeled versions of the molecule of interest for the calibration process, one as the variable component of calibration and another as the internal standard. However, this approach may not provide a solution for protein quantitation because such large molecules are better detected in linear mode where the isotopically labeled molecules may not be effectively resolved. It is also expensive and may not be practical or feasible in many cases.
Statistical Analysis of Quantitative Measurements Using Stable Isotope-labeled Peptides—
To numerically express the advantage of using a stable isotope-labeled derivative of the analyte for standardization, one of the tryptic peptides of cytochrome c (P peptide) and its stable isotope-labeled derivative (P* peptide) were used in the following experiments. Two series of samples (Table I) were measured, and the same statistical analysis was performed on the data as above.
Again the regression between the concentration ratio and the relative intensity ratio of the two peptides was highly significant and linear with the slope being significantly different from 0 (supplemental Table S4). Yet the R^{2} (0.7842 and 0.9247 without or with E. coli, respectively) (Fig. 4a and supplemental Fig. S12a) and the 95% confidence range (46.77–47.84 and 10.32–10.55 wide, respectively) (Fig. 4a and supplemental Fig. S12a) displayed only a slight improvement compared with the nonisotope measurements (Fig. 4a and supplemental Fig. S12a versus Fig. 2a and supplemental Figs. S2 to S5). However, taking only the relative concentration range of 0.5–5 into account dramatically narrowed the confidence belt (4.01–4.09 and 1.37–1.40 wide, respectively) (Fig. 5 and supplemental Fig. S13a) and increased the R^{2} value (0.961 and 0.9871, respectively). The low variation (narrow confidence belt) can be explained by the high correlation between the signal intensities of the two peptides within this concentration ratio range (e.g. supplemental Figs. S14, a–g, and S15, a–g), indicating that the factors influencing their detectability act on them to an equal extent. Correlation decreases above the concentration ratio of 5 (supplemental Figs. S14, h and i, and S15, h and i) and the consequently increasing variation of the relative intensity ratios and their nonlinear relation to the concentration ratios (Fig. 4b and supplemental Figs. S16 and S17) explain the deterioration of the R^{2} value and the wide confidence range. Interestingly the increase in variation was less profound when E. coli cells were included in the peptide mixture (supplemental Fig. S17). This was probably the result of the apparently more uniform crystallization of the samples that contained E. coli cells. 
Nevertheless the presence of the bacterial cells significantly altered the slope of the calibration line (supplemental Table S5), again emphasizing the need of using the sample itself for calibration.
Finally multiple series of 10 measurements were made with peptide mixtures of known concentration, and the concentration ratios and their 95% inverse confidence limits were calculated from the measured relative intensity values using Equations 1 and 2, respectively. Calibration lines fitted to the concentration ratio range of 0.5–5 were used. The calculated concentration ratios fell closer to the actual values than they did when calibration was made using unrelated internal standards (Fig. 6, A–D, and supplemental Fig. S18, A and B, versus Fig. 3 and supplemental Fig. S10). Furthermore the 95% inverse confidence ranges were significantly narrower than before (Fig. 6, A–D, and supplemental Fig. S18, A and B, versus Fig. 3 and supplemental Fig. S10), indicating the increased reliability of the results calculated this way. However, accuracy and precision (reproducibility) of the calculated values deteriorated when the calibration line was fitted to a wider range of concentration ratios (Fig. 6, E–H, and supplemental Fig. S18, C and D). This indicates that in our system the difference in the amount of the analyte and the standard must not be greater than 5-fold.
Conclusions—
Based on the above observations, thorough statistical analysis should be performed whenever MALDI-TOF MS data are used for quantitation. We recommend the calculation and publishing of the 95% inverse confidence limits of the estimated concentration ratios. Should these confidence limits be too far apart, one might improve the regression by increasing the number of measurements (n) used for calibration. Averaging of many laser shots over a large sample area is also clearly desirable. Another way to narrow confidence limits is to increase the number of measurements (n′) corresponding to the unknown concentration ratio (X′). This means that if the inhomogeneity of the sample is being analyzed (e.g. in the case of imaging tissue sections) a large number of measurements must be averaged in each coordinate. Finally because mixtures can exhibit complex interferences that are not completely understood, quantitation should be performed ideally within the sample with isotopically labeled versions of the molecule(s) of interest. The use of isotope-labeled compounds as standards for calibration can lead to a dramatic increase in precision and a less pronounced improvement in accuracy. However, our results indicate that the linearity can be maintained only within an ∼5-fold concentration difference. Thus, preliminary measurements are needed to approximate the concentration range of the compounds investigated prior to generation of the calibration line used for exact quantitation.
Acknowledgments
We thank Csaba Vizler for useful discussions and Bioscience Inc for providing a Thermo Inc. AQUA Demo kit.
Footnotes

Published, MCP Papers in Press, July 24, 2008, DOI 10.1074/mcp.M800108-MCP200

↵ ^{1} The abbreviations used are: CHCA, α-cyano-4-hydroxycinnamic acid; I_{rel}, relative signal intensities; R^{2}, coefficient of determination; cfu, colony-forming units.

↵* This work was supported by a fellowship from the G. Richter Centenarium Foundation (to E. S.) and by a grant from the Hungarian National Office for Research and Technology (to K. F. M.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

↵S The online version of this article (available at http://www.mcponline.org) contains supplemental material.
 Received March 13, 2008.
 Revision received July 7, 2008.
 © 2008 The American Society for Biochemistry and Molecular Biology