Investigating the Quantitative Nature of MALDI-TOF MS

MALDI-TOF MS has been applied by several groups to relative quantitative measurements. At the same time, the non-quantitative character of this method has been widely reported. We conducted experiments to test the reliability of this technique for quantitation using the statistical method of inverse confidence limit calculation for the first time in this context. The relationship between relative intensities of known amounts of standard peptides and their concentration ratios was investigated. We found that the concentration ratios determined by relative intensity measurements were highly inaccurate and strongly influenced by the molecular milieu of the sample analyzed. Thus, we emphasize the necessity of using the sample itself for calibration. We also performed experiments using an isotope-labeled derivative of the analyte as an internal standard for calibration line generation. As expected, the use of such standard led to a dramatic increase in precision and a less pronounced improvement in accuracy. We recommend performing a similar statistical analysis as a demonstration of reliability for every system where MALDI-TOF MS is used for quantitative measurements.

Molecular & Cellular Proteomics 7: 2410-2418, 2008.
MALDI-TOF MS is a widely used analytical technique because of its excellent sensitivity, relatively high speed, and simplicity. Because it produces mostly singly charged ions (1, 2), MALDI is especially suitable for mixture analysis. It is commonly used in proteomic studies (3), frequently without any prefractionation (4), as well as in the analysis of other biomolecules. Lately MALDI-TOF MS has become a popular method for quantitative analysis of biomolecules (oligonucleotides, proteins, glycoproteins, etc.) originating from various sample types (5-22) or even for imaging tissue sections (13-18, 23-26). At the same time, several studies discuss the non-quantitative nature of MALDI-TOF MS (7, 8, 13, 19, 22, 26-30). The main problem of MALDI is its poor reproducibility (sample to sample and shot to shot), which makes most quantitative measurements quite difficult at best (19, 27, 31). This is especially true for complex samples. The presence of so-called hot spots or sweet spots is the main cause of the poor reproducibility and is therefore a factor strongly increasing measurement times and complicating automated measurements. Furthermore, this phenomenon is one of the major factors hampering the use of MALDI MS for spatially resolved imaging (27). The reasons for sweet spot formation are not completely understood. It could be caused by variation in the amount and the crystallization of the matrix, the latter being a function of local analyte concentration. Dependence of signal intensities on the orientation of the matrix-analyte crystals relative to the spectrometer axis is also possible (27). The extent of hot spot formation depends strongly on the matrix used. For example, preparations with α-cyano-4-hydroxycinnamic acid (CHCA) deliver relatively homogeneous samples (27), which is why CHCA was used as the matrix in our experiments and in most other quantitative studies.
On the other hand, dramatically nonlinear relationships between relative signal intensities and concentrations have been described despite the preparation of homogeneous samples (32, 33). The most common attempt to overcome the problem of poor reproducibility is the use of internal standards (5, 6, 8, 10, 11, 19-22, 34). Using an isotopically labeled analogue of the analyte as the internal standard seems to be the best solution (6, 8). A variety of stable isotope labeling methods have recently been developed for relative quantitation, including isotope-coded affinity tags (ICAT), isobaric tags for relative and absolute quantitation (iTRAQ), stable isotope labeling with amino acids in cell culture (SILAC) (35-39), ¹⁸O (40-42), and absolute quantitation of abundance (AQUA) (43). iTRAQ labeling is unique in this list because the differently labeled components yield the same nominal masses, and thus the quantitation can be accomplished only from MS/MS data.
Another phenomenon hindering the use of MALDI for quantitative measurements is the suppression effect, which has also been discussed in a number of publications (19,29,30,32,33,44). Suppression can especially distort the data when complex biological samples are analyzed that contain thousands of components in a wide concentration range. Other disadvantages of MALDI are the differential ionization efficiencies of different compounds of even similar chemical nature and the limited dynamic range due to saturation of the mass spectrometer detector (7,8,21,22). Fluctuation of laser power also contributes to the variable signal intensities of analytes. Finally peaks of compounds with a similar mass can overlap.
Recently various factors have been proposed to account for the variability of detected signals (29, 44-49), such as the hydrophobicity and basicity of the peptides, the applied matrix, the amount of various components, or the complexity of the mixture. Several publications claim that peptides that are more hydrophobic or more basic are more detectable (44, 50). Promising results were reported when signal reproducibility was improved by using a matrix-comatrix system (21, 48, 51) or using a "sandwich" (52) or other improved sample preparation methodology (1, 53-55). A single component matrix and solvent-free sample separations (28, 56) and ionic liquid matrices, also called room temperature ionic liquids (6, 7, 12, 27, 57, 58), were also used to improve the detectability of the analytes and to acquire calibration curves with good linearity and accuracy for quantitation. Because current MALDI "theory" and "models" are still crude and far from having predictive power (59), a number of studies of the MALDI mechanism have been published with the goal of widening the understanding and improving the reproducibility and application of MALDI (2, 29, 45, 46, 59-66). The most current representation is the energy transfer-induced disproportionation model, which considers that a pseudo-proton transfer occurs during the crystallization process in MALDI sample preparation (65). Other publications claim that the final, detected mass spectrum is in many cases predominantly determined by secondary reactions that occur in the MALDI plume (29, 47, 59).
In this study, we used statistical procedures for the quantitative analysis of data sets acquired by MALDI-TOF MS using the most frequently applied matrix and sample preparation method, CHCA and dried droplets. Relative quantitation based on these measurements proved to be of low accuracy, but we identified possibilities to improve these measures and introduce a novel way to address the reliability (or the lack of it) of the calculated results.
Mass Spectrometry-A Reflex III MALDI-TOF mass spectrometer (Bruker Daltonics, Karlsruhe, Germany) equipped with a SCOUT 384 probe ion source and pulsed nitrogen laser (337 nm) was used in our experiments. Spectra were acquired with delayed extraction in reflectron, positive ion mode. In all experiments the matrix deflection high voltage ("matrix suppressor") was set at 800 Da, and the upper mass limit considered was m/z 4000. The laser was operated slightly above the threshold level. Manual measurements were performed. 1000 laser shots were summed for each spectrum, and the spots were randomly but evenly sampled. The quality of the spectra was controlled during the measurements. Saturated spectra and spectra with base peak absolute intensity <1200 were discarded. Sodium adducts were taken into consideration in all displayed data except where stated otherwise.
Data Analysis-Data processing was done using Bruker Daltonics FlexAnalysis Version 2.0 (Build 21) software. The Sophisticated Numerical Annotation Procedure (SNAP) peak detection algorithm was applied for processing of the spectrum; this included internal baseline correction and noise determination (signal to noise threshold, 3; quality factor threshold, 30). The relative signal intensities (I_rel) were also determined by the FlexAnalysis software.
Plotting of graphs, calculation of coefficients of determination (R²), and F and t tests were done using Microsoft Excel. The concentration ratio (X′) was calculated from the relative signal intensity ratios (y′) with Equation 1. The 95% confidence limits of this value (the inverse confidence limits) were calculated with Equation 2 (Equation 25 of Ref. 67).
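In the notation defined in the next paragraph, the point estimate and its limits can be written in the form commonly given for inverse prediction from a linear regression. This is a sketch under the assumption that Equations 1 and 2 follow that textbook form. The auxiliary symbol D and the arrangement of terms are assumptions introduced here and may differ from Equation 25 of Ref. 67.

```latex
\begin{align}
X' &= \bar{x} + \frac{\bar{y}' - \bar{y}}{b} \tag{1} \\
X'_{\mathrm{lower,\,upper}} &= \bar{x} + \frac{b\,(\bar{y}' - \bar{y})}{D}
  \pm \frac{t_{0.05}}{D}
  \sqrt{\, s_{yx}^{2} \left[ D \left( \frac{1}{n} + \frac{1}{n'} \right)
  + \frac{(\bar{y}' - \bar{y})^{2}}{Q_x} \right] },
  \qquad D = b^{2} - \frac{t_{0.05}^{2}\, s_{yx}^{2}}{Q_x} \tag{2}
\end{align}
```

In this form the limits are not symmetric about X′, and they exist only when D > 0, that is, when the slope differs significantly from zero.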
where x is the concentration ratio; y is the relative intensity ratio; ȳ is the mean of the relative intensity ratios of the original n measurements; x̄ is the mean of the concentration ratios of the original n measurements; ȳ′ is the mean of the relative intensity ratios of the additional n′ measurements corresponding to the unknown concentration ratio (X′); Q_y = Σ(y − ȳ)², Q_x = Σ(x − x̄)², and Q_xy = Σ(y − ȳ)(x − x̄) are sums of squares and products of the original relative intensity ratio (y) and concentration ratio (x) values; b is the regression coefficient, b = Q_xy/Q_x; s_yx is the standard deviation of the original relative intensity ratios (y) about the regression and is based on n − 2 degrees of freedom, s_yx = √((Q_y − Q_xy²/Q_x)/(n − 2)); t_0.05 is the two-sided 5% significance level of t for n − 2 degrees of freedom; and C = b² − t_0.05²·s_yx·B.
Sample Preparation-Fetuin tryptic digest was mixed with known amounts of synthetic peptides or BSA tryptic digest or cytochrome c peptides (Table I). Low absorbance plastic vials (Costar, Corning, NY) were used to minimize adsorption of the peptides. A mixture of 1 µl of sample solution and 0.5 µl of the matrix was loaded on the target. To assess reproducibility each sample was prepared in duplicate, and each sample was measured five times, yielding 10 measurements for each data point. Freshly grown Escherichia coli cell suspension was pelleted and resuspended in water three times. Randomly selected spectra are shown in supplemental Fig. S-19 to demonstrate the background and the signal to noise ratio.

RESULTS AND DISCUSSION
To investigate the reliability and reproducibility of MALDI-TOF-MS-based quantitation, we studied complex mixtures of known composition and subjected the data to statistical procedures that so far have not been used in this context.
Studying the Linearity of a Series of Single Measurements-Sample series (Table I) were measured under carefully controlled conditions. The observed relative intensity ratios of selected components were plotted against their concentration ratios. Supplemental  Tables S-1 and S-2, respectively.
These data show that relative intensity-based quantitation strongly depends on the peptide of interest and on the reference molecule and is highly inaccurate. For example, although the Arg-peptide concentration was constant in one of the experiments (Fig. 1a) its relative intensity significantly increased when the N-terminal fetuin peptide (m/z 1072.60) was selected as reference.
Investigating the Reproducibility of Quantitative Measurements-Further complex samples were examined. Regression lines were fitted to multiple data points, and confidence limits were displayed as a measure of variation and reproducibility. Twenty-six such graphs were generated (Fig. 2a and supplemental Figs. S-2 through S-5) by using various peptides of the fetuin tryptic digest (supplemental Table S The significance of the regression and its linearity was calculated using F(1,81) and F(7,81) tests, respectively. Afterward we calculated the significance of the slope using a t(88) test. Finally the R² was determined. In all cases, the regression was highly significant (p < 0.001), whereas the divergence from linearity was not significant (p > 0.05), meaning the regression was linear. The slopes of the lines for the Leu-peptide were quite low (0.0099-0.1117), but the slopes for the Arg-peptide were higher (0.36-2.94). Nevertheless the slope of every line analyzed was significantly different from 0 (p < 0.001). The R² value ranged from 0.2241 to 0.868 (supplemental Table S-3).
The observed 95% confidence limits were relatively wide, and the R² values were low. This demonstrates the large variation and low reproducibility of the data despite the significance of the regression.
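The regression and linearity tests described above can be sketched as an analysis of variance in which the residual sum of squares is split into lack of fit and pure error. This is a minimal illustration, not the spreadsheet procedure used in the study. The function name, the synthetic data, and the choice of the pure-error mean square as the denominator of both F ratios are assumptions. The quoted degrees of freedom (1, 7, and 81) are consistent with nine concentration ratios measured about ten times each, which is also an inference.

```python
from collections import defaultdict


def lack_of_fit_anova(x, y):
    """Fit a straight line and split the residual into lack of fit and pure error.

    Requires replicate y values measured at repeated x values.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    q_x = sum((xi - x_mean) ** 2 for xi in x)
    q_y = sum((yi - y_mean) ** 2 for yi in y)
    q_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = q_xy / q_x
    ss_regression = q_xy ** 2 / q_x    # 1 degree of freedom
    ss_residual = q_y - ss_regression  # n - 2 degrees of freedom

    # Pure error: scatter of the replicates around their own group mean.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    ss_pure = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_pure += sum((v - group_mean) ** 2 for v in values)

    df_lack = len(groups) - 2
    df_pure = n - len(groups)
    ss_lack = ss_residual - ss_pure
    ms_pure = ss_pure / df_pure
    return {
        "slope": slope,
        "df_lack": df_lack,
        "df_pure": df_pure,
        "ss_pure": ss_pure,
        "f_regression": ss_regression / ms_pure,
        "f_lack_of_fit": (ss_lack / df_lack) / ms_pure,
    }


# Synthetic data invented for illustration: 9 ratios x 10 replicates.
levels = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2] * 2
x_demo = [lv for lv in levels for _ in offsets]
y_demo = [2.0 * lv + off for lv in levels for off in offsets]
result = lack_of_fit_anova(x_demo, y_demo)
print(result["df_lack"], result["df_pure"])  # prints: 7 81
```

The F ratios would then be compared with tabulated critical values for (1, 81) and (7, 81) degrees of freedom.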
Examining the Quality of Calibration Lines Used for Quantitative Measurements-For MALDI-TOF-MS-based quantitation, most groups generate calibration lines considering only the mean values of several relative intensity ratio measurements (6-10, 12). Many groups emphasize the precision of their calibration curves by referring to the high R² values calculated in this manner (6, 8-10, 12). Indeed, following this path we also obtained much higher R² values (Fig. 2, b versus a, and supplemental Figs. S-6 to S-9 versus S-2 to S-5), ranging from 0.845 to 0.9885 (supplemental Table S-3). However, this approach leads to an erroneous overestimation of precision by ignoring the large variation of relative intensities. In our opinion, the confidence limits of the calculated values are far more informative about the reliability of the measurements. In our case, this means calculating the 95% confidence limits of the concentration ratio corresponding to the measured relative intensity ratio values. From here on, we refer to such limits as inverse confidence limits. The uncertainty of the calculated concentration ratio values derives from the uncertainty of the regression line and the variation of the measured relative intensity ratio values around their predicted value. The inverse type of confidence limit calculation displayed above (Equation 2) takes both sources into account (67). It is different from, and slightly more tedious than, calculating the confidence limits of the "y" (relative intensity ratio) values shown in Fig. 2a.
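The overestimation of precision that results from fitting only the means can be illustrated with a short sketch. The data below are synthetic and invented for the purpose, and the helper names are hypothetical. The example only shows that averaging replicates before fitting removes the within-group scatter from R².

```python
from collections import defaultdict


def r_squared(x, y):
    """Coefficient of determination of a straight-line fit of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    q_x = sum((xi - x_mean) ** 2 for xi in x)
    q_y = sum((yi - y_mean) ** 2 for yi in y)
    q_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return q_xy ** 2 / (q_x * q_y)


def group_means(x, y):
    """Average the replicate y values measured at each distinct x."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    xs = sorted(groups)
    return xs, [sum(groups[k]) / len(groups[k]) for k in xs]


# Synthetic data invented for illustration: large replicate scatter
# plus a small level-specific bias, 6 ratios x 10 replicates.
levels = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0]
bias = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]
offsets = [-3.0, -1.5, 0.0, 1.5, 3.0] * 2
x_all = [lv for lv in levels for _ in offsets]
y_all = [lv + bs + off for lv, bs in zip(levels, bias) for off in offsets]

r2_all = r_squared(x_all, y_all)
x_means, y_means = group_means(x_all, y_all)
r2_means = r_squared(x_means, y_means)
print(f"R2 from all points: {r2_all:.3f}; R2 from means only: {r2_means:.3f}")
```

With equal numbers of replicates at each ratio, the fitted line is the same in both cases. Only the apparent goodness of fit changes.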
Displaying the inverse 95% confidence limits allows the investigator to rate the trustworthiness of the calculated value for each individual estimate of concentration ratio. We thus determined the inverse 95% confidence limits of the Arg-peptide/fetuin concentration ratios calculated from 10 relative intensity ratio measurements in each case (Fig. 3 and supplemental Fig. S-10). Apart from one result (1.03 versus 1), the calculated ratio fell far from the actual value. Also the wide confidence limits observed in every case would most probably prohibit the use of these results for further calculations. Altogether neither the accuracy nor the reproducibility of the calculated concentration ratios was found acceptable in the system examined.
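A calculation of this kind can be sketched as follows. This is a minimal illustration assuming the common textbook form of inverse prediction limits for the mean of n′ new measurements. It is not a verbatim implementation of Equation 25 of Ref. 67. The function name, the auxiliary quantity `d`, and the synthetic data are assumptions. The critical t value is passed in by the caller so that no statistics library is needed.

```python
import math


def inverse_confidence_limits(x, y, y_new, t_crit):
    """Estimate an unknown x from new y readings, with inverse confidence limits.

    x, y    calibration data (n pairs)
    y_new   the n' additional readings for the unknown sample
    t_crit  two-sided critical t value for n - 2 degrees of freedom
    Returns (estimate, lower limit, upper limit).
    """
    n = len(x)
    n_new = len(y_new)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    q_x = sum((xi - x_mean) ** 2 for xi in x)
    q_y = sum((yi - y_mean) ** 2 for yi in y)
    q_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    b = q_xy / q_x                                # regression coefficient
    s2_yx = (q_y - q_xy ** 2 / q_x) / (n - 2)     # residual variance
    delta = sum(y_new) / n_new - y_mean

    estimate = x_mean + delta / b

    # Assumed auxiliary term; it must be positive for finite limits.
    d = b ** 2 - t_crit ** 2 * s2_yx / q_x
    if d <= 0:
        raise ValueError("slope not significantly different from zero")
    centre = x_mean + b * delta / d
    half_width = (t_crit / d) * math.sqrt(
        s2_yx * (d * (1 / n + 1 / n_new) + delta ** 2 / q_x)
    )
    return estimate, centre - half_width, centre + half_width


# Synthetic calibration data invented for illustration.
levels = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2] * 2
x_cal = [lv for lv in levels for _ in offsets]
y_cal = [2.0 * lv + off for lv in levels for off in offsets]
y_unknown = [2.0 + off for off in offsets]   # ten readings, true ratio 1

T_CRIT = 1.987  # approximate two-sided 5% value of t for 88 degrees of freedom
estimate, lower, upper = inverse_confidence_limits(x_cal, y_cal, y_unknown, T_CRIT)
print(f"estimate {estimate:.3f}, 95% limits {lower:.3f} to {upper:.3f}")
```

Both the calibration scatter and the number of new readings enter the half-width, so more readings on the unknown give narrower limits.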

TABLE I Samples prepared for the investigation of the quantitative nature of MALDI-TOF
The quantities of fetuin digest, Arg-peptide, and E. coli cell suspension were held constant, whereas the quantity of the synthetic peptide was increased from 50 fmol to 1 pmol. For all of these experiments, CHCA was used as the matrix. Mixtures analyzed are marked by "+."

Examining the Correlation between the Relative Signal Intensities of Two Peptides of Constant Concentrations within a Mixture-To test the assumption that the signal of the analyte is proportional to that of the standard, we plotted the relative signal intensities of two peptides against each other with both peptide concentrations kept constant (supplemental Fig. S-11). The R value (coefficient of correlation) was not significant in six of the nine cases examined, indicating that the factors affecting the signal intensities of the peptides do not act on them to an equal extent.
This means that in our system the use of a standard did not fulfill its purpose, which is one of the basic reasons for the low reproducibility of the data. In the cases examined, the two peptides were of similar molecular mass (1385.56 versus 1046.48 Da) but of different composition. Several studies reported the use of structural analogues (8, 22), derivatives (8, 10, 11, 19), functional analogues (9), molecules of similar hydrophobicity (12), or isotopically labeled derivatives (5, 6, 8, 11) of the analyte as internal standards. Some of these have indeed published calibration lines with quite small mean relative standard deviations (6, 12). However, direct statistical comparison of the use of different internal standards was still needed to provide evidence for the advantage of using structural analogues, derivatives, or isotopically labeled molecules for such purposes (see below).
Known amounts of Arg-peptide were mixed with 100 fmol of fetuin digest. a, all data were plotted. Dashed lines delimit the 95% confidence belts of the measured intensity ratios of the indicated peptides. b, the means of the relative intensity ratios corresponding to common relative concentrations were plotted. Error bars represent S.D.
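The correlation test used in this subsection can be sketched as follows. The text does not state how the significance of R was assessed. The sketch assumes the usual conversion of a Pearson correlation coefficient to a t statistic with n − 2 degrees of freedom. The function name and the example intensities are invented for illustration.

```python
import math


def pearson_r_and_t(a, b):
    """Pearson correlation of two intensity series and its t statistic.

    Returns (r, t, degrees of freedom); t is compared with a tabulated
    critical value for n - 2 degrees of freedom.
    """
    n = len(a)
    a_mean = sum(a) / n
    b_mean = sum(b) / n
    q_a = sum((ai - a_mean) ** 2 for ai in a)
    q_b = sum((bi - b_mean) ** 2 for bi in b)
    if q_a == 0 or q_b == 0:
        raise ValueError("one of the series is constant")
    q_ab = sum((ai - a_mean) * (bi - b_mean) for ai, bi in zip(a, b))

    r = q_ab / math.sqrt(q_a * q_b)
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)  # perfect correlation
    else:
        t = r * math.sqrt((n - 2) / (1.0 - r ** 2))
    return r, t, n - 2


# Invented relative intensities of two peptides at constant concentration.
analyte = [1.0, 2.0, 3.0, 4.0, 5.0]
standard = [2.0, 1.0, 4.0, 3.0, 5.0]
r, t, df = pearson_r_and_t(analyte, standard)
print(f"r = {r:.2f}, t = {t:.2f}, df = {df}")
```

In this toy case the correlation looks high, yet t stays below the two-sided 5% critical value for 3 degrees of freedom (about 3.18), so it would not count as significant.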
Comparing the Slopes of Regression Lines Obtained for the Same Peptide Pair under Different Conditions-Several studies use external calibration to calculate unknown analyte concentrations. However, the molecular composition of the calibration mixtures is different from that of the samples analyzed. To assess the effect of the molecular milieu on the slope of the calibration line, slopes of standard lines obtained in the presence or absence of E. coli cells were compared. Table II displays the slope of the relative intensity ratio plotted versus the concentration ratio of the Arg-peptide with different fetuin peptides used as standards (lines "a"). Below each are the data of the same calibration line acquired in the presence of E. coli cells (lines "b"). Their pairwise comparison was performed with a t test, the result of which is indicated as t_ab values in the last column. Similarly, calibration lines were made for the Arg-peptide using various fetuin peptides as standards but with a simultaneously increasing concentration of Leu-peptide within the mixture (lines "c"). The slopes of these lines were also compared with those without Leu-peptide using t tests, marked with t_ac.
In all cases, the change of the molecular milieu resulted in a highly significant alteration of the slopes of the calibration lines, indicating the error of using an external calibration for concentration calculations. This stands in good agreement with the observations of Tholey et al. (7). To circumvent this problem, several groups use the sample itself for calibration after adding an "internal standard" molecule. However, these studies neglect the endogenous presence of the analyte, leading to a decrease in accuracy of the calibration line. The obvious solution is to use two isotopically labeled versions of the molecule of interest for the calibration process, one as the variable component of calibration and another as the internal standard. However, this approach may not provide a solution for protein quantitation because such large molecules are better detected in linear mode where the isotopically labeled molecules may not be effectively resolved. It is also expensive and may not be practical or feasible in many cases.
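A pairwise slope comparison of this kind can be sketched as follows. The text states only that a t test was used. The sketch assumes the standard test for equality of two regression slopes with a pooled residual variance and n₁ + n₂ − 4 degrees of freedom. The function names and the data are invented for illustration.

```python
import math


def fit_line(x, y):
    """Return slope, sum of squares of x, residual sum of squares, and n."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    q_x = sum((xi - x_mean) ** 2 for xi in x)
    q_y = sum((yi - y_mean) ** 2 for yi in y)
    q_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return q_xy / q_x, q_x, q_y - q_xy ** 2 / q_x, n


def compare_slopes(x1, y1, x2, y2):
    """t statistic and degrees of freedom for the difference of two slopes."""
    b1, q_x1, ss_res1, n1 = fit_line(x1, y1)
    b2, q_x2, ss_res2, n2 = fit_line(x2, y2)
    df = n1 + n2 - 4
    pooled_variance = (ss_res1 + ss_res2) / df
    standard_error = math.sqrt(pooled_variance * (1 / q_x1 + 1 / q_x2))
    return (b1 - b2) / standard_error, df


# Invented calibration series: the same peptide pair in two milieus.
ratios = [1.0, 2.0, 3.0, 4.0, 5.0]
scatter = [-0.1, 0.1]
x_line = [r for r in ratios for _ in scatter]
y_without_cells = [2.0 * r + s for r in ratios for s in scatter]
y_with_cells = [1.0 * r + s for r in ratios for s in scatter]

t_ab, df_ab = compare_slopes(x_line, y_without_cells, x_line, y_with_cells)
print(f"t = {t_ab:.2f} with {df_ab} degrees of freedom")
```

A |t| above the tabulated critical value indicates that the two calibration lines cannot share a slope, which is the situation reported for every milieu change examined.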
Statistical Analysis of Quantitative Measurements Using Stable Isotope-labeled Peptides-To numerically express the advantage of using a stable isotope-labeled derivative of the analyte for standardization, one of the tryptic peptides of cytochrome c (P peptide) and its stable isotope-labeled derivative (P* peptide) were used in the following experiments. Two series of samples (Table I) were measured, and the same statistical analysis was performed on the data as above.
Again the regression between the concentration ratio and the relative intensity ratio of the two peptides was highly significant and linear with the slope being significantly different from 0 (supplemental Table S Fig. S-13a) and increased the R² value (0.961 and 0.9871, respectively). The low variation (narrow confidence belt) can be explained by the high correlation between the signal intensities of the two peptides within this concentration ratio range (e.g. supplemental Figs. S-14, a-g, and S-15, a-g), indicating that the factors influencing their detectability act on them to an equal extent. Correlation decreases above the concentration ratio of 5 (supplemental Figs. S-14, h and i, and S-15, h and i), and the consequently increasing variation of the relative intensity ratios and their nonlinear relation to the concentration ratios (Fig. 4b and supplemental Figs. S-16 and S-17) explain the deterioration of the R² value and the wide confidence range. Interestingly, the increase in variation was less pronounced when E. coli cells were included in the peptide mixture (supplemental Fig. S-17). This was probably the result of the apparently more uniform crystallization of the samples that contained E. coli cells. Nevertheless, the presence of the bacterial cells significantly altered the slope of the calibration line (supplemental Table S-5), again emphasizing the need to use the sample itself for calibration.
Finally multiple series of 10 measurements were made with peptide mixtures of known concentration, and the concentration ratios and their 95% inverse confidence limits were calculated from the measured relative intensity values using Equations 1 and 2, respectively. Calibration lines fitted to the concentration ratio range of 0.5-5 were used. The calculated concentration ratios fell closer to the actual values than they did when calibration was made using unrelated internal standards (Fig. 6, A-D, and supplemental Fig. S-18, A and B, versus Fig. 3 and supplemental Fig. S-10). Furthermore the 95% inverse confidence ranges were significantly narrower than before (Fig. 6, A-D, and supplemental Fig. S-18, A and B, versus Fig. 3 and supplemental Fig. S-10), indicating the increased reliability of the results calculated this way. However, accuracy and precision (reproducibility) of the calculated values deteriorated when the calibration line was fitted to a wider range of concentration ratios (Fig. 6, E-H, and supplemental Fig. S-18, C and D). This indicates that in our system the difference in the amount of the analyte and the standard must not be greater than 5-fold.
FIG. 4. I_rel of normal (TGPNLHGLFGR, MH+ at m/z 1168.62, P) and stable isotope-labeled derivative (TGPNL*HGLFGR, MH+ at m/z 1175.64, P*) peptides of cytochrome c versus their relative concentration (c_rel; c_P/c_P*) in the mixture. Known amounts of P peptide were mixed with 100 fmol of fetuin digest and 100 fmol of P* peptide. a, all data were plotted. Dashed lines delimit the 95% confidence belts of the measured intensity ratios of the given peptides. b, the means of the relative intensity ratios corresponding to common relative concentrations were plotted. Error bars represent S.D.
FIG. 5. I_rel of P and P* peptides of cytochrome c versus their relative concentration (c_rel; c_P/c_P*) in the mixture within the concentration ratio range of 0.5 < c_P/c_P* < 5. Known amounts of P peptide were mixed with 100 fmol of fetuin digest and 100 fmol of P* peptide. All data were plotted. Dashed lines delimit the 95% confidence belts of the measured intensity ratios of the given peptides.
Conclusions-Based on the above observations, thorough statistical analysis should be performed whenever MALDI-TOF-MS data are used for quantitation. We recommend the calculation and publishing of the 95% inverse confidence limits of the estimated concentration ratios. Should these confidence limits be too far apart, one might improve the regression by increasing the number of measurements (n) used for calibration. Averaging of many laser shots over a large sample area is also clearly desirable. Another way to narrow confidence limits is to increase the number of measurements (n′) corresponding to the unknown concentration ratio (X′). This means that if the inhomogeneity of the sample is being analyzed (e.g. in the case of imaging tissue sections) a large number of measurements must be averaged in each coordinate. Finally, because mixtures can exhibit complex interferences that are not completely understood, quantitation should ideally be performed within the sample with isotopically labeled versions of the molecule(s) of interest. The use of isotope-labeled compounds as standards for calibration can lead to a dramatic increase in precision and a less pronounced improvement in accuracy. However, our results indicate that the linearity can be maintained only within an ~5-fold concentration difference. Thus, preliminary measurements are needed to approximate the concentration range of the compounds investigated prior to generation of the calibration line used for exact quantitation.
FIG. 6. Reliability of calculated data using isotope-labeled analogue for calibration. The concentration ratios (X′) calculated from the mean values of 10 relative signal intensity ratio measurements (y′) are displayed along with the 95% confidence limits of the former value, represented by error bars. P and P* peptide concentrations were held constant: c_P/c_P* = 1. The calibration line was generated using the concentration ratio range between 0.5 and 5.0 (measurements A-D) or 0.5 and 10.0 (measurements E-H). 100 fmol of fetuin digest was present in each sample, and 2 × 10⁶ cfu of E. coli were present in the case of B, D, F, and H measurements.