Absolute Quantification of Proteins by LCMSE

Relative quantification methods have dominated the quantitative proteomics field. There is a need, however, to conduct absolute quantification studies to accurately model and understand the complex molecular biology that results in proteome variability among biological samples. A new method of absolute quantification of proteins is described. This method is based on the discovery of an unexpected relationship between MS signal response and protein concentration: the average MS signal response for the three most intense tryptic peptides per mole of protein is constant within a coefficient of variation of less than ±10%. Given an internal standard, this relationship is used to calculate a universal signal response factor. The universal signal response factor (counts/mol) was shown to be the same for all proteins tested in this study. A controlled set of six exogenous proteins of varying concentrations was studied in the absence and presence of human serum. The absolute quantity of the standard proteins was determined with a relative error of less than ±15%. The average MS signal responses of the three most intense peptides from each protein were plotted against their calculated protein concentrations, and this plot resulted in a linear relationship with an R² value of 0.9939. The analyses were applied to determine the absolute concentration of 11 common serum proteins, and these concentrations were then compared with known values available in the literature. Additionally, within an unfractionated Escherichia coli lysate, a subset of identified proteins known to exist as functional complexes was studied. The calculated absolute quantities were used to accurately determine their stoichiometry.

The study of proteins is crucial in understanding and combating disease through identification of proteins, discovering disease biomarkers, studying protein involvement in specific metabolic pathways, and identifying protein targets in drug discovery (1,2). An important technique that is used in these studies to quantify and identify peptides and/or proteins present in simple and complex mixtures is ESI-LCMS.
To date a majority of the quantitative proteomic analyses have been performed using stable isotope labeling strategies such as ICAT (3), iTRAQ™ (4), SILAC (stable isotope labeling by amino acids in cell culture) (5), and ¹⁸O labeling (6,7). These methodologies require complex, time-consuming sample preparation and can be relatively expensive.
Recently there have been numerous reports applying label-free methods to monitor the relative abundance of protein between different conditions (8-11). Relative quantification provides information regarding specific protein abundance changes between two conditions caused by an induced perturbation (environment-induced, drug-induced, and disease-induced). These studies require comparison of identical proteolytic peptides in each of the two experiments to accurately determine relative ratios of the particular protein(s) of interest. Relative abundance values for each peptide to a given protein can then be obtained to quantitatively characterize the differential expression of proteins between different sample states. Many of these methods are based on determining the ratios of the peak area of identical peptides between different conditions. One critical factor limiting the quantitative reproducibility of these methods is the ability to efficiently cluster the detected peptides. This in turn relies on the accuracy of the mass measurement and the chromatographic reproducibility. Although relative quantification monitors changes in protein abundance between two conditions, it does not determine the absolute quantity of these proteins.
The ability to determine the absolute concentration of a protein (or proteins) present within a complex protein mixture is valuable for the understanding of the underlying molecular biology guiding the response to an applied perturbation. Cellular responses are often controlled through direct and indirect interactions of proteins present in the cell. These coordinated interactions allow the cell to communicate a response across many cellular compartments. The cell can thereby execute an efficient and expeditious recruitment and production of critical proteins needed for adaptation. A method for determining the absolute quantity of proteins in a complex sample would enable determination of the stoichiometry of proteins within a sample and would facilitate understanding of the complicated biological network of cooperative protein interactions that guide cellular responses.
To date a technique capable of determining the absolute concentration of proteins in complex mixtures from a simple LCMS analysis without using specific internal standards for each protein has not been described. Recently Ishihama et al. (12) reported an exponentially modified protein abundance index (emPAI) value that the authors suggest is directly proportional to the protein content in a protein mixture. The authors reported a quantitative deviation of 63% from the actual abundance using the described emPAI method. The method relies on the correlation between the number of observed peptides to a protein and its absolute amount. As the amount of protein increases, the number of observed peptides to the protein also increases. This method is useful within a narrow protein concentration range in which the number of observed peptides continues to increase linearly as a function of the amount of protein. However, once a higher protein concentration is reached and all the observable peptides have been identified, the relationship deteriorates to an asymptotic limit. Additionally this method relies on characterizing peptides using traditional data-directed MS/MS and may therefore be sensitive enough only to quantify the more abundant proteins present in a mixture.
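The saturation behavior described above can be made concrete with a minimal sketch. It assumes the commonly cited definition emPAI = 10^PAI − 1, where PAI is the ratio of observed to observable peptides; the function names and the 20-peptide protein are hypothetical and are not taken from this study.

```python
def empai(n_observed: int, n_observable: int) -> float:
    """Assumed definition: emPAI = 10**PAI - 1, PAI = observed / observable peptides."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    if not 0 <= n_observed <= n_observable:
        raise ValueError("n_observed must lie between 0 and n_observable")
    pai = n_observed / n_observable
    return 10 ** pai - 1


def molar_percent(empai_values):
    """Share of each protein in the mixture, estimated as emPAI / sum(emPAI) * 100."""
    total = sum(empai_values)
    return [100.0 * value / total for value in empai_values]


# Hypothetical protein with 20 observable tryptic peptides.
# Once all 20 are observed the index cannot grow any further,
# no matter how much more protein is loaded.
for n_obs in (0, 5, 10, 20):
    print(n_obs, empai(n_obs, 20))
```

Because the index is bounded by the number of observable peptides, any further increase in protein amount beyond full peptide coverage is invisible to it, which is the asymptotic limit noted in the text.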
A more traditional approach to determine the absolute concentration of a protein (or proteins) in a complex mixture involves the use of stable isotope-labeled peptides spiked into the mixture. This allows direct correlation between the stable isotope-labeled peptide and its naturally occurring analog (13). Kuhn et al. (13) have carried out this stable isotope dilution strategy by using synthetic ¹³C-labeled peptides and multiple reaction monitoring as an analytical method for prescreening candidate protein biomarkers in human serum prior to antibody and immunoassay development.
Typically absolute quantification of proteins requires the use of one or more external reference peptides to generate a calibration-response curve for specific polypeptides from that protein (i.e. synthetic tryptic polypeptide product). The absolute quantification of the given protein is determined from the observed signal response for the specific polypeptide in the sample relative to that generated in the calibration curve. If the absolute quantification of a number of different proteins is to be determined, separate calibration curves are necessary for each specific external reference peptide for each protein.
Absolute quantification allows one not only to determine changes between two conditions but also to perform quantitative protein comparisons within the same sample.
Gerber et al. (14) describe a conventional technique for absolute quantification of proteins and their corresponding modified states in complex mixtures using a synthesized peptide as a reference standard. The reference peptide is chemically identical to one of the naturally occurring tryptic peptides of a given protein. The reference standard is introduced to a complex mixture. The mixture is analyzed using LCMS to measure the corresponding signal intensity for the derivative peptide along with the endogenous peptide. This intensity signal response is compared with an intensity calibration curve created using the introduced synthetic molecule to determine the amount of the endogenous protein in the mixture. A disadvantage with using synthetic peptides is that extra steps are required to synthesize an authentic sample and to later "spike" the synthetic standard prior to being able to determine the absolute quantity of the protein itself. To perform the absolute quantification for a number of proteins within a mixture would require one to provide a synthetic standard for each protein of interest.
Another technique for absolute quantification of proteins uses radiolabeled amino acids such as [³⁵S]methionine, whose specific activity is known (15,16). In this type of experiment, an amino acid, such as [³⁵S]methionine, is incorporated in the culture medium of the growing cell(s). As proteins are synthesized, [³⁵S]methionine is incorporated into the cellular proteins. Based on the extent of incorporation of the radiolabel, the absolute amount of the peptide or protein can be determined. These types of experiments are costly, require good standard operating procedures and specialized quantitative techniques, and can also be deleterious to the subject under study. Consequently determining absolute quantification of proteins using radiolabel techniques is limited to expendable biological systems such as microbes, plants, and cell cultures.
In this work we describe a method that provides absolute quantification of proteins from LCMS data of simple or complex mixtures of tryptic peptides without requiring the use of numerous external reference peptide(s) or the implementation of radiolabeling methods. The method describes how to obtain a single point calibration for the mass spectrometer that is applicable to the subsequent absolute quantification of all other characterized proteins within the complex mixture.

EXPERIMENTAL PROCEDURES
Preparation of Simple Protein Mixture-A 50 pmol/µl stock of each predigested protein was prepared in 50 mM ammonium bicarbonate (pH 8.5) with 0.05% RapiGest™ to assist in redissolving the lyophilized tryptic peptides (17). The samples were incubated at 60°C for 15 min to redissolve the lyophilized samples. The stock solutions of the individual protein digests were used to prepare a single stock solution of the six standard digested proteins (yeast enolase and alcohol dehydrogenase, rabbit glycogen phosphorylase, and bovine serum albumin and hemoglobin). The single stock solution of the six predigested proteins was prepared in 50 mM ammonium bicarbonate with 0.05% RapiGest such that each protein was present at the following concentration: glycogen phosphorylase B, 2.4 pmol/µl; hemoglobin, 4.0 pmol/µl; alcohol dehydrogenase, 4.0 pmol/µl; serum albumin, 5.0 pmol/µl; and enolase, 6.0 pmol/µl. Six additional protein samples were prepared from a dilution series of this stock solution in ammonium bicarbonate with 0.05% RapiGest: 2-, 5-, 10-, 20-, and 50-fold. Each sample was diluted with an equal volume of 50 mM ammonium bicarbonate (pH 8.5) to reduce the concentration of RapiGest to 0.025%. The simple protein mixtures were centrifuged at 13,000 rpm for 10 min, and the supernatant was transferred into an autosampler vial for peptide analysis via LCMSE.
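The on-column amounts implied by this preparation can be worked out with a short sketch. It takes the stock concentrations as pmol/µl, the final 1:1 dilution with buffer, and the 5-µl injection stated under "HPLC Configuration"; the constant and function names are illustrative only, and the undiluted stock is represented as a 1-fold dilution.

```python
# Stock concentrations of the mixed digest, taken as pmol/µl.
STOCK_PMOL_PER_UL = {
    "glycogen phosphorylase B": 2.4,
    "hemoglobin": 4.0,
    "alcohol dehydrogenase": 4.0,
    "serum albumin": 5.0,
    "enolase": 6.0,
}
DILUTION_FOLDS = (1, 2, 5, 10, 20, 50)  # 1 = undiluted stock
FINAL_DILUTION = 2.0   # equal volume of buffer added to every sample
INJECTION_UL = 5.0     # injection volume, from the HPLC configuration


def fmol_on_column(stock_pmol_per_ul: float, fold: float) -> float:
    """Amount injected (fmol) for one protein at one dilution step."""
    conc = stock_pmol_per_ul / fold / FINAL_DILUTION   # pmol/µl in the vial
    return conc * INJECTION_UL * 1000.0                # pmol -> fmol


for fold in DILUTION_FOLDS:
    amount = fmol_on_column(STOCK_PMOL_PER_UL["glycogen phosphorylase B"], fold)
    print(f"{fold:>2}-fold: {amount:.0f} fmol phosphorylase B on column")
```

Under these assumptions phosphorylase B spans 120 to 6000 fmol on column, which agrees with the amounts listed for it in the Fig. 1 legend.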

Preparation of Simple Protein Mixture in Human Serum-An additional protein digest stock solution of the six standard proteins was prepared in human serum (~1.5 µg/µl of total serum protein) containing 0.05% RapiGest such that each protein digest was at the following concentration: glycogen phosphorylase B from rabbit, 2.4 pmol/µl; hemoglobin from a cow, 4.0 pmol/µl; alcohol dehydrogenase from yeast, 4.0 pmol/µl; serum albumin from a cow, 5.0 pmol/µl; and enolase from yeast, 6.0 pmol/µl. Six additional samples were prepared from a dilution series of this stock solution in human serum with 0.05% RapiGest: 2-, 5-, 10-, 20-, and 50-fold.
Preparation of Human Serum for Biological Replicates-Human serum was prepared from seven individuals using BD Biosciences Vacutainers™ with clot activator as suggested by the manufacturer. A total of 300-400 µg of total serum protein (5 µl) was digested according to the procedure outlined under "Protein Digest Preparation." After digesting the serum proteins, 1 pmol of purified tryptic enolase digest was added to each sample prior to LCMS analysis.
Media and Escherichia coli Growth Conditions-Frozen E. coli (ATCC10798, K-12) cell stocks were streaked onto Luria-Bertani (LB) plates and grown at 37°C. An individual colony was subsequently streaked onto a plate of M9 minimal medium supplemented with 0.5% sodium acetate and grown at 37°C. Seed cultures were generated by transferring single colonies into flasks of M9 minimal medium supplemented with 0.5% sodium acetate. Seed culture flasks were shaken at 250 rpm at 37°C until midlog phase (A600 = 0.9-1.1). The seed culture was diluted 1 ml:500 ml into separate M9 minimal medium supplemented with 0.5% glucose. Flasks were shaken at 250 rpm at 37°C until midlog phase (A600 = 0.9-1.1). The E. coli cell cultures were harvested by centrifugation, and pellets were frozen at −80°C. Frozen cells were suspended in 5 ml/1 g of biomass in lysis buffer (Dulbecco's phosphate-buffered saline + 1:100 protease inhibitor mixture (Sigma catalog number 8340)) in a 50-ml Falcon tube. The cells were lysed by sonication in a Microson XL ultrasonic cell disrupter (Misonix, Inc.) at 4°C. The cell debris was removed by centrifugation at 15,000 × g for 30 min at 4°C, and the resulting soluble protein extracts were diluted to 5 mg/ml with lysis buffer, dispensed into 50-µl aliquots (250 µg), and stored at −80°C for subsequent analysis. The E. coli protein extract was spiked with tryptic peptides from yeast enolase to a final concentration of 400 fmol/µl before storing at −80°C.
Protein Digest Preparation-A 100-µl aliquot of the human serum samples was reduced in the presence of 10 mM dithiothreitol at 60°C for 30 min. The protein was alkylated in the dark in the presence of 50 mM iodoacetamide at room temperature for 30 min. Proteolytic digestion was initiated by adding modified trypsin (Promega) at a ratio of 75:1 (total protein to trypsin, by weight) and incubated overnight at 37°C. Each digestion mixture was diluted to a final volume of 200 µl with 50 mM ammonium bicarbonate (pH 8.5) to reduce the concentration of RapiGest detergent to 0.025%. Each sample was analyzed in triplicate. The tryptic peptide solution was centrifuged at 13,000 rpm for 10 min, and the supernatant was transferred into an autosampler vial for peptide analysis via LCMS. The LCMSE analysis was performed using 5 µl of the final peptide mixture.
Approximately 250 µg (50 µl) of E. coli protein was suspended in a final volume of 100 µl containing 50 mM ammonium bicarbonate (pH 8.5) and 0.05% RapiGest. The protein mixture was reduced and alkylated as described above.
HPLC Configuration-Capillary LC (CapLC) of tryptic peptides was performed with a Waters CapLC system equipped with a Waters NanoEase™ Atlantis™ C18, 300-µm × 15-cm reverse phase column. The aqueous mobile phase (mobile phase A) contained 1% acetonitrile in water with 0.1% formic acid. The organic mobile phase (mobile phase B) contained 80% acetonitrile in water with 0.1% formic acid. Samples (5-µl injection) were loaded onto the column with 6% mobile phase B. Peptides were eluted from the column with a gradient of 6-40% mobile phase B over 100 min at 4.4 µl/min followed by a 10-min rinse of 99% mobile phase B. The column was immediately re-equilibrated at initial conditions (6% mobile phase B) for 20 min. The lock mass, [Glu1]fibrinopeptide at 100 fmol/µl, was delivered from the auxiliary pump of the CapLC system at 1 µl/min to the reference sprayer of the NanoLockSpray™ source. All samples were analyzed in triplicate.
Mass Spectrometer Configuration-Mass spectrometry analysis of tryptic peptides was performed using a Waters/Micromass Q-TOF Ultima API. For all measurements, the mass spectrometer was operated in V-mode with a typical resolving power of at least 10,000. All analyses were performed using positive mode ESI using a NanoLockSpray source. The lock mass channel was sampled every 30 s. The mass spectrometer was calibrated with a [Glu1]fibrinopeptide solution (100 fmol/µl) delivered through the reference sprayer of the NanoLockSpray source. Accurate mass LCMS data were collected in an alternating, low energy (MS) and elevated energy (MSE) mode of acquisition. The spectral acquisition time in each mode was 1.8 s with a 0.2-s interscan delay. In low energy MS mode, data were collected at a constant collision energy of 10 eV. In elevated energy MSE mode, collision energy was ramped from 28 to 35 eV during each 1.8-s data collection cycle. One cycle of MS and MSE data was acquired every 4.0 s. The RF applied to the quadrupole mass analyzer was adjusted such that ions from m/z 300 to 2000 were efficiently transmitted, ensuring that any ions observed in the LCMSE data less than m/z 300 were known to arise from dissociations in the collision cell.
Data Processing and Protein Identification-The continuum LCMS E data were processed and searched using ProteinLynx Global Server (PLGS) version 2.2. Protein identifications were obtained by searching either a human database to which data from the six standard proteins were appended or an E. coli database. The ion detection, clustering, and normalization were processed using PLGS as described earlier (8). Additional data analysis was performed with Spotfire Decision Site version 7.2 and Microsoft Excel.

RESULTS AND DISCUSSION
Method for Absolute Quantification-The method for absolute quantification of proteins requires that a known quantity of intact protein be spiked into the protein mixture of interest prior to digestion with trypsin or that a known quantity of predigested protein be spiked into the mixture after it has been digested. The average MS signal response for the three most intense tryptic peptides is calculated for each well characterized protein in the mixture, including those to the internal standard protein(s). The average MS signal response from the internal standard protein(s) is used to determine a universal signal response factor (counts/mol of protein), which is then applied to the other identified proteins in the mixture to determine their corresponding absolute concentration. The absolute quantity of each well characterized protein in the mixture is determined by dividing the average MS signal response of the three most intense tryptic peptides of each well characterized protein by the universal signal response factor described above.
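The calculation described above can be sketched in a few lines. This is an illustration only, not the processing performed by PLGS; the function names and the peptide intensities are hypothetical.

```python
def top3_average(intensities):
    """Average signal response of the three most intense peptides of one protein."""
    ranked = sorted(intensities, reverse=True)
    if len(ranked) < 3:
        raise ValueError("at least three quantified peptides are required")
    return sum(ranked[:3]) / 3.0


def response_factor(standard_intensities, standard_pmol):
    """Universal signal response factor (counts/pmol) from the internal standard."""
    return top3_average(standard_intensities) / standard_pmol


def absolute_quantity(intensities, factor):
    """Absolute amount (pmol) of any other well characterized protein."""
    return top3_average(intensities) / factor


# Hypothetical peptide intensities (counts); not data from this study.
standard = [30000.0, 26000.0, 22000.0, 9000.0]   # internal standard, 1 pmol on column
unknown = [60000.0, 52000.0, 44000.0, 1000.0]

factor = response_factor(standard, standard_pmol=1.0)
print(factor, absolute_quantity(unknown, factor))
```

Only the three most intense peptides enter the calculation, so the weakly responding peptides in each list do not affect the result.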
Analysis of Peptides from the Standard Six-protein Mixture-The serial dilutions of the six standard protein digests were analyzed using an alternate scanning mode of data acquisition (LCMSE) described previously by us (8). The processing software is capable of generating properly integrated peptide signal intensity measurements (deisotoped and charge state-reduced) and accurately mass-measured peptide ion lists that are used for subsequent qualitative identifications and relative quantification across 3-4 orders of magnitude dynamic range in ion detection. Because this mode of data acquisition does not bias the LCMS analysis by gas phase preselection of candidate peptide precursors, an inventory of all the peptide components (precursor and fragments) above the limit of detection of the instrument is produced. The time-resolved mass measurements provide the ability to properly align detected fragment ions with their respective precursor ions for accurate identification.
The protein sequence coverage obtained from the PLGS-processed LCMSE data of the six protein mixtures is outlined in Table I. The total amount of standard protein loaded onto the 300-µm column ranged from 100 to 15,000 fmol (scaling the analysis to a 75-µm chromatography system would correspond to ~6-900 fmol of total protein for the analysis of the same sample after diluting 16-fold, 1/R²). These detection limits are within acceptable levels for existing technologies. A minimum limit of 100 fmol of a single protein was determined for the purpose of this analysis using a 300-µm chromatography system and a standard nanoelectrospray source at 5 µl/min. At 100 fmol of a single protein, the top three most intense tryptic peptides to most of the proteins could be identified with a high degree of confidence. The experiment was set up so that each protein spanned a 50-fold dynamic range with an overall dynamic range of ~2.2 orders of magnitude for the entire set of protein standards. The protein sequence coverage of each protein is shown to decrease in a concentration-dependent manner, as one would expect, ranging from 84 to 2% throughout the entire data set. Replicate analysis of each sample produced signal intensity measurements with a coefficient of variation (Cv) of less than 30% for each peptide and an average Cv of less than 15% for all characterized peptides. A summary of the replicate analysis of one of the protein standards, β hemoglobin, can be seen in Table II. The 14 characterized peptides to β hemoglobin provided ~91% protein sequence coverage. The replicate analyses typically produced mass measurements with a precision of less than 5 ppm (RSD). An average mass accuracy error of 4.7 ppm (root mean square) was achieved for all of the peptides to β hemoglobin. Similar levels of analytical reproducibility (mass, retention time, and signal response) were observed for the tryptic peptides from the other proteins.
Due to the parallel mode of data acquisition, all tryptic peptides that are of sufficient intensity will produce associated product ion fragmentation data that can be used for structural identification of that peptide and its corresponding parent protein. From this analysis, the MS data from the tryptic peptides of a protein and the associated sequence information from the MSE data for each peptide produce a comprehensive analysis for each protein present in the mixture. Fig. 1A illustrates those peptides identified to β hemoglobin (14 kDa) found in each of the six dilutions. An interesting observation obtained from the comprehensive coverage of β hemoglobin shows that the pattern of intensity measurements of the characterized peptides remains preserved throughout the dilution series. As β hemoglobin decreases in concentration, the total number of observed tryptic peptides and their corresponding signal responses decrease in a predictable fashion while maintaining a consistent relative intensity pattern for the remaining tryptic peptides. Fig. 1B illustrates those peptides identified to phosphorylase B (97 kDa) found in each of the six dilutions, showing a similar behavior throughout the dilution series. The identified peptides to the other proteins in the sample also illustrate the same phenomenon.

TABLE II footnotes.
a The relative standard deviation was determined from the replicate mass, retention time, and intensity measurements of each detected peptide. The mass accuracy is reported for each identified peptide. An overall relative standard deviation is reported for the mass RSD, retention time RSD (rt RSD), and intensity RSD (int RSD).
b An overall root mean square error is reported for all the identified peptides to bovine β hemoglobin.
c Oxidized methionine is denoted as M*.

FIG. 1. Characterized peptides to β hemoglobin (A) and phosphorylase B (B) using LCMSE.
A bar plot of the identified peptides to β hemoglobin (x axis) and their corresponding average signal response (y axis) from the LCMSE analyses of each sample of the six-protein mixture is shown. The absolute quantity of β hemoglobin (A) loaded onto the analytical column is indicated by the following color coding: red, 100 fmol; dark blue, 250 fmol; yellow, 500 fmol; black, 1000 fmol; green, 2500 fmol; and white, 5000 fmol. The average relative standard deviations of the mass, intensity, and retention time measurements for the characterized peptides to β hemoglobin across the triplicate analyses were 1.2 ppm, 16.7%, and 1.5%, respectively. The median relative standard deviations of the mass, intensity, and retention time measurements across the triplicate analyses were 0.9 ppm, 11.7%, and 1.1%, respectively. The absolute quantity of phosphorylase B (B) loaded onto the analytical column is indicated by the following color coding: red, 120 fmol; dark blue, 300 fmol; yellow, 600 fmol; black, 1200 fmol; green, 3000 fmol; and white, 6000 fmol. Similar statistics were observed for phosphorylase B.

FIG. 2. Signal responses for peptides to the standard proteins.
A 5-µl injection of the following six standard proteins was analyzed by LCMS: glycogen phosphorylase B (6 pmol, 97 kDa) from rabbit, hemoglobin (10 pmol, 31 kDa total, α (15 kDa) and β (16 kDa)) from a cow, alcohol dehydrogenase (10 pmol, 25 kDa) from yeast, serum albumin (12.5 pmol, 70 kDa) from a cow, and enolase (15 pmol, 50 kDa) from yeast. The LCMS data were processed by PLGS, and the identified peptides were organized by decreasing intensity for each protein. The bar plots illustrate the peptides identified to alcohol dehydrogenase (A), serum albumin (B), enolase (C), α hemoglobin (D), β hemoglobin (E), and phosphorylase B (F). The y axis is the average intensity of the detected peptide from the triplicate analysis, and the x axis is the peptide number (sorted by descending average intensity). The three most intense peptides for each protein are highlighted in blue. The average signal response for the three most intense peptides was calculated from the three tryptic peptides and is shown in A, B, C, D, E, and F for each of the corresponding proteins, respectively. Three ionization efficiency tiers are labeled for yeast enolase (red bar).
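The replicate statistics quoted for Table II and Fig. 1 (intensity and retention time RSD in percent, mass RSD in ppm, and root mean square mass error) can be computed as in the following sketch. The helper names and the triplicate values are hypothetical, and the sample standard deviation is assumed because the text does not say which estimator was used.

```python
import math
import statistics


def relative_sd(values, scale=100.0):
    """Relative standard deviation; scale=100 gives percent, scale=1e6 gives ppm.

    Uses the sample standard deviation (n - 1), which is an assumption.
    """
    return scale * statistics.stdev(values) / statistics.mean(values)


def ppm_error(measured_mass, theoretical_mass):
    """Signed mass accuracy of one measurement in parts per million."""
    return 1e6 * (measured_mass - theoretical_mass) / theoretical_mass


def root_mean_square(errors):
    """Root mean square of a list of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical triplicate measurements of one peptide.
intensities = [9000.0, 10000.0, 11000.0]
masses = [1000.004, 1000.005, 1000.006]

print(relative_sd(intensities))                   # intensity RSD in percent
print(relative_sd(masses, scale=1e6))             # mass RSD in ppm
print(root_mean_square([ppm_error(m, 1000.0) for m in masses]))
```

Reporting the mass RSD in ppm and the mass accuracy as an RMS error keeps precision (scatter between replicates) separate from accuracy (offset from the theoretical mass).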
The characterized tryptic peptides from each protein in Sample 1 (Table I) were sorted by descending intensity as shown in Fig. 2 (A-F) to illustrate the observed signal response for all detected tryptic peptides to each protein. The bar plot illustrates the varying signal responses obtained from the observed tryptic peptides to each protein. The comprehensive analysis of the protein samples obtained from the parallel MS acquisition provided the ability to determine a relationship between the observed signal response of the three most intense peptides of a protein and the absolute protein concentration. Alternative LCMS methods, such as data-dependent analysis or even targeted MS/MS, would only provide partial sampling of this sample and would therefore not provide the comprehensive inventory of peptides required to perform this quantitative analysis. The details of this relationship have been summarized in Table III. The average signal response of the three most intense tryptic peptides was determined for each protein. The relationship between the average MS signal response of the three most intense tryptic peptides and the absolute quantity of protein can be immediately inferred from the relative ratio of the average MS signal responses. The relative ratios of the average MS signal responses are proportional to the absolute quantities of each protein present in the sample. An average signal response of 26,121 counts was consistently associated to 1 pmol of protein on column with a Cv of ~4.9%. Because the proteins spanned a wide range of molecular masses, the relationship appears to be independent of the protein molecular mass. Alcohol dehydrogenase was treated as the internal standard protein for this analysis. The universal signal response factor (counts/mol, SR/pmol) was determined from the three most intense tryptic peptides to alcohol dehydrogenase.
The absolute quantity of the remaining proteins was calculated using the normalized signal response obtained from alcohol dehydrogenase. From these results we observed that the average MS signal response for the three most intense tryptic fragments is constant per unit quantity of protein.
From these observations, we propose that the average signal response for the three most intense tryptic peptides can be used to estimate the absolute quantity of other well characterized protein within the same mixture.
The six serial dilutions were analyzed by LCMSE and subsequently processed using PLGS version 2.2. The relationship between the absolute quantity of protein loaded onto the column and the average signal response of the three most intense tryptic peptides from each of the six proteins was used to generate the results illustrated in Fig. 3. The data points from the dilution series of the proteins were determined to lie on a linear response curve ranging from 100 to 15,000 fmol (x axis). The average signal response of the proteins extends up to 400,000 counts (y axis). A linear curve fit to the data produces an R² value of 0.9939. These results prove that the average MS signal response of the three most intense tryptic peptides is constant for all the proteins. Because the response curve is independent of the protein, it can therefore be used as a universal means to obtain an absolute quantity of any other well characterized protein present in the mixture.
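A linear response curve and its R² value of the kind described above can be obtained by ordinary least squares, as in this minimal sketch. The fitting routine is generic, and the amounts and responses are hypothetical placeholders rather than the measured values behind Fig. 3.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R squared."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical dilution series: amount on column (fmol) and
# average top-3 signal response (counts). Not data from this study.
amount_fmol = [100.0, 250.0, 500.0, 1000.0, 2500.0, 5000.0]
response = [2500.0, 6700.0, 12800.0, 26500.0, 64000.0, 131000.0]

slope, intercept, r_squared = linear_fit(amount_fmol, response)
print(slope, intercept, r_squared)
```

Pooling the points from all proteins into a single fit, as the text describes, is what tests whether one response curve serves every protein; fitting each protein separately would not.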
Although six proteins represent a small sample size, the response curve in Fig. 3 shows a linear correlation between the three most intense tryptic peptides of each protein and its corresponding absolute concentration. Taken together with the results obtained later from both human serum proteins (Fig. 4) and E. coli proteins (Fig. 5), the data support the foundation of this hypothesis. The information provided by the sequences of the three most intense tryptic peptides in these studies and those obtained in future studies will further our understanding of this phenomenon. Future studies with well defined protein complexes will help identify proteins that do not fit the proposed model. A thorough analysis of the highest and lowest ionizing peptides may help define possible exceptions. It is our intent to include an explanation of the foundation of the absolute quantification method and address possible exceptions to the method in a manuscript in progress. A list of the three most intense tryptic peptides from a subset of proteins identified in this study has been provided in Supplemental Table 1. It is worth highlighting at this point that because the quantitation relies on correctly identifying the top three most intense tryptic peptides, larger errors will occur with smaller proteins. The results illustrated in Figs. 1 and 2 demonstrate this dependence. The magnitude of the error is dependent on the size of the protein because there are fewer peptides to choose from within the highest intensity region. Smaller proteins will have fewer tryptic peptides that may have a wide range from the most intense to the next most intense tryptic peptide. With larger proteins there are many more tryptic peptides of higher intensity so that if one (or more) of the three most intense tryptic peptides is not accurately identified, several other peptides will be found close to the same intensity, keeping the altered average intensity value close to the true average intensity value of the three most intense tryptic peptides.

TABLE III. Summary of the absolute quantification results obtained from the analysis of the six-protein mixture described in Table I. Alcohol dehydrogenase (bold) was used as the internal reference standard in both studies as discussed in the text.
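The size dependence of this error can be illustrated numerically. The sketch below compares the top-three average with the value obtained when the single most intense peptide is missed; both intensity profiles are hypothetical and chosen only to mimic a small protein with a steep intensity ranking and a large protein with a flat one.

```python
def top3_average(intensities):
    """Average of the three most intense peptides."""
    ranked = sorted(intensities, reverse=True)
    return sum(ranked[:3]) / 3.0


def average_if_top_peptide_missed(intensities):
    """Top-3 average when the most intense peptide is not identified."""
    ranked = sorted(intensities, reverse=True)
    if len(ranked) < 4:
        raise ValueError("need at least four peptides to replace a missed one")
    return sum(ranked[1:4]) / 3.0


def relative_error_percent(intensities):
    """Signed error (%) in the top-3 average caused by missing the top peptide."""
    true_value = top3_average(intensities)
    return 100.0 * (average_if_top_peptide_missed(intensities) - true_value) / true_value


# Hypothetical intensity profiles (counts), sorted for readability.
small_protein = [100000.0, 60000.0, 30000.0, 10000.0]
large_protein = [100000.0, 95000.0, 90000.0, 88000.0, 85000.0]

print(round(relative_error_percent(small_protein), 1))  # about -47.4
print(round(relative_error_percent(large_protein), 1))  # about -4.2
```

With these illustrative numbers the small protein is underestimated by nearly half, whereas the large protein shifts by only a few percent because its fourth peptide is almost as intense as its first.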
Analysis of the Standard Six-protein Mixture in Human Serum-A second series of samples contained identical concentrations of the standard digest with each sample containing equivalent amounts of human serum (~8.75 µg of serum protein in a 5-µl injection). These complex protein mixtures were analyzed by LCMSE and processed with PLGS as before to obtain corresponding protein identifications. The three most intense peptides to each of the six spiked proteins were identified, and the corresponding average signal responses were calculated. These results are outlined in Table IV and were found to be similar to the results obtained from the analysis of the simple protein mixtures. These results again showed that the relative ratio of the average signal response for the three most intense tryptic peptides was consistent with the relative ratio of the absolute quantity of the individual proteins within the complex protein mixture. Although there was approximately a 20% decrease in the resulting signal response factor (counts/pmol), the results are internally consistent within the same dataset. The variability (Cv) of the normalized signal response per picomole increased slightly from 4.9 to 8.4% when obtained from the more complex sample. The error associated with the absolute quantification of the six standard proteins increased slightly as well. These results show that the determination of the absolute concentration of the six proteins was not substantially affected by the additional complexity of the digested serum protein sample matrix and that the quantification method is capable of determining the absolute concentration of all properly characterized proteins present in a complex sample.
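The size of the matrix effect follows directly from the two response factors reported in the text (26,121 counts/pmol in the simple mixture and 20,597 counts/pmol in serum, Table IV). The short sketch below works through the arithmetic and shows why the factor should be determined within the same sample; the function names are illustrative.

```python
SIMPLE_MIXTURE_FACTOR = 26121.0  # counts/pmol, simple six-protein mixture
SERUM_FACTOR = 20597.0           # counts/pmol, same proteins in human serum


def percent_change(reference, value):
    """Signed change of value relative to reference, in percent."""
    return 100.0 * (value - reference) / reference


def quantity_pmol(avg_top3_counts, factor):
    """Absolute quantity from an average top-3 response and a response factor."""
    return avg_top3_counts / factor


change = percent_change(SIMPLE_MIXTURE_FACTOR, SERUM_FACTOR)
print(round(change, 1))  # about -21.1, the "approximately 20%" decrease

# A protein giving the serum-matrix response for exactly 1 pmol would be
# underestimated if the simple-mixture factor were applied to it instead.
counts = SERUM_FACTOR * 1.0
print(quantity_pmol(counts, SERUM_FACTOR), quantity_pmol(counts, SIMPLE_MIXTURE_FACTOR))
```

Because the internal standard experiences the same suppression as the other proteins in the sample, a factor derived in the serum matrix cancels the effect, which is consistent with the results being internally consistent within one dataset.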
FIG. 3. Universal signal-response curve for the absolute quantification of the standard proteins. The average signal response for the three most intense tryptic peptides to each of the six proteins was obtained for all proteins found in each of the six samples. A single scatter plot of the average signal response (y axis) and the corresponding protein concentration (x axis) was produced for all the proteins in all six samples and found to be linear for over 2 orders of magnitude. The data from each of the proteins are color-coded as follows: red, yeast alcohol dehydrogenase; dark blue, bovine serum albumin; yellow, enolase; black, bovine α hemoglobin; light blue, bovine β hemoglobin; green, bovine α/β hemoglobin; and gray, rabbit phosphorylase B. A linear curve fit was calculated for the entire data set (y = 27.6 × (X) − 7401, R2 = 0.9939).

FIG. 4. Absolute quantity of human serum proteins from analytical (A) and biological (B) replicates. The absolute quantity of 11 well characterized human serum proteins was calculated using the described method to produce the following average concentration measurements (log10 (pg/ml), blue circle). The average concentration values (red circle) for the human serum proteins were obtained from Specialty Laboratories and were plotted along with their expected minimum and maximum values (red whiskers). The relative errors of the absolute quantities obtained from the analytical replicates were typically less than 15%. Because the y axis is presented as a log10 scale, the analytical error is not shown. The average (blue circle), minimum, and maximum concentration values obtained from the analysis of the biological replicates are indicated (blue whiskers) in B.

FIG. 5. Relative levels of estimated absolute protein abundance within a single sample of E. coli. The absolute quantity of protein was determined for a subset of E. coli proteins known to exist as multiprotein complexes. A relative ratio was calculated for each of the proteins listed in the following set of protein complexes (GroEL-GroES, ribosomes, and SucC-SucD). GroES, RS2, and SucD were used as the reference (or normalization) proteins for each of the protein complexes, respectively.

Absolute Quantification of Serum Proteins-The signal response of 20,597 counts/pmol (Table IV) was used to determine the absolute concentration of 11 identified serum proteins from the average intensity of the three most intensely ionizing tryptic peptides. The concentration of the 11 proteins was determined from a single serum sample to illustrate the variability associated with the analytical method (Fig. 4A). The 11 proteins were a subset of the 46 well characterized proteins described by Anderson and Anderson (18) and Anderson et al. (19). The results obtained from the replicate analysis of the single serum sample are outlined in Table V. The analytical variability associated with these measurements was typically within a relative error of 15%. These errors were consistent with what has been described previously using this method (8). The results in Table V indicate that albumin and the immunoglobulins account for the majority of the mass of the total protein loaded onto the column (~71%) for LCMS analysis. This value is similar to the mass fraction for these proteins as determined from the concentration values provided by Specialty Laboratories (~67%). The absolute concentrations were found to be close to the expected concentration values for a number of the characterized serum proteins, with four proteins outside the typical range as indicated by Specialty Laboratories.³ This variability can be attributed to the lack of proper sampling statistics from a single sample of human serum.
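The conversion from signal to absolute quantity described in this section amounts to a single division by the signal response factor. The Python sketch below assumes the 20,597 counts/pmol value quoted from Table IV; the function names are hypothetical, and any intensity or molecular weight passed in is an illustrative input, not an entry from Table V.

```python
RESPONSE_FACTOR = 20597.0  # counts/pmol, serum value quoted from Table IV

def pmol_on_column(top3_avg_counts, factor=RESPONSE_FACTOR):
    """Absolute amount (pmol) from the top-3 average signal response."""
    return top3_avg_counts / factor

def nanograms_on_column(top3_avg_counts, mol_weight_da, factor=RESPONSE_FACTOR):
    """Mass on column in ng: pmol * (g/mol) gives pg, divided by 1000 gives ng."""
    return pmol_on_column(top3_avg_counts, factor) * mol_weight_da / 1000.0

# Illustrative only: a protein whose top-3 average is 41,194 counts and
# whose molecular weight is taken as 50,000 Da (both values are assumptions).
print(pmol_on_column(41194.0))
print(nanograms_on_column(41194.0, 50000.0))
```

Summing the mass values over all quantified proteins gives the kind of mass fraction reported for albumin and the immunoglobulins.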
Another study was performed using serum from six individuals to determine the quantity of endogenous serum proteins within this small sample set. The concentration of the 11 proteins was determined from the six different serum samples to demonstrate the observed biological variability of these proteins in human serum (Fig. 4B). A plot of the calculated concentrations, log10 (pg/ml), of the 11 well characterized human serum proteins is shown in Fig. 4B along with the typically observed concentration values available from Specialty Laboratories.³ Fig. 4B illustrates the range of protein concentration calculated from the six different serum samples and reflects the errors inherent to the analytical method and the biological variability. These results better illustrate the correlation with the literature values obtained from Specialty Laboratories, which incorporate the biological variability from a much larger population. The majority of the concentration values calculated on the basis of the LCMSE results for the identified proteins in Fig. 4B were found to lie within the expected concentration ranges, providing further validation of the absolute quantification method (18,19).³ Accumulation of data from more samples would provide a better correlation with the literature values due to the improved sampling statistics.
The absolute quantification method outlined in this work provides a means to carry out a mass balance analysis as a useful accounting mechanism that can be applied to the inventory of peptides from any given LCMSE analysis. The total amount of protein present in human serum is ~60-80 mg/ml. Using an estimated concentration of 70 mg/ml of total serum protein, a 5-µl sample of human serum would contain ~350 µg of total protein. According to the digestion protocol described under "Experimental Procedures," the 350 µg of total protein was digested with trypsin in a final volume of 200 µl to produce a digested protein solution containing ~1.75 µg/µl. A total of 5 µl of digest was loaded onto the chromatography column (~8.75 µg of total digested protein). The results from the absolute quantification account for ~10.0 µg of protein digest from the 11 identified proteins as indicated in Table V; this is in good agreement with the theoretical value.
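The mass balance arithmetic above can be reproduced in a few lines. The Python sketch below uses only the quantities stated in the text (70 mg/ml, 5 µl of serum, 200 µl digest volume, 5 µl injected, ~10.0 µg accounted for); the function and parameter names are hypothetical.

```python
def mass_balance(serum_mg_per_ml=70.0, serum_ul=5.0, digest_ul=200.0, injected_ul=5.0):
    """Return (total protein in ug, digest concentration in ug/ul, ug on column)."""
    total_ug = serum_mg_per_ml * serum_ul       # mg/ml is numerically equal to ug/ul
    digest_ug_per_ul = total_ug / digest_ul     # dilution into the digest volume
    loaded_ug = digest_ug_per_ul * injected_ul  # amount injected onto the column
    return total_ug, digest_ug_per_ul, loaded_ug

total_ug, conc_ug_per_ul, loaded_ug = mass_balance()
quantified_ug = 10.0  # ~10.0 ug accounted for by the 11 proteins (Table V)
recovery = quantified_ug / loaded_ug

print(total_ug, conc_ug_per_ul, loaded_ug)
print("fraction of expected load accounted for:", round(recovery, 2))
```

With the stated inputs, the quantified mass exceeds the nominal load by roughly 14%, which lies inside the uncertainty implied by the ~60-80 mg/ml range for total serum protein.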
Absolute Quantification of E. coli Proteins-Having the ability to determine absolute quantification of a protein allows one to determine the stoichiometric relationship of proteins within the same sample. To explore this possibility a yeast enolase protein digest of known concentration was spiked into the whole cell lysate of E. coli. The sample was analyzed in triplicate using the LCMS method described above. The average intensity value of the top three ionizing peptides to yeast enolase was used to convert the average intensity of the top three ionizing peptides for a number of well characterized E. coli proteins to the corresponding absolute quantity of protein loaded on column. The relative levels of the estimated absolute concentrations of a number of these proteins were found to be consistent with known quaternary structural information of these proteins. The results obtained from this quantitative assessment are outlined in Fig. 5. A number of identified ribosomal proteins were found to exist at the same relative abundance (1:1), consistent with the structure of the ribosomal complex (20). The stoichiometry of GroEL and GroES was also consistent with the known structure of the molecular chaperonin (2:1). GroEL exists as two stacked heptameric rings of 14 identical 57-kDa monomers to form a cylindrical structure. GroES exists as a single heptameric ring of seven identical 14-kDa monomers that reside at one end of the GroEL structure (21). Another example is illustrated by comparing the relative level of the estimated absolute quantity obtained for the α and β chains of succinyl-CoA synthetase (SucC and SucD). These proteins were identified, and the corresponding stoichiometry was also consistent with its known heterotetrameric A2B2 (1:1) structure (22).
These examples provide additional validation to the method described in this study for the determination of the absolute concentration of proteins using the signal response of the highest ionizing tryptic peptide fragments of identified proteins.
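The stoichiometry calculation described for the E. coli complexes can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, and the spiked enolase amount and all signal intensities are invented numbers chosen to give a clean 2:1 ratio, not measurements from Fig. 5.

```python
def absolute_pmol(top3_avg, std_top3_avg, std_pmol):
    """Scale a protein's top-3 average signal by the internal standard's
    response factor (standard counts per pmol) to get pmol on column."""
    response_factor = std_top3_avg / std_pmol
    return top3_avg / response_factor

def stoichiometry(amounts_pmol, reference):
    """Express every subunit amount as a molar ratio to the reference subunit."""
    ref = amounts_pmol[reference]
    return {name: amount / ref for name, amount in amounts_pmol.items()}

# Hypothetical inputs: 1.0 pmol of spiked enolase giving 25,000 counts.
STD_COUNTS, STD_PMOL = 25000.0, 1.0
signals = {"GroEL": 100000.0, "GroES": 50000.0}  # invented top-3 averages

amounts = {name: absolute_pmol(s, STD_COUNTS, STD_PMOL) for name, s in signals.items()}
print(amounts)
print(stoichiometry(amounts, reference="GroES"))
```

Because both subunits are scaled by the same response factor, the ratio depends only on the relative top-3 signals; the internal standard is needed for the absolute amounts, not for the ratio itself.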
Conclusion-The label-free method described in this work is ideally suited for determining the absolute concentration of proteins present in both simple and complex mixtures. The described method takes full advantage of the recently introduced LCMSE mode of data acquisition and its ability to comprehensively reduce tens of thousands of ion detections to a simple inventory list of peptide precursors along with their time-resolved fragment ions. The specificity afforded by the accurate mass measurements of both the precursors and associated fragment ions (typically less than 5 ppm) provides the ability to identify, with high confidence, a large number of proteins with high sequence coverage. The ability to collect the MS data across the entire chromatographic peak width for all peptides above the limit of detection of the instrument allows for accurate quantification of peptides/proteins from the deconvoluted signal intensities (deisotoped and charge state-reduced). These three attributes of LCMSE data acquisition in association with the correlation between the average MS signal response of the three best ionizing peptides to a protein provide a means to determine the absolute concentration of any well characterized protein present in a sample. Future studies will be performed with known complexes to further validate and understand the guiding principles of this general methodology.