
Exponentially Modified Protein Abundance Index (emPAI) for Estimation of Absolute Protein Amount in Proteomics by the Number of Sequenced Peptides per Protein

      To estimate absolute protein contents in complex mixtures, we previously defined a protein abundance index (PAI) as the number of observed peptides divided by the number of observable peptides per protein (Rappsilber, J., Ryder, U., Lamond, A. I., and Mann, M. (2002) Large-scale proteomic analysis of the human spliceosome. Genome Res. 12, 1231–1245). Here we report that PAI values obtained at different concentrations of serum albumin show a linear relationship with the logarithm of protein concentration in LC-MS/MS experiments. This was also the case for 46 proteins in a mouse whole cell lysate. For absolute quantitation, PAI was converted to exponentially modified PAI (emPAI), equal to 10^PAI minus one, which is proportional to protein content in a protein mixture. For the 46 proteins in the whole lysate, the deviation percentages of the emPAI-based abundances from the actual values were within 63% on average, similar to or better than determination of abundance by protein staining. emPAI was applied to comprehensive protein expression analysis and to a comparison study between gene and protein expression in a human cancer cell line, HCT116. The values of emPAI are easily calculated and add important quantitation information to proteomic experiments; therefore we suggest that they should be reported in large scale proteomic identification projects.
      Proteomic LC-MS approaches combined with genome-annotated databases currently allow identification of thousands of proteins from complex mixtures (
      • Aebersold R.
      • Mann M.
      Mass spectrometry-based proteomics.
      ). Approaches have also been developed for relative quantitation using stable isotope labeling (
      • Oda Y.
      • Huang K.
      • Cross F.R.
      • Cowburn D.
      • Chait B.T.
      Accurate quantitation of protein expression and site-specific phosphorylation.
      ,
      • Gygi S.P.
      • Rist B.
      • Gerber S.A.
      • Turecek F.
      • Gelb M.H.
      • Aebersold R.
      Quantitative analysis of complex protein mixtures using isotope-coded affinity tags.
      ,
      • Ong S.E.
      • Blagoev B.
      • Kratchmarova I.
      • Kristensen D.B.
      • Steen H.
      • Pandey A.
      • Mann M.
      Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics.
      ). Recently not only comprehensive quantitation studies between two states (
      • MacCoss M.J.
      • Wu C.C.
      • Liu H.
      • Sadygov R.
      • Yates III, J.R.
      A correlation algorithm for the automated quantitative analysis of shotgun proteomics data.
      ,
      • Foster L.J.
      • De Hoog C.L.
      • Mann M.
      Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors.
      ) but also protein-protein (
      • Blagoev B.
      • Kratchmarova I.
      • Ong S.E.
      • Nielsen M.
      • Foster L.J.
      • Mann M.
      A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signaling.
      ,
      • Ranish J.A.
      • Yi E.C.
      • Leslie D.M.
      • Purvine S.O.
      • Goodlett D.R.
      • Eng J.
      • Aebersold R.
      The study of macromolecular complexes by quantitative proteomics.
      ), protein-peptide (
      • Schulze W.X.
      • Mann M.
      A novel proteomic screen for peptide-protein interactions.
      ), and protein-drug (
      • Oda Y.
      • Owa T.
      • Sato T.
      • Boucher B.
      • Daniels S.
      • Yamanaka H.
      • Shinohara Y.
      • Yokoi A.
      • Kuromitsu J.
      • Nagasu T.
      Quantitative chemical proteomics for identifying candidate drug targets.
      ) interaction analyses have been reported. So far, however, a comprehensive approach for determining protein concentrations in one sample has not been established. Protein concentrations are one of the most basic and important parameters in quantitative proteomics because the kinetics/dynamics of the cellular proteome is described in terms of changes in the concentrations of proteins in particular compartments. Biological experiments often require at least some information on protein abundance for correct interpretation. In the past, crude quantitative information could be drawn from the intensity of gel staining in comparison to a known amount of marker protein. However, in complex mixture analysis, individual proteins cannot be stained individually, and usually all information about protein abundance is lost. So far, isotope-labeled synthetic peptides have been used as internal standards for absolute quantitation of particular proteins of interest (
      • Barr J.R.
      • Maggio V.L.
      • Patterson Jr., D.G.
      • Cooper G.R.
      • Henderson L.O.
      • Turner W.E.
      • Smith S.J.
      • Hannon W.H.
      • Needham L.L.
      • Sampson E.J.
      Isotope dilution–mass spectrometric quantification of specific proteins: model application with apolipoprotein A-I.
      ,
      • Gerber S.A.
      • Rush J.
      • Stemman O.
      • Kirschner M.W.
      • Gygi S.P.
      Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS.
      ). This approach is in principle applicable to comprehensive analysis but is hampered by the high cost of isotope-labeled peptides as well as the difficulty of quantitative digestion of proteins in-gel (
      • Havlis J.
      • Shevchenko A.
      Absolute quantification of proteins in solutions and in polyacrylamide gels by mass spectrometry.
      ).
      Even a single nano-LC-MS/MS analysis can easily generate a long list of identified proteins with the help of database searching, and additional information can be extracted, such as the hit rank in identification, the probability score, the number of identified peptides per protein, ion counts of identified peptides, LC retention times, and so on. Qualitatively some parameters, such as the hit rank, the score, and the number of peptides per protein (
      • Corbin R.W.
      • Paliy O.
      • Yang F.
      • Shabanowitz J.
      • Platt M.
      • Lyons Jr., C.E.
      • Root K.
      • McAuliffe J.
      • Jordan M.I.
      • Kustu S.
      • Soupene E.
      • Hunt D.F.
      Toward a protein profile of Escherichia coli: comparison to its transcription profile.
      ), can be considered as indicators for protein abundance in the analyzed sample. Among them, the integrated ion counts of the peptides identifying each protein would be the most direct parameter to describe the abundance and has been used to compare protein expression in different states (
      • Lasonder E.
      • Ishihama Y.
      • Andersen J.S.
      • Vermunt A.M.
      • Pain A.
      • Sauerwein R.W.
      • Eling W.M.
      • Hall N.
      • Waters A.P.
      • Stunnenberg H.G.
      • Mann M.
      Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry.
      ). However, a mass spectrometer is not as versatile as an absorbance detector because of limited linearity and possibly because of background and ionization suppression effects (
      • Shen Y.
      • Zhao R.
      • Berger S.J.
      • Anderson G.A.
      • Rodriguez N.
      • Smith R.D.
      High-efficiency nanoscale liquid chromatography coupled on-line with mass spectrometry using nanoelectrospray ionization for proteomics.
      ). Therefore, it is necessary to normalize these parameters to obtain at least approximate quantitative information. The first approach to achieve this, to our knowledge, was to use the number of peptides per protein normalized by the theoretical number of peptides (so-called protein abundance index (PAI)
      The abbreviations used are: PAI, protein abundance index; emPAI, exponentially modified protein abundance index; SILAC, stable isotope labeling with amino acids in cell culture; SCX, strong cation exchange chromatography; HSA, human serum albumin
      ), and this was applied to human spliceosome complex analysis (
      • Rappsilber J.
      • Ryder U.
      • Lamond A.I.
      • Mann M.
      Large-scale proteomic analysis of the human spliceosome.
      ). PAI is superior to the number of identified peptides because it takes account of the fact that, for the same number of molecules, larger proteins and proteins with many peptides in the preferred mass range for mass spectrometry will generate more observed peptides. Independently Sanders et al. (
      • Sanders S.L.
      • Jennings J.
      • Canutescu A.
      • Link A.J.
      • Weil P.A.
      Proteomics of the eukaryotic transcription machinery: identification of proteins associated with components of yeast TFIID by multidimensional mass spectrometry.
      ) developed a similar index. The number of peptides, spectra counts, or the total of the peptide probability scores in LC/LC-MS/MS analysis can also be used for relative quantitation (
      • Liu H.
      • Sadygov R.G.
      • Yates III, J.R.
      A model for random sampling and estimation of relative protein abundance in shotgun proteomics.
      ,
      • Cox B.J.
      • Kislinger T.
      • Wigle D.A.
      • Brown K.
      • Manning D.
      • Jurisica I.
      • Emili A.
      • Rossant J.
      Proceedings of the 52nd ASMS Conference on Mass Spectrometry and Allied Topics, May 23–27, 2004, Nashville, Abstr. ThPS352.
      ,
      • Allet N.
      • Barrillat N.
      • Baussant T.
      • Boiteau C.
      • Botti P.
      • Bougueleret L.
      • Budin N.
      • Canet D.
      • Carraud S.
      • Chiappe D.
      • Christmann N.
      • Colinge J.
      • Cusin I.
      • Dafflon N.
      • Depresle B.
      • Fasso I.
      • Frauchiger P.
      • Gaertner H.
      • Gleizes A.
      • Gonzalez-Couto E.
      • Jeandenans C.
      • Karmime A.
      • Kowall T.
      • Lagache S.
      • Mahe E.
      • Masselot A.
      • Mattou H.
      • Moniatte M.
      • Niknejad A.
      • Paolini M.
      • Perret F.
      • Pinaud N.
      • Ranno F.
      • Raimondi S.
      • Reffas S.
      • Regamey P.O.
      • Rey P.A.
      • Rodriguez-Tome P.
      • Rose K.
      • Rossellat G.
      • Saudrais C.
      • Schmidt C.
      • Villain M.
      • Zwahlen C.
      In vitro and in silico processes to identify differentially expressed proteins.
      ). Here we further develop the PAI strategy to determine protein abundance from nano-LC-MS/MS experiments and present a modified form, emPAI, the exponential form of PAI minus one. In experiments with labeled complex mixtures into which we spiked synthetic peptides, we show emPAI to be roughly proportional to protein abundance.

      MATERIALS AND METHODS

       Preparation of Cell Lysate—

      RPMI 1640 medium (Invitrogen) containing [13C6]Leu (Cambridge Isotope Laboratories, Andover, MA) was prepared according to the SILAC protocol of Ong et al. (
      • Ong S.E.
      • Blagoev B.
      • Kratchmarova I.
      • Kristensen D.B.
      • Steen H.
      • Pandey A.
      • Mann M.
      Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics.
      ). Mouse neuroblastoma neuro2a cells were cultured in this medium for [13C6]Leu labeling. Whole cells were lysed using ultrasonication in the presence of a protease inhibitor mixture (Roche Diagnostics). HCT116-C9 cells were grown in a normal RPMI 1640 culture medium as described previously (
      • Oda Y.
      • Owa T.
      • Sato T.
      • Boucher B.
      • Daniels S.
      • Yamanaka H.
      • Shinohara Y.
      • Yokoi A.
      • Kuromitsu J.
      • Nagasu T.
      Quantitative chemical proteomics for identifying candidate drug targets.
      ). Whole proteins were extracted with 5 ml of M-PER (Pierce) containing protease inhibitor mixture and 5 mM dithiothreitol.

       Preparation of Peptide Mixtures for LC-MS/MS—

      Proteins from cell lysates were dried and resuspended in 50 mM Tris-HCl buffer (pH 9.0) containing 8 M urea. These mixtures were subsequently reduced, alkylated, and digested with Lys-C (Wako, Osaka, Japan) and trypsin (Promega, Madison, WI) as described previously (
      • Foster L.J.
      • De Hoog C.L.
      • Mann M.
      Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors.
      ). Digested solutions were acidified with TFA and desalted and concentrated using C18 StageTips (
      • Rappsilber J.
      • Ishihama Y.
      • Mann M.
      Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics.
      ), which were prepared by a fully automated instrument (Nikkyo Technos, Tokyo, Japan) with Empore C18 disks (3M, St. Paul, MN). Peptide fractionation by strong cation exchange chromatography (SCX) was performed using SCX-StageTip with 0–500 mM five-step ammonium acetate salt elution (
      • Ishihama Y.
      • Sato T.
      • Tabata T.
      • Miyamoto N.
      • Sagane K.
      • Nagasu T.
      • Oda Y.
      Quantitative mouse brain proteomics using culture-derived isotope tags as internal standards.
      ), and the resultant fractions were desalted using C18 StageTips prior to LC-MS/MS analysis. Candidates for peptide synthesis containing at least one leucine and one tyrosine were selected, considering the sequences of tryptic peptides from proteins expressed in neuro2a cells. Peptides containing methionine and tryptophan were removed to avoid oxidation problems during sample preparation. In addition, peptides with double basic residues were removed, considering the frequency of missed cleavage by trypsin. The selected 54 peptides were synthesized using a Shimadzu PSSM-8 (Kyoto, Japan) with Fmoc (N-(9-fluorenyl)methoxycarbonyl) chemistry and were purified by preparative HPLC. Amino acid analysis, peptide mass measurement, and HPLC-UV were carried out for purity and structure elucidation. A solution containing equal amounts of each peptide was spiked into the peptide mixtures from neuro2a cells. Three different amounts were spiked so that peak intensity ratios of unlabeled peptides to labeled peptides were between 0.2 and 5.

       Nano-LC-MS/MS Analysis—

      All samples were analyzed by nano-LC-MS/MS using a QSTAR Pulsar i (AB/MDS-Sciex, Toronto, Canada), a Finnigan LCQ Advantage (Thermoelectron, San Jose, CA) or a Finnigan LTQ (Thermoelectron) system equipped with a Shimadzu LC10A gradient pump and an HTC-PAL autosampler (CTC Analytics AG, Zwingen, Switzerland) equipped with Valco C2 valves with 150-μm ports. ReproSil C18 materials (3 μm, Dr. Maisch, Ammerbuch, Germany) were packed into a self-pulled needle (100-μm inner diameter, 6-μm opening, 150-mm length) with a nitrogen-pressurized column loader cell (Nikkyo) to prepare an analytical column needle with “stone-arch” frit (
      • Ishihama Y.
      • Rappsilber J.
      • Andersen J.S.
      • Mann M.
      Microcolumns with self-assembled particle frits for proteomics.
      ). A Teflon-coated column holder (Nikkyo) was mounted on an x-y-z nanospray interface (Proxeon, Odense, Denmark), and a Valco metal connector with a magnet was used to hold the column needle and to set the appropriate spray position. The injection volume was 2.5 μl, and the flow rate was 250 nl/min after a tee splitter. The mobile phases consisted of A (0.5% acetic acid) and B (0.5% acetic acid and 80% acetonitrile). The three-step linear gradient of 5–10% B in 5 min, 10–30% in 60 min, 30–100% in 5 min, and 100% in 10 min was used throughout this study. A spray voltage of 2400 V was applied via the metal connector as described previously (
      • Ishihama Y.
      • Rappsilber J.
      • Andersen J.S.
      • Mann M.
      Microcolumns with self-assembled particle frits for proteomics.
      ). For QSTAR experiments with the faster scan mode, MS scans were performed for 1 s to select three intense peaks, and subsequently three MS/MS scans were performed for 0.55 s each. An information-dependent acquisition function was active for 3 min to exclude the previously scanned parent ions. For the slower scan mode, four MS/MS scans (1.5 s each) per one MS scan (1 s) were performed. For LCQ and LTQ experiments, two MS/MS scans per one MS scan were performed in the automated gain control mode. The scan cycle was 1.19 s for one MS and 1.17 s for one MS/MS on average in LCQ and 0.17 s for one MS and 0.38 s for one MS/MS on average in LTQ. The scan range was m/z 350–1400 for QSTAR, LCQ, and LTQ.

       Data Analysis—

      The Mascot version 1.9 database search engine (Matrix Science, London, UK) was used for protein identification against the Swiss-Prot protein database. The allowed number of missed cleavages was set to 1, and peptide scores to indicate identity were used for peptide identification without manual inspection of MS/MS spectra. MSQuant version 1.4a was customized for [13C6]Leu SILAC to determine the ion counts in chromatograms for absolute concentrations of proteins using known amounts of synthetic peptides. MSQuant is open source software developed by us and available at sourceforge.net.

       Protein Abundance Determination—

      To calculate the number of observable peptides per protein, proteins were digested in silico, and the obtained peptide masses were compared with the scan range of the mass spectrometer. In addition, the expected retention times under our nano-LC conditions were calculated according to the procedure of Meek (
      • Meek J.L.
      Prediction of peptide retention times in high-pressure liquid chromatography on the basis of amino acid composition.
      ) and Sakamoto et al. (
      • Sakamoto Y.
      • Kawakami N.
      • Sasagawa T.
      Prediction of peptide retention times.
      ) with our own coefficients based on ∼1500 peptides. Peptides that were too hydrophilic or hydrophobic were eliminated. An in-house program was written in PHP to calculate the peptide number and was used to export all data to Microsoft Excel. The program is freely accessible at xome.hydra.mki.co.jp:8080/bitt/common/Menu. Regarding the number of observed peptides per protein, three methods of counting were used, i.e. 1) counting unique parent ions, 2) counting unique sequences, and 3) counting unique sequences without partial modification and the overlap caused by missed tryptic cleavage. These numbers were exported from Mascot html files to Excel spreadsheets using the “Export All Peptides” function of MSQuant software.
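      The observable-peptide count described above can be sketched in Python as follows. This is only a minimal illustration, not the in-house PHP program: the function names, the rule that trypsin does not cleave before proline, and the choice of charge states 1+ to 3+ are assumptions. Cysteine is left unmodified, and the retention-time filter is omitted because its coefficients are not given here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R, except before P (assumed rule)."""
    sites = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    peptides = []
    for i in range(len(sites) - 1):
        last = min(i + 2 + missed_cleavages, len(sites))
        for j in range(i + 1, last):
            peptides.append(sequence[sites[i]:sites[j]])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_observable(sequence, mz_min=350.0, mz_max=1400.0, charges=(1, 2, 3)):
    """Count unique peptides with at least one charge state inside the scan range."""
    observable = set()
    for pep in tryptic_digest(sequence):
        if not pep or any(aa not in RESIDUE_MASS for aa in pep):
            continue  # skip empty fragments and non-standard residues
        mass = peptide_mass(pep)
        if any(mz_min <= (mass + z * PROTON) / z <= mz_max for z in charges):
            observable.add(pep)
    return len(observable)


if __name__ == "__main__":
    # LCTVATLR is the HSA peptide used for peak area measurement in Fig. 1A.
    print(tryptic_digest("LCTVATLRK"))
    print(count_observable("LCTVATLRK"))
```

      The default scan range of m/z 350–1400 matches the range stated under "Nano-LC-MS/MS Analysis"; all other defaults are illustrative choices.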
      The PAI is defined as
      $$\mathrm{PAI} = \frac{N_{\mathrm{obsd}}}{N_{\mathrm{obsbl}}}$$
      (Eq. 1)


      where Nobsd and Nobsbl are the number of observed peptides per protein and the number of observable peptides per protein, respectively (
      • Rappsilber J.
      • Ryder U.
      • Lamond A.I.
      • Mann M.
      Large-scale proteomic analysis of the human spliceosome.
      ). The emPAI is defined as follows.
      $$\mathrm{emPAI} = 10^{\mathrm{PAI}} - 1$$
      (Eq. 2)


      Thus, the protein contents in molar and weight fraction percentages are described as
      $$\text{Protein content (mol \%)} = \frac{\mathrm{emPAI}}{\sum(\mathrm{emPAI})} \times 100$$
      (Eq. 3)


      $$\text{Protein content (weight \%)} = \frac{\mathrm{emPAI} \times M_r}{\sum(\mathrm{emPAI} \times M_r)} \times 100$$
      (Eq. 4)


      where Mr is the molecular weight of the protein, and Σ(emPAI) is the summation of emPAI values for all identified proteins. The entire procedure for emPAI calculation is shown in Supplemental Sheet 1.
      To evaluate the accuracy of the parameters, a deviation factor was defined as
      $$\text{Deviation factor} = \frac{\mathrm{Value}_{\mathrm{measured}}}{\mathrm{Value}_{\mathrm{estimated}}}$$
      (Eq. 5)


      where measured values are larger than estimated values or
      $$\text{Deviation factor} = \frac{\mathrm{Value}_{\mathrm{estimated}}}{\mathrm{Value}_{\mathrm{measured}}}$$
      (Eq. 6)


      where estimated values are larger than measured values.
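      Equations 1–6 can be sketched in a few lines of Python. The function names are illustrative, and the three-protein example uses PAI and Mr values taken from Table I only to show the arithmetic; in practice the sums in Equations 3 and 4 run over all identified proteins, so the percentages printed here are not real protein contents.

```python
def pai(n_observed, n_observable):
    """Eq. 1: observed peptides divided by observable peptides."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return n_observed / n_observable


def empai(pai_value):
    """Eq. 2: exponentially modified PAI."""
    return 10.0 ** pai_value - 1.0


def mol_percent(empai_by_protein):
    """Eq. 3: molar fraction percentage of each protein."""
    total = sum(empai_by_protein.values())
    return {name: value / total * 100.0 for name, value in empai_by_protein.items()}


def weight_percent(empai_by_protein, mr_by_protein):
    """Eq. 4: weight fraction percentage of each protein."""
    weighted = {name: value * mr_by_protein[name]
                for name, value in empai_by_protein.items()}
    total = sum(weighted.values())
    return {name: value / total * 100.0 for name, value in weighted.items()}


def deviation_factor(measured, estimated):
    """Eqs. 5 and 6: ratio of the larger value to the smaller, always >= 1."""
    return max(measured, estimated) / min(measured, estimated)


if __name__ == "__main__":
    # Illustrative subset only: PAI and Mr values as listed in Table I.
    pai_values = {"beta-actin": 1.22, "EF-1-alpha-1": 1.00, "clathrin heavy chain": 0.04}
    mr_values = {"beta-actin": 42053, "EF-1-alpha-1": 50424, "clathrin heavy chain": 193187}
    empai_values = {name: empai(p) for name, p in pai_values.items()}
    print(mol_percent(empai_values))
    print(weight_percent(empai_values, mr_values))
```

      Writing the deviation factor as the larger value over the smaller one covers both Equation 5 and Equation 6 with a single expression.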

       DNA Microarray Analysis—

      HCT116-C9 cells were plated at 5.0 × 10^6 cells/dish in 10-cm-diameter dishes with 10 ml of the culture medium. After 24-h preincubation, the cells were treated for 12 h with 0.015% DMSO. Duplicate experiments were performed using Affymetrix HuGene FL arrays according to established protocols. Affymetrix GeneChip software was used to extract gene signal intensities, and two sets of data were grouped and averaged based on gene symbols.

      RESULTS AND DISCUSSION

       The Number of Identified Peptides from a Single Protein at Different Concentrations—

      Different amounts of human serum albumin (HSA) tryptic peptides were analyzed by nano-LC-ESI-MS/MS, and the number of identified peptides was counted. As shown in Fig. 1A, both peak area and the number of identified peptides with unique parent ions increased as the injection amount increased, although both curves saturated at larger amounts of HSA close to 1 pmol. However, even in the region where the peak area is linear, the number of peptides does not have a linear relationship to the protein amount. Interestingly, the number of peptides shows a linear relationship to the logarithm of the injected amount from 3 to 500 fmol (Fig. 1B). A similar result was obtained on an LCQ with the slower scan cycle (see “Materials and Methods”). This finding indicates that each peak was well separated in time and that the influence of “random sampling” caused by the slower scan could be neglected under this condition. In this case, three ways were used to count peptides: 1) all parent ions including different charge states from the same peptide sequences, 2) all peptides excluding different charge states and partial modifications such as methionine oxidation, and 3) peptides with unique sequences excluding peptides overlapped by missed tryptic cleavage sites. Fig. 1B shows that the number of peptides based on unique parent ions (Fig. 1B, squares; method 1 above) shows the best correlation with the logarithm of protein abundance. We believe that these results are not due to the particular conditions used but are a more general phenomenon. Recently two groups independently presented similar curves relating the number of peptides to the concentration of proteins (
      • Liu H.
      • Sadygov R.G.
      • Yates III, J.R.
      A model for random sampling and estimation of relative protein abundance in shotgun proteomics.
      ,
      • Sweetman G.
      • Bantscheff M.
      • Schirle M.
      • Rick J.
      • Kuster B.
      Proceedings of the 52nd ASMS Conference on Mass Spectrometry and Allied Topics, May 23–27, 2004, Nashville, Abstr. TPR361.
      ). Although neither of them analyzed the logarithmic relationship, it appears to us that their data are also consistent with a linear relationship between the logarithm of protein concentration and the number of peptides. At present, it is not clear why the logarithm of protein concentration correlates with the number of observed peptides; the relationship is likely due to a combination of processes and probably holds only approximately. In any case, it is a common experience that the mass spectrometric signals of the peptides from the digestion of one protein differ vastly in intensity. For example, substantially increasing the sequence coverage of a protein often requires orders of magnitude larger protein amounts, and conversely dilution by small factors often does not decrease sequence coverage very much.
      Fig. 1. Relationship between protein concentration and several parameters. A, peak area and the number of unique parent ions of peptides versus injection amount of HSA. The most abundant tryptic peptide of HSA, LCTVATLR, was used for peak area measurement. B, numbers of peptides counted in three different ways versus injection amount of HSA. C, protein concentration versus PAI for 46 proteins in neuro2a cells. D, protein concentration versus the number of peptides divided by molecular weight of proteins for 46 proteins in neuro2a cells. E, relationship between protein concentration and emPAI for 46 proteins in neuro2a cells. F, absolute quantitation of 46 proteins in neuro2a cells using emPAI. QSTAR with faster scans (0.55 s for each MS/MS scan) was used for these experiments. Protein concentrations in neuro2a cells were measured by spiking synthetic peptides to neuro2a cells as described under “Materials and Methods.” arb.u, arbitrary units.

       PAI of 46 Proteins in Complex Mixture Solutions—

      To test performance of the PAI index in complex mixtures, we next investigated known amounts of 54 proteins in a whole cell lysate. Tryptic peptides from mouse neuroblastoma neuro2a cells SILAC-labeled with [13C6]Leu (
      • Ong S.E.
      • Blagoev B.
      • Kratchmarova I.
      • Kristensen D.B.
      • Steen H.
      • Pandey A.
      • Mann M.
      Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics.
      ) were measured by a single LC-MS/MS run with the QSTAR instrument, and 336 proteins were identified based on 1462 peptides. For accurate absolute quantitation, we spiked 54 synthetic peptides containing [12C6]Leu into this sample solution, one for each protein, and quantified the corresponding tryptic peptides containing [13C6]Leu. Eight peptides were not quantified because they resulted in overlapping peaks in the extracted ion current chromatograms. Together 46 proteins ranging in molecular mass from 13 to 193 kDa were quantified in the range from 30 fmol to 1.8 pmol/μl in the sample solution as listed in Table I.
      Table I. 46 proteins identified and quantified in mouse neuro2a cells

      | Swiss-Prot accession no. | Protein name | Mascot hit no. | Mr | Protein concentration(a) (fmol/μl) | Number of observed peptides | Mascot score | PAI | emPAI |
      |---|---|---|---|---|---|---|---|---|
      | P19378 | Heat shock cognate 71-kDa protein | 1 | 70,989 | 822 | 29 | 1235 | 0.88 | 6.56 |
      | P07901 | Heat shock protein HSP 90-α | 2 | 85,003 | 940 | 31 | 1047 | 0.86 | 6.26 |
      | P20152 | Vimentin | 5 | 53,581 | 336 | 27 | 1060 | 0.82 | 5.58 |
      | P58252 | Elongation factor 2 (EF-2) | 6 | 96,091 | 118 | 24 | 830 | 0.53 | 2.41 |
      | Q03265 | ATP synthase α chain, mitochondrial precursor | 8 | 59,830 | 149 | 18 | 635 | 0.56 | 2.65 |
      | P17182 | α enolase | 9 | 47,322 | 596 | 21 | 828 | 0.88 | 6.50 |
      | P15331 | Peripherin | 10 | 54,349 | 84 | 13 | 556 | 0.39 | 1.48 |
      | P48975 | Actin, cytoplasmic 1 (β-actin) | 12 | 42,053 | 1206 | 22 | 894 | 1.22 | 15.68 |
      | P05213 | Tubulin α-2 chain (α-tubulin 2) | 14 | 50,818 | 1782 | 24 | 862 | 1.14 | 12.89 |
      | P52480 | Pyruvate kinase, M2 isozyme | 16 | 58,289 | 216 | 17 | 643 | 0.49 | 2.06 |
      | P20001 | Elongation factor 1-α 1 | 24 | 50,424 | 870 | 17 | 647 | 1.00 | 9.00 |
      | P08113 | Endoplasmin precursor | 25 | 92,703 | 90 | 10 | 379 | 0.23 | 0.71 |
      | O35501 | Stress-70 protein, mitochondrial precursor | 27 | 73,970 | 195 | 15 | 495 | 0.38 | 1.42 |
      | P14869 | 60 S acidic ribosomal protein P0 | 34 | 34,336 | 102 | 9 | 379 | 0.50 | 2.16 |
      | P03975 | IgE-binding protein | 35 | 63,221 | 122 | 10 | 368 | 0.33 | 1.15 |
      | Q9CZD3 | Glycyl-tRNA synthetase | 37 | 82,624 | 77 | 9 | 341 | 0.21 | 0.62 |
      | P35215 | 14-3-3 protein ζ/δ | 40 | 27,925 | 381 | 12 | 401 | 0.75 | 4.62 |
      | P42932 | T-complex protein 1, θ subunit | 42 | 60,088 | 96 | 9 | 281 | 0.26 | 0.81 |
      | P51881 | ADP, ATP carrier protein, fibroblast isoform | 46 | 33,138 | 264 | 8 | 294 | 0.42 | 1.64 |
      | Q9JIK5 | Nucleolar RNA helicase II | 48 | 94,151 | 30 | 8 | 268 | 0.16 | 0.45 |
      | P14148 | 60 S ribosomal protein L7 | 52 | 31,457 | 120 | 9 | 227 | 0.53 | 2.38 |
      | Q9WVA4 | Transgelin 2 | 65 | 23,810 | 130 | 8 | 258 | 0.62 | 3.12 |
      | P14211 | Calreticulin precursor | 72 | 48,136 | 114 | 8 | 246 | 0.33 | 1.15 |
      | P16858 | Glyceraldehyde-3-phosphate dehydrogenase | 87 | 35,941 | 400 | 8 | 270 | 0.53 | 2.41 |
      | P29314 | 40 S ribosomal protein S9 | 88 | 22,418 | 135 | 6 | 201 | 0.46 | 1.89 |
      | Q60932 | Voltage-dependent anion-selective channel protein | 89 | 32,502 | 72 | 6 | 261 | 0.38 | 1.37 |
      | P17080 | GTP-binding nuclear protein RAN | 97 | 24,579 | 255 | 5 | 181 | 0.45 | 1.85 |
      | P17008 | 40 S ribosomal protein S16 | 98 | 16,418 | 456 | 6 | 174 | 0.60 | 2.98 |
      | Q60930 | Voltage-dependent anion-selective channel protein | 99 | 32,340 | 54 | 4 | 161 | 0.27 | 0.85 |
      | P11983 | T-complex protein 1, α subunit B | 100 | 60,867 | 36 | 6 | 143 | 0.18 | 0.52 |
      | P05064 | Fructose-bisphosphate aldolase A | 103 | 39,656 | 210 | 6 | 260 | 0.33 | 1.15 |
      | P09058 | 40 S ribosomal protein S8 | 109 | 24,344 | 96 | 6 | 186 | 0.60 | 2.98 |
      | Q01320 | DNA topoisomerase II, α isozyme | 143 | 173,567 | 36 | 4 | 125 | 0.05 | 0.12 |
      | Q8VEM8 | Phosphate carrier protein, mitochondrial precursor | 149 | 40,063 | 48 | 4 | 122 | 0.19 | 0.55 |
      | P19253 | 60 S ribosomal protein L13a | 150 | 23,432 | 48 | 4 | 123 | 0.33 | 1.15 |
      | P08526 | 60 S ribosomal protein L27 | 157 | 15,657 | 86 | 3 | 113 | 0.50 | 2.16 |
      | P47961 | 40 S ribosomal protein S4 | 160 | 29,666 | 162 | 4 | 109 | 0.21 | 0.62 |
      | Q06647 | ATP synthase oligomycin sensitivity conferral protein | 179 | 23,440 | 78 | 3 | 98 | 0.23 | 0.70 |
      | Q9CPR4 | 60 S ribosomal protein L17 | 182 | 21,506 | 90 | 3 | 96 | 0.30 | 1.00 |
      | P39026 | 60 S ribosomal protein L11 | 186 | 20,337 | 108 | 3 | 92 | 0.33 | 1.15 |
      | Q9D1R9 | 60 S ribosomal protein L34 | 204 | 13,381 | 96 | 3 | 83 | 0.50 | 2.16 |
      | O08807 | Peroxiredoxin 4 | 206 | 31,261 | 72 | 4 | 83 | 0.27 | 0.85 |
      | Q62188 | Dihydropyrimidinase related protein-3 | 207 | 62,296 | 30 | 3 | 82 | 0.10 | 0.27 |
      | P50310 | Phosphoglycerate kinase | 213 | 44,776 | 30 | 3 | 80 | 0.13 | 0.33 |
      | Q9DBJ1 | Phosphoglycerate mutase 1 | 223 | 28,797 | 114 | 3 | 70 | 0.25 | 0.78 |
      | P11442 | Clathrin heavy chain | 226 | 193,187 | 55 | 3 | 69 | 0.04 | 0.10 |

      (a) Protein concentrations were measured by “reversed” isotope dilution using SILAC-labeled proteins and unlabeled synthetic peptides.
      In complex protein mixtures, two additional factors should be considered. One is the influence of protein size on the number of peptides. Generally larger proteins generate more detectable peptides. Therefore, observable peptides were used for normalization as done previously except that we used the predicted peptide retention time as an additional filter. The other factor is the mixture complexity. A very large number of peptides exist in total cell lysate, and the number of observed peptides could to some extent be influenced by the random selection for MS/MS events, ion suppression effects, and saturation of the MS analyzer and/or the detector. Nevertheless Fig. 1C shows that there is still a linear relationship between log[protein] and the number of observed peptides normalized by the number of observable peptides per protein even when different proteins were plotted into one graph. Compared with other parameters, PAI correlated most highly with logarithm of protein amount (Fig. 1C, r = 0.89, deviation factor (average ± S.D.) = 1.6 ± 0.5) followed by number of peptides divided by protein Mr (Fig. 1D, r = 0.84, deviation factor = 1.8 ± 0.8), a measure similar to PAI except that it ignores how well the peptide sequence can generate tryptic peptides in the correct mass range for mass spectrometry. Commonly used proxies for protein abundance such as Mascot score and the number of peptides correlate much worse with protein abundance (r = 0.72, deviation factor = 2.7 ± 2.4 and r = 0.71, deviation factor = 2.7 ± 2.6, respectively).
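      The log-linear relationship can be checked with an ordinary least-squares fit of PAI against log10 of the measured concentration. The sketch below is an illustration under stated assumptions: it uses only 13 (concentration, PAI) pairs copied from Table I, so its correlation coefficient will not reproduce the r = 0.89 reported for all 46 proteins, and the helper name `linear_fit` is hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope, intercept, and Pearson r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# (protein concentration in fmol/ul, PAI) pairs: a subset of Table I.
TABLE_I_SUBSET = [
    (822, 0.88), (940, 0.86), (336, 0.82), (118, 0.53), (149, 0.56),
    (596, 0.88), (84, 0.39), (1206, 1.22), (1782, 1.14), (216, 0.49),
    (30, 0.16), (36, 0.05), (55, 0.04),
]

log_conc = [math.log10(conc) for conc, _ in TABLE_I_SUBSET]
pai_values = [p for _, p in TABLE_I_SUBSET]
slope, intercept, r = linear_fit(log_conc, pai_values)

if __name__ == "__main__":
    print(f"PAI = {slope:.3f} * log10(conc) + {intercept:.3f}, r = {r:.3f}")
```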

       Absolute Quantitation Using emPAI—

      Although PAI can estimate the abundance relationship between proteins, it cannot express the molar fraction directly. Therefore, we derived a new parameter, emPAI, that is the exponential form of PAI minus 1 (Equation 2) and that is directly proportional to protein content as shown in Fig. 1E. To calculate the absolute concentrations, total protein amounts were measured as weight by BCA assay, and the weight fractions of 46 proteins among 336 neuro2a proteins were calculated using Equation 4. As shown in Fig. 1F, the emPAI-based concentrations were highly consistent with the actual values (y = 0.973x, r = 0.93), and the deviation factors ranged from 1.03 to 4.98 with an average of 1.74 ± 0.79. The outlier at (x, y) = (10.6, 2.13) is clathrin heavy chain (CLH_RAT). Mouse clathrin is not in the current Swiss-Prot but in TrEMBL (Q68FD5_MOUSE), which was not used for protein identification. Q68FD5_MOUSE is not identical in sequence to CLH_RAT. It is possible that the number of observed peptides would increase using Q68FD5_MOUSE or other sequences instead of CLH_RAT, although Q68FD5_MOUSE needs more validation for Swiss-Prot entry. Note that these measures of confidence compare favorably with protein abundance by comparative gel staining and indeed with the Bradford assay itself used here to measure total protein amount (
      • Read S.M.
      • Northcote D.H.
      Minimization of variation in the response to different proteins of the Coomassie blue G dye-binding assay for protein.
      ). Furthermore just as there are proteins known to stain well, the emPAI of certain proteins could also be adjusted in the future. In any case, the emPAI approach seems to provide a reasonably accurate estimate for comprehensive absolute quantitation.
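      The calculation chain described above can be sketched in a few lines. Equations 2 to 4 are not reproduced in this section, so the sketch rests on stated assumptions: emPAI = 10^PAI − 1 (as given in the abstract), the molar fraction of a protein is its emPAI divided by the sum of all emPAI values, and the weight fraction is emPAI × Mr divided by the sum of emPAI × Mr over all identified proteins. Function names and the example numbers are hypothetical.

```python
def empai(pai_value):
    """Exponentially modified PAI, assumed here to be 10**PAI - 1."""
    return 10.0 ** pai_value - 1.0

def molar_percent(empai_by_protein):
    """Assumed molar fraction: emPAI / sum(emPAI), expressed in percent."""
    total = sum(empai_by_protein.values())
    return {name: 100.0 * value / total
            for name, value in empai_by_protein.items()}

def weight_percent(empai_by_protein, mr_by_protein):
    """Assumed weight fraction: emPAI * Mr / sum(emPAI * Mr), in percent."""
    weighted = {name: value * mr_by_protein[name]
                for name, value in empai_by_protein.items()}
    total = sum(weighted.values())
    return {name: 100.0 * value / total for name, value in weighted.items()}

def absolute_amounts(weight_pct_by_protein, total_protein_weight):
    """Scale weight fractions by the measured total protein weight."""
    return {name: total_protein_weight * pct / 100.0
            for name, pct in weight_pct_by_protein.items()}

# Hypothetical two-protein mixture (values are NOT from the paper).
example_pai = {"protein_A": 1.0, "protein_B": 2.0}
example_mr = {"protein_A": 50000.0, "protein_B": 10000.0}

example_empai = {name: empai(value) for name, value in example_pai.items()}
example_mol = molar_percent(example_empai)
example_wt = weight_percent(example_empai, example_mr)
# Total protein weight would come from a separate assay; 10.0 is a placeholder.
example_amounts = absolute_amounts(example_wt, 10.0)
```

      Because both fractions are normalized over the identified proteins only, unidentified proteins are implicitly treated as contributing nothing, which is a limitation of any such sketch.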

       Dependence of emPAI on Experimental Conditions—

      In this experiment, we used the fast MS and MS/MS cycle time on the QSTAR to maximize the number of MS/MS events. When a slower cycle was used, the deviation from the linear relationship between emPAI and the protein concentrations increased (Fig. 2, A and B) due to the random sampling effects mentioned above. This effect was more pronounced when an LCQ ion trap instrument was used (Fig. 2C) presumably because the limited trap capacity results in a biased peak selection for more abundant proteins, and indeed a larger deviation was observed for more abundant proteins. We used an LTQ, a linear ion trap instrument that has a higher capacity and a faster scan time when compared with the LCQ (
      • Hager J.W.
      A new linear ion trap mass spectrometer.
      ,
      • Schwartz J.C.
      • Senko M.W.
      • Syka J.E.
      A two-dimensional quadrupole ion trap mass spectrometer.
      ), to evaluate the influence of the cycle time. The use of LTQ data improved the accuracy of emPAI in comparison to LCQ data (Fig. 2D). However, the faster scan cycle of LTQ compared with QSTAR did not provide better correlation between emPAI and protein abundance. This could be because of the limited capacity of LTQ to trap ions even in the linear configuration. We also evaluated the influence of sample complexity by using multidimensional chromatography (
      • Ishihama Y.
      • Sato T.
      • Tabata T.
      • Miyamoto N.
      • Sagane K.
      • Nagasu T.
      • Oda Y.
      Quantitative mouse brain proteomics using culture-derived isotope tags as internal standards.
      ,
      • Link A.J.
      • Eng J.
      • Schieltz D.M.
      • Carmack E.
      • Mize G.J.
      • Morris D.R.
      • Garvik B.M.
      • Yates III, J.R.
      Direct analysis of protein complexes using mass spectrometry.
      ). As shown in Fig. 2E, SCX fractionation improved emPAI accuracy. To confirm the effect of the reduction of sample complexity on emPAI accuracy, we used both QSTAR and LTQ in combination with SCX fractionation and obtained in total 2752 identified proteins with 11,727 non-redundant peptides from neuro2a cells. The correlation between emPAI values of the 46 test proteins and their protein abundances was significantly improved as shown in Fig. 2F. Note that the PAI values of 22 of the 46 proteins were more than one in this analysis, whereas only two proteins had PAI values of more than one in the QSTAR analysis without SCX fractionation. This result shows that under the current conditions emPAI was not saturated. However, it would be possible to saturate emPAI if some proteins are highly abundant. We observed this in our analysis of the malaria proteome where hemoglobin was extremely abundant because the samples were prepared from red blood cells (
      • Lasonder E.
      • Ishihama Y.
      • Andersen J.S.
      • Vermunt A.M.
      • Pain A.
      • Sauerwein R.W.
      • Eling W.M.
      • Hall N.
      • Waters A.P.
      • Stunnenberg H.G.
      • Mann M.
      Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry.
      ). Extremely abundant proteins may furthermore affect the efficiency of protein identification because of ionization suppression and detector saturation as well as the limited loading capacity of LC columns. The removal of extremely abundant proteins is therefore required to improve the identification efficiency and can be achieved by gel-enhanced LC-MS (one-dimensional gel followed by slicing, digesting, and LC-MS analysis) as shown in our malaria proteome study or albumin depletion treatment for plasma proteome studies. Such a treatment will also remove the influence of emPAI saturation.
      Fig. 2. Influence of MS measurement conditions on the deviation factors between the estimated and measured concentrations of 46 proteins in neuro2a cells. A, QSTAR with faster scan cycle (0.55 s for each MS/MS scan). B, QSTAR with slower scan cycle (1.5 s for each MS/MS scan). C, LCQ with slower scan cycle (1.2 s on average for each MS/MS scan). D, LTQ with faster scan cycle (0.38 s on average for each MS/MS scan). E, multidimensional LC-MS/MS with LTQ (five fractions). F, multidimensional LC-MS/MS with QSTAR and LTQ.
      We also examined the influence of the injected sample amounts on the emPAI-based molar fractions. Using the whole cell lysate of neuro2a cells, three different levels (basal and 3× and 9× dilutions) were analyzed by LC-MS/MS. For 20 proteins with commonly identified peptides in all three analyses, constant values of the molar fraction were obtained (deviation factors were 1.66 ± 0.55 for 3× dilution and 1.85 ± 0.85 for 9× dilution, respectively), whereas emPAI values depended on the injected amounts as expected.

       Application to Comprehensive Protein Expression Analysis—

      The emPAI is a convenient and easily obtained index that can be used to produce protein expression data from any LC-MS/MS run. We applied this approach to obtain data for comparison with gene expression data in HCT116 human cancer cells. A DNA microarray provided expression data for 4971 genes, whereas a single LC-MS run provided 402 identified proteins based on 1811 peptides with unique sequences. Bridging gene symbols with protein accession numbers resulted in a total of 227 gene/protein pairs for the expression comparison study. A weak correlation was observed in Fig. 3 as expected from previous studies on yeast (
      • Liu H.
      • Sadygov R.G.
      • Yates III, J.R.
      A model for random sampling and estimation of relative protein abundance in shotgun proteomics.
      ,
      • Gygi S.P.
      • Rochon Y.
      • Franza B.R.
      • Aebersold R.
      Correlation between protein and mRNA abundance in yeast.
      ). Interestingly most of the outliers were ribosomal proteins. It is well known that, unlike prokaryotes such as Escherichia coli, mammalian cells regulate the expression levels of ribosomal proteins not only by transcription but also at the steps of transport of mRNA and translation and by degradation of excess amounts of proteins not associated with rRNA (
      • Tsurugi K.
      On the regulation of ribosome synthesis in eukaryotic cell.
      ,
      • Mager W.H.
      Control of ribosomal protein gene expression.
      ). Accordingly, in a comparison study between gene and protein expression levels using emPAI for E. coli, we did not find such a deviation of ribosomal proteins (Y. Ishihama, D. Frishman, and M. Mann, unpublished data).
      Although neither gene nor protein expression data are sufficiently accurate to discriminate a 10% difference, for instance, they are quite helpful for obtaining a broad overview as shown above. We also note that the protein quantitation error of our simple emPAI is similar to or better than the error in determining mRNA expression in DNA microarrays.
      Fig. 3. Comparison between gene and protein expression levels in HCT116 cells. Experimental details are described under “Materials and Methods.”
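      The bridging of gene symbols with protein accession numbers used for this comparison can be pictured as a simple join. The following is a minimal sketch only: the mapping table, the identifiers, and the expression values are hypothetical placeholders, and the actual bridging procedure used in this study is not described in this section.

```python
def pair_expression(gene_expression, protein_empai, symbol_to_accession):
    """Return (symbol, accession, mRNA level, emPAI) for genes whose
    mapped protein accession was identified in the LC-MS/MS run."""
    pairs = []
    for symbol, mrna_level in gene_expression.items():
        accession = symbol_to_accession.get(symbol)
        if accession is not None and accession in protein_empai:
            pairs.append((symbol, accession, mrna_level,
                          protein_empai[accession]))
    return pairs

# Hypothetical inputs (identifiers and values are placeholders).
gene_expression = {"GENE1": 120.0, "GENE2": 45.0, "GENE3": 800.0}
protein_empai = {"ACC_0001": 0.8, "ACC_0003": 5.2}
symbol_to_accession = {"GENE1": "ACC_0001", "GENE2": "ACC_0002",
                       "GENE3": "ACC_0003"}

pairs = pair_expression(gene_expression, protein_empai, symbol_to_accession)
```

      Only genes with both a measured mRNA level and an identified protein survive the join, which is why the number of gene/protein pairs is smaller than either input list.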

      Conclusions—

      We have established a scale for estimating absolute protein abundance named emPAI. Because emPAI is easily calculated from the output information of database search engines such as Mascot, it is possible to apply this approach to previously measured or published datasets to add quantitative information without any additional steps. emPAI can also be used for relative quantitation, especially in cases where isotope-based approaches cannot be applied because the quantitative changes are too large for accurate measurement of ratios, because metabolic labeling is not possible, or because sensitivity constraints do not allow chemical labeling techniques. In such cases, emPAI values of proteins in one sample can be compared with those in another sample, and the outliers from the emPAI correlation between the two samples can be identified as proteins that increase or decrease.
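      The two-sample comparison described above could be sketched as follows. This is an illustration under stated assumptions rather than a procedure taken from this study: the log10 emPAI ratio is centered on its median to absorb differences in injected amount, and the threshold of 0.5 log10 units (roughly threefold) is an arbitrary, hypothetical cutoff. Function and protein names are placeholders.

```python
import math
from statistics import median

def flag_changed_proteins(empai_a, empai_b, min_log10_shift=0.5):
    """Flag proteins whose emPAI ratio (sample B / sample A) deviates from
    the median ratio by at least min_log10_shift log10 units."""
    shared = sorted(set(empai_a) & set(empai_b))
    log_ratios = {
        name: math.log10(empai_b[name] / empai_a[name])
        for name in shared
        if empai_a[name] > 0 and empai_b[name] > 0
    }
    if not log_ratios:
        return {}
    # Median centering is an assumption made here to correct for overall
    # differences in sample load between the two runs.
    center = median(log_ratios.values())
    flagged = {}
    for name, log_ratio in log_ratios.items():
        shift = log_ratio - center
        if abs(shift) >= min_log10_shift:
            flagged[name] = "increased" if shift > 0 else "decreased"
    return flagged

# Hypothetical emPAI values for two samples (NOT from the paper).
sample_a = {"P1": 1.0, "P2": 2.0, "P3": 4.0, "P4": 1.0}
sample_b = {"P1": 1.0, "P2": 2.0, "P3": 4.0, "P4": 10.0}
changed = flag_changed_proteins(sample_a, sample_b)
```

      Given the deviation factors reported earlier, a cutoff much below about twofold would likely flag noise rather than real changes, so any threshold should be chosen with the measured scatter in mind.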
      This emPAI approach was applied to multidimensional separation-MS/MS to extend the coverage of proteins. Further improvement would be possible by optimizing MS instrument-dependent parameters such as ionization dependence on m/z region. Because the emPAI index can be calculated with a simple script and does not require further experimentation in protein identification experiments, we suggest its routine use in the reporting of proteomic results.

      Acknowledgments

      We thank Rie Ushijima, Junro Kuromitsu, Takashi Owa, and Akira Yokoi (LSFT, Eisai) for DNA microarray experiments and Norimasa Miyamoto (LSFT) for SILAC cell culture. We also thank Peter Mortensen and Shao-En Ong (CEBI) for MSQuant setting and members of LSFT and CEBI for fruitful discussion. Y. I. thanks Eisai for the opportunity to stay in CEBI.

      Supplementary Material

      REFERENCES

        • Aebersold R.
        • Mann M.
        Mass spectrometry-based proteomics.
        Nature. 2003; 422: 198-207
        • Oda Y.
        • Huang K.
        • Cross F.R.
        • Cowburn D.
        • Chait B.T.
        Accurate quantitation of protein expression and site-specific phosphorylation.
        Proc. Natl. Acad. Sci. U. S. A. 1999; 96: 6591-6596
        • Gygi S.P.
        • Rist B.
        • Gerber S.A.
        • Turecek F.
        • Gelb M.H.
        • Aebersold R.
        Quantitative analysis of complex protein mixtures using isotope-coded affinity tags.
        Nat. Biotechnol. 1999; 17: 994-999
        • Ong S.E.
        • Blagoev B.
        • Kratchmarova I.
        • Kristensen D.B.
        • Steen H.
        • Pandey A.
        • Mann M.
        Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics.
        Mol. Cell. Proteomics. 2002; 1: 376-386
        • MacCoss M.J.
        • Wu C.C.
        • Liu H.
        • Sadygov R.
        • Yates III, J.R.
        A correlation algorithm for the automated quantitative analysis of shotgun proteomics data.
        Anal. Chem. 2003; 75: 6912-6921
        • Foster L.J.
        • De Hoog C.L.
        • Mann M.
        Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors.
        Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 5813-5818
        • Blagoev B.
        • Kratchmarova I.
        • Ong S.E.
        • Nielsen M.
        • Foster L.J.
        • Mann M.
        A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signaling.
        Nat. Biotechnol. 2003; 21: 315-318
        • Ranish J.A.
        • Yi E.C.
        • Leslie D.M.
        • Purvine S.O.
        • Goodlett D.R.
        • Eng J.
        • Aebersold R.
        The study of macromolecular complexes by quantitative proteomics.
        Nat. Genet. 2003; 33: 349-355
        • Schulze W.X.
        • Mann M.
        A novel proteomic screen for peptide-protein interactions.
        J. Biol. Chem. 2004; 279: 10756-10764
        • Oda Y.
        • Owa T.
        • Sato T.
        • Boucher B.
        • Daniels S.
        • Yamanaka H.
        • Shinohara Y.
        • Yokoi A.
        • Kuromitsu J.
        • Nagasu T.
        Quantitative chemical proteomics for identifying candidate drug targets.
        Anal. Chem. 2003; 75: 2159-2165
        • Barr J.R.
        • Maggio V.L.
        • Patterson Jr., D.G.
        • Cooper G.R.
        • Henderson L.O.
        • Turner W.E.
        • Smith S.J.
        • Hannon W.H.
        • Needham L.L.
        • Sampson E.J.
        Isotope dilution–mass spectrometric quantification of specific proteins: model application with apolipoprotein A-I.
        Clin. Chem. 1996; 42: 1676-1682
        • Gerber S.A.
        • Rush J.
        • Stemman O.
        • Kirschner M.W.
        • Gygi S.P.
        Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS.
        Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 6940-6945
        • Havlis J.
        • Shevchenko A.
        Absolute quantification of proteins in solutions and in polyacrylamide gels by mass spectrometry.
        Anal. Chem. 2004; 76: 3029-3036
        • Corbin R.W.
        • Paliy O.
        • Yang F.
        • Shabanowitz J.
        • Platt M.
        • Lyons Jr., C.E.
        • Root K.
        • McAuliffe J.
        • Jordan M.I.
        • Kustu S.
        • Soupene E.
        • Hunt D.F.
        Toward a protein profile of Escherichia coli: comparison to its transcription profile.
        Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 9232-9237
        • Lasonder E.
        • Ishihama Y.
        • Andersen J.S.
        • Vermunt A.M.
        • Pain A.
        • Sauerwein R.W.
        • Eling W.M.
        • Hall N.
        • Waters A.P.
        • Stunnenberg H.G.
        • Mann M.
        Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry.
        Nature. 2002; 419: 537-542
        • Shen Y.
        • Zhao R.
        • Berger S.J.
        • Anderson G.A.
        • Rodriguez N.
        • Smith R.D.
        High-efficiency nanoscale liquid chromatography coupled on-line with mass spectrometry using nanoelectrospray ionization for proteomics.
        Anal. Chem. 2002; 74: 4235-4249
        • Rappsilber J.
        • Ryder U.
        • Lamond A.I.
        • Mann M.
        Large-scale proteomic analysis of the human spliceosome.
        Genome. Res. 2002; 12: 1231-1245
        • Sanders S.L.
        • Jennings J.
        • Canutescu A.
        • Link A.J.
        • Weil P.A.
        Proteomics of the eukaryotic transcription machinery: identification of proteins associated with components of yeast TFIID by multidimensional mass spectrometry.
        Mol. Cell. Biol. 2002; 22: 4723-4738
        • Liu H.
        • Sadygov R.G.
        • Yates III, J.R.
        A model for random sampling and estimation of relative protein abundance in shotgun proteomics.
        Anal. Chem. 2004; 76: 4193-4201
        • Cox B.J.
        • Kislinger T.
        • Wigle D.A.
        • Brown K.
        • Manning D.
        • Jurisica I.
        • Emili A.
        • Rossant J.
        Proceedings of the 52nd ASMS Conference on Mass Spectrometry and Allied Topics, May 23–27, 2004, Nashville, Abstr. ThPS352.
        American Society for Mass Spectrometry, Santa Fe, NM. 2004
        • Allet N.
        • Barrillat N.
        • Baussant T.
        • Boiteau C.
        • Botti P.
        • Bougueleret L.
        • Budin N.
        • Canet D.
        • Carraud S.
        • Chiappe D.
        • Christmann N.
        • Colinge J.
        • Cusin I.
        • Dafflon N.
        • Depresle B.
        • Fasso I.
        • Frauchiger P.
        • Gaertner H.
        • Gleizes A.
        • Gonzalez-Couto E.
        • Jeandenans C.
        • Karmime A.
        • Kowall T.
        • Lagache S.
        • Mahe E.
        • Masselot A.
        • Mattou H.
        • Moniatte M.
        • Niknejad A.
        • Paolini M.
        • Perret F.
        • Pinaud N.
        • Ranno F.
        • Raimondi S.
        • Reffas S.
        • Regamey P.O.
        • Rey P.A.
        • Rodriguez-Tome P.
        • Rose K.
        • Rossellat G.
        • Saudrais C.
        • Schmidt C.
        • Villain M.
        • Zwahlen C.
        In vitro and in silico processes to identify differentially expressed proteins.
        Proteomics. 2004; 4: 2333-2351
        • Rappsilber J.
        • Ishihama Y.
        • Mann M.
        Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics.
        Anal. Chem. 2003; 75: 663-670
        • Ishihama Y.
        • Sato T.
        • Tabata T.
        • Miyamoto N.
        • Sagane K.
        • Nagasu T.
        • Oda Y.
        Quantitative mouse brain proteomics using culture-derived isotope tags as internal standards.
        Nat. Biotechnol. 2005; 23: 617-621
        • Ishihama Y.
        • Rappsilber J.
        • Andersen J.S.
        • Mann M.
        Microcolumns with self-assembled particle frits for proteomics.
        J. Chromatogr. A. 2002; 979: 233-239
        • Meek J.L.
        Prediction of peptide retention times in high-pressure liquid chromatography on the basis of amino acid composition.
        Proc. Natl. Acad. Sci. U. S. A. 1980; 77: 1632-1636
        • Sakamoto Y.
        • Kawakami N.
        • Sasagawa T.
        Prediction of peptide retention times.
        J. Chromatogr. 1988; 442: 69-79
        • Sweetman G.
        • Bantscheff M.
        • Schirle M.
        • Rick J.
        • Kuster B.
        Proceedings of the 52nd ASMS Conference on Mass Spectrometry and Allied Topics, May 23–27, 2004, Nashville, Abstr. TPR361.
        American Society for Mass Spectrometry, Santa Fe, NM. 2004
        • Read S.M.
        • Northcote D.H.
        Minimization of variation in the response to different proteins of the Coomassie blue G dye-binding assay for protein.
        Anal. Biochem. 1981; 116: 53-64
        • Hager J.W.
        A new linear ion trap mass spectrometer.
        Rapid Commun. Mass Spectrom. 2002; 16: 512-526
        • Schwartz J.C.
        • Senko M.W.
        • Syka J.E.
        A two-dimensional quadrupole ion trap mass spectrometer.
        J. Am. Soc. Mass Spectrom. 2002; 13: 659-669
        • Link A.J.
        • Eng J.
        • Schieltz D.M.
        • Carmack E.
        • Mize G.J.
        • Morris D.R.
        • Garvik B.M.
        • Yates III, J.R.
        Direct analysis of protein complexes using mass spectrometry.
        Nat. Biotechnol. 1999; 17: 676-682
        • Gygi S.P.
        • Rochon Y.
        • Franza B.R.
        • Aebersold R.
        Correlation between protein and mRNA abundance in yeast.
        Mol. Cell. Biol. 1999; 19: 1720-1730
        • Tsurugi K.
        On the regulation of ribosome synthesis in eukaryotic cell.
        Seikagaku. 1989; 61: 271-284
        • Mager W.H.
        Control of ribosomal protein gene expression.
        Biochim. Biophys. Acta. 1988; 949: 1-15