Molecular & Cellular Proteomics 4:1265-1272, 2005.
© 2005 by The American Society for Biochemistry and Molecular Biology, Inc.
From the Laboratory of Seeds Finding Technology (LSFT), Eisai Co., Ltd., 5-1-3 Tokodai, Tsukuba, Ibaraki 300-2635, Japan, the Center for Experimental Bioinformatics (CEBI), University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark, and the FIRC Institute for Molecular Oncology, 20139 Milan, Italy
| ABSTRACT |
|---|
Even a single nano-LC-MS/MS analysis can easily generate a long list of identified proteins with the help of database searching, and additional information can be extracted, such as the hit rank in identification, the probability score, the number of identified peptides per protein, ion counts of identified peptides, LC retention times, and so on. Qualitatively, some parameters, such as the hit rank, the score, and the number of peptides per protein (14), can be considered as indicators of protein abundance in the analyzed sample. Among them, the integrated ion counts of the peptides identifying each protein would be the most direct parameter to describe the abundance and have been used to compare protein expression in different states (15). However, a mass spectrometer is not as versatile as an absorbance detector because of limited linearity and possibly because of background and ionization suppression effects (16). Therefore, it is necessary to normalize these parameters to obtain at least approximate quantitative information. The first approach to achieve this, to our knowledge, was to use the number of peptides per protein normalized by the theoretical number of peptides (so-called protein abundance index (PAI)1), and this was applied to human spliceosome complex analysis (17). PAI is superior to the number of identified peptides because it takes account of the fact that, for the same number of molecules, larger proteins and proteins with many peptides in the preferred mass range for mass spectrometry will generate more observed peptides. Independently, Sanders et al. (18) developed a similar index. The number of peptides, spectra counts, or the total of the peptide probability scores in LC/LC-MS/MS analysis can also be used for relative quantitation (19-21). Here we further develop the PAI strategy to determine protein abundance from nano-LC-MS/MS experiments and present a modified form, emPAI, the exponential form of PAI minus one. 
In experiments with labeled complex mixtures, into which we spiked in synthetic peptides, we show emPAI to be roughly proportional to protein abundance.
| MATERIALS AND METHODS |
|---|
Preparation of Peptide Mixtures for LC-MS/MS
Proteins from cell lysates were dried and resuspended in 50 mM Tris-HCl buffer (pH 9.0) containing 8 M urea. These mixtures were subsequently reduced, alkylated, and digested with Lys-C (Wako, Osaka, Japan) and trypsin (Promega, Madison, WI) as described previously (6). Digested solutions were acidified with TFA and desalted and concentrated using C18 StageTips (22), which were prepared by a fully automated instrument (Nikkyo Technos, Tokyo, Japan) with Empore C18 disks (3M, St. Paul, MN). Peptide fractionation by strong cation exchange chromatography (SCX) was performed using SCX-StageTip with 0-500 mM five-step ammonium acetate salt elution (23), and the resultant fractions were desalted using C18 StageTips prior to LC-MS/MS analysis. Candidates for peptide synthesis containing at least one leucine and one tyrosine were selected, considering the sequences of tryptic peptides from proteins expressed in neuro2a cells. Peptides containing methionine and tryptophan were removed to avoid oxidation problems during sample preparation. In addition, peptides with double basic residues were removed, considering the frequency of missed cleavage by trypsin. The selected 54 peptides were synthesized using a Shimadzu PSSM-8 (Kyoto, Japan) with Fmoc (N-(9-fluorenyl)methoxycarbonyl) chemistry and were purified by preparative HPLC. Amino acid analysis, peptide mass measurement, and HPLC-UV were carried out for purity and structure elucidation. A solution containing equal amounts of each peptide was spiked into the peptide mixtures from neuro2a cells. Three different amounts were spiked so that peak intensity ratios of unlabeled peptides to labeled peptides were between 0.2 and 5.
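The selection rules for synthesis candidates can be sketched as a simple sequence filter. This is only an illustration, not the procedure actually used: the function name and the example sequences are hypothetical, "methionine and tryptophan" is read as "either residue", and "double basic residues" is assumed to mean two adjacent K/R residues.

```python
import re


def is_synthesis_candidate(peptide: str) -> bool:
    """Apply the stated selection rules to one tryptic peptide sequence."""
    seq = peptide.upper()
    # Rule 1: at least one leucine and one tyrosine.
    has_leu_and_tyr = "L" in seq and "Y" in seq
    # Rule 2: no methionine or tryptophan (oxidation-prone residues).
    oxidation_prone = "M" in seq or "W" in seq
    # Rule 3 (assumed reading): no two adjacent basic residues (K or R).
    double_basic = re.search(r"[KR]{2}", seq) is not None
    return has_leu_and_tyr and not oxidation_prone and not double_basic


# Hypothetical sequences, for illustration only.
candidates = ["LYAEEK", "MLYK", "LAYKR", "AGGK", "WLYR"]
selected = [p for p in candidates if is_synthesis_candidate(p)]
print(selected)  # ['LYAEEK']
```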
Nano-LC-MS/MS Analysis
All samples were analyzed by nano-LC-MS/MS using a QSTAR Pulsar i (AB/MDS-Sciex, Toronto, Canada), a Finnigan LCQ Advantage (Thermoelectron, San Jose, CA) or a Finnigan LTQ (Thermoelectron) system equipped with a Shimadzu LC10A gradient pump and an HTC-PAL autosampler (CTC Analytics AG, Zwingen, Switzerland) equipped with Valco C2 valves with 150-µm ports. ReproSil C18 materials (3 µm, Dr. Maisch, Ammerbuch, Germany) were packed into a self-pulled needle (100-µm inner diameter, 6-µm opening, 150-mm length) with a nitrogen-pressurized column loader cell (Nikkyo) to prepare an analytical column needle with "stone-arch" frit (24). A Teflon-coated column holder (Nikkyo) was mounted on an x-y-z nanospray interface (Proxeon, Odense, Denmark), and a Valco metal connector with a magnet was used to hold the column needle and to set the appropriate spray position. The injection volume was 2.5 µl, and the flow rate was 250 nl/min after a tee splitter. The mobile phases consisted of A (0.5% acetic acid) and B (0.5% acetic acid and 80% acetonitrile). The three-step linear gradient of 5-10% B in 5 min, 10-30% in 60 min, 30-100% in 5 min, and 100% in 10 min was used throughout this study. A spray voltage of 2400 V was applied via the metal connector as described previously (24). For QSTAR experiments with the faster scan mode, MS scans were performed for 1 s to select three intense peaks, and subsequently three MS/MS scans were performed for 0.55 s each. An information-dependent acquisition function was active for 3 min to exclude the previously scanned parent ions. For the slower scan mode, four MS/MS scans (1.5 s each) per one MS scan (1 s) were performed. For LCQ and LTQ experiments, two MS/MS scans per one MS scan were performed in the automated gain control mode. The scan cycle was 1.19 s for one MS and 1.17 s for one MS/MS on average in LCQ and 0.17 s for one MS and 0.38 s for one MS/MS on average in LTQ. The scan range was m/z 350-1400 for QSTAR, LCQ, and LTQ.
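As a small illustration, the gradient program can be written as a piecewise-linear function of time. The sketch reads the gradient as 5 to 10% B over 5 min, 10 to 30% over 60 min, 30 to 100% over 5 min, and a 10-min hold at 100%; the cumulative breakpoints and the function name are assumptions made for this example.

```python
# (time in min, %B) breakpoints, assuming the segments run back to back.
GRADIENT = [(0.0, 5.0), (5.0, 10.0), (65.0, 30.0), (70.0, 100.0), (80.0, 100.0)]


def percent_b(t_min: float) -> float:
    """Return %B at time t_min by linear interpolation between breakpoints."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t_min <= t1:
            return b0 + (t_min - t0) / (t1 - t0) * (b1 - b0)
    # After the last breakpoint, hold the final composition.
    return GRADIENT[-1][1]


for t in (0, 5, 35, 67.5, 75):
    print(t, percent_b(t))
```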
Data Analysis
A Mascot version 1.9 database search engine (Matrix Sciences, London, UK) was used for protein identification against the Swiss-Prot protein database. The allowed number of missed cleavages was set to 1, and peptide scores to indicate identity were used for peptide identification without manual inspection of MS/MS spectra. MSQuant version 1.4a was customized for [13C6]Leu SILAC to determine the ion counts in chromatograms for absolute concentrations of proteins using known amounts of synthetic peptides. MSQuant is open source software developed by us and available at sourceforge.net.
Protein Abundance Determination
To calculate the number of observable peptides per protein, proteins were digested in silico, and the obtained peptide masses were compared with the scan range of the mass spectrometer. In addition, the expected retention times under our nano-LC conditions were calculated according to the procedure of Meek (25) and Sakamoto et al. (26) with our own coefficients based on 1500 peptides. Peptides that were too hydrophilic or hydrophobic were eliminated. An in-house program was written in PHP to calculate the peptide number and was used to export all data to Microsoft Excel. The program is freely accessible at xome.hydra.mki.co.jp:8080/bitt/common/Menu. Regarding the number of observed peptides per protein, three methods of counting were used, i.e. 1) counting unique parent ions, 2) counting unique sequences, and 3) counting unique sequences without partial modification and the overlap caused by missed tryptic cleavage. These numbers were exported from Mascot html files to Excel spreadsheets using the "Export All Peptides" function of MSQuant software.
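The counting of observable peptides can be sketched roughly as follows. This is not the in-house PHP program: it assumes standard trypsin specificity (cleavage after K or R, not before P), monoisotopic masses for the 20 standard residues, and charge states 1 to 3 when comparing against the m/z scan range, and it omits the retention-time filter because the fitted coefficients are not given in the text. All function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def digest(sequence: str, missed_cleavages: int = 0) -> set:
    """In silico tryptic digest: cleave after K/R unless followed by P."""
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = set()
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.add(sequence[bounds[i]:bounds[j]])
    peptides.discard("")
    return peptides


def count_observable(sequence: str, mz_range=(350.0, 1400.0), charges=(1, 2, 3)) -> int:
    """Count unique peptides with at least one charge state inside mz_range."""
    low, high = mz_range
    count = 0
    for pep in digest(sequence):
        mass = sum(MONO[aa] for aa in pep) + WATER
        if any(low <= (mass + z * PROTON) / z <= high for z in charges):
            count += 1
    return count


# Hypothetical sequence: "GK" is too small to fall in the scan range,
# "LYAEEGSTAVK" is not.
print(sorted(digest("GKLYAEEGSTAVK")))
print(count_observable("GKLYAEEGSTAVK"))
```

The resulting count would serve as the denominator of PAI; the numerator comes from the search results.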
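The PAI is defined as

$$\mathrm{PAI} = \frac{N_{\mathrm{obsd}}}{N_{\mathrm{obsbl}}} \qquad \text{(Eq. 1)}$$

where $N_{\mathrm{obsd}}$ and $N_{\mathrm{obsbl}}$ are the number of observed peptides per protein and the number of observable peptides per protein, respectively (17). The emPAI is defined as follows.

$$\mathrm{emPAI} = 10^{\mathrm{PAI}} - 1 \qquad \text{(Eq. 2)}$$

Thus, the protein contents in molar and weight fraction percentages are described as

$$\text{Protein content (mol \%)} = \frac{\mathrm{emPAI}}{\sum(\mathrm{emPAI})} \times 100 \qquad \text{(Eq. 3)}$$

$$\text{Protein content (weight \%)} = \frac{\mathrm{emPAI} \times M_r}{\sum(\mathrm{emPAI} \times M_r)} \times 100 \qquad \text{(Eq. 4)}$$

where $M_r$ is the molecular weight of the protein, and $\sum(\mathrm{emPAI})$ is the summation of emPAI values for all identified proteins. The entire procedure for emPAI calculation is shown in Supplemental Sheet 1.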
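A minimal sketch of the calculation defined above, taking the exponential to be base 10 (emPAI = 10^PAI - 1). The function names, the input layout, and the example proteins with their peptide counts and molecular weights are all hypothetical.

```python
def empai(n_observed: int, n_observable: int) -> float:
    """emPAI = 10**PAI - 1, with PAI = observed / observable peptides."""
    pai = n_observed / n_observable
    return 10 ** pai - 1


def protein_contents(proteins: dict) -> dict:
    """proteins maps name -> (n_observed, n_observable, molecular_weight).

    Returns emPAI plus molar and weight percentages for every protein.
    """
    values = {name: empai(obs, obsbl) for name, (obs, obsbl, _) in proteins.items()}
    total = sum(values.values())
    total_weighted = sum(values[name] * proteins[name][2] for name in proteins)
    return {
        name: {
            "emPAI": values[name],
            "mol_percent": values[name] / total * 100,
            "weight_percent": values[name] * proteins[name][2] / total_weighted * 100,
        }
        for name in proteins
    }


# Hypothetical input: (observed peptides, observable peptides, Mr).
example = {
    "protein_A": (10, 20, 50000.0),
    "protein_B": (3, 30, 120000.0),
    "protein_C": (1, 15, 25000.0),
}
for name, row in protein_contents(example).items():
    print(name, row)
```

Because both percentages are normalized over all identified proteins, adding or removing identifications changes every value, so results are comparable only within one identification list.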
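To evaluate the accuracy of the parameters, a deviation factor was defined as

$$\text{deviation factor} = \frac{\text{measured value}}{\text{estimated value}}$$

where measured values are larger than estimated values or

$$\text{deviation factor} = \frac{\text{estimated value}}{\text{measured value}}$$

where estimated values are larger than measured values.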
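The deviation factor is thus the larger of the two values divided by the smaller, so it is always at least 1. A short sketch, with a hypothetical function name and illustrative input values:

```python
def deviation_factor(measured: float, estimated: float) -> float:
    """Ratio of the larger to the smaller value; 1.0 means perfect agreement."""
    if measured <= 0 or estimated <= 0:
        raise ValueError("both values must be positive")
    return max(measured, estimated) / min(measured, estimated)


# The order of the arguments does not matter.
print(round(deviation_factor(10.6, 2.13), 2))  # 4.98
print(round(deviation_factor(2.13, 10.6), 2))  # 4.98
```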
DNA Microarray Analysis
HCT116-C9 cells were plated at 5.0 × 10^6 cells/dish in 10-cm-diameter dishes with 10 ml of the culture medium. After 24-h preincubation, the cells were treated for 12 h with 0.015% DMSO. Duplicate experiments were performed using Affymetrix HuGene FL arrays according to established protocols. Affymetrix GeneChip software was used to extract gene signal intensities, and two sets of data were grouped and averaged based on gene symbols.
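One plausible reading of the grouping step is to pool the signals from both replicate arrays by gene symbol and take the mean. The sketch below assumes that reading; the function name, the row layout, and the signal values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_by_symbol(*replicates):
    """Pool (gene_symbol, signal) rows from all replicates and average per symbol."""
    pooled = defaultdict(list)
    for rows in replicates:
        for symbol, signal in rows:
            pooled[symbol].append(signal)
    return {symbol: mean(signals) for symbol, signals in pooled.items()}


# Made-up signal intensities; a symbol may appear on more than one probe set.
replicate_1 = [("ACTB", 100.0), ("ACTB", 140.0), ("GAPDH", 200.0)]
replicate_2 = [("ACTB", 120.0), ("GAPDH", 220.0)]
averaged = average_by_symbol(replicate_1, replicate_2)
print(averaged)
```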
| RESULTS AND DISCUSSION |
|---|
Absolute Quantitation Using emPAI
Although PAI can estimate the abundance relationship between proteins, it cannot express the molar fraction directly. Therefore, we derived a new parameter, emPAI, that is the exponential form of PAI minus 1 (Equation 2) and that is directly proportional to protein content as shown in Fig. 1E. To calculate the absolute concentrations, total protein amounts were measured as weight by BCA assay, and the weight fractions of 46 proteins among 336 neuro2a proteins were calculated using Equation 4. As shown in Fig. 1F, the emPAI-based concentrations were highly consistent with the actual values (y = 0.973x, r = 0.93), and the deviation factors ranged from 1.03 to 4.98 with an average of 1.74 ± 0.79. The outlier at (x, y) = (10.6, 2.13) is clathrin heavy chain (CLH_RAT). Mouse clathrin is not in the current Swiss-Prot but in TrEMBL (Q68FD5_MOUSE), which was not used for protein identification. Q68FD5_MOUSE is not identical in sequence to CLH_RAT. It is possible that the number of observed peptides would increase using Q68FD5_MOUSE or other sequences instead of CLH_RAT, although Q68FD5_MOUSE needs more validation for Swiss-Prot entry. Note that these measures of confidence compare favorably with protein abundance by comparative gel staining and indeed with the Bradford assay itself used here to measure total protein amount (28). Furthermore, just as there are proteins known to stain well, the emPAI of certain proteins could also be adjusted in the future. In any case, the emPAI approach seems to provide a reasonably accurate estimate for comprehensive absolute quantitation.
Dependence of emPAI on Experimental Conditions
In this experiment, we used the fast MS and MS/MS cycle time on the QSTAR to maximize the number of MS/MS events. When a slower cycle was used, the deviation from the linear relationship between emPAI and the protein concentrations increased (Fig. 2, A and B) due to the random sampling effects mentioned above. This effect was more pronounced when an LCQ ion trap instrument was used (Fig. 2C) presumably because the limited trap capacity results in a biased peak selection for more abundant proteins, and indeed a larger deviation was observed for more abundant proteins. We used an LTQ, a linear ion trap instrument that has a higher capacity and a faster scan time when compared with the LCQ (29, 30), to evaluate the influence of the cycle time. The use of LTQ data improved the accuracy of emPAI in comparison to LCQ data (Fig. 2D). However, the faster scan cycle of LTQ compared with QSTAR did not provide better correlation between emPAI and protein abundance. This could be because of the limited capacity of LTQ to trap ions even in the linear configuration. We also evaluated the influence of sample complexity by using multidimensional chromatography (23, 31). As shown in Fig. 2E, SCX fractionation gave improvement in emPAI accuracy. To confirm the effect of the reduction of sample complexity on emPAI accuracy, we used both QSTAR and LTQ in combination with SCX fractionation and obtained in total 2752 identified proteins with 11,727 non-redundant peptides from neuro2a cells. The correlation between emPAI values of the 46 test proteins and their protein abundances was significantly improved as shown in Fig. 2F. Note that PAI values of 22 proteins of 46 proteins were more than one in this analysis, whereas only two proteins had PAI values of more than one in the QSTAR analysis without SCX fractionation. This result shows that under the current conditions emPAI was not saturated. However, it would be possible to saturate emPAI if some proteins are highly abundant. 
We observed this in our analysis of the malaria proteome where hemoglobin was extremely abundant because the samples were prepared from red blood cells (15). Extremely abundant proteins may furthermore affect the efficiency of protein identification because of ionization suppression and detector saturation as well as the limited loading capacity of LC columns. The removal of extremely abundant proteins is therefore required to improve the identification efficiency and can be achieved by gel-enhanced LC-MS (one-dimensional gel followed by slicing, digesting, and LC-MS analysis) as shown in our malaria proteome study or albumin depletion treatment for plasma proteome studies. Such a treatment will also remove the influence of emPAI saturation.
Application to Comprehensive Protein Expression Analysis
The emPAI is a convenient and easily obtained index that can be used to produce protein expression data from any LC-MS/MS runs. We applied this approach to obtain data for comparison with gene expression data in HCT116 human cancer cells. A DNA microarray provided expression data for 4971 genes, whereas a single LC-MS run provided 402 identified proteins based on 1811 peptides with unique sequences. Bridging gene symbols with protein accession numbers resulted in a total of 227 gene/protein pairs for the expression comparison study. A weak correlation was observed in Fig. 3 as expected from previous studies on yeast (19, 32). Interestingly, most of the outliers were ribosomal proteins. It is well known that, unlike prokaryotes such as Escherichia coli, mammalian cells regulate the expression levels of ribosomal proteins not only by transcription but also at the steps of transport of mRNA and translation and by degradation of excess amounts of proteins not associated with rRNA (33, 34). Accordingly, in a comparison study between gene and protein expression levels using emPAI for E. coli, we did not find such a deviation of ribosomal proteins.2 Although neither gene nor protein expression data are sufficiently accurate to discriminate a 10% difference, for instance, it is quite helpful to obtain a broad overview as shown above. We also note that the protein quantitation error of our simple emPAI is similar to or better than the error in determining mRNA expression in DNA microarrays.
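The bridging and comparison step can be sketched as a lookup from gene symbol to protein accession followed by a correlation over the matched pairs. Everything here is an assumption made for illustration: the function names, the identifiers and values, and the choice of a Pearson correlation on log10-transformed values, which the text does not specify.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def paired_log_values(mrna, protein, symbol_to_accession):
    """Keep genes present in both data sets; return log10 (mRNA, protein) lists."""
    xs, ys = [], []
    for symbol, accession in symbol_to_accession.items():
        if symbol in mrna and accession in protein:
            xs.append(math.log10(mrna[symbol]))
            ys.append(math.log10(protein[accession]))
    return xs, ys


# Made-up identifiers and values, for illustration only.
mrna_signal = {"GENE1": 100.0, "GENE2": 1000.0, "GENE3": 10000.0, "GENE4": 50.0}
protein_empai = {"ACC_1": 0.1, "ACC_2": 1.0, "ACC_3": 10.0}
bridge = {"GENE1": "ACC_1", "GENE2": "ACC_2", "GENE3": "ACC_3", "GENE4": "ACC_9"}

xs, ys = paired_log_values(mrna_signal, protein_empai, bridge)
print(len(xs), pearson_r(xs, ys))
```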
| Conclusions |
|---|
This emPAI approach was applied to multidimensional separation-MS/MS to extend the coverage of proteins. Further improvement would be possible by optimizing MS instrument-dependent parameters such as ionization dependence on m/z region. Because the emPAI index can be calculated with a simple script and does not require further experimentation in protein identification experiments, we suggest its routine use in the reporting of proteomic results.
| FOOTNOTES |
|---|
Published, MCP Papers in Press, June 14, 2005, DOI 10.1074/mcp.M500061-MCP200
1 The abbreviations used are: PAI, protein abundance index; emPAI, exponentially modified protein abundance index; SILAC, stable isotope labeling with amino acids in cell culture; SCX, strong cation exchange chromatography; HSA, human serum albumin.
2 Y. Ishihama, D. Frishman, and M. Mann, unpublished data.
* Work at LSFT and CEBI was supported by NEDO (New Energy and Industrial Technology Development Organization, Japan) and the Danish National Research Foundation, respectively.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
¶ To whom correspondence may be addressed. Tel.: 81-29-847-7192; Fax: 81-29-847-7614; E-mail: y-ishihama{at}hhc.eisai.co.jp
** To whom correspondence may be addressed. Tel.: 45-6550-2364; Fax: 45-6593-3929; E-mail: mann{at}bmb.sdu.dk