Molecular & Cellular Proteomics 3:896-907, 2004.
© 2004 by The American Society for Biochemistry and Molecular Biology, Inc.









From the
Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, P.O. Box B, Frederick, MD 21702-1201; and
Department of Neurological Surgery, University of Washington School of Medicine, Box 356470, Seattle, WA 98195
| ABSTRACT |
|---|
16% of all known mouse proteins. An evaluation of the number of false-positive identifications was undertaken by searching the entire MS/MS dataset against a database containing the sequences of over 12,000 proteins from archaea. This analysis allowed a systematic determination of the level of confidence in the identification of peptides as a function of SEQUEST cross correlation (Xcorr) and delta correlation (ΔCn) scores. Correlation charts were also constructed to show the number of unique peptides identified for proteins from specific classes. The results show that low-abundance proteins involved in signal transduction and transcription are generally identified by fewer peptides than high-abundance proteins that play a role in maintaining mammalian cellular structure and motility. The results presented here provide the broadest proteome coverage for a mammalian cell to date and show that MS-based proteomics has the potential to provide high coverage of the proteins expressed within a cell.
One of the major goals of proteomics is to develop technologies capable of measuring the dynamic nature of protein expression, protein interactions, and post-translational modifications as a time-dependent function of the cellular state (1). Accumulating this type of data requires many measurements; therefore, high-throughput protein characterization is essential. The first piece of information required is the identity of the proteins that are expressed within the cell under a given set of conditions. This information provides a foundation for understanding the proteins that are observable within a cell's proteome and provides insight into the components that differentiate cells that have identical genomes.
The traditional approach for fractionating and identifying proteins within complex proteome mixtures has been a combination of two-dimensional (2D)1 PAGE followed by MS or MS/MS analysis of the visualized protein spots. The bias of 2D-PAGE against proteins with extreme isoelectric points and molecular masses, as well as its difficulty in resolving membrane proteins, has been well documented (2). Not only are certain classes of proteins underrepresented by 2D-PAGE, but the subsequent identification of the separated proteins is also laborious and time-consuming (3). Alternative approaches for the identification of proteins circumvent the need for 2D gel fractionation and rely solely on multidimensional chromatographic separation of proteolytically digested proteins prior to MS/MS analysis (1, 4). While these solution-based methods do not provide a direct method for protein quantitation, they are routinely capable of providing well over 1,000 protein identifications in rapid (i.e. hours to days) fashion. Recent studies have used such solution-based strategies to identify 490 proteins in serum (5), 1,504 proteins in yeast (6), 2,528 proteins in rice (7), and, most recently, 2,415 proteins in Plasmodium (8).
We have applied a multidimensional fractionation method followed by MS/MS, resulting in the identification of over 15,000 unique peptides corresponding to at least 4,542 proteins within the proteome of mouse cortical neurons. The raw data were analyzed to provide a statistical measure of the confidence in the protein identifications. In addition, plots were constructed to determine whether a correlation exists between a protein's abundance and the number of unique peptides identified for that protein. The results show that proteins anticipated as being in high abundance, such as structural proteins, are typically identified by the largest number of unique peptides. The smallest number of unique peptides was associated with proteins of low abundance such as transcription factors and signal transducers. Although these data are qualitative in the strictest sense, the results obtained from the present study can be used to ascertain information about the protein abundances in complex mixtures and also to identify novel proteins that have not previously been shown to exist in neural cells.
| EXPERIMENTAL PROCEDURES |
|---|
Strong Cation Exchange (SCX) Liquid Chromatographic Fractionation
The cortical neuron protein digestate was dissolved in 25% ACN and 0.1% TFA and loaded onto a 1-mm inner diameter × 150-mm SCXLC column (PolyLC, Columbia, MD) that was pre-equilibrated with 25% ACN delivered by an Agilent 1100 capillary LC system (Agilent Technologies, Palo Alto, CA). The peptides were eluted by a gradient generated from mobile phase A (25% ACN in water) and mobile phase B (25% ACN with 0.5 M ammonium formate, pH 3) over 96 min. Ninety-six fractions were collected for microcapillary LC (µLC)-MS/MS analysis.
Reversed-phase (RP) µLC-MS/MS of SCX Fractions
Ten-centimeter-long µRPLC-ESI columns were coupled online with an ion trap (IT) MS (LCQ Deca XP; Thermo Finnigan, San Jose, CA) to analyze each SCXLC fraction. To construct the µRPLC-ESI columns, 75-µm inner diameter fused-silica capillaries (Polymicro Technologies, Phoenix, AZ) were flame-pulled to construct a 10-cm fine inner diameter (i.e. 5–7 µm) tip against which Luna C18 (2) (3-µm diameter, 100 Å pore size) (Phenomenex, Torrance, CA) RP particles were slurry-packed using a slurry packing pump (Model 1666; Alltech Associates, Deerfield, IL). The columns were connected via a stainless steel union to an Agilent 1100 capillary LC system (Agilent Technologies), which was used to deliver mobile phases A (0.1% formic acid in water) and B (0.1% formic acid in ACN). After loading one-third of the content of each SCXLC fraction, the peptides were eluted at a flow rate of 300 nl/min using a step gradient of 2–40% solvent B for 120 min and 40–85% solvent B for 30 min. The IT-MS was operated in a data-dependent MS/MS mode using a normalized CID energy of 30%. The voltage and temperature for the capillary of the ion source were set at 10 V and 180 °C, respectively.
Cross-database Search and MS Informatics for Global Proteome Characterization
The raw MS/MS data were searched using SEQUEST (10) against a mouse proteome database from the National Center for Biotechnology Information (www.ncbi.nih.gov) (mouse database I, 12,000 protein entries) and a mouse proteome database from the European Bioinformatics Institute (EBI) (www.ebi.ac.uk/proteome/index.html) (mouse database II, 20,624 protein entries). The raw MS/MS data were also searched against an Archaean protein database (12,038 protein entries) from EBI, which consists of the protein sequences from five Archaean species (i.e. Aeropyrum pernix, Archaeoglobus fulgidus, Pyrobaculum aerophilum, Sulfolobus tokodaii, and Thermoplasma volcanium). For the SEQUEST analysis, the peptide mass tolerance was set at 2.5 Da and the fragment ion tolerance at 0.5 Da. A tryptic enzyme restriction with a maximum of two internal missed cleavage sites was used. No residues (i.e. cysteine and methionine) were considered as modified in the database search. Based on the cross-database search, the SEQUEST criteria for confident peptide identification, namely the cross correlation (Xcorr) and delta cross correlation (ΔCn) scores, were set and applied to the global mouse proteome identification by searching a larger mouse proteome database downloaded from EBI (mouse database III, 28,437 protein entries).
An in silico tryptic digestion (accounting for up to two missed cleavages) of the mouse protein fasta database (mouse database III) was performed to create a reference table that contains each tryptic fragment for each protein along with the corresponding Swiss-Prot accession numbers. The filtered TurboSEQUEST nonredundant peptide identification table (referred to as the ID table) was queried against the reference table using Microsoft Access. Because a single peptide in the ID table may correlate to more than one entry in the reference table, and therefore to more than one protein, it is necessary to count the number of times a peptide within the ID table correlates to a peptide in the reference table in order to assess the extent to which a given peptide uniquely identifies a given protein. Hence, a peptide within the ID table that correlates to only one entry in the reference table is unique to a single protein reference and is, by definition, called a protein-specific peptide tag (PPT). The final list of unique proteins identified in this work (Supplemental Table I) is derived solely from PPTs, although other peptides that were identified and may correspond to these unique proteins have also been listed. To clarify which of these peptides uniquely identify a protein, the number of times a peptide is linked to a Swiss-Prot protein reference is included within Supplemental Table I (i.e. a 1 refers to a peptide that is distinct within the entire mouse database, whereas any number >1 means the peptide corresponds to more than one protein and cannot uniquely identify these proteins).
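The reference-table lookup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation (they used Microsoft Access); the function names, the toy sequences, and the accessions `P1` and `P2` are hypothetical, and the cleavage rule (cut after every K or R, with no proline exception) is an assumption made for simplicity.

```python
from collections import defaultdict


def tryptic_peptides(sequence, max_missed=2):
    """Return all tryptic peptides of a sequence, allowing missed cleavages."""
    # Assumed rule: cleave after every K or R (no proline exception).
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    # Join up to max_missed + 1 consecutive fragments.
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.add("".join(fragments[i:j + 1]))
    return peptides


def build_reference_table(proteins):
    """Map each in silico peptide to the set of accessions containing it."""
    table = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            table[peptide].add(accession)
    return table


def count_matches(id_table, reference):
    """For each identified peptide, count the proteins it maps to."""
    return {peptide: len(reference.get(peptide, ())) for peptide in id_table}


# Toy data (hypothetical accessions and sequences).
proteins = {"P1": "AAKCCRDDKEE", "P2": "GGKCCRHH"}
reference = build_reference_table(proteins)
counts = count_matches(["AAK", "CCR", "GGKCCR"], reference)
# A count of 1 marks a protein-specific peptide tag (PPT).
ppts = [peptide for peptide, n in counts.items() if n == 1]
```

In this toy case `CCR` occurs in both sequences, so it receives a count of 2 and is not a PPT, while `AAK` and `GGKCCR` each map to a single protein.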
Immunostaining
Mice were perfused transcardially, under deep Nembutal anesthesia, first with heparinized saline followed by 4% paraformaldehyde in 0.1 M phosphate buffer (pH 7.4). The brain was immediately removed, post-fixed in the same fixative for 4 h, and cryoprotected in 10% and subsequently 30% sucrose in phosphate buffer. Parasagittal frozen sections were cut at 30 µm on a sliding microtome. After blocking in PBS containing 5% goat serum and 1% BSA, free-floating sections were incubated with the polyclonal pescadillo antibody (0.5 µg/ml) for 60 h at 4 °C. Sections were washed four times in PBS, including a wash with 1% H2O2 to quench endogenous peroxidase activity, and then incubated with a biotinylated goat anti-rabbit IgG secondary antibody (2 µg/ml; Jackson ImmunoResearch Laboratories, West Grove, PA) for 24 h at 4 °C. Immunoreactivity was visualized by incubating with avidin-biotinylated peroxidase complexes (Vectastain Elite ABC kit; Vector Laboratories, Burlingame, CA) overnight at 4 °C, followed by color development with diaminobenzidine as a peroxidase substrate.
Micrographs were taken using an Axiovert 100 inverted microscope (Carl Zeiss Microimaging, Thornwood, NY) with a cooled CCD camera (Cooke Corp, Auburn Hills, MI) and Slidebook image analysis software (Intelligent Imaging Innovations, Denver, CO). Immunostaining of samples requiring a direct comparison was done in single runs, and the subsequent processing of images was performed in an identical way for individual photographs using Slidebook and Photoshop (Adobe, San Jose, CA).
| RESULTS |
|---|
90% confidence for peptide identification. While [M+H]+ and [M+3H]3+ charged species are typically produced by peptides with low and high molecular masses (Mr), respectively, [M+2H]2+ peptides can span a wide mass range. Therefore, the effect of peptide molecular mass of [M+2H]2+ peptides on the Xcorr threshold required to achieve a 90% confidence level in peptide identification was evaluated. The peptides were arbitrarily divided into two sets, one set with molecular mass <1,200 Da and the other ≥1,200 Da. A plot of the probability of positive identification of each set of peptides versus Xcorr thresholds (Fig. 4B) demonstrates that higher Xcorr cutoffs are required for high Mr peptides to achieve the same confidence level for peptide identification as compared with that required for lower Mr peptides. While the Xcorr cutoff of 2.2 is required for identifying peptides with Mr < 1,200 Da at confidence level of 90%, the Xcorr value of 2.5 is needed for the peptides with Mr ≥ 1,200 Da at the same confidence level.
The above investigations were conducted without consideration of delta Xcorr (ΔCn) value cutoff threshold, another criterion commonly used in SEQUEST for peptide identification. The effect of ΔCn thresholds on the confidence in peptide identification for [M+2H]2+ peptides at different Xcorr cutoffs is shown in Fig. 4C. When Xcorr thresholds are set at a low level (i.e. 1.5 and 1.9), slight changes in the ΔCn threshold (when ΔCn<0.3) have a large impact on the confidence in peptide identification. As the Xcorr threshold is increased, however, the contribution of ΔCn is of much less impact on the confidence of any of the identifications. For example, the slope of the line between ΔCn values of 0 and 0.1 is 165 when using a Xcorr threshold of 1.5, but approaches zero when the Xcorr threshold is increased to 2.8. Therefore at sufficiently high Xcorr, the confidence level for a positive identification becomes less dependent on the ΔCn value. Regardless of the Xcorr thresholds, however, confidence of any of the identifications exceeds 90% when the ΔCn cutoff is above 0.25.
The parameter Xcorr measures the extent to which an experimental MS/MS spectrum corresponds to a mass and theoretical MS/MS spectrum of a peptide within a given proteomic database, while the ΔCn measures how far the Xcorr value of the first (top) candidate peptide is from that of the second candidate peptide. Based on the analyses presented, we determined that utilizing Xcorr thresholds for tryptic peptide identification of 2.1 for [M+H]+ peptides, 2.2 for [M+2H]2+ peptides with Mr < 1,200 Da, 2.5 for [M+2H]2+ peptides with Mr ≥ 1,200 Da, 2.9 for [M+3H]3+ peptides, and 0.08 for the ΔCn cutoff results in a 95% confidence in peptide identification. Utilizing the parameters described, a total of 15,300 unique tryptic peptides were identified from primary cultures of cortical neurons from a total of 33 µg of sample when the 96 datasets were searched against the mouse database with 28,437 protein entries. A histogram of the total number of peptides identified based on their charge state and Xcorr score is shown in Fig. 5. Of the 26,566 total (redundant) number of peptides identified, 7.5% were identified as being from [M+H]+ peptides, 71.7% were from [M+2H]2+ peptides, and 20.8% were from [M+3H]3+ peptides. This classification shows that a large percentage of the peptides were identified with Xcorr values much greater than the minimum values required to achieve a 90% confidence level for correct identification. For example, 67.6% of the [M+2H]2+ peptides were identified with Xcorr values greater than 3.1 and 61.5% of the [M+3H]3+ peptides were identified with Xcorr values greater than 3.5. In addition, two-thirds of all of the peptides identified had Xcorr values at least 25% greater than the minimal SEQUEST parameter values used for peptide identification (i.e. at least 3.1 for [M+2H]2+ ions greater than 1,200 Da in mass).
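The charge- and mass-dependent acceptance criteria listed above can be written as a small filter. The following Python sketch is illustrative only: the function names are hypothetical, and two details are assumptions not stated in the text, namely that the thresholds are inclusive (a score equal to the cutoff passes) and that charge states other than 1+, 2+, and 3+ are rejected.

```python
DELTA_CN_CUTOFF = 0.08   # delta-Cn cutoff from the text
MASS_SPLIT_DA = 1200.0   # mass boundary used for [M+2H]2+ peptides


def xcorr_threshold(charge, mass_da):
    """Return the Xcorr cutoff for a charge state and peptide mass, or None."""
    if charge == 1:
        return 2.1
    if charge == 2:
        return 2.2 if mass_da < MASS_SPLIT_DA else 2.5
    if charge == 3:
        return 2.9
    return None  # assumption: other charge states are not accepted


def passes_filter(charge, mass_da, xcorr, delta_cn):
    """Apply the Xcorr and delta-Cn criteria to one peptide identification."""
    threshold = xcorr_threshold(charge, mass_da)
    if threshold is None:
        return False
    # Assumption: cutoffs are inclusive.
    return xcorr >= threshold and delta_cn >= DELTA_CN_CUTOFF
```

For example, a doubly charged peptide with an Xcorr of 2.3 and a ΔCn of 0.1 is accepted at 1,100 Da but rejected at 1,300 Da, because the heavier peptide must reach an Xcorr of 2.5.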
6% originated from the repeated analysis of the same peptides eluting over a long time period in a single RP gradient (longer than the time window of dynamic exclusion). Approximately another 20% were contributed by peptide content overlap between two adjacent SCXLC fractions. The peptide overlap between one SCXLC fraction and the second adjacent fraction was only 24%. This low overlap between SCXLC fractions suggests that the peptides were not over-fractionated when they were separated by SCXLC into 96 fractions, and this highly efficient fractionation enabled more peptides to be analyzed by MS/MS and increased the overall dynamic range.
Proteins Identified in Mouse Cortical Neurons
Although 15,300 nonredundant peptides were identified in this study, not all of them could be uniquely assigned to a single protein. A total of 3,590 proteins (Supplemental Table I) were definitively identified from 12,839 peptides with at least one peptide unique to a single protein, which was defined as a PPT (described in detail in "Experimental Procedures"). A histogram illustrating the number of unique peptides and proteins identified in the various SCXLC fractions is shown in Fig. 6, along with the SCXLC fluorescence chromatogram. However, none of the remaining 2,461 peptides could be assigned to a single protein. These peptides resulted in 2,322 protein identifications according to the Swiss-Prot accession number. We found that some of these proteins are actually redundant. To address this issue, cluster analysis was conducted to group the proteins together by the peptides they share with each other. This analysis resulted in 952 distinct protein clusters (each was assigned a ProCluster number; Supplemental Table II). As shown in Fig. 7, protein cluster 1 consists of four accession numbers; Q9QXE7 is the primary accession number and Q8BMM0 is the secondary accession number of the same protein, Transducin β-like 1X protein. This protein has one identified peptide, HQEPVYSVAFSPDGK, distinct from another protein within the cluster, which also has two accession numbers. Hence this cluster may contain two proteins. Similarly, cluster 2 contains various actin isoforms, with some identified peptides shared among two or more isoforms. In the protein database, proteins with similar sequences, protein fragments, mutations, and protein isoforms in the same or different cell types all account for some identified peptides not being unique to a single protein with a distinct accession number. The number of protein clusters identified in this study is preliminary; however, it represents the minimum number of proteins that could be identified. The issue described here is even more difficult to tackle in quantitative proteome studies, where further statistical analysis is necessary to achieve accurate quantitation of a single protein.
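The text does not specify how the cluster analysis was carried out. One simple way to group proteins by shared peptides, shown here purely as an assumed illustration, is to treat any two accessions that share a peptide as linked and to report the connected components. The function name, peptide labels, and accessions below are hypothetical.

```python
def cluster_proteins(peptide_to_proteins):
    """Group accessions into clusters linked by at least one shared peptide."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for accessions in peptide_to_proteins.values():
        ordered = sorted(accessions)
        if not ordered:
            continue
        find(ordered[0])  # register accessions that share with no one
        for other in ordered[1:]:
            union(ordered[0], other)

    clusters = {}
    for accession in list(parent):
        clusters.setdefault(find(accession), set()).add(accession)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: each shared peptide maps to the accessions that contain it.
shared = {
    "PEPTIDE_A": {"A1", "A2"},
    "PEPTIDE_B": {"A2", "A3"},
    "PEPTIDE_C": {"B1", "B2"},
}
result = cluster_proteins(shared)
```

Here `A1` and `A3` share no peptide directly but land in the same cluster through `A2`, which is why a cluster count gives only a lower bound on the number of distinct proteins.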
46% of the expressed proteome was identified in this study. This approximation is obviously high because the calculation does not take into account the presence of modified (both post-transcriptionally and post-translationally) proteins. This calculation illustrates, however, that technology has advanced to the point of providing significant coverage of a proteome's protein constituents. Of importance in global proteomic analysis is the ability to identify proteins from every compartment within the cell. This need is particularly acute for membrane proteins because of solubility issues with this class of proteins. In this study, almost 29% of the identified proteins were classified as membranous by gene ontology. This compares favorably with genomic analysis that predicts 20–30% of the genome encodes membrane proteins. In addition, a study of mouse brain homogenates that employed an enzymatic digestion strategy for the identification of membrane proteins identified about 28% of the proteins in their mixture as membranous (12). The technology used in this study therefore does not appear to be biased against membrane proteins and provides very good coverage of proteins from all cellular compartments.
Another indicator of the bias of the survey presented in this study is the number of peptides identified per protein from particular classes. The overall distribution of the number of peptides identified per protein is shown in Fig. 8A. Approximately 61% of the proteins were identified by two or more peptides. A comparison of the number of peptides identified per intracellular and membrane protein is shown in Fig. 8, B and C. Approximately 65% of the intracellular proteins were identified by two or more peptides, while 57% of the membrane proteins were identified by two or more peptides. Overall, these plots do not show a striking bias against the identification of membrane proteins.
A significant benefit gained from performing a global proteomic analysis described in this study is the opportunity to identify proteins not previously associated with a particular cell type or biological process. One example that illustrates this point is the identification of the pescadillo protein in the neuronal proteome (Supplemental Table I). Pescadillo is a unique nucleolar protein involved in ribosome biogenesis and cell cycle control. It was recently identified as a protein expressed at abnormally high levels in malignant human brain tumors (13). The protein has not previously been characterized in neurons. To validate the proteomics result that was obtained using cultured postnatal cortical neurons, pescadillo expression was evaluated in adult mouse brain by immunostaining. Validating the proteomics finding, intense immunoreactivity was detected in neurons in all brain regions examined, while glial cells showed only weak immunoreactivity. Analysis of the hippocampus was particularly instructive as it contains a defined layer of pyramidal neurons surrounded by areas where neurons are very scarce, and glial cells account for the majority of cells. As seen in Fig. 10, strong nuclear immunoreactivity was predominantly localized to neurons in the CA1 pyramidal cell layer and to displaced pyramidal neurons and/or interneurons (arrows). Glial cells outside the CA1 pyramidal cell layer (arrowheads) and in the corpus callosum, a fiber tract lacking neurons, were weakly immunostained. Thus, it appears that pescadillo is highly expressed in neurons consistent with its identification in the neuronal proteome. Because it is not clear what role a cell cycle regulator would play in post-mitotic neurons, this finding provides fertile new ground for future study.
| DISCUSSION |
|---|
7,800 proteins and 350,000 peptides would be required for a completely comprehensive characterization. With the paucity of coverage of most proteins in a proteome obtainable using current MS-based technologies, the value of such studies as presented here needs to be considered. As shown in this study, pescadillo was identified by only one peptide (see Supplemental Fig. 1 for the MS/MS spectrum of this peptide); however, its presence in mouse neurons was validated using immunostaining. Pescadillo was identified in zebrafish less than a decade ago (20), and has only recently been shown to play a role in cell cycle progression (13). Prior to this study, pescadillo had not been shown to be present within brain tissue. Proteomic studies such as this can serve as an invaluable resource to researchers who have traditionally focused on a single (or handful of) protein(s) with the goal of in-depth characterization. An excellent example is vitamin D. The dietary need for vitamin D to prevent rickets was first shown in the 1920s (21), and its function in bone mineralization and calcium mobilization soon after (22). It was not until the 1980s, however, that evidence for a function for vitamin D within brain tissues was established (23). This discovery was made through autoradiography and sophisticated purification of the vitamin D receptor followed by the design of an activity assay. With the advent of proteomic studies of this type, investigators now have the opportunity to peruse databases to search for the tissue-specific distribution of their protein(s) of interest. Downstream validation of the identified peptides, however, will continue to be a critical issue in proteomics. Proteomics is protein biochemistry done on a grand scale. Studies such as that presented here provide a "30,000-foot view" of what proteins are present within a particular cell type.
The result is a foundational database that can be compared with those generated from other cell types or organisms to decipher which proteins function to confer the specific attributes or properties of any particular cell. Because proteomics is a biological science, important results collected from such studies require validation. This validation needs to be considered on an individual protein basis, in light of what its ultimate value will be to the understanding of the cell's character or function. So in a sense, global proteomic studies serve to filter the list of proteins that may be expressed by the organism's genome to those that actually are expressed by the genome.
| FOOTNOTES |
|---|
Published, MCP Papers in Press, June 30, 2004, DOI 10.1074/mcp.M400034-MCP200
1 The abbreviations used are: 2D, two dimensional; RP, reversed-phase; SCX, strong cation exchange; µLC, microcapillary LC; IT, ion trap; PPT, protein-specific peptide tag.
* By acceptance of this article, the publisher or recipient acknowledges the right of the United States Government to retain a nonexclusive, royalty-free license and to any copyright covering the article. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organization imply endorsement by the United States Government. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Contract NO1-CO-12400 and by grants from the National Institutes of Health NS35533 and NS42699 (to R. S. M.).
S The on-line version of this manuscript (available at http://www.mcponline.org) contains supplemental material.
¶ To whom correspondence should be addressed: SAIC-Frederick, Inc., National Cancer Institute at Frederick, P.O. Box B, Building 469, Room 160, Frederick, MD 21702. Tel.: 301-846-7286; Fax: 301-846-6037; E-mail: veenstra{at}ncifcrf.gov.