Molecular & Cellular Proteomics 1:947-955, 2002.
© 2002 by The American Society for Biochemistry and Molecular Biology, Inc.




Biological Sciences Department, Pacific Northwest National Laboratory, Richland, Washington 99352
Environmental and Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352
| ABSTRACT |
|---|
Historically, two-dimensional PAGE has been the primary method of separation and comparison for complex protein mixtures. This method has been critical in developing our understanding of the complexity and variety of proteins contained in cells and bodily fluids. Two-dimensional PAGE has been used to analyze serum and plasma (the unclotted parent fluid of serum) (6-13). Although impressive improvements in two-dimensional PAGE technologies have occurred in recent years, limitations remain. Two-dimensional PAGE is labor-intensive, requires relatively large sample quantities, is poorly reproducible, has a limited dynamic range for protein detection, and has difficulties in detecting proteins with extremes in molecular mass and isoelectric point (14). To address these limitations several types of mass spectrometry, in conjunction with various separation and analysis methods, are increasingly being adopted for proteomic measurements (15-22).
One of the driving forces in proteomics is the discovery of biomarkers, proteins that change in concentration or state in association with a specific biological process or disease. Determination of concentration changes, relative or absolute, is fundamental to the discovery of valid biomarkers. The presence of higher abundance proteins (greater than mg/ml in serum) interferes with the identification and quantification of lower abundance proteins (lower than ng/ml in serum). Other methods such as two-dimensional PAGE have been used to demonstrate that the removal or separation of high abundance proteins enables greatly improved detection of lower abundance proteins (10, 11, 17, 23). The necessity of this removal or separation is also illustrated by noting that many proteins found useful as biomarkers for malignant and non-malignant disease (e.g. C-reactive protein, osteopontin, and prostate-specific antigen) are below 10 ng/ml, a value that is at least 7-8 orders of magnitude less than the most abundant serum proteins (1). Thus, the dynamic range typified by traditional proteomic methods is inadequate to allow for detection of these lower abundance serum proteins, or biomarkers, without effective removal or separation of the high abundance proteins.
One problem associated with any protein separation technique is that low abundance proteins may be removed along with the abundant species (24). Albumin is a protein of very high abundance in serum (35-50 mg/ml) that would be a prime candidate for complete selective removal prior to performing a proteomic analysis of lower abundance proteins. However, albumin is a transport protein in blood serum that binds a large variety of compounds including hormones, lipoproteins, and amino acids (1, 25, 26). Thus, removal of albumin from serum may also result in the specific removal of low abundance cytokines, peptide hormones, and lipoproteins of interest.
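The scale of this dynamic range problem can be checked with a short calculation. The sketch below is illustrative only: it takes the albumin range quoted above (35-50 mg/ml) and, as an assumed example, a low abundance protein at 1 ng/ml. The function name and the example concentration are hypothetical, not values taken from this study.

```python
import math


def orders_of_magnitude(high_g_per_ml: float, low_g_per_ml: float) -> float:
    """Return log10 of the concentration ratio between two proteins."""
    if high_g_per_ml <= 0 or low_g_per_ml <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(high_g_per_ml / low_g_per_ml)


ALBUMIN_RANGE_G_PER_ML = (35e-3, 50e-3)  # 35-50 mg/ml, from the text
EXAMPLE_LOW_G_PER_ML = 1e-9              # 1 ng/ml, assumed for illustration

for albumin in ALBUMIN_RANGE_G_PER_ML:
    span = orders_of_magnitude(albumin, EXAMPLE_LOW_G_PER_ML)
    print(f"{albumin * 1e3:.0f} mg/ml vs 1 ng/ml: {span:.1f} orders of magnitude")
```

Under these assumptions the gap falls between 7 and 8 orders of magnitude, which is the range a single analysis would have to cover if albumin were left in the sample.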
Immunoglobulins, or antibodies, are also abundant proteins in serum that function by recognizing "foreign" antigens in blood and initiating their destruction. To recognize this enormous variety of antigens present in blood, immunoglobulins contain variable regions (1, 25, 27). These variable regions are a source of random peptide sequence in serum that can complicate protein identifications from peptide sequences. Therefore, with immunoglobulins binding foreign materials and the random nature of sequences from their variable regions, removal of immunoglobulins is important for a proteomic analysis of serum.
The purpose of this investigation was to establish new preparative methods to remove or separate high abundance serum proteins and to apply new proteomic approaches that increase the dynamic range available for the identification and characterization of serum proteins. These methods include the use of protein A/G covalently bound to acrylamide beads to selectively remove immunoglobulins, described earlier as a significant source of sequence variability found in serum. Further, these methods include the separation of trypsin-digested peptides prior to mass spectrometric analysis using both strong cation exchange (SCX)1 chromatography and capillary gradient reversed-phase liquid chromatography. This investigation identifies a large number of proteins (490) from a single (submilliliter) serum sample and further provides the foundation for future studies with clinically important disease states.
| EXPERIMENTAL PROCEDURES |
|---|
Depletion of Serum Immunoglobulins and Trypsin Digestion
The immunoglobulins (Igs) were depleted by affinity adsorption chromatography using protein A/G. 500 µl of serum was diluted with an equal volume of 20 mM sodium phosphate, pH 8.0 and added to UltraLink Immobilized protein A/G beads (2:1, v/v) (Pierce) that had been equilibrated with 20 mM sodium phosphate, pH 8.0. This mixture was incubated with gentle rocking for 2 h at 25 °C. Immunoglobulin-depleted serum was separated from the protein A/G beads by centrifugation. The beads were washed three times with 5 volumes of PBS (150 mM NaCl, 10 mM sodium phosphate, pH 7.3), and the washes were pooled with the immunoglobulin-depleted serum. The diluted immunoglobulin-depleted serum sample was then dialyzed into 10 mM NH4HCO3, 5% acetonitrile, pH 7.5, digested with trypsin (Promega, Madison, WI) at a 1:50 (w/w) ratio for 2 h at 37 °C, and lyophilized.
Strong Cation Exchange Separation of Immunoglobulin-depleted Serum Peptides
Lyophilized, immunoglobulin-depleted serum peptides were resuspended in 2 ml of 75% 10 mM ammonium formate, 25% acetonitrile, pH 3.0 with formic acid. The sample was centrifuged to remove insoluble debris and then separated using an LC gradient ion exchange system consisting of a quaternary gradient pump (ThermoSeparations P4000, San Jose, CA) equipped with a polysulfoethyl A column (5 µm, 300 Å, PolyLC, Columbia, MD). Mobile phase A consisted of 75% 10 mM ammonium formate, 25% acetonitrile, pH 3.0 with formic acid, and mobile phase B was 75% 200 mM ammonium formate, 25% acetonitrile, pH 8.0. The column was initially loaded (2-ml injection loop) and equilibrated for 5 min with 0% B. Peptides were eluted using a linear gradient of 0-100% B over 30 min, and the column was subsequently washed at 100% B for an additional 25 min, all at a flow rate of 4 ml/min. The column effluent was monitored at 280 nm with a Linear 200 UV detector (Micro-Tech Scientific, Sunnyvale, CA), and a total of 120 fractions were collected at 30-s intervals using a FRAC-100 (Amersham Biosciences). Collected fractions were lyophilized and stored at -80 °C for reversed-phase LC/MS/MS analysis.
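The SCX program described above (5 min at 0% B, a 30-min linear gradient to 100% B, and a 25-min wash) totals 60 min, which matches 120 fractions of 30 s each. The following sketch maps a fraction number to its nominal mobile phase composition. It assumes that fraction collection starts at t = 0 and ignores system delay volume and the accompanying pH change; neither assumption is stated in the text, and all function names are hypothetical.

```python
from typing import Tuple


def scx_percent_b(t_min: float) -> float:
    """Percent mobile phase B at time t (min) for the SCX program in the text."""
    if not 0 <= t_min <= 60:
        raise ValueError("the program runs from 0 to 60 min")
    if t_min <= 5:      # load and equilibrate at 0% B
        return 0.0
    if t_min <= 35:     # linear 0-100% B over 30 min
        return (t_min - 5) / 30 * 100
    return 100.0        # wash at 100% B for 25 min


def ammonium_formate_mM(percent_b: float) -> float:
    """Nominal ammonium formate concentration (A = 10 mM, B = 200 mM)."""
    return 10 + (200 - 10) * percent_b / 100


def fraction_window_min(fraction: int, interval_s: float = 30.0) -> Tuple[float, float]:
    """Start and end time (min) of a 1-based fraction, assuming collection starts at t = 0."""
    if not 1 <= fraction <= 120:
        raise ValueError("fractions are numbered 1 to 120")
    start = (fraction - 1) * interval_s / 60
    return start, start + interval_s / 60


start, end = fraction_window_min(21)
midpoint_b = scx_percent_b((start + end) / 2)
print(f"fraction 21: {start}-{end} min, {midpoint_b:.1f}% B, "
      f"{ammonium_formate_mM(midpoint_b):.1f} mM ammonium formate")
```

Because the delay between the pump and the fraction collector is not reported, the computed compositions should be read as nominal values rather than the actual conditions under which a given fraction eluted.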
Reversed-phase Separation and LCQ Ion Trap Analysis
Reversed-phase separation was performed with an Agilent 1100 capillary high pressure liquid chromatography system with a 60-cm capillary column (150-µm inner diameter x 360-µm outer diameter, Polymicro Technologies, Phoenix, AZ) packed with 5-µm Jupiter C18 particles (Phenomenex, Torrance, CA). Mobile phase A consisted of water and 0.1% formic acid, and mobile phase B consisted of acetonitrile and 0.1% formic acid. SCX fractions were dissolved in 50 µl of water, 0.1% formic acid. Peptides were injected on the column in 8 µl at a flow rate of 1.8 µl/min, and the column was re-equilibrated with 5% B for 20 min. Peptides were eluted with a linear gradient from 5 to 70% B over 80 min. The capillary column was interfaced to an LCQ Deca XP ion trap mass spectrometer (ThermoFinnigan, San Jose, CA) using electrospray ionization.
The mass spectrometer was configured to balance duty cycle length against the quality of data acquired by alternating between a single full MS scan and three MS/MS scans on the three most intense precursor masses (as determined by Xcalibur mass spectrometer software in real time) from the single parent full scan. Dynamic mass exclusion windows were used and varied from 3 to 9 min. In addition, MS spectra for all samples were measured with an overall mass/charge (m/z) range of 400-2000. Fractions 21, 34, 39, 46, and 53, which contained high peptide concentrations, were re-analyzed three times using overlapping m/z ranges of 500-1050, 1000-1550, and 1500-2000, respectively. These segmented mass range analyses also utilized static mass exclusion lists that removed m/z precursors corresponding to the 20 most abundant peptides that were observed in the initial unsegmented analysis.
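The acquisition logic described above (one full scan, then MS/MS on the three most intense precursors, with dynamic exclusion) can be sketched as follows. This is a simplified illustration, not the instrument software's implementation: the function name, the whole-unit m/z matching, and the example peaks are all hypothetical.

```python
from typing import Dict, List, Tuple


def pick_precursors(
    full_scan: List[Tuple[float, float]],  # (m/z, intensity) peaks of one MS scan
    scan_time_min: float,
    excluded_until: Dict[float, float],    # rounded m/z -> time the exclusion expires
    top_n: int = 3,
    exclusion_min: float = 3.0,            # the text reports windows of 3 to 9 min
    mz_range: Tuple[float, float] = (400.0, 2000.0),
    mz_decimals: int = 0,                  # hypothetical matching precision
) -> List[float]:
    """Choose up to top_n most intense, non-excluded precursors and exclude them."""
    chosen: List[float] = []
    for mz, _intensity in sorted(full_scan, key=lambda p: p[1], reverse=True):
        if not mz_range[0] <= mz <= mz_range[1]:
            continue  # outside the scanned m/z range
        key = round(mz, mz_decimals)
        if excluded_until.get(key, float("-inf")) > scan_time_min:
            continue  # still on the exclusion list
        chosen.append(mz)
        excluded_until[key] = scan_time_min + exclusion_min
        if len(chosen) == top_n:
            break
    return chosen


excluded: Dict[float, float] = {}
scan = [(652.3, 9e5), (801.4, 7e5), (455.2, 5e5), (1203.6, 4e5), (350.1, 1e6)]
print(pick_precursors(scan, 10.0, excluded))  # first cycle
print(pick_precursors(scan, 10.5, excluded))  # next cycle, earlier picks excluded
```

In this sketch, a static exclusion list such as the one used for the segmented analyses could be modeled by pre-loading `excluded_until` with entries that expire at `float("inf")`, and a segmented run by passing a narrower `mz_range` such as `(500.0, 1050.0)`.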
SEQUEST Analysis of Peptides
Tandem mass spectra were analyzed by SEQUEST (Bioworks 2.0, ThermoFinnigan) (16, 29-32), which performs its analyses by cross-correlating experimentally acquired mass spectra with theoretical idealized mass spectra generated from a database of protein sequences. These idealized spectra are weighted largely with b and y fragment ions, i.e. fragments that result from cleavage of the amide bond and contain the N and C termini, respectively. For these analyses, no enzyme rule restrictions were applied to the possible cleavage points available for peptide generation from the initial proteins, allowing identifications resulting from non-tryptic cleavage to be observed as well. The peptide mass tolerance was 3.0, and the fragment ion tolerance was 0.0.
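To make the b and y ion series concrete, the sketch below computes singly charged monoisotopic b and y fragment m/z values for an unmodified peptide. It illustrates only the fragment masses that an idealized spectrum is built from, not SEQUEST's cross-correlation scoring. The function name and the example peptide are hypothetical; the residue masses are standard monoisotopic values.

```python
from typing import List, Tuple

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def by_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Return ([b1..b(n-1)], [y1..y(n-1)]) as singly charged m/z values."""
    masses = [MONO[aa] for aa in peptide]
    b_series: List[float] = []
    running = 0.0
    for m in masses[:-1]:              # b ions grow from the N terminus
        running += m
        b_series.append(running + PROTON)
    y_series: List[float] = []
    running = 0.0
    for m in reversed(masses[1:]):     # y ions grow from the C terminus
        running += m
        y_series.append(running + WATER + PROTON)
    return b_series, y_series


b, y = by_ions("PEPTIDE")              # hypothetical example peptide
for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
    print(f"b{i} = {b_mz:.3f}   y{i} = {y_mz:.3f}")
```

A useful check on such a table is complementarity: for a peptide of n residues, b(i) plus y(n-i) equals the singly protonated peptide mass plus one proton.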
Protein Databases
SEQUEST analysis was performed using a modified version of the human FASTA protein database provided with SEQUEST (ThermoFinnigan). Database modifications included the removal of viral proteins and the removal of some redundant protein entries as well as minimizing the number of entries for abundant serum proteins (13). Additional analyses were conducted using the National Center for Biotechnology Information (NCBI) human protein database2 and the Unigene human database3 to determine whether important abundant serum proteins were missing from our modified database. Use of the additional human databases did not alter the vast majority of SEQUEST peptide identifications. The use of the larger databases did result in an expected decrease in magnitude of the SEQUEST DelCN score in a fraction of peptide identifications. Most peptides not found in the smaller supplied database did not pass subsequent filters, including visual inspection of fragmentation spectra (data not shown), and the Unigene database analysis required up to 2 weeks to finish on a modern PC. Currently no complete human protein database has been compiled, and one is not likely to exist for a number of years (35). Thus, the modified database was considered to be an adequate resource for this initial blood serum proteome analysis after comparisons to the NCBI and Unigene databases.2,3
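The expected drop in DelCN with larger databases follows from how the score is commonly defined: the difference between the best and second-best Xcorr for a spectrum, normalized by the best Xcorr. The sketch below uses that common definition with made-up Xcorr values; the function name and all numbers are hypothetical and are not results from this study.

```python
from typing import Sequence


def delcn(xcorrs: Sequence[float]) -> float:
    """Normalized gap between the best and second-best Xcorr for one spectrum."""
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two candidate peptides")
    best, second = ranked[0], ranked[1]
    if best <= 0:
        raise ValueError("best Xcorr must be positive")
    return (best - second) / best


small_db = [3.2, 2.0, 1.5]       # hypothetical candidate scores, smaller database
large_db = small_db + [2.6]      # a larger database adds a closer runner-up

print(f"smaller database: DelCN = {delcn(small_db):.3f}")
print(f"larger database:  DelCN = {delcn(large_db):.3f}")
```

The top-ranked peptide and its Xcorr are unchanged in this example. Only the runner-up moves closer, which lowers DelCN without altering the identification.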
Of concern with a shotgun proteomic approach is whether assumptions made for simple cases continue to apply with higher levels of complexity. To address the question for database choice, we sought to analyze LC/MS/MS results using a smaller database containing very few peptides with sequence identity to human proteins but still retaining the level of complexity observed in a complete genome. A locally available Deinococcus radiodurans FASTA database derived from the open reading frames of a completely sequenced genome (15) was used to generate SEQUEST analyses to compare against the human database-derived results. Five SCX fractions (fractions 21, 34, 39, 46, and 53) that contained the greatest number of fully tryptic peptides were analyzed against the D. radiodurans database for this comparison.
Filters for SEQUEST Results
SEQUEST results were filtered (Table I) with criteria similar to those developed by Yates and co-workers (31, 36). Serum proteins in circulation are frequently found cleaved by chymotrypsin and elastase (37). Thus, while trypsin was used to digest the serum proteins, the SEQUEST data filter was modified to allow for identification of peptides resulting from both chymotrypsin and elastase cleavage sites. The chymotrypsin and elastase filter levels were derived by comparing the SEQUEST-identified tryptic peptides to the identified non-tryptic albumin peptides. The high abundance and globular nature of albumin represented a useful reference for defining non-tryptic filter parameters. The resulting filters were those that resulted in four or more hits for any non-tryptic albumin peptide. These filters further resulted in 33 non-tryptic cleavage sites of the 133 total albumin cleavage sites.
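Classifying each peptide terminus by the protease that could have produced it is the step that lets separate filter levels be applied to tryptic and non-tryptic peptides. The sketch below shows one way to do this. The residue specificities are simplified textbook approximations chosen for illustration (trypsin after K/R, chymotrypsin after F/W/Y/L, elastase after A/G/S/V) and are not the rules of Table I; proline effects are ignored, and the function names and the toy sequence are hypothetical.

```python
from typing import Tuple

# Assumed, simplified cleavage specificities (residue on the N-terminal side of the cut).
TRYPSIN = set("KR")
CHYMOTRYPSIN = set("FWYL")
ELASTASE = set("AGSV")


def protease_for(residue: str) -> str:
    """Name the protease whose assumed specificity matches a cut after this residue."""
    if residue in TRYPSIN:
        return "trypsin"
    if residue in CHYMOTRYPSIN:
        return "chymotrypsin"
    if residue in ELASTASE:
        return "elastase"
    return "other"


def classify_termini(protein: str, peptide: str) -> Tuple[str, str]:
    """Classify the cuts that generated the peptide's N and C termini.

    Uses the first occurrence of the peptide in the protein sequence.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    n_cut = "protein N-terminus" if start == 0 else protease_for(protein[start - 1])
    c_cut = "protein C-terminus" if end == len(protein) else protease_for(peptide[-1])
    return n_cut, c_cut


toy_protein = "AAKDEFGHRLMNPQ"   # made-up sequence, not albumin
for pep in ("DEFGHR", "DEF", "GHRLMN"):
    print(pep, classify_termini(toy_protein, pep))
```

With a classification like this, a peptide with two tryptic termini can be held to one set of score thresholds and a peptide with a chymotrypsin-like or elastase-like terminus to a stricter set.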
| RESULTS |
|---|
To qualitatively evaluate the global results from a SEQUEST analysis, we compared the human peptides analyzed by MS/MS and m/z segmentation using SEQUEST with two different databases. The databases compared with SEQUEST analysis were an unrelated bacterial database (D. radiodurans) and a human protein database. The plot of DelCN versus Xcorr from a SEQUEST analysis with the D. radiodurans database generally defines a region of data that is composed of low confidence peptide identifications (Fig. 4A). A similar plot for a SEQUEST analysis using a human database identifies a second population of peptides with higher quality peptide identifications (Fig. 4B). The overlap between the poor quality and high quality populations contains many real peptide identifications. After filtering (see Table I), the SEQUEST analysis of peptides using the D. radiodurans database eliminates all but 1% or 76 of the original low confidence peptide identifications (Fig. 4C). In contrast, after filtering (Table I), 20% or 2179 of the peptides from the human database remain (Fig. 4D). The filtering method results in more qualitative confidence for the peptide identifications using the human protein database at a global scale. While it is expected that most of the peptides identified from the D. radiodurans database that passed the data filters do not appear as proteins in serum, some of these peptides may, by chance or evolutionary conservation, be legitimately found using the D. radiodurans database.
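The comparison above amounts to applying the same score filter to search results from an unrelated database and from the human database and then comparing the fractions that survive. A minimal sketch follows. The thresholds are placeholders chosen for illustration and are not the Table I values; the hit lists are made-up data, and all names are hypothetical.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    xcorr: float
    delcn: float
    charge: int


# Placeholder thresholds for illustration only (not the values of Table I).
HYPOTHETICAL_MIN_XCORR = {1: 1.9, 2: 2.2, 3: 3.75}
HYPOTHETICAL_MIN_DELCN = 0.1


def passes(hit: Hit) -> bool:
    """True if the hit meets the charge-dependent Xcorr and the DelCN thresholds."""
    min_xcorr = HYPOTHETICAL_MIN_XCORR.get(hit.charge)
    if min_xcorr is None:
        return False  # charge state not covered by the filter
    return hit.xcorr >= min_xcorr and hit.delcn >= HYPOTHETICAL_MIN_DELCN


def pass_rate(hits: Iterable[Hit]) -> float:
    """Fraction of hits that survive the filter."""
    hits = list(hits)
    if not hits:
        raise ValueError("no hits to evaluate")
    return sum(passes(h) for h in hits) / len(hits)


unrelated_db = [Hit(1.2, 0.05, 2), Hit(2.5, 0.02, 2), Hit(1.0, 0.20, 1), Hit(2.3, 0.12, 2)]
human_db = [Hit(3.1, 0.30, 2), Hit(4.0, 0.25, 3), Hit(1.5, 0.08, 2), Hit(2.0, 0.15, 1)]

print(f"unrelated database pass rate: {pass_rate(unrelated_db):.0%}")
print(f"human database pass rate:     {pass_rate(human_db):.0%}")
```

A much lower pass rate for the unrelated database than for the human database is the qualitative pattern reported above (1% versus 20%), although the toy numbers here are not meant to reproduce those figures.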
| DISCUSSION |
|---|
Previous proteomic characterizations of human plasma have used two-dimensional PAGE. These studies, such as the seminal work of Anderson and co-workers (10, 41), have been summarized by the ExPASy on-line human plasma two-dimensional PAGE database (ca.expasy.org/ch2d/). These previous investigations have focused on plasma and thus are not directly comparable to the serum results reported here. However, of the 58 named proteins identified in this on-line human plasma protein database, we identified 51 in our serum analysis. There are several possible explanations for not identifying the remaining seven proteins, which include fibrinogen B, fibrinogen γ, C-reactive protein, and actin. First, plasma but not serum samples contain the clotting factors fibrinogen B and fibrinogen γ. Second, our serum was obtained from a single healthy female. The concentration of certain blood proteins may make detection difficult for our single source sample versus a general population; an example is C-reactive protein, which is typically at subnanogram per milliliter concentrations in a healthy female (1). Finally, the sample preparation and analytical methods used by these previous investigators differ significantly from those reported here. The lack of detection for the other proteins, such as actin, may be due to differing methods of sample collection, processing, and analysis. Overall our approach is superior for global identification since the two-dimensional PAGE database is made up of nine published reports but identified only 58 proteins, while we found 490 proteins, including those that would be expected to be common between studies.
Another family of serum/plasma studies for comparison is the characterization of rat serum by Gianazza and co-workers (6-8, 11, 12). These studies identified 34 proteins with human homologues and characterized the changes in protein abundance with disease states or chemical exposures associated with inflammatory disease. These rat serum studies concluded that even abundant proteins could be markers for disease states. Our study identified the human homologues of 31 of the 34 identified rat proteins. We did not find the human equivalent of thyroxine-binding globulin, thiostatin, or C-reactive protein. Many of the same reasons for a lack of complete overlap with the ExPASy plasma two-dimensional PAGE database apply here. In addition, species-specific differences may explain differing proteins and expression levels.
Serum is a complex biological fluid with many functions, and the presence, absence, or concentration of a specific protein may be non-intuitive until the serum proteome is fully understood. In an analysis of this complexity, it is important to note that expectations often differ from results for many proteins. Examples of unexpected results are hemoglobin and actin, which are both ubiquitous in red blood cells. Given the high quantity and rapid turnover of red blood cells, it might be expected that hemoglobin and actin should be readily detectable in serum (42). In contrast to our expectations, few hemoglobin-derived peptides and no actin-derived peptides were identified. In fact, both hemoglobin and actin are actively sequestered and cleared from the serum via the abundant serum proteins haptoglobin and vitamin D-binding protein, respectively (42-45). Another unexpected result is the identification of immunoglobulin-derived peptides, although depletion was complete when evaluated by SDS-PAGE. It is unclear whether these peptides originated from incomplete depletion of immunoglobulins in vitro or from proteolyzed immunoglobulins circulating in blood.
As global proteomic approaches become more common, there is an increasing need to evaluate and visualize large data sets with improvements in individual scoring methods (46-48). Often proteomic studies are less concerned with individual peptide identifications than with globally studying changes. In fact, a recent study using a global approach to profile proteins only by masses using surface-enhanced laser desorption-ionization MS with blood serum has been shown to have predictive value in ovarian cancer (33). One of the difficulties related to the use of SEQUEST for peptide identifications is the lack of methods to globally evaluate the quality of data and the lack of methods to assess global changes created by filtering schemes and/or database changes. Here, by comparing our SEQUEST results to multiple databases, we have illustrated an intuitive and easily adopted method for analyzing LC-MS/MS experiments in global terms (Fig. 4).
Major technical issues complicate the routine characterization of the plasma/serum proteome. First, plasma/serum proteins, like tissue proteins, may be post-translationally modified, and many plasma proteins are glycosylated (13). Other important factors include modifications such as sulfation, phosphorylation, oxidation, glycation, lipidation, and γ-carboxyglutamylation. Currently there are no commercially available tools that can identify peptides with this variety and number of modifications. The serum proteins in this study (Table III) were identified from translationally unmodified peptides. Significant improvements to sample processing and informatics are needed to identify these protein modifications. Second, protease digestion further adds to the complexity of a proteomic analysis of serum (13). Here we filtered peptide identifications based on protease modifications to take in situ proteolysis (chymotrypsin and elastase) into account. Third, the concentration range of plasma/serum proteins encompasses at least 9 orders of magnitude. Thus, significant improvements in the sample processing and separation with improvement in the dynamic range, sensitivity, and ability to quantitate results from mass spectrometry are needed to elaborate the plasma/serum proteome beyond the 490 proteins identified in this report. Last, the immature status of human protein databases further complicates analysis because there are likely to be protein identifications even in this mid-abundance range that have not yet been added to any publicly available human protein database (35).
The Human Proteome Organization (HUPO) has been founded to consolidate and organize future efforts in human proteomics (34). Among the many stated goals of HUPO are the research goals of characterizing the human plasma/serum proteomes and the informatic goals of standardizing proteome data and annotations with the improvement of bioinformatic tools for proteome analysis (34). Here we report a large improvement for proteomic analysis of serum; this analysis identifies 490 proteins, about 10% of the way toward HUPO's goal of 5000 proteins. Further, we have presented a visualization method that can be used to evaluate the quality of a global SEQUEST proteomic analysis along with the ability to subjectively evaluate protein database quality for a SEQUEST analysis.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, November 15, 2002, DOI 10.1074/mcp.M200066-MCP200
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental data: peptide identifications.
1 The abbreviations used are: SCX, strong cation exchange; HUPO, Human Proteome Organization; LC, liquid chromatography; MS, mass spectrometry; MS/MS, tandem mass spectrometry; NCBI, National Center for Biotechnology Information.
2 NCBI, Hs GenBankTM Protein Databases ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/protein/.
3 NCBI, Hs Unigene Contig Databases ftp.ncbi.nlm.nih.gov/repository/UniGene/.
* This work was supported by the Biotechnology section of Core Technology, Battelle Memorial Institute. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¶ Current address: Human Genome Sciences, 9410 Key West Ave., Rockville, MD 20850.
|| To whom correspondence should be addressed: Biological Sciences Dept., Pacific Northwest National Laboratory, P.O. Box 999, MSIN: P7-58, Richland, WA 99352. Tel.: 509-376-1015; Fax: 509-376-9449; E-mail: joel.pounds{at}pnl.gov