High Dynamic Range Characterization of the Trauma Patient Plasma Proteome

Although human plasma represents an attractive sample for disease biomarker discovery, the extreme complexity and large dynamic range in protein concentrations present significant challenges for characterization, candidate biomarker discovery, and validation. Herein we describe a strategy that combines immunoaffinity subtraction and subsequent chemical fractionation based on cysteinyl peptide and N-glycopeptide captures with two-dimensional LC-MS/MS to increase the dynamic range of analysis for plasma. Application of this “divide-and-conquer” strategy to trauma patient plasma significantly improved the overall dynamic range of detection and resulted in confident identification of 22,267 unique peptides from four different peptide populations (cysteinyl peptides, non-cysteinyl peptides, N-glycopeptides, and non-glycopeptides) that covered 3654 different proteins with 1494 proteins identified by multiple peptides. Numerous low abundance proteins were identified, exemplified by 78 “classic” cytokines and cytokine receptors and by 136 human cell differentiation molecules. Additionally a total of 2910 different N-glycopeptides that correspond to 662 N-glycoproteins and 1553 N-glycosylation sites were identified. A panel of the proteins identified in this study is known to be involved in inflammation and immune responses. This study established an extensive reference protein database for trauma patients that provides a foundation for future high throughput quantitative plasma proteomic studies designed to elucidate the mechanisms that underlie systemic inflammatory responses.

The search for disease-specific biomarkers from various human biofluids (e.g. plasma/serum (1-3), cerebrospinal fluid (4), bronchoalveolar lavage fluid (5), synovial fluid (6), nipple aspirate fluid (7), saliva (8), and urine (9)) is gaining increasing attention due to significant advances in genomic and proteomic technologies and their potential for discovering novel disease biomarkers. An advantage of using human plasma for biomarker discovery is that plasma samples are readily obtained from patients. More importantly, because almost all cells in the body communicate with blood either directly or indirectly and tissue-specific proteins are released into the bloodstream upon cell damage and cell death, the onset or presence of most disease states can potentially be determined by measuring either the altered presence or the differential abundances of proteins in the blood plasma.
Although progress has been made toward quantitatively studying the plasma proteome (10-13), significant effort still remains focused on broadly characterizing the plasma protein constituents. These constituents include a handful of major proteins such as albumin that dominate the plasma protein content as well as other tissue leakage proteins that are generally present only at extremely low levels. As a result, robust and specific detection of the broad spectrum of proteins that span such a wide dynamic range in concentration challenges current analytical technologies. Various separation approaches (e.g. prefractionation of intact plasma proteins (14-17), utilization of solution isoelectric focusing (18), and ultrahigh efficiency separations (19,20) at the peptide level and multidimensional fractionation on both the protein and peptide levels (21)) have been applied to address this challenge. Efforts have also been made to characterize subsets of the plasma proteome (e.g. the low molecular weight subproteome (22) and the N-glycoproteome (23)). Additionally, the removal of albumin, immunoglobulins, and a group of high abundance proteins in plasma/serum has been demonstrated to be very effective for attaining more comprehensive proteome coverage (20, 23-26). Recently developed multicomponent immunoaffinity subtraction systems, based on mam-concern of potentially losing a subset of nontargeted proteins with these methods by way of protein-protein interactions (e.g. the "sponge effects" of albumin (30)) can be addressed by studying the "interactomes" (29).
Herein we report on an alternative strategy for profiling the human plasma proteome that combines immunoaffinity subtraction and subsequent chemical fractionation via solid-phase capture of cysteinyl peptides and N-glycopeptides with two-dimensional (2D) LC-MS/MS to extend the dynamic range of analysis and increase proteome coverage. The advantages afforded by dividing a complex proteome into several subproteomes include 1) reduced complexity of the subproteome samples (particularly enriched cysteinyl peptide and N-glycopeptide samples) that allows more low abundance proteins to be identified, 2) the complementary nature of different subproteome fractions that significantly improves overall proteome coverage, and 3) the relative simplicity, efficiency, and reproducibility afforded by these fractionation methods that make them amenable to automation. Application of this strategy to characterize blood plasma obtained from human trauma patients resulted in broad proteome coverage that included many low abundance proteins (e.g. cytokines) and proteins involved in both inflammation, a hallmark of many human diseases (31,32), and the immune response. Previous inflammation studies have relied heavily on gene expression analysis of blood leukocyte populations obtained from trauma patients (33,34). An extensive reference plasma proteome database for trauma has been established from this study, providing the basis for subsequent comparative quantitative proteomic studies aimed at revealing the underlying mechanisms of inflammatory response in trauma patients.

EXPERIMENTAL PROCEDURES
Immunoaffinity Subtraction-The human blood plasma samples were supplied by the Department of Surgery at the University of Florida College of Medicine, which serves as the Sample Collection and Coordination Site for a multicentered clinical study (Inflammation and the Host Response to Injury). Approval for the conduct of this programmatic research was obtained from the Institutional Review Boards of the University of Florida College of Medicine and the Pacific Northwest National Laboratory in accordance with federal regulations. Plasma samples were pooled from six severe trauma patients and one healthy subject, and the protein concentration was determined by Coomassie protein assay (Pierce) to be 60 mg/ml. Twelve plasma proteins (albumin, IgG, α1-antitrypsin, IgA, IgM, transferrin, haptoglobin, α1-acid glycoprotein, α2-macroglobulin, apolipoprotein A-I, apolipoprotein A-II, and fibrinogen) that constitute up to 96% of the total protein mass of human plasma were removed in a single step by using the prepacked 2-ml Seppro™ MIXED12 affinity LC column (GenWay Biotech, San Diego, CA) on an Agilent 1100 series HPLC system (Agilent, Palo Alto, CA) according to the manufacturer's instructions. A total of 2625 μl (25 μl × 105 injections) of plasma was subjected to MIXED12 depletion. The flow-through fractions were pooled and concentrated under reduced pressure to one-tenth of the original volume and then desalted by using a prepacked PD-10 column (Amersham Biosciences) equilibrated with 50 mM NH4HCO3. The desalted sample was further concentrated, and the total protein amount was determined by Coomassie protein assay to be 8.85 mg.
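The mass balance implied by these figures can be checked with a few lines of arithmetic. The sketch below uses only the values stated above (injection volume, number of injections, plasma protein concentration, and recovered protein); the variable names are illustrative and are not part of the original work.

```python
# Mass balance for the immunoaffinity depletion step, using values stated in the text.
plasma_volume_ul = 25 * 105        # 25 ul per injection x 105 injections
protein_conc_mg_per_ml = 60.0      # Coomassie assay of the pooled plasma
recovered_mg = 8.85                # protein in the desalted flow-through

# Total protein loaded onto the column (convert ul to ml first).
loaded_mg = plasma_volume_ul / 1000.0 * protein_conc_mg_per_ml

# Fraction of the loaded protein mass that remains after depletion.
remaining_fraction = recovered_mg / loaded_mg

print(f"plasma processed: {plasma_volume_ul} ul")
print(f"protein loaded: {loaded_mg:.1f} mg")
print(f"remaining after depletion: {remaining_fraction:.1%}")
```

Under these inputs, roughly 157.5 mg of protein was loaded and about 5.6% of that mass was recovered in the flow-through.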
Plasma Protein Digestion-One-third of the proteins from the depleted plasma sample were directly subjected to tryptic digestion prior to cysteinyl peptide enrichment (CPE). The proteins were denatured and reduced in 50 mM Tris buffer (pH 8.2), 8 M urea, 10 mM DTT for 1 h at 37°C. The resulting protein mixture was diluted 10-fold with 20 mM Tris buffer (pH 8.2), and then sequencing grade modified porcine trypsin (Promega, Madison, WI) was added at a trypsin:protein ratio of 1:50 (w/w). The sample was incubated overnight at 37°C. The following day, the tryptic digest sample was loaded on a 1-ml solid phase extraction (SPE) C18 column (Supelco, Bellefonte, PA) and washed with 4 ml of 0.1% TFA, 5% ACN. Peptides were eluted from the SPE column with 1 ml of 0.1% TFA, 80% ACN and lyophilized. Peptide samples were stored at -80°C until CPE treatment.
Fractionation via Capture of Cysteinyl Peptides-Unless otherwise noted, all solutions used in this step were degassed to prevent oxidation of the thiol content. The tryptic digest was reduced with 5 mM DTT in 80 μl of 50 mM Tris buffer (pH 7.5), 1 mM EDTA (coupling buffer) for 30 min at 37°C after which the samples were diluted to 400 μl by adding coupling buffer and then split into four 100-μl aliquots. Thiopropyl-Sepharose 6B thiol affinity resin (4 × 100 μl; Amersham Biosciences) was prepared from dried powder according to the manufacturer's instructions. Briefly, the dried powder was rehydrated in water for 15 min and washed with 50 bed volumes of water followed by 50 bed volumes of coupling buffer in a Handee Mini-Spin column (Pierce). The reduced peptide sample was then incubated with the resin for 1 h at room temperature with gentle mixing, and the unbound portion (non-cysteinyl peptides) was collected by spinning the column at low speed. The resin was washed 6 times in the spin column sequentially with each of the following solutions: 0.5 ml of 50 mM Tris buffer (pH 8.0), 1 mM EDTA (washing buffer); 2 M NaCl; 80% ACN, 0.1% TFA solution; and washing buffer. To release the captured cysteinyl peptides, 100 μl of 20 mM DTT freshly prepared in washing buffer was added to the resin and incubated for 30 min at room temperature. The resin was further washed with 100 μl of 80% ACN. The pH of the combined eluates was adjusted to 8.0, and the sample was alkylated with 80 mM iodoacetamide for 30 min at room temperature and in the dark. Both the eluted cysteinyl peptide samples and the unbound non-cysteinyl peptide samples were desalted by using an SPE C18 column and then lyophilized.
Fractionation via Capture of N-Linked Glycopeptides-Hydrazide resin (Bio-Rad) and a method similar to that reported previously (23,35) were used for capturing glycoproteins. Two-thirds of the concentrated MIXED12 flow-through fraction was diluted 10-fold in coupling buffer (100 mM sodium acetate and 150 mM NaCl, pH 5.5) and oxidized in 15 mM sodium periodate at room temperature for 1 h in the dark with constant shaking. The sodium periodate was subsequently removed by using a prepacked PD-10 column equilibrated with coupling buffer. Four 1-ml portions of hydrazide resin were each washed five times with coupling buffer; the oxidized protein sample was split into four equal aliquots, and then each aliquot was added to a portion of resin and incubated overnight at room temperature. The following day, the non-glycoproteins were removed by washing the resin briefly three times with 100% methanol and then three times with 8 M urea in 0.4 M NH4HCO3. The glycoproteins were denatured and reduced in 8 M urea and 10 mM DTT at 37°C for 1 h. Protein cysteinyl residues were alkylated with 20 mM iodoacetamide for 90 min at room temperature. After washing sequentially with 8 M urea and 50 mM NH4HCO3, the resin was resuspended as a 20% slurry in 50 mM NH4HCO3 after which sequencing grade trypsin (Promega) was added at a 1:100 (w/w) trypsin:protein ratio (based on the initial plasma protein concentration of 60 mg/ml). The sample was then digested on-resin overnight at 37°C. After centrifuging at 3000 × g for 5 min, the trypsin-released peptides (non-glycopeptides) were collected from the supernatants and lyophilized. The resin was further washed extensively with the following three different solutions: 2 M NaCl, 100% methanol, and 50 mM NH4HCO3. The resin was resuspended as a 50% slurry in 50 mM NH4HCO3, and the N-glycopeptides were released by incubating the resin with peptide N-glycosidase F (PNGase F; New England Biolabs, Beverly, MA) at a ratio of 1 μl of PNGase F per 100 μl of plasma for 4 h at 37°C. The released deglycosylated peptides were cleaned by using an SPE C18 column before being lyophilized.
The abbreviations used are: 2D, two-dimensional; PNGase F, peptide-N-glycosidase F; SCX, strong cation exchange; CPE, cysteinyl peptide enrichment; SPE, solid-phase extraction; Xcorr, cross-correlation score; ΔCn, delta correlation; GO, Gene Ontology; FDR, false discovery rate; M-CSF, macrophage colony-stimulating factor; TNF, tumor necrosis factor; TNF R, tumor necrosis factor receptor; IL, interleukin; PDGF, platelet-derived growth factor; VEGF, vascular endothelial growth factor; TGF-β, transforming growth factor-β; CD, cell differentiation; IGF, insulin-like growth factor; HUPO, Human Proteome Organization; CCL21, small inducible cytokine A21.
Strong Cation Exchange (SCX) Peptide Fractionation-All of the enriched cysteinyl peptides and deglycosylated peptides and 300 μg each of non-cysteinyl peptides and non-glycopeptides were individually reconstituted with 300 μl of 10 mM ammonium formate (pH 3.0), 25% ACN and fractionated by SCX chromatography on a Polysulfoethyl A 200 × 2.1-mm column (PolyLC, Columbia, MD) that was preceded by a 10 × 2.1-mm guard column. The separations were performed on an Agilent 1100 series HPLC system (Agilent) at a flow rate of 200 μl/min and with mobile phases that consisted of 10 mM ammonium formate (pH 3.0), 25% ACN (A) and 500 mM ammonium formate (pH 6.8), 25% ACN (B). After loading 300 μl of sample onto the column, the gradient was maintained at 100% A for 10 min. Peptides were separated by using a gradient from 0 to 50% B over 40 min followed by a gradient of 50-100% B over 10 min. The gradient was then held at 100% B for 10 min. A total of 30 fractions were collected for each peptide population, and each fraction was dried under vacuum. The fractions for each population were dissolved in 30 μl of 25 mM NH4HCO3, and 10 μl of each fraction were analyzed by capillary LC-MS/MS.
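The SCX gradient program described above can be written as a piecewise function of time. The following is a minimal sketch that assumes each gradient segment is linear and that the segments run back to back starting at sample loading; neither assumption is stated explicitly in the text, and the function name `scx_percent_b` is hypothetical.

```python
def scx_percent_b(t_min: float) -> float:
    """Return the percentage of mobile phase B at t_min minutes after loading.

    Assumes linear segments that follow one another without gaps.
    """
    # (start_min, end_min, start_%B, end_%B), taken from the stated program.
    segments = [
        (0.0, 10.0, 0.0, 0.0),       # hold at 100% A for 10 min
        (10.0, 50.0, 0.0, 50.0),     # 0 to 50% B over 40 min
        (50.0, 60.0, 50.0, 100.0),   # 50 to 100% B over 10 min
        (60.0, 70.0, 100.0, 100.0),  # hold at 100% B for 10 min
    ]
    if not 0.0 <= t_min <= segments[-1][1]:
        raise ValueError("time is outside the 70-min gradient program")
    for start, end, b_start, b_end in segments:
        if start <= t_min <= end:
            # Linear interpolation within the segment.
            return b_start + (b_end - b_start) * (t_min - start) / (end - start)
    raise AssertionError("unreachable: segments cover the whole program")


for t in (5, 30, 55, 65):
    print(f"t = {t:>2} min: {scx_percent_b(t):.0f}% B")
```

Expressing the program this way makes it straightforward to relate the elution window of a collected fraction to the ammonium formate concentration at which its peptides eluted.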
Reversed-phase Capillary LC-MS/MS Analyses-A custom-built high pressure capillary LC system (36) coupled on line to a linear ion trap mass spectrometer (LTQ; ThermoElectron) via an in-house-manufactured electrospray ionization interface was used to analyze peptide samples. The reversed-phase capillary column was prepared by slurry-packing 3-μm Jupiter C18 bonded particles (Phenomenex, Torrance, CA) into a 65-cm-long, 150-μm-inner diameter × 360-μm-outer diameter fused silica capillary (Polymicro Technologies, Phoenix, AZ) that incorporated a retaining stainless steel screen in an HPLC union (Valco Instruments Co., Houston, TX). The mobile phases, which consisted of 0.2% acetic acid and 0.05% TFA in water (A) and 0.1% TFA in 90% ACN, 10% water (B), were degassed on line by using a vacuum degasser (Jones Chromatography Inc., Lakewood, CO). After loading 10 μl of peptides onto the column, the mobile phase was held at 100% A for 20 min. An exponential gradient elution was achieved by increasing the mobile phase composition in a stainless steel mixing chamber from 0 to 70% B over 150 min. To identify the eluting peptides, the linear ion trap mass spectrometer was operated in a data-dependent MS/MS mode (m/z 400-2000) in which each full MS scan was followed by 10 MS/MS scans. The 10 most intense precursor ions were dynamically selected in order of highest to lowest intensity and then subjected to collision-induced dissociation; a normalized collision energy setting of 35% and a dynamic exclusion duration of 1 min were used. The temperature of the heated capillary and the ESI voltage were 200°C and 2.2 kV, respectively.
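The data-dependent selection logic described above (the 10 most intense precursors per survey scan, with a 1-min dynamic exclusion) can be illustrated in a few lines. This is a simplified sketch, not the instrument control software: the function name `select_precursors` is hypothetical, and binning the exclusion list to 1 m/z unit is an assumption made only to keep the example short.

```python
from typing import Dict, List, Tuple


def select_precursors(
    survey_scan: List[Tuple[float, float]],   # (m/z, intensity) peaks
    scan_time_min: float,
    excluded_until: Dict[float, float],       # binned m/z -> time exclusion ends
    top_n: int = 10,
    exclusion_min: float = 1.0,
    mz_range: Tuple[float, float] = (400.0, 2000.0),
) -> List[float]:
    """Pick up to top_n precursors, most intense first, skipping excluded ions."""
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_scan
        if mz_range[0] <= mz <= mz_range[1]
        # Assumed exclusion bin: m/z rounded to the nearest unit.
        and excluded_until.get(round(mz, 0), -1.0) <= scan_time_min
    ]
    # Highest intensity first, as stated in the text.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    chosen = [mz for mz, _ in candidates[:top_n]]
    # Selected ions become ineligible for the exclusion duration.
    for mz in chosen:
        excluded_until[round(mz, 0)] = scan_time_min + exclusion_min
    return chosen


exclusions: Dict[float, float] = {}
scan = [(500.2, 100.0), (650.7, 900.0), (300.0, 5000.0), (1200.4, 400.0)]
print(select_precursors(scan, 10.0, exclusions, top_n=2))
print(select_precursors(scan, 10.5, exclusions, top_n=2))
```

In the second call the two ions fragmented at 10.0 min are still excluded, so a less intense precursor is selected instead; this is the mechanism by which dynamic exclusion shifts MS/MS time toward lower abundance peptides.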
Data Analysis-The SEQUEST (37) algorithm (ThermoFinnigan) was used to independently search all MS/MS spectra against the human International Protein Index (IPI) database (Version 3.05 that consists of 49,161 protein entries; available on line at www.ebi.ac.uk/IPI) and the reversed human IPI protein database. Tandem MS peaks were generated by extract_msn.exe, part of the SEQUEST software package. Dynamic carboxamidomethylation of cysteine and oxidation of methionine were used to identify cysteinyl peptides, non-cysteinyl peptides, and non-glycopeptides; an additional dynamic modification, the PNGase F-catalyzed conversion of asparagine to aspartic acid at the site of carbohydrate attachment, was added to identify the formerly N-linked glycopeptides.
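Candidate N-glycosylation sites can be located in a peptide sequence by searching for the NX(S/T) motif. The sketch below is illustrative only: the function name and the example peptides are hypothetical, the exclusion of proline at position X is the conventional definition of the sequon rather than something stated in this text, and the mass shift is the standard monoisotopic difference between aspartic acid and asparagine residues.

```python
import re
from typing import List

# N-X-S/T sequon; X is assumed to be any residue except proline.
# The lookahead lets overlapping motifs (e.g. "NNST") be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Monoisotopic mass change when PNGase F converts Asn to Asp (Da).
ASN_TO_ASP_SHIFT_DA = 0.984016


def sequon_positions(peptide: str) -> List[int]:
    """Return 1-based positions of Asn residues that sit in an NX(S/T) motif."""
    return [match.start() + 1 for match in SEQUON.finditer(peptide)]


for peptide in ("AANGTK", "NPSK", "NNSTK"):   # hypothetical sequences
    print(peptide, sequon_positions(peptide))
```

A search engine would then allow the mass of each flagged asparagine to increase by `ASN_TO_ASP_SHIFT_DA` when matching spectra from the deglycosylated fraction.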
Because false positive peptide/protein identifications are a common concern in proteomic investigations, we developed a set of criteria based on the reversed database approach for filtering the raw data to limit false positive identifications to <5%. The reversed human protein database was created by reversing the order of the amino acid sequences for each protein, and the percentage of false positive peptide identifications was estimated as described previously (38) by dividing the number of peptides identified from the reversed database search by the number of peptides identified from the normal database search. Only the cysteine-containing and the NX(S/T) motif-containing peptides were used to estimate false positive identifications for cysteinyl peptides and formerly N-linked glycopeptides, respectively. (An in silico assessment of the number of cysteine-containing and NX(S/T) motif-containing peptides among all tryptic peptides shows the number of these kinds of peptides in the normal and reversed databases to be similar; thus, the number of random false positive identifications is assumed to be similar for both databases.) Criteria that would yield an overall confidence of over 95% at the unique peptide level were established for filtering raw peptide identifications. Table I summarizes the cross-correlation score (Xcorr) values for the different peptide populations; a delta correlation (ΔCn) value of 0.10 was used to determine these values. Two additional ΔCn cutoff values of 0.05 and 0.16 were applied to reduce false negative identifications while maintaining a 95% level of confidence for peptide assignments. For ΔCn ≥ 0.05, the minimum acceptable Xcorr value was raised to achieve a comparable percentage of false positive identifications, and similarly for ΔCn ≥ 0.16, the minimum acceptable Xcorr value was reduced.
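The reversed database estimate described above reduces to a ratio of unique peptide counts after score filtering. The following sketch shows that calculation under simplifying assumptions: a single Xcorr/ΔCn cutoff pair is applied to all hits (the actual criteria in Table I vary by peptide population and are not reproduced here), and the class, function names, peptide strings, and scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Hit:
    """One peptide identification returned by a database search."""
    peptide: str
    xcorr: float
    delta_cn: float


def reverse_sequence(protein: str) -> str:
    """Build a reversed database entry by reversing the amino acid order."""
    return protein[::-1]


def unique_passing(hits: Iterable[Hit], min_xcorr: float, min_delta_cn: float) -> Set[str]:
    """Unique peptide sequences whose scores meet both cutoffs."""
    return {h.peptide for h in hits
            if h.xcorr >= min_xcorr and h.delta_cn >= min_delta_cn}


def false_positive_rate(normal_hits: Iterable[Hit], reversed_hits: Iterable[Hit],
                        min_xcorr: float, min_delta_cn: float) -> float:
    """Reversed-database peptide count divided by normal-database peptide count."""
    n_normal = len(unique_passing(normal_hits, min_xcorr, min_delta_cn))
    n_reversed = len(unique_passing(reversed_hits, min_xcorr, min_delta_cn))
    return n_reversed / n_normal if n_normal else 0.0


# Toy data with made-up sequences and scores.
normal = [Hit("PEPTIDEAK", 3.1, 0.20), Hit("PEPTIDEBK", 2.4, 0.12),
          Hit("PEPTIDECK", 1.6, 0.30), Hit("PEPTIDEDK", 2.2, 0.04),
          Hit("PEPTIDEEK", 2.8, 0.18), Hit("PEPTIDEAK", 2.9, 0.22),
          Hit("PEPTIDEFK", 3.5, 0.25)]
reversed_db = [Hit("KAEDITPEP", 2.1, 0.11), Hit("KBEDITPEP", 1.2, 0.30)]

print(reverse_sequence("MKWVTF"))
print(false_positive_rate(normal, reversed_db, min_xcorr=2.0, min_delta_cn=0.10))
print(false_positive_rate(normal, reversed_db, min_xcorr=2.2, min_delta_cn=0.10))
```

Raising the Xcorr cutoff in the last call removes the single reversed-database hit, which mirrors how the cutoffs were tuned until the estimated false positive percentage fell below the target.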
In an attempt to remove redundant protein entries, Protein Prophet software was used as a clustering tool to group similar or related protein entries into a "protein group" (39). All peptides that passed the filtering criteria were assigned an identical probability score of 1 and entered into the software program (solely for clustering analysis) to generate a final list of nonredundant proteins or protein groups. One protein identification was randomly selected to represent each corresponding protein group that contains member database entries.
Gene Ontology (GO) component, function, and process terms extracted from text-based annotation files downloaded from the European Bioinformatics Institute ftp site (ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN) were used to categorize the identified proteins. Further biological interpretation in the context of gene ontology was performed with the aid of GOstat (40). Protein transmembrane helices were determined by TMHMM, a prediction model based on a hidden Markov model (www.cbs.dtu.dk/services/TMHMM/).
Pathway and network analysis was performed by using the Ingenuity Pathways Knowledge Base (IPKB) (41). Canonical signal transduction and metabolic pathways were examined among the proteins identified. Queries were also made to obtain cellular localization information of the detected proteins from the IPKB. The significant functional networks among the extracellular proteins were computed as described elsewhere (41).
ELISA-Specific plasma protein levels in the pooled trauma patient plasma sample were determined in duplicate using Quantikine ELISA kits (R&D Systems, Minneapolis, MN) following the manufacturer's instructions. Each sample was analyzed at two different dilutions to determine the optimal dilution. A plasma sample from a healthy subject obtained from Golden West Biologicals (Temecula, CA) was also assayed for comparison.

RESULTS
High Dynamic Range Proteomic Profiling Strategy-Fig. 1 schematically illustrates the sample processing and fractionation strategy developed to globally characterize the trauma patient plasma proteome. High abundance proteins were first removed by immunoaffinity subtraction. The 12 proteins targeted by the MIXED12 column in this step constitute up to 96% of the total protein mass in human plasma, and our results showed that the column removed the majority of these proteins; 5.6% of the total protein mass in the pooled trauma patient plasma remained after the depletion step. Following the depletion, the less abundant proteins in the flow-through fraction were split into two aliquots, which were subjected to solid-phase chemical fractionation schemes to enrich both cysteinyl peptides and N-glycopeptides. The cysteinyl peptides were captured by incubating the tryptic peptides with a thiol affinity resin, thiopropyl-Sepharose, upon which mixed disulfide bonds formed between the resin and the cysteinyl peptides. Non-cysteinyl peptides were collected immediately after incubation, and the cysteinyl peptides were recovered by incubating the resin with a low molecular weight thiol, DTT, followed by alkylation. Another aliquot of proteins was first oxidized by periodate, which converted the hydroxyl groups on adjacent carbon atoms in the glycoprotein carbohydrates to aldehydes. The glycosylated proteins were then captured on the hydrazide resin via formation of covalent hydrazone bonds between the newly converted aldehyde groups and the hydrazide groups on the resin. The captured glycoproteins were denatured, reduced, alkylated, and digested in situ after which the non-glycopeptides were collected from the supernatant using brief centrifugation. The N-glycopeptides were specifically released by incubating the resin with PNGase F.
Peptide and Protein Identifications-30 SCX fractions of each peptide population were analyzed by LC-MS/MS, and the resulting spectra were searched against the human IPI protein database by using SEQUEST. Only those peptide identifications that passed the filtering criteria in Table I were considered as confident identifications. The numbers of confident peptide and protein identifications obtained from the four different populations are summarized in Table II. A total of 22,267 unique peptides (81.8% fully tryptic) were identified from the four different peptide populations, corresponding to 3654 nonredundant proteins, including 1494 multipeptide proteins (Supplemental Table 1) and 2160 single peptide proteins (Supplemental Table 2). Note that the non-cysteinyl peptide sample contributed the largest set of peptide identifications (9951), and the N-glycopeptide fraction contributed the smallest set of peptide identifications (2910).
FIG. 1. Schematic representation of the sample processing and fractionation used to characterize the trauma patient plasma proteome. High abundance proteins (H) were first removed using immunoaffinity subtraction. The resulting less abundant proteins (L) were split and submitted individually for solid-phase cysteinyl peptide and N-glycoprotein captures. Non-cysteinyl peptides and non-glycopeptides generated at the same time were also collected. All four different peptide populations were then fractionated by SCX chromatography, and each fraction was analyzed by capillary LC-MS/MS.
Of the 3654 proteins identified from the pooled trauma patient plasma, 1341 proteins (36.7%) were present in more than one peptide population, and 330 proteins (9.0%) were observed in all four peptide populations (Fig. 2B), demonstrating the complementary nature of the four different types of peptide identifications for obtaining higher proteome coverage. A total of 662 distinct N-glycoproteins were identified from the 2910 N-glycopeptides. The overlap between peptides identified in the four peptide populations (Fig. 2A) shows that the cysteinyl peptide and N-glycopeptide fractions share only a small percentage of identifications with the other peptide fractions. Although the non-cysteinyl peptide and non-glycopeptide fractions exhibit much greater overlap in peptide identifications, these fractions still provided additional proteome coverage (e.g. 523 proteins only from non-glycopeptides and 729 proteins only from non-cysteinyl peptides). Interestingly, the same sequences for 12.5% of the N-glycopeptides were also found in 94 cysteinyl peptides, 58 non-glycopeptides, and 260 non-cysteinyl peptides, indicating that these N-glycosylation sites are only partially modified and that both N-glycosylated and unmodified peptides from different fractions are detected.
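The overlap statistics reported above are counts of how many peptide populations each protein appears in. A minimal sketch of that bookkeeping follows; the function name and the toy protein identifiers are hypothetical, and only the totals used in the final percentage check (3654, 1341, and 330) come from the text.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def population_overlap(proteins_by_population: Dict[str, Set[str]]) -> Tuple[int, int, int]:
    """Return (total proteins, proteins in >1 population, proteins in all populations)."""
    # Count the number of populations in which each protein was identified.
    counts = Counter(p for proteins in proteins_by_population.values() for p in proteins)
    n_populations = len(proteins_by_population)
    total = len(counts)
    in_multiple = sum(1 for c in counts.values() if c > 1)
    in_all = sum(1 for c in counts.values() if c == n_populations)
    return total, in_multiple, in_all


# Toy identifiers for illustration only.
toy = {
    "cysteinyl": {"P1", "P2", "P3"},
    "non-cysteinyl": {"P1", "P2", "P4"},
    "N-glyco": {"P1", "P5"},
    "non-glyco": {"P1", "P2", "P6"},
}
print(population_overlap(toy))

# Percentages reported in the text, recomputed from the stated counts.
print(f"{1341 / 3654:.1%} in more than one population")
print(f"{330 / 3654:.1%} in all four populations")
```

The same counting applied to peptide sequences instead of protein identifiers would give the peptide-level overlap shown in Fig. 2A.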
Confidence of Identifications-To illustrate the stringency of our approach for limiting peptide false discovery rate (FDR) to <5%, the filtering criteria developed in this study were compared with two sets of published criteria (21,42) used previously to filter plasma/serum MS/MS data. The three different sets of criteria were applied to filter the SEQUEST identifications for all four peptide populations, and FDRs resulting from each set of criteria were assessed using the reversed database approach (38). The results of this evaluation are summarized in Table III. Our filtering criteria generated the smallest number of peptide and protein identifications, consistent with the significantly lower percentage of false positive identifications (~4%). Using the criteria recommended by the Human Proteome Organization (HUPO) (42), peptide identifi-    tide level was ~66%. The common feature between our criteria and the HUPO criteria is that only fully and partially tryptic peptides are considered; however, the HUPO criteria use much lower Xcorr cutoff values for the partially tryptic peptides. The criteria used by Hood et al. (21) are similar to two other previously reported sets of criteria (22,25) in which chymotryptic and elastic peptides as well as those peptides that had high Xcorr scores but no proteolytic end were also considered. Clearly, our filtering criteria are significantly more stringent when compared with the two sets of previously published criteria. Furthermore, the cysteinyl peptide and N-glycopeptide identifications were specifically enriched through reliable solid-phase capture, which further increases the overall confidence level for our data.
The confidence of our current filtering criteria was also compared with the independent statistical model Peptide Prophet program (43) by analyzing the same subset of datasets with both approaches. Histograms of Xcorr distributions in peptides identified by the reversed database approach showed that a low FDR can be achieved using the filtering criteria listed in Table I (Supplemental Fig. 1). Similarly, high confidence peptide identification can be obtained at a certain peptide probability (p_comp) level (Supplemental Fig. 2). A total of 6279 unique peptides were identified from the reversed database approach (≥95% confidence), whereas 6341 unique peptides were identified from Peptide Prophet (p_comp ≥ 0.95) among which 5973 (~95%) peptides were found in common. Note that the estimated FDR predicted by Peptide Prophet is 0.3% (p_comp ≥ 0.95), further supporting the stringency of the current reversed database filtering criteria. After processing of both sets of peptides by Protein Prophet, a total of 1301 proteins and 1254 proteins were identified for the reversed database approach and Peptide Prophet, respectively. The high percentage of overlap at the peptide level and the comparable numbers of protein identifications reflect the overall high quality of the currently reported data.
Fig. 3 plots reference protein concentrations under normal conditions for 80 selected proteins identified in this study as well as their peptide distribution within the four peptide populations. Most of the high abundance proteins and proteins having a midrange concentration (e.g. from the μg/ml to the ng/ml level) were identified from multiple peptide populations, whereas many of the low abundance proteins (e.g. cytokines) were identified from only one peptide population. It is speculated that some protein concentrations in trauma patients may be significantly different from those under normal conditions as a result of inflammatory responses. To test this hypothesis and evaluate the overall dynamic range of measurements that resulted from this study, six identified low abundance proteins that are related to inflammatory response were assayed by ELISA. The results showed that the protein levels from the trauma patient plasma pool are higher for most of these cytokines or cytokine receptors than those measured from a normal individual, with the exception of small inducible cytokine A21 (CCL21); however, all six proteins are still present in the 0.5-20 ng/ml range for the trauma patient plasma samples (Table IV). An estimated overall dynamic range of detection of >10^7 was achieved by applying the combined approach of immunoaffinity subtraction, solid-phase capturing of cysteinyl peptides and N-glycopeptides, and 2D LC-MS/MS. This claim is justified by the confident identification of a series of ng/ml range proteins in this study, exemplified by macrophage colony-stimulating factor (M-CSF) and CCL21 (Fig. 4, top panel and bottom panel, respectively), both of which are present at the 1 ng/ml level in trauma patient plasma. In addition, many tissue leakage proteins in the μg/ml to ng/ml concentration range were readily detected in multiple peptide populations, providing a solid basis for candidate disease biomarker (e.g. cancer biomarker) discovery when applying this strategy.
Cytokines are a diverse family of soluble immunomodulatory proteins and peptides that can be produced by every nucleated cell type in the body. A total of 78 "classic" cytokines and cytokine receptors, e.g. tumor necrosis factor receptor (TNF R), interleukin (IL), gp130, platelet-derived growth factor (PDGF), vascular endothelial growth factor (VEGF), and transforming growth factor-β (TGF-β), and chemokines were identified in this study and are categorized in Table V.

TABLE V. Categorization of 78 classic cytokines and cytokine receptors identified in this study
Cytokine nomenclature and categorization are from R&D Systems and Ref. 57. LT-β, lymphotoxin beta; TRAIL, TNF-related apoptosis-inducing ligand; DR6, death receptor 6; GM-CSF, granulocyte macrophage colony-stimulating factor; IL-1RA, interleukin 1 receptor antagonist; IL-1R AP, interleukin 1 receptor accessory protein; IL-18 BP, interleukin 18-binding protein; HGF, hepatocyte growth factor; MSP, macrophage-stimulating protein; Met, hepatocyte growth factor receptor; FGF, fibroblast growth factor; trk B, neurotrophic tyrosine kinase receptor type 2; PlGF, placenta growth factor; EGF, epidermal growth factor; OB, leptin; OSM, oncostatin M; CNTF, ciliary neurotrophic factor; SCF, stem cell factor; Flt-3L, fms-like tyrosine kinase 3 ligand; c-kit, stem cell factor receptor; MK, midkine; PTN, pleiotrophin; RPTPβ, receptor-type tyrosine-protein phosphatase β; BMP, bone morphogenetic protein; GDF, growth/differentiation factor; Inh βC, inhibin βC chain; Inh βE, inhibin βE chain; GFR, glia-derived neurotrophic factor receptor; Act, activin; NK cells, natural killer cells; PMNs, polymorphonuclear leukocytes; HCC, hemofiltrate CC (a class of chemokines); MIP, macrophage inflammatory protein; 6Ckine, 6-cysteine chemokine; Eot, eotaxin; CTACK, cutaneous T-cell-attracting chemokine; CCR, CC chemokine receptor; PF4, platelet factor 4; CXCL16, CXC chemokine ligand 16; MIF, macrophage migration inhibitory factor; GH R, growth hormone receptor; IFN, interferon; TLR, toll-like receptor.
The immune system works through leukocytes interacting with each other, other cells, tissue matrices, infectious agents, and other antigens. These interactions are mediated by cell surface glycoproteins and glycolipids (cell differentiation molecules, or CD antigens) that are frequently cleaved from the cell surface by protease activity. A total of 136 of the 288 (47.2%) known human protein CD antigens (available at www.hlda8.org/CD1toCD339.htm) were detected in this study (Supplemental Table 3). Good coverage was obtained for the CD antigens routinely detected with anti-leukocyte monoclonal antibodies and used to characterize the cell surface immunophenotypes of different leukocyte subpopulations (e.g. B-cells, helper T-cells, cytotoxic T-cells, and natural killer cells). Of the identified CD antigens, 90.4% had at least one predicted transmembrane domain, whereas among all other proteins, only 15.5% had predicted transmembrane domain(s). This finding is consistent with the fact that the majority of CD antigens are believed to be membrane-associated molecules.
GO and Pathway Analysis of the Detected Proteins- Fig. 5 shows the categories of proteins identified in this study in terms of cellular location based on gene ontology analysis. Comparison of cellular components for N-glycoproteins and the other proteins identified shows major differences. The majority of N-glycoproteins (Fig. 5A) are predicted to be extracellular/secreted proteins (38.8%) or membrane-associated proteins (48.8%), whereas all other proteins (Fig. 5B) are predicted to be distributed more evenly across all cellular locations. None of the N-glycoproteins identified are from the nucleus, cytoplasm, mitochondrion, ribosome, proteasome, or cytoskeleton; this is consistent with the biological functions of N-linked glycoproteins (23). The high percentage of intracellular proteins in this dataset indicates that large numbers of the proteins present in plasma may result from different levels of cellular leakage.
Further GOstat analyses that compared the distribution of GO terms of identified proteins with the entire human IPI database revealed over- and under-represented molecular functions and biological processes (data not shown). Over-represented molecular function categories included hematopoietin/interferon class cytokine receptor activity, insulin-like growth factor (IGF) binding, VEGF receptor activity, metallopeptidase activity, protease inhibitor, extracellular matrix structural constituent, lipid binding and transporter activity, polysaccharide binding, receptor protein kinase activity, and oxidoreductase activity. In the GO comparison of biological processes, proteins involved in the response to wound, regulation of body fluids, complement activation, and proteolysis categories appeared over-represented among the proteins identified. These findings reflect certain distinguishing features of the trauma patient plasma proteome, e.g. the presence of proteins involved in inflammation and immune responses (Table VI). These proteins include acute phase reactants, cytokines and growth factors, complement proteins and coagulation factors, hormones, extracellular matrix proteins, cell adhesion molecules, and secreted proteases and protease inhibitors in addition to other proteins and immunoglobulins.
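The over-representation comparison described above can be illustrated with a one-sided hypergeometric tail test, which is a common way to ask whether a GO term occurs more often among identified proteins than in a background database. This is a minimal sketch only: the procedure used internally by GOstat is not described here, and the function name and all counts below are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """Probability of seeing at least k annotated proteins when n proteins
    are drawn without replacement from a background of N proteins, of which
    K carry the GO term of interest."""
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible combinations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical counts, for illustration only (not taken from this study):
background_size = 50000     # proteins in the background database
background_with_term = 400  # background proteins annotated with the term
identified = 3654           # proteins identified in the experiment
identified_with_term = 60   # identified proteins annotated with the term

p_value = hypergeom_upper_tail(identified_with_term, identified,
                               background_with_term, background_size)
print(f"one-sided p-value: {p_value:.3g}")
```

In practice one such test is run per GO term, so the resulting p-values would need a multiple-testing correction before any term is called over-represented.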
Pathway analysis revealed the significant representation of specific signaling pathways, e.g. NF-κB signaling (inflammation and immune regulation), apoptosis signaling, ERK (extracellular signal-regulated kinase)/MAPK (mitogen-activated protein kinase) signaling, and Wnt/β-catenin signaling (data not shown). As an example, Fig. 6A shows a global representation of the extracellular proteins that are involved in immune response comprising a network of 193 proteins and their interactions. (A larger version of this image may be viewed online (Supplemental Fig. 3).) In total, 113 of the 193 (58.5%) known players were identified. Fig. 6B further illustrates the coverage of specific regions of this network and highlights the IGF and IGF-binding proteins, laminins, and matrix metalloproteinases.
DISCUSSION
Specific biomarkers for diagnosis/prognosis of disease and for monitoring disease progression and response to therapy have been clinically applied to screen patient tissues and blood samples as well as used to develop therapeutics and segment the population for specific treatments (44). Proteomics is increasingly being used in this field to describe and enumerate the systematic changes in the protein constituency of a cell, to generate lists of proteins that change in expression as a cause or consequence of disease, and more importantly to characterize the information flow through the intra- and extracellular molecular protein networks that interconnect organ and circulatory systems. These networks are expected to provide new targets for therapeutics and to reveal the dynamic biological changes that give rise to new candidate biomarkers (45). Because of its constant perfusion through tissues within the body, blood plasma is anticipated to contain ample information regarding these networks and therefore provides a basis for candidate disease biomarker discovery. However, several intrinsic features of plasma, such as an enormous dynamic range in the concentrations of proteins of interest and extreme sample complexity and heterogeneity, hamper effective proteomic analysis.
Our strategy for analyzing blood plasma addresses these issues by combining multicomponent immunoaffinity subtraction and multiple chemical fractionations (Fig. 1) with 2D LC-MS/MS. The single step depletion of 12 high abundance proteins on an automated LC system significantly increases the dynamic range of detection and reduces sample heterogeneity (due to the simultaneous removal of the highly variable IgG, IgA, and IgM populations). The high efficiency CPE step further reduces sample complexity and in turn enables detection of low abundance proteins (46,47). Simultaneous analysis of the non-cysteinyl peptides, generated as a "byproduct" during CPE, significantly increases proteome coverage (47,48). The N-glycopeptide enrichment step affords yet another effective way of reducing plasma sample complexity (23,35,49). N-Glycosylation is particularly prevalent in proteins that are secreted and located on the extracellular side of the plasma membrane and in proteins that are contained in various body fluids (e.g. blood plasma) (50). Because N-glycosylation sites generally fall into a consensus NX(S/T) sequence motif, where X represents any amino acid residue except proline (51), the number of N-glycopeptides available from each protein is limited; this provides the basis for reducing sample complexity. The performance of the immunoaffinity subtraction MIXED12 column has been reported previously to be specific, efficient, and reproducible (29). In our study, the 12 immunosubtraction-targeted proteins were specifically and effectively removed, but peptides from these proteins were still detected by MS following the depletion step (data not shown), presumably due to the initial high concentrations of these proteins. Binding of other nontarget proteins to the MIXED12 column did occur, but only to a slight extent and in a fairly reproducible fashion (data not shown). Therefore, this column should be applicable to quantitative studies.
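The NX(S/T) consensus motif mentioned above is straightforward to express as a pattern search. The following is a minimal sketch, assuming protein sequences are supplied as one-letter amino acid strings; the function name and the example sequence are hypothetical and are not part of the analysis pipeline used in this study.

```python
import re

# N, then any residue except proline, then S or T. The pattern is wrapped
# in a lookahead so that overlapping motifs (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return the 1-based positions of Asn residues that sit in an
    N-X-(S/T) motif with X != Pro."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]

# Hypothetical sequence for illustration: NGS matches, NPT is excluded
# because X is proline, and NNT / NTS overlap.
example = "MNGSANPTNNTS"
print(find_sequons(example))
```

A match only marks a potential site; the motif is necessary but not sufficient for glycosylation, which is why experimentally captured N-glycopeptides remain the evidence for site occupancy.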
The methods for cysteinyl peptide and N-glycopeptide enrichment can be integrated seamlessly into quantitative strategies by using stable isotope labeling (35,46) (e.g. 18O labeling) and are amenable to automation. A major challenge in proteomic studies of inflammation is that, whereas cancer biomarkers are usually binomial (i.e. they are absent in normal and present in disease and probably do not vary extensively over short time periods), inflammatory biomarkers are likely not a matter of presence or absence but rather of a change in concentration relative to normal, and they are likely to change rapidly over time, which underscores the need for high throughput temporal analysis in this field. Although the current study is qualitative, the analytical strategy proposed here can be coupled directly to automated quantitative analysis. Taking full advantage of the broad proteome coverage and in-depth detection dynamic range, samples from temporal studies on inflammation prepared by the current strategy, with or without stable isotope labeling, can be analyzed by LC-MS-based approaches for high throughput biomarker discovery by using the accurate mass and time tag approach (52).
Application of this "divide-and-conquer" strategy significantly improved the overall dynamic range of detection (estimated to be >10^7) for the characterization of the blood plasma proteome. In a previous report a nonredundant human plasma protein list was developed by combining proteins from four separate sources (53). The 46 proteins found in common among the four datasets were all relatively high abundance classic plasma proteins. In the current study, some of the 330 proteins observed in all four of the peptide populations appear to be classic plasma proteins (e.g. complement components, coagulation factors, and protease inhibitors); however, there are also many proteins such as cathepsins D and L that are anticipated to be present at relatively low concentrations in plasma (<10 ng/ml under normal conditions (54,55)). These results demonstrate the high overall dynamic range of detection of the current strategy, which allows many low abundance proteins to be consistently detected in all four peptide populations. Additionally, improvements in overall proteome coverage were achieved by combining identifications from these four different, but complementary, peptide populations. Only very small overlap was observed between the cysteinyl peptide and non-cysteinyl peptide fractions, between the N-glycopeptide and non-glycopeptide fractions, and between the cysteinyl peptide and N-glycopeptide fractions (Fig. 2A). A total of 3654 nonredundant proteins (Table II) were confidently identified that included 662 N-glycoproteins (Fig. 2B) as well as numerous cytokines, cytokine receptors, CD antigens, and proteins involved in both inflammation and immune response processes. Approximately 63% of these nonredundant proteins were identified solely from one peptide population and included many low abundance proteins such as IL-2 from non-glycopeptides, PDGF B chain from glycopeptides, CCL21 from cysteinyl peptides, and calcitonin from non-cysteinyl peptides.
A major challenge in the analysis of MS data is the accurate assessment of the extent of false positive identifications. Various criteria have been developed to filter raw MS/MS identifications; however, further statistical evaluation is essential to ensure that high confidence protein identifications can be derived from such analyses. For example, an initial large scale HUPO plasma proteome collaborative study assembled a list of 3020 proteins identified with two or more peptides using data acquired on different instruments from 18 different laboratories (42). Recently the list has been reduced to 889 proteins (containing both multipeptide and single peptide protein identifications) identified with a confidence level of at least 95% using a rigorous statistical approach taking into account the length of coding regions in genes and multiple hypothesis-testing techniques (56). Our filtering criteria, developed based on reversed database searching, are much more stringent than the early HUPO criteria (Table III). Their stringency is also supported by the comparable results from the reversed database approach and the Peptide/Protein Prophet program in terms of generating high confidence protein identifications. Furthermore, reanalyzing the data from one of our early plasma profiling studies (19) using the HUPO criteria yielded 1073 proteins; the length-dependent statistical analysis yielded a ~2-fold reduction in protein identifications (433 proteins with confidence >95%) (56). Similarly, for the data presented in this study, the reversed database analysis also resulted in ~2-fold fewer protein identifications compared with those obtained if the HUPO criteria were applied (3654 versus 7928 proteins using previous HUPO criteria, Table III), suggesting an approximately comparable level of confidence for protein identifications obtained with the reversed database criteria and with the recently published length-dependent statistical analysis (56).
These comparisons between independent statistical approaches reflect the overall high quality of the currently reported data obtained by using the novel combined approach; however, the single peptide protein identifications will have a higher FDR compared with the multipeptide proteins; therefore, these single peptide proteins are listed separately in Supplemental Table 2.
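The reversed database idea discussed above can be sketched in a few lines: each protein sequence is reversed to create a decoy entry, spectra are searched against targets and decoys together, and the number of decoy hits that pass a score filter is used to estimate how many passing target hits are false. The sketch below is illustrative only. The exact filtering criteria and error estimate used in this study are not reproduced here; the decoys-over-targets ratio is one common convention, and the function names, the "REV_" prefix, the scores, and the threshold are all hypothetical.

```python
def reverse_database(proteins):
    """Build decoy entries by reversing each target sequence.
    `proteins` maps an accession to its amino acid sequence."""
    return {f"REV_{accession}": sequence[::-1]
            for accession, sequence in proteins.items()}

def estimate_fdr(matches, threshold):
    """Estimate the false discovery rate among target matches scoring at or
    above `threshold`. `matches` is a list of (score, is_decoy) pairs from a
    search against the combined target + reversed database. Assumes decoy
    and false target hits are equally likely, so FDR ~ decoys / targets."""
    passing = [is_decoy for score, is_decoy in matches if score >= threshold]
    decoys = sum(passing)
    targets = len(passing) - decoys
    if targets == 0:
        return 0.0
    return decoys / targets

# Hypothetical search results: (score, matched a reversed sequence?)
matches = [(3.5, False), (3.1, False), (2.9, True),
           (2.8, False), (2.0, True), (1.9, False)]

decoy_db = reverse_database({"P1": "MKTAYIAK"})
print(decoy_db)
print(estimate_fdr(matches, threshold=2.5))
```

Raising the threshold until the estimate falls below a chosen level is how score filters of this kind are usually tuned, and it also shows why single peptide identifications, which pass on one match alone, carry more risk than multipeptide ones.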
The relatively low abundance cytokines present in plasma not only mediate host responses to invading organisms, tumors, and trauma but also maintain our capacity for daily survival in our germ-laden environment (57). The detection of cytokines in disease states (e.g. inflammation) may provide very useful diagnostic and/or therapeutic tools; for example, IL-1 receptor antagonist has been shown to play a role in systemic host responses and to improve survival in septic shock (58,59), and some cytokine receptors are also being evaluated for their anti-inflammatory effects (60). Our strategy enabled the detection of members of all major cytokine families (Table V), demonstrating the applicability of this strategy for discovering cytokine inflammation biomarkers in quantitative studies.
An area of contention surrounding biomarker discovery is whether a single protein marker or a panel of biomarkers should be used for disease diagnosis and therapeutic treatment. An increasingly common view is that a single biomarker lacks the required sensitivity and specificity when applied to a heterogeneous population; however, these limitations may be overcome by utilizing panels of biomarkers (61). As in cancer, dysfunctional or malignant cell growth may result from changes in multiple members of deranged protein signal transduction pathways. Therefore, an understanding of the pathways and networks that involve plasma proteins released from cells would facilitate the development of a disease biomarker panel for clinical applications. The pathway analysis reveals that our dataset indeed provides extensive coverage of important signaling pathways (e.g. the NF-κB signaling pathway) and protein networks involved in inflammatory and innate immune responses. Such coverage suggests the potential for simultaneously monitoring the temporal changes of many protein players in a specific pathway/network when the current strategy is coupled with quantitative methodologies (e.g. stable isotope 18O labeling).