Multidimensional Liquid Chromatography Separation of Intact Proteins by Chromatographic Focusing and Reversed Phase of the Human Serum Proteome

In biomarker discovery, the detection of proteins with low abundance in the serum proteome can be achieved by optimization of protein separation methods as well as selective depletion of the higher abundance proteins such as immunoglobulins (e.g. IgG) and albumin. A relatively recent addition to the proteomic separation arena is the commercial PF2D instrument from Beckman Coulter, which separates proteins in the first dimension using chromatofocusing followed in line by reversed phase chromatography in the second dimension, thereby separating intact proteins based on pI and hydrophobicity. In this study, assessment and optimization of serum separation (undepleted serum and albumin-IgG-depleted serum) by the PF2D is presented. Protein databases were created for serum obtained from a healthy individual under traditional and optimized methods and under different sample preparation protocols. Separation of the doubly depleted serum using the PF2D with 20% isopropanol present in the first dimension running buffer allowed us to unambiguously identify 150 non-redundant serum proteins (excluding all immunoglobulins and albumin, with a minimum of two peptide matches with acceptable Mascot score), of which 81 have not been identified previously in serum. Among them, numerous cellular proteins were identified specifically as the skeletal muscle isoform, such as the skeletal muscle fast twitch isoforms of troponin T, myosin alkali light chain 1, and sarcoplasmic/endoplasmic reticulum calcium ATPase. The detection of specific skeletal muscle protein isoforms in the serum of healthy individuals reflects the physiological turnover that occurs in skeletal muscle, which will affect the ability to use generic “cellular” proteins as biomarkers without further characterization of the precise isoforms or post-translational modifications present.

Molecular & Cellular Proteomics 5:26-34, 2006.
There has been a surge of interest in the proteomic analysis of plasma and serum in the search for clinically relevant biomarkers of disease. In biomarker discovery, it is necessary to maximize coverage of the plasma or serum proteome in order to detect low-abundance proteins. This can be achieved by optimizing protein separation methods as well as by selectively depleting high-abundance proteins such as immunoglobulins (e.g. IgG) and albumin. A large number of proteomic technologies separate either proteins or peptides prior to MS (1). A relatively recent addition to the proteomic separation arena is the commercial PF2D instrument from Beckman Coulter, which separates proteins in the first dimension by chromatographic focusing, followed in line by reversed phase chromatography in the second dimension, thereby separating intact proteins based on pI and hydrophobicity. To date, assessment and optimization of either plasma or serum separation by the PF2D has not been undertaken. LC proteomic methods have generally focused on separating complex mixtures of peptides obtained by digestion of the serum proteome (peptide LC, shotgun), whereas separation of intact proteins has been relegated primarily to one- and two-dimensional electrophoresis (2DE) (2,3). 2DE has an advantage over peptide-based LC methods because it enhances the ability to identify the precise protein isoforms present and/or post-translational modifications (PTMs) that may alter the pI or mass of a protein (1). Separating proteins, rather than peptides, by two-dimensional liquid chromatography could provide the same advantages as 2DE.
The PF2D system (Beckman Coulter) is a two-dimensional LC system that separates intact proteins in the first dimension by pI (from pH 8.5 to 4.0) using chromatographic focusing and in the second dimension by hydrophobicity using reversed phase chromatography, thus enhancing the precise detection of isoforms and/or PTMs that alter the pI and/or hydrophobicity of a protein. In addition, because proteins are resolved on the basis of their intrinsic characteristics into fractions containing only a few proteins (from 1 to 20) before mass spectrometry analysis, a high degree of sequence coverage can be obtained for each identification, which allows PTMs to be observed. This contrasts with shotgun databases, in which protein identifications can be made on a single observed peptide with no information about the nature of the parent protein. In this study, we outline the optimization of the solubilization and separation conditions for the human serum proteome on the PF2D. To assess the utility of the PF2D, protein databases were created for serum obtained from a healthy individual under traditional and optimized methods and under different sample preparation protocols to remove lipids, albumin, and immunoglobulins.

MATERIALS AND METHODS
Serum Preparation-100 µl of serum was delipidated and depleted sequentially of albumin and IgG (doubly depleted serum) as outlined in Fu et al. (4). Briefly, serum was centrifuged for 15 min at 15,000 × g, and the lipid-containing upper layer was removed. The delipidated serum was mixed with protein G-Sepharose beads (HiTrap protein G HP, Amersham Biosciences) in 100 mM NaCl, 10 mM HEPES, pH 7.4, incubated with mixing in a Handee Mini-spin column (Pierce) for 1 h at room temperature, and then centrifuged (6000 rpm for 3 min) to pellet the beads. Prechilled 95% ethanol (Sigma) was added to the supernatant to a final concentration of 42% and incubated for 1 h at 4°C with gentle mixing, followed by centrifugation at 16,000 × g for 45 min at 4°C. The supernatant (albumin-enriched fraction) was removed, and the pellet (serum proteins depleted of albumin) was stored at −80°C. This procedure removes over 95% of the albumin and IgG (4). Purified bovine serum albumin (Sigma), the pellet (doubly depleted serum), or unfractionated serum was suspended in PF2D start buffer (pH 8.5). Protein concentrations were determined in duplicate by the BCA protein assay (Pierce).
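The ethanol precipitation step specifies only the final concentration (42%) reached with a 95% stock. As a minimal sketch, assuming the supernatant contains no ethanol, that volumes are additive (ethanol-water mixing actually causes slight volume contraction), and using a hypothetical 100 µl supernatant, the volume of stock to add follows from the balance c_stock·V_add = c_final·(V_sample + V_add):

```python
def ethanol_volume_to_add(sample_ul, stock_fraction=0.95, final_fraction=0.42):
    """Volume of ethanol stock (µl) that brings a sample to final_fraction (v/v).

    Assumes the sample contains no ethanol and that volumes are additive.
    """
    if not 0 <= final_fraction < stock_fraction:
        raise ValueError("final concentration must be below the stock concentration")
    # Rearranged from stock * V_add = final * (V_sample + V_add).
    return sample_ul * final_fraction / (stock_fraction - final_fraction)


# Hypothetical 100 µl supernatant; the actual volume after the protein G step is not stated.
added = ethanol_volume_to_add(100.0)
print(f"add {added:.1f} µl of 95% ethanol")  # add 79.2 µl of 95% ethanol
```

Because the added volume scales linearly with the sample volume, the same ratio (about 0.79 volumes of 95% ethanol per volume of supernatant) applies whatever the actual supernatant volume is.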
Liquid Chromatography-The albumin, serum, or doubly depleted serum was analyzed on a one-dimensional liquid chromatography system (System Gold system with autoinjector, Beckman Coulter) or a two-dimensional liquid chromatography system (ProteomeLab PF2D, Beckman Coulter). All serum analyses were carried out on new, naïve columns (not used for any other experiments), with strong wash conditions between runs to ensure no contamination from another analysis. The first dimension of the PF2D consists of a single piston pump (HPCF Module), a manual injector for sample introduction, a pH monitor, and a UV detector. The first dimension separation is chromatofocusing, which separates proteins based on charge. Fractions from the first dimension are collected in the fraction collector/injector (FC/I Module), which is the interface between the first and second dimensions, and are automatically introduced into the second dimension, reversed phase chromatography, which separates based on hydrophobicity. The second dimension consists of a binary pump system (HPRP Module), column heater, UV detector, and fraction collector. The fractions are collected into 96-deepwell plates for mass spectrometry analysis. The hardware is controlled by 32 Karat software, and the first and second dimensions run sequentially and automatically. Chromatofocusing was carried out on the CF column by mixing two buffers of different pH, Start Buffer (pH 8.5) and Eluent Buffer (pH 4.0), to create a linear pH gradient from 8.5 to 4.0, followed by a wash buffer containing 1 M NaCl. 1 mg of purified bovine albumin, 1.5 mg of doubly depleted serum, or 3 mg of native undepleted serum was injected onto the CF column, which had been equilibrated for 130 min at 0.2 ml/min with the proprietary start buffer (pH 8.5) containing urea and a reducing agent.
The pH gradient was achieved by introducing increasing amounts of the eluent buffer (pH 4.0, containing 0, 10, or 20% acetonitrile, methanol, or isopropanol) at a flow rate of 0.2 ml/min over 95 min. The CF column was then washed for 45 min with the third buffer containing 1 M NaCl and stored in water. Fractions were collected every 5 min, except during the pH gradient, when fractions were collected at 0.3 pH intervals. Each fraction (200-500 µl) was sequentially analyzed by reversed phase HPLC at a constant 50°C. Proteins were separated on a non-porous C18 reversed phase column using a 3.33% B/min linear gradient, in which solvent A was 0.1% aqueous TFA and solvent B was 0.08% TFA in acetonitrile, at a flow rate of 0.75 ml/min. Proteins were monitored at 214 nm. The reversed phase fractions were collected at 0.25 min per tube and stored at −80°C for further analysis. For one-dimensional LC, samples were separated on the same column used in the second dimension of the PF2D, with the same gradient and flow rate. The percentage of albumin eluted on different columns and under different pH conditions was determined by calculating the peak area of albumin monitored at 214 nm, a wavelength at which the peak area is directly proportional to the quantity of protein (5,6). Mass spectrometry was carried out to confirm the composition of the albumin peak.
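The rule of cutting first dimension fractions at 0.3 pH intervals during the gradient can be illustrated with a short sketch. It assumes the pH monitor output is available as time-ordered (time, pH) samples; the function name and data format are hypothetical and are not a description of the FC/I Module's actual control logic.

```python
EPS = 1e-9  # tolerance so that e.g. 8.5 - 8.2 counts as a full 0.3 pH step


def cut_ph_fractions(trace, step=0.3):
    """Split a descending pH trace into fractions spanning about `step` pH units.

    trace: list of (time_min, pH) tuples in time order.
    Returns a list of (start_time, end_time, start_pH, end_pH) tuples.
    """
    if not trace:
        return []
    fractions = []
    start_t, start_ph = trace[0]
    for t, ph in trace[1:]:
        # Close the current fraction once the pH has fallen by a full step.
        if start_ph - ph >= step - EPS:
            fractions.append((start_t, t, start_ph, ph))
            start_t, start_ph = t, ph
    end_t, end_ph = trace[-1]
    if end_t > start_t:  # keep any partial fraction left at the end of the gradient
        fractions.append((start_t, end_t, start_ph, end_ph))
    return fractions


# Hypothetical monitor readings falling 0.1 pH unit per minute.
trace = [(0, 8.5), (1, 8.4), (2, 8.3), (3, 8.2), (4, 8.1), (5, 8.0), (6, 7.9), (7, 7.8)]
for frac in cut_ph_fractions(trace):
    print(frac)
```

Cutting on the measured pH rather than on elapsed time means fraction volumes vary with the local slope of the gradient; a real trace is noisy, so in practice some smoothing would be needed before applying a threshold like this.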
Mass Spectrometry-The reversed phase fractions (188 µl, i.e. 0.25 min at 0.75 ml/min) obtained from the two-dimensional LC were concentrated to 5-10 µl using a SpeedVac concentrator (Thermo Savant). 1 M NH4HCO3 was added to the residue to neutralize samples to pH 8.0. Sequencing grade modified trypsin (Promega, Madison, WI) was added at an enzyme-to-substrate ratio of 1:50, and samples were incubated overnight (>12 h) at 37°C. 10% trifluoroacetic acid was added to stop the digestion. ESI-MS/MS of tryptic peptides was performed on an LC-ion trap-MS/MS instrument (Thermo Finnigan, San Jose, CA) at The Technical Implementation and Coordination Core, Johns Hopkins University, Baltimore, MD. Proteins identified with amino acid sequences obtained from MS/MS had a minimum of two peptide matches, with a minimum Mascot score of 40 for each peptide (Table I). When protein identifications were made with two peptide matches, the fragments had to be unique to the protein and not match other potential identifications (non-redundant peptides). Only in five cases of serum analysis using isopropanol, where a protein was expected to be present in the fraction, was identification based on a single confirmed peptide; these are highlighted in Table I. Further stringency was added by eliminating any identification that could be assigned to more than one protein. To create a non-redundant database, the protein identifications were examined manually for possible redundancies, including multiple names and homologies, because numerous instances were found where the same protein was contained in multiple database entries. Furthermore, confirmation of a protein isoform or processed protein was based on matching a tryptic peptide fragment to an amino acid sequence unique to the isoform or intact protein. The theoretical pI and mass values were recorded from the National Center for Biotechnology Information (NCBI) entry.
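The automated part of the acceptance criteria (at least two peptides, each with a Mascot score of at least 40, that match no other candidate protein) can be expressed as a simple filter. This is a minimal sketch assuming peptide matches have already been exported as (protein, peptide, score) records; the record format and function name are hypothetical, and neither the manual redundancy checks nor the five single-peptide exceptions are modeled.

```python
from collections import defaultdict

MIN_SCORE = 40      # minimum Mascot score per peptide
MIN_PEPTIDES = 2    # minimum number of unique peptides per protein


def accepted_proteins(matches):
    """Return {protein: sorted unique peptides} for proteins passing the criteria.

    matches: iterable of (protein, peptide, mascot_score) records.
    """
    matches = list(matches)
    # Record every protein each peptide sequence was matched to.
    proteins_per_peptide = defaultdict(set)
    for protein, peptide, _score in matches:
        proteins_per_peptide[peptide].add(protein)

    # Keep only high-scoring peptides that map to exactly one protein.
    unique_hits = defaultdict(set)
    for protein, peptide, score in matches:
        if score >= MIN_SCORE and len(proteins_per_peptide[peptide]) == 1:
            unique_hits[protein].add(peptide)

    return {protein: sorted(peptides)
            for protein, peptides in unique_hits.items()
            if len(peptides) >= MIN_PEPTIDES}


# Hypothetical records: PEPK is shared by A and B, so it counts for neither.
records = [
    ("A", "PEPK", 55), ("A", "TIDEK", 42),
    ("B", "PEPK", 60), ("B", "SEQR", 70), ("B", "LONGR", 45),
    ("C", "XYZK", 80), ("C", "LOWSCR", 39),
]
print(accepted_proteins(records))  # {'B': ['LONGR', 'SEQR']}
```

Shared peptides are discarded before counting, so a protein supported only by sequences it has in common with another candidate is rejected even if those peptides score highly.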

RESULTS AND DISCUSSION
With the standard PF2D default running conditions provided by the manufacturer for plasma (application note A-1963A, Beckman Coulter), we were able to identify a total of 59 non-redundant proteins, excluding any immunoglobulins, in serum from a healthy human individual (Table I). Of these, 19 proteins had not been reported previously in the non-redundant serum/plasma database compiled by Anderson et al. (7). These published data are based on the detection of proteins by two or more means, including literature searches and proteomic analyses using two-dimensional gel electrophoresis of plasma (8) or LC-MS/MS following global digestion of the plasma proteome (9, 10) (see Supplemental Table 1). We also compared our data to proteins matched by a single means of identification with either human serum or Table 1). In fact, 16 proteins have not been detected by any proteomic method to date. Interestingly, these comprise both traditional serum proteins and proteins of cellular origin. In some cases, the identification of a particular isoform was sufficient to make it a novel finding. For example, α-actinin isoform 3 was unambiguously identified in the doubly depleted serum using the PF2D (Table I), whereas α-actinin isoform 2 had been reported previously in human plasma (7,11). Unexpectedly, 25 serum proteins were eluted in two fractions, at their theoretical pI and in the salt wash, or were eluted only during the salt wash (Table I). Although the salt wash is designed to elute proteins with a pI below 4.0, the theoretical pI of the proteins eluting in the salt wash ranged from 7.25 to 5.22 (e.g. N-acetylmuramoyl-L-alanine amidase and apolipoprotein A-IV, respectively). Therefore, unless extensively modified, these proteins should not elute in this fraction, indicating that the default running conditions provided by the manufacturer are suboptimal.
Serum albumin (theoretical pI, 5.7) also displayed a dual fractionation pattern, eluting both in the first dimension fraction collected between pH 5.5 and 5.7 and in the salt wash (Fig. 1). This suggests either incomplete denaturation of protein complexes during the solubilization step and/or enhanced binding of particular proteins to the first dimension column.
To address both possibilities, we altered both the solubilization and the first dimension running conditions (some representative conditions are shown in Table II) and tracked the elution of albumin, either from human serum or as purified albumin (bovine albumin behaves similarly and also eluted in two fractions using the default method (data not shown)). Adding salt, detergent, and/or organic solvent to the solubilization buffer had little effect. Our optimized first dimension separation included the addition of 20% isopropanol to the first dimension running buffer, which most likely reduced hydrophobic interactions between proteins and the CF column. The addition of 20% isopropanol increased the percentage (quantity) of albumin eluting from the CF column in the correct pH 5.5-5.7 fraction, although a small amount still eluted in the salt wash. This is illustrated by comparing the area of the albumin peak (Fig. 1, *) in the pH 5.5-5.7 and salt fractions when the serum sample was separated using the default method or with 20% isopropanol present in the first dimension buffer. The peak eluting at 16.8 min contained only albumin based on ESI-MS/MS analysis of the reversed phase fraction and is designated with an * in Fig. 1. The only exception is the serum sample run under the default method, in which a small amount of transferrin was also present in the albumin peak eluting in the salt wash. Under our optimized conditions with 20% isopropanol present in the first dimension buffer, transferrin was not present. The effect of 20% isopropanol on the elution profile of serum proteins was to increase the total number of proteins unambiguously identified to 63, but, more importantly, it reduced the number of proteins eluting in two fractions (compare the proteins eluting in two fractions (Table I, +b) with those eluting only at their theoretical pI (Table I, +)). Where proteins still eluted in two fractions, the quantity in the salt wash was reduced, based on percent sequence coverage as a rough indication of abundance (11).

FIG. 1. Elution profiles from the PF2D analysis of undepleted and doubly depleted human serum. 3 mg of serum from a healthy individual was analyzed on the PF2D (Beckman Coulter) using the default method provided by the manufacturer (A) or with the addition of 20% isopropanol to the first dimension buffer (B). These can be compared with 1.5 mg of doubly depleted serum separated under this optimized condition (C). The first dimension elution profiles were monitored at 280 nm, and the pH gradient is shown with a dotted line. Fractions collected in the first dimension were subsequently separated on a reversed phase column monitored at 214 nm. Two fractions, equivalent to pH 5.5-5.7 and 5.2-5.3, and the fraction obtained during the salt wash are shown. Albumin was the only protein identified by ESI-MS/MS in the peak eluting from the reversed phase column at 16.8 min (marked *). There was a minor amount of transferrin contamination of the albumin peak in the salt wash when whole serum was separated under the default method.
1D, first dimension; 2D, second dimension.
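The comparison of albumin peak areas between the pH 5.5-5.7 fraction and the salt wash relies on the proportionality between 214 nm peak area and protein quantity noted in the Methods. A minimal sketch of that calculation, assuming each chromatogram is available as sampled (time, absorbance) values and that the peak window has already been chosen; no baseline correction is applied, and the function names and example areas are hypothetical:

```python
def peak_area(times, signal, start, end):
    """Trapezoidal area of `signal` between `start` and `end` (same units as times)."""
    pts = [(t, s) for t, s in zip(times, signal) if start <= t <= end]
    area = 0.0
    for (t0, s0), (t1, s1) in zip(pts, pts[1:]):
        area += 0.5 * (s0 + s1) * (t1 - t0)
    return area


def percent_distribution(areas):
    """Convert peak areas per fraction into percentages of the total."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * a / total for name, a in areas.items()}


# Synthetic triangular peak centred at 16.8 min (absorbance units arbitrary).
times = [16.0, 16.4, 16.8, 17.2, 17.6]
signal = [0.0, 0.5, 1.0, 0.5, 0.0]
area = peak_area(times, signal, 16.0, 17.6)  # approximately 0.8

# Hypothetical areas of the albumin peak in two first dimension fractions.
print(percent_distribution({"pH 5.5-5.7": 3 * area, "salt wash": area}))
```

Expressing each fraction's area as a share of the total makes runs with different injected amounts comparable, which is the quantity the Methods describe as the percentage of albumin eluted.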
Separation of the doubly depleted serum using the PF2D with 20% isopropanol present in the first dimension running buffer allowed us to unambiguously identify 150 non-redundant serum proteins (Table I and Supplemental Table 1). As schematically illustrated in Fig. 2, the serum database comprises 89 (60%) serum proteins, 53 (35%) cellular proteins, and eight (5%) proteins with no known function (unknown) or derived from viral or bacterial proteins (see Supplemental Fig. 1 for examples of LC-MS/MS spectra confirming the unique identification of some of these proteins). Surprisingly, 38% (57 of 150) of all the proteins observed by the PF2D analysis had never been reported in serum, not even from a single source (7, 11) (Supplemental Table 1).
Interestingly, numerous cellular proteins were identified specifically as skeletal muscle isoforms. Unambiguous identification of a specific protein isoform was based on the detection of a tryptic fragment with an amino acid sequence unique to that isoform (examples of MS/MS data and the corresponding unique sequences for some of these proteins are shown in Supplemental Fig. 1). For example, three tryptic peptide fragments were used to confirm that troponin C was the fast twitch skeletal muscle isoform rather than the slow twitch skeletal isoform. Similarly, two, two, and four tryptic fragments were used to uniquely identify the skeletal muscle fast twitch isoforms of troponin T (TnT), myosin alkali light chain 1, and sarcoplasmic/endoplasmic reticulum calcium ATPase, respectively. Unambiguous identification of a specific isoform, such as skeletal fast twitch TnT versus the cardiac or skeletal slow twitch TnT isoforms, is uncommon in peptide-based LC methods. However, it is critical to distinguish between the various isoforms. The detection of cardiac TnT in serum is the gold standard for diagnosis of acute myocardial infarction. The basis for this diagnostic tool is the concept that the cardiac isoform is present only in cardiac muscle and that its release into the blood is due to cellular necrosis or apoptosis. In fact, even minor quantities of cardiac TnT are linked to poor outcome. It is well documented that the serum proteome consists of classical serum proteins as well as proteins that "leak" from cells during active secretion, due to physiological or pathological events, or as a result of cellular necrosis or potentially apoptosis. In theory, cellular leakage of proteins could arise from any cell type, and thus the unambiguous identification of cell- or organ-specific proteins is key to understanding pathology and to discovering robust diagnostic markers.

FIG. 2. The unambiguously identified proteins present in the doubly depleted serum were classified according to the Swiss-Prot database, NCBI database, or literature searches. Classification of proteins was based on their primary location, either serum or cellular, or as proteins with unknown function (or derived from viral or bacterial proteins). Surprisingly, 30% of the serum proteins (27 of 89), 70% (37 of 53) of the cellular proteins, and all of the unknown proteins have not been reported previously by Anderson et al. (7) and are highlighted as "New."

TABLE II Optimization of chromatographic focusing conditions for separation of endogenous albumin present in human serum
Conditions for solubilization and eluent buffer
The bold values represent the optimized condition that was used to generate the database. The dashes indicate that no data are available because the condition was too harsh for the instrument.
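The isoform assignments described above rest on tryptic peptides whose sequences occur in one isoform but not in the others. As an illustration only, the sketch below digests sequences in silico with the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) and reports the peptides unique to each isoform. The sequences are short made-up strings, not real troponin sequences, and the function names are hypothetical.

```python
import re


def tryptic_peptides(sequence, min_length=6):
    """In silico trypsin digest: cleave C-terminal to K/R, except before P.

    Returns the set of peptides at least `min_length` residues long
    (missed cleavages are not considered).
    """
    # Zero-width split points: preceded by K or R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if p and len(p) >= min_length}


def isoform_unique_peptides(isoforms, min_length=6):
    """Map each isoform name to the peptides found in no other isoform."""
    digests = {name: tryptic_peptides(seq, min_length) for name, seq in isoforms.items()}
    unique = {}
    for name, peptides in digests.items():
        others = set().union(*(d for other, d in digests.items() if other != name))
        unique[name] = peptides - others
    return unique


# Toy sequences (hypothetical); both share MADEAK and GLLSAGRPTEEIK.
toy_isoforms = {
    "fast_isoform": "MADEAKGLLSAGRPTEEIKVVDLLSR",
    "slow_isoform": "MADEAKGLLSAGRPTEEIKAAQWLTR",
}
print(isoform_unique_peptides(toy_isoforms))
# {'fast_isoform': {'VVDLLSR'}, 'slow_isoform': {'AAQWLTR'}}
```

Only an observed peptide from the unique set can distinguish the isoforms; peptides from the shared set, however confidently identified, support either one equally.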
There are several factors that influence the number of proteins identified in any proteomic study: (i) the quantity of sample, (ii) the extent of fractionation, (iii) the loading capacity of the technology, and (iv) the stringency of the criteria used for protein identification. In a recent study, peptide fragments obtained from digestion of 1.1 ml (>50 mg of protein) of undepleted plasma were fractionated using ion exchange chromatography followed by capillary reversed phase LC (12). The authors reported the identities of 1292 non-redundant proteins. It is important to note that only 138 proteins (including the highly abundant plasma proteins) were identified based on two or more fragments. This is ~10% of their database and close to the number of proteins we unambiguously identified using the same, more stringent criterion (two or more fragments). Using a similar approach, Qian et al. (11) analyzed 3 mg of peptides obtained from the tryptic digestion of 200 µl of undepleted plasma from healthy and diseased individuals. The peptides were separated by multidimensional LC, and 667 proteins were identified based on more than one peptide from any of multiple LC analyses (for comparison see Supplemental Table 1). One potential reason for the higher number of proteins is the increased release of "new" proteins following lipopolysaccharide administration, which activates a strong acute phase response. Although the majority of the proteins identified have plasma concentrations above 10 µg/ml, one of their least abundant proteins was α-fetoprotein, a serum protein elevated in cancer, which has an estimated plasma concentration near 0.005 µg/ml in healthy individuals (11). Using the PF2D to separate intact proteins with our optimized protocol and double depletion of the serum has also allowed the detection of less abundant serum proteins such as coagulation factor XIII, which has an approximate plasma concentration of 10 µg/ml.
The concentration of low-abundance cellular proteins is difficult to assess because their plasma concentrations in humans have not been measured. However, it is possible to estimate the concentration of skeletal troponin T (sTnT), which we observed, from measured levels of skeletal troponin I (sTnI), a protein that binds sTnT with 1:1 stoichiometry. Only sTnI concentrations in the plasma of healthy individuals have been reported. Sorichter et al. (13,14) reported sTnI to be 0.002 µg/ml and later differentiated by gender, with 0.0012 and 0.0005 µg/ml in healthy males and females, respectively. Recently we reported sTnI concentrations in healthy individuals and in patients with nonskeletal muscle diseases to be below 0.0016 µg/ml (the assay detection limit) (15). The presence of sTnT in our database is an excellent example of the sensitivity that can be achieved using the PF2D under optimized conditions. There is an apparent paradox in a proteomic method that can detect low abundance proteins at concentrations in the ng/ml range (like sTnT) yet identifies only 154 proteins. The explanation for this low number likely lies in the high stringency used for protein identification. Protein identifications were made based on two or more unique peptides that unambiguously match a single protein. Furthermore, care was taken to ensure nonredundancy with respect to other protein names, and the amino acid sequences of "unknown" or "similar to" proteins were analyzed for sequence homology and were considered matches if homology was above 95%. Because the probability of obtaining multiple peptide fragments decreases dramatically as the relative abundance of a protein decreases, these results are not surprising. As pointed out above, the large serum databases are based on less stringent identification criteria, which greatly swell the total number of proteins. This difference highlights the need for standards for MS/MS-based identification.
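The estimate of sTnT from sTnI rests on their 1:1 binding stoichiometry: equal molar concentrations imply mass concentrations in the ratio of the molecular masses. A minimal sketch, assuming that all circulating sTnI is present as a 1:1 complex with sTnT and using approximate masses of about 21 kDa for sTnI and 32 kDa for sTnT (values assumed for illustration, not taken from this study):

```python
STNI_KDA = 21.0  # assumed approximate mass of skeletal troponin I
STNT_KDA = 32.0  # assumed approximate mass of skeletal troponin T


def ug_per_ml_to_pmol_per_l(conc_ug_per_ml, mass_kda):
    """Convert a mass concentration (µg/ml) to a molar one (pmol/l)."""
    grams_per_litre = conc_ug_per_ml * 1e-3             # 1 µg/ml = 1 mg/l = 1e-3 g/l
    mol_per_litre = grams_per_litre / (mass_kda * 1e3)  # kDa -> g/mol
    return mol_per_litre * 1e12


def equimolar_partner_ug_per_ml(conc_ug_per_ml, mass_kda, partner_kda):
    """Mass concentration of a partner present at the same molar concentration."""
    return conc_ug_per_ml * partner_kda / mass_kda


stni = 0.002  # µg/ml, sTnI value reported by Sorichter et al.
print(f"sTnI ~ {ug_per_ml_to_pmol_per_l(stni, STNI_KDA):.0f} pmol/l")
print(f"sTnT ~ {1000 * equimolar_partner_ug_per_ml(stni, STNI_KDA, STNT_KDA):.1f} ng/ml")
```

Under these assumptions the 0.002 µg/ml sTnI value corresponds to roughly 95 pmol/l and an sTnT concentration of about 3 ng/ml, consistent with the ng/ml range cited above; any free, uncomplexed sTnT or sTnI would make the estimate inexact.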
The presence of cellular "leakage" proteins in the serum of healthy individuals has major implications for biomarker discovery. For example, the presence of cardiac TnT and TnI in serum is the gold standard for the diagnosis of acute myocardial infarction. Generally, disease-specific biomarkers derived from cellular proteins must meet one of the following criteria: 1) their presence in serum correlates with a disease state (de novo), 2) their expression level differs between normal and diseased states, or 3) they carry a disease state-specific PTM or isoform. The leakage of skeletal muscle proteins (or proteins from any cell type) into the blood establishes a continuous "protein background" that must be overcome to find robust markers. Certainly, the ability to unambiguously identify specific isoforms or PTMs will expand the number of unique proteins that can be used.