The Human Erythrocyte Proteome

This report describes an analysis of the red blood cell proteome by ion trap tandem mass spectrometry in line with liquid chromatography. Mature red blood cells lack all internal cell structures and consist of cytoplasm within a plasma membrane envelope. To maximize the number of identified proteins, total red blood cell protein was divided into two fractions: membrane-associated proteins and cytoplasmic proteins. Both fractions were divided into subfractions, and proteins were identified in each subfraction separately through tryptic digestion. Membrane protein digests were collected from externally exposed proteins, internally exposed proteins, “spectrin extract” mainly consisting of membrane skeleton proteins, and membrane proteins minus spectrin extract. Cytoplasmic proteins were divided into 21 fractions based on molecular mass by size exclusion chromatography. The tryptic peptides were separated by reverse-phase high-performance liquid chromatography and identified by ion trap tandem mass spectrometry. A total of 181 unique protein sequences were identified: 91 in the membrane fractions and 91 in the cytoplasmic fractions. Glyceraldehyde-3-phosphate dehydrogenase was identified with high sequence coverage in both membrane and cytoplasmic fractions. Identified proteins include membrane skeletal proteins, metabolic enzymes, transporters and channel proteins, adhesion proteins, hemoglobins, cellular defense proteins, proteins of the ubiquitin-proteasome system, G-proteins of the Ras family, kinases, chaperone proteins, proteases, translation initiation factors, and others. In addition to the known proteins, there were 43 proteins whose identification was not determined.

A human red blood cell (RBC) is in residence in the human circulatory system for 120 days carrying oxygen from the lungs to all tissues within the body and carbon dioxide from the tissues back to the lungs. An RBC is an 8-μm biconcave disk bounded by a plasma membrane. The major cytoplasmic constituent is hemoglobin, which is responsible for binding and releasing oxygen and carbon dioxide. On the cytoplasmic surface of the plasma membrane is a two-dimensional meshwork of proteins referred to as spectrin membrane skeleton. The spectrin membrane skeleton renders elasticity and flexibility to an RBC, allowing it to pass through vessels and capillaries that narrow to 1 μm in diameter (1).
Because of the ease in obtaining RBCs and because they lack internal organelles, the plasma membrane of this cell type has been studied extensively. The functions of hemoglobin are also well documented. Based on four decades of study, the identity, function, and topology of many RBC membrane proteins have been determined (1-3). With the advent of modern mass spectrometry (MS) and associated proteomic techniques, determination of the RBC proteome is now plausible. This kind of approach is a necessary first step in understanding how the RBC proteome becomes altered in various hematologic disorders. With this goal in mind, we utilized ion trap tandem MS to analyze the entire human erythrocyte proteome (plasma membrane and cytoplasmic proteins). We identified 181 unique RBC proteins, half of which reside in the plasma membrane and half in the cytoplasm. Moreover, we were able not only to identify the proteins but also to categorize most of them according to function.

EXPERIMENTAL PROCEDURES
Sample Preparation-Human peripheral whole blood was collected in vacutainer tubes containing lithium heparin, sufficient for 10 ml of blood, and used within 24 h. The RBCs were sedimented at 1000 × g for 10 min at 4°C and resuspended in phosphate-buffered saline (PBS; 10 mM NaPO4, pH 7.6, 150 mM NaCl) to the original volume (~10 ml) four times. Each time, the upper 1-2 mm layer of packed cells was aspirated along with liquid phase to remove white blood cells. RBCs were transferred to 50-ml centrifugal tubes (2-3 ml of packed cells per tube) and washed with PBS at 4°C three times: the cells were resuspended in 10 volumes of PBS and sedimented at 2000 × g for 10 min.
To maximize the number of identified RBC proteins, membrane-associated proteins, as well as cytoplasmic proteins, were separated and digested into several fractions; each fraction was analyzed as a separate sample. The RBC membranes, cytoplasmic proteins, inside out vesicles (IOV), and membrane skeleton proteins ("spectrin extract") were prepared as described (4) with the following modifications. Five milliliters of PBS-washed and packed RBCs were resuspended in 10 ml of PBS and incubated (gently shaken) with 1.5 mg of tosylphenylalanyl chloromethyl ketone-treated trypsin (Worthington, Lakewood, NJ) for 2 h at room temperature to digest exposed domains of membrane proteins. The digested material and the trypsinized RBCs were separated by centrifugation at 2000 × g for 10 min. The digested material was collected (sample 1) and cleared by centrifugation at 32,000 × g for 15 min at 4°C. The separated trypsin-treated RBCs were washed with 10 volumes of PBS three times and resuspended (final volume 32 ml) in ice-cold lysis buffer (5 mM NaPO4, pH 7.6, 1 mM EDTA). The lysed cells and the soluble cytoplasmic proteins were separated by centrifugation at 32,000 × g for 10 min at 2°C. The cytoplasmic proteins (~35 mg protein/ml) were collected and cleared by recentrifugation. Lysed RBCs were further washed with 10 volumes of ice-cold lysis buffer (six times) to prepare cell membranes (ghosts, ~4 mg protein/ml). Membrane skeleton proteins and IOVs were prepared as follows. The membranes (~5 ml) were washed in 10 volumes of the ice-cold spectrin extraction buffer (0.1 mM EDTA, pH 8.0), then resuspended in 20 ml of the same buffer and incubated at 37°C for 30 min to dissociate membrane skeleton. The extracted membrane skeleton proteins (spectrin extract) and IOV were separated by centrifugation at 250,000 × g for 30 min at 4°C. Spectrin extract (~0.2 mg/ml) was collected.
The tight IOV pellet was gently resuspended in bicarbonate buffer (100 mM ammonium bicarbonate, pH 8.2) to the original membrane volume of 5 ml (~3 mg protein/ml).
The IOV were further diluted twice (~1.5 mg protein/ml) in the bicarbonate buffer and incubated with 0.1 mg/ml trypsin for 15 h at 37°C to digest exposed domains of IOV proteins. The digested IOV were separated from corresponding supernatant containing tryptic peptides (sample 2) by sedimentation at 100,000 × g for 30 min at 4°C. In the second set of experiments, the total IOV proteins were solubilized and digested with 0.2 mg/ml trypsin under similar conditions at 16°C in the presence of 1% precondensed Triton X-114 (5). After digestion, the mixtures were incubated at 30°C and centrifuged at 1000 × g for 10 min at room temperature to separate aqueous and detergent phases. The aqueous phase containing digested peptides (sample 3) released from IOV was collected and cleared by centrifugation at 100,000 × g for 30 min.
The cytoplasmic proteins were diluted with 10 mM Tris-HCl, pH 7.8, to 3.7 mg/ml, and 4.6 ml was applied to a Sephacryl S100 HR (bed volume 92 ml) column (1 × 120 cm). The column was equilibrated and eluted with the Tris buffer at 6.0 ml/h flow rate. Three-milliliter fractions were collected, and protein concentrations were determined. Protein was detected in fractions 11-31 (samples 5-25). The samples (5-7, 10-25) were concentrated by vacuum centrifugation. All cytoplasmic samples were incubated with 8 M urea for 1 h at 37°C, then diluted four times in the bicarbonate buffer (at this point, the protein concentration in different cytoplasmic samples varied from 0.08 to 1 mg/ml) and digested with 0.06 mg/ml trypsin for 15 h at 37°C.
Digested sample aliquots were reduced with 2 mM dithiothreitol for 1 h and "alkylated" in the dark with 20 mM iodoacetamide for 1 h at 37°C in the bicarbonate buffer. The reaction was stopped by addition of 200 mM 2-mercaptoethanol.
Mass Spectrometry-Digested samples were analyzed by microcapillary liquid chromatography in line with tandem MS (LC/MS/MS) using a Surveyor high-performance liquid chromatography (HPLC) system connected to a LCQ DECA XP ion trap mass spectrometer with an electrospray ionization source (ThermoFinnigan, San Jose, CA). Proteins in the tryptic digest (20 μl) were separated by reverse-phase chromatography on a C18 column (2.1 × 150 mm; Thermo Hypersil-Keystone, Bellefonte, PA) at 200 μl/min flow rate. Water and acetonitrile with 0.1% formic acid each were used as solvents A and B, respectively. The gradient was started and kept for 5 min at 5% B, then ramped to 60% B in 110 or 165 min, and finally ramped to 90% B for another 15 min. The eluted peptides, singly, doubly, or triply ionized (charge state 1+, 2+, or 3+, respectively) at the electrospray source, were analyzed in data-dependent MS experiments ("big three" or "triple play") with dynamic exclusion. In the "big three" experiments, the data acquisition parameters were set such that each analytical event consisted of four consecutive scans: the first, full MS (m/z 300-2000) scan was followed by three MS/MS scans on the three most intense peptide ions from the full MS spectrum. In the "triple play" experiments, the first, full MS scan was followed by a zoom (high resolution) scan and an MS/MS scan on the most intense ion from the full scan. A peptide ion, analyzed twice within 30 s, was excluded from re-analysis for 2 min. The spray voltage was set at 4.5 kV; the ion transfer capillary temperature was set at 200°C; and the normalized collision energy for MS/MS decomposition of peptides was set at 35%.
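The "big three" selection with dynamic exclusion described above can be illustrated with a minimal Python sketch. It is not the instrument's acquisition software: the class and function names are hypothetical, and the m/z matching window (MZ_TOL) is an assumed value that the text does not specify. Only the repeat count (two), the 30-s repeat window, the 2-min exclusion, and the choice of the three most intense ions are taken from the description.

```python
from dataclasses import dataclass, field

REPEAT_COUNT = 2         # an ion analyzed twice ...
REPEAT_WINDOW_S = 30.0   # ... within 30 s ...
EXCLUSION_S = 120.0      # ... is excluded from re-analysis for 2 min
TOP_N = 3                # "big three": three most intense ions per full scan
MZ_TOL = 0.5             # ASSUMPTION: m/z bin width, not given in the text


@dataclass
class DynamicExclusion:
    history: dict = field(default_factory=dict)         # m/z bin -> recent MS/MS times
    excluded_until: dict = field(default_factory=dict)  # m/z bin -> end of exclusion

    def _key(self, mz):
        # Ions falling into the same m/z bin are treated as the same precursor.
        return round(mz / MZ_TOL)

    def is_excluded(self, mz, t):
        return t < self.excluded_until.get(self._key(mz), float("-inf"))

    def record(self, mz, t):
        key = self._key(mz)
        # Keep only MS/MS events that fall inside the repeat window.
        recent = [s for s in self.history.get(key, []) if t - s <= REPEAT_WINDOW_S]
        recent.append(t)
        self.history[key] = recent
        if len(recent) >= REPEAT_COUNT:
            self.excluded_until[key] = t + EXCLUSION_S
            self.history[key] = []


def select_precursors(full_scan, t, exclusion):
    """full_scan: list of (m/z, intensity) pairs; returns m/z values sent to MS/MS."""
    candidates = [(mz, inten) for mz, inten in full_scan
                  if not exclusion.is_excluded(mz, t)]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    chosen = [mz for mz, _ in candidates[:TOP_N]]
    for mz in chosen:
        exclusion.record(mz, t)
    return chosen
```

Calling select_precursors on the same four-ion scan at 0, 10, and 20 s would fragment the three most intense ions twice and then, with those three excluded, move on to the fourth ion, which is how dynamic exclusion gives less abundant peptides a chance to be fragmented.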
Database Search-A quality MS/MS spectrum, resulting from a peptide fragmentation, features a unique set of b and y fragment ions characteristic of the peptide. The peptide sequence can be identified through interpretation of its MS/MS spectrum and similarity to the MS/MS spectrum of a known peptide sequence. Each acquired MS/MS spectrum was searched against the nonredundant protein sequence database nr.fasta using the SEQUEST software tool (6,7). The software creates theoretical peptides for all, or a limited group of, database proteins; calculates corresponding MS/MS spectra; and compares them to an experimental spectrum (submitted for the database search) to find the match. The database search was restricted to 700-3500 molecular mass tryptic peptides of human (Homo sapiens) origin. Up to two missed trypsin cleavage sites were allowed, and cysteines, where modified, were considered carbamidomethylated. The acceptable molecular mass difference (mass tolerance) between an experimental and a database peptide was set to 1.5 mass units. Mass tolerance for experimental and calculated MS/MS fragment ions was set to zero. Based on the similarity to the experimental MS/MS spectrum, the software assigns each database peptide the primary score (Sp), then the cross-correlation score (Xcorr) to filter candidate peptides and select a defined number of top hits: database peptides with highest Xcorr scores. Finally, a delta cross-correlation score (dCn, a difference between the top 2 Xcorr values normalized to 1) is calculated. The candidate database peptide with the highest Xcorr score was considered the match if the following identification criteria were met: 1) Xcorr of at least 2.0, 2.2, and 3.5 for singly, doubly, and triply charged peptides, respectively, and 2) dCn of at least 0.1 regardless of charge state.
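The identification criteria above can be restated as a short Python sketch. This is not SEQUEST code: the function names are hypothetical, and the handling of a spectrum with a single candidate (dCn taken as 0) is an assumption made here for illustration. The thresholds are those stated in the text.

```python
# Minimum Xcorr by precursor charge state, as stated in the text.
MIN_XCORR = {1: 2.0, 2: 2.2, 3: 3.5}
MIN_DCN = 0.1


def delta_cn(xcorrs):
    """Difference between the two highest Xcorr values, normalized to the top one."""
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        # ASSUMPTION: a lone or non-positive top hit is treated as unresolved.
        return 0.0
    return (ranked[0] - ranked[1]) / ranked[0]


def accept_match(charge, xcorrs):
    """Return True if the top-scoring candidate passes both criteria."""
    if charge not in MIN_XCORR or not xcorrs:
        return False
    top = max(xcorrs)
    return top >= MIN_XCORR[charge] and delta_cn(xcorrs) >= MIN_DCN
```

For instance, a doubly charged precursor with candidate Xcorr values of 3.0 and 2.0 would be accepted (dCn of about 0.33), whereas values of 3.0 and 2.9 would be rejected because the top two candidates are not sufficiently separated.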
All identified peptides were grouped under the proteins of their origin. For each identified protein, the number of identified peptides was counted, the percentage of the covered sequence was calculated, and the molecular mass was recorded.
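As an illustration of how a peptide count and percent sequence coverage can be derived for one protein, the following Python sketch marks every residue covered by at least one identified peptide. The function name and the exact-substring matching are assumptions made for this example; the text does not describe how the calculation was implemented.

```python
def sequence_coverage(protein, peptides):
    """Return (number of matched peptides, percent of residues covered).

    protein  -- amino acid sequence as a string
    peptides -- iterable of identified peptide sequences
    """
    covered = [False] * len(protein)
    matched = 0
    for pep in set(peptides):          # count each distinct peptide once
        start = protein.find(pep)
        if start == -1:
            continue                   # peptide does not belong to this protein
        matched += 1
        while start != -1:             # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    percent = 100.0 * sum(covered) / len(protein) if protein else 0.0
    return matched, percent
```

Overlapping peptides are counted separately in the peptide total but contribute each residue only once to the coverage, so two overlapping three-residue peptides spanning five residues of a ten-residue sequence give two peptides and 50% coverage.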

RESULTS
A mature RBC consists of the plasma membrane, resting on the membrane skeleton and surrounding the cytoplasm. The extreme complexity of the protein mixture, as well as the presence of highly abundant proteins (e.g. hemoglobin), limits the detection of low-abundance proteins by LC/MS/MS analysis of a tryptic digest. A peptide coeluted from the HPLC column with more abundant peptides or a high number of other peptides may not be detected even if its absolute concentration in an eluate is within the sensitivity of the mass spectrometer.
To maximize the number of identified proteins, total RBC protein was divided into two major fractions: membrane-associated proteins, including the membrane skeleton, and cytoplasmic proteins. The two fractions were further divided into subfractions, and the proteins were identified in each subfraction separately as tryptic fragments. For membrane subfractions, trypsin digests were collected from externally exposed proteins, internally exposed proteins, spectrin extract consisting mainly of membrane skeleton proteins, and soluble membrane proteins (missing the spectrin extract). Cytoplasmic proteins were divided into 21 subfractions based on molecular mass using size exclusion chromatography. Thus, the complexity of each sample analyzed was minimized and the abundant hemoglobins were separated from the majority of cytoplasmic proteins. We were able to identify 181 protein sequences, which are organized in Tables I and II. Proteins found in the membrane fractions are presented in Table I with spectrin extract proteins indicated by an asterisk. Proteins found in the cytoplasmic fractions are listed in Table II. Data for each identified protein include protein description assigned by SEQUEST, protein identification number (Gi), percent of the covered amino acid sequence, number of identified peptides, and molecular mass. Ninety-one unique sequences are listed in Table I for the plasma membrane fractions and 91 unique sequences are itemized in Table II for the cytoplasmic fractions. As expected, some proteins were found in more than one fraction. For example, spectrin subunits, protein 4.1, and tropomyosin 3 were found in the spectrin extract as well as in other membrane subfractions. The sequence coverage (number of identified peptides) for protein 4.1 or tropomyosin 3 was of comparable level in the spectrin extract and IOV fractions. As for spectrin subunits, the sequence coverage was much higher in the spectrin extract than in the other fractions.
Hemoglobin subunits were found in both cytoplasmic and membrane fractions. However, the sequence coverage was much higher in the cytoplasmic fractions than in the membrane fractions. For this reason, hemoglobin subunits are listed only in Table II. Glyceraldehyde-3-phosphate dehydrogenase, on the other hand, is listed in both tables because coverage of its sequence was of comparable level in both the membrane and cytoplasmic fractions.
The number of peptides identified for the listed proteins varies from 1 to 77. Although abundant average-size proteins are identified by several peptides, very low-level proteins usually are detected by a single peptide (8). Thus, the number of identified peptides provides a semiquantitative estimate of relative amounts of the different proteins. In this regard, the following should be noted. Proteins are identified through a unique set of detected peptides, characteristic exclusively of that given protein. However, similar proteins (e.g. isoforms) contain regions of identical sequence and may produce a number of identical peptides. The analytical approach used cannot distinguish identical peptides originating from different proteins. A peptide detected in an analyzed mixture is assigned to all potential parent proteins found in the same mixture. As a result, the number of peptides identified for the similar proteins, especially for ones of low abundance, may be overestimated. If a set of identified peptides could originate from more than one protein, we list all potential parent proteins of human origin (Table I, positions 37 and 72).
Furthermore, 93 of the proteins identified in this article are based on single peptide assignment, which should be regarded as tentative.
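The assignment of a detected peptide to all potential parent proteins, and the resulting risk of overestimating peptide counts for similar proteins, can be sketched in Python as follows. The function names, the protein names, and the toy sequences in the usage note are hypothetical and serve only to illustrate the rule; they are not taken from Tables I and II.

```python
from collections import defaultdict


def assign_peptides(detected_peptides, proteins):
    """Map each protein name to the detected peptides found in its sequence.

    A peptide that occurs in several proteins is credited to every one of
    them, mirroring the assignment rule described in the text.
    """
    by_protein = defaultdict(set)
    for pep in detected_peptides:
        for name, seq in proteins.items():
            if pep in seq:
                by_protein[name].add(pep)
    return dict(by_protein)


def unique_and_shared(by_protein):
    """Split each protein's peptides into unique ones and ones shared with others."""
    owners = defaultdict(set)
    for name, peps in by_protein.items():
        for pep in peps:
            owners[pep].add(name)
    summary = {}
    for name, peps in by_protein.items():
        unique = {p for p in peps if len(owners[p]) == 1}
        summary[name] = {"unique": unique, "shared": peps - unique}
    return summary
```

With two hypothetical isoforms that share the peptide AAGLLK, where only the first contains DEFGR, both isoforms would be credited with AAGLLK although the second is supported by no unique peptide. A protein listed only on the basis of shared peptides is therefore best read as one of several possible parents.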
We grouped the 181 identified proteins into different categories as summarized in Fig. 1. The number and percent of proteins included in each category, as well as protein positions in Tables I and II, are presented in Table III. Proteins that are described as similar to X (for example, similar to tropomyosin 4) are included under unknown proteins in Table III and Fig. 1.
The two largest groups of identified RBC proteins are membrane skeletal proteins and metabolic enzymes (Fig. 1). Proteins listed in category 5 in Table III (band 7.2b and flotillins) most likely act as separate scaffolding components at the cytoplasmic face of erythrocyte lipid rafts (9). The globins represent the most abundant group of proteins specific to erythrocytes. Identified globins include α, β, γ-G, and δ chains, embryonic Gower II carbonmonoxy hemoglobin F chain, and 1 globin. It should be noted that 1 globin had not previously been found in adult human erythroid or nonerythroid tissues. It should also be noted that though mutants of α hemoglobin (Gi 3212437) and β hemoglobin (Gi 1431650, 18418633) are not included in the tables, mutated peptides specific to these proteins were identified in the analyzed samples. As expected, transporters and channel proteins were found only in the membrane fractions (Table I). Eleven of the 12 identified cellular defense proteins were found in the cytoplasmic fractions. Approximately half of the proteins from these groups were identified with three or more peptides. All 14 identified proteins of the ubiquitin-proteasome system, including eight proteasome subunits, were found in the cytoplasmic fractions (Table II). Ubiquitin-activating enzyme was detected with highest sequence coverage (10 identified peptides), followed by polyubiquitin and ubiquitin isopeptidase T with three identified peptides each. All but two of eight identified proteasome subunits were detected with a single peptide.
The category of unknown proteins includes hypothetical proteins whose existence was predicted from the genome sequence; proteins whose existence was shown only at the transcriptome level of different human cell types where the sequences were deduced from cDNAs; and unknown proteins showing similarity to the sequences of known proteins. The majority of the unknown proteins listed in both tables were determined by a single peptide. The primary reasons why some proteins are difficult to identify are low level of expression or extensive post-translational modifications. Nevertheless, we were able to specify 43 unknown proteins, which represents a significant portion (~24%) of all the erythrocyte proteins described in Tables I and II.

DISCUSSION
The present study represents the first attempt at determining the complete proteome of the human erythrocyte. The human RBC provides an ideal model for proteomic analysis because it combines simplicity with physiologic significance. Although the RBC is simple (e.g. it is devoid of a transcriptome), the proteome of the human erythrocyte provides further insight into its physiological make-up, which can be applied to more metabolically complex cells. A significant insight into the metabolism of the RBC, provided by the present study, is the realization that the cell contains a large number of proteasomal proteins. While six of these proteasomal proteins were identified based on a single-peptide assignment, two (Table II, positions 34 and 37) were based on two-peptide assignment with 21 and 10% sequence coverage, respectively. Previous studies have shown that the human RBC contains many ubiquitinated proteins including spectrin (10), ankyrin (11), and band 3 (12). Spectrin has an E2/E3 ubiquitin-conjugating/ligating activity that targets itself (10) as well as ankyrin (11) and band 3 (12). Ubiquitination of spectrin down-regulates the spectrin-protein 4.1-actin interaction (13) and the spectrin-adducin-actin interaction (14).
An earlier report (15) suggested that mature erythrocytes have no ubiquitin- and ATP-dependent protein degradation capacity due to the lack of proteasomes, although they do maintain significant levels of ubiquitin conjugates. The results of our proteomic analysis reveal the likelihood that RBCs do contain at least a remnant of proteasomes, which could maintain a low level of ubiquitin-proteasomal activity.
Using gel filtration, we separated the abundant hemoglobins from the majority of the cytoplasmic proteins to facilitate the detection of lower-abundance proteins. Hemoglobin peak fractions, however, would in addition contain a significant number of other cytoplasmic proteins with molecular mass close to that of hemoglobin. In these fractions, where relative abundance of hemoglobin would remain high, many proteins were most likely not detected. An approach that specifically subtracts hemoglobin (for example immunoaffinity) could further expand the analysis of the RBC proteome.
A recent study combining one- and two-dimensional electrophoresis with matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS identified 84 unique RBC membrane proteins (3). That study made no attempt to examine RBC cytoplasmic proteins. Interestingly, two major RBC glycoproteins, glycophorin A (~600,000 copies/RBC) and glycophorin C (~50,000 copies/RBC) (1, 2), were not identified in that study (3). Glycosylated transmembrane proteins are known to be underrepresented on one- and two-dimensional gels (16), which makes the LC/MS/MS technique of great value when trying to obtain a complete proteome analysis.
We believe that our study provides a strong basis for analysis and interpretation of the physiological competence of the RBC and sets the stage for further protein expression and function-based activity profiling not only of normal healthy erythrocytes but also of RBCs in pathological states. Indeed, a more thorough proteomic examination involving LC/MS/MS combined with isoelectrofocusing-SDS-PAGE/MALDI-TOF approaches should afford a means to elucidate the human erythrocyte proteome in its entirety. In turn, a more general understanding and appreciation of the metabolic capability of the RBC and other cells will be realized.