Molecular & Cellular Proteomics 7:2254-2269, 2008.
| ABSTRACT |
|---|
|
|
|---|
The structure of the RBC membrane (a thin layer that constitutes less than 0.1% of the cell thickness and only 1% of its weight) has been well elucidated in the past 35 years both from the normal and pathological metabolic points of view (1, 2) and, more recently, from a structural point of view via extensive proteomics mapping (3). Regarding the cytoplasmic content of the RBC, most studies have focused on a variety of rare enzyme deficiencies with particular regard to disorders of erythrocyte glycolysis and nucleotide metabolism collectively called chronic (or hereditary) non-spherocytic hemolytic anemias (4–6). The most frequent RBC enzyme defects include glucose-6-phosphate dehydrogenase deficiency followed by (with decreasing frequency) pyruvate kinase (PK), glucose-phosphate isomerase, pyrimidine 5'-nucleotidase, triose-phosphate isomerase, and phosphofructokinase deficiencies. Yet it seems unrealistic to think that the full cytoplasmic proteome of RBCs would consist of just the handful of enzymes known to date. In fact, in recent reports, 91 gene products were described by Kakhniashvili et al. (7), and an additional 252 species were identified by Pasini et al. (3) in the 2% non-hemoglobinic proteome. Although these numbers are outstanding, they still fall short of expectations considering that the cytoplasmic proteomic asset would be inherited from the erythroblasts and that living human cells are believed to have a genetic asset of >10,000 unique gene products.
The exploration of the cytoplasmic proteome of the RBC represents a challenging analytical task as all the components present in the sample are largely masked by one heavily abundant protein, namely hemoglobin. Direct analysis of a total hemolysate by 2D gel leads to a major, smearing spot of hemoglobin with a very small number of other detectable spots (8). Thus, pretreatment of the sample is mandatory and is often performed by fractionation techniques. In different studies, the RBC cytosol was fractionated respectively by size exclusion chromatography (7), cation exchange chromatography (8), in-solution isoelectrofocusing (9), or 1D SDS-PAGE (3). Although they allow identifying a number of proteins, these generic approaches suffer from several drawbacks including limited loading capacity and inappropriate detection of the proteins co-fractionated with hemoglobin. These studies were based on standard proteomics tools, although in the study by Pasini et al. (3), sophisticated MS instrumentation was adopted that alone substantially increased the depth of proteome exploration.
We have recently described a novel approach for capturing the "hidden proteome," i.e. those rare and very rare proteins that constitute the vast majority in any cell or tissue lysate and in biological fluids (10, 11). It is based on a combinatorial library of hexameric peptide ligands bound to porous beads. Each bead contains billions of copies of a unique hexapeptide ligand distributed throughout its porous structure, and each bead potentially has a ligand different from that of every other bead. With a population of millions of individual peptide ligands obtained by combinatorial chemistry, any protein present in the starting material could theoretically interact with one or a few particular beads. Once the most abundant protein species have saturated their binding sites, the remaining molecules are washed away in the flow-through while minor protein species get progressively enriched on their corresponding beads. Thus, instead of simplifying the complex mixture into fractions or partitioning away the most abundant proteins, this approach captures most of the species present in solution up to the saturation of the solid phase ligand library and greatly reduces the dynamic concentration range of the proteins present in the sample. This ligand library has been efficiently used for capturing and revealing a very large population of previously undetected proteins from urine (12), serum (13), or platelets (14) as well as for "amplifying" trace impurities present in recombinant DNA products (15, 16). General reviews of this technology have also been recently published (17, 18). Here we apply this technology to the mining of the minority RBC cytoplasmic proteome and show that treatment of a hemolysate efficiently reduces the concentration of hemoglobin in the sample and unmasks numerous previously undetected proteins.
We report novel methodological aspects that offer a unique insight into the performance and behavior of two combinatorial peptide libraries composed either of an amino-terminal collection of hexapeptides or of carboxylated hexapeptides. The RBC lysate was treated with both libraries, and captured proteins were analyzed on an LTQ-Orbitrap mass spectrometer. The combination of the peptide affinity-based methodology with high resolution, fast sequencing mass spectrometry allowed detection with high confidence of more than 1500 different protein species.
| EXPERIMENTAL PROCEDURES |
|---|
|
|
|---|
Collection and Lysis of Red Blood Cells—
80 ml of blood were collected from a healthy consenting donor by venipuncture. The blood was collected in Vacutainer tubes containing lithium heparin. The hemoglobin level from the donor was 14.2 g/dl. The blood sample was centrifuged at 1000 x g at +4 °C for 10 min to eliminate plasma. Erythrocytes were washed four times with PBS + PMSF (154 mM NaCl, 10 mM phosphate buffer, pH 7.4, containing 0.1 mM PMSF). At each step supernatant and buffy coat were removed to eliminate white blood cells and the top RBC layers. Erythrocytes were filtered using an α-methylcellulose column (prepared in 10-ml syringes containing 3 ml of α-methylcellulose and 3 ml of blood). After that, an additional three washing steps were performed. At each step, along with supernatant, the upper RBC layer was removed. Filtered red cells appeared to be deprived of leukocytes. The final sample was evaluated for leukocyte (WBC) and platelet contamination by both electronic counter (LH 750 Beckman Coulter) and microscopic examination and found to be free of each of these cell types. According to the protocol used, the residual WBC content was <1 cell/1 million RBCs. The lysis of RBCs was performed by hypotonic shock. The filtered red cells were diluted to a 1:3 ratio with lysis buffer (5 mM phosphate buffer, pH 7.4, containing 1 mM EDTA and 0.5 mM PMSF) and left on ice for 30 min. At the end of the lysis step, a mixture of five protease inhibitors (Complete protease inhibitor mixture tablets, Roche Diagnostics) was added to the hemolysate. After centrifugation at 36,000 x g for 30 min at +4 °C, the clear supernatant was collected. The final hemoglobin content was 5.59 g in 95 ml of hemolysate for a total protein content of 5.73 g. Hemoglobin represented 97.47% of the total protein content.
Treatment of Proteins from Red Blood Cell Extraction—
The initial pretreated and clarified protein solution was first mixed with NaCl to reach a physiological ionic strength and then loaded on a column (6.6-mm inner diameter x 32 mm in length) containing 1 ml of Library-1 at a flow rate of 0.25 ml/min. The column effluent was continuously injected in a second column of the same dimensions packed with Library-2. The columns connected in series were then washed with PBS until reaching the UV base line of the effluent of the second column. After the wash each individual column was subjected to three distinct elutions using, respectively, a TUC solution (2 M thiourea, 7 M urea, 2% CHAPS), a UCA solution (9 M urea, citric acid up to pH 3.3), and a hydro-organic solution (HOS) composed of 6% (v/v) acetonitrile, 12% (v/v) isopropanol, 10% (v/v) ammonia at 20%, and 72% (v/v) water. The six eluates were immediately neutralized, submitted to protein content analysis by the Bradford-Lowry standard spectrophotometric method, and then desalted by dialysis at 4 °C against a 10 mM ammonium carbonate solution (cutoff of dialysis membrane was 1000 Da) followed by lyophilization.
Analysis of Red Blood Cell Lysate Fractions by SDS-PAGE—
10 µl of each sample were mixed with 10 µl of Laemmli buffer (4% SDS, 20% glycerol, 10% 2-mercaptoethanol, 0.004% bromphenol blue, 0.125 M Tris HCl, pH approximately 6.8) from Bio-Rad. The mixture was heated in boiling water for 5 min and immediately loaded in the gel. The SDS-PAGE gel was composed of a stacking gel (125 mM Tris-HCl, pH 6.8, 0.1% SDS) with a large pore polyacrylamide gel (4%) cast over the resolving gel (8–18% acrylamide gradient in 375 mM Tris-HCl, pH 8.8, 0.1% SDS buffer). The cathodic and anodic compartments were filled with Tris-glycine buffer, pH 8.3, containing 0.1% SDS. The electrophoretic run was performed by setting a voltage of 100 V until the dye front reached the bottom of the gel. Staining and destaining were performed with Colloidal Coomassie Blue and 7% acetic acid in water, respectively. The SDS-PAGE gels were scanned with a Versa-Doc image system (Bio-Rad).
Analysis of Red Blood Cell Lysate Fractions by SELDI-TOF MS—
Protein fractions at appropriate concentration, i.e. 0.02 µg/µl, were deposited upon ProteinChip® array surfaces using a Bioprocessor device. Two types of arrays were selected: CM10 (weak cation exchanger) and IMAC 30 (immobilized metal ion affinity capture) loaded with copper ions. Each array contained eight distinct spots over which the adsorption of protein could be performed. After applying the samples (the starting material and the three first eluates from each combinatorial peptide ligand library), the chip surfaces were washed to remove non-associated protein, then dried, and prepared for the analysis after application of 1 µl of energy-adsorbing matrix solution composed of a saturated solution of sinapinic acid in 50% acetonitrile and 0.5% trifluoroacetic acid. All arrays were then analyzed with a PCS 4000 ProteinChip MS reader. The instrument was used in a positive ion mode with an ion acceleration potential of 20 kV and a detector gain voltage of 2 kV. The mass range investigated was from 3 to 20 kDa. The laser intensity was set between 200 and 250 units according to the sample tested. The instrument was mass-calibrated with a kit of a standard mass mixture, "All-in-1 protein standard."
2D PAGE Analysis—
The desired volume of each non-treated sample and eluates was solubilized in the "2D sample buffer" (7 M urea, 2 M thiourea, 3% CHAPS, 40 mM Tris, 5 mM tributylphosphine, 10 mM acrylamide) to a final concentration of 2 mg/ml protein, and the alkylation reaction was allowed to proceed at room temperature for 60 min. To stop the alkylation reaction, 10 mM DTT (diluted from a bottle of neat DTT) was added to the solution followed by 0.5% Ampholine (diluted directly from the stock, 40% Ampholine solution) and a trace amount of bromphenol blue. 18-cm-long IPG strips (Bio-Rad), pH 3–10, were rehydrated with 400 µl of protein solution for 5 h. IEF was carried out with a Protean IEF cell (Bio-Rad) in a linear voltage gradient from 100 to 2000 V for 5 h and 2000 V for 4 h followed by an exponential gradient up to 10,000 V until each strip was electrophoresed for 25 kV-h. For the second dimension, the IPG strips were equilibrated for 25 min in a solution containing 6 M urea, 2% SDS, 20% glycerol, 375 mM Tris-HCl (pH 8.8) under gentle shaking. The IPG strips were then laid on an 8–18% acrylamide gradient SDS-PAGE gel with 0.5% agarose in the cathode buffer (192 mM glycine, 0.1% SDS, Tris to pH 8.3). The electrophoretic run was performed by setting a current of 5 mA/gel for 1 h followed by 10 mA/gel for 1 h and 15 mA/gel until the dye front reached the bottom of the gel. Gels were incubated in a fixing solution containing 40% methanol and 7% acetic acid for 1 h followed by silver staining. Destaining was performed in 7% acetic acid until the background was clear followed by a rinse in pure water. The two-dimensional electrophoresis gels were scanned with a Versa-Doc image system (Bio-Rad) by fixing the acquisition time at 10 s; the relative gel images were evaluated using PDQuest software (Bio-Rad). After filtering the gel images to remove the background, spots were automatically detected, manually edited, and then counted.
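The focusing program above is specified partly in voltages and times and partly as a total of 25 kV-h, so it can help to see how many volt-hours the first two steps already contribute. The following is a minimal bookkeeping sketch in Python, assuming the "linear voltage gradient" is an ideal straight ramp; the function names are illustrative and do not come from any instrument software.

```python
# Volt-hour bookkeeping for the IEF program described in the text.
# Assumption: the linear gradient is an ideal straight ramp, so the
# accumulated volt-hours equal the area of a trapezoid.

def linear_ramp_vh(v_start, v_end, hours):
    """Volt-hours accumulated over a linear ramp from v_start to v_end."""
    return 0.5 * (v_start + v_end) * hours


def hold_vh(volts, hours):
    """Volt-hours accumulated while holding a constant voltage."""
    return volts * hours


TARGET_VH = 25_000  # 25 kV-h total per strip, as stated in the protocol

ramp = linear_ramp_vh(100, 2000, 5)   # 100 -> 2000 V over 5 h
hold = hold_vh(2000, 4)               # 2000 V for 4 h
remaining = TARGET_VH - ramp - hold   # left for the exponential ramp

print(f"linear ramp: {ramp:.0f} V-h")        # 5250 V-h
print(f"hold:        {hold:.0f} V-h")        # 8000 V-h
print(f"remaining:   {remaining:.0f} V-h")   # 11750 V-h
```

Under this assumption, roughly half of the total volt-hours are delivered during the final exponential ramp toward 10,000 V.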
LC-MS/MS as Analytical Method for the Identification of Proteins in Lysate Fractions—
100 µg of each elution fraction as well as 2000 µg of the initial non-treated RBC lysate were diluted in Laemmli buffer and boiled for 5 min before being separated by 12% acrylamide SDS-PAGE. Proteins were visualized by Coomassie Blue staining. Each lane was cut into 20 or 22 homogenous slices that were washed in 100 mM ammonium bicarbonate for 15 min at 37 °C followed by a second wash in 100 mM ammonium bicarbonate, acetonitrile (1:1) for 15 min at 37 °C. Reduction and alkylation of cysteines were performed by mixing the gel pieces in 10 mM DTT for 35 min at 56 °C followed by 55 mM iodoacetamide for 30 min at room temperature in the dark. An additional cycle of washes in ammonium bicarbonate and ammonium bicarbonate/acetonitrile was then performed. Proteins were digested by incubating each gel slice with 0.6 µg of modified sequencing grade trypsin (Promega) in 50 mM ammonium bicarbonate overnight at 37 °C. The resulting peptides were extracted from the gel by three steps: a first incubation in 50 mM ammonium bicarbonate for 15 min at 37 °C and two incubations in 10% formic acid, acetonitrile (1:1) for 15 min at 37 °C. The three collected extractions were pooled with the initial digestion supernatant, dried in a SpeedVac, and resuspended with 14 µl of 5% acetonitrile, 0.05% trifluoroacetic acid.
The peptide mixtures were analyzed by nano-LC-MS/MS using an Ultimate3000 system (Dionex, Amsterdam, The Netherlands) coupled to an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). 5 µl of each sample (for the slices at hemoglobin size in the non-treated sample, the sample was 2x diluted) were loaded on a C18 precolumn (300-µm inner diameter x 5 mm, Dionex) at 20 µl/min in 5% acetonitrile, 0.05% trifluoroacetic acid. After 5-min desalting, the precolumn was switched on line with the analytical column (75-µm inner diameter x 15 cm, PepMap C18, Dionex) equilibrated in 95% solvent A (5% acetonitrile, 0.2% formic acid) and 5% solvent B (80% acetonitrile, 0.2% formic acid). Peptides were eluted using a 5–50% gradient of solvent B during 80 min at 300 nl/min flow rate. The LTQ-Orbitrap was operated in data-dependent acquisition mode with the Xcalibur software. Survey scan mass spectra were acquired in the Orbitrap in the 300–2000 m/z range with the resolution set to a value of 60,000. The five most intense ions per survey scan were selected for CID fragmentation, and the resulting fragments were analyzed in the linear trap (LTQ). Dynamic exclusion was used within 60 s to prevent repetitive selection of the same peptide.
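The data-dependent logic described above (the five most intense ions per survey scan, with a 60-s dynamic exclusion) can be illustrated with a small simulation. This is only a sketch of the general selection principle, not the instrument's actual control code: the data layout, the function name `select_precursors`, and the 5-ppm exclusion mass window are assumptions made for illustration.

```python
def select_precursors(survey_scans, top_n=5, exclusion_s=60.0, mz_tol_ppm=5.0):
    """Simulate top-N precursor selection with dynamic exclusion.

    survey_scans: list of (time_s, [(mz, intensity), ...]) tuples, in time order.
    Returns a list of (time_s, [selected m/z values]).
    The ppm window used to match excluded ions is an assumed value.
    """
    excluded = []    # (mz, expiry_time_s) pairs currently on the exclusion list
    selections = []
    for time_s, peaks in survey_scans:
        # Drop exclusion entries whose window has expired.
        excluded = [(mz, t) for mz, t in excluded if t > time_s]
        chosen = []
        # Walk through the peaks from most to least intense.
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if len(chosen) == top_n:
                break
            if any(abs(mz - ex_mz) / ex_mz * 1e6 <= mz_tol_ppm
                   for ex_mz, _ in excluded):
                continue  # recently fragmented, skip it
            chosen.append(mz)
            excluded.append((mz, time_s + exclusion_s))
        selections.append((time_s, chosen))
    return selections


if __name__ == "__main__":
    peaks = [(500.0, 100.0), (600.0, 90.0), (700.0, 10.0)]
    scans = [(0.0, peaks), (30.0, peaks), (61.0, peaks)]
    for time_s, mzs in select_precursors(scans, top_n=2):
        print(time_s, mzs)
```

In the toy run, the two dominant ions are fragmented in the first scan, which frees the second scan to pick the weak ion at m/z 700; once the exclusion window expires, the dominant ions become eligible again.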
Database Search and Data Analysis—
The Mascot Daemon software (version 2.2.0, Matrix Science, London, UK) was used to perform database searches in batch mode with all the raw files acquired on each gel lane. To automatically extract peak lists from Xcalibur raw files, the ExtractMSN macro provided with Xcalibur (version 2.0 SR2, Thermo Fisher Scientific) was used through the Mascot Daemon interface. The following parameters were set for creation of the peak lists: parent ions in the mass range 400–4500, no grouping of MS/MS scans, and threshold at 1000. A peak list was created for each fraction analyzed (i.e. gel slice), and individual Mascot searches were performed for each fraction. Data were searched against all entries in the International Protein Index (IPI) human v3.34 protein database (67,764 sequences). Carbamidomethylation of cysteines was set as a fixed modification and oxidation of methionines and protein amino-terminal acetylation were set as variable modifications for all Mascot searches. Specificity of trypsin digestion was set for cleavage after Lys or Arg except before Pro, and one missed trypsin cleavage site was allowed. The mass tolerances in MS and MS/MS were set to 5 ppm and 0.6 Da, respectively, and the instrument setting was specified as "ESI-Trap." Mascot results were parsed with the in-house developed software Mascot File Parsing and Quantification (MFPaQ) version 4.0 (19), and protein hits were automatically validated if they satisfied one of the following criteria: identification with at least one top ranking peptide with a Mascot score of more than 41 (p value < 0.001) or at least two top ranking peptides each with a Mascot score of more than 24 (p value < 0.05). When several proteins matched exactly the same set of peptides, only one member of the protein group (the one returned by Mascot in the protein summary list) was reported in the final protein lists in supplemental data 1 and 2 for clarity.
However, detailed protein groups are shown in supplemental data 3 and can be viewed at http://proteomique.ipbs.fr/rbc through the MFPaQ interface using the Mozilla Firefox Web browser. Moreover in each Mascot result file, the MFPaQ software detected highly homologous Mascot protein hits, i.e. proteins identified with top ranking MS/MS queries also assigned to another protein hit of higher score (i.e. red, non-bold peptides). These homologous protein hits were validated and included in the final list only if they were additionally assigned a specific top ranking (red and bold) peptide of score higher than 41. Otherwise they were considered as proteins matching a subset of peptides from another hit and were eliminated from the final lists shown in supplemental data 1, 2, and 3 (however, these homologous entries can be viewed at http://proteomique.ipbs.fr/rbc through the MFPaQ interface where they are indicated in italic). From all the validated result files corresponding to the fractions of a 1D gel lane, MFPaQ was used to generate a unique, non-redundant list of proteins found in different gel slices by creating clusters of protein groups (composed of all the protein sequences matching the same set of peptides) if they have at least one common member. Global protein lists corresponding to the merging of several protein lists (for example, the three different eluates from one peptide library column corresponding to three different 1D gel lanes) were generated in the same way to eliminate redundancy. Protein list comparisons were also based on the comparison of protein groups (hits matching the same set of peptides) from different lists, and the software assigned these protein groups as "shared" or "specific" depending on whether or not they have common members. The contaminant proteins, i.e. desmoplakin, desmoglein-1, desmocollin-1, and all keratins, were manually removed from the protein lists. 
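The merging rule described above, in which protein groups are clustered whenever they have at least one common member, amounts to finding connected sets of accessions. The following Python sketch illustrates that idea and the "shared"/"specific" comparison between two lists; it is not the MFPaQ implementation, and the function names and accession strings are hypothetical.

```python
def cluster_protein_groups(groups):
    """Merge protein groups (sets of accessions) that share at least one member.

    Clusters are kept pairwise disjoint, so each incoming group only needs
    to be compared against the clusters built so far.
    """
    clusters = []
    for group in groups:
        merged = set(group)
        untouched = []
        for cluster in clusters:
            if cluster & merged:
                merged |= cluster      # common member: absorb this cluster
            else:
                untouched.append(cluster)
        untouched.append(merged)
        clusters = untouched
    return clusters


def compare_lists(clusters_a, clusters_b):
    """Label each cluster of list A as 'shared' or 'specific' relative to list B."""
    members_b = set().union(*clusters_b) if clusters_b else set()
    return [("shared" if cluster & members_b else "specific", cluster)
            for cluster in clusters_a]


if __name__ == "__main__":
    # Hypothetical accessions from three gel slices of one lane.
    slices = [{"IPI_A", "IPI_B"}, {"IPI_C"}, {"IPI_B", "IPI_D"}]
    lane = cluster_protein_groups(slices)
    print(lane)
    print(compare_lists(lane, [{"IPI_C"}]))
```

Here the first and third groups are merged because both contain `IPI_B`, giving a non-redundant list of two entries for the lane.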
To evaluate the false positive rate in these large scale experiments, all the initial database searches were performed using the "decoy" option of Mascot, i.e. the data were searched against a combined database containing the real specified protein sequences (target database, IPI human) and the corresponding reversed protein sequences (decoy database). MFPaQ used the same criteria to validate decoy and target hits, computed the false discovery rate (FDR = number of validated decoy hits/(number of validated target hits + number of validated decoy hits) x 100) for each gel slice analyzed, and calculated the average FDR for all slices belonging to the same gel lane (i.e. to the same sample). The FDR found for the analysis of the non-treated sample and of the six different eluates after peptide library treatment ranged between 0.33 and 1.68% with an average of 1.16%.
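The validation criteria and the FDR formula given above translate directly into a few lines of code. The sketch below is written in Python for illustration only; the function names and the per-slice counts in the usage example are hypothetical, whereas the score cutoffs (41 and 24) and the FDR definition are those stated in the text.

```python
from statistics import mean


def is_validated(top_ranking_scores, single_cutoff=41, pair_cutoff=24):
    """Apply the two validation criteria to one protein hit.

    A hit passes with at least one top ranking peptide scoring above
    single_cutoff, or at least two scoring above pair_cutoff.
    """
    n_strong = sum(1 for s in top_ranking_scores if s > single_cutoff)
    n_moderate = sum(1 for s in top_ranking_scores if s > pair_cutoff)
    return n_strong >= 1 or n_moderate >= 2


def false_discovery_rate(n_target, n_decoy):
    """FDR (%) = decoy hits / (target hits + decoy hits) x 100."""
    total = n_target + n_decoy
    return 100.0 * n_decoy / total if total else 0.0


def lane_fdr(slice_counts):
    """Average the per-slice FDR over all slices of one gel lane."""
    return mean(false_discovery_rate(t, d) for t, d in slice_counts)


if __name__ == "__main__":
    print(is_validated([45.0]))        # one peptide above 41
    print(is_validated([30.0, 26.0]))  # two peptides above 24
    print(is_validated([30.0]))        # a single moderate peptide fails
    # Hypothetical (target, decoy) counts for two gel slices.
    print(lane_fdr([(98, 2), (199, 1)]))
```

Because the same `is_validated` rule is applied to target and decoy hits, the decoy count estimates how many validated target hits are expected to be wrong.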
Reproducibility of LC-MS/MS Analysis—
100 µg of UCA eluate of Library-1 were separated by 12% SDS-PAGE, and 20 gel slices were cut and treated as described above. After peptide extraction and drying, the samples were resuspended in 17 µl of 5% acetonitrile, 0.05% trifluoroacetic acid. The 20 samples were successively analyzed on the nano-LC-MS/MS system, and this analysis was made in triplicate. Mascot was used for database searches, and MFPaQ was used for protein validation, generation of protein lists, and comparison of these lists as described above.
Nano-LC/MSMS Quantitative Experiment after Peptide Library Treatment—
100, 300, or 1000 pmol of yeast alcohol dehydrogenase (ADH) were added to 680 mg of red blood cell lysate prior to treatment with a 0.5-ml Library-1 column. The column was then washed as described above and eluted with UCA buffer. Three replicate treatments were performed for each ADH amount. 125 µg of each of the nine eluates were separated by 12% SDS-PAGE, and a slice was cut in each lane at the ADH molecular mass. The gel slices were treated as described above, and samples were resuspended in 17 µl of 5% acetonitrile, 0.05% trifluoroacetic acid and analyzed three times by nano-LC-MS/MS with the same method described above. Xcalibur software was used to perform the ion current chromatogram extraction of ADH peptides identified by MS/MS with a mass tolerance of 5 ppm and to perform the area integration of the corresponding peaks using the ICIS algorithm. To estimate the abundance of the protein in the different samples, the extracted ion chromatograms (XICs) were integrated for eight parent ions assigned to ADH, summed area values were calculated from these different ion peaks, and average values were then calculated from the three different MS injections for the same sample.
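The quantification scheme above reduces to two operations: a 5-ppm match between observed and expected parent ion m/z values, and a sum over ion peak areas followed by an average over injections. The Python sketch below illustrates that arithmetic under the assumption that the peak areas have already been integrated (for example by ICIS in Xcalibur); the function names and the numeric values in the usage example are hypothetical.

```python
from statistics import mean


def within_ppm(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6 <= tol_ppm


def protein_abundance(injections):
    """Estimate abundance from integrated XIC peak areas.

    injections: one list of peak areas per MS injection, each list holding
    the areas of the parent ions assigned to the protein.
    The areas are summed per injection and then averaged over injections.
    """
    return mean(sum(areas) for areas in injections)


if __name__ == "__main__":
    print(within_ppm(500.002, 500.0))  # about 4 ppm away
    print(within_ppm(500.003, 500.0))  # about 6 ppm away
    # Hypothetical areas: eight parent ions, three injections of one sample.
    areas = [[1.0] * 8, [2.0] * 8, [3.0] * 8]
    print(protein_abundance(areas))
```

Summing over several parent ions before averaging makes the estimate less sensitive to the behavior of any single peptide.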
| RESULTS |
|---|
|
|
|---|
|
and β chains including some oligomeric globin aggregates that are mostly dimers and trimers, constitutes 98% of the entire proteome. From the remaining 2% one can scarcely count some 80 spots. Conversely the map with the mixture of all eluates (lower rightmost panel) appears to be fully covered by spots, approximately 950 via PDQuest count, covering the entire pI 3–10 and the molecular mass interval from 10 to 250 kDa, and this notwithstanding the fact that in the latter half as much protein (640 µg) had been loaded as compared with the control (1300 µg). The comparison between TUC eluates shows a relatively limited number of differences: although the eluate from Library-1 appears more populated with proteins, most of them are located within similar areas. Conversely the comparison of UCA eluates clearly shows two types of areas covered by protein spots: dominantly acidic with medium-high masses in the eluate from Library-1 and mostly alkaline with medium-low masses in the eluate from Library-2.
|
|
Effect of the Combinatorial Library Treatment on Nano-LC-MS/MS Analytical Coverage—
The numbers given above show that treatment of the sample with the two peptide libraries greatly enhanced the number of proteins identified by nano-LC-MS/MS; this must result from a better MS/MS sampling of the ions detected. To illustrate that effect, we compared the MS/MS queries assigned to identified proteins in the sample before treatment versus the UCA eluate from Library-1. A total number of 429 proteins were identified in common between these two samples and are represented in the histogram in Fig. 4 from high to low Mascot score, reflecting protein abundance (decreasing abundance, from left to right). In the upper part of the graph, the number of MS/MS spectra performed for each protein from the non-treated RBC lysate sample is plotted, whereas in the lower part, the corresponding number of MS/MS spectra is plotted for the same proteins identified in the UCA eluate. For the non-treated sample, there is an exponential decay of the number of MS/MS spectra when going toward the region of low abundance proteins, reflecting the large dynamic range of the sample. Conversely for the UCA eluate, although some proteins in the high abundance region still show high values of MS/MS, such values remain at a much higher level even in the low abundance region. This illustrates the effect of peptide ligand library treatment on the concentration of proteins in the cell lysate; this can be further appreciated by looking at the sequence coverage obtained for two particular proteins, one in the high abundance region and one in the low abundance region (Fig. 4, A and B). These panels show tables summarizing the number of peptide sequences and number of MS/MS queries that were assigned to the protein. For a very abundant protein like peroxiredoxin 2 (A), 25 unique peptides were identified by Mascot, giving a sequence coverage of 89% in the case of non-treated sample. 
However, to achieve this result as many as 1018 MS/MS spectra were acquired during the analysis of the gel slice where this protein was most abundant. Clearly very abundant ions from this protein were eluted continuously during the nano-LC run, probably suppressing the signal of lower abundance species, and were repeatedly sequenced by the mass spectrometer. Conversely in the eluate from the first library column, Mascot identified 14 peptides for this protein (73% sequence coverage), but "only" 71 MS/MS spectra were acquired during the run. Thus, by avoiding acquiring too much redundant information on that protein, the mass spectrometer could spend more time on less abundant species. This effect can be seen on a minor protein like pyrimidine 5'-nucleotidase, an important enzyme involved in the catabolism of nucleotides, whose deficiency in RBCs is often associated with hereditary non-spherocytic hemolytic anemia. Although this protein was barely identified in the starting material with three MS/MS spectra and three peptides, it was very well detected in the eluate from the column with 48 MS/MS spectra and 24 peptides. Thus, as can be seen on the global histogram, less time was spent on abundant species, and more time was spent on minor species. Moreover the treatment enabled identification of 625 additional proteins that were not detected at all in the starting material either because the signal of the corresponding ions was suppressed or because the mass spectrometer missed them for MS/MS sequencing.
|
|
Regarding subcellular location, a majority of proteins were categorized as cytoplasmic by the two software tools. For example, out of 535 proteins identified in the non-treated material, 348 were annotated with a GO term related to cellular localization, and GoMiner classified 72% of these annotated proteins as cytoplasmic. However, a relatively high number of proteins were also classified as nuclear, i.e. 28% in the non-treated RBC lysate and up to 33% in the treated sample. It must be noted that some overlap may exist between these categories as many proteins assigned the GO term "nuclear" can also be localized in the cytoplasm. Although the RBCs are enucleated when they enter the blood circulation, the presence of proteins classified as nuclear probably reflects the fact that a certain number of proteins that normally stay in the nucleus in the erythroblasts are still present in the RBC cytosol.
GoMiner and Ingenuity were also used to perform a functional classification of the proteins identified in this study. This was done on the lists obtained either before or after treatment of the sample. The two programs assign the proteins to different functional classes based either on the annotation of protein entries with GO terms in protein databases (GoMiner) or on the data contained in a proprietary, in-house curated knowledge database (Ingenuity). The latter tool can also assign the proteins to specific, well defined canonical pathways related to various biological processes that are contained in the knowledge database. Both programs provide an "enrichment" or "ratio" value measuring the extent to which a functional class or pathway is represented in the list of interest compared with a background reference. In the case of GoMiner analysis, we provided a reference list constituted by the entire set of protein entries from the IPI human database. Thus, the software compares the number of proteins from the list assigned to a particular category with the number of proteins in the IPI database assigned to the same category. A p value and a false discovery rate reflecting the statistical significance for enrichment of categories are also computed. In the case of Ingenuity analysis, ratios and p values are automatically computed both for functional classes and for canonical pathways. Fig. 6 shows the best represented canonical pathways in the non-treated RBC lysate, ranked by decreasing ratio values, and their corresponding ratios in the treated sample as calculated by Ingenuity.
Most of them seem relevant in light of the known function of the RBC and of the known biological processes taking place in that cell: degradation of proteins (protein ubiquitination pathway), degradation of nucleic acids (purine and pyrimidine metabolism pathways), anaerobic glycolysis (pentose phosphate and glycolysis pathways), and protection against oxidative damages because of the high content of oxygen in the cell (oxidative stress response and glutathione metabolism pathways). These major processes of the RBC give high ratio values for the related pathways in the non-treated as well as in the treated samples. However, some unexpected pathways are also highlighted by the software in the case of the treated sample, like various signaling pathways. This may be due to the fact that many minor species were identified in the treated RBC lysate. For example, several mitogen-activated protein kinases or mitogen-activated protein kinase (MAPK)-interacting proteins were found in the treated RBC lysate (eight versus three in the untreated sample), accounting for the high scoring attributed to signaling pathways by Ingenuity. Although these signaling cascades may no longer be active in mature RBCs, several molecules taking part in them are still present in the cell, and their identification after treatment of the sample with peptide libraries illustrates the in-depth characterization of the proteome allowed by the described method. Comparable results were obtained when performing functional annotation of the protein lists using the GoMiner software. The list of GO terms and their number of associated proteins along with their enrichment value, p value, and FDR computed by GoMiner for each protein list is given in supplemental data 5. 
Among the best enriched GO biological processes, many GO categories corroborating the major expected function of RBCs can be found either in treated or non-treated samples (e.g. "gas transport," "heme metabolic process," "glycolysis," "oxygen and reactive oxygen species metabolism," etc.). However, in the case of treated RBC lysate, some unexpected GO terms also appear well ranked in the list, although they may not reflect a real process taking place in RBCs. Again this indicates that many low abundance species, probably inherited from erythroblasts, are still present in RBCs and could be identified thanks to the peptide library beads.
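The "enrichment" values discussed above compare the fraction of a protein list falling in a category with the fraction of the reference database falling in that same category. The Python sketch below shows one common way to compute such a ratio together with a one-sided hypergeometric p value; it is an illustration of the general principle and is not claimed to reproduce the exact statistics of GoMiner or Ingenuity. The function names and all counts in the usage example are hypothetical.

```python
from math import comb


def enrichment_ratio(k, n, K, N):
    """(fraction of the list in the category) / (fraction of the reference).

    k: proteins of the list in the category, n: annotated proteins in the list,
    K: reference proteins in the category,   N: annotated reference proteins.
    """
    return (k / n) / (K / N)


def hypergeom_pvalue(k, n, K, N):
    """One-sided p value: probability of drawing at least k category members
    when n proteins are sampled without replacement from the reference."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 4 of 4 listed proteins fall in a category that
    # holds 5 of the 10 reference proteins.
    print(enrichment_ratio(4, 4, 5, 10))
    print(hypergeom_pvalue(4, 4, 5, 10))
```

A ratio above 1 with a small p value indicates that the category is over-represented in the list relative to the reference, which is how the expected RBC processes stand out in both treated and non-treated samples.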
|
| DISCUSSION |
|---|
|
|
|---|
To increase the analytical coverage of body fluids characterized by a large dynamic concentration range of proteins, many strategies, mostly based on sample prefractionation or on immunodepletion of major species, have been extensively described. Although prefractionation tools based on general physicochemical parameters of proteins (size, charge, and hydrophobicity) may be useful to segregate some abundant proteins in specific fractions, thus enhancing the detectability of minor species in other fractions, they often suffer from several limitations such as implementation difficulties, limited loading capacity, questionable reproducibility, and limited selectivity. This last point implies that highly abundant species often co-fractionate with many other minor proteins that will thus still be undetectable. In the case of immunosubtraction methods, which are based on specific antibodies raised against the most abundant proteins of the sample, a major reported drawback is probably the co-depletion and loss of many proteins either bound to the target protein to be subtracted or in some way recognizing the capturing antibody (22). In our experience, using the approach described here, very few proteins are lost while the relative concentration of minor species is increased. One limitation of the approach is the need to load enough material on the peptide library column to saturate the binding sites of abundant proteins. This may not be possible in clinical studies dealing with limited amounts of biological samples, and further improvements will have to be made to scale down the peptide library bead treatment in such cases. However, when the fluid is easily available, huge amounts of sample can be processed (5.6 g of protein from RBC lysate in the present study), further increasing the enrichment of the final sample in low abundance proteins in a quite reproducible manner (see below).
Using this method on the RBC lysate, we could show here that a better MS/MS sampling of the ions from minor species was obtained as a result of the reduction of the protein concentration dynamic range. In a recent study, de Godoy et al. (23) examined the different parameters that affect the analytical coverage of complex mixtures by nano-LC-MS/MS. They report that the major limitation is not the sensitivity of the mass spectrometer but rather a combination of effective sequencing speed (the mass spectrometer is not fast enough to sequence all the ions detected) and dynamic range (high abundance peptides prevent detection of low abundance peptides because of signal suppression). The use of sample treatment with peptide library beads clearly alleviates these instrumental limitations and allows globally more MS/MS on a higher number of species. After treatment of the sample, the limitation due to sequencing speed no longer seems to be a crucial issue. Indeed reproducibility of the protein lists obtained was shown to be about 95%, a number significantly higher than typical values obtained for analysis of complex proteomes by nano-LC-MS/MS, suggesting that almost complete MS/MS sampling of the ions detected was performed and that nearly everything that was detected was probably also sequenced. It is possible, however, that some proteins of the sample were not identified either because they did not find a good ligand in the combinatorial peptide libraries and did not bind efficiently to the beads (this is probably reflected in the fact that 56 proteins identified in the non-treated sample could not be found after treatment) or because they were too dilute in the initial sample and the enrichment provided by the method was still insufficient to bring the concentration of these species into the 10⁴ dynamic range typically explored by the mass spectrometer.
However, the fact that very minor species, never detected before, could be identified in this study (discussed below) suggests that the list of RBC cytoplasmic proteins provided here is probably as exhaustive as possible with current technology.
Normal Peptide Library Versus COOH-modified Peptide Library—
The use of two peptide libraries was justified by their complementary behavior already described in the case of platelet extracts (14) and bile fluid protein discovery (24). Notwithstanding the large protein overlap, the use of the second library always allowed detection of additional proteins that were not captured by the first set of peptides (16% more in the present case). The different behavior is attributed to the difference in the terminal group of the peptide chains. The presence of a weak cationic (primary amine) or anionic (carboxylic group) character on the terminal group may induce a difference of affinity between the peptide chain and the bait protein because it modifies the dissociation constant between the peptide and the partner proteins present in the initial crude lysate. Generally peptides with carboxylic terminal groups interact more efficiently with cationic proteins, whereas amino-terminal peptides interact mostly with anionic proteins. This behavior is illustrated by the 2D gels shown in Fig. 2, panels UCA, Library-1 and UCA, Library-2. Although Library-1 contributed a larger number of exclusively captured species (see Fig. 3B), it did not have all the baits necessary to capture all proteins from the crude extract. 254 additional proteins were found thanks to the use of Library-2. This represents about 16% of proteins that would not have been detected without the use of the second library.
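The complementarity of the two libraries comes down to set operations on the two identification lists. The following is a minimal sketch of that bookkeeping, not the analysis pipeline used in this study: the accession strings are made-up placeholders, `library_contributions` is a hypothetical helper name, and the percentage is computed relative to the combined list, which is an assumption about how the reported share was defined.

```python
def library_contributions(lib1, lib2):
    """Split two protein identification lists into shared and exclusive sets."""
    lib1, lib2 = set(lib1), set(lib2)
    union = lib1 | lib2
    only2 = lib2 - lib1
    return {
        "shared": lib1 & lib2,
        "only_library_1": lib1 - lib2,
        "only_library_2": only2,
        # Share of the combined list that would be missed without Library-2
        # (assumed definition; 0.0 if both lists are empty).
        "gain_from_library_2_pct": 100.0 * len(only2) / len(union) if union else 0.0,
    }


# Placeholder accessions, hypothetical and for illustration only.
library_1 = ["P1", "P2", "P3", "P4", "P5"]
library_2 = ["P4", "P5", "P6"]

result = library_contributions(library_1, library_2)
print(sorted(result["only_library_2"]))
print(round(result["gain_from_library_2_pct"], 1))
```

The same function applies unchanged to real accession lists exported from each library's eluates.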
Interestingly, when the amounts of protein eluted by the sequence of desorbing agents were compared, UCA was the most efficient desorbing solution for Library-1. A possible explanation is that citric acid, a trivalent compound bearing three negative charges, would efficiently compete for the amino binding sites of the hexapeptides on the beads, thus helping to discharge all acidic proteins by breaking the ionic bonds while the hydrogen bonds would be loosened by urea. Moreover it can be seen on the 2D gel maps that the UCA eluates from the two libraries present an overlap that is significantly lower than that for the TUC eluates. Clearly acidic proteins are preferentially eluted by UCA in the case of Library-1. The HOS eluent, a hydro-organic mixture designed for the desorption of highly hydrophobic proteins, allowed collection of only a minor amount of proteins but enlarged the list of discovered species by 201 additional gene products. This relatively modest number is not a real surprise because all proteins identified in the present study were from the soluble part of the red blood cell.
Quantitative Information—
Treatment with peptide ligand libraries is intrinsically an approach that will modify the abundances of proteins in the treated versus initial sample. Highly abundant species should quickly saturate their partner baits on the beads, and their final concentration in the treated sample will be greatly decreased. On the other hand, the enrichment factor of other species will depend on the affinity of a given protein toward its recognized ligand, and the order of abundance of diverse proteins in the final sample may be changed. It was thus important to assess whether or not the method could be used for differential proteomics studies, i.e. whether proteins differentially expressed in samples to be compared are still found with the same differential expression ratio after treatment of these samples. In other words, although the enrichment factor during the treatment may vary from protein to protein in the same sample, it should be reproducible for a given protein present in different samples treated in parallel. To assess that, different amounts of a standard protein were spiked in samples of RBC lysate, and replicate peptide library treatments were performed on these samples. We show here that the treatment seems to be very reproducible as indicated by the XIC area measurements of the peptide ions corresponding to the spiked protein. Indeed for a given amount of this protein, the standard deviation of the values measured for different replicate treatments was lower than the standard deviation obtained for the different MS measurements themselves performed on the same sample. Moreover for different amounts of the spiked protein, the ratio measured after treatment of the samples was nearly the same as the theoretical initial ratio. Thus, as long as the protein does not saturate the beads, the relative quantitative information on this protein between different samples is well conserved during the treatment.
This will need to be confirmed on a larger number of proteins, and a more precise evaluation of the number of proteins that effectively saturate the beads and for which the relative quantitative information should be lost (at least in the case of label-free approaches) will have to be performed. However, our preliminary experiment, as well as the successful identification of several specific differentially expressed species in proteomics analysis using this methodology (footnote 2), suggests that treatments with peptide libraries can be used in the context of differential studies, for example on RBC pathologies or for biomarker discovery in body fluids.
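The two checks described above (spread across replicate treatments, and conservation of the spiked ratio) can be sketched as follows. This is an illustration under stated assumptions, not the computation performed in this study: the XIC areas are invented numbers in arbitrary units, the expected ratio of 2.0 is assumed, and `cv_percent` and `observed_ratio` are hypothetical helper names.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean), in percent."""
    return 100.0 * stdev(values) / mean(values)


def observed_ratio(areas_a, areas_b):
    """Ratio of mean XIC areas between two spike levels."""
    return mean(areas_a) / mean(areas_b)


# Hypothetical XIC areas (arbitrary units) for one peptide of the spiked protein.
# Each list holds replicate library treatments of the same spike amount.
spike_low = [1.00e6, 1.05e6, 0.95e6]
spike_high = [2.10e6, 1.95e6, 2.05e6]
expected = 2.0  # assumed theoretical ratio of the spiked amounts

ratio = observed_ratio(spike_high, spike_low)
print(f"CV, low spike: {cv_percent(spike_low):.1f}%")
print(f"CV, high spike: {cv_percent(spike_high):.1f}%")
print(f"observed / expected ratio: {ratio / expected:.2f}")
```

A value of observed / expected close to 1, together with a treatment CV no larger than the CV of repeated MS injections, would correspond to the behavior reported for the spiked standard.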
Biological Significance of the Proteins Identified—
Many of the proteins identified in this study are well known components of the RBC cytoplasm that accomplish important functions in this cell. Among them are of course all the hemoglobin chains but also the proteins involved in essential metabolic functions of the RBC such as glycolysis or protection against oxidative damage described in more detail below. However, more unexpected classes of biological processes or cellular localization were also found when automatic annotation of the list was performed with software such as Ingenuity Pathway Analysis or GoMiner. Considering the high number of proteins identified, it is likely that some of them are inherited from the erythroblast but are present in the RBC in an inactive state. Differentiation of RBCs occurs in well defined stages in the bone marrow from proerythroblasts to early erythroblasts and then late erythroblasts. In these nucleated intermediates, many pathways are activated to establish the differentiation program in response to microenvironment stimulation and particularly erythropoietin signaling. This promotes adequate protein synthesis by ribosomes and initiates the accumulation of hemoglobin and iron. Then through association with macrophages in erythroblastic islands, late stage erythroblasts enucleate and thereby become reticulocytes. At this stage, the cell should contain all the information necessary to complete the specialization into RBC and survive in the absence of transcription and new gene activity. A large pool of RNA is still present in reticulocytes, which mature within a few days by completing the synthesis of hemoglobin and other proteins that characterize mature RBCs and by degrading internal organelles and "useless" proteins. In addition, reticulocytes migrate from the bone marrow into the blood circulation in which they complete their maturation into RBCs.
In this study, we extensively identified the proteins from the degradation machinery that actively eliminates superfluous species in reticulocytes and RBCs, i.e. all the subunits of the proteasome catalytic core (20 S proteasome) and regulatory complex (19 S proteasome) but also a wide range of E1, E2, and E3 enzymes from the ubiquitin-proteasome pathway (supplemental data 6). In particular, many E3 ubiquitin-protein ligases could be identified after sample treatment. These enzymes, which are responsible for targeting ubiquitination to specific substrate proteins and promoting their proteasomal degradation, should be particularly active in reticulocytes and RBCs where many proteins are no longer needed. It is questionable whether the proteins in our final list are all genuine components of mature RBCs or whether some of them are incompletely degraded products that could be more specific to the reticulocytes also present in the sample. Although reticulocytes represent only 0.5–1% of the total cell population and should contribute weakly to the total protein content of the sample, minor proteins arising from these cells may be amplified by the peptide library treatment. On the other hand, the degradation process initiated in reticulocytes is perpetuated in RBCs, and it is likely that many proteins identified in this study are still present in the mature cells while in the process of being degraded. It is also possible that many non-essential proteins are not necessarily degraded in the RBC. Indeed this would cost a lot of energy, whereas a better "solution" for the erythroblasts would be to just leave them intact but to specifically degrade some key proteins along their pathways.
More focused studies would be needed to test this hypothesis and verify whether some of the identified proteins are present in the RBC in an active state or represent leftovers from the degradation process and to elucidate in more detail the mechanisms of this process. A lot of identified species, however, represent essential components of the mature cell, and some of them are discussed below.
Hemoglobins—
Among the proteins of red blood cells, the most abundant is hemoglobin, which is in charge of oxygen transport from the lungs to peripheral tissues. It represents 97–98% of the whole proteome. In vertebrates, hemoglobin is a heterotetramer composed of two types of subunits: the first type belongs to the α-globin family (α or ζ chains) and is dependent on genes located on chromosome 16; the second type belongs to the β-globin family (β, γ, δ, or ε chains) with genes located on chromosome 11. Switches in globin gene expression appear during development; thus the embryonic ζ and ε chains are replaced in the fetus by α- and γ-globins as well as a little bit of δ-globin. Finally in adult, α and β chains are the most abundant, forming the HbA tetramer (α2β2), but some δ- and γ-globin chains are also present to form other types of hemoglobin. Mutations or deletion on globin gene loci can lead to different pathologies such as thalassemia (total or partial deficit of expression of one or several globin chains sometimes associated with abnormal persistence of fetal or embryonic globin forms) or other hemoglobinopathies (expression of globin variants) (25). In our study from healthy adult erythrocytes, the four chains α, β, δ, and γ were identified in both non-treated and treated samples with a very high degree of coverage for the two major chains: α, 99.3% and β, 97.3% (supplemental data 6). Interestingly the two embryonic chains, ζ and ε (one in both samples, the other in the treated sample only), have also been identified, confirming the observations made at the mRNA level that the corresponding genes are incompletely silenced during development (26, 27). In addition, in the present study we identified in the treated sample two other globin chains called µ and θ. The gene of the µ chain has been found recently on the α-globin locus and has a high level of homology with the avian αD-globin (26). It has been detected in erythroid tissue by quantitative real time PCR (26) and recently identified by mass spectrometry by Pasini et al. (3), although its exact function is still unknown. Furthermore we identified, for the first time at the protein level, another globin chain called θ. There has been some debate about whether a protein product exists for the θ gene (28). However, the mRNA expression of this gene was demonstrated in fetal erythroid tissue (29), and it was shown to be a fetal/adult gene expressed like the α-globin in fetus and adult but at a very low level (30). The identification in our study of these low expression globin chains confirms the ability of the peptide library treatment to reveal proteins usually hidden by major proteins (α and β chains in this case).
Other Major RBC Pathways—
The lifespan of RBCs is about 120 days during which the cell must generate energy to accomplish its vital functions. This is done by degrading glucose to lactate through the Embden-Meyerhof pathway with concomitant production of ATP (5). Efficient generation of ATP also relies on proper nucleotide metabolism. Indeed as RBCs cannot synthesize nucleotides de novo, they have evolved mechanisms that preserve the pool of adenine nucleotides in the cell by recycling adenine and adenosine into AMP and maintaining an equilibrium between AMP, ADP, and ATP. In addition, a small amount of glucose is also diverted and catabolized through the hexose monophosphate shunt (pentose phosphate pathway), which provides NADPH and reduced glutathione that are essential to protect the cell from oxidative damages. Protection against oxidation is critical in RBCs where the levels of oxygen and iron are very high. Therefore, any dysfunction in these pathways may have important deleterious effects, and many pathologies of the RBC are linked to inappropriate production of one enzyme involved in these metabolic functions. Those related to glycolysis or nucleotide metabolism hamper the generation of ATP and generally lead to chronic nonspherocytic hemolytic anemia, whereas those affecting the hexose monophosphate shunt will cause hemolytic episodes when RBCs are submitted to oxidative stress. In our study of the RBC cytoplasm, we identified most of the actors of these major pathways (supplemental data 6), particularly the enzymes involved in the most frequent metabolic RBC disorders such as pyruvate kinase, glucose-6-phosphate isomerase, or triose-phosphate isomerase. Although many of these proteins could be identified also in the non-treated sample, some particular species were detected only after peptide library treatment. This is the case for the PK-M form of pyruvate kinase. 
This enzyme catalyzes the second crucial ATP-generative step in glycolysis and exists as four isoenzymes (M1, M2, L, and R) encoded by two separate genes (PK-M and PK-LR), which are expressed in a tissue-specific manner (31, 32). The PK-LR gene codes for both PK-L (dominant form in liver) and PK-R (restricted to RBCs) through the use of alternate promoters. PK-R deficiency is the most common glycolytic enzyme defect associated with hereditary non-spherocytic hemolytic anemia. This gene product was identified in this study with high sequence coverage in the treated as well as non-treated RBC lysate. On the other hand, the PK-M gene codes for the M1 and M2 isoenzymes by alternative splicing of the same RNA (33). The M1 protein predominates in skeletal muscle, heart, and brain, whereas the M2 isoenzyme is found primarily in rapidly proliferating fetal tissues. The M2 form is also found in erythroblasts where it is progressively replaced by the R form as the cells mature into RBCs (34). In pathologies related to PK deficiency, persistence of the fetal M2 form can be observed in mature cells, possibly to compensate for the lack of the R isoenzyme. Moreover PK-M2 has been shown to be overexpressed in tumor cells (35) and is being evaluated as a diagnostic biomarker in many cancer types (36–40). The fact that we could specifically detect this form in the treated lysate from RBC, although it is probably present at very low amounts in these healthy, mature cells, illustrates the performance of the method for low abundance species identification and its potential for biomarker discovery.
Conclusion—
By using combinatorial peptide libraries and powerful nano-LC-MS technology, we could map extensively the cytosolic RBC proteome and demonstrate the presence in these cells of a large body of previously undetected species. Given the high number of proteins identified and the wide range of physiological functions to which they are related, more focused studies would be needed to understand better whether these proteins still play a role in the mature cell. Clearly some minor proteins reported here are inherited from the erythroblast, for example the Bcl-xL antiapoptotic molecule, which is essential for survival of precursor cells from the erythroid lineage during the differentiation process (41), or the Emp protein, recently shown to be important for nuclear extrusion from erythroblasts and their terminal maturation (42). However, even if these proteins are no longer essential in the mature cell, their identification remains of interest. It shows that proteasomal degradation of many species is not complete or maybe not even performed at all and that the RBC cytosol can constitute an easily accessible reservoir of biomarkers carrying potential information about dysfunctions taking place earlier in the differentiation process. This can be of particular interest for studying myeloproliferative disorders characterized by abnormal and malignant transformations of the precursor hematopoietic cells, as is the case, for example, in erythroleukemia, erythrocytosis, or polycythemia vera, pathologies associated with increased hematopoiesis and overproduction of RBCs (43). In the case of polycythemia vera, a somatic mutation of the JAK2 gene leading to dysfunction of the corresponding kinase has been shown to be involved in pathogenesis (44). However, no convenient serum biomarker exists today for this disease, and screening the RBC content by differential proteomics could help in finding more efficient diagnostic tests.
The combinatorial peptide library approach could be a powerful tool for such biomarker research projects as it significantly increases the number of detectable species in global protein profiling strategies. In addition, we hope that the important data set of proteins provided here will prove to be useful to the community for more dedicated studies focusing on specific protein species and aiming at understanding better the molecular bases of RBC biology, terminal erythroid differentiation, hemoglobin switching, or RBC diseases. Indeed extensive proteomics data sets can become the basis of targeted mass spectrometry analyses in which the instrument is used to monitor and quantify with high sensitivity and specificity particular proteotypic peptides uniquely associated with a protein of interest (45). The mass spectrometer can indeed be operated in the multiple reaction monitoring mode where the monitoring of "transitions" composed of one parent ion and one fragment ion allows the very sensitive detection of the selected proteins in a complex mixture. This emerging targeted work flow thus relies on proper selection of an observable proteotypic peptide and of an optimal transition, which can be more efficient if based on experimental data. To this end, we made accessible all the MS and MS/MS data acquired during this study. The final protein lists, peptide information, and MS/MS spectral data can thus be browsed through the MFPaQ interface at http://proteomique.ipbs.fr/rbc.
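A multiple reaction monitoring transition pairs one precursor (parent) m/z with one fragment m/z. The sketch below shows how such a pair can be computed for an unmodified peptide from standard monoisotopic residue masses. It is an illustration only: the sequence `SAMPLEK` is made up and is not a peptide reported in this study, the function names are hypothetical, and modifications (for example cysteine alkylation) are deliberately not handled.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # monoisotopic mass of H2O (Da)
PROTON = 1.00728   # mass of a proton (Da)


def precursor_mz(peptide, charge):
    """m/z of the intact, unmodified peptide carrying `charge` protons."""
    neutral = sum(RESIDUE[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ion_mz(peptide, n):
    """m/z of the singly charged y ion made of the last n residues."""
    return sum(RESIDUE[aa] for aa in peptide[-n:]) + WATER + PROTON


def transition(peptide, charge, n):
    """(precursor m/z, fragment m/z) pair, rounded to 3 decimals."""
    return (round(precursor_mz(peptide, charge), 3),
            round(y_ion_mz(peptide, n), 3))


# Made-up sequence, for illustration only.
parent, fragment = transition("SAMPLEK", charge=2, n=4)
print(parent, fragment)
```

In practice the choice of proteotypic peptide and fragment would be guided by the experimental MS/MS spectra, as argued above, rather than by calculated masses alone.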
| FOOTNOTES |
|---|
Published, MCP Papers in Press, July 9, 2008, DOI 10.1074/mcp.M800037-MCP200
1 The abbreviations used are: RBC, red blood cell; WBC, white blood cell; ADH, alcohol dehydrogenase; TUC, 2 M thiourea, 7 M urea, 2% CHAPS; UCA, 9 M urea, citric acid to pH 3.3; HOS, hydro-organic solution; FDR, false discovery rate; XIC, extracted ion chromatogram; Hb, hemoglobin; PK, pyruvate kinase; 2D, two-dimensional; 1D, one-dimensional; LTQ, linear trap quadrupole; IPI, International Protein Index; MFPaQ, Mascot File Parsing and Quantification; ICIS, Interactive Chemical Information System; GO, Gene Ontology; E1, ubiquitin-activating enzyme; E2, ubiquitin carrier protein; E3, ubiquitin-protein isopeptide ligase.
2 E. Boschetti and F. Berger, personal communication.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
Both authors contributed equally to this work.

Supported by a grant from Fondazione Cariplo, Milan, Italy. To whom correspondence may be addressed: Dept. of Chemistry, Polytechnic of Milan, Via Mancinelli 7, Milan 20131, Italy. E-mail: piergiorgio.righetti{at}polimi.it

Supported by the CNRS, the Génopole Toulouse Midi-Pyrénées, the Région Midi-Pyrénées, and grants from the Institut National du Cancer, Agence Nationale de la Recherche, and Fondation pour la Recherche Médicale. To whom correspondence may be addressed: Inst. de Pharmacologie et de Biologie Structurale, 205 route de Narbonne, 31077 Toulouse cedex 4, France. E-mail: bernard.monsarrat{at}ipbs.fr
| REFERENCES |
|---|
|
|
|---|
α-globin gene. Blood 106, 1466–1472
θ, ζ, and ε globin messenger RNAs are expressed in adults. Blood 74, 629–637
θ gene be a real globin? Nature 329, 465–466
θ-globin is transcribed in human fetal erythroid tissues. Nature 329, 551–554
α-globin cluster: fetal/adult pattern of θ-globin gene expression. Blood 80, 1586–1591