Identification and Stoichiometry of Glycosylphosphatidylinositol-anchored Membrane Proteins of the Human Malaria Parasite Plasmodium falciparum

Most proteins that coat the surface of the extracellular forms of the human malaria parasite Plasmodium falciparum are attached to the plasma membrane via glycosylphosphatidylinositol (GPI) anchors. These proteins are exposed to neutralizing antibodies, and several are advanced vaccine candidates. To identify the GPI-anchored proteome of P. falciparum we used a combination of proteomic and computational approaches. Focusing on the clinically relevant blood stage of the life cycle, proteomic analysis of proteins labeled with radioactive glucosamine identified GPI anchoring on 11 proteins (merozoite surface protein (MSP)-1, -2, -4, -5, -10, rhoptry-associated membrane antigen, apical sushi protein, Pf92, Pf38, Pf12, and Pf34). These proteins represent ∼94% of the GPI-anchored schizont/merozoite proteome and constitute by far the largest validated set of GPI-anchored proteins in this organism. Moreover MSP-1 and MSP-2 were present in similar copy number, and we estimated that together these proteins comprise approximately two-thirds of the total membrane-associated surface coat. This is the first time the stoichiometry of MSPs has been examined. We observed that available software performed poorly in predicting GPI anchoring on P. falciparum proteins where such modification had been validated by proteomics. Therefore, we developed a hidden Markov model (GPI-HMM) trained on P. falciparum sequences and used this to rank all proteins encoded in the completed P. falciparum genome according to their likelihood of being GPI-anchored. GPI-HMM predicted GPI modification on all validated proteins, on several known membrane proteins, and on a number of novel, presumably surface, proteins expressed in the blood, insect, and/or pre-erythrocytic stages of the life cycle. Together this work identified 11 and predicted a further 19 GPI-anchored proteins in P. falciparum.

The most prevalent form of protein glycosylation in malaria parasites is the C-terminal addition of a glycosylphosphatidylinositol (GPI) membrane anchor. These glycolipid anchors are added to the C terminus of proteins in the endoplasmic reticulum (ER) and serve to anchor these proteins to the outer leaflet of the plasma membrane. GPI-anchored proteins are common on all extracellular forms of the parasite and can elicit strong immune responses that appear to play an important role in acquired and/or vaccine-induced immunity (1). For this reason antimalarial vaccines incorporating recombinant GPI-anchored proteins are currently being developed to protect against this devastating disease. Although a number of GPI-anchored proteins have been characterized already and their functions and potential as vaccines are currently being explored, genomic and proteomic studies suggest that many more GPI-anchored surface proteins await discovery in Plasmodium falciparum, the most important cause of human malaria (2)(3)(4).
The GPI anchoring of proteins is one of the most ancient and widespread forms of protein glycosylation in eukaryotes, and the higher level rules governing the attachment of these glycolipids appear to be strongly conserved (5). The preprotein sequences of GPI-anchored proteins are characterized by being book-ended by an N-terminal hydrophobic signal sequence that directs co-translational insertion into the ER and a C-terminal hydrophobic region that guides the attachment of a GPI anchor. No other hydrophobic stretches that could serve as internal transmembrane domains are usually present. During co-translational insertion into the ER membrane the C-terminal hydrophobic region is recognized by a transamidase complex that simultaneously removes the hydrophobic region and replaces it with a presynthesized GPI anchor (5). The cleavage site (between the ω and ω+1 amino acids) is a short distance upstream of the hydrophobic region and is usually comprised of three amino acids with small side chains (6,7).
The surfaces of the various extracellular forms of the malaria parasite, the merozoite, gamete, ookinete, and sporozoite, are coated by different proteins that are either known or presumed to be GPI-anchored. The identities of some of these proteins are known in P. falciparum including for example, merozoite surface protein 1 (MSP-1) and MSP-2 on merozoites (8,9), Pfs48/45 on gametes (10), Pfs25 and Pfs28 on ookinetes (11,12), and circumsporozoite protein (CSP) on sporozoites (13,14). Using a variety of approaches, a number of other proteins have been predicted as likely to possess GPI anchors, in our own case using proteomic analysis of detergent-resistant membrane preparations (4). A cursory examination of the C-terminal sequences of known or predicted GPI-anchored proteins reveals that they share the general characteristics of eukaryotic GPI anchor signals, namely a C-terminal stretch of 13-18 non-charged amino acids enriched with hydrophobic amino acids. The cleavage sites of three P. falciparum GPI anchor signal sequences are known (MSP-1, -2, and -4), and they, like other eukaryotic GPI attachment sites, are largely comprised of the small amino acids serine and asparagine (15). Despite this knowledge, very few P. falciparum proteins have been confirmed as possessing GPI anchors.
The objective of this study was to identify all the proteins present in the currently annotated version of the P. falciparum genome (2) (www.PlasmoDB.org) that are most likely to be GPI-anchored. The paucity of validated GPI-anchored proteins in this organism, together with the evolutionary distance of this parasite from other species in which GPI anchoring has been well studied, led to uncertainty in the use of available programs that predict the presence of GPI anchors from primary sequence data. Because of this we used a combination of biochemical and bioinformatic approaches to determine the GPI proteome of P. falciparum. Proteomic analysis was used to identify GPI-anchored proteins synthesized late in the blood-stage cycle and that eventually reside on the merozoite form of the parasite. For this procedure, GPI anchors were first labeled with [³H]glucosamine, and GPI-anchored proteins were enriched by detergent-phase partitioning. Eleven different GPI-anchored proteins were assigned using this approach, and these were rank ordered according to their relative copy number. Up to another nine lower abundance [³H]glucosamine-labeled GPI-anchored proteins were also apparent at this life stage, but these remain unidentified. To predict the identity of these lower abundance proteins as well as those present at other life stages we developed a program, GPI-HMM, trained on P. falciparum sequences to search all annotated and predicted protein sequences derived from the P. falciparum genome. Using this approach we confidently predicted that the P. falciparum genome encodes 30 GPI-anchored proteins. Sixteen of these have a late blood-stage expression profile, which is in good agreement with proteomic analysis. With the emergence of genome sequences across the Plasmodium genus, these studies provide a framework for predicting the nature of surfaces of the various extracellular forms of these parasites.

EXPERIMENTAL PROCEDURES
Parasite Culture and Metabolic Labeling-P. falciparum (3D7 line) parasites were synchronized with sorbitol and cultured at 8-10% parasitemia according to standard procedures (16,17). Specific metabolic labeling of GPI-anchored proteins was achieved by incubating ∼1 × 10¹⁰ parasite-infected erythrocytes with 10 µCi/ml D-[6-³H]glucosamine hydrochloride in glucose-free RPMI 1640 medium supplemented with 10 mM fructose, 25 mM Hepes, 0.2 mM hypoxanthine at 37°C for 4 h. Infected erythrocytes were pelleted by centrifugation at 2,800 rpm for 10 min (Beckman GS-6 centrifuge), lysed with saponin (0.15% saponin for 10 min on ice), and extensively washed in cold TBS supplemented with a mixture of protease inhibitors (TBS+PIs; Roche Complete protease inhibitor mixture and/or Calbiochem Protease Inhibitor Mixture Set III). Parasites were resuspended to a volume of 1 ml in TBS+PIs, snap frozen in liquid nitrogen, and stored at −80°C until required.
Detergent Extraction and Two-phase Partitioning of GPI-anchored Proteins-Detergent extraction and enrichment of GPI-anchored proteins was achieved by Triton X-114 (TX-114) two-phase separation. Parasite pellets were equilibrated in 2.4 ml of ice-cold TBS with protease inhibitors at a final protein concentration of <4 mg/ml and extracted with 0.6 ml of precondensed TX-114 stock solution (∼2% final concentration). Samples were maintained on ice for 30 min with occasional mixing and centrifuged at 4°C at 100,000 × g for 30 min to remove TX-114-insoluble material. The supernatant was incubated at 33°C for 10 min and centrifuged at 10,000 rpm for 10 min at room temperature to induce phase separation. Aqueous and organic phases were recovered and mixed with fresh buffer or detergent as appropriate, and the phase separation procedure was repeated twice more. Aliquots of total lysate and TX-114-insoluble and TX-114-soluble aqueous or detergent-enriched fractions were collected and quantified using BCA protein assay reagent (Pierce). The resulting TX-114-soluble detergent fractions enriched in GPI-anchored proteins were pooled, and proteins were precipitated with 3-4 volumes of acetone at −20°C.
Two-dimensional (2D) Gel Electrophoresis-Two-dimensional gel electrophoresis was performed using conditions required for optimal extraction and separation of P. falciparum-infected erythrocyte proteins (19). Isoelectric focusing was carried out on wide range immobilized pH gradients (11 cm long, pH 3-10 non-linear Immobiline DryStrip gels; Amersham Biosciences) using the Protean IEF cell system (Bio-Rad). The GPI-enriched fraction (∼200 µg of acetone precipitate) was redissolved in 300 µl of rehydration/sample buffer (7 M urea, 2 M thiourea, 2% ASB-14, 100 mM DTT, 0.4% ampholytes). Samples were loaded by passive rehydration for 12 h and focused at a current limit of 50 µA/IPG strip using a fast voltage gradient (8000 V maximum, 24,000 V-h) at 20°C. The second dimension was carried out on 10% polyacrylamide gels (18 cm × 16 cm × 1.5 mm) using a Hoefer SE 600 system at 200 V constant voltage and 10°C until the dye front reached the bottom of the gel. Analytical gels were transferred to Immobilon-P SQ PVDF membranes, and D-[6-³H]glucosamine-labeled protein spots were visualized by autoradiography as above. To visualize proteins after 2D electrophoresis preparative gels were stained with potassium salt silver stain, Bio Safe™ Coomassie stain (Bio-Rad), or Imperial™ protein stain (Pierce) using protocols compatible with electrospray ionization-mass spectrometry (20) or the manufacturer's instructions. Computer-aided 2D image analysis of digitized autoradiographs and Coomassie-stained gel images was carried out using ImageJ and ImageQuant Version 5.0. The relative amount of GPI-anchored protein was calculated from densitometric scans of autoradiographs of [³H]glucosamine-labeled total schizont proteins separated by high resolution 1D gel electrophoresis. To estimate the relative amounts of multiple GPI-anchored proteins present in individual 1D bands, densitometric scans of autoradiographs of 2D gels of the same material were used as indicated in Table I.
Mass Spectrometry-To identify putative novel GPI-anchored P. falciparum schizont proteins, the regions matching the position of D-[6-³H]glucosamine-labeled protein spots were excised from preparative 2D gels and extensively washed in deionized water. Excised gel spots were digested automatically with trypsin (0.05 µg) as described previously (21). Tryptic digests were dried to ∼10 µl by centrifugal lyophilization (Savant model AES1010, Savant) for ESI ion trap MS/MS (LCQ-Deca, Finnigan). Protein digests (∼10 µl of 1% formic acid) were transferred into 100-µl glass autosampler vials, and peptides were fractionated by capillary reversed-phase HPLC (Agilent Model 1100 capillary HPLC system) using a butyl-silica 150 × 0.15-mm inner diameter reversed-phase capillary column (ProteCol™, C₄, 3 µm, 300 Å; SGE Australia) developed with a linear 60-min gradient from 0 to 100% B where Solvent A was 0.1% (v/v) aqueous formic acid and Solvent B was 0.1% aqueous formic acid, 60% (v/v) acetonitrile with a flow rate of 0.8 µl/min. The capillary HPLC system was coupled on line to the ESI ion trap mass spectrometer for automated MS/MS analysis of individually isolated peptide ions (21).
Uninterpreted CID spectra were filtered excluding spectra with less than 30 peaks using the LCQ-DTA program as part of Bioworks 3.1 sr1 (Finnigan, San Jose, CA). The parameters used to create the peak lists are as follows: minimum mass, 400 Da; maximum mass, 5000 Da; grouping tolerance, 1.5; intermediate scans, 1; minimum group count, 1; LCQ-DTA auto charge state calculation; 10 peaks minimum per spectrum; peptide charge states, 1+, 2+, and 3+; ±2-Da peptide mass tolerance; ±0.5-Da MS/MS fragment mass tolerance. Parent ion masses were determined based on the isotope cluster spacing in the zoom scan spectrum, and individual spectra files (.dta file extension) were generated. These files were then automatically searched using Mascot™ Version 2.1 (Matrix Science, London, UK) against the latest LudwigNR database (21) as well as the P. falciparum strain 3D7 genomic data peptide sequence database in FASTA format downloaded from the PlasmoDB website (www.plasmodb.org). Note that the PlasmoDB RAMA sequence (MAL7P1.208) is incorrect and was replaced with another whose accuracy had been verified by cDNA sequencing (Genbank accession number Q89710). Searches were conducted with the carboxymethylation of cysteine as a fixed modification (+58 Da), variable oxidation of methionine (±16 Da), and the allowance for up to four missed tryptic cleavages (22). Peptide identities were chosen to be correct with Mascot scores of 40 and above and were also manually validated. Peptide identities with Mascot scores of less than 40 were all manually validated and deemed as positive identities according to the fragmentation principles as published previously (22).
Prediction of GPI Anchoring-A hidden Markov model was used to predict GPI anchoring of P. falciparum proteins (GPI-HMM), and the topology for the model was chosen to correspond to the known overall structure of such proteins (23). The ER signal region was chosen to be identical to that in SignalP-HMM (24); the middle region is a simple looping state with a background amino acid emission distribution; the GPI signal region was trained using the amino acid sequences of the C-terminal regions of 14 previously characterized proteins proven or thought to be GPI-anchored and aligned by predicted ω-site. The training procedure for the GPI signal region used sequence weighting and a Dirichlet mixture prior similar to that used by HMMer (25). For a given protein, a score is calculated that shows how well the protein fits the model as compared with a null model, which is just a single looping state that is the same as the middle region. A positive score indicates a better fit than the null model. To calibrate the model, we used a dataset ("negative test set") of 1909 proteins known not to be GPI-anchored but that have a transmembrane domain within 50 amino acids of the C terminus and come from various organisms.
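The comparison against a null model described above is commonly written as a log-odds ratio. The expression below is a sketch under that assumption; the text does not give the exact formula or the base of the logarithm, and the symbols S, M_GPI, and M_0 are introduced here only for illustration.

```latex
% Assumed form: log-odds of sequence x under the GPI model versus the null model
S(x) = \log \frac{P(x \mid M_{\mathrm{GPI}})}{P(x \mid M_{0})},
\qquad S(x) > 0 \iff P(x \mid M_{\mathrm{GPI}}) > P(x \mid M_{0})
```

Under this reading, a positive score simply means that the sequence is more probable under the GPI model than under the single looping background state.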
Many proteins with transmembrane domains score highly, so to increase the power of our predictions we also automatically flag any protein that is predicted to have at least three transmembrane domains by TMHMM (26). This model was then used to score each protein in the P. falciparum proteome. An estimated p value for each protein is then calculated as the proportion of the negative test set that has not been flagged and scores at least as well as the given protein. Note that this p value is conservative due to the choice of a negative test set closer to GPI-anchored proteins than the majority of P. falciparum proteins. To measure the robustness of this model, a leave-one-out cross-validation procedure was applied. For each protein in the training set, a new model was trained with that protein removed, and the resulting model was applied to the removed protein.
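The estimated p value described above can be illustrated with a minimal sketch. It assumes the literal reading of the text, in which the denominator is the whole negative test set and flagged proteins never count as hits; the record type `NegativeEntry` and the function name are hypothetical and are not part of GPI-HMM.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NegativeEntry:
    """One protein from the negative test set (hypothetical record type)."""
    score: float   # model score of this non-GPI-anchored protein
    flagged: bool  # True if at least three transmembrane domains were predicted


def empirical_p_value(query_score: float, negatives: Sequence[NegativeEntry]) -> float:
    """Proportion of the negative set that is unflagged and scores >= query_score.

    Assumption: the denominator is the full negative set, as the text reads literally.
    """
    if not negatives:
        raise ValueError("negative test set must not be empty")
    hits = sum(1 for n in negatives if not n.flagged and n.score >= query_score)
    return hits / len(negatives)
```

Because the negative set was chosen to resemble GPI-anchored proteins, a value computed this way would tend to overstate the false positive rate for a typical protein, which is the sense in which the text calls it conservative.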

RESULTS AND DISCUSSION
Validation of GPI Anchoring on 11 Schizont-stage Proteins-The overall aim of this study was to determine the GPI-anchored proteome of P. falciparum. As the blood stages are amenable to continuous in vitro culture we used several approaches to identify the GPI-anchored proteins present at this point in the life cycle and to help derive a more comprehensive training set for the development of software to predict the remainder of the P. falciparum GPI-anchored proteome. Due to fatty acid acylation of inositol in the glycolipid structure, GPI-anchored Plasmodium proteins are resistant to cleavage by phosphatidylinositol-specific phospholipase C, thereby preventing the use of this common enzymatic identification method (27)(28)(29). Therefore, we used biosynthetic labeling and TX-114 detergent partitioning techniques in conjunction with Western blot analyses, 2D mapping, and mass spectrometry-based sequencing methods to identify GPI-anchored proteins.
The GPI anchors of schizont-stage P. falciparum proteins were specifically labeled with [³H]glucosamine because in P. falciparum little, if any, N-linked glycosylation occurs (30,31), and virtually all amino-sugar precursor is incorporated into the glycan moiety of the GPI anchor (32). Total schizont-stage proteins were fractionated by TX-114 two-phase partitioning, a process that separates soluble proteins into aqueous and detergent-phase fractions. Densitometric analysis of protein equivalents of the ensuing fractions (total, insoluble, aqueous, and detergent) showed that TX-114 partitioning resulted in a specific 5-10-fold enrichment of detectable [³H]glucosamine-labeled proteins in the TX-114 detergent-soluble phase (Fig. 1, A and B). Pulse-chase analyses at various points in the 48-h blood-stage cycle revealed that [³H]glucosamine-labeled proteins were most strongly expressed late in the cycle (Fig. 1C). That the expression of most asexual blood-stage GPI-anchored proteins peaked during schizont stages is congruent with peak transcription of the genes encoding enzymes that synthesize and attach the GPI anchors (33) (see Supplemental 2). Anti-MSP-1₁₉ recognized major bands of 220 kDa (band 1) and 16 kDa (band 15), which correspond to full-length MSP-1 and MSP-1₁₉, respectively. This antibody also recognized minor bands corresponding to MSP-1₄₂ (band 10) and a 125-kDa band (band 3) that presumably represents natural products. Anti-MSP-10 labeled a single band of ∼75 kDa (band 6) that is similar to its predicted and previously published size (35). Anti-RAMA antibodies labeled a major band at ∼55 kDa (band 8) and a lower abundance larger band of ∼160 kDa (band 2) that probably corresponds to the full-length protein.
The 55-kDa species is the expected dominant C-terminally processed fragment of RAMA that appears shortly after the expression of full-length protein (36). MSP-2 is present as three bands: a major 50-kDa band (band 9) that corresponds to the full-length protein and lesser amounts of a faster migrating 42-kDa band (band 10) and a 110-kDa, presumably SDS-resistant dimeric, species (band 4). Co-migrating with the 42-kDa MSP-2 species are MSP-4 and MSP-5, all of which comprise band 10 (Fig. 2). These data led to a designation for nine of the 15 [³H]glucosamine-labeled bands. Of the remaining six unidentified bands five labeled strongly (bands 5, 11, 12, 13, and 14) and one labeled weakly (band 7) with [³H]glucosamine. Hence a number of unidentified GPI-anchored proteins appeared to be relatively abundantly expressed at schizont stage.
LC-MS/MS was performed on bands 1-14 excised from 1D SDS-PAGE gels corresponding to [³H]glucosamine-labeled proteins. In each band, peptides corresponding to more than one protein were detected including obvious contaminants such as human keratins, immunoglobulin, and erythrocytic proteins (Supplemental Table S1). Eight of the bands included proteins either known (MSP-1, MSP-2, and RAMA) or predicted to be GPI-anchored (e.g. Pf92, Pf12, and Pf38 (4)). In some instances multiple potential GPI-anchored proteins were identified in the same band (Supplemental Table S1). Other bands did not contain any detectable GPI-anchored proteins even though autoradiographic data and/or Western blot analyses indicated that they should be present in these samples. One possible reason for the lack of detection is the masking of less abundant GPI-anchored proteins by the presence of highly abundant non-GPI-anchored proteins in the analyzed samples.
To generate a more comprehensive map of the GPI-anchored schizont proteome we performed 2D gel electrophoresis and subjected [³H]glucosamine-labeled spots to peptide sequencing by LC-MS/MS. [³H]Glucosamine-labeled proteins were precipitated from the TX-114 detergent phase, separated on 2D gels, and blotted. Autoradiographs of the resulting protein blots produced a highly reproducible pattern of ∼30 [³H]glucosamine-labeled 2D spots (Fig. 3, left panel). Regions corresponding to individual radiolabeled spots were then excised from duplicate Coomassie- or silver-stained gels (e.g. Fig. 3, right panel), and their proteins were sequenced by LC-MS/MS (Supplemental Table S2 and Fig. S2). The isoelectric point, molecular weight, and approximate corresponding 1D gel band of each assigned protein spot, as measured from five separate 2D gels, are summarized in Table I. This approach proved much more conclusive than the 1D approach with known or potential GPI-anchored proteins dominating the peptide coverage in 21 of 30 analyzed 2D spots (Supplemental Table S2). Some proteins were identified by LC-MS/MS in more than one 2D spot; these usually represent expected C-terminal processing fragments. Consistent with this, the observed isoelectric points and peptide coverage of most [³H]glucosamine-labeled protein spots were in good agreement with the predicted isoelectric points and polypeptide structure of naturally processed forms of the respective GPI-anchored proteins. Probably the best example is MSP-1, which accounted for eight of the 30 [³H]glucosamine-labeled protein spots (Table I and Supplemental Fig. S2). These included full-length MSP-1 with peptide coverage of 57% (spot 1) as well as all known naturally occurring C-terminal proteolytic fragments (spots 3, 4, 11, 13, 15, and 16) and MSP-1₁₉ (spot 30).
We also detected 13 peptides of RAMA and 11 peptides of apical sushi protein (ASP) in two separate spots (spots 9 and 10) with an apparent isoelectric point, molecular weight, and sequence coverage agreeing with that of their respective natural C-terminal 60- and 50-kDa processed forms (36,37).
The acidic region of 2D gels contained multiple partially reduced forms of the abundant known GPI-anchored merozoite surface proteins MSP-2 (spots 12 and 18), MSP-4 (spots 14 and 19), and MSP-5 (spot 14), which were difficult to resolve due to their nearly identical isoelectric points and molecular weights. In agreement with 1D Western blot results for bands 2, 4, and 15 (Fig. 2), immunostaining of 2D Western blots also confirmed that the corresponding spots (spots 2, 5, and 30) were recognized by antibodies against RAMA, MSP-2, or MSP-1₁₉, respectively (Table I, data not shown). As in the 1D proteomic analysis we also detected the three newly identified surface antigens Pf92, Pf38, and Pf12. These were predicted to be GPI-anchored based on primary sequence and their presence in detergent-resistant membrane preparations (4). The 2D gel proteomic approach confirmed the posttranslational modification and TX-114-partitioning of these proteins, strongly supporting the likelihood that they do possess GPI anchors (Table I). Moreover we confirmed the presence of GPI anchors on MSP-10 and ASP, two known apical surface proteins that unlike most other GPI-anchored merozoite surface proteins were not detected in low density detergent-resistant membrane fractions (4,38). Finally we also detected a novel putative GPI-anchored protein, PFD0955w. PFD0955w has a predicted molecular mass of ∼34 kDa, and we therefore have named it Pf34. In total, the presence of GPI moieties was confidently assigned to 11 different proteins in mature blood-stage P. falciparum parasites (MSP-1, -2, -4, -5, -10, RAMA, ASP, Pf92, Pf38, Pf12, and Pf34).
In addition to the 21 identified GPI-anchored protein spots, nine minor but distinct [³H]glucosamine-labeled protein spots were observed that could not be authenticated (Table I and Fig. 2). Seven of these 2D gel spots yielded no detectable parasite-related peptide sequences (spots 8, 21, 23, 26, 27, and 28), and two merely contained peptide sequences of non-GPI-anchored P. falciparum proteins (spots 20 and 29). Interestingly spot 29 yielded unique high peptide coverage for the hypothetical protein PFF0335c (formerly MAL6P1.71) in four independent experiments. Apart from apparently labeling with [³H]glucosamine, the same spot also incorporated the radioactive GPI lipid precursors [³H]palmitate and [³H]myristate (data not shown). Moreover PFF0335c partitioned in the hydrophobic TX-114 detergent phase, and proteomic analysis revealed that it is particularly abundant in a low density detergent-resistant membrane fraction of P. falciparum (4). However, peptides covering the putative C-terminal hydrophobic domain of this protein were detected by LC-MS/MS (Supplemental Fig. S2), indicating that this sequence is not removed to add a GPI anchor. Hence assuming that the current gene model for this protein is correct, it is unlikely that PFF0335c is GPI-anchored but instead appears to co-migrate with an as yet unidentified GPI-anchored protein. This is probably also the case for spot 20, which yielded significant peptides for multiple highly expressed non-GPI-anchored P. falciparum proteins (Supplemental Table S2). The nine spots that have so far eluded identification were presumably below the detection limit of our LC-MS/MS system and/or masked by other comigrating highly abundant proteins. It is also possible that additional GPI-anchored proteins are present in some of the 21 spots that have an assignment but that these are masked by more abundant species.
Also we cannot completely exclude the possibility that some spots appear as a result of a small degree of N- or O-linked glycosylation. Exactly how many additional GPI-anchored schizont proteins remain to be assigned is uncertain, but it is probably fewer than nine as it is likely that at least some of the 2D spots represent different proteolytic or oligomeric forms of the same protein.
Using the assignments from the 1D and 2D analyses (Table I), the relative copy number of GPI-anchored proteins was estimated from densitometric scans of autoradiographs of [³H]glucosamine-labeled proteins. Because each GPI-anchored protein has a single anchor, the level of radiolabel is proportional to the number of molecules of each protein.
Remarkably the 11 identified GPI-anchored proteins constitute ∼94% of the GPI-anchored schizont proteome (Fig. 4). Hence even when considered together the nine unidentified 2D gel spots are clearly low in copy number. Individually MSP-1, MSP-2, and RAMA were the most abundant, comprising ∼31, 21, and 10% of all GPI-anchored proteins, respectively (Fig. 4). Of these, only MSP-1 and MSP-2 are located on the surface, and hence this analysis demonstrates that these two proteins dominate the merozoite surface, comprising approximately two-thirds of all the GPI-anchored proteins residing in this location. Four proteins (Pf38, Pf12, Pf92, and MSP-10) each comprise ∼5% of the GPI-anchored proteome. MSP-4 and -5 could not be clearly separated by 2D gel electrophoresis and, similar to ASP and Pf34, each represent ∼2-3% of GPI-anchored proteins at the schizont stage.
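The copy number estimate rests on the stated premise that each GPI-anchored protein carries one anchor, so labeled signal scales with the number of molecules. A minimal sketch of the resulting normalization follows; the function name and the input values in the test are hypothetical and are not measurements from this study.

```python
def relative_copy_number(counts: dict[str, float]) -> dict[str, float]:
    """Convert densitometric counts per protein into percent of total signal.

    Assumes one GPI anchor per protein, so signal is proportional to copy number.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    # Each protein's share of the summed signal, expressed as a percentage.
    return {name: 100.0 * value / total for name, value in counts.items()}
```

Where several proteins share one 1D band, the same normalization could be applied within the band using the 2D spot intensities, in line with the approach described under "Experimental Procedures".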
Predicting the GPI-anchored Proteome of P. falciparum-We tested the suitability of three web-based programs, Big-PI, GPI-SOM, and DGPI (39,40) (J. Kronegg and D. Buloz, personal communication), to predict the presence of GPI anchors on validated P. falciparum proteins. No existing GPI predictors identified all 11 sequences as being GPI-anchored; Big-PI correctly identified five validated proteins, DGPI identified four, and GPI-SOM identified seven (Table II, shaded). Two validated proteins (MSP-5 and ASP) were not predicted to be GPI-anchored by any of the algorithms surveyed (Table II). As these programs were trained with fungal, animal, and plant proteins it is not surprising that they are not optimal for the phylogenetically distant organism P. falciparum. For this reason a hidden Markov model, termed GPI-HMM, trained specifically with the 14 known or probable P. falciparum GPI-anchored protein sequences was developed (summarized in Fig. 5; details under "Experimental Procedures"). This training set included nine confirmed GPI-anchored proteins and five predicted GPI-anchored proteins (Table II). The five predicted proteins possess N- and C-terminal hydrophobic domains, and all are known surface proteins expressed at different life stages. Four of the five are homologues of confirmed GPI-anchored proteins (epidermal growth factor (EGF) or Cys₆ domain-containing surface proteins). However, to guard against the possibility that one or more of these proteins did not possess a GPI anchor a leave-one-out cross-validation procedure was performed by individually testing each protein with a training set made without it. Using this procedure, 13 of the 14 proteins in the training set had a positive score for a GPI anchor including all five proteins for which GPI anchoring was only predicted (Supplemental Table S3). The one exception, MSP-2 (PFB0300c), is a validated GPI-anchored protein so this protein remained in the final training set.
An HMM-derived score was generated for each predicted P. falciparum protein, and the top scoring 100 proteins are shown in Supplemental Table S4. The top 60 proteins all had GPI-HMM scores above zero. A high rate of false positive hits was anticipated because P. falciparum contains many proteins with ER signal sequences together with elements that target the protein to the relic chloroplast, the apicoplast, or to the erythrocytic cytosol. Proteins targeted to the erythrocyte cytosol are particularly confounding for GPI predictors because many of these Plasmodium export element motif-containing proteins have C-terminal transmembrane domains (TMDs) (42)(43)(44). There is no evidence that GPI-anchored proteins are transported to the apicoplast or the erythrocyte cytosol. For this reason we have indicated which putative GPI-anchored proteins may not be GPI-anchored but instead are targeted to the apicoplast and erythrocyte cytosol (Supplemental Table S4). Also regarded as probable false positives were several high scoring small proteins (<120 aa) with N- and C-terminal hydrophobic regions because apart from MSP-1₁₉ (which is the proteolytic product of a much larger protein) spots corresponding to small proteins (∼100 aa) were not observed in our blood-stage proteomic analysis. Proteins possessing both a GPI anchor and an internal TMD are exceedingly rare, therefore all proteins with a total of three or more TMDs were also rejected as likely false positives (Supplemental Table S4). Some proteins with high GPI-HMM scores contained two or more charged residues in the last 15 aa that would tend to disrupt the hydrophobic TMD of typical GPI anchor signals. We consider it unlikely that these proteins are anchored, and the reason they received high scores was that their ER signal sequences and processing sites for addition of GPI anchors were very similar to the training set, although the potential of the C-terminal amino acids to form a TMD was low (Supplemental Table S4).
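Three of the exclusion criteria above are simple enough to express as sequence checks, sketched below. This is an illustration only, not the procedure the cull actually followed, which was manual: the function and reason strings are hypothetical, the TMD count is assumed to come from an external predictor, treating D, E, K, and R as the charged residues is an assumption, and the apicoplast and erythrocyte export criteria are not modeled.

```python
CHARGED_RESIDUES = frozenset("DEKR")  # assumption: histidine is not counted as charged


def cull_reasons(sequence: str, n_tmd: int, min_length: int = 120,
                 tmd_limit: int = 3, tail_length: int = 15,
                 charged_limit: int = 2) -> list[str]:
    """Return the reasons a candidate would be treated as a likely false positive.

    sequence: full-length protein sequence (one-letter amino acid codes).
    n_tmd:    total predicted transmembrane domains, supplied by the caller.
    An empty list means none of the three modeled criteria applies.
    """
    reasons = []
    if len(sequence) < min_length:
        reasons.append("small protein")
    if n_tmd >= tmd_limit:
        reasons.append("too many TMDs")
    # Count charged residues in the C-terminal tail of the sequence.
    tail = sequence[-tail_length:].upper()
    if sum(aa in CHARGED_RESIDUES for aa in tail) >= charged_limit:
        reasons.append("charged C-terminal tail")
    return reasons
```

Returning the reasons rather than a single boolean keeps each rejection traceable, which matters when the final decision is made by manual inspection.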
In all, 26 of 60 positive scoring proteins survived the manual cull and were considered most likely to be GPI-anchored (Table II). Another four proteins that were just below the cutoff were included in the final list because they retained low p values (0.01-0.02) (Table II and Supplemental Table S4). This list includes all 11 biochemically validated blood-stage proteins as well as CSP and RMP-1, two proteins for which considerable evidence of GPI anchoring is available (45,46). Note also that two biochemically validated GPI-anchored proteins (Pf92 and Pf34) were not included in the training set yet scored highly (Table II). Also present on the final GPI-HMM list but not included in the training set were several genes (PFF0620c, PFD0215c/Pfs36p, and PF10_0302/Pfs28) independently predicted to be GPI-anchored based on homology to known GPI-anchored surface proteins (47,48). Also none of the proteins on this list have been shown not to possess a GPI anchor.

FIG. 4. Stoichiometric ratios of schizont-stage GPI-anchored proteins. The mean percent contribution (±S.D. of total detected counts) of each GPI-anchored protein as measured by densitometric scanning of eight independent autoradiographs is indicated.

TABLE II
The GPI-anchored proteome of P. falciparum predicted using GPI-HMM
a Known or predicted GPI-anchored proteins used to train the GPI-HMM are indicated (+). A name is given in parentheses if the protein has been identified. A positive prediction for GPI anchoring using Big-PI, GPI-SOM, and DGPI programs is indicated (Y), whereas biochemically validated GPI-anchored proteins are shaded.

Expression Profile and Domain Structure of GPI-anchored Proteins-Transcription profiles for each of the genes encoding the 30 GPI-anchored proteins predicted by GPI-HMM were extracted from the Affymetrix (multiple life stages) and glass slide 70-mer microarray (blood stage only) databases (34,49). Transcription of 28 of these genes was detected in the Affymetrix array analysis, and a Venn diagram displaying the expression patterns in blood, gametocyte, and sporozoite stages was generated (Fig. 6). Note that RAMA was excluded from this analysis as probes corresponding to this gene were not present on either microarray chip, whereas PFF0620c transcripts were not detected in any life stage. Expression levels (according to the Affymetrix arrays) of strictly less than 10% of the total level of transcription for a particular gene were considered background and were not taken into consideration in assigning stage specificity. Seventeen of the 28 (61%) genes were assigned to just one of the blood, gametocyte, or sporozoite stages (Fig. 6). Eight genes demonstrated a degree of expression in two stages, whereas three genes (Pf38, Pf12, and msp-4) appear to be expressed at a reasonable level in all stages (Fig. 6). More detailed transcriptional profiles are displayed in Fig. 7.
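The 10% background rule used to assign stage specificity can be expressed as a short function. This is a minimal sketch under stated assumptions: the function name and the dictionary layout are hypothetical, and the expression values in the usage lines are invented for illustration, not taken from the microarray data.

```python
def expressed_stages(levels, background_fraction=0.10):
    """Return the stages in which a gene counts as expressed.

    levels maps a stage name to that gene's expression level. A stage
    contributing strictly less than background_fraction of the gene's
    total expression is treated as background and ignored.
    """
    total = sum(levels.values())
    if total <= 0:
        return set()  # no detectable transcription in any stage
    return {
        stage
        for stage, value in levels.items()
        if value / total >= background_fraction
    }


# Invented example values, for illustration only.
single = expressed_stages({"blood": 90, "gametocyte": 5, "sporozoite": 5})
broad = expressed_stages({"blood": 80, "gametocyte": 10, "sporozoite": 10})
```

With these toy inputs the first gene is assigned to the blood stage alone, whereas the second sits exactly on the 10% boundary in two stages and is therefore counted as expressed in all three.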
Including RAMA and PFF0620c, 22 of the 30 GPI-anchored proteins are expressed to some degree in blood stages. As expected, the majority of these blood-stage proteins, at least 16, are maximally expressed late in the cycle. This agrees well with the schizont-stage proteomic studies described above that detected a maximum of 20 different GPI-anchored proteins. We predict that at least some of the unidentified [³H]glucosamine-labeled proteins will correspond to the previously unassigned genes with a schizont-stage expression profile. Given their expression pattern in both transcriptional studies, PF11_0373 and PF14_0293 are likely candidates in this regard (Table II and Fig. 7). As mentioned above, however, most if not all of these are likely to be present only as low abundance GPI-anchored proteins. Two genes, PF14_0201 (encoding Pf113) (4) and PFE0125w, are more broadly expressed in the blood stage, and one, RMP-1, is synthesized in the ring stages (45). Both Pf113 and RMP-1 have been demonstrated recently as having characteristics of GPI-anchored proteins and are isolated in detergent-resistant membranes (4,45).
In addition to the known gamete and ookinete surface antigens, two novel proteins, PFE0220w and PFF0975c, are transcribed in gametocytes and are likely to be present on the surface of one or the other of these extracellular stages. Similarly in addition to the known sporozoite antigens, CSP and P36p, several novel GPI-anchored proteins are predicted to be present at this life stage, most notably PFC0750w and PF08_0008 (Fig. 7).
P. falciparum GPI-anchored proteins vary tremendously in size with their predicted proproteins ranging between 24 kDa (Pfs25/Pfs28) and 195 kDa (MSP-1) (Fig. 7). One feature shared by most predicted P. falciparum GPI-anchored proteins is the presence of cysteine-rich domains that probably form disulfide bonds and contribute to globular structures. In some cases these domains have homology to domains of other eukaryotic proteins. EGF-like domains are found in MSP-1, -4, -5, -8, -10, Pfs25, Pfs28, and chr10.phat_47, and a sushi domain is seen in ASP (37).

FIG. 5. Structure of a GPI-anchored protein showing N-terminal ER secretion signal and C-terminal hydrophobic domain. A, after translation the C-terminal domain is cleaved between the ω and ω + 1 amino acids, and the GPI anchor is attached to the ω amino acid. B, the topology of GPI-HMM highlighting the three main sections of the model. The middle region is a simple looping state with a background amino acid distribution. The first part of the GPI signal is a linear set of states each with its own emission distribution. The second part is a variable length region with each state having an identical emission distribution.

FIG. 6. Venn diagram of the numbers of GPI-anchored proteins expressed during asexual blood, gametocyte, and sporozoite stages. The expression profiles were obtained from Affymetrix microarray data (49). The percentage of proteins expressed at particular stage(s) is also shown.
One particularly interesting group of cysteine-rich proteins is the six-cysteine (Cys6) protein family (4,47). These proteins appear to be restricted to the phylum Apicomplexa and are present as paired domains in the GPI-anchored proteins Pf12, Pf38, Pf47, Pfs48/45, Pf52 (also known as Pf36p), and PFF0620c. In this study we assigned GPI anchors to Pf38 and Pf12, two proteins recently identified in schizont-stage detergent-resistant membranes (4). A previous study suggested that mRNA of the rodent malaria orthologue of Pf38 is present in gametocytes (48), but we showed here and in a prior study (4) that Pf38 is also expressed in asexual blood stages.

FIG. 7. The predicted GPI-anchored proteome of P. falciparum. A, gene model (PlasmoDB No.) and protein names if known for the top 30 GPI-anchored proteins predicted by GPI-HMM. Circled numbers at left represent how the proteins were ordered in this diagram: 1, schizont-stage GPI-anchored proteins whose abundance has been quantified by proteomic analyses arranged from most to least abundant; 2, genes expressed during blood as well as other stages; 3, genes expressed only in gametocytes; 4, genes expressed only in sporozoites. B, glass slide microarray transcription data for genes encoding putative GPI-anchored proteins over the 48-h asexual blood-stage cell cycle (34). Data were not available for all genes but where shown yellow indicates expression and blue indicates no expression. C, Affymetrix microarray data for GPI-anchored proteins from asexual blood stages, gametocytes, and sporozoites (49). The size of the different colored bars is proportional to the amount of expression at that particular stage. ER, early rings; LR, late rings; ET, early trophozoites; LT, late trophozoites; ES, early schizonts; LS, late schizonts; M, merozoites; Ga, gametocytes; Sp, sporozoites. D, diagrammatic representation of protein structure of GPI-anchored proteins. N/A, data not available. 6-Cys, six-cysteine.
Evidence for Pf12 expression was first detected in a COS cell expression library of P. falciparum genomic DNAs screened with malaria immune African serum (50). We confirmed here that Pf12 is translated in blood stages, and its protein is present at levels similar to those of Pf38; each protein comprises ~5% of total GPI-anchored proteins in schizonts. These Cys6 proteins appear to be related to the surface antigen protein family of Toxoplasma gondii, whose members are mostly GPI-anchored and are thought to bind the sulfated proteoglycans on host cells (51,52). Pf12 and Pf38 are natural antigens and are localized to the plasma membrane of mature merozoites, although Pf38 also localizes to apical organelles (4). As Pf12 and Pf38 appear also to be expressed in gametocytes and pre-erythrocytic stages, there is a possibility that vaccines derived from these proteins could act at multiple points of the life cycle of the parasite (4).
Many other predicted GPI-anchored proteins possess cysteine-rich regions that remain to be characterized (Fig. 7D, green boxes). One such protein is Pf92, which was also discovered in a proteomic analysis of detergent-resistant membrane preparations and is localized to the merozoite surface (4). The mature form of Pf92 contains 14 cysteine residues that could form up to seven disulfide bridges. We could not detect any other proteins that were homologous to Pf92 in the P. falciparum genome, but in rodent and primate Plasmodium species orthologues are clearly present (data not shown).
Another feature shared by many GPI-anchored proteins is the presence of runs of short repetitive amino acid motifs that are generally hydrophilic and probably lie on the protein surface (Fig. 7D). It has been noted previously that the length and copy number of these motifs are greater in asexual antigen proteins compared with other Plasmodium proteins (53). It has been proposed that these repetitive regions serve as a "smoke-screen" to elicit a strong but ineffective immune response from the host and have expanded recently as a result of selective pressure (54). The finding that many of the GPI-anchored proteins expressed in asexual blood-stage merozoites, and therefore exposed to the immune system, contain repetitive arrays is in agreement with this proposal.
In summary, in this study we used two approaches to predict the entire GPI-anchored proteome of P. falciparum. First, as the blood stages are amenable to continuous in vitro culture we used a proteomic approach to identify [³H]glucosamine-labeled proteins at this life stage, all of which are assumed to possess GPI anchors. Consistent with a previous study (55), we showed that GPI-anchored proteins are mostly synthesized late in the blood-stage cycle and hence are expected to be present on the surface or in the secretory organelles of the invasive merozoite form of the parasite. Thirty [³H]glucosamine-labeled species were observed by 2D gel electrophoresis, and 21 of these were assigned to 11 GPI-anchored proteins. Hence a maximum of nine GPI-anchored proteins remain to be identified in schizonts. Even when considered together these unidentified proteins only represent a very small proportion (~6%) of the total GPI-anchored protein content at this life stage. In this regard, this study represents the first formal examination of the stoichiometry of P. falciparum merozoite surface proteins and confirms a widely held suspicion: just two proteins, MSP-1 and MSP-2, comprise a large proportion (approximately two-thirds) of the membrane protein surface coat. Despite this, a number of other GPI-anchored proteins, including the newly identified proteins Pf92, Pf38, Pf34, and Pf12, are present at relatively high copy number.
Second, in parallel with the proteomic approach we used predictive software to rank all proteins in the P. falciparum genome according to the likelihood that they are GPI-anchored. We observed that pre-existing GPI prediction programs performed unsatisfactorily with P. falciparum genome data and failed to predict many proteins known to bear GPI anchors. For this reason we created a new algorithm to predict GPI-anchored P. falciparum proteins. A final list of 30 GPI-anchored proteins was derived, a list that includes a number of novel proteins predicted to be present on the surface of one or more of the extracellular life stages.
An important aspect of future studies is to establish the functions of these proteins. Thus far, significant progress has only been made in understanding the function(s) of GPI-anchored proteins expressed exclusively in sexual and pre-erythrocytic stages because their genes can be disrupted or modified by insertional mutagenesis in blood-stage cultures. Gene knock-outs have revealed that Pfs48/45 is required for fusion of the microgametocyte to the macrogametocyte possibly by acting as an adherence molecule (56). Knock-outs of the ookinete-stage Pbs21 and Pbs25 implicate these proteins, particularly the former, in protecting the parasite from host defenses because oocyst maturation is curtailed in Pbs21 knock-outs (57). CSP is the dominant surface coat protein on sporozoites and is essential for sporozoite formation in the oocyst (58), and importantly in the context of the present study, recent mutational analysis of the putative CSP GPI attachment sequence demonstrated the functional importance of this mode of membrane attachment (46). Another sporozoite protein, P36p, is a GPI-anchored Cys6 domain-containing protein that appears to be involved in the recognition and/or invasion of hepatocytes (59,60).
Insertional mutagenesis of GPI-anchored proteins expressed during late blood stages is technically difficult, although gene modifications have provided some functional insight into the double EGF domains of MSP-1 (61,62). Of attempts to knock out genes encoding seven schizont-stage P. falciparum GPI-anchored proteins, only one (that of MSP-5) has proven successful.³ Most GPI-anchored proteins are therefore probably necessary for blood-stage growth, and other methodologies will be required to resolve their functions. Newly developed techniques in P. falciparum such as inducible gene expression may prove useful in this regard (63).