Proteomics Analysis of the Excretory/Secretory Component of the Blood-feeding Stage of the Hookworm, Ancylostoma caninum

Hookworms are blood-feeding intestinal parasites of mammalian hosts, and hookworm infection is one of the major human ailments, affecting ~600 million people worldwide. These parasites form an intimate association with the host and are able to avoid vigorous immune responses in many ways, including skewing of the response phenotype to promote parasite survival and longevity. The primary interface between the parasite and the host is the excretory/secretory component, a complex mixture of proteins, carbohydrates, and lipids secreted from the surface or oral openings of the parasite. The composition of this complex mixture is for the most part unknown but is likely to contain proteins important for the parasitic lifestyle and hence suitable as drug or vaccine targets. Using a strategy combining the traditional technology of one-dimensional SDS-PAGE and the newer fractionation technology of OFFGEL electrophoresis, we identified 105 proteins from the excretory/secretory products of the blood-feeding stage of the dog hookworm, Ancylostoma caninum. Highly represented among the identified proteins were lectins, including three C-type lectins and three β-galactoside-specific S-type galectins, as well as a number of proteases belonging to the three major classes found in nematodes: aspartic, cysteine, and metalloproteases. Interestingly, 28% of the identified proteins were homologous to activation-associated secreted proteins, a family of cysteine-rich secreted proteins belonging to the sterol carrier protein/Tpx-1/Ag5/PR-1/Sc-7 (SCP/TAPS) superfamily. Thirty-four of these proteins were identified, suggesting an important role in host-parasite interactions. Other protein families identified included hyaluronidases, lysozyme-like

Hookworms are blood-feeding intestinal nematode parasites. Adult worms lodge in the small intestine, where they cause significant blood loss resulting in iron deficiency anemia (1). Hookworm infection is one of the major human ailments, affecting ~600 million people worldwide (2), particularly in poor rural areas of the tropics and subtropics (3,4). Currently, benzimidazole anthelmintics provide periodic removal of adult worms from patients, but rapid reinfection (5) and growing drug resistance (4) necessitate novel approaches for controlling hookworm infection.
The primary interface between the hookworm and its host is the excretory/secretory (ES)¹ component, a mixture of proteins and other compounds secreted from the oral openings or outer surfaces of the worms. In helminth parasites, the ES proteins orchestrate a wide range of activities crucial for survival and propagation: ES products are essential for host penetration and tissue invasion (6), feeding (7), reproduction (8), and evasion of the host immune system (9). The importance of the ES component for so many aspects of the helminth life cycle makes an understanding of its contents essential for the identification of novel vaccine (5) and drug targets.
ES products allow helminth parasites to escape a vigorous immunological response, in part by skewing the response phenotype to promote parasite survival and longevity (10). Mechanisms include induction of regulatory cytokines (11), binding to and inactivation of chemokines (12), and alteration of the response of ES-primed dendritic cells to bacterial infection (13). Recently, great interest has been shown in the possibility of harnessing the immunomodulatory effects of helminths and their ES products for the treatment of allergies or other inflammatory diseases (10). In mouse models, helminth infection has been shown to protect against allergen-induced airway hyper-responsiveness (14) and experimental colitis (15), suggesting the viability of such an approach in humans. Indeed, the human hookworm, Necator americanus, is under investigation as a therapy for inflammatory bowel disease (16).
Advances in mass spectrometry and associated technologies now make it possible to determine rapidly and sensitively the protein content of biological fluids such as hookworm ES products. The traditional approach to characterizing complex protein mixtures relies on one-dimensional (1D) or two-dimensional gel electrophoresis in concert with in-gel digestion of bands or spots and analysis by mass spectrometry. Although useful, such studies are labor-intensive and relatively insensitive because of the extensive processing of gel fragments. The relatively new technique of OFFGEL electrophoresis (OGE) (17-19) focuses peptides or proteins according to their isoelectric point. Focusing in this way allows relatively large scale fractionation and reduces the processing required before mass spectrometry. With this approach, proteins in complex mixtures can be identified rapidly, either by a shotgun approach in which the mixture is digested before focusing or on a protein-by-protein basis by focusing the mixture prior to tryptic digestion.
To date, an inventory of ES proteins has been generated only for the sheep liver fluke Fasciola hepatica (20), the L1 larvae of Trichinella spiralis (21), the ruminant gastrointestinal parasite Haemonchus contortus (22), the filarial parasite Brugia malayi (23), adult and larval forms of Teladorsagia circumcincta (24), and the blood fluke Schistosoma mansoni (25,26). Previous efforts to identify hookworm ES proteins focused on individual proteins (or small families), often using a gene-first approach followed by production and characterization of recombinant ES proteins (e.g. Ref. 27). In this study we provide a survey of ES proteins from the adult stage of the blood-feeding canine hookworm, Ancylostoma caninum, using a traditional proteomics approach of 1D electrophoresis followed by in-gel digestion and MS analysis, and we directly compare and contrast the findings with those obtained using the newer technique of OGE. The coevolution of hookworms with their mammalian hosts is so intimate that these parasites live for many years but elicit minimal pathology when infection intensities are low. Hookworms are therefore highly "successful parasites," and much of this success can be attributed to the proteins they secrete into their environment. Determining the protein content of the ES component will therefore provide valid targets for the development of novel therapeutics to control both hookworm infection (5) and autoimmune disorders (9).

EXPERIMENTAL PROCEDURES
The approach illustrated in Fig. 1 was followed to maximize protein identification from MS/MS data after fractionation of A. caninum ES proteins by three different methods: 1) 1D SDS-PAGE followed by in-gel tryptic digestion, 2) fractionation of whole A. caninum ES proteins by protein OGE followed by tryptic digestion, and 3) tryptic digestion of whole A. caninum ES products followed by peptide OGE. As only 130 A. caninum proteins were present in the nr database at NCBI as of November 2007, a database was constructed containing all A. caninum DNA sequences present in GenBank™ (138,151 sequences), and this was used for all Mascot queries.
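Because the custom database holds nucleotide rather than protein sequences, each entry has to be translated before tryptic peptides can be matched. Mascot does this internally by translating nucleotide entries in all six reading frames, which is consistent with the effective database size reported for the searches (138,151 × 6 = 828,906). The sketch below illustrates six-frame translation with the standard genetic code; the function names are hypothetical and this is not Mascot's implementation.

```python
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate codon by codon; '*' marks stop codons, 'X' ambiguous codons."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> list[str]:
    """Three forward-strand frames followed by three reverse-complement frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]


def effective_database_size(n_nucleotide_entries: int) -> int:
    # Each nucleotide entry contributes six translated entries.
    return 6 * n_nucleotide_entries


print(six_frame_translation("ATGGCC"))
print(effective_database_size(138151))
```

In practice each frame would also be split at stop codons before in silico tryptic digestion; that step is omitted here for brevity.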
Preparation of ES Proteins-Adult A. caninum were recovered from the small intestines of euthanized stray dogs at The University of Queensland Veterinary School. Worms were washed in PBS three times and then cultured for 3 h at 37°C with 5% CO2 in RPMI 1640 medium, 100 units/ml penicillin G sodium, 100 µg/ml streptomycin sulfate, and 0.25 µg/ml amphotericin B. The culture medium was removed after 3 h, discarded, and replaced with fresh culture medium. Worms were cultured for a further 12 h before the culture medium was removed and retained as the source of ES products. ES products were concentrated using a microconcentrator with a 10-kDa cutoff membrane and buffer-exchanged into PBS. Worms were visually observed regularly to ensure that all were motile during in vitro culture. Immotile worms were removed as soon as they were detected. The protein concentration of the ES products was estimated using absorbance at 280 nm.
Electrophoresis and In-gel Digestion-Two 10-µl aliquots of a 9 mg/ml solution of A. caninum ES products were incubated for 2 h at 37°C with an equal volume of Laemmli sample buffer. The samples were then applied to a 0.75-mm-thick 4% stacking, 12.5% resolving gel (prepared using the Bio-Rad PROTEAN 3 system with overnight curing) for SDS-PAGE according to Laemmli (28). Electrophoresis was carried out using a maximum of 40 mA/gel and a maximum of 300 V. Proteins were stained using Coomassie Brilliant Blue and destained in 50:8.75:41.25 methanol/acetic acid/water (v/v/v).
Lanes containing ES proteins were sliced into small fragments that were destained twice by incubation in 50% acetonitrile, 25 mM NH4HCO3 for 15 min at 37°C and then dried using a vacuum centrifuge. After destaining, the gel slices were resuspended in 20 mM DTT and reduced for 1 h at 65°C. The 20 mM DTT was then removed, and the samples were alkylated by adding 1 M iodoacetamide to a final concentration of 50 mM and incubating at 22°C in darkness for 40 min. Gel slices were washed three times for 45 min in 25 mM NH4HCO3 and then dried in a vacuum centrifuge. The dried gel slices were subsequently swollen for 1 h at 22°C with 20 µl of 40 mM NH4HCO3, 10% acetonitrile containing 20 µg/ml trypsin (Sigma). An additional 50 µl of the same solution was added, and the samples were incubated overnight at 37°C. The digest supernatant was removed from the gel slices, and residual peptides were extracted from the gel slices by incubating three times with 0.1% TFA for 45 min at 37°C. The original supernatant and extracts were combined and reduced to ~10 µl in a vacuum centrifuge before mass spectral analysis.
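Several steps in the sample preparation bring a reagent to a target final concentration by adding a concentrated stock, for example 1 M iodoacetamide to 50 mM. Conserving the amount of reagent gives the volume to add, V_add = V_0 × C_final / (C_stock − C_final), where V_0 is the starting volume. The sketch below assumes the starting solution contains none of the reagent; the helper name is hypothetical and the 100-µl starting volume is an invented example, since the protocol does not state the volumes used.

```python
def stock_volume_to_add(start_volume: float, stock_conc: float, final_conc: float) -> float:
    """Volume of stock that brings start_volume to final_conc.

    Solves stock_conc * v = final_conc * (start_volume + v) for v.
    Concentrations may be in any unit (mM, % w/v) as long as both match;
    the returned volume is in the same unit as start_volume.
    """
    if not 0 <= final_conc < stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return start_volume * final_conc / (stock_conc - final_conc)


# Hypothetical example: 1 M (1000 mM) iodoacetamide added to 100 µl of sample
# to reach 50 mM, as in the alkylation step.
v = stock_volume_to_add(100.0, 1000.0, 50.0)
print(f"add {v:.2f} µl of 1 M iodoacetamide")
```

The same relation covers the 10% SDS stock diluted to 2% (w/v) in the OFFGEL protocol, where a quarter of the starting volume must be added.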
OFFGEL Electrophoresis-For pI-based peptide separation, 240 µg of ES protein was reduced by adding DTT to 20 mM and 10% SDS to 2% (w/v) and incubating at 65°C for 1 h. Alkylation was then achieved by adding iodoacetamide to 50 mM and incubating for 40 min in darkness at 22°C. The protein sample was co-precipitated with 1 µl of a 1 µg/µl solution of trypsin by adding 9 volumes of methanol at −20°C. After incubation overnight at −20°C, the sample was resuspended in 25 mM NH4HCO3 and incubated at 37°C for 5 h, with an additional 1 µg of trypsin added after 3 h. For protein-based OFFGEL electrophoresis, an 80-µl aliquot of a 3.0 µg/µl solution of ES protein was buffer-exchanged into 25 mM NH4HCO3. The 3100 OFFGEL Fractionator and OFFGEL kit pH 3-10 (Agilent Technologies) with a 24-well setup were prepared according to the manufacturer's protocols. The ES protein digests and undigested protein samples were diluted in the peptide- and protein-focusing buffers, respectively, to a final volume of 3.6 ml, and 150 µl was loaded into each well. The samples were focused with a maximum current of 50 µA until 50 kV-h was reached. Peptide fractions were harvested and desalted using Millipore C18 ZipTips, and the resulting eluate was dried in a vacuum centrifuge, resuspended in 25 mM NH4HCO3, and dried again before mass spectral analysis. Protein fractions were harvested, and iodoacetamide was added to a final concentration of 50 mM. The samples were incubated for 40 min in darkness at 22°C and then desalted using Millipore C4 ZipTips before lyophilization and resuspension in 25 mM NH4HCO3. The protein samples were digested by incubation with 1 µg of trypsin for 5 h before lyophilization and mass spectral analysis.
FIG. 1.
… (1), fractionated in a protein OGE prior to alkylation and tryptic digest (2), or reduced and alkylated using DTT and iodoacetamide before tryptic digestion and peptide OGE (3). All samples were analyzed by LC-MS/MS using a Bruker micrOTOF-Q. MS/MS data sets from the protein OGE and gel digest fractions were submitted to Mascot searches individually, whereas data sets from each well of the peptide OGE were combined into a single file before submission to Mascot.

HPLC/MS Analysis-Five-microliter volumes of tryptic digests were used for all LC-MS analyses. LC-MS/MS analysis was performed using an Ultimate 3000 nano-LC system (Dionex) with a CAP-LC flow splitter coupled to a micrOTOF-Q (Bruker) instrument operated with a low flow electrospray needle. The column was a Vydac monomeric C18 5-Å 150-µm × 150-mm column with a flow rate of 1.2 µl/min. The mobile phase buffers used for the gradient program were water with 0.1% (v/v) formic acid (A) and acetonitrile/water (4:1) with 0.1% (v/v) formic acid (B). The gradient program consisted of 5% B for 5 min, linear ramping to 55% B over 60 min, linear ramping to 90% B over 1 min, holding at 90% B for 9 min, ramping back to 5% B over 1 min, and holding at 5% B for 20 min. The mass spectrometer scanned between 50 and 3000 m/z, and data were acquired for 50 min for each LC-MS/MS run. Data acquisition was controlled with Hystar (Bruker), and data were processed with Data Analysis (Bruker) using the default settings. The mass spectrometer used an autoMSn methodology that collected MS2 spectra for the two most intense ions in each full scan spectrum. The scan time was set to 0.5 s for the survey scan, and the MS2 spectra recorded were the result of two microscans, giving an overall duty cycle of 2.5 s. Dynamic exclusion was used such that after two MS2 spectra the precursor was added to an exclusion list for 1 min, so that MS2 spectra were collected from as many different precursors as possible during the analysis. Calibration was performed immediately prior to the analysis using a 1:100 dilution of electrospray tuning mixture for an LC/mass selective detector ion trap (Agilent G2431A). Peak lists generated using the default settings in Data Analysis (Bruker) were imported into a PostgreSQL database and formatted for Mascot searching using software specifically written for the project.² Mascot results were also stored in this database for analysis and removal of redundant proteins. Searches were performed using version 2.2.02 of Mascot with a 20-ppm tolerance on the precursor and 0.5-amu tolerance on the product ions, allowing for methionine oxidation as a variable modification, carbamidomethylation of cysteine as a fixed modification, two missed cleavages, charge states 1+, 2+, and 3+, and trypsin as the enzyme; identifications were evaluated using MudPIT scoring. A threshold of 5% probability (p < 0.05) of a false positive was used for all Mascot searches, and a decoy database was used in the peptide OGE searches to estimate the false positive rate. Searches were conducted against a custom-built database consisting of 138,151 A. caninum DNA sequences (effective database size, 828,906 sequences) deposited in the NCBI databases as of November 19, 2007. Single peptide identifications were verified manually, and acceptable spectra were manually annotated.
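The search tolerances above are of two kinds: the precursor tolerance is relative (20 ppm), so its absolute width grows with mass, whereas the product-ion tolerance is absolute (0.5 amu). A minimal sketch of how such windows are applied when comparing an observed mass with a theoretical one; the function names are illustrative and do not reproduce Mascot's internals.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_matches(observed: float, theoretical: float, tol_ppm: float = 20.0) -> bool:
    # Relative window, as for the 20-ppm precursor tolerance
    # (at 1500 Da this corresponds to about +/-0.03 Da).
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def fragment_matches(observed_mz: float, theoretical_mz: float, tol_da: float = 0.5) -> bool:
    # Absolute window, as for the 0.5-amu product-ion tolerance.
    return abs(observed_mz - theoretical_mz) <= tol_da


print(precursor_matches(1500.02, 1500.0))   # True (about 13 ppm)
print(precursor_matches(1500.05, 1500.0))   # False (about 33 ppm)
```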
Bioinformatics Analysis-Protein descriptions were assigned to EST Mascot hits using BLASTX against the NCBI nr protein database (bit score > 30) when the reading frame of the Mascot hit was the same as that of the BLAST hit. Lutefisk v1.0.5 (29) was used to derive de novo peptide sequences from high quality unassigned spectra using the default Q-TOF parameters. Spectra producing de novo sequences with a Pr(C) score greater than 0.60 were deemed to be high quality and used in Web-based MS-BLAST searches (30). Gene ontology (GO) categories were assigned using a local copy of Interproscan (version 17.0) (31), and sequence alignments were generated using ClustalW (32). The unrooted phylogenetic tree was inferred with the PHYLIP package (33), using the protein sequence parsimony method to infer maximum parsimony trees on 100 bootstrap replicates and CONSENSE to infer a consensus tree. Secondary structure was predicted with the PSIPRED Web server (34) using the default parameters. Putative glycosylation sites were predicted using the online version of Prosite (35).
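The two acceptance rules in this paragraph can be written as simple filters: a BLASTX description is accepted only if its bit score exceeds 30 and its reading frame agrees with that of the Mascot hit, and a de novo sequence is passed to MS-BLAST only if its Pr(C) score exceeds 0.60. The record types and field names below are hypothetical, chosen only to make the criteria explicit.

```python
from dataclasses import dataclass


@dataclass
class BlastxHit:          # hypothetical record for one BLASTX result
    description: str
    bit_score: float
    frame: int            # reading frame of the hit, e.g. +1..+3 or -1..-3


@dataclass
class DeNovoSequence:     # hypothetical record for one Lutefisk result
    peptide: str
    pr_c: float


def accept_annotation(hit: BlastxHit, mascot_frame: int, min_bits: float = 30.0) -> bool:
    """Accept a BLASTX description only if it is strong enough and in the same frame."""
    return hit.bit_score > min_bits and hit.frame == mascot_frame


def high_quality(seqs: list[DeNovoSequence], min_pr_c: float = 0.60) -> list[str]:
    """Keep de novo peptides whose Pr(C) exceeds the cutoff, for MS-BLAST submission."""
    return [s.peptide for s in seqs if s.pr_c > min_pr_c]
```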

SDS-PAGE and In-gel Digestion of A. caninum ES Products-SDS-PAGE analysis of 90 µg of A. caninum ES products revealed a complex mixture of proteins ranging from 250 kDa to less than 5 kDa. Staining intensity showed that the majority of the proteins in the ES products were between 5 and 15 kDa (Fig. 2), although groupings of proteins were also apparent at 25-37 kDa and 75-100 kDa. Twenty-six slices were excised from each lane for in-gel digestion, and the resulting MS/MS data were used in Mascot searches. Significant (p < 0.05) protein identifications were made in 18 of the 26 gel slices using MudPIT scoring. Single peptide identifications were verified manually from their spectra (supplemental Table 1). Identifications with shared peptides were retained if each contained at least one unique peptide above the significance threshold (supplemental Table 2), and for grouped proteins the highest scoring identification was retained. The full list of identifications is provided in supplemental Table 3. In total, the non-redundant set of protein identifications numbered 29 from 12 gel slices (Fig. 2).
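One way to apply the retention rule for shared peptides, assumed here because the text does not give the exact procedure, is a greedy pass: identifications are processed in descending score order, and each is kept only if it contributes at least one significant peptide not already claimed by a higher-scoring identification. Lower-scoring identifications whose peptides are all explained are thereby collapsed into the top-scoring member of their group. The function name, input format, and example identifiers are hypothetical.

```python
def nonredundant(identifications):
    """Greedy reduction of protein identifications that share peptides.

    identifications: iterable of (protein_id, score, set_of_significant_peptides).
    Returns the retained protein IDs, highest score first.
    """
    kept, claimed = [], set()
    for protein, score, peptides in sorted(identifications, key=lambda x: x[1], reverse=True):
        if set(peptides) - claimed:      # contributes a peptide not yet explained
            kept.append(protein)
            claimed |= set(peptides)
    return kept


example = [
    ("A", 120, {"p1", "p2"}),
    ("B", 90, {"p1", "p2"}),   # same peptides as A: grouped, A retained
    ("C", 80, {"p2", "p3"}),   # shares p2 but has unique p3: retained
    ("D", 50, {"p3"}),         # all peptides already explained: dropped
]
print(nonredundant(example))   # ['A', 'C']
```

Because the pass is order-dependent, ties in score would need an explicit tie-breaking rule in a real pipeline.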
FIG. 2.
A. caninum ES products on a 4% stacking, 12% resolving gel after SDS-PAGE and staining with Coomassie Brilliant Blue are shown. Lane 1 contains 10 µl of Bio-Rad Kaleidoscope Prestained Standards, and the remaining lanes each contain ~90 µg of ES products. Molecular weights of the standards are shown on the left side of the gel, and the numbering of the 26 excised gel slices is shown on the right. Each non-redundant protein identification is shown in the gel slice in which its highest Mowse score was recorded, together with the Mowse score, the unique peptides identified (Unique Peps.), the theoretical molecular weight determined from the database sequence (TMW), and the difference between the theoretical molecular weight and the apparent molecular weight as determined by the mobility of the protein on the SDS-PAGE gel (MW delta). The presence of a signal sequence is indicated (Sig): Y denotes a signal sequence in the database sequence as determined using SignalP, F denotes that the top BLAST hit contained a signal sequence according to SignalP, and N denotes the absence of a signal sequence in both the database sequence and the top BLAST hit. The number of predicted N-linked glycosylation sites found in the protein sequence using Prosite is shown (Glyc. Sites), along with the description of the identified protein as determined by searching the NCBI nr database using BLASTX. ND, no data.

² J. Mulvenna, unpublished data.

The apparent molecular weights of the identified proteins were calculated using a standard curve derived from the migration of the protein standards through the gel. These values were then compared with the theoretical molecular weights calculated from the BLAST-assigned protein sequences. Nine proteins had an apparent molecular mass at least 25 kDa greater than the theoretical molecular mass, with differences ranging from 28 to 128 kDa. No protein migrated at a markedly lower molecular weight than predicted by its theoretical molecular weight. The presence of signal sequences in the identified proteins was determined using SignalP. If the sequence contained a complete open reading frame or just the correct N-terminal reading frame, it was used for signal peptide analysis; when only 5′-truncated nucleotide sequences were available, the presence of a signal peptide was inferred if homologous sequences identified by BLAST searches contained a signal sequence. When analyzed in this manner, seven (23%) of the identified proteins contained signal sequences, and 17 (57%) were homologous to full-length sequences containing signal peptides, resulting in a total of 80% of the identified proteins being classified as secreted.
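The apparent molecular weights above were read from a standard curve. A common way to build such a curve, assumed here because the text does not specify the model, is a least-squares line of log10(molecular weight) against the relative mobility (Rf) of the standards. The standard masses and Rf values in the example are invented for illustration and are not measurements from this study.

```python
import math


def fit_standard_curve(rf: list[float], mw_kda: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(MW) = slope * Rf + intercept."""
    n = len(rf)
    ys = [math.log10(m) for m in mw_kda]
    mean_x = sum(rf) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in rf)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rf, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mw(rf: float, slope: float, intercept: float) -> float:
    """Apparent molecular weight (kDa) of a band from its relative mobility."""
    return 10 ** (slope * rf + intercept)


# Hypothetical standards and mobilities (not data from this study).
standards_kda = [100.0, 50.0, 25.0, 10.0]
standards_rf = [0.2, 0.4, 0.6, 0.85]
slope, intercept = fit_standard_curve(standards_rf, standards_kda)
band_mw = apparent_mw(0.5, slope, intercept)
theoretical_mw = 20.0                    # hypothetical BLAST-derived value
mw_delta = band_mw - theoretical_mw      # analogous to the "MW delta" column of Fig. 2
print(f"apparent {band_mw:.1f} kDa, delta {mw_delta:+.1f} kDa")
```

The log-linear relation holds only over the resolving range of the gel, so bands near the top or bottom of the lane are less reliable.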
Protein OGE Analysis of A. caninum ES Products-ES products were then fractionated over a pH range of 3-10 using protein OGE. Each of the resulting 24 fractions was desalted with C4 ZipTips, digested with trypsin, and analyzed by LC-MS/MS. Using the same search and validation method as above, a total of 85 significant (p < 0.05) protein identifications were made in 18 OGE wells, resulting in 28 unique identifications (Table I; supplemental Tables 4-6). The protein OGE resolved 21 (75%) of the proteins into ≤2 adjacent wells, although seven proteins were identified in more than two wells. Notably, the highest scoring identification, putative tissue inhibitor of metalloprotease (gi|22347361), was identified in nine of the 24 wells. Signal sequences were found in eight of the identified sequences, and a further eight were homologous to a protein containing a signal sequence, so that in total 57% of the identified proteins had putative signal sequences. Eight of the identified proteins were derived from full-length nucleotide sequences; their theoretical pI values were calculated and compared with the apparent pI of the well in which the highest scoring identification was made. In four cases, the discrepancy between the theoretical and apparent pI values was greater than the pI range spanned by a single well (~0.3 pI units) (Table I).
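A theoretical pI can be approximated by finding the pH at which the Henderson-Hasselbalch charges of all ionizable groups sum to zero, and an apparent pI can be read from the pH window of the well. In the sketch below the pKa values are one illustrative set (an assumption; published sets differ, so computed pI values vary slightly between tools), and the well-to-pH mapping assumes the pH 3-10 gradient is linear and divided evenly across 24 wells, giving about 0.29 pH units per well. All function names are hypothetical.

```python
# Illustrative pKa values (assumption): published sets differ, and so do the
# pI values that tools report for the same sequence.
PKA = {"N_term": 8.6, "C_term": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
BASIC = ("K", "R", "H")
ACIDIC = ("D", "E", "C", "Y")


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a polypeptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA["N_term"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA["C_term"] - ph))
    for aa in seq.upper():
        if aa in BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA[aa]))
        elif aa in ACIDIC:
            charge -= 1.0 / (1.0 + 10 ** (PKA[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH: the net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def well_ph_range(well: int, n_wells: int = 24, ph_min: float = 3.0, ph_max: float = 10.0):
    """pH window of a 1-based well, assuming a linear gradient split evenly."""
    width = (ph_max - ph_min) / n_wells
    start = ph_min + (well - 1) * width
    return start, start + width
```

The pI difference reported in Table I would then be the theoretical pI minus a representative pH of the well with the highest Mowse score.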
Identification of N-Linked Glycosylation Sites-To account for discrepancies between the apparent and theoretical molecular weight and pI in the gel digests and protein OGE, all sequences were scanned for putative N-linked glycosylation sites using Prosite. Glycosylation is a relatively common post-translational modification of parasitic worm proteins (36), and when present, it may affect both the molecular weight and pI of a protein. Potential N-glycosylation sites are defined by the consensus sequence Asn-Xaa-(Ser/Thr), and the coincidence of this motif with an apparent molecular weight or pI at odds with the theoretical value suggests glycosylation. In the protein OGE, 13 proteins had putative N-linked glycosylation sites (Table I), and five of these were full-length hits with a presumably accurate theoretical pI. All but two of the full-length proteins with a predicted glycosylation site showed a difference of more than 0.3 pI units between the theoretical and apparent pI. In the remaining two cases, the identification was made in more than one well, indicating that a discrepancy would exist in at least one of the wells. In the gel digests, 12 proteins contained predicted N-glycosylation sites (Fig. 2), and six of these showed hindered mobility on the SDS-PAGE gel, corresponding to an apparent molecular mass at least 8 kDa greater than the theoretical molecular mass.
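The motif scan can be sketched with a regular expression. The text gives the consensus as Asn-Xaa-(Ser/Thr); the Prosite N-glycosylation pattern (PS00001) is N-{P}-[ST]-{P}, which additionally excludes proline at Xaa and at the position after Ser/Thr, and that stricter form is used here. A zero-width lookahead makes overlapping sites visible. A match marks only a potential site, not evidence that the site is glycosylated; the function name is hypothetical.

```python
import re

# Prosite PS00001 pattern N-{P}-[ST]-{P}, written as a lookahead so that
# overlapping sequons (e.g. "NNSTA") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST][^P])")


def n_glyc_sites(protein: str) -> list[int]:
    """1-based positions of Asn residues in potential N-glycosylation sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(n_glyc_sites("MNGSANPSTNKT"))   # [2]: NPST has Pro at Xaa; NKT has no following residue
print(n_glyc_sites("NNSTA"))          # [1, 2]: overlapping sequons
```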
Shotgun Analysis of A. caninum ES Products Using Peptide OGE-A final shotgun analysis of the hookworm ES proteome was then conducted by digesting a 240-µg sample of hookworm ES products for peptide OGE. After MS/MS analysis, a combined data set from the 24 wells was used in Mascot searches with MudPIT scoring. A search against a decoy database gave an estimated false positive probability of 4.48%, below the cutoff probability (p < 0.05) used to evaluate protein identifications. In total, the peptide OGE provided 136 protein identifications, comprising 84 unique identifications (Table II; supplemental Tables 7-9). Over 90% of the peptides were resolved to ≤2 wells. Signal sequences were found in 12 (14%) of the identified proteins, and 42 (50%) were homologous to proteins containing a signal sequence.
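A common way to turn a decoy search into an error estimate, assumed here because the text does not spell out the calculation, is to divide the number of decoy matches passing the significance threshold by the number of target matches passing it. The scores below are invented and do not reproduce the 4.48% obtained in this study; returning 0.0 when no target passes is a design choice of the sketch.

```python
def decoy_false_positive_rate(target_scores, decoy_scores, threshold):
    """Decoy matches above threshold divided by target matches above threshold."""
    n_target = sum(1 for s in target_scores if s > threshold)
    n_decoy = sum(1 for s in decoy_scores if s > threshold)
    return n_decoy / n_target if n_target else 0.0


# Invented scores for illustration only.
rate = decoy_false_positive_rate([50, 40, 35, 20, 10], [36, 15, 5], threshold=30)
print(f"{rate:.1%}")   # 33.3%
```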
Comparison of the Three Methods-When considered as a whole, the three experiments provided 105 non-redundant protein identifications. Seven proteins were identified using all methods (Fig. 3), and these comprised the highest scoring hits with an average Mowse score of 164. Each of the three methods identified proteins that were unique to that method with the peptide OGE contributing 60, the protein OGE contributing nine, and the gel digests contributing 10 unique protein identifications. The protein OGE provided the highest confidence identifications with an average Mowse score of 120 compared with 112 and 88 for the peptide OGE and gel digests, respectively.
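The comparison in Fig. 3 reduces to set operations on the identifier lists from each method: the union gives the non-redundant total, the intersection gives the proteins found by all three methods, and a set difference against the other two methods gives the proteins unique to each. The function name and the identifiers in the example are placeholders, not accessions from this study.

```python
def compare_methods(results: dict[str, set[str]]) -> dict:
    """Summarize overlap between per-method sets of protein identifiers."""
    union_all = set().union(*results.values())
    shared_by_all = set.intersection(*results.values())
    unique = {}
    for method, ids in results.items():
        others = set().union(*(s for m, s in results.items() if m != method))
        unique[method] = ids - others
    return {"non_redundant": union_all, "all_methods": shared_by_all, "unique": unique}


# Placeholder identifiers, not the accessions from this study.
summary = compare_methods({
    "gel_digest": {"A", "B", "C"},
    "protein_oge": {"A", "D"},
    "peptide_oge": {"A", "B", "E", "F"},
})
print(len(summary["non_redundant"]), sorted(summary["all_methods"]))  # 6 ['A']
```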
MS-BLAST Analysis of Unassigned Peptides in the Peptide OGE-Analysis of the peptide OGE provided 4331 MS/MS spectra, of which 570 were assigned to peptide sequences during Mascot searches. High quality unassigned spectra were identified by de novo sequencing, resulting in 1092 significant sequences (Pr(C) > 0.60). These sequences were submitted to the MS-BLAST server at Harvard Genetics for similarity searches against the NCBI nr protein database. Fifteen proteins were identified with statistically significant scores above 200 and with four or more high scoring segment pairs. Most of these proteins belonged to protein families already found in the Mascot searches, although three proteins (gi|20199089, gi|73995248, and gi|126333582) were dissimilar to previously identified proteins (Table III).
a The presence of a signal sequence in the identified database sequence is denoted with Y, F denotes that the protein belongs to a family of proteins containing a signal sequence, and N denotes no signal sequence.
b Full-length or partial sequence is denoted by yes or no; unknown refers to the situation in which no signal sequence or homologous protein was present, but the sequence contained a poly-A tail and enough sequence to provide the possibility of a full-length sequence.
c The difference between the theoretical pI of the protein derived from the identified database sequence and the apparent pI indicated by the well in which the identification was made. The well used in the calculation was the well with the highest Mowse score for the protein. Numbers in parentheses are negative. The difference between the theoretical and apparent pI was not calculated for identified sequences that were not matched to a full-length nucleotide sequence; these are denoted with a dash (-).
d Glycosylation sites.

Functional Annotation of the Identified Proteins-The combined non-redundant protein identifications from all methods were functionally annotated using a local version of Interproscan. In total, 167 GO terms were returned from the three organizing categories of the GO database: cellular component (location), biological process, and molecular function. Thirty-four location terms were returned, and the extracellular term was the largest category, with 59% of the total. Of the 47 biological process terms returned, proteolysis was the most frequent (17%), although carbohydrate metabolism was also well represented (10%). More molecular function terms (86) were returned than for the other two categories, representing a wider variety of terms; the most frequent, catalytic activity, accounted for only 7% of the total. The largest group of identified proteins was homologous to the activation-associated secreted proteins (ASPs), a group of cysteine-rich secreted proteins that belong to the SCP/TAPS family (Pfam). In total, 29 (28%) of the identified proteins were homologous to members of this family, and percent identity between the originating ESTs ranged from 42% (gi|156184190 and gi|59254161) to 92% (gi|158018316 and gi|440279).
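Percent identity is computed from a pairwise alignment, but its value depends on the denominator. The sketch below, an assumption since the text does not state which definition was used, counts identical residues over alignment columns in which neither sequence has a gap ('-'); conventions that divide by the full alignment length or by the shorter sequence give lower values for gapped alignments. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


print(percent_identity("ACGT-A", "ACCTTA"))   # 80.0 (4 of 5 ungapped columns identical)
```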
Four ASPs have been characterized in adult A. caninum, Ac-ASP-3 to Ac-ASP-6 (27), and all four were identified here. Ac-ASP-1 and Ac-ASP-2 are specific to infective third-stage larvae (37,38) and were not identified. The SCP domain is characterized by six relatively conserved blocks (A-F) interspersed with less conserved sequence (Fig. 4A), and the SCP domains of the novel ASPs identified here followed this organization (Fig. 4B). Two residues were particularly well conserved: a Glu in Block B and a Trp in Block D were present in almost all sequences. The structure of one N. americanus ASP, Na-ASP-2, has been solved by X-ray crystallography (39), and a structural alignment of the A. caninum ASPs with Na-ASP-2 showed regions of conserved secondary structure corresponding to the conserved blocks, particularly Blocks A, B, D, and E (Fig. 4B). An unrooted phylogenetic tree of the SCP domains of each putative ASP was inferred using the protein sequence parsimony method. The tree revealed three major groups of ASPs, each containing at least one domain from a previously characterized A. caninum ASP.

DISCUSSION
The aims of the work described herein were twofold. First, we wanted to characterize the ES proteome of adult hookworms to gain insight into the molecular mechanisms by which these worms establish chronic infections in their mammalian hosts. Second, we wanted to compare the hookworm ES proteome data sets obtained using an established technology (SDS-PAGE followed by LC-MS/MS) with those obtained using the newer technology of OGE.
In total, 105 proteins were identified from A. caninum ES products, most of which have not been described previously. The cellular location ontologies, together with the high percentage of proteins that contain (or are homologous to proteins that contain) a signal sequence, confirm that the majority of proteins identified in this study are extracellular. The small number of intracellular proteins identified probably reflects leakage of abundant intracellular proteins from cells that were sloughed or vomited from live worms. Serious cell damage is unlikely, as major structural constituents of cells such as ribosomal and cytoskeletal proteins were not present. OGE proved to be a useful technique for determining the protein content of hookworm ES products. Both OGE experiments gave identifications with higher average Mowse scores than the SDS-PAGE gel digests, and the peptide OGE identified far more proteins (84 versus 29). The greater efficacy of OGE most likely reflects the smaller amount of sample processing it requires compared with recovering tryptic fragments from polyacrylamide gel pieces. The peptide OGE identified more proteins than the protein OGE, although the protein OGE provided higher Mowse scores. One explanation may be saturation of the C4 ZipTips, used before tryptic digestion in the protein OGE experiment, by highly abundant proteins. For example, the six proteins with the highest Mowse scores accounted for 55% of the total identified peptides, suggesting that future protein OGE experiments would benefit from depletion of highly abundant proteins before analysis.
The proteins identified in the ES products of A. caninum represent a diverse range of activities, but the lectins are of particular interest for immunomodulatory activity. Six lectins were identified: three calcium-dependent C-type lectins (gi|158017101, gi|158007955, and gi|157990551) and three β-galactoside-specific S-type galectins (gi|6450228, gi|158012665, and gi|156185039). Three of these six were among the 10 highest scoring proteins in the peptide OGE. Lectins, a large family of carbohydrate-binding glycoproteins, are widely distributed throughout the animal kingdom and are thought to be involved in many physiological processes, including activation of the vertebrate immune system (40). In parasites, it has been hypothesized that lectins in the ES products may be able to subvert the host immune system by binding to carbohydrate moieties on the surface of host cells (41,42). In particular, host galectins have been shown to modulate a range of immune responses, including cell recruitment, activation, development, and apoptosis (43,44), and the recent report of the eosinophil chemokinetic activity of galectins from H. contortus (45) suggests that parasite galectins may mimic host proteins. The relative abundance of lectins in A. caninum ES products suggests an important role for these proteins in parasite-host interactions.
ASPs, a group within the SCP/TAPS family of proteins (Pfam), were the most highly represented protein family in the ES products. A. caninum appears to have a large repertoire of these proteins: four were characterized in the ES products of adult A. caninum using a gene-first approach (27), a further two were identified as ESTs from A. caninum gut tissue (46), and a large number were identified through down-regulation (37,38) or up-regulation (47) of their mRNAs in third-stage larvae (L3) of A. caninum during the switch from a free-living to a parasitic lifestyle. The 25 additional ASPs described here expand the total number of these proteins known in A. caninum. Six ASPs in particular (gi|158018545, gi|158009159, gi|158017414, gi|158009795, gi|15618419, and gi|158014533) were identified in the gel digests and appear to be heavily glycosylated, as evidenced by large discrepancies between their apparent and theoretical molecular weights. Several members of this family have been reported to contain predicted N- and O-linked glycosylation sites (27,37,38,48), and the evidence presented here suggests that glycosylation may be a common post-translational modification in the family. Considerable interest has been shown in glycoproteins as important elements in parasite biology, particularly in interactions with the host (36). Allelic variation of ASP-1 from A. caninum has been reported (49), and allelic variation can be mistaken for gene duplication and divergence. In that report, however, the two alleles shared 98% identity at the nucleotide level. By contrast, the highest identity between any two of the sequences reported here is 92%, and identities between the remaining sequences are lower, suggesting that allelic variation is not the cause of the observed sequence diversity.
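The argument against allelic variation compares pairwise percent identities (98% for the reported ASP-1 alleles versus at most 92% for the sequences here). A minimal Python sketch of that comparison is given below, assuming the sequences have already been aligned to equal length with '-' marking gaps. The function names and toy sequences are hypothetical, and the gap convention used (a gap opposite a residue counts as a mismatch; columns gapped in both sequences are skipped) is only one of several in common use, so values computed this way may differ from those reported by other alignment programs.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns gapped ('-') in both sequences are skipped; a gap opposite a
    residue is counted as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no alignment columns to compare")
    return 100.0 * identical / compared


def most_similar_pair(aligned: dict[str, str]) -> tuple[str, str, float]:
    """Return the pair of sequences with the highest percent identity."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    return max(
        ((p, q, percent_identity(aligned[p], aligned[q]))
         for p, q in combinations(aligned, 2)),
        key=lambda triple: triple[2],
    )


# Toy aligned sequences (hypothetical, not from this study).
aligned = {
    "seq1": "MKCLAVC-GN",
    "seq2": "MKCLSVC-GN",
    "seq3": "MRCIAEPQGN",
}
print(most_similar_pair(aligned))
```

The maximum pairwise identity from such a scan is the quantity that the text sets against the identity expected between alleles.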
The structure of Na-ASP-2, an ASP containing a single SCP domain, from the human hookworm N. americanus has been solved (39); it shows a three-layer α-β-α sandwich fold braced by five disulfide bonds, a structure repeated in other SCP domain-containing proteins (50,51). The apparent conservation of this structural framework in the ASPs described here, combined with the greater variability in regions outside the core, suggests that these proteins may have evolved a number of different functions on a conserved framework. Cysteine-bonded structural frameworks are a relatively common organizing principle and have been described in other protein families (41,52), where they allow the evolution of a diverse range of activities. The large number of ASPs now reported in A. caninum may indicate that a similar process is occurring, and it reinforces the potential importance of these proteins in parasite biology.
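The idea of a conserved cysteine framework surrounded by more variable regions can be checked on a multiple sequence alignment by listing the columns in which every sequence carries a cysteine. The sketch below assumes pre-aligned, equal-length sequences with '-' for gaps; the function name and toy alignment are hypothetical. At most half of the conserved cysteines can pair with one another in disulfide bonds, so the count gives only an upper bound; the actual pairing has to come from a structure such as that of Na-ASP-2.

```python
def conserved_cysteine_columns(alignment: list[str]) -> list[int]:
    """0-based alignment columns in which every sequence has a cysteine."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        col for col in range(width)
        if all(seq[col].upper() == "C" for seq in alignment)
    ]


# Toy alignment (hypothetical sequences; '-' marks a gap).
alignment = [
    "GCKDC-ACPT",
    "ACKEC-SCPE",
    "PCRDCHACVT",
]
cols = conserved_cysteine_columns(alignment)
print(cols)  # prints [1, 4, 7]
# Upper bound on disulfides formed only among the conserved cysteines.
print("max shared disulfides:", len(cols) // 2)
```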
Proteases were also well represented in the ES products. A. caninum ES products and gut tissue are known to contain proteases that are central to tissue digestion and nutrition (53). Hookworms are thought to digest hemoglobin with a cascade of aspartic, cysteine, and metalloproteases (54-57). In this work, new proteases belonging to the three major classes found in nematodes were identified: one aspartic protease (gi|158018545), three cathepsin B-like cysteine proteases (gi|984959, gi|158012119, and gi|158012027), and one neprilysin-like metalloprotease (gi|14318583). This indicates that proteolytic enzymes are highly abundant in the ES products as well as in the gut cells and lumen (46). Three astacin-like zinc metalloproteases (gi|157988693, gi|110007378, and gi|158019664) were also found in the ES products; these proteins appear to be involved in skin and tissue migration by L3 larvae (6, 58) and may play a similar role in tissue degradation by the adult worm. Interestingly, a protein homologous to hyaluronidase was also identified in the ES products. To our knowledge, this is the first report of a purified hyaluronidase from a parasitic nematode, and the protein likely corresponds to the hyaluronidase activity reported in crude ES products (59). Hookworm hyaluronidase is thought to facilitate tissue penetration by depolymerizing the mucopolysaccharide hyaluronic acid, a major component of the dermal ground substance (59).
The database used in Mascot searches was constructed from all A. caninum DNA sequences in the NCBI. This comprised ~130,000 genome survey sequences, ESTs, and core nucleotide sequences. Only 13% of the total spectra generated during the peptide OGE were assigned by Mascot, even though a large proportion of the unassigned spectra (29%) were flagged as high quality by de novo sequencing. The MS-BLAST results show that a number of these high-quality spectra could have been assigned to proteins identified in the MS/MS searches but were not, either because they were missed by Mascot or because they were not represented in the database owing to incomplete nucleotide sequences. Nonetheless, given the number of unassigned spectra, the presence of several novel proteins in the MS-BLAST results, and the incomplete coverage of the A. caninum genome, it is certain that novel proteins remain to be identified. The work presented here expands our knowledge of the protein composition of A. caninum ES products, knowledge that is directly relevant to the human hookworms N. americanus and Ancylostoma duodenale, both serious public health problems in developing countries. The work provides a platform for molecular investigation of host-parasite interactions and complements the rapidly expanding gene data sets for these parasites. As such, many of these proteins will prove to be useful leads in anthelmintic vaccine or drug development or, potentially, as therapeutic agents for inflammatory or autoimmune diseases.
FIG. 4. A, the domain organization of the characterized ASPs from parasitic nematodes. To date, single and double SCP domain proteins have been identified. Each single domain protein contains a signal peptide, an SCP domain, and a cysteine-rich tail. Each double domain protein has two SCP domains and two cysteine-rich tail regions that are C-terminal to the SCP domains. B, an alignment of six of the identified ASPs along with the sequence of Na-ASP-2 from N. americanus (Protein Data Bank code 1U53). The conserved cysteine residues are highlighted in yellow, and the conserved blocks characteristic of the SCP domain are indicated. Secondary structure is indicated in green for helical regions and red for β-strands. Secondary structure was determined from the crystal structure for Na-ASP-2 and predicted using PSIPRED for the other sequences shown. The highly conserved Glu and Trp in Blocks B and D are marked with an asterisk. C, a phylogenetic tree inferred using the protein sequence parsimony method in the PHYLIP package. Proteins that matched previously characterized A. caninum proteins are labeled ASP3-6 for Ac-ASP-3 to -6, PI for platelet inhibitor, and NIF for neutrophil-inhibitory factor. The putative grouping is highlighted in gray.
* This research was supported by project and program grants from the National Health and Medical Research Council, Australia (NHMRC) and was undertaken using infrastructure provided by the Australian Government through the National Collaborative Research Infrastructure Strategy via Bioplatforms Australia. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.