Proteomics and Phylogenetic Analysis of the Cathepsin L Protease Family of the Helminth Pathogen Fasciola hepatica

Cathepsin L proteases secreted by the helminth pathogen Fasciola hepatica have functions in parasite virulence including tissue invasion and suppression of host immune responses. Using proteomics methods alongside phylogenetic studies we characterized the profile of cathepsin L proteases secreted by adult F. hepatica and hence identified those involved in host-pathogen interaction. Phylogenetic analyses showed that the Fasciola cathepsin L gene family expanded by a series of gene duplications followed by divergence that gave rise to three clades associated with mature adult worms (Clades 1, 2, and 5) and two clades specific to infective juvenile stages (Clades 3 and 4). Consistent with these observations our proteomics studies identified representatives from Clades 1, 2, and 5 but not from Clades 3 and 4 in adult F. hepatica secretory products. Clades 1 and 2 account for 67.39 and 27.63% of total secreted cathepsin Ls, respectively, suggesting that their expansion was positively driven and that these proteases are most critical for parasite survival and adaptation. Sequence comparison studies revealed that the expansion of cathepsin Ls by gene duplication was followed by residue changes in the S2 pocket of the active site. Our biochemical studies showed that these changes result in alterations in substrate binding and suggested that the divergence of the cathepsin L family produced a repertoire of enzymes with overlapping and complementary substrate specificities that could cleave host macromolecules more efficiently. Although the cathepsin Ls are produced as zymogens containing a prosegment and mature domain, all secreted enzymes identified by MS were processed to mature active enzymes. The prosegment region was highly conserved between the clades except at the boundary of prosegment and mature enzyme. 
Despite the lack of conservation at this section, sites for exogenous cleavage by asparaginyl endopeptidases and a Leu-Ser↓His motif for autocatalytic cleavage by cathepsin Ls were preserved.

The helminth pathogens Fasciola hepatica and Fasciola gigantica are the causative agents of liver fluke disease (fasciolosis) in sheep and cattle. Although infections of F. hepatica occur predominantly in regions with temperate climates, the parasite has been reported on all continents (except Antarctica) as a result of introduction by European settlers. In contrast, F. gigantica infections are largely restricted to tropical regions (1). Fasciolosis is also an important food-borne zoonotic disease of humans with estimates of 2.4-17 million people infected worldwide; a further 91.1 million people are currently living at risk of infection (2-4). Human disease is particularly prevalent in the Andean countries of South America, Egypt, Iran, and Vietnam where farming practices allow infected animals to roam among plants used for consumption (3,4). Following ingestion of contaminated vegetation, infective parasite larvae migrate from the intestine into the liver where they cause significant tissue injury and induce immunologically related damage before they move into the bile ducts. The parasites can remain for up to 1-2 years in the bile ducts of cattle and as long as 20 years in sheep (1). Studies in our laboratory have shown that the predominant molecules secreted by F. hepatica parasites in vitro are cathepsin L cysteine proteases (5,6), and a recent analysis of bile taken from animals harboring adult parasites confirmed that these enzymes also represent the majority of protein produced in situ (7). The secretion of proteases facilitates migration of the parasite through host tissue and the degradation of host macromolecules to provide essential free amino acids for the parasite (6). 
Furthermore, although it has been known for several decades that fasciolid parasites secrete a variety of molecules that suppress the immune responses of their host (8-10), cathepsin L proteases are considered the principal participants; the parasite enzymes cleave host antibodies specifically in the hinge region to prevent antibody-mediated cell damage (5) and alter the function of cells of the innate and adaptive cellular immune systems to suppress the development of protective Th1-driven responses (11).
The F. hepatica cathepsin L proteases are represented by a large gene family that expanded within the genus Fasciola by a series of gene duplications that resulted in a monophyletic group consisting of several discrete clades (12). The functional diversity of the various members of the gene family and their relationship to pathogen virulence and host adaptation are of particular interest (6,12). Using molecular clock analysis, Irving et al. (12) estimated that the duplications and divergence of the family occurred over the last 135 million years, and the timing of duplications correlates with the evolution of rodents, ruminants, and higher mammals. However, most of the duplications took place ~25 million years ago at about the time climatic conditions favored the development of grasslands and the expansion of common hosts of F. hepatica, suggesting that the divergence of the cathepsin L protease family was important in the evolution and adaptation of the parasite to a wider host range (12). At the molecular level, this divergence involved changes in residues within the active site of the enzymes, in particular at positions that are known to occupy the S2 subsite and are critical to determining the substrate specificities of the proteases (6,12,13). It was suggested that these changes gave rise to proteases with overlapping and complementary specificities that allowed the parasite to degrade a wider variety of macromolecules (6,12).
In the present study, we analyzed and characterized the profile of cathepsin L proteases secreted by adult F. hepatica by two-dimensional gel electrophoresis (2-DE) and MS to determine the relative importance of the various cathepsin L groups to parasite virulence and adaptation. We found that these parasites secreted cathepsin L proteases that were representative of all three adult clades identified by phylogenetics; however, the proteases of Clade 1 (FhCL1) and Clade 2 (FhCL2) were by far the most predominant, consistent with their greater divergence and expansion in the family. Proteases of Clades 3 and 4 that were identified only from the infectious juvenile stages were not detected in the adult stage secretory products, suggesting a specific role for these proteases in initiating host infection through the intestinal wall. A subclade of Clade 1 (FhCL1C) for which genes are known in F. gigantica but not in F. hepatica was also not represented in the secretory products of the F. hepatica parasite, supporting the suggestion of Irving et al. (12) that this subclade expanded after the separation of the two species. Comparative biochemistry and sequence alignments showed that the F. hepatica repertoire of virulence-associated cathepsin Ls was established by a process of gene duplication followed by refinement of the active site residues that influence substrate specificity. Clade-specific variations also took place at the boundary between prosegment and mature enzyme, but specific cleavage sites required for activation of the cathepsin L zymogens were preserved.

EXPERIMENTAL PROCEDURES
Alignments and Phylogenetic Analysis-Phylogenetic trees were created using 32 selected F. hepatica and F. gigantica cathepsin L DNA sequences. Carica papaya papain (GenBank™ accession number M15203) was used to root the fasciolid sequences. All of the nucleotide sequences used for tree construction encoded the full-length cathepsin L precursor protein including the prosegment region but excluding the signal peptide. The DNA sequences were initially aligned using ClustalW (14), and the trees were created using the bootstrapped (1000 trials) neighbor-joining method of MEGA version 4.0 (15) using the Kimura 2-parameter model with uniform rates for all sites. The GenBank accession numbers of the cathepsin L sequences used for alignment and phylogenetic analyses are as follows: (AY428949), FgCL5_thC (AF239265), FhCL5_au4 (L33772), and FhCL5_au5 (AF271385). The naming scheme reflects the different clades identified and the origin of the sequences represented by the Internet country domain.
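The pairwise distance that underlies a neighbor-joining tree built with the Kimura 2-parameter model can be illustrated with a short sketch. This is not the MEGA implementation: the function name k2p_distance is hypothetical, and skipping gapped or ambiguous sites (pairwise deletion) is an assumption made here for illustration only. With P and Q the proportions of transitional and transversional differences, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; gaps and ambiguous bases are
    skipped (pairwise deletion), which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: site not compared
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    arg = (1.0 - 2.0 * p - q) * math.sqrt(max(1.0 - 2.0 * q, 0.0))
    if arg <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(arg)
```

A matrix of such distances over all sequence pairs would then be the input to neighbor-joining; bootstrap support is obtained by resampling alignment columns and repeating the procedure.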
The prosegment regions of F. hepatica and F. gigantica cathepsin L proteinases were aligned using ClustalW (14). The 26-amino acid sequences used (residues P1 to P91, fluke cathepsin L numbering) were truncated by removal of the predicted N-terminal signal peptide and the mature enzyme sequence. Amino acid consensus sequences were created by MULTALIN (16) and inserted manually into the alignment. The residues lining the S2 pocket of the active site of the various fluke cathepsin Ls were identified using a combination of primary sequence alignments (14) and analysis of the recently determined atomic structure of F. hepatica cathepsin L1 (Ref. 17; Protein Data Bank code 2O6X) and are shown in Table II.
Preparation of F. hepatica Excretory-Secretory Proteins-Adult F. hepatica, 16 weeks postinfection, were recovered from the liver tissue and bile ducts of Merino sheep (experimentally infected with 200 metacercariae) and washed in prewarmed (37°C) 0.1 M PBS, pH 7.3. Flukes were then transferred to prewarmed (37°C) RPMI 1640 medium (Invitrogen) containing 2 mM L-glutamine, 30 mM HEPES, 0.1% (w/v) glucose, and 2.5 µg/ml gentamycin and incubated for 8 h at 37°C. The culture medium containing F. hepatica excretory and secretory (ES) proteins was pooled and concentrated using Centricon columns (Millipore) with a 3-kDa molecular mass cutoff to a final concentration of 1 mg/ml and stored in aliquots at −20°C.
Two-dimensional Electrophoresis and Gel Imaging-F. hepatica ES proteins (100 µg) were precipitated with 5 volumes of room temperature acetone and recovered by centrifugation at 5000 × g for 10 min. The protein pellets were resuspended in ProteoPrep (Sigma) extraction solution 4 (7 M urea, 2 M thiourea, 1% C7BzO, and 40 mM Tris). The samples were reduced with 5 mM tributylphosphine and alkylated with 20 mM acrylamide monomer in a single 90-min step. Excess acrylamide was subsequently quenched by the addition of 10 mM DTT. For separation in the first dimension, samples (150 µl) were actively loaded by rehydrating 11-cm pH 4-7 ReadyStrip IPG strips (Bio-Rad) with 100 µl of 7 M urea, 2 M thiourea, and 1% C7BzO for 30 min. IEF was carried out using a 3-h convex ramp from 100 V to 3 kV with a further 5-h linear ramp to 10 kV where the voltage was held until 100 kV-h was reached using an IsoelectrIQ2 IEF device (Proteome Systems). After focusing, IPG strips were equilibrated in 7 M urea, 250 mM Tris-HCl (pH 8.8), and 1% SDS (w/v) for 25 min. For separation in the second dimension, the IPG strips were laid on 4-12% Criterion XT gels (Bio-Rad) and run at 200 V for 45 min or until completion. 
After separation, proteins were visualized by staining with Flamingo fluorescent protein stain (Bio-Rad). Stained 2-D gels were imaged with a PharosFX laser imaging system (Bio-Rad), and normalized spot quantities were determined using PDQuest version 8.01 software (Bio-Rad). Spot quantity represents the total intensity of a spot in an image and corresponds to the total amount of protein within a spot in a gel. In the present study, Gaussian analysis was used to determine the spot quantities using the following formula: spot height × π × σx × σy, where spot height is the peak of the Gaussian representation of the spot, σx is the standard deviation of the Gaussian distribution of the spot in the x axis, and σy is the standard deviation of the Gaussian distribution of the spot in the y axis.
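The Gaussian spot quantity and its conversion to a relative expression level can be sketched in a few lines. The sketch assumes that the quantity of a fitted spot is its peak height multiplied by π and by the standard deviations along both gel axes, and that relative expression is each quantity divided by the summed quantity of all cathepsin L spots. The function names and the example values are hypothetical and are not taken from PDQuest or from the gels analyzed here.

```python
import math


def gaussian_spot_quantity(height: float, sigma_x: float, sigma_y: float) -> float:
    """Spot quantity of a 2-D Gaussian fit: height * pi * sigma_x * sigma_y."""
    return height * math.pi * sigma_x * sigma_y


def relative_percentages(quantities: dict) -> dict:
    """Express each spot quantity as a percentage of the summed quantity."""
    total = sum(quantities.values())
    if total <= 0:
        raise ValueError("total spot quantity must be positive")
    return {spot: 100.0 * q / total for spot, q in quantities.items()}


if __name__ == "__main__":
    # Hypothetical fitted parameters (height, sigma_x, sigma_y) for three spots.
    fits = {"spot_a": (1200.0, 3.0, 2.5), "spot_b": (800.0, 2.0, 2.0), "spot_c": (150.0, 1.5, 1.0)}
    quantities = {name: gaussian_spot_quantity(*params) for name, params in fits.items()}
    for name, pct in relative_percentages(quantities).items():
        print(f"{name}: {pct:.2f}% of total")
```

Summing the percentages of all spots assigned to one clade gives a clade-level share of the kind reported for the secreted cathepsin Ls.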
Mass Spectrometry-For excision of peptide spots, gels were overstained with colloidal Coomassie Blue G-250 (Sigma) overnight before destaining with 10% methanol (v/v) and 7% acetic acid (v/v). Selected protein spots were excised using an EXQuest robotic spot cutter with PDQuest software (Bio-Rad). The excised spots were in-gel digested with trypsin (Promega), and the peptides were analyzed by nano-LC-ESI-MS/MS using a Tempo nano-LC system (Applied Biosystems) with a C18 column (Vydac) coupled to a QSTAR Elite hybrid quadrupole-quadrupole time-of-flight (Qq-TOF) mass spectrometer running in information-dependent acquisition mode (Applied Biosystems). Peak list files generated by the Protein Pilot version 1.0 software (Applied Biosystems) using default parameters were exported to a local MASCOT version 2.1.0 (Matrix Science) search engine for protein data base searching.
Data Base Searching-MS/MS data were used to search 3,239,079 entries in the Mass Spectrometry Protein Sequence Database (MSDB) (August 9, 2006) using MASCOT version 2.1.0 (Matrix Science) with the enzyme specificity set to trypsin. Propionamide (acrylamide) modification of cysteines was used as a fixed parameter, and oxidation of methionines was set as a variable protein modification. The mass tolerance was set at 1.0 Da for precursor ions and 0.3 Da for fragment ions. Only one missed cleavage was allowed. Matches achieving a molecular weight search (MOWSE) score >70 were considered to be significant (18,19). However, other criteria were considered in assigning a positive identification, including concordance between the calculated theoretical molecular mass and pI values of the protein and the observed position of the peptide by 2-DE. To account for matches to multiple members of the Fasciola cathepsin family, we looked for peptides specific to individual enzymes or clades (see "Results").
Enzyme Assays and Kinetics with Fluorogenic Peptide Substrates-The activity of recombinant F. hepatica cathepsin L1 (FhCL1A_ie1) and cathepsin L2 (FhCL2_ie2) enzymes (20) was determined by a fluorometric assay using the synthetic substrates Z-Phe-Arg-NHMec, Z-Leu-Arg-NHMec, and Z-Pro-Arg-NHMec. Initial rates of hydrolysis of the fluorogenic dipeptide substrates were measured by monitoring the release of the fluorogenic leaving group (NHMec) at an excitation wavelength of 370 nm and an emission wavelength of 460 nm using a Bio-Tek KC4 microfluorometer. The kinetic constants kcat and Km were determined by nonlinear regression analysis. Initial rates were obtained at 37°C over a range of substrate concentrations spanning Km (0.2-200 µM) and at fixed enzyme concentrations (0.5-5 nM). Assays were performed in 0.1 M PBS, pH 7.3, and in 100 mM sodium acetate buffer, pH 5.5, each containing 2.5 mM DTT and 2.5 mM EDTA.
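As an illustration of how kcat and Km can be estimated from initial rates, the following sketch fits the Michaelis-Menten equation v = Vmax[S]/(Km + [S]) by least squares. The regression software used in this study is not specified, so the approach shown is an assumption: because the model is linear in Vmax, Vmax is solved in closed form for each trial Km, and Km is located by a coarse logarithmic grid followed by a golden-section search. All function names are hypothetical, and kcat is taken as Vmax divided by the enzyme concentration in matching units.

```python
import math


def vmax_and_sse(km, substrate, rates):
    """For a fixed Km, return the least-squares Vmax and the residual sum of squares."""
    x = [s / (km + s) for s in substrate]
    vmax = sum(v * xi for v, xi in zip(rates, x)) / sum(xi * xi for xi in x)
    sse = sum((v - vmax * xi) ** 2 for v, xi in zip(rates, x))
    return vmax, sse


def fit_michaelis_menten(substrate, rates, grid_points=200, refinements=80):
    """Least-squares fit of v = Vmax*[S]/(Km + [S]); returns (Vmax, Km)."""
    if min(substrate) <= 0:
        raise ValueError("substrate concentrations must be positive")

    def sse(km):
        return vmax_and_sse(km, substrate, rates)[1]

    # Coarse search on a logarithmic grid spanning well beyond the data range.
    lo, hi = min(substrate) / 100.0, max(substrate) * 100.0
    grid = [lo * (hi / lo) ** (i / (grid_points - 1)) for i in range(grid_points)]
    best = min(range(grid_points), key=lambda i: sse(grid[i]))
    a = grid[max(best - 1, 0)]
    b = grid[min(best + 1, grid_points - 1)]

    # Golden-section refinement of Km inside the bracketing grid interval.
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(refinements):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    km = (a + b) / 2.0
    vmax, _ = vmax_and_sse(km, substrate, rates)
    return vmax, km


def catalytic_efficiency(vmax, km, enzyme_conc):
    """kcat/Km, with kcat = Vmax / [E]; units follow from the inputs."""
    return (vmax / enzyme_conc) / km
```

With rates in µM/s and concentrations in µM, catalytic_efficiency returns kcat/Km in µM⁻¹ s⁻¹; comparing this value between enzymes for the same substrate gives fold differences of the kind reported in "Results".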

RESULTS
Phylogenetic Analysis of Fasciola Cathepsin L Sequences-The evolutionary relationship between F. hepatica and F. gigantica cathepsin L gene sequences (46 in all) deposited in the public data bases and Wellcome Trust Sanger Institute helminth data bases was investigated at the molecular level by constructing a bootstrapped neighbor-joining tree. Fourteen of the Fasciola cathepsin L sequences were not full-length cDNAs and/or differed by only a small number of nucleotides and thus likely represent different alleles rather than individual genes (12,21). This is possible, and perhaps likely, given the occurrence of triploid fasciolids (with an extra allele available) in both temperate and tropical regions (22-24). However, Grams et al. (21) estimated from Southern blot analysis that at least 10 cathepsin L genes, formed by duplication events, exist in F. gigantica, but that other, more divergent sequences would not have been detected as a result of the stringent hybridization conditions used. Until significant genomic sequence information is available for a member of the genus, the contribution of allelic variation to the current repertoire of Fasciola cathepsin Ls remains to be fully determined.
A phylogenetic analysis of 24 F. hepatica and eight F. gigantica full-length sequences revealed that these separated into five well supported clades that arose by a series of gene duplications (Fig. 1). The two initial gene duplications separated the cathepsins isolated from the infective newly excysted juvenile parasites (Clade 3, FhCL3 and Clade 4, FhCL4) from three clades expressed in the adult worm stage (clades FhCL1, -2, and -5). Following this, another gene duplication led to the separation of the adult clades FhCL1 and FhCL5 from clade FhCL2. The phylogenetic tree also showed that the Fasciola clade FhCL1 has undergone the greatest expansion and is represented by three distinct subclades: FhCL1A, FhCL1B, and FhCL1C.
It is noteworthy that all clades contain sequences from both F. hepatica and F. gigantica. However, subclades FhCL1A and FhCL1B are composed exclusively of F. hepatica sequences, whereas clade FhCL1C contains only one F. hepatica cathepsin L. This latter sequence was identified from a Japanese isolate that is most likely a F. hepatica/F. gigantica hybrid (12,23), which indicates that clade FhCL1C may be exclusive to F. gigantica. This phylogenetic analysis, therefore, indicates that the early duplication events in the cathepsin L gene family occurred before the speciation of the F. hepatica and F. gigantica fasciolids and that expansion of subclades FhCL1A/FhCL1B and FhCL1C occurred after the segregation of these two species. Irving et al. (12) made a similar observation and suggested that divergence of the FhCL1 clade reflected adaptation of the "temperate" F. hepatica and the "tropical" F. gigantica to different host species (see "Discussion").
Identification of Cathepsin L Proteases Secreted by Adult F. hepatica-ES proteins secreted by isolated adult F. hepatica during in vitro cultivation were precipitated from culture supernatants and analyzed by 2-DE. Using the current sample preparation method and pH range, 30 major peptide spots and 50 less intense peptides of varying molecular mass and pI values were visible on 2-D gels of F. hepatica ES proteins (Fig. 2A). To identify the secreted cathepsin L enzymes, these 80 peptide spots were excised from 2-D gels, and a proteomics analysis was carried out using nano-LC-ESI-MS/MS. The resulting ion mass data were used to search data bases. In line with previous proteomics studies of helminth secretory proteins, a MOWSE score >70 was considered to be significant (18,19).
Although the predicted molecular mass of all F. hepatica cathepsin Ls (mature form) used to generate the phylogram is ~24 kDa, the predicted pI values differ considerably and range from 4.54 for FhCL2_chC to 8.13 for FhCL3_nl64 (based on conceptual translation of the cDNAs). Fifteen of the major peptide spots were definitively identified as F. hepatica cathepsin L proteases (Table I and Fig. 2B). These displayed a similar observed molecular mass of 24 kDa, whereas the pI values ranged from 4.36 (spot 20) to 6.42 (spot 7). A further eight peptides (spots 8, 9, 13, 14, 15, 16, 18, and 19) were also matched to fluke cathepsin Ls; however, these were assigned on the basis of single peptide matches, and at present their identity should be regarded as tentative (data not shown). The position of spots 18 and 19 on the 2-D gels suggests that they had not resolved correctly, whereas spots 8, 13, 14, 15, 16, and 17 are likely the result of proteolytic degradation because they migrated at a low molecular size. The 15 definitively identified cathepsin Ls represented subclade FhCL1A (seven spots), subclade FhCL1B (four spots), clade FhCL2 (three spots), and clade FhCL5 (one spot) (Fig. 2A).
Representatives of clades FhCL3 and FhCL4 were not detected in the adult ES proteins. However, this substantiates our phylogenetic data because sequences derived from the genes belonging to these clades have been isolated only from infective juveniles and therefore may not be expected to be expressed and secreted by adult parasites (25). F. hepatica enzymes representing clade FhCL1C were also not identified; this phylogenetic clade was only detected in F. gigantica or F. hepatica/F. gigantica hybrids (see section above).
Table I. Identification of adult F. hepatica ES proteins by nano-LC-ESI-MS/MS
Spot numbers refer to those shown in Fig. 2B. The revised fluke cathepsin L nomenclature presented in the current study is given along with the existing GenBank name and accession number for each sequence. Relative expression levels of the cathepsin L protein spots (shown as a percentage of total cathepsin L levels in the gel) are shown.

Fasciola cathepsin Ls are highly conserved at the amino acid level; thus it is likely that the tryptic peptides will match to multiple members of the enzyme family. To account for this potential redundancy, we looked for matches to sequence-specific or clade-specific peptides. All peptide spots that were identified as Clade 1 enzymes (with the exception of spot 5) matched with a Clade 1-specific sequence, VTGYYTVHSGSEVELK (ion m/z 590). The tryptic digest of spot 5 displayed an ion following MS/MS (m/z 590.37) that is likely to be the same peptide, although it did not match as a result of the highly stringent criteria used for data base searching in the current study. Another Clade 1-specific peptide, NSWGLSWGER (ion m/z 596), was matched in almost all these spots (with the exception of spots 1 and 17). Spot 1 was identified as FhCL1A_pe, and this was supported by the presence of an ion (m/z 621) corresponding to an amino acid sequence of NSWGSYWGER that is found only in this enzyme. Moreover there was an almost exact match between the theoretical and observed molecular mass and pI values for this peptide spot. Spots 2 and 3 were identified as Fasciola cathepsin Ls FhCL1A_pt1 and FhCL1A_pt2, respectively. The mature enzymes show 96% sequence identity at the amino acid level but were distinguished by the presence of an FhCL1A_pt2-specific ion (m/z 835 with two oxidized methionines) in the mass spectra obtained from spot 3. In the absence of any FhCL1A_pt1-specific ions from spot 2, its assignment as this enzyme must remain tentative, although the presence of the 590 and 596 ions supports its identification as a Clade 1 enzyme. Spots 4-7 were assigned to FhCL1A_tr. The amino acid sequence ESGYVTGVK that was matched with the 470 ion of spots 4 and 5 is also present in the primary sequences of FhCL1A_pt2 and FhCL1B_nl1. However, no FhCL1A_pt2- or FhCL1B_nl1-specific ions were detected in the mass spectra obtained from these spots, supporting their identification as FhCL1A_tr enzymes. The 470 ion was not observed for spots 6 and 7, but the presence of the 590 and 596 ions again supports their identification as Clade 1 enzymes.
Four spots were identified as FhCL1B_nl1 (spots 10-12 and 17). An ion (m/z 724) matching an amino acid sequence of FGLETESSYPYR was present in the mass spectra from all four spots. Although this sequence is also present in Clade 5 cathepsin Ls, the presence of Clade 1-specific ions (m/z 590 and 596) supports their assignment as FhCL1B_nl1 enzymes. A further three spots were identified as the Clade 2 enzyme FhCL2_chC (spots 20-22). The mass spectra from these spots contained several ions that match Clade 2-specific amino acid sequences including DYYYVTEVK (m/z 590), VTGYYTVHSGDEIELK (m/z 604), and LTHAVLAVGYGSQDGTDYWIVK (m/z 798) as well as an ion (m/z 856) that matched with the FhCL2_chC-specific amino acid sequence ASASFSEQQLVDCTR. The theoretical and observed molecular mass and pI values were also in close agreement for these spots. Consequently the assignment of spots 20-22 as FhCL2_chC is considered to be very robust.
A single spot (spot 23) was assigned to the Clade 5 enzyme FhCL5_au5. The mass spectra from spot 23 contained two ions (m/z 724 and 863) that match peptides from Clade 5 cathepsin Ls but also match FhCL1B_nl1 and Clade 2 sequences, respectively. The absence of any other Clade 1- or Clade 2-specific ions together with the excellent match between the theoretical and observed molecular mass/pI values strongly supports the identification of spot 23 as a Clade 5 cathepsin L.
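The relationship between a diagnostic peptide sequence and its ion m/z can be checked with a short calculation. The sketch below sums standard monoisotopic residue masses, adds one water, and divides the protonated mass by the charge. The charge states used in the example (2+ and 3+) are assumptions chosen because they reproduce the reported m/z values; they are not stated in the text. The function name peptide_mz is hypothetical. Cysteine is treated as propionamide-modified, in line with the fixed modification used for data base searching.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276
CYS_PROPIONAMIDE = 71.03711  # acrylamide adduct on cysteine (fixed modification)


def peptide_mz(sequence: str, charge: int) -> float:
    """Monoisotopic m/z of a peptide ion [M + zH]z+ (hypothetical helper)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    mass = WATER
    for residue in sequence.upper():
        mass += RESIDUE_MASS[residue]
        if residue == "C":
            mass += CYS_PROPIONAMIDE
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    # Clade 1-specific peptides; charge states are assumed, not reported.
    print(round(peptide_mz("NSWGLSWGER", 2), 2))
    print(round(peptide_mz("VTGYYTVHSGSEVELK", 3), 2))
```

Under these assumptions NSWGLSWGER gives m/z of about 596.28 as a doubly charged ion and VTGYYTVHSGSEVELK about 590.30 as a triply charged ion, in agreement with the 596 and 590 ions used above to assign Clade 1 enzymes.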
Relative Expression Levels of Secreted F. hepatica Cathepsin Ls-Densitometry was performed on peptide spots from several imaged 2-D gels of F. hepatica ES proteins to quantify the level of expression of each enzyme relative to each other. The raw data for each spot were converted to a percentage of the total amount of cathepsin L in the gels (Fig. 1 and Table I).
Variation in the S2 Subsite of the Active Site and Its Influence on Enzyme Kinetics-The substrate specificity of cathepsin L cysteine proteases is determined by the composition and arrangement of amino acids that create the biochemical characteristics of the S2 subsite of the active site (26). Using primary sequence alignments and analysis of the atomic structure of F. hepatica cathepsin L1, the key residues in this pocket that interact with the P2 amino acid of the substrate have been identified as those situated at positions 67, 68, 133, 157, 160, and 205 (17). A comparison of the amino acids that occupy these positions in the various phylogenetic clades of the F. hepatica cathepsin L family is presented in Table II and reveals a number of substitutions that could have a critical impact on their substrate preferences. In particular, the most variation between the clades occurs in residues at positions 67 (Leu, Tyr, Trp, or Phe) and 205 (Leu, Val, or Phe), residues considered to be most important for substrate recognition. Variation was also observed at position 157, which lies at the opening of the S2 pocket and acts as a "gatekeeper" residue that influences the type of residue that can access the pocket (17). Interestingly the only difference found between the subclades FhCL1A and FhCL1B of clade FhCL1 is seen at position 157; however, the amino acid residing here was always a hydrophobic Leu or a Val.
To demonstrate the influence of the variations found in the S2 pocket of the active site, substrate kinetic studies were performed using purified recombinant proteases derived from sequences representing members of the major secreted F. hepatica clades, FhCL1 (FhCL1A_ie1) and FhCL2 (FhCL2_ie2). Kinetic constants were obtained for three fluorogenic peptide substrates possessing different residues at their P2 sites: Z-Phe-Arg-NHMec, Z-Leu-Arg-NHMec, and Z-Pro-Arg-NHMec (Table III). Overall F. hepatica cathepsin L1 showed an affinity (kcat/Km) for the substrates in the order Z-Leu-Arg-NHMec > Z-Phe-Arg-NHMec >> Z-Pro-Arg-NHMec, whereas for F. hepatica cathepsin L2 the order of affinity was Z-Leu-Arg-NHMec >> Z-Phe-Arg-NHMec ≈ Z-Pro-Arg-NHMec. Both enzymes showed higher affinities for substrates Z-Phe-Arg-NHMec and Z-Leu-Arg-NHMec at lower pH (on average a 2-fold increase in affinity at pH 5.5 compared with pH 7.3). For both these substrates the catalytic rates (kcat/Km) of cathepsin L1 were markedly greater than those of cathepsin L2 (~25 times greater for Z-Phe-Arg-NHMec and on average 8 times greater for Z-Leu-Arg-NHMec at both pH values). However, although cathepsins L1 and L2 showed a lower affinity for the substrate containing a proline at the P2 position, cathepsin L2 displayed a greater preference for Z-Pro-Arg-NHMec with an ~6-fold greater affinity for this substrate at pH 5.5 and 3-fold greater affinity at pH 7.3 than did cathepsin L1.
Bioinformatics Analysis of Fasciola Cathepsin L Prosegments-Cathepsin Ls are stored in specialized secretory vesicles within the parasite's gut epithelial cells as inactive zymogens consisting of a prosegment and mature enzyme domain (27). The prosegment is removed by catalytic cleavage following secretion into the parasite intestine to reveal an active mature protease (28). Our MS studies showed that all cathepsin L proteases secreted by adult F. hepatica lack the prosegment and are thus fully processed mature enzymes. Phylogenetic analyses using only the prosegment domains of the Fasciola cathepsin Ls used in the present study produced a tree similar to that presented in Fig. 1, indicating parallel adaptation of the prosegment and mature domains (data not shown). An alignment of the prosegments (residues P1 to P91) shows that the N-terminal and intermediate regions (residues P1 to P70) of these enzymes are remarkably conserved across all cathepsin L clades (Fig. 3). What is particularly noteworthy, however, is that the C-terminal portion of the F. hepatica cathepsin L prosegments (21 residues, P70 to P91) shows striking variability between the phylogenetic clades but is conserved within each clade. This is particularly evident in the final five residues that form the boundary between the prosegment and mature enzyme and gives each clade a signature sequence as shown in Fig. 3. Despite the lack of conservation in this signature section, each still preserves an Asn residue and thereby retains a highly specific cleavage site for the trans-processing enzyme asparaginyl endopeptidase that initiates or "kick-starts" the enzyme activation process (28). Another highly conserved feature within the non-conserved C-terminal region of the prosegment is a Leu-Ser↓His motif that we have shown is essential for autocatalytic cleavage between cathepsin L proteases once activation is initiated by the asparaginyl endopeptidase (20). As can be seen in Fig. 3, this motif is retained in the enzyme clades of adult F. hepatica with the notable change to Leu-Ser↓Arg in the adult FhCL2 clade and to Leu-Ser↓Asp in the infective juvenile FhCL3 clade.

DISCUSSION
The longevity of a parasite species in nature is dependent on its ability to invade new hosts (29). F. hepatica, a helminth parasite of cattle and sheep, is of European origin, but its geographical distribution has expanded over the last 5 centuries as a result of global colonizations by Europeans and the associated continual export of livestock. The parasite's expansion, however, has been greatly facilitated by what seems to be a remarkable ability to adapt to new hosts; thus the parasite can develop, mature, and produce viable offspring even in very recently encountered species such as llama and alpaca in South America, camels in Africa, and kangaroos in Australia (3). F. hepatica also infects a wide variety of wild animals including deer, rabbits, hare, boars, beavers, and otters that collectively are major reservoir host populations that contribute significantly to the worldwide dissemination of the disease and to its local transmission patterns. F. gigantica diverged from F. hepatica 17 million years ago (12) and penetrated more tropical regions in Asia and the Far East where it is the predominant parasitic disease of cattle and water buffalo (4). In the last 15 years human fasciolosis has emerged as an important zoonotic disease in the Andean countries of South America, Egypt, Thailand, and Iran, all regions where animal disease is highly prevalent and farm management practices allow contamination of edible aquatic plants by the infective juvenile larvae (3,30).
Cathepsin L proteases most likely played, and continue to play, a critical role in adapting these helminth parasites to new host species (6,12,13). Fasciola cathepsin Ls have several well defined functions that are essential to parasite-host biology including degradation of host macromolecules and suppression of immune responses (for a review, see Ref. 6). In the present study we showed that adult F. hepatica secretes a range of these proteases with varying substrate specificities. Previous proteomics studies have shown that the only proteins secreted by adult worms residing in the bile ducts are cathepsin L proteases (7), and our in vitro experiments show that these are secreted in abundance (with other minor proteins that may be artifacts of the culture method) at an estimated 0.5-1.0 µg/parasite/h (28). This high level of enzyme production is not surprising when it is considered that once inside the bile duct the parasite needs to digest a large quantity of host red blood cells (350 × 10⁶/h) to support the enormous production of progeny (30,000-50,000 eggs/day/worm; Ref. 28). The heavy reliance on cathepsin Ls is also reflected in a high gene transcription rate: ~10% of cDNAs in a ~5000 expressed sequence tag library prepared from adult fluke mRNA were found to encode cathepsin L proteases (Ref. 28 and Wellcome Trust Sanger Institute helminth data bases).
Phylogenetic analysis of Fasciola cathepsin L gene sequences showed that this family of proteases arose by gene duplication and divergence into five main clades (Refs. 6 and 12 and this study). However, it was not clear whether proteases from all of these clades are secreted by the parasite and hence play a role in host-parasite interaction. Therefore, we carried out a proteomics characterization of the secretory products of adult parasites with particular focus on identifying proteases. We found peptide spots representative of the adult cathepsin L clades FhCL1, FhCL2, and FhCL5, but no members of the FhCL3 and FhCL4 clades were identified. This suggests that expression of Fasciola cathepsin Ls is developmentally regulated and is consistent with the specific expression of FhCL3 and FhCL4 genes by juvenile flukes. Alternatively the FhCL3 and FhCL4 genes may encode proteases that are not secreted extracorporeally by adult F. hepatica but instead perform internal functions such as protein turnover, membrane biogenesis, or egg production.
The three adult cathepsin L clades were not represented in equal proportions at the protein expression level in the secretions, and this somewhat corresponded to the diversity of genes found in the phylogenetic studies (Figs. 1 and 2 and Table I). Clades FhCL1 and FhCL2 contain eight and four members in the current gene family and account for 67.39% (11 spots) and 27.63% (three spots) of the total secreted cathepsin Ls observed by 2-DE, respectively. In contrast, the clade FhCL5 contained three members but was represented by a single peptide spot and contributed only 4.98% to the total amount of secreted proteases. The level of gene divergence and protein production presumably reflects the relative need for each enzyme in the functions it performs during parasite infection.
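As a rough illustration of the comparison made above, the following Python sketch tabulates the figures quoted in this paragraph and checks that the three clade shares account for the whole secreted cathepsin L pool. The "share per gene member" value is an illustrative normalization introduced here, not a quantity reported in the study, and the names `clades` and `summarize` are hypothetical.

```python
# Values quoted in the text: gene members per clade, 2-DE spots,
# and share of total secreted cathepsin L (percent).
clades = {
    "FhCL1": {"genes": 8, "spots": 11, "percent": 67.39},
    "FhCL2": {"genes": 4, "spots": 3, "percent": 27.63},
    "FhCL5": {"genes": 3, "spots": 1, "percent": 4.98},
}


def summarize(data):
    """Return (total percent, percent of secretion per gene member by clade).

    The per-gene figure is only an illustrative normalization (an assumption
    of this sketch), not a measurement from the paper.
    """
    total = round(sum(c["percent"] for c in data.values()), 2)
    per_gene = {name: c["percent"] / c["genes"] for name, c in data.items()}
    return total, per_gene


total, per_gene = summarize(clades)
print(f"total secreted share: {total}%")
for name, value in per_gene.items():
    print(f"{name}: {value:.2f}% per gene member")
```

Under this crude normalization the ranking FhCL1 > FhCL2 > FhCL5 is preserved, which is in line with the statement that protein abundance only "somewhat" tracks gene diversity.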
An analysis of the critical S1 and S2 active site residues of the cathepsin L family indicated that all fluke cathepsin Ls are functional (i.e. no degenerate active sites), which implies a positive evolutionary drive to create a repertoire of functionally active proteases. However, examination of the S2 subsite residues that are responsible for determining substrate specificity clearly demonstrated positive selection at the three most influential positions, i.e. at residues 67, 157, and 205. Our biochemical data using recombinant versions of Clade 1 (FhCL1A) and Clade 2 (FhCL2) proteases presented in Table III show how significant changes in these positions can be; for example, FhCL1A (Leu-67, Val-157, and Leu-205) cleaves substrates with the hydrophobic residues (Phe and Leu) in the P2 position with catalytic rates (kcat/Km) that are 25- and 8-fold greater, respectively, than FhCL2 (Tyr-67, Leu-157, and Leu-205). In contrast, FhCL1A exhibits a low preference for substrates with Pro in the P2 position. This suggests that the active site of the Clade 1 enzymes has opened up to accommodate larger residues and can cleave these with greater efficiency. FhCL2, on the other hand, readily accommodates a P2 Pro residue suggesting that the evolutionary trajectory followed by members of Clade 2 favored active site changes that allowed a better accommodation of the short and bulky hydrophobic residue Pro. Recently Stack et al. (17) correlated the ability to accommodate Pro in the S2 subsite in FhCL2 with the capacity to cleave native collagen, which contains a repeat motif of Gly-Pro-X. Because collagen is a predominant component of interstitial matrices, including the bile duct wall, this property would have provided the parasite with the tools for degrading and penetrating host tissues, thus enabling it to feed on blood from underlying vessels. Interestingly FhCL5, which diverged from FhCL2 prior to its divergence from FhCL1 (see Fig. 1), exhibits an intermediate S2 subsite (Leu-67, Leu-157, and Leu-205) and does not readily accommodate substrates with a P2 proline residue (31). Thus, it can be envisaged that Fasciola evolved a series of enzymes with overlapping substrate specificities to create a more efficient tissue-degrading protease secretory system.
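The sense in which the FhCL5 S2 subsite is "intermediate" can be made explicit with a small comparison of the three residues quoted above. The Python sketch below is only an illustration; the names `S2_POSITIONS`, `s2_residues`, and `differing_positions` are hypothetical, and the residue assignments are taken directly from the text.

```python
# S2 subsite residues at positions 67, 157, and 205, as quoted in the text.
S2_POSITIONS = (67, 157, 205)
s2_residues = {
    "FhCL1A": ("Leu", "Val", "Leu"),
    "FhCL2": ("Tyr", "Leu", "Leu"),
    "FhCL5": ("Leu", "Leu", "Leu"),
}


def differing_positions(enzyme_a, enzyme_b):
    """List the S2 positions at which two enzymes carry different residues."""
    return [
        pos
        for pos, res_a, res_b in zip(
            S2_POSITIONS, s2_residues[enzyme_a], s2_residues[enzyme_b]
        )
        if res_a != res_b
    ]


for pair in (("FhCL1A", "FhCL2"), ("FhCL5", "FhCL1A"), ("FhCL5", "FhCL2")):
    print(pair, differing_positions(*pair))
```

FhCL1A and FhCL2 differ at two positions (67 and 157), whereas FhCL5 differs from each of them at a single position (157 and 67, respectively), so it sits one substitution away from both.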
The FhCL1A and the FhCL1B subclades of FhCL1 consist exclusively of F. hepatica sequences. In contrast, the FhCL1C subclade is represented by four cDNA sequences from F. gigantica and only one from an F. hepatica/F. gigantica hybrid identified from a Japanese isolate (12,23). This clear segregation suggests that the expansion of subclades FhCL1A/FhCL1B and FhCL1C occurred after the divergence of F. hepatica and F. gigantica species. Irving et al. (12) suggested that the separate expansion of these subclades in the two Fasciola species reflects the adaptation of these parasites to different host breeds and species: F. hepatica typically infects cattle and sheep in temperate areas, whereas F. gigantica has penetrated tropical regions in Asia and the Far East where it is the predominant parasitic disease of cattle and water buffalo (4).
The significance of the group of cathepsin L proteases in fasciolid biology is further emphasized by the complete absence of other endoproteases in the secretory proteome of adult flukes. Previous studies have detected significant levels of transcript for cathepsin B cysteine proteases in F. hepatica and F. gigantica infective juveniles and immature liver stages, but the expression of these in adult parasites was low (32-34). Additionally immunoblotting experiments using antisera raised against a recombinant F. hepatica cathepsin B detected this protease in proteins secreted by immature parasites but not by adult parasites (35). These data imply that the endoproteolytic needs of the mature parasites residing in the bile duct, which primarily involve feeding on host blood because the parasite exists in an immunologically safe location, are met solely by the cathepsin Ls (6). By contrast, the infective parasite larvae must not only penetrate the intestinal wall of the host but also make their way across the large tissue mass of the liver and at the same time defend against immune attack. Thus, a greater range of enzyme types may be required to complete these multiple tasks. A recent report using RNA interference methodology demonstrated that knockdown of either cathepsin B or cathepsin L transcripts in infective juveniles reduced their ability to penetrate the intestinal wall of the host and implied a role for both enzyme types in host invasion (36). Interference of cathepsin L expression, however, had the greater impact on invasion possibly because the infective juvenile-specific cathepsin Ls are specifically adapted to this function. Cysteine proteases have also been implicated in the invasion process by the skin-penetrating parasites Schistosoma mansoni (37) and Trichobilharzia regenti (38) pointing to a common mechanism of host invasion among these trematodes.
Gene duplication is considered a prime means by which all organisms generate molecules with new functions; however, the mechanism by which this occurs has become the topic of much debate largely due to the increased availability of genome sequence information (39-42). In the much cited early theory of gene duplication put forward by Ohno (43) the new function appears subsequent to the duplication event, and the most common fate of one of the pair of the duplicated genes is loss of function. However, more recent theories suggest that ancestral genes possess multiple functions prior to duplication that are then preserved and refined in the duplicated genes by subsequent positive selection to evolve paralogs with distinct functions (44, 45). It is likely that the expansion of the Fasciola cathepsin L gene family arose by a mechanism of duplication and refinement given the overlap in substrate specificity of the family members. In this scenario, the gene expressed by the infective juveniles, FhCL3, is the ancestral gene that possessed the combined properties of the FhCL4, FhCL2, FhCL5, and FhCL1 lineages it gave rise to. During the refinement process, however, and in particular in the adult stage, it is possible that the FhCL5 clade, which is represented by just a few genes and is poorly expressed, became reduced in importance and that its role was gradually overtaken by the expanding and highly expressed FhCL1 and FhCL2 enzymes. The partnership between the existing FhCL1 and FhCL2 enzymes would be expected to be maintained because each possesses quite distinct specificities that provide the adult parasite with the advantage of being able to cleave a wider variety of target protein/tissue substrates and with greater efficiency.
F. hepatica cathepsins are synthesized within gastrodermal epithelial cells and stored in secretory vesicles as inactive zymogens (46). The zymogens contain an N-terminal extension or prosegment that regulates cathepsin activity by binding to the substrate cleft (47) and acts as a molecular chaperone to ensure correct folding of the enzyme (48). Activation by removal of the prosegment must take place within the parasite gut before secretion into the medium as our studies show that all the cathepsin L proteases produced by adult F. hepatica lack the prosegment. We proposed that the cleavage events that lead to the removal of the prosegment take place in two steps: (a) a bimolecular process whereby a small number of cathepsin L molecules are trans-activated by another enzyme, asparaginyl endopeptidase, which is also localized within the intestinal epithelial cells of Fasciola (49), through cleavages at the C-terminal side of Asn residues lying close to the junction of the prosegment and mature domain followed by (b) the rapid cleavage of prosegments from other cathepsin Ls by the activated cathepsin L molecules through cleavage at a Leu-Ser↓His motif (20). Despite the high variability in the C-terminal region of the prosegment that forms a boundary with the mature domain, Asn residues and the Leu-Ser↓His motif are preserved in this section in all clades, therefore supporting our idea that these cleavage sites are critical for zymogen processing and activation.
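The two kinds of cleavage site described above can be located in a sequence with a simple motif scan. The Python sketch below is a minimal illustration only: the function name `find_processing_sites` is hypothetical, and the 12-residue test string is an invented toy sequence, not a real Fasciola prosegment. It assumes one-letter amino acid codes, cleavage on the C-terminal side of every Asn (N), and cleavage between Ser and His of every Leu-Ser-His (LSH) motif.

```python
import re


def find_processing_sites(sequence):
    """Return 1-based positions of the residue preceding each candidate cleavage.

    after_asn:   position of each Asn (cleavage assumed on its C-terminal side)
    leu_ser_his: position of the Ser in each Leu-Ser-His motif
                 (cleavage assumed between Ser and His)
    """
    asn_sites = [m.end() for m in re.finditer("N", sequence)]
    lsh_sites = [m.start() + 2 for m in re.finditer("LSH", sequence)]
    return {"after_asn": asn_sites, "leu_ser_his": lsh_sites}


# Invented toy sequence for illustration; not taken from any database entry.
toy_junction = "AKDNGVLSHGIP"
print(find_processing_sites(toy_junction))
```

Applied to aligned prosegment/mature-domain junctions from each clade, a scan of this kind would be one way to confirm that both site types are retained despite the surrounding sequence variability.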
Our proteomics and phylogenetic analysis has shown that adult F. hepatica evolved a large repertoire of proteases of one mechanistic class, cysteine proteases, to perform various critical functions that allow the parasite to survive within the host. It is likely that the expansion of the cathepsin L family was central to the success of this parasite in terms of its ability to infect and adapt to new hosts. Because of their pivotal role in fasciolid biology cathepsin L proteases are, therefore, considered primary candidates for the development of first generation vaccines against fasciolosis (4, 6).