Molecular & Cellular Proteomics 2:541-549, 2003.
ABSTRACT
α-helices with an internal, hydrophobic, ligand-binding pocket, but the beetle OBP lacks one of the disulfide bonds immediately adjacent to this pocket. To explore this difference and to sample isoform diversity, T. molitor hemolymph OBPs were fractionated by size-exclusion chromatography and reversed-phase high performance liquid chromatography. Selected fractions were reduced and alkylated, and tryptic peptides were sequenced by tandem mass spectrometry. Partial sequences of 7 different isoforms were obtained and used to clone 9 new cDNAs encoding OBPs that share 32 to 99% amino acid identity. The more divergent isoforms have numerous substitutions of hydrophobic residues that presumably alter the shape and specificity of the ligand-binding pocket. These isoforms all lack the same third disulfide bridge and are more similar to one another than to any of the 38 OBPs in Drosophila melanogaster. They presumably arose by gene duplication after the major insect orders diverged.
Individual insect species produce numerous OBPs, as has become increasingly evident, in part from genomic and cDNA sequencing projects. For example, as many as 38 isoforms have been identified in the fly Drosophila melanogaster, with deduced molecular masses ranging from 11 to 24 kDa for the monomeric isoforms (6, 7, 8). The primary sequences of these and other insect OBPs are not usually well conserved, but most members share 6 Cys residues and a similar pattern of hydrophobic and hydrophilic residues that defines the helical regions. Different roles have been postulated for two D. melanogaster isoforms, LUSH and PBPRP-2. Flies lacking LUSH do not avoid high concentrations of ethanol, which suggests that LUSH has a direct role in sensory perception (9). In contrast, PBPRP-2 is found only in the antennal lymph outside the sensory dendrites, which led to the suggestion that it may not be directly involved in sensory perception but may instead have a role in odorant clearance (10). However, little is known about the function of most OBPs.
OBPs do not appear to be restricted to sensory roles, as some are present in non-sensory tissues. For example, OBPs have been found in the hemolymph of the medfly Ceratitis capitata (11) and the mealworm beetle Tenebrio molitor (12). Other examples include sericotropin from the brain of the moth Galleria mellonella (13) and the two B proteins, TmolB1 and TmolB2, from the accessory sex gland of T. molitor (14). Although clear functions have not been ascribed to these proteins, they are likely also to bind small hydrophobic molecules.
The structures of two OBPs have been solved. The 12-kDa T. molitor hemolymph protein (THP12, now renamed THP12a) (15) and the PBP from the moth Bombyx mori (16, 17) are both hexahelical, with two or three disulfide bonds linking adjacent helices. Although their primary sequences are only 11% identical, they are clearly homologous: the largely amphipathic helices of both proteins pack in a similar manner (1.6 Å root mean square deviation between backbone atoms) to form a cavity lined with hydrophobic residues (8). The crystal structure revealed that the B. mori PBP (BmorPBP) binds the sex pheromone bombykol in this cavity. There are also notable differences. NMR analysis showed that at low pH, the significantly longer C-terminal region of BmorPBP forms an additional helix that can enter this cavity (17), which could provide a mechanism for displacing the ligand.
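The 1.6 Å figure above is a root mean square deviation (RMSD) over paired backbone atoms. As a minimal Python sketch, assuming the two structures have already been optimally superposed (the superposition step itself, for example by the Kabsch algorithm, is not shown) and that atoms are paired one-to-one, the RMSD reduces to the calculation below. The function name and toy coordinates are illustrative and are not taken from either structure.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and that coords_a[i]
    and coords_b[i] are equivalent backbone atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: three paired atoms, one displaced by 2 Å along z.
a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.5, 0.0, 2.0), (3.0, 0.0, 0.0)]
print(round(rmsd(a, b), 3))  # prints 1.155
```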
Structures are also available for other proteins thought to have functions similar to those of insect OBPs. For example, the chemosensory protein of the moth Mamestra brassicae is also hexahelical with two disulfide bridges (18). However, chemosensory proteins share no sequence similarity with OBPs, their helices are arranged differently, and their disulfide bonds appear to stabilize loops rather than link helices. The OBPs of vertebrates are drastically different, consisting largely of an eight-stranded β-barrel (19).
The non-sensory OBPs mentioned above, four D. melanogaster isoforms (8), and a number of as yet unpublished insect ESTs deposited in GenBank™ lack one of the three disulfide bonds found in most OBPs: the bond that links helices 3 and 6 and lies adjacent to the cavity. We refer to this group of proteins here as 4-Cys isoforms because they contain only 4 of the 6 conserved Cys residues. Previous work (12) suggested that additional OBPs were present in T. molitor hemolymph. We have investigated these isoforms to assess their diversity, to determine their relationship to other insect 4- and 6-Cys OBPs, and to obtain additional OBPs for functional and structural studies.
EXPERIMENTAL PROCEDURES
Intact Protein Molecular Mass Determination
Electrospray ionization mass spectrometry was performed on a Micromass Q-TOF2 mass spectrometer (Micromass, Manchester, United Kingdom) in positive ion mode. Average molecular masses of intact proteins were determined by flow injection analysis using a Waters CapLC system with a carrier solvent of 50:50 acetonitrile:water containing 0.1% formic acid at a flow rate of 30 µl/min. Spectra were acquired in an m/z range of 600–2000 using a capillary voltage of 3.5 kV, a cone voltage of 50 V, and a desolvation temperature of 250 °C. The instrument was initially calibrated using a standard solution of horse heart myoglobin (5 pmol/µl, 16,951.49 Da, Sigma). The multiply charged raw data were baseline-subtracted and deconvoluted using MaxEnt1 (22). Acquisition and data analysis were all performed using the MassLynx 3.5 software package supplied by Micromass.
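The relationship between a multiply charged ion series and the neutral mass can be illustrated with a short Python sketch. MaxEnt1 uses maximum-entropy deconvolution; the sketch below instead uses the simpler textbook charge-state calculation, assuming the peaks are consecutive, fully protonated [M + zH]z+ charge states of a single species. The helper names `mz_series` and `deconvolute` are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_series(mass, charges):
    """Expected m/z of [M + zH]z+ ions for a neutral mass (Da)."""
    return {z: (mass + z * PROTON) / z for z in charges}


def deconvolute(peaks):
    """Estimate a neutral mass from m/z peaks of consecutive charge states.

    Neighbouring peaks differ by one charge: the lower-m/z peak carries
    z + 1 protons and the higher-m/z peak carries z, so
    z = (mz_low - PROTON) / (mz_high - mz_low).
    """
    peaks = sorted(peaks)
    masses = []
    for mz_low, mz_high in zip(peaks, peaks[1:]):
        z = round((mz_low - PROTON) / (mz_high - mz_low))
        masses.append(z * (mz_high - PROTON))
    return sum(masses) / len(masses)


# Round trip using the myoglobin calibrant mass quoted above.
peaks = list(mz_series(16951.49, range(15, 26)).values())
print(round(deconvolute(peaks), 2))  # 16951.49
```

With this mass, the 15+ to 25+ charge states fall at roughly m/z 679 to 1131, inside the 600–2000 acquisition window.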
Peptide Sequence Determination
Peptide sequence information was obtained on tryptic digests of HPLC-purified protein fractions using a nanospray ionization source on the Q-TOF2 instrument. Concentrated digest samples (3–5 µl) were sprayed from borosilicate capillaries (Type F, Micromass). The time-of-flight analyzer was calibrated using an MS/MS spectrum of [Glu1]fibrinopeptide B (Sigma). Survey and MS/MS spectra were acquired in an m/z range of 50–2000 using a cone voltage of 35 V and capillary voltages ranging from 700 to 900 V to optimize spray. Data-dependent acquisition parameters were set to select doubly and triply charged precursor ions for MS/MS analysis. Fragmentation was achieved using argon as the collision gas and varying the collision energy depending on the charge state and the m/z value of the precursor ion. MS/MS spectra were processed by background subtraction and deconvoluted using the MaxEnt3 module of MassLynx 3.5.
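Relating doubly and triply charged precursors to a candidate sequence amounts to an in silico tryptic digest followed by a mass calculation. The Python sketch below is a simplification, not the procedure used here: it applies the common cleave-after-K/R-unless-followed-by-P rule, ignores missed cleavages, and ignores the Cys alkylation (the reagent is not stated). The function names and the toy sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(seq):
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass; Cys alkylation and other modifications are ignored."""
    return sum(MONO[aa] for aa in peptide) + WATER


def precursor_mz(peptide, z):
    """m/z of the [M + zH]z+ precursor ion."""
    return (peptide_mass(peptide) + z * PROTON) / z


# Hypothetical toy sequence, not a THP sequence.
for pep in tryptic_peptides("AKPLRGEK"):
    print(pep, round(precursor_mz(pep, 2), 4), round(precursor_mz(pep, 3), 4))
```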
Cloning of cDNAs
The following degenerate oligonucleotides were synthesized based on high confidence peptide sequences of two isoforms: sense primer encoding ATEAGDT, 5'-GCN ACN GAR GCN GGN GAY AC-3', and antisense primer encoding PEETAFQT, 5'-GT YTG RAA NGC NGT YTC YTC NGG-3'. Approximately 2 × 10⁴ plaque-forming units of a fat body cDNA library (23) were used as the templates in anchor PCR reactions using each of the above primers in combination with the appropriate vector primer (T3 or T7). The first round of amplification was carried out using 1 µM degenerate primer, 100 nM anchor primer, 2.5 units of Taq DNA polymerase (MBI Fermentas), 200 µM each deoxyribonucleotide triphosphate, 1.5 mM MgCl2, and the supplied buffer. Cycle conditions were as follows: initial denaturation of phage/primer mixture at 99 °C (5 min); hold at 80 °C while enzyme, buffer, and deoxyribonucleotide triphosphates were added; and 30 cycles of 95 °C for 1 min, 52 °C for 1 min, and 72 °C for 4 min. Following reamplification of 1 µl of each reaction for 25 cycles as above, Taq was removed by proteinase K digestion (20 µg/150 µl for 30 min at 37 °C) followed by phenol/chloroform extraction. After precipitation, DNA was blunt-ended with T4 DNA polymerase (New England Biolabs) as described by Sambrook et al. (24) and gel-purified (Qiagen). Fragments were subcloned into pCR®-Blunt II-TOPO (Invitrogen) and sequenced (Cortec, Kingston, Ontario, Canada). Non-degenerate primers were used to amplify only the coding portions of the two subcloned fragments and the previously isolated THP12a cDNA (12). These fragments were ³²P-labeled and used to screen the cDNA library at low stringency as described previously (21) except that all washes were done at 50 °C. Plaque-purified phage were in vivo excised using R408 helper phage (Stratagene), and sequencing was performed on purified plasmid (Qiaprep miniprep kit, Qiagen).
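The two degenerate primers can be checked in silico: expanding the IUPAC ambiguity codes gives the number of distinct sequences in each primer mixture, and translating the sense primer (and the reverse complement of the antisense primer) with the standard genetic code should recover the peptide each was designed from. A minimal Python sketch follows; the helper names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C", "R": "Y", "Y": "R", "S": "S",
    "W": "W", "K": "M", "M": "K", "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degeneracy(primer):
    """Number of distinct sequences in a degenerate primer mixture."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def reverse_complement(primer):
    return "".join(COMPLEMENT[b] for b in reversed(primer))


def translate_degenerate(seq):
    """Translate complete codons; 'X' marks a codon that does not encode one residue."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        residues = {CODON_TABLE["".join(c)] for c in product(*(IUPAC[b] for b in codon))}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


sense = "GCNACNGARGCNGGNGAYAC"         # primer encoding ATEAGDT
antisense = "GTYTGRAANGCNGTYTCYTCNGG"  # primer encoding PEETAFQT
print(degeneracy(sense), translate_degenerate(sense))
print(degeneracy(antisense), translate_degenerate(reverse_complement(antisense)))
```

Both mixtures come out 1,024-fold degenerate. Each primer ends with only the first two bases (AC) of the final Thr codon, so the translation stops one residue short of the full peptide (ATEAGD and PEETAFQ).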
Phylogenetic Analysis
New protein sequences were manually aligned with selected sequences from a previous alignment (12) that had been derived using a secondary structure mask. A distance matrix computed from this alignment was used to generate a neighbor-joining tree (25) in ClustalX (version 8.1) (26), with 1000 bootstrap trials.
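The tree-building step can be illustrated with a textbook neighbor-joining implementation, which repeatedly joins the pair of nodes that minimizes the Q criterion and then computes distances from the new node to the rest. This is a sketch, not ClustalX's code: it takes a precomputed distance matrix, omits bootstrap resampling of alignment columns, and uses a standard five-taxon toy matrix rather than data from this study.

```python
def neighbor_joining(names, matrix):
    """Build an unrooted neighbor-joining tree from a symmetric distance matrix.

    names: list of taxon labels; matrix: list of lists of pairwise distances.
    Returns (newick, joins), where joins records each merged pair and the
    branch lengths assigned to its two members.
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {i: sum(d[i][k] for k in nodes) for i in nodes}
        # Q criterion: join the pair with the smallest (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                i, j = nodes[x], nodes[y]
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"  # the new node is labelled by its subtree
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, li, lj))
    a, b = nodes
    return f"({a}:{d[a][b]:g},{b});", joins


names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
newick, joins = neighbor_joining(names, matrix)
print(newick)
```

Bootstrap support, as used in the study, would come from rebuilding the tree on many column-resampled alignments and counting how often each grouping recurs.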
RESULTS
12-kDa isoforms eluting at 33.4, 33.9, and 37.2% B (Table I) were quite similar to the previously characterized THP12a (Fig. 4), so we reasoned that the corresponding cDNAs could be obtained using THP12a as a probe. The isoforms eluting at 38.7 and 39.6% B appeared similar to each other but quite different from THP12a, and the isoform at 42.1% B was unique. Therefore, degenerate oligonucleotides were designed from peptide sequences of the isoforms at 39.6 and 42.1% B, choosing the highest-confidence sequences with the lowest codon degeneracy that contained no Leu/Ile or Lys/Gln (Fig. 4). These oligonucleotides were used to amplify cDNA fragments from the library by anchor PCR.
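The reason for avoiding Leu/Ile and Lys/Gln is mass-spectrometric: Leu and Ile have identical residue masses, and Lys and Gln differ by only about 0.036 Da, so MS/MS sequencing cannot reliably distinguish them, and a wrong assignment would place the wrong codons in a degenerate primer. The short Python sketch below lists residue pairs whose standard monoisotopic masses fall within a chosen tolerance; the tolerance values and the function name are illustrative.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def near_isobaric_pairs(tolerance_da):
    """Residue pairs whose monoisotopic masses differ by <= tolerance_da."""
    return [
        (a, b)
        for a, b in combinations(sorted(MONO), 2)
        if abs(MONO[a] - MONO[b]) <= tolerance_da
    ]


for a, b in near_isobaric_pairs(0.05):
    print(a, b, round(abs(MONO[a] - MONO[b]), 5))
```

At a 0.05-Da tolerance only Ile/Leu and Lys/Gln are flagged; at 0.01 Da only Ile/Leu remains, since no mass measurement can separate that pair.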
Characteristics of cDNA Sequences
The larval fat body cDNA library was screened at low stringency using the cDNA encoding THP12a. A total of eight unique cDNAs (data not shown) encoding 5 additional isoforms (THP12b–f, Fig. 4) were obtained. Two of these encoded the previously cloned THP12a: one had 3 silent changes in the coding region and a single change within the 3'-untranslated region (3'-UTR), whereas the other differed only in being polyadenylated 10 bases further downstream. The two cDNAs that encode THP12b differ at a single silent position. Although THP12b differs from THP12a at only two amino acid positions, the four cDNAs encoding these two isoforms differ at up to 8 silent positions and 5 positions within the 3'-UTR, including two insertions/deletions of one and four bases. The 11–14 differences (2.9–3.6% divergence) between the cDNAs encoding THP12a and THP12b suggest either that they were derived from a recently duplicated gene or that considerable genetic diversity exists within the insect colony. The remaining four cDNAs encode THP12c–f, which are more distinct from one another, sharing only 60–85% amino acid sequence identity, and are presumably derived from four separate loci (Figs. 4 and 5).
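Percent divergence figures like these depend on how differences are counted. The Python sketch below assumes one convention for illustration: each mismatched column counts once, each run of consecutive gap columns counts as a single insertion/deletion event, and the total is divided by the alignment length. The function names and toy sequences are hypothetical.

```python
def count_differences(seq_a, seq_b, gap="-"):
    """Count differences between two pre-aligned sequences.

    Each mismatched column counts once; a run of consecutive gap columns in
    the same sequence counts as one insertion/deletion event (an assumed
    convention). Columns gapped in both sequences are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    gap_side = None  # which sequence holds the current gap run, if any
    for x, y in zip(seq_a, seq_b):
        if x == gap and y == gap:
            continue
        if x == gap or y == gap:
            side = "a" if x == gap else "b"
            if side != gap_side:  # start of a new indel event
                diffs += 1
                gap_side = side
            continue
        gap_side = None
        if x != y:
            diffs += 1
    return diffs


def percent_divergence(seq_a, seq_b, gap="-"):
    """Differences as a percentage of the alignment length."""
    return 100.0 * count_differences(seq_a, seq_b, gap) / len(seq_a)


# Hypothetical toy alignment: one substitution plus one 4-base indel.
a = "ATGGCTAAACTG----GATTCA"
b = "ATGGCAAAACTGTTAGGATTCA"
print(count_differences(a, b), round(percent_divergence(a, b), 1))
```

Read in reverse, and assuming the percentages refer to the full compared length, 11 differences at 2.9% and 14 at 3.6% each correspond to roughly 380 to 390 compared nucleotides.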
Characteristics of the New Isoforms
The cDNAs encode 10 different THP isoforms sharing 32–99% sequence identity (Figs. 4 and 5). A representative sequence encoding each new isoform has been deposited in GenBank™ under accession numbers AY153772–AY153780. The sequence for THP12a was deposited previously under GenBank™/EBI accession number U24237. For nine of these 10 deduced proteins, the predicted mass of the mature protein matched the experimentally determined mass of a native protein (Fig. 1).
Although we obtained intact protein and tryptic fragment masses as well as sequence data for seven proteins, only three deduced proteins (THP12b, THP13a, and THP13d) exactly matched observed proteins (Table I and Fig. 4). The other four proteins appear quite similar to those deduced from three cDNAs (THP12d, THP12e, and THP13a) as indicated by sequence and tryptic fragment mass matches. However, the intact protein masses differed, and in some cases, the mass of a single tryptic fragment also differed. These mass differences cannot easily be explained by post-translational modifications or by proteolysis of the N or C termini of the cDNA-encoded species, but they could arise from one or a few polymorphisms. The presence of numerous similar but largely uncharacterized species (Fig. 1) and the fact that only three of seven proteins appear to exactly correspond to a cDNA suggest that we have probably isolated fewer than half of all the possible cDNA sequences for this protein family.
All of the isoforms obtained contain the 4 Cys residues that link helices 1 and 3 and helices 5 and 6, but none contains the additional pair found in 6-Cys OBPs. THP12f alone contains an additional Cys residue, at an ectopic position with no apparent partner. The similarities between the isoforms suggest that they adopt similar protein folds. For example, the isoforms share a similar pattern of hydrophobic and hydrophilic residues, consistent with the amphipathic helical regions of the known structure. Also, both Gly and Pro are well conserved at several positions and are found primarily in the presumed loop regions.
The only differences between the predicted and observed masses were due to disulfide bond formation or to the conversion of the N-terminal Gln residue to pyroglutamate (THP13b–d), a modification that was also seen with the antifreeze proteins present in T. molitor hemolymph (21). The signal-peptide cleavage sites were accurately predicted using the program SignalP (28). The exception was THP13a, for which most of the signal-peptide sequence is unknown. However, enough sequence and mass information was available to determine the actual N-terminal residue, and the three residues that precede it (AQA) agree well with the (−3, −1) rule, which calls for small residues at positions −3 and −1. Four isoforms (THP12a–d) contain the consensus N-glycosylation signal NXS at the C terminus but would not be glycosylated, because additional C-terminal residues are required for glycosylation to occur (29). THP12f, however, does contain additional C-terminal residues, which may explain why no matching mass was observed for this sequence. THP12f might also form an intermolecular disulfide bond through its additional, unpaired Cys, but this is unlikely because that residue is expected to lie in the interior of the protein (Fig. 4).
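The two mass corrections described above can be folded into a predicted-mass calculation. The Python sketch below uses standard average residue masses and assumes that each disulfide bond removes two hydrogen atoms and that cyclization of an N-terminal Gln to pyroglutamate releases one ammonia. The function name and toy peptide are hypothetical, and this is not presented as the authors' calculation.

```python
# Standard average residue masses (Da).
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528
H2_AVG = 2.01588    # two hydrogens lost per disulfide bond
NH3_AVG = 17.03052  # ammonia lost when N-terminal Gln cyclizes to pyroglutamate


def average_mass(seq, disulfides=0, n_terminal_pyroglu=False):
    """Average mass of a mature protein with optional disulfide and pyroGlu corrections."""
    mass = sum(AVG[aa] for aa in seq) + WATER_AVG
    mass -= disulfides * H2_AVG
    if n_terminal_pyroglu:
        if not seq.startswith("Q"):
            raise ValueError("pyroglutamate correction assumes an N-terminal Gln")
        mass -= NH3_AVG
    return mass


# Hypothetical toy peptide, not a THP sequence.
toy = "QACKCAGW"
print(round(average_mass(toy), 2))
print(round(average_mass(toy, disulfides=1, n_terminal_pyroglu=True), 2))
```

For a 4-Cys isoform with both disulfides formed, the correction is about −4.03 Da; an N-terminal pyroglutamate lowers the mass by a further 17.03 Da.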
The pattern of amino acid substitution, particularly between the 12- and 13-kDa proteins, appears to be far from random (Fig. 4). Amino acids with important structural roles, such as the 4 Cys residues that form the two disulfide bridges and two Gly residues within turns, are absolutely conserved. Numerous surface residues, including 4 acidic and 7 basic residues (Fig. 4, asterisks), are also conserved and appear to form salt bridges. However, residues in the interior of the protein are less conserved, particularly those lining the binding pocket (Fig. 4, denoted by P). This suggests that the binding pocket may have been subject to divergent evolution and that the 12- and 13-kDa groups, or subsets of them, may bind different classes of compounds.
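Distinguishing absolutely conserved positions from variable ones can be made explicit with a per-column score over the alignment. The Python sketch below uses the simplest possible measure, the fraction of sequences sharing the most common residue at each column; it is not substitution-matrix aware, and the function names and toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common symbol at each column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        scores.append(counts.most_common(1)[0][1] / len(alignment))
    return scores


def invariant_positions(alignment):
    """1-based positions where every sequence carries the same non-gap residue."""
    return [
        i + 1
        for i, score in enumerate(column_conservation(alignment))
        if score == 1.0 and alignment[0][i] != "-"
    ]


# Hypothetical toy alignment of three sequences.
toy_alignment = ["ACDC", "ACEC", "GCDC"]
print(column_conservation(toy_alignment))
print(invariant_positions(toy_alignment))
```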
Phylogenetic Comparisons
The cDNA sequences of two pairs of isoforms (THP12a versus THP12b and THP13b versus THP13c) differ by 3.6% or less, suggesting that each pair may be encoded at a single locus. The same cannot be said of the other six isoforms, which differ by over 15% at the amino acid level. Therefore, at least eight different genes encode this protein family in the beetle, and the true number is likely 2–3 times higher. The lower molecular mass isoforms (THP12a–f) are more similar to each other than to the higher molecular mass isoforms (THP13a–d), which form a second grouping (Fig. 5). However, these hemolymph isoforms do not appear to be monophyletic, as the THP13 group is more similar to the T. molitor accessory gland proteins, TmolB1 and -B2, than to the THP12 group. Nevertheless, these T. molitor isoforms are more similar to each other than to OBPs of other insects, including any of the 38 isoforms (partial data shown) found in D. melanogaster (8). Therefore, OBPs appear to be evolving rapidly, and the T. molitor isoforms appear to have been duplicated after the divergence of the major insect orders.
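The minimum gene count follows from grouping isoforms whose cDNAs are close enough to be treated as alleles of one locus. The Python sketch below does this with a union-find structure, assuming (as in the text) that only the two named pairs fall at or below the 3.6% threshold and that every unlisted pair exceeds it. The function name is hypothetical, and the 3.6 values are the upper bounds stated above, not measured divergences.

```python
def group_into_loci(isoforms, divergence, threshold):
    """Group isoforms whose cDNAs differ by <= threshold percent.

    divergence maps (isoform, isoform) pairs to percent divergence; pairs
    that are not listed are assumed to exceed the threshold.
    """
    parent = {x: x for x in isoforms}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), pct in divergence.items():
        if pct <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for x in isoforms:
        groups.setdefault(find(x), []).append(x)
    return list(groups.values())


isoforms = [f"THP12{c}" for c in "abcdef"] + [f"THP13{c}" for c in "abcd"]
# Upper bounds from the text: these two pairs differ by 3.6% or less.
close_pairs = {("THP12a", "THP12b"): 3.6, ("THP13b", "THP13c"): 3.6}
loci = group_into_loci(isoforms, close_pairs, threshold=3.6)
print(len(loci))  # minimum number of genes under these assumptions
```

Ten isoforms collapse to eight groups, matching the minimum of eight genes. Any threshold between 3.6% and the much larger divergence of the remaining isoforms gives the same count.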
The data neither support nor contradict a common origin for all 4-Cys isoforms. However, the 4-Cys isoforms recovered from Diptera, Coleoptera, and Lepidoptera cluster within each order (Fig. 5, dotted boxes), suggesting that the third disulfide bond is neither easily lost nor easily gained during evolution. The only apparent exception is one D. melanogaster OBP (Dmel99A), a 6-Cys isoform that clusters with 4-Cys isoforms. A noticeable trend is that the 6-Cys isoforms appear to be expressed in antennae, whereas 4-Cys isoforms are frequently expressed elsewhere. More expression data will be needed to determine whether this relationship holds.
DISCUSSION
20 peaks within the HPLC profile, and the evidence suggests that the observed mass differences result from differences in amino acid sequence. We have obtained cDNAs encoding 10 unique proteins, ranging from 12,052 to 13,492 Da, which are 4-Cys OBPs present in the hemolymph of the beetle T. molitor. These cDNAs derive from at least 8 different genes, and there are likely 2–3-fold more. Some of these additional genes may belong to the THP12 family, because cDNAs encoding isoforms with similar fragment sequences (the 12,024- and 12,032-Da isoforms) were not obtained, and many other masses were observed within the THP12 isoform-containing region of the HPLC profile (for example, 12,334 and 12,206 Da) (Fig. 1). In addition, 8–10 bands were observed in a low stringency Southern blot of genomic DNA from individual insects probed with THP12a cDNA (12). Others may be more distinct, such as the 13,096- and 12,381-Da isoforms, for which no additional information was obtained because they lie outside the region of the HPLC profile showing high cross-reactivity to anti-THP12a antibodies (12). Overall, this analysis has provided a good estimate of the isoform diversity within this protein family and has revealed that OBPs are the most abundant smaller proteins (within the 6–20-kDa range) in hemolymph. Currently, most known insect OBPs contain 6 Cys, and in D. melanogaster, whose whole genome is known, only four of the 38 OBPs belong to the 4-Cys group (8). Three 4-Cys isoforms are known from the hemolymph of the medfly C. capitata (11). The number of 4-Cys isoforms, and indeed of OBPs in general, is increasing dramatically as a result of the various genome and EST sequencing projects. However, T. molitor is the only beetle from which 4-Cys isoforms have been recovered. The 4-Cys isoforms from T. molitor, including those found in the accessory sex gland, evidently arose by gene duplication after the major insect orders diverged over 300 million years ago (30), as they are more similar to each other than to any of the 38 OBPs found in D. melanogaster. This indicates that this gene family is evolving rapidly and that its genes are being duplicated frequently.
The 4-Cys OBPs found in T. molitor hemolymph were more variable than we initially expected, showing as little as 32% amino acid identity. Despite this, the sequences aligned very well as there were only a few single amino acid deletions or insertions plus some variability in the length of the N and C termini. The differences, especially between more divergent isoforms, are not consistent with major changes to the hexahelical structure of the protein but are consistent with changes to the binding pocket. Therefore, it is possible that these OBPs have arisen to carry a number of different compounds specific to T. molitor or to beetles and that functionally important residues in the vicinity of the binding pocket are subject to divergent evolution. Alternatively, natural selection may have produced binding pockets of similar shape in different isoforms from different insects for the purpose of carrying the same compound. Unfortunately, the rapid evolution of insect OBPs, whether they contain 4 or 6 Cys, means that the complete evolutionary history of the 4-Cys isoforms may be very difficult to resolve because they are frequently too dissimilar to enable an accurate assessment of their relatedness.
Insects possess divergent OBPs in a wide variety of tissues. These insect OBPs appear to be functionally analogous to the structurally unrelated lipocalins of vertebrates, which are also highly divergent (31). Lipocalins have adopted a wide range of roles in many tissues, from binding compounds such as odorants, pheromones, and fatty acids to enzymatic functions. Lipocalins are also found in insects, but they appear to be far less abundant, as only two lipocalins have been reported in D. melanogaster (32). Therefore, OBPs rather than lipocalins may be the major transporters of small hydrophobic compounds in insects, although some OBPs may well have adopted unexpected roles. The functions of most insect OBPs have not yet been determined. We hypothesize that the beetle THPs carry a number of small hydrophobic compounds that would normally be transported through the hemolymph, and we are currently working toward testing this hypothesis.
ACKNOWLEDGMENTS
Note Added in Proof: A third OBP structure, that of the alcohol-binding protein LUSH from Drosophila, has recently been solved (see Kruse, S. W., Zhao, R., Smith, D. P., and Jones, D. N. M. (2003) Structure of a specific alcohol-binding site defined by the odorant binding protein LUSH from Drosophila melanogaster. Nat. Struct. Biol. 10, 694–700).
FOOTNOTES
Published, MCP Papers in Press, July 25, 2003, DOI 10.1074/mcp.M300018-MCP200
1 The abbreviations used are: OBP, odorant-binding protein; EST, expressed sequence tag; HPLC, high performance liquid chromatography; MS, mass spectrometry; MS/MS, tandem mass spectrometry; PBP, pheromone-binding protein; PBPRP, PBP-related protein; THP, Tenebrio hemolymph protein; UTR, untranslated region.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplementary material.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AY153772–AY153780.
‖ Supported in part by a Killam Research Fellowship.
To whom correspondence should be addressed. Tel.: 1-613-533-2984; Fax: 1-613-533-2497; E-mail: grahamla{at}post.queensu.ca