Molecular & Cellular Proteomics 5:2336-2349, 2006.
© 2006 by The American Society for Biochemistry and Molecular Biology, Inc.
From the Protein Maturation, Cell Fate, and Therapeutics, Institut des Sciences du Végétal, UPR2355, CNRS, Bâtiment 23, 1 avenue de la Terrasse, F-91198 Gif-sur-Yvette cedex, France, ¶ Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322-0300, and ‖ Department of Chemistry, Loyola University, Chicago, Illinois 60626
| ABSTRACT |
|---|
The role of NME remains poorly understood (2), but the process is recognized as the major source of N-terminal amino acid diversity. Up to 80% of the proteins of any given proteome are thought to undergo this modification (8–10). Early biochemical and genetic studies indicated that this activity was highly specific for peptides with an N-terminal Met residue (the P1 position in Schechter's nomenclature, Ref. 11) and identified the penultimate position (P1') as the major determinant of cleavage (for reviews, see Refs. 12 and 13). The rule of thumb is that cleavage occurs if the P1' side chain is small enough, as is the case for Ala, Cys, Pro, Ser, Thr, and Val; according to this rule, cleavage is not possible for larger side chains. This stochastic rule emerged from early bioinformatics analysis based on the few protein sequences available at the time (14). It was confirmed by biochemical analysis of MAP activity in vitro with about a dozen model tri- and tetrapeptides (6, 15, 16). Edman degradation sequencing of two reporter proteins in E. coli was used to extend the analysis (17, 18). The authors of these studies suggested that the process was statistical rather than stochastic. They also proposed that cleavage efficiency correlated with the side-chain length, or gyration radius as defined by Levitt (19), of the P1' residue (20). This has been fully confirmed by the structural analysis of many MAPs (3, 4). Cleavage probability was found to be highest for Gly (97%), followed by Ala, Thr (90%), Pro, Ser, and Val (84%). Cleavage was less probable for Cys (71%), Ile (18%), Asp, Leu, and Asn (16%). The underlying idea is that the S1' binding pocket, into which the P1' side chain must fit, is small and accommodates smaller side chains, with Gly being the optimal residue. However, these two analyses were based on only two reporter proteins, although the comparison was straightforward because the protein sequences differed only at position P1'.
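As a point of reference for the refined rules discussed later, the classical rule of thumb can be written as a one-line predicate. The Python below is a minimal sketch, assuming a one-letter sequence that starts at the initiator Met. The function name `classic_nme_prediction` is hypothetical. Adding Gly to the residue set is our assumption: Gly has the smallest side chain and ranks highest in the reporter-protein data above.

```python
# Classical rule of thumb for N-terminal Met excision (NME): the initiator Met
# is removed when the penultimate (P1') residue has a small side chain.
# Residues listed in the text (A, C, P, S, T, V) plus Gly (our assumption).
SMALL_P1_PRIME = frozenset("ACGPSTV")


def classic_nme_prediction(sequence: str) -> bool:
    """Return True if the classical rule predicts removal of the initiator Met."""
    seq = sequence.upper()
    if len(seq) < 2 or seq[0] != "M":
        return False  # the rule only applies to chains that start with Met
    return seq[1] in SMALL_P1_PRIME


if __name__ == "__main__":
    for s in ("MAKS", "MGSS", "MKEL", "MVHL"):
        print(s, classic_nme_prediction(s))
```

A binary predicate of this kind cannot express the partial cleavage that the in vivo data show for residues such as Thr and Val.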
These findings form the basis of our current understanding of the process (for a review, see Ref. 1; see also Fig. 1 in Ref. 21) and are used in bioinformatics analyses of NME in genomes (see Scheme 1 and Fig. 4 in Ref. 22). More recent proteomics analyses in E. coli (23, 24) have generated a comprehensive overview: the N terminus has now been clearly determined for the products of 862 open reading frames (out of the 4071 proteins in the E. coli proteome). However, the conclusions of this analysis (23) conflicted with the deduced rules of NME. The authors concluded that, unlike proteins with Ala or Ser at P1', those with Gly, Pro, or Thr at this position displayed fastidious, "variable cleavage." Finally, proteins with Val at P1' were found to resist NME.
The aim of this study was to reconcile in vivo and in vitro data, using combined biochemical and bioinformatics analyses, to draw a definitive and coherent picture of NME and its cleavage rules in vivo. We first used E. coli as a model system, making use of the large body of in vivo sequence determination data for direct comparison with in vitro data. We found that proteins with Thr or Val at P1' were poor substrates of MAP1 and MAP2, frequently resisting NME. We also found that positions P2' and P3' had a strong effect on specificity for both MAP types. These findings would be expected to affect, in different ways, the processing of intrinsic proteins in the natural proteome and of foreign recombinant proteins overproduced in bacteria.
| EXPERIMENTAL PROCEDURES |
|---|
Methionine Aminopeptidase Purification
Native E. coli methionine aminopeptidase (EcMAP1) was overproduced from plasmid pXL1071 (29). JM83 cells carrying the plasmid were grown at 37 °C for 8–12 h to an A600 of 0.9. The medium was 2x TY (16 g/liter Bacto-tryptone, 10 g/liter Bacto-yeast extract, 5 g/liter NaCl, adjusted to pH 7.0 with NaOH) supplemented with 50 µg/ml ampicillin. Cells were induced with 0.3 mM isopropyl 1-thio-β-D-galactopyranoside and incubated with shaking for a further 12 h. The cells were harvested by centrifugation and resuspended in 10–20 ml of buffer A (50 mM KHPO4, pH 7.5, 0.2 mM CoCl2). The samples were sonicated, and cell debris was removed by centrifugation. The supernatant was subjected to 0–80% ammonium sulfate precipitation and centrifuged for 30 min at 4 °C. The pellet was resuspended in 5 ml of buffer A, applied to a Superose-6 column (1.6 x 60 cm; GE Healthcare), and eluted at a flow rate of 0.5 ml/min in buffer A. The pool with MAP activity (30 ml) was loaded onto a Q-Sepharose anion-exchange column (1.6 x 10 cm; GE Healthcare) equilibrated in buffer A. The sample was eluted with a 0.2 M/h linear NaCl gradient (2.5 ml/min). The recovered proteins were homogeneous and were stored at −30 °C in buffer A plus 55% glycerol. P. furiosus MAP2 (PfMAP2) was purified as described elsewhere (30).
MAP Activity Measurements
MAP activity was assayed at 30 °C by continuously monitoring the absorbance of oxidized o-dianisidine at 440 nm. MAP activity was coupled to both L-amino-acid oxidase and peroxidase activities, according to Scheme 1 (where Aaa is any α-amino acid).
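The coupled reactions behind this assay can be written out as follows. This is our reconstruction from the enzymes named in the text, not a copy of the original Scheme 1. It assumes the usual stoichiometry of L-amino-acid oxidase, and of peroxidase acting on a two-electron donor, so that each Met released yields one molecule of oxidized o-dianisidine.

```latex
\begin{align*}
\text{Met-Aaa-}\cdots &\xrightarrow{\ \text{MAP}\ } \text{L-Met} + \text{Aaa-}\cdots \\
\text{L-Met} + \mathrm{O_2} + \mathrm{H_2O} &\xrightarrow{\ \text{L-amino-acid oxidase}\ } \text{2-oxo acid} + \mathrm{NH_3} + \mathrm{H_2O_2} \\
\mathrm{H_2O_2} + o\text{-dianisidine}_{\text{red}} &\xrightarrow{\ \text{peroxidase}\ } o\text{-dianisidine}_{\text{ox}} + 2\,\mathrm{H_2O}
\end{align*}
```

This 1:1:1 stoichiometry is what allows an extinction coefficient determined with free L-Met to convert absorbance changes directly into MAP turnover.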
The conditions of this assay were set according to published observations (16, 31–33). The standard assay was performed in a final volume of 100 µl in plastic cuvettes with a 1-cm optical path (UV-ettes; Eppendorf). Changes in absorbance over time were followed using an Ultrospec-4000 spectrophotometer (GE Healthcare) equipped with a thermostat and a six-position Peltier-heated cell changer. The reaction mixtures (95 µl) contained the following components (final concentrations in 100 µl):

- 45 mM Hepes, pH 7.4;
- 0.2 mM CoCl2 (Sigma; catalog number C8661), unless otherwise stated (Fig. 1A);
- 0.1 mg/ml o-dianisidine (Sigma; catalog number D9154), prepared from one tablet every other day and stored at 4 °C in the dark;
- 3 units of horseradish peroxidase (2000 units/ml; Sigma; catalog number P8415), stored for months at −20 °C;
- 0.5 units of L-amino-acid oxidase (63 units/ml; Sigma; catalog number A9378), stored for months at 4 °C in the dark;
- 0.01–20 mM peptide (see above).

This premixture was incubated for 4–15 min in the spectrophotometer at 30 °C until the baseline at 440 nm was stable, at which point the absorbance was set to zero. The reaction was started by adding 5 µl of purified MAP (0.1–20 µM final concentration) in 50 mM Hepes, pH 7.4, 0.2 mM CoCl2, and 150 mM NaCl, and was followed for 2–15 min. The initial velocity could generally be calculated after the first 2 min for MAP1 and the first 5 min for MAP2. This assay can measure six velocities of 0.001–0.5 A440/min in parallel, which is generally sufficient to determine the catalytic constants. Above 0.5 A440/min, MAP must be diluted. The measured velocities were converted to turnover rates (s⁻¹) by dividing them by three quantities:

- the enzyme concentration used;
- the molar extinction coefficient of oxidized o-dianisidine (10,580 M⁻¹·cm⁻¹), determined experimentally by incubating L-Met in the reaction mixture and allowing the reaction to proceed to completion;
- the optical path length (1 cm).
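The velocity conversion described above is a direct application of the Beer-Lambert law. The Python sketch below assumes slopes recorded in A440 units per minute, so it applies a factor of 60 to obtain s⁻¹; the text leaves that step implicit. It also rejects slopes above the stated 0.5 A440/min limit. The function name `turnover_per_second` is hypothetical.

```python
EPSILON_440 = 10_580.0   # M^-1 cm^-1, oxidized o-dianisidine (value from the text)
PATH_LENGTH_CM = 1.0     # optical path of the cuvettes
MAX_SLOPE = 0.5          # A440/min; above this the text says MAP must be diluted


def turnover_per_second(slope_a440_per_min: float, enzyme_molar: float,
                        epsilon: float = EPSILON_440,
                        path_cm: float = PATH_LENGTH_CM) -> float:
    """Convert an initial slope (A440/min) into a per-enzyme rate in s^-1."""
    if slope_a440_per_min > MAX_SLOPE:
        raise ValueError("slope outside the assay's linear range; dilute MAP")
    if enzyme_molar <= 0:
        raise ValueError("enzyme concentration must be positive")
    product_molar_per_min = slope_a440_per_min / (epsilon * path_cm)  # Beer-Lambert
    return product_molar_per_min / 60.0 / enzyme_molar  # per second, per enzyme


# 0.06348 A440/min with 0.1 uM MAP corresponds to about 1 s^-1
print(turnover_per_second(0.06348, 0.1e-6))
```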
Amino-acid oxidase has a broad enough substrate specificity to oxidize many amino acids efficiently (31). Cysteine-containing peptides were systematically avoided because they are not compatible with the assay. This incompatibility is probably due to reduction, by the Cys thiol group, of the H2O2 that amino-acid oxidase produces and peroxidase uses as its substrate, which inhibits the assay. We confirmed that the coupled assay was indeed inhibited by the dipeptide Cys-Gly. In a few cases, at peptide concentrations higher than the Km value, we noted that establishment of the steady state was delayed, for unclear reasons. This problem was easily avoided by following the kinetics for a longer time.
Interpretation of Kinetic Data
The kinetic parameters kcat and Km were obtained by non-linear fitting of the Michaelis-Menten equation with the Enzyme Kinetics module 1.1 of SigmaPlot (version 8.0). The confidence limits given are those associated with the data set. The parameter kcat/Km was derived from iterative non-linear least-squares fits of the Michaelis-Menten equation to the experimental data (34). Confidence limits for the fitted kcat/Km values were determined by 100 Monte Carlo iterations using the experimental standard deviations of individual measurements. Similar data were obtained with two batches of peptide of the same sequence, whether both came from the same supplier or from two different suppliers.
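The fitting and Monte Carlo steps can be sketched with the Python standard library alone. This is a stand-in, not the authors' SigmaPlot procedure:

- kcat and Km are fitted by Gauss-Newton least squares;
- kcat/Km is derived from the fitted pair rather than fitted directly;
- its spread is estimated by refitting 100 data sets resampled with per-point standard deviations.

All function names are hypothetical. With S in mM, the ratio comes out in mM⁻¹·s⁻¹, hence the factor of 1000 in the usage lines.

```python
import random
import statistics


def mm_rate(s, kcat, km):
    """Michaelis-Menten rate per enzyme: kcat*S/(Km + S)."""
    return kcat * s / (km + s)


def _sse(s_vals, v_vals, kcat, km):
    return sum((v - mm_rate(s, kcat, km)) ** 2 for s, v in zip(s_vals, v_vals))


def fit_michaelis_menten(s_vals, v_vals, n_iter=200, tol=1e-12):
    """Gauss-Newton least-squares fit of v = kcat*S/(Km+S); returns (kcat, Km)."""
    kcat = max(v_vals)                       # initial guess: highest observed rate
    half = kcat / 2.0                        # initial Km: S whose rate is closest to half of it
    km = min(zip(s_vals, v_vals), key=lambda p: abs(p[1] - half))[0]
    for _ in range(n_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for s, v in zip(s_vals, v_vals):
            d = km + s
            j1 = s / d                       # d(rate)/d(kcat)
            j2 = -kcat * s / d ** 2          # d(rate)/d(Km)
            r = v - kcat * s / d             # residual
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        dk = (a22 * g1 - a12 * g2) / det     # solve the 2x2 normal equations
        dm = (a11 * g2 - a12 * g1) / det
        old, step = _sse(s_vals, v_vals, kcat, km), 1.0
        while step > 1e-6:                   # step halving: keep Km > 0, SSE non-increasing
            new_k, new_m = kcat + step * dk, km + step * dm
            if new_m > 0 and _sse(s_vals, v_vals, new_k, new_m) <= old:
                break
            step /= 2.0
        else:
            break                            # no improving step found: converged
        kcat, km = new_k, new_m
        if abs(step * dk) <= tol * (1 + abs(kcat)) and abs(step * dm) <= tol * (1 + km):
            break
    return kcat, km


def monte_carlo_kcat_over_km(s_vals, v_vals, v_sd, n_draws=100, seed=0):
    """Mean and SD of kcat/Km over fits to data resampled with per-point SDs."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_draws):
        noisy = [rng.gauss(v, sd) for v, sd in zip(v_vals, v_sd)]
        kcat, km = fit_michaelis_menten(s_vals, noisy)
        ratios.append(kcat / km)
    return statistics.mean(ratios), statistics.stdev(ratios)


if __name__ == "__main__":
    # Synthetic, noise-free rates generated from kcat = 3.9 s^-1, Km = 0.05 mM
    s_mM = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]
    rates = [mm_rate(s, 3.9, 0.05) for s in s_mM]
    print(fit_michaelis_menten(s_mM, rates))
    mean, sd = monte_carlo_kcat_over_km(s_mM, rates, [0.05] * len(s_mM))
    print(mean * 1000, sd * 1000)  # kcat/Km in M^-1 s^-1
```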
Sequence Databases and Protein Pattern Syntax
The 4071 ORF sequences of E. coli were retrieved from the EcoGene website (ecogene.org/VerifiedInfo.php?download=true). The compiled N-terminal protein sequence data were retrieved from bmb.med.miami.edu/EcoGene/EcoWeb/CeSSPages/VerifiedProts.htm (see also Ref. 35). The data were sorted into categories for statistical analysis. The complete proteomes of Bacillus subtilis and P. furiosus were retrieved from www.ebi.ac.uk/integr8/FtpSearch.do;jsessionid=98818969299187E6D1B9222E3B694316?orgProteomeId=6 and www.ebi.ac.uk/integr8/OrganismHomeAction.do?orgProteomeID=77, respectively.
We searched for protein patterns (36) at www.infobiogen.fr/services/analyseq/cgi-bin/patternp_in.pl. The pattern syntax used in this text was: "<M" constrains the pattern to the N-terminal residue (i.e. an initiator Met), "(^XY)" means that residues X and Y (or a subset list) are excluded, and "(Y/Z)" means that residue Y or Z (or a subset list) is included. X, Y, or Z correspond to any normal amino acid.
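The pattern syntax above maps directly onto regular expressions:

- "<" becomes a start anchor;
- "(Y/Z)" becomes a character class;
- "(^XY)" becomes the class of the 20 standard residues minus those listed;
- a bare "X" in a pattern becomes any standard residue.

The Python converter below is a hypothetical local re-implementation for illustration; it is not the infobiogen service.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate the text's N-terminal pattern syntax into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "<":
            parts.append("^")                     # anchor to the N terminus
            i += 1
        elif ch == "(":
            end = pattern.index(")", i)           # ValueError if unclosed
            body = pattern[i + 1:end]
            if body.startswith("^"):              # (^XY): exclusion list
                excluded = set(body[1:].replace("/", ""))
                allowed = "".join(a for a in AMINO_ACIDS if a not in excluded)
            else:                                 # (Y/Z): inclusion list
                allowed = body.replace("/", "")
            parts.append("[" + allowed + "]")
            i = end + 1
        elif ch == "X":
            parts.append("[" + AMINO_ACIDS + "]")  # any standard residue
            i += 1
        else:
            parts.append(ch)                      # a literal residue
            i += 1
    return re.compile("".join(parts))


print(pattern_to_regex("<M(V/T)(E/P)").pattern)   # ^M[VT][EP]
print(bool(pattern_to_regex("<M(V/T)(E/P)").match("MTPL")))
```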
| RESULTS |
|---|
We first assessed the relevance of our assay with purified EcMAP1. We evaluated the impact of the nature of the cocatalytic metal cation (see references compiled in Ref. 2) (Fig. 1A). Cobalt cations appeared to be the most efficient and were used in all MAP assays. The final goal of this study was to measure in vitro the catalytic efficiency of Met cleavage (i.e. kcat/Km). We then wanted to compare these data with those derived from N-terminal sequence analysis of proteins expressed in vivo. We therefore systematically measured the kcat and Km values associated with each peptide. Fig. 1B provides an example of fit quality and shows that the Michaelis-Menten model describes the data well. Peptide Met-Ala-Met-Lys-Ser was the substrate most efficiently processed in this study. For it, we measured a catalytic efficiency of 81,700 ± 8,000 M⁻¹·s⁻¹, with Km = 0.05 ± 0.01 mM and kcat = 3.9 ± 0.2 s⁻¹. We obtained such data for more than 120 different peptides (compiled in Supplemental Table S1). Both kcat and Km values were extremely variable, each covering more than 2 orders of magnitude (0.01–4 s⁻¹ and 0.05–15 mM, respectively). Data obtained in previous publications at peptide concentrations of 4 or 20 mM therefore cannot be compared reliably. These findings also suggest that the enzyme operates below saturation, at a rate governed by kcat/Km, as the steady-state peptide concentration is about 0.1 mM.
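As a consistency check on the numbers just given, the ratio of the two fitted constants reproduces the independently fitted catalytic efficiency within its confidence limits. The low-substrate limit of the Michaelis-Menten equation then shows why kcat/Km is the relevant parameter when [S] is well below Km:

```latex
\begin{align*}
\frac{k_\text{cat}}{K_m} &= \frac{3.9~\text{s}^{-1}}{0.05\times10^{-3}~\text{M}}
  = 7.8\times10^{4}~\text{M}^{-1}\,\text{s}^{-1}
  \quad (\text{fitted: } 81{,}700 \pm 8{,}000~\text{M}^{-1}\,\text{s}^{-1}) \\
\frac{v}{[E]_0} &= \frac{k_\text{cat}[S]}{K_m + [S]}
  \;\xrightarrow{\;[S]\,\ll\,K_m\;}\; \frac{k_\text{cat}}{K_m}\,[S]
\end{align*}
```

For peptides whose Km lies well above the ~0.1 mM steady-state concentration, the cleavage rate is therefore set by kcat/Km rather than by kcat alone.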
In Vitro Characterization of E. coli Methionine Aminopeptidase: the S1 Site and the Influence of Peptide Length
We assessed whether EcMAP1 could act as a broader-specificity aminopeptidase by varying the P1 side chain (Fig. 2A). All experiments were carried out with tripeptides containing Ala and Ser in the second and third positions, respectively; Met-Ala-Ser proved to be one of the most efficient substrates in our analysis (see below and Supplemental Table S1). Met and its classic unnatural mimic norleucine (Nle) were the only P1 residues that gave efficient cleavage, consistent with the tapered shape of the S1 pocket (41). The natural amino acids Leu and Phe could be processed in vitro, but with a catalytic efficiency more than 3 orders of magnitude lower. Similar results were obtained with methionine sulfoxide (Mox) or norvaline (Nva), which has an n-propyl side chain. Decreasing the length of the side chain led to resistance to cleavage, whether to α-aminobutyrate (Aba; an unusual amino acid with a two-carbon linear side chain mimicking Cys and Ser) or to Ala. We concluded that EcMAP1 could not further process its natural substrates. Provided that Met was the first amino acid of the peptide, the kinetic data therefore reflected a single cleavage site between P1 and P1'.
carbon of the second residue had a critical influence on cleavage by EcMAP1. We also investigated the impact of the length of the polypeptide chain. We first confirmed that EcMAP1 cleaved dipeptides extremely inefficiently. Met-Ala and Met-Gly were cleaved with kcat/Km values 2 orders of magnitude lower than that of the reference tripeptide with Ser at position P2' (Supplemental Table S1). In contrast, tripeptides and longer peptides starting with Met-Gly proved to be efficient substrates (Fig. 2B). We assessed the combined impact of side chain and length by comparing two peptide series, in which the last amino acid was Gly or Met and the intervening residues were Gly. The kcat/Km value was maximal for peptides more than five residues long (Fig. 2B). We compared peptides of a given length between the two series. The nature of the side chain at position 3 (P2' site) or 4 (P3' site) had a strong effect, and position 5 (P4' site) also had a significant effect. A thorough analysis of the impact of the amino acids at these positions was therefore required.
The S1' Subsite of EcMAP1: Difficult Cleavage of the Thr and Val Residues
MAPs are known to cleave peptides selectively between the terminal Met and the penultimate, or P1', residue. We used two tripeptide series, one with Ser and the other with Gly at P2'. For each, we measured the catalytic efficiency of Met cleavage for the complete set of amino acids except Cys (Fig. 3A). Ala was cleaved most efficiently, followed by Ser, Gly, and Pro. Thr and Val were cleaved less efficiently, with kcat/Km values 2 orders of magnitude lower in both series. We could not determine kcat/Km values for the other amino acids (at least 5 orders of magnitude lower). In particular, peptides with Ile, Asn, Asp, Met, or Leu at P1' were not cleaved. These findings contrast with two reports using reporter proteins in vivo (17, 18). At this stage, we could not tell whether our in vitro assay failed to represent in vivo conditions or was simply not sensitive enough. Nevertheless, as reported previously, catalytic efficiency was clearly related to the gyration radius of the P1' side chain, as defined by Levitt (19) (Fig. 3A). This relationship was fully confirmed in the most efficient series, derived from tetrapeptide MXMK. Here X = Ala, Gly, Pro, Ser, Thr, Val, Asn, Ile, Leu, Asp, Glu, Phe, Gln, Lys, or Arg; the data are reported in Supplemental Table S1. Previous reports suggested that Gly in the penultimate position gave the most efficient processing. In contrast, our data clearly show that the side chain of Ala is optimal at P1'. We analyzed two tripeptides with unusual P1' side chains. One contained Nva, and the second contained Aba, mimicking Cys and Ser, respectively. The deduced kcat/Km values for these peptides were entirely consistent with the model. This strongly suggests that the values for Cys would be similar to those of Pro, which has the most similar gyration radius (19).
Met excision was not systematic with Ala, Cys, Pro, Ser, Thr, and Val at P1', as 13% of such proteins escaped NME. Cleavage efficiency was maximal with Ala (97%) and minimal with Thr (69%) and Val (64%) (Fig. 3B), a ranking similar to that found in the kinetic analysis shown in Fig. 3A. For each type of residue found at P1', we therefore plotted in vivo cleavage efficiency as a function of the EcMAP1 catalytic efficiency measured in vitro. There was an extremely strong correlation between the two data sets (Fig. 3C). Thus, NME, or resistance to NME, was limited essentially by MAP catalytic efficiency in vivo, and the data obtained in vitro were reliable for modeling NME in vivo. We defined a "twilight cleavage zone" as the range of catalytic efficiencies leading to partial cleavage in vivo (Fig. 3). Below this zone, cleavage is considered inefficient.
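One way to quantify the correlation in Fig. 3C is a Pearson coefficient between log10(kcat/Km) and the percentage of proteins cleaved in vivo for each P1' residue. The Python sketch below assumes a logarithmic x axis; that is our choice, since catalytic efficiencies span several orders of magnitude. The function names and the illustrative numbers in the usage line are hypothetical, not the measured values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def in_vitro_in_vivo_r(kcat_over_km, percent_cleaved):
    """Correlate log10(kcat/Km) with % cleaved in vivo over shared P1' residues."""
    residues = sorted(set(kcat_over_km) & set(percent_cleaved))
    xs = [math.log10(kcat_over_km[r]) for r in residues]
    ys = [percent_cleaved[r] for r in residues]
    return pearson_r(xs, ys)


# Illustrative placeholder values only (not data from this study)
print(in_vitro_in_vivo_r({"A": 1e5, "S": 1e4, "V": 1e2},
                         {"A": 95.0, "S": 85.0, "V": 65.0}))
```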
The S2' Subsite of EcMAP1: Pro and Glu Are Inefficiently Cleaved Residues
We first investigated the impact of the P2' amino acid in two series of tripeptides. One began with Met-Ala, the most efficient substrate of EcMAP1, and the other with Met-Gly, an intermediate substrate. The P2' side chain clearly had a strong effect on the catalytic efficiency of Met excision by EcMAP1. Some residues, such as Pro and Glu, decreased the kcat/Km ratio by more than an order of magnitude in both series. Others, such as Asp, Ile, and Thr, also significantly decreased this value, by at least 1 order of magnitude (Fig. 4A). Efficiency was optimal for peptides with Trp, Met, or Ser at P2', as these residues increased the kcat/Km ratio; other residues gave intermediate values. We next analyzed the impact of the P2' position on cleavage efficiency in vivo, using the cleavage data for this position in the 385-sequence data library (Fig. 4B). These data show a clear trend toward difficult cleavage when the P2' residue is Asp, Glu, Ile, Pro, or Thr. Phe and Tyr also appeared to decrease cleavage efficiency in vivo, although this was not consistent with the catalytic data. However, few proteins in the library carried these two aromatic amino acids at position P2' (Fig. 4B). Their in vivo behavior therefore could not be predicted with confidence. We concluded that the kinetic data were representative of the in vivo efficiency of cleavage.
In a second series of experiments on the impact of position P2' on EcMAP1 cleavage, we assessed tripeptides with Thr or Val at position P1'. These are substrates cleaved fastidiously by EcMAP1, i.e. they fall within the twilight cleavage zone. A similar trend was observed for this series (Fig. 5A). Pro and Glu at P2' and, to a lesser extent, Asp and Thr were less likely to result in cleavage than other residues. We then considered only those proteins starting with Val or Thr at P1'. The number of examples was much smaller, and the statistics were not robust enough for firm conclusions, but Glu and Pro seemed to reduce cleavage efficiency (Fig. 5B). We concluded that cleavage of proteins with <M(V/T)(E/P) patterns was virtually impossible in vivo. Cleavage of proteins with <M(V/T)(D/T) patterns was difficult (i.e. "inefficient"), resulting in partial Met retention in some cases.
We then studied a set of substrates with various residues at the P1' and P2' positions (the complete data are reported in Supplemental Table S2 and displayed in Fig. 7). For P1', the data were similar to those obtained with EcMAP1 (Fig. 7A). They were also similar to those available from Takara for the peptide series MXAAA (where X is any natural amino acid except Cys). Thus, although MAP1 and MAP2 differ in their amino acid sequences, the S1' binding pockets of the two enzyme types have similar sieving capacities. Cleavage was optimal for Ala and Pro and minimal for Thr and Val. At P2', the enzyme showed a strong preference for large hydrophobic side chains, such as those of Met and Trp. Strong hindrance was observed for residues such as Glu and Pro and, to a lesser extent, Thr, Ile, and Asp (Fig. 7B). Some differences were observed between MAP1 and MAP2. For example, Ser and, to a lesser extent, Lys at P2' were less favorable for MAP2 than for MAP1 cleavage. However, the presence of these residues is unlikely to have a major effect on cleavage efficiency.
The Predicted N-terminal Proteome Shows a Strong Bias toward MAP Substrates with Respect to the Proteome Deduced from the Most Abundant Proteins
The E. coli data sets studied included both in vivo and in vitro data, making this a unique system for further functional analysis of the impact of the N-terminal residue. The preceding analysis of the E. coli NME machinery showed that MAP was the primary determinant of the nature of a protein's N terminus. It also showed that EcMAP1 action was most sensitive to position P1' (Fig. 3). We then investigated the frequency and nature of particular N-terminal residues in E. coli. We grouped the proteins of the complete E. coli proteome (4071 entries) by penultimate (P1') residue. We then calculated the percentage of proteins with each amino acid in that position. We compared this data set with the same calculation restricted to proteins whose N-terminal sequences had been determined experimentally and shown to follow the rules of NME (738 entries; Fig. 8A). All residues predicted to undergo NME were underrepresented by a factor of 2–3 in the complete proteome relative to the restricted proteome. The exception was Val, the least efficiently processed of these substrates. These data suggest that the accumulation of proteins sensitive to NME is favored in bacteria. We then predicted the N-terminal amino acid distribution with the MAP cleavage prediction tool TermiNator2, using three complete proteomes:

- E. coli, representative of Gram-negative bacteria;
- B. subtilis, representative of Gram-positive bacteria;
- P. furiosus, representative of Archaea.

In each of these organisms, the distribution was very similar, with Met as the major N-terminal residue (Fig. 8B). Residues exposed as a result of NME corresponded to less than 30% of the proteome. These data indicate that (i) E. coli is a good model for NME studies in prokaryotes and (ii) NME involves only a minority of the proteins in a given proteome, contrary to general belief.
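The per-residue comparison behind Fig. 8A amounts to counting position-2 residues in two sequence sets and taking a ratio. Below is a minimal Python sketch, assuming one-letter sequences that start at the initiator Met; sequences that do not are skipped. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def penultimate_percentages(sequences):
    """Percentage of Met-initiated sequences carrying each residue at P1' (position 2)."""
    counts = Counter(s[1].upper() for s in sequences
                     if len(s) > 1 and s[0].upper() == "M")
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()} if total else {}


def representation_ratio(restricted, complete):
    """Restricted-set / complete-set percentage for each residue present in both."""
    return {aa: restricted[aa] / complete[aa] for aa in restricted if complete.get(aa)}


complete = penultimate_percentages(["MAK", "MKL", "MKE", "MSE"])  # toy input
restricted = penultimate_percentages(["MAK", "MSE"])              # toy input
print(complete, representation_ratio(restricted, complete))
```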
| DISCUSSION |
|---|
NME Prediction of Natural Bacterial Substrates and Recombinant Proteins Overproduced in E. coli
The residue at position P1' of a polypeptide chain determines cleavage efficiency, which ranges from highly efficient, as with Ala, to fastidious, as with Thr and Val. MAP catalytic efficiency (kcat/Km) and in vivo cleavage efficiency were directly correlated (Fig. 3C). This explains why most proteins resistant to NME according to the usual rules of thumb have Val or Thr as the penultimate residue. Positions P2' and, to a lesser extent, P3' and P4' also affected cleavage probability. At these positions, the same series of amino acids (Asp, Glu, Pro, and Thr) had a negative effect, and Glu or Pro at P2' had the most negative effect. When predicting MAP cleavage in vivo directly from the N-terminal sequence of a protein, it is important to take into account whether the protein is (i) a natural substrate or (ii) an overproduced recombinant protein. For natural substrates, we suggest that proteins beginning with <M(^ACGPSTV) or <M(V/T)(E/P) are unlikely to be cleaved (pattern syntax as defined under "Experimental Procedures"). There is a strong risk of lack of cleavage for proteins beginning with <M(G/P/V/T)(D/T)(D/E/P/T), <M(G/P/V/T)(D/T)(^DEPT)(C/E/H/P), or <M(G/P/V/T)(^DEPT)(D/E/P/T)(C/E/H/P). In these latter cases, incomplete maturation in vivo is most likely, resulting in two forms of the protein: a major form retaining the N-terminal Met and a minor form lacking it. A few well documented examples of such behavior are already known (23). These simple rules account for 60% of the proteins displaying as yet unexplained Met retention in the E. coli proteome.
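The natural-substrate rules can be encoded as an ordered list of anchored regular expressions. The Python sketch below is our transcription, not the TermiNator2 implementation. The labels ("unlikely", "risk", "cleaved"), the priority order and the function name are assumptions. A sequence shorter than a pattern simply does not match it.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _not(excluded: str) -> str:
    """Character class of standard residues other than those in `excluded`."""
    return "[" + "".join(a for a in AMINO_ACIDS if a not in excluded) + "]"


# Rules for natural substrates, transcribed from the text; list order = priority.
NATURAL_RULES = [
    ("unlikely", "M" + _not("ACGPSTV")),                  # <M(^ACGPSTV)
    ("unlikely", "M[VT][EP]"),                            # <M(V/T)(E/P)
    ("risk", "M[GPVT][DT][DEPT]"),                        # <M(G/P/V/T)(D/T)(D/E/P/T)
    ("risk", "M[GPVT][DT]" + _not("DEPT") + "[CEHP]"),    # <M(G/P/V/T)(D/T)(^DEPT)(C/E/H/P)
    ("risk", "M[GPVT]" + _not("DEPT") + "[DEPT][CEHP]"),  # <M(G/P/V/T)(^DEPT)(D/E/P/T)(C/E/H/P)
]
_COMPILED = [(label, re.compile(rx)) for label, rx in NATURAL_RULES]


def predict_natural_nme(sequence: str) -> str:
    """Classify a natural protein N terminus (one-letter code, starting at Met)."""
    seq = sequence.upper()
    if not seq.startswith("M"):
        raise ValueError("sequence must start with the initiator Met")
    for label, rx in _COMPILED:
        if rx.match(seq):  # re.match anchors at the N terminus, like "<"
            return label
    return "cleaved"


for s in ("MKEL", "MVEA", "MTDDK", "MGKEH", "MAKS"):
    print(s, predict_natural_nme(s))
```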
E. coli is widely used for the overproduction of recombinant proteins. Over the last 20 years, many instances of improperly processed N termini have been reported. This recurrent issue, known as the "methionine problem" (49), has not yet been resolved. MAP cleavage is thought to become limiting because the amount of overproduced protein exceeds the capacity of MAP. Our findings are entirely consistent with MAP activity being the limiting factor for NME in vivo (Fig. 3C). They also suggest that conclusions drawn from studies based on overproduced reporter proteins (17, 18) should be interpreted with caution. Unexpected complete or partial Met retention is known to cause many defects, including increased antigenicity or decreased biological activity. This is the case for the N-terminal nucleophile (Ntn) hydrolase family of enzymes (reviewed in Ref. 51). These enzymes use both the free N terminus and the side chain of the N-terminal Thr, Ser, or Cys in their catalytic mechanism (50). Before a protein, particularly a heterologous one, is produced in E. coli, its NME behavior must be correctly predicted. This ensures that the expected protein, with the expected N terminus, is produced. The data reported herein are particularly useful in this respect. The following examples concern the production of human proteins of therapeutic interest in E. coli by recombinant DNA methods.
"Unexpected" blockage of Met cleavage has been reported for granulocyte colony-stimulating factor (Met-Thr-Pro-Leu; see Ref. 52) and
- or ß-hemoglobins (Met-Val-His-Leu; see Ref. 53). In all these cases, the overproduced protein contains a fastidious P1' (Val or Thr) residue as revealed in our study (Fig. 3), and one of these proteins also contains the most negative P2' residue, Pro. Other well documented examples of Met retention include the cytokine RANTES (regulated on activation normal T cell expressed and secreted) (Met-Ser-Pro-Tyr) and the interleukins 1ß and 2 (Met-Ala-Pro-Thr/Val; see Refs. 54 and 55). For the interleukins, Met cleavage is crucial as the N-terminal Ala must interact with the C-terminal Thr-133 (56). Our study provides a rationale for the retention of Met as both Pro and Thr at positions P2' and P3' tend to inhibit Met cleavage. Another example is provided by interleukin-6, which begins with Met-Pro-Val-Pro (57). We show here that a Pro at position P3' significantly decreases the efficiency of the process. We conclude from the aforementioned examples that, when a given protein is overproduced, substrates displaying fastidious cleavage by MAP may not be completely processed by this enzyme. The prediction of MAP cleavage must therefore be more stringent than for intrinsic proteins. We suggest that processing is highly unlikely to occur for overproduced proteins beginning with <M(A/G/P/S/T)(D/E/P/T)(D/E/P/T), <M(A/G/P/S/T)(^DEPT)X(C/E/H/P), <M(A/G/P/S/T)X(^DEPT)(C/E/H/P), or <MV(D/T) and that there is a risk of non-processing for proteins beginning with <M(A/G/P/S/T)(D/E/P/T) or <MV(^DEPT), leading to partial Met retention. This rule should be useful for predicting the likelihood of cleavage for recombinant proteins. To this aim, we therefore added an option relative to the overexpression of any favorite protein in the TermiNator2 prediction tool. 
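The stricter rules for overproduced proteins can be encoded the same way, as an ordered list of anchored regular expressions. The Python sketch below is our illustration, not the TermiNator2 option. Its labels, ordering and function name are hypothetical. We also assume that the two "unlikely" rules given for natural substrates still apply to overproduced proteins; the text does not restate them for this case.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ANY = "[" + AMINO_ACIDS + "]"  # "X" in the pattern syntax


def _not(excluded: str) -> str:
    """Character class of standard residues other than those in `excluded`."""
    return "[" + "".join(a for a in AMINO_ACIDS if a not in excluded) + "]"


OVERPRODUCED_RULES = [
    # Assumption: the natural-substrate "unlikely" rules still apply.
    ("unlikely", "M" + _not("ACGPSTV")),                       # <M(^ACGPSTV)
    ("unlikely", "M[VT][EP]"),                                 # <M(V/T)(E/P)
    # Rules stated in the text for overproduced proteins.
    ("unlikely", "M[AGPST][DEPT][DEPT]"),                      # <M(A/G/P/S/T)(D/E/P/T)(D/E/P/T)
    ("unlikely", "M[AGPST]" + _not("DEPT") + ANY + "[CEHP]"),  # <M(A/G/P/S/T)(^DEPT)X(C/E/H/P)
    ("unlikely", "M[AGPST]" + ANY + _not("DEPT") + "[CEHP]"),  # <M(A/G/P/S/T)X(^DEPT)(C/E/H/P)
    ("unlikely", "MV[DT]"),                                    # <MV(D/T)
    ("risk", "M[AGPST][DEPT]"),                                # <M(A/G/P/S/T)(D/E/P/T)
    ("risk", "MV" + _not("DEPT")),                             # <MV(^DEPT)
]
_COMPILED = [(label, re.compile(rx)) for label, rx in OVERPRODUCED_RULES]


def predict_overproduced_nme(sequence: str) -> str:
    """Classify the N terminus of an overproduced protein (one-letter, starting at Met)."""
    seq = sequence.upper()
    if not seq.startswith("M"):
        raise ValueError("sequence must start with the initiator Met")
    for label, rx in _COMPILED:
        if rx.match(seq):
            return label
    return "cleaved"


# N-terminal tetrapeptides cited in the text
for name, s in [("G-CSF", "MTPL"), ("hemoglobin", "MVHL"),
                ("RANTES", "MSPY"), ("IL-1beta", "MAPT")]:
    print(name, s, predict_overproduced_nme(s))
```

The text quotes only tetrapeptides, so rules that inspect position P4' cannot fire on these inputs; the calls illustrate the P1' to P3' rules only. Note also that MTPL is flagged through the carried-over <M(V/T)(E/P) rule, i.e. through our assumption.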
If the protein has an N terminus associated with a risk of non-processing, various methods may be used to prevent Met retention:

- (i) adjusting expression conditions to optimize MAP activity according to the rate of protein synthesis in vivo (58);
- (ii) in vitro processing with purified wild-type or engineered MAP (59–62);
- (iii) dual co-overproduction of MAP and the protein of interest in vivo (25, 63, 64);
- (iv) processing of the N terminus in vitro or in vivo with another amino- or endoprotease (65–67);
- (v) fusion of the ORF to a propeptide or ubiquitin sequence and use of endogenous signal peptidase specificity (68, 69).

This last strategy has also proved successful for residues that cannot be unmasked by NME (this study, Fig. 3). Examples include the thrombin inhibitor hirudin, which must have Ile at its N terminus for biological activity, and human growth hormone, which must have Phe at its N terminus (70–73).
Comparison of NME Rules between Eubacteria and Archaea
In Archaea, most mature proteins have Ala, Met, Gly, Ser, Thr, or Pro at their N terminus (10). The N-terminal Met of proteins is believed to be frequently cleaved by MAP2 in Archaea, although few data are available to confirm this. This assumption is consistent with the in vitro data presented here (Fig. 7A). Specifically we have demonstrated that the rules governing cleavage are similar for MAP2 and MAP1, involving not only the P1' site but also the true P2' and P3' sites (Fig. 7B).
Archaea display 8–9% signal peptide cleavage, a frequency significantly lower than that in eubacteria (15–37%; Ref. 74). NME is therefore of greater importance for exposing the N termini of most proteins. The consequences of the observed differences in MAP substrate specificity for Met cleavage in vivo should be taken into account. In Archaea, the rules governing NME do not differ significantly from those of eubacteria, with similar trends observed at P1' and P2'. We conclude that the use of identical cleavage rules should make it possible to predict NME accurately in Archaea.
Physiological Relevance of Improved NME Prediction for the E. coli Proteome
By improving NME prediction for natural proteins of the bacterial proteome, we were able to model the effect of this process on the whole proteome (Fig. 8). We observed an overrepresentation of proteins undergoing NME in bacteria: those that resisted the process and had sequences starting with a Met were underrepresented (Fig. 8C).
We tried to account for this discrepancy by modeling the impact of inaccurate assessment of LPR. The LPR frequency used (14.4%) was deduced directly from the 862-protein database. Interestingly, this value matches the one obtained by bioinformatics analysis of the complete proteome of Haemophilus influenzae with a maximal-stringency cutoff (14% in Ref. 47). H. influenzae is a Gram-negative bacterium closely related to E. coli but with a markedly smaller number of ORFs (by a factor of 2.5) (75). Doolittle (75) estimated that the signal peptide cleavage frequency might reach 15–20%, slightly higher than the frequency used here. A maximal value of 28% was proposed with lower-stringency scoring, consistent with other predictions of 25% in H. influenzae and 29% in E. coli (74). A higher frequency of signal peptide cleavage might therefore account for the bias observed in the distribution of N-terminal amino acids. It should be borne in mind that the proteins analyzed have a compartmentation bias. They are (i) soluble, as opposed to membrane proteins, and (ii) not secreted outside the bacterium. The fractionation strategies used can retrieve only those secreted proteins that are exported to the periplasmic space. LPR is thought to be frequent in membrane and secreted proteins. We therefore modeled the impact of increasing LPR frequency. Only a large increase in the proportion of these proteins, to 42% (about 3 times the experimental value), gave a satisfactory fit to the experimental N-terminal Met frequencies (Fig. 8C, end of arrows in gray bars). However, the fit for Ala, Ser, and Thr was very poor with this value. Moreover, this value of 42% is much higher than the upper value of 29% proposed previously (74). A value of 29% gave a much better fit to the data (Fig. 8C, gray bars).
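The effect of the LPR frequency on the expected share of Met-initiated N termini can be illustrated with a deliberately simplified mixture model. The Python sketch below is our illustration, not the authors' model, which fits the whole N-terminal residue distribution in Fig. 8C. It reads LPR as leader (signal) peptide removal. It treats LPR and NME-governed Met retention as independent, so that proteins losing a leader peptide also lose their initiator Met. The function names and input values are hypothetical.

```python
def met_fraction_with_lpr(nme_met_fraction: float, lpr_fraction: float) -> float:
    """Expected fraction of mature N termini still starting with Met.

    Simplified two-population model (our assumption): proteins undergoing leader
    peptide removal (fraction `lpr_fraction`) lose the initiator Met together with
    the signal peptide; the remainder keep Met with the probability predicted from
    NME rules (`nme_met_fraction`).
    """
    for f in (nme_met_fraction, lpr_fraction):
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    return (1.0 - lpr_fraction) * nme_met_fraction


def required_lpr_fraction(observed_met_fraction: float, nme_met_fraction: float) -> float:
    """LPR fraction needed for the model to match an observed N-terminal Met fraction."""
    if nme_met_fraction <= 0.0:
        raise ValueError("nme_met_fraction must be positive")
    return 1.0 - observed_met_fraction / nme_met_fraction


# Hypothetical inputs: 50% Met predicted by NME rules, LPR frequency of 14.4%
predicted = met_fraction_with_lpr(0.5, 0.144)
print(predicted, required_lpr_fraction(predicted, 0.5))
```

Inverting the model in this way shows how large an LPR frequency would be needed to explain a given deficit of Met-initiated proteins on its own.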
We conclude that the protein pool undergoing signal peptide cleavage is probably underestimated in our study but that this underestimation alone is unlikely to account for the trend observed toward a lower content of proteins retaining their N-terminal Met.
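The effect of an underestimated LPR pool can be sketched as a two-component mixture: the observed N-terminal distribution is modeled as (1 - f) times the distribution predicted for proteins without a signal peptide plus f times that of LPR-processed proteins, which lose the initiator Met together with the signal peptide. A grid search then picks the f that minimizes the sum of squared deviations. All three distributions below are hypothetical placeholders, not data from Fig. 8, and both the mixture form and the least-squares criterion are assumptions of this sketch rather than the authors' fitting procedure.

```python
# Hypothetical N-terminal residue distributions (fractions summing to 1),
# used ONLY to illustrate the calculation; they are not data from Fig. 8.
PREDICTED_NO_LPR = {"M": 0.40, "A": 0.20, "S": 0.15, "T": 0.10, "other": 0.15}
PREDICTED_LPR = {"M": 0.00, "A": 0.60, "S": 0.05, "T": 0.05, "other": 0.30}
OBSERVED = {"M": 0.30, "A": 0.30, "S": 0.13, "T": 0.09, "other": 0.18}


def mix(f, no_lpr, lpr):
    """Two-component mixture: a fraction f of proteins undergoes LPR."""
    return {k: (1.0 - f) * no_lpr[k] + f * lpr[k] for k in no_lpr}


def sse(model, observed):
    """Sum of squared deviations between model and observed frequencies."""
    return sum((model[k] - observed[k]) ** 2 for k in observed)


def best_lpr_fraction(no_lpr, lpr, observed, steps=1000):
    """Grid-search the LPR fraction f in [0, 1] that minimizes the SSE."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda f: sse(mix(f, no_lpr, lpr), observed))


for f in (0.144, 0.29, 0.42):  # LPR fractions discussed in the text
    model = mix(f, PREDICTED_NO_LPR, PREDICTED_LPR)
    print(f"f={f:.3f}  Met={model['M']:.3f}  SSE={sse(model, OBSERVED):.5f}")

print("best-fitting f:", best_lpr_fraction(PREDICTED_NO_LPR, PREDICTED_LPR, OBSERVED))
```

Because a single f is shared across all residues, a value that brings the Met frequency into line can still fit other residues poorly, which is the trade-off the text describes for Ala, Ser, and Thr at 42%.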
Most of the data in the 862-sequence compilation concerned proteins extracted by two-dimensional gel electrophoresis (23). As a result, these proteins were probably among the most abundant in the bacterium. Another possible reason for the poor correlation between the data sets in Fig. 8C is that the genes for these abundant proteins may have been selected to encode residues such as Ala or Ser at P1', so that their N-terminal Met is efficiently cleaved and recycled. Met is indeed an "expensive" amino acid to biosynthesize. The capacity to recover most N-terminal Met residues (i.e. especially those of the most abundant proteins) could have been favored during bacterial evolution (see also concluding remarks in Ref. 21 for further discussion). Alternatively, all or some of the proteins retaining their N-terminal Met may be less stable. Recent searches for proteins cleaved by the ClpAXP proteolytic machinery of E. coli have indeed identified an N-terminal motif that includes an N-terminal Met: the recurring N-terminal motif MKΦXΦ (where Φ is any hydrophobic amino acid and X is any amino acid) is involved in cleavage by the bacterial ClpAXP protease (76). This hypothesis seems likely given that Lys is the most frequent amino acid at position 2 (Fig. 8A) and prevents NME. However, a less stringent motif would be required to obtain a better fit to the experimental values. It should be noted that a single proteolytic cleavage of a mature protein by cellular machinery other than a leader peptidase and ClpAXP may lead to enhanced protein degradation. This hypothesis, in which protein half-life and signal peptide prediction account for the amino acid bias observed in experimentally determined protein N termini relative to proteomic predictions, may also account for the accumulation of fewer proteins with an N-terminal Met than predicted on the basis of the genome sequence (Fig. 8B).
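Screening a set of sequences for the MKΦXΦ N-terminal motif is a simple pattern-matching problem. The sketch below uses a regular expression anchored at the N terminus. The set of residues counted as hydrophobic (AVILMFWY) is an assumption of this sketch, not the definition used in Ref. 76, and the helper name `has_clp_n_motif` is hypothetical.

```python
import re

# Residues treated as hydrophobic (Φ). This set is an ASSUMPTION of the
# sketch; Ref. 76 should be consulted for the exact definition.
HYDROPHOBIC = "AVILMFWY"

# M-K-Φ-X-Φ anchored at the N terminus; "." matches any residue (X).
CLP_N_MOTIF = re.compile(rf"^MK[{HYDROPHOBIC}].[{HYDROPHOBIC}]")


def has_clp_n_motif(sequence):
    """Return True if the sequence starts with the MKΦXΦ motif (hypothetical helper)."""
    return CLP_N_MOTIF.match(sequence) is not None


for seq in ["MKLAVEQ", "MKIQLST", "MKDAVEQ", "MAKLVEQ"]:
    print(seq, has_clp_n_motif(seq))
```

Relaxing the pattern (for example, allowing a broader residue class at one position) is one way to explore the "less stringent motif" mentioned above; any such variant would be an untested assumption.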
Generalization: Introducing a Bioinformatics Tool to Predict NME
We show here that detailed characterization of the catalytic efficiency of MAP can be used to predict the in vivo Met cleavage of any protein of a given proteome. NME prediction across the proteome is crucial because a database of protein N termini as representative as that available for E. coli (i.e. covering about 20% of the proteome) will probably never be available for other organisms. However, published data for the two cytosolic Saccharomyces cerevisiae MAPs (ScMAPs) suggest that the substrate specificity of these enzymes may differ from that of prokaryotic MAPs. For example, studies of ScMAP1 have shown inefficient processing of peptides with Pro, Val, or Thr at P1' (Fig. 4A in Ref. 40). Studies with ScMAP2 at low peptide concentrations (i.e. mimicking kcat/Km conditions) have suggested inefficient processing of peptides with Pro at P1' but not of those with Val at this position (Table V in Ref. 77). Moreover, in higher eukaryotes, prediction is complicated by the occurrence of at least one MAP1 and one MAP2, which do not necessarily have identical specificities. If the two enzymes had similar specific activities in vivo, their combined action would likely increase cleavage efficiency. However, this does not appear to be the case in yeast, in which MAP1 behaves as the major, most active MAP (78). Data on relative MAP expression and regulation are therefore required to complement prediction algorithms.
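One simple way to reason about the two-MAP situation is to treat MAP1 and MAP2 as acting independently on a given N terminus, each with its own cleavage probability scaled by a relative in vivo activity weight. The independence assumption, the linear activity scaling, the helper name `combined_cleavage`, and the example numbers are all hypothetical choices for this sketch; as noted above, real weights would have to come from data on relative MAP expression and regulation.

```python
def combined_cleavage(p_map1, p_map2, w_map1=1.0, w_map2=1.0):
    """Probability that the initiator Met is removed by at least one MAP.

    p_map1, p_map2: cleavage probability of each enzyme at full activity.
    w_map1, w_map2: relative in vivo activity weights in [0, 1] (hypothetical).
    ASSUMES the two enzymes act independently on the same substrate.
    """
    for value in (p_map1, p_map2, w_map1, w_map2):
        if not 0.0 <= value <= 1.0:
            raise ValueError("probabilities and weights must lie in [0, 1]")
    miss1 = 1.0 - w_map1 * p_map1  # chance MAP1 leaves the Met in place
    miss2 = 1.0 - w_map2 * p_map2  # chance MAP2 leaves the Met in place
    return 1.0 - miss1 * miss2


print(combined_cleavage(0.5, 0.5))            # equal activity weights
print(combined_cleavage(0.5, 0.5, 1.0, 0.2))  # MAP1-dominated case
```

With equal weights the combined probability exceeds that of either enzyme alone, in line with the expectation that similar activities would increase cleavage efficiency; down-weighting MAP2, as in a yeast-like case in which MAP1 dominates, pulls the result back toward MAP1's own value.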
The crucial physiological consequences of NME, such as the protein stabilization associated with Met removal, highlight the importance of correctly predicting protein N termini, as recently reported for higher eukaryotes (48). Further detailed kinetic characterization of the actual substrate specificity of each eukaryotic MAP is required to increase the confidence of predictions made with this tool.
| FOOTNOTES |
|---|
Published, MCP Papers in Press, September 8, 2006, DOI 10.1074/mcp.M600225-MCP200
1 The abbreviations used are: NME, N-terminal Met excision; LPR, leader peptide removal; MAP, methionine aminopeptidase; Nva, norvaline; Aba, α-aminobutyrate.
* This work was supported by the CNRS (France), CNRS (France) Grant PGP04-11, Fonds National de la Science (France) Grant BCMS-275, Association pour la Recherche sur le Cancer (Villejuif, France) Grant 3603, and National Science Foundation Grant CHE-0549221. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
Both first authors contributed equally to this work.
** To whom correspondence should be addressed. Tel.: 33169823612; Fax: 33169823607; E-mail: thierry.meinnel{at}isv.cnrs-gif.fr
| REFERENCES |
|---|
-acetyl transferase families. Trends Biochem. Sci. 23, 263-267