The Proteomics of N-terminal Methionine Cleavage

Methionine aminopeptidase (MAP) is a ubiquitous, essential enzyme involved in protein N-terminal methionine excision. According to the generally accepted cleavage rules for MAP, this enzyme cleaves all proteins with small side chains on the residue in the second position (P1′), but many exceptions are known. The substrate specificity of Escherichia coli MAP1 was studied in vitro with a large (>120) coherent array of peptides mimicking the natural substrates and kinetically analyzed in detail. Peptides with Val or Thr at P1′ were much less efficiently cleaved than those with Ala, Cys, Gly, Pro, or Ser in this position. Certain residues at P2′, P3′, and P4′ strongly slowed the reaction, and some proteins with Val and Thr at P1′ could not undergo Met cleavage. These in vitro data were fully consistent with data for 862 E. coli proteins with known N-terminal sequences in vivo. The specificity sites were found to be identical to those for the other type of MAPs, MAP2s, and a dedicated prediction tool for Met cleavage is now available. Taking into account the rules of MAP cleavage and leader peptide removal, the N termini of all proteins were predicted from the annotated genome and compared with data obtained in vivo. This analysis showed that proteins displaying N-Met cleavage are overrepresented in vivo. We conclude that protein secretion involving leader peptide cleavage is more frequent than generally thought.

Protein N-terminal methionine excision (NME) is an essential cotranslational process that occurs in the cytoplasm of all organisms and in the two organelles (i.e. mitochondria and plastids) displaying protein synthesis (for reviews, see Refs. 1 and 2). NME involves two types of methionine aminopeptidase (MAP), MAP1 (type-I) and MAP2 (type-II), which have similar three-dimensional structures despite having only low levels of sequence identity (for a review, see Ref. 3). Higher eukaryotes have at least one MAP1 (MAP1A, also known as MetAP1b; Ref. 4) and one MAP2 in the cytoplasm and one MAP1 (MAP1D) in the organelles. Archaea, such as Pyrococcus furiosus, have one MAP2, and eubacteria, such as Escherichia coli, have only MAP1. In eubacteria and the organelles, a peptide deformylase systematically removes the N-formyl group, leading to the activation of MAP cleavage (5-7). Bacterial MAP1 cannot cleave N-blocked polypeptides (6).
The role of NME remains poorly understood (2), but this process is recognized to be the major source of N-terminal amino acid diversity. It is thought that up to 80% of the proteins of any given proteome undergo this modification (8-10). Early biochemical and genetic studies indicated that this activity was very specific for peptides with an N-terminal Met residue (P1 position according to Schechter's nomenclature, Ref. 11), and the penultimate position (P1′) was identified as the major determinant for cleavage (for reviews, see Refs. 12 and 13). The rule of thumb is that cleavage occurs if the side chain is small enough as is the case for Ala, Cys, Pro, Ser, Thr, and Val. According to this rule, cleavage is not possible for larger side chains. This stochastic rule, which emerged from early bioinformatics analysis based on compilation of the few protein sequences available at the time (14), was confirmed by biochemical analysis of MAP activity in vitro with about a dozen model tri- and tetrapeptides (6,15,16). Edman degradation sequencing of two reporter proteins in E. coli was used to further the analysis (17,18). The authors of these studies suggested that the process was statistical rather than stochastic and that cleavage efficiency was correlated with side-chain length or gyration radius as defined by Levitt (19) at P1′ (20). This is fully confirmed by the structural analysis of many MAPs (3,4). Cleavage probability was found to be highest for Gly (97%) followed by Ala, Thr (90%), Pro, Ser, and Val (84%). Cleavage was less probable for the substrates Cys (71%), Ile (18%), Asp, Leu, and Asn (16%). The underlying idea is that the S1′ binding pocket, into which the P1′ side chain must fit, is small and tolerates smaller side chains with Gly being the optimal residue. However, these two analyses were based on only two reporter proteins, although the comparison was straightforward as the protein sequences differed only at position P1′.
These findings form the basis of our current understanding of the process (for a review, see Ref. 1; see also Fig. 1 in Ref. 21), which is used for bioinformatics analysis of the process in genomes (see Scheme 1 and Fig. 4 in Ref. 22). More recent proteomics analysis in E. coli (23,24) has generated a comprehensive overview, and the N terminus has now been clearly determined for the products of a total of 862 open reading frames (of the 4071 proteins in the E. coli proteome). However, the conclusions of this analysis (23) conflicted with the deduced rules of NME. The authors concluded that, unlike proteins with Ala and Ser at P1′, those with Gly, Pro, and Thr in this position displayed fastidious, "variable cleavage." Finally, proteins with Val at P1′ were found to resist NME.
It is therefore difficult to predict the rules of N-terminal cleavage reliably based on a compilation of all these data. For example, it is unclear whether proteins with Gly, Pro, Thr, Val, Cys, Ile, Asp, Asn, or Leu at P1′ are cleaved and, if so, whether some of the rules for MAP cleavage have yet to be determined or whether other competing mechanisms, such as protein exportation, secretion, or insolubility due to membrane localization, may have biased analysis in vivo. This ambiguity makes reliable proteome annotations difficult for 27% of the proteins in the bacterial proteome and renders the production of recombinant proteins of therapeutic interest risky given the high antigenicity of the N terminus if incorrectly processed (25). This problem was initially encountered in the production of human hemoglobin (26). Closer examination of published biochemical analyses (6,15,16,27) showed that these analyses were not really systematic as they compared peptides of different lengths with different residues in positions 3 (P2′) and 4 (P3′). These differences may have greatly influenced data interpretation. In addition, by analogy with the cleavage rules of similar peptidases with extended recognition regions around the cleavage site, it is unclear whether MAP enzymes have a P2′ or a P3′ recognition site (S2′ and S3′) and whether these two sites influence substrate specificity in vivo. Furthermore we noted that, in analyses in vitro, data were systematically compared at fixed peptide concentrations (usually 4 or 20 mM), although Km values are known to range from 1 to 5 mM, and the steady-state concentration of the nascent peptide is of the order of 0.1 mM (28). Thus, kcat/Km measurements (i.e. catalytic efficiency) were required for the modeling of MAP activity from in vitro studies, facilitating the comparison of coherent peptide series essential for the drawing of definitive conclusions.
The aim of this study was to reconcile the data from in vivo and in vitro analysis using combined biochemical and bioinformatics analysis to draw a definitive and coherent picture of NME and its cleavage rules in vivo. We first used E. coli as a model system, making use of the large body of in vivo sequence determination data for the direct comparison of in vitro and in vivo data. We found that proteins with Thr and Val at P1′ were poor substrates of MAP1 and MAP2, frequently resisting NME cleavage, and that the P2′ and P3′ positions had a strong effect on specificity for both MAP types. These elements would be expected to have different effects on intrinsic protein processing in the natural proteome or the overproduction of foreign, recombinant proteins in bacteria.

EXPERIMENTAL PROCEDURES
Chemicals and Peptides-All chemicals were purchased from Sigma. Most peptides were synthesized as custom products at Genscript Corp. (Piscataway, NJ). Others were purchased from Bachem (Weil am Rhein, Germany) or Sigma. All peptides were >95% pure and were dissolved in water at a final concentration of 50-150 mM. Heating at 50°C was required to achieve dissolution in a few cases. Five peptides (MGF, MFG, MWG, MGW, and MPG) had to be dissolved in water plus dimethyl sulfoxide (5-10% final concentration) at a concentration of 3-10 mM. Each addition of 1% dimethyl sulfoxide to the MAP-coupled assay was found to decrease MAP activity by only 10%.
Methionine Aminopeptidase Purification-Native E. coli methionine aminopeptidase (EcMAP1) was overproduced from plasmid pXL1071 (29). JM83 cells expressing the plasmid were grown at 37°C for 8-12 h in 2× TY medium (16 g/liter Bacto-tryptone, 10 g/liter Bacto-yeast extract, 5 g/liter NaCl and adjusted to pH 7.0 with NaOH) supplemented with 50 µg/ml ampicillin to an A600 of ~0.9. Cells were induced with 0.3 mM isopropyl 1-thio-β-D-galactopyranoside and incubated for a further 12 h with shaking. The cells were harvested by centrifugation and resuspended in 10-20 ml of buffer A consisting of 50 mM KHPO4 (pH 7.5) and 0.2 mM CoCl2. The samples were sonicated, and cell debris was removed by centrifugation. The supernatant was subjected to 0-80% ammonium sulfate precipitation and centrifuged for 30 min at 4°C. The pellet was resuspended in 5 ml of buffer A, applied to a Superose-6 column (1.6 × 60 cm; GE Healthcare) and eluted at a flow rate of 0.5 ml/min in buffer A. The pool with MAP activity (30 ml) was loaded on a Q-Sepharose (1.6 × 10 cm; GE Healthcare) anion-exchange column equilibrated in buffer A, and the sample was eluted with a 0.2 M/h linear NaCl gradient (2.5 ml/min). The proteins recovered were homogeneous and were stored at −30°C in buffer A plus 55% glycerol. P. furiosus MAP2 (PfMAP2) was purified as described elsewhere (30).
MAP Activity Measurements-MAP activity was assayed at 30°C by continuously monitoring the absorbance of oxidized o-dianisidine at 440 nm, coupling MAP activity to both L-amino-acid oxidase and peroxidase activities, according to Scheme 1 (where Aaa is any α-amino acid).

SCHEME 1
The conditions of this assay were set according to published observations (16, 31-33). The standard assay was performed in a final volume of 100 µl in plastic cuvettes with a 1-cm optical path (UVettes; Eppendorf). Changes in absorbance over time were followed using an Ultrospec-4000 spectrophotometer (GE Healthcare) equipped with a thermostat and a six-position Peltier heated cell changer. The reaction mixtures (95 µl) contained (final concentrations in 100 µl) 45 mM Hepes, pH 7.4, 0.2 mM CoCl2 (Sigma; catalog number C8661) unless otherwise stated (Fig. 1A), 0.1 mg/ml o-dianisidine (Sigma; catalog number D9154, solution prepared from one tablet every other day and stored at 4°C in the dark), 3 units of horseradish peroxidase (2000 units/ml, Sigma; catalog number P8415, stored for months at −20°C), 0.5 units of L-amino-acid oxidase (63 units/ml, Sigma; catalog number A9378, stored for months at 4°C in the dark), and 0.01-20 mM peptide (see above). This premixture was incubated for 4-15 min in the spectrophotometer at 30°C until the base line at 440 nm was stable. At this point, the spectrophotometer was set to zero. The reaction was started by adding 5 µl of purified 0.1-20 µM (final concentration) MAP in 50 mM Hepes, pH 7.4, 0.2 mM CoCl2, and 150 mM NaCl, and the reaction was followed for 2-15 min. The initial velocity could generally be calculated after the first 2 min for MAP1 and the first 5 min for MAP2. This assay can measure six velocities of 0.001-0.5 A440/min in parallel; this is generally sufficient to determine the catalytic constants. Above 0.5 A440/min, MAP must be diluted. The measured velocities were transformed into s⁻¹ by dividing the values by the enzyme concentration used, the molar extinction coefficient of oxidized o-dianisidine (10,580 M⁻¹·cm⁻¹ as determined experimentally by incubating L-Met in the reaction mixture and allowing the reaction to continue to completion), and the length of the optical path (1 cm).
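The conversion from a measured absorbance slope to a turnover rate can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function and argument names are hypothetical, and the division by 60 (minutes to seconds) is an assumption implied by the units rather than stated in the protocol. Only the extinction coefficient (10,580 M⁻¹·cm⁻¹) and the 1-cm optical path are taken from the text.

```python
# Values taken from the protocol text.
EPSILON_OX_DIANISIDINE = 10_580.0  # M^-1 cm^-1, oxidized o-dianisidine at 440 nm
PATH_LENGTH_CM = 1.0               # optical path of the cuvette


def turnover_per_second(slope_a440_per_min, enzyme_molar,
                        epsilon=EPSILON_OX_DIANISIDINE,
                        path_cm=PATH_LENGTH_CM):
    """Convert an initial slope (A440 units per minute) into a turnover in s^-1.

    Hypothetical helper: Beer-Lambert gives the product formation rate in M/s,
    which is then divided by the molar enzyme concentration.
    """
    if enzyme_molar <= 0:
        raise ValueError("enzyme concentration must be positive")
    slope_per_s = slope_a440_per_min / 60.0          # assumed min -> s conversion
    product_molar_per_s = slope_per_s / (epsilon * path_cm)
    return product_molar_per_s / enzyme_molar


if __name__ == "__main__":
    # Illustrative numbers only: 0.1 A440/min measured with 0.1 µM enzyme.
    print(round(turnover_per_second(0.1, 0.1e-6), 3))
```

With the illustrative inputs above, the slope sits inside the 0.001-0.5 A440/min window in which the assay is described as reliable.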
Amino-acid oxidase has a broad enough substrate specificity for the efficient oxidation of many amino acids (31). Cysteine-containing peptides were systematically avoided as they are not compatible with the assay. This incompatibility is probably due to reduction, by the thiol group of Cys, of the H2O2 produced by amino-acid oxidase and used as the substrate of peroxidase, resulting in assay inhibition. We confirmed that the coupled assay was indeed inhibited by the Cys-Gly dipeptide. In a few cases, at concentrations higher than the Km value, we noted that establishment of the stationary phase was delayed, although the reason was unclear. This problem was easily avoided provided that the kinetics were followed for a longer time.
Interpretation of Kinetic Data-The kinetic parameters kcat and Km were obtained using Enzyme Kinetics module 1.1 of SigmaPlot (version 8.0) by non-linear Michaelis-Menten equation fitting. The confidence limits given are those associated with the data set. The kinetic parameter kcat/Km was derived from iterative non-linear least-squares fits of the Michaelis-Menten equation using the experimental data (34). Confidence limits for the fitted kcat/Km values were determined by 100 Monte Carlo iterations using the experimental standard deviations on individual measurements. We obtained similar data with the two peptides of the same sequence obtained from (i) the same supplier and (ii) two different suppliers.
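The fitting and Monte Carlo steps can be illustrated with a standard-library Python sketch. It is not the SigmaPlot procedure or the algorithm of Ref. 34: here the Michaelis-Menten fit is done by a swapped-in technique (for each trial Km the best Vmax is obtained in closed form, and Km is located by a coarse log-spaced scan followed by a golden-section search). All function names are hypothetical, and the data in the usage example are synthetic, generated from kcat = 3.9 s⁻¹ and Km = 0.05 mM.

```python
import math
import random
import statistics


def _profile(km, s, v):
    """Best-fit Vmax and residual sum of squares for a fixed Km."""
    x = [si / (km + si) for si in s]
    vmax = sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)
    sse = sum((vi - vmax * xi) ** 2 for vi, xi in zip(v, x))
    return sse, vmax


def fit_michaelis_menten(s, v):
    """Least-squares fit of v = kcat * s / (Km + s); returns (kcat, Km)."""
    grid = [10 ** (e / 20) for e in range(-80, 61)]  # Km from 1e-4 to 1e3
    best = min(range(len(grid)), key=lambda i: _profile(grid[i], s, v)[0])
    lo = math.log(grid[max(best - 1, 0)])
    hi = math.log(grid[min(best + 1, len(grid) - 1)])
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(60):  # golden-section refinement on log(Km)
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if _profile(math.exp(c), s, v)[0] < _profile(math.exp(d), s, v)[0]:
            hi = d
        else:
            lo = c
    km = math.exp((lo + hi) / 2)
    return _profile(km, s, v)[1], km


def monte_carlo_efficiency(s, v, sd, n_iter=100, seed=0):
    """Mean and standard deviation of kcat/Km over noisy resamplings."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        noisy = [rng.gauss(vi, sdi) for vi, sdi in zip(v, sd)]
        kcat, km = fit_michaelis_menten(s, noisy)
        values.append(kcat / km)
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    s_mm = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]     # peptide, mM (synthetic)
    v = [3.9 * si / (0.05 + si) for si in s_mm]       # turnover, s^-1 (synthetic)
    sd = [0.03 * vi for vi in v]                      # assumed 3% error per point
    mean, spread = monte_carlo_efficiency(s_mm, v, sd)
    # Units follow the inputs: mM^-1 s^-1 here; multiply by 1000 for M^-1 s^-1.
    print(mean, spread)
```

The 100 iterations match the number quoted in the text; the 3% per-point error in the usage example is an arbitrary placeholder for the experimental standard deviations.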
We searched for protein patterns (36) at www.infobiogen.fr/services/analyseq/cgi-bin/patternp_in.pl. The pattern syntax used in this text was: "<M" constrains the pattern to the N-terminal residue (i.e. an initiator Met), "(^XY)" means that residues X and Y (or a subset list) are excluded, and "(Y/Z)" means that residue Y or Z (or a subset list) is included. X, Y, or Z correspond to any normal amino acid.
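For readers who want to apply the same pattern syntax to their own sequence lists, the notation maps directly onto regular expressions. The translator below is a hypothetical sketch written for this purpose; it is not the Infobiogen pattern-search program, and it handles only the three constructs defined above.

```python
import re


def pattern_to_regex(pattern):
    """Translate the N-terminal pattern notation into a compiled regex.

    "<"      -> anchor at the N terminus
    "(^XY)"  -> any residue except X or Y
    "(Y/Z)"  -> residue Y or Z
    Any other character is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "<":
            parts.append("^")
            i += 1
        elif ch == "(":
            close = pattern.index(")", i)
            body = pattern[i + 1:close]
            if body.startswith("^"):
                parts.append("[^" + body[1:] + "]")
            else:
                parts.append("[" + body.replace("/", "") + "]")
            i = close + 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


if __name__ == "__main__":
    rx = pattern_to_regex("<M(V/T)(E/P)")
    for seq in ("MVEKL", "MTPAA", "MASKL"):
        print(seq, bool(rx.match(seq)))
```

Because the excluded-residue class is a plain negated character set, it would also accept non-amino-acid characters; input sequences are assumed to be clean one-letter protein strings.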

Validation of the Experimental in Vitro Model, E. coli Methionine Aminopeptidase (EcMAP1)-A high throughput, substrate-independent assay was required to investigate EcMAP1 substrate specificity. Most of the available assays were not suitable (37-39). The only available test that appeared fully independent of the peptide sequence had only been used in a discontinuous manner. This assay involved coupling with two other activities, those of amino-acid oxidase and peroxidase (16, 31-33). We set up this assay for use in continuous conditions. This continuous version of the assay proved extremely rapid, reliable, and cheap, and all the peptides used were soluble enough for kinetic measurements. Because Cys-containing peptides inhibited the assay through the reducing effect of the thiol group of Cys, we ensured that none of the peptides tested included a Cys. Other MAP activity assays are also subject to interference from Cys-containing peptides (40). However, this problem is not particularly important as Cys is the rarest amino acid at the N termini of proteins. For example, only six (0.17%) of the 4071 ORFs of E. coli are predicted to have a Cys as the second residue, 14 (0.34%) are predicted to have a Cys as the third residue, and 26 are predicted to have a Cys as the fourth residue (0.64%). Thus, our study covers 99% of the proteins of E. coli.
We first assessed the relevance of our assay with our purified EcMAP1. We evaluated the impact of the nature of the cocatalytic metal cation (see references compiled in Ref. 2) (Fig. 1A). Cobalt cations appeared to be the most efficient and were used in all MAP assays. In this study, our final goal was to measure in vitro the catalytic efficiency of Met cleavage (i.e. kcat/Km) and to compare these data with those derived from N-terminal sequence analysis of proteins expressed in vivo. We therefore systematically measured the kcat and Km values associated with a given peptide. Fig. 1B provides an example of fit quality and shows the relevance of fitting the Michaelis-Menten model to the data. With peptide Met-Ala-Met-Lys-Ser, the substrate most efficiently processed in this study, a catalytic efficiency value of 81,700 ± 8,000 M⁻¹·s⁻¹, associated with Km = 0.05 ± 0.01 mM and kcat = 3.9 ± 0.2 s⁻¹, was measured. These data, obtained for more than 120 different peptides (compiled data are shown in Supplemental Table S1), showed that both kcat and Km values were extremely variable with ranges covering more than 2 orders of magnitude (0.01-4 s⁻¹ and 0.05-15 mM, respectively). It would therefore not be possible to make reliable comparisons between the data obtained for peptide concentrations of 4 and 20 mM in previous publications. These findings also suggest that the enzyme worked below the kcat/Km rate of cleavage as the steady-state peptide concentration is about 0.1 mM.
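Why fixed-concentration comparisons can mislead is easy to see numerically. The sketch below evaluates the Michaelis-Menten rate for two hypothetical peptides whose constants are chosen from within the reported ranges (0.01-4 s⁻¹ for kcat, 0.05-15 mM for Km); neither peptide corresponds to an entry in Supplemental Table S1, and the function name is an assumption for illustration.

```python
def mm_turnover(kcat, km_mm, s_mm):
    """Michaelis-Menten turnover (s^-1) at substrate concentration s_mm (mM)."""
    return kcat * s_mm / (km_mm + s_mm)


# Hypothetical peptides, constants picked inside the reported ranges.
PEPTIDE_A = {"kcat": 4.0, "km_mm": 15.0}   # high kcat, high Km
PEPTIDE_B = {"kcat": 0.5, "km_mm": 0.05}   # low kcat, low Km

if __name__ == "__main__":
    for s_mm in (20.0, 4.0, 0.1):          # two assay concentrations, then ~in vivo
        a = mm_turnover(s_mm=s_mm, **PEPTIDE_A)
        b = mm_turnover(s_mm=s_mm, **PEPTIDE_B)
        print(f"[S] = {s_mm} mM: A = {a:.3f} s^-1, B = {b:.3f} s^-1")
```

At 20 mM peptide A appears to be the better substrate, whereas at 0.1 mM, the order of the steady-state nascent peptide concentration, peptide B is processed faster. The ranking follows kcat/Km only when the substrate concentration is well below Km, which is the reason for reporting catalytic efficiency rather than rates at a single concentration.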
In Vitro Characterization of E. coli Methionine Aminopeptidase: the S1 Site and the Influence of Peptide Length-We assessed whether EcMAP1 could be used as a broader specificity aminopeptidase by varying the nature of the P1 side chain (Fig. 2A). All experiments were carried out with tripeptides containing Ala and Ser in the second and third positions, respectively. Met-Ala-Ser proved to be one of the most efficient substrates in our analysis (see below and Supplemental Table S1). We found that Met and its unnatural norleucine (Nle) classic mimic were the only residues at position P1 that gave efficient cleavage. This finding is consistent with the tapered shape of the S1 pocket (41). The natural amino acids Leu and Phe could be processed in vitro but with a catalytic efficiency more than 3 orders of magnitude lower. Similar results were obtained with methionine sulfoxide (Mox) or norvaline (Nva), which features an n-propyl side chain. Decreasing the length of the side chain to α-aminobutyrate (Aba), an unusual amino acid with a two-carbon linear side chain mimicking Cys and Ser, or Ala led to resistance to enzyme cleavage. We concluded that EcMAP1 could not further process its natural substrates and that the kinetic data were representative of a single cleavage site between P1 and P1′ provided that Met was the first amino acid of the peptide.
We investigated the role of the interface between the P1 and P1′ sites by synthesizing three peptides derived from Met-Ala-Ser: one with 2-methyl alanine (2mA), the second with oxamic acid (no side chain but with a keto group instead, Oxa), and the third with a D-Ala in position 2 replacing L-Ala. The first two peptides were not cleaved. This was in contrast to the D-Ala variant at P1′ that was 2 times more efficiently cleaved (Table S1). Thus, both geometry and the hydrogen environment around the α carbon of the second residue had a critical influence on cleavage by EcMAP1.
We also investigated the impact of the length of the polypeptide chain. We first confirmed that EcMAP1 cleaved dipeptides extremely inefficiently: Met-Ala and Met-Gly were cleaved with a kcat/Km value 2 orders of magnitude lower than that for the reference tripeptide with a serine in position P2′ (Supplemental Table S1). In contrast, tripeptides and larger peptides starting with Met-Gly proved to be efficient substrates (Fig. 2B). We assessed the impact of side chain and length by comparing two peptide series. The last amino acid in these two series of peptides was Gly or Met, and the linker peptide consisted of Gly. The kcat/Km value was found to be maximal for peptides more than five residues long (Fig. 2B). Comparison of the data obtained for peptides of a given peptide length between the two series indicated that the nature of the side chain at position 3 (P2′ site) or 4 (P3′ site) had a strong effect and that position 5 (P4′ site) also had a significant effect. A thorough analysis of the impact of the amino acids at these positions is therefore required.

FIG. 2. A, the peptide series was X-Ala-Ser where X is the indicated amino acid. A value of 100 was assigned to the kcat/Km value of Met-Ala-Ser. B, influence of peptide chain length. Two series of peptides were assayed, Met-(Gly)n−2-Gly and Met-(Gly)n−2-Met where n = 3-6 corresponds to the length of the peptide. Differences in length between peptides were ensured by a poly-Gly linker. The x axis corresponds to the full-length sequence. The shortest peptide tested was three amino acids long. The kcat/Km value is plotted as a function of peptide chain length. Mox, methionine sulfoxide; Nle, norleucine.
The S1′ Subsite of EcMAP1: Difficult Cleavage of the Thr and Val Residues-MAPs are known to cleave peptides selectively between the terminal Met and the penultimate or P1′ residue. In two tripeptide series, one with Ser and the other with Gly at P2′, we measured the catalytic efficiency of Met cleavage for the complete set of amino acids with the exception of Cys (Fig. 3A). Ala was cleaved most efficiently followed by Ser, Gly, and Pro. Thr and Val were cleaved less efficiently with kcat/Km values 2 orders of magnitude lower in both series. We could not determine kcat/Km values for other amino acids (at least 5 orders of magnitude lower). In particular, peptides with Ile, Asn, Asp, Met, or Leu at P1′ were not cleaved. These findings contrast with two reports using reporter proteins in vivo (17,18). At this stage, we do not know whether our in vitro assay was representative of in vivo conditions or whether it was not sensitive enough. Nevertheless there was a clear relationship between gyration radius (as defined by Levitt (19)) of the side chain at P1′ and catalytic efficiency as reported previously (Fig. 3A). This relationship was fully confirmed in the context of the most efficient series, the one derived from tetrapeptide MXMK (with X = Ala, Gly, Pro, Ser, Thr, Val, Asn, Ile, Leu, Asp, Glu, Phe, Gln, Lys, or Arg; data reported in Supplemental Table S1). In contrast to previous reports suggesting that Gly in the penultimate position gave the most efficient processing, our data clearly show that the side chain of Ala is optimal at P1′. We analyzed two tripeptides with unusual P1′ side chains. One contained Nva, and the second contained Aba, mimicking Cys and Ser, respectively. The deduced kcat/Km values for these peptides were entirely consistent with the model. This strongly suggests that the values for Cys would be similar to that of Pro, which has the most similar gyration radius (19).
In Vivo N-terminal Met Cleavage Is a Statistical Process Mostly Dependent on the Catalytic Efficiency of MAP-The data described above indicated that only peptides with Ala, Gly, Ser, Cys, or Pro at P1′ are substrates of EcMAP1 and that the cleavage of peptides with Thr or Val at this position is less predictable, possibly depending on other features. We compared this in vitro analysis with in vivo results using compiled data from N-terminal protein sequencing (data available at ecogene.org/VerifiedInfo.php?download=true). This analysis showed that 14.4% of the proteins are cleaved by a signal peptidase. Cleavage usually occurs between two alanine residues located at positions 23-27 of the polypeptide sequence. The remaining proteins (85.6%) either retain or lose their N-terminal Met. We analyzed this pool of proteins (738 sequences) to draw up the rules of NME in vivo and compared the data obtained with our in vitro analysis. Cleavage was not reported with Ile, Asp, Asn, Leu, Met, or Gln at P1′. This fully confirmed the results of our in vitro analysis, conflicting with previous analyses based on reporter proteins (17,18). No cleavage was reported if the P1′ residue was Arg, Phe, His, Tyr, or Trp as suggested in vitro.

FIG. 3. Tight correlation between in vitro and in vivo cleavage efficiency with variation of the penultimate (P1′) residue. A, influence of P1′ in vivo and correlation with side-chain radius gyration (19). Two series were tested, Met-X-Gly and Met-X-Ser, where X is any amino acid. Cys could not be tested directly (it is therefore shown in italics here; see text); two unnatural amino acids, Aba and Nva, were included. The catalytic efficiency (kcat/Km) is plotted as a function of the side-chain radius gyration (data from Ref. 19). The nature of the corresponding amino acid is given.
Finally one cleavage with Glu (42) and two with Lys (43,44) at P1′ were reported, but these cleavages concerned only 3.0 and 1.5%, respectively, of all the sequences containing these amino acids at this position. These exceptional cases of cleavage may result from the action of dedicated acylaminopeptidases rather than MAP as shown previously for actin in animals (45,46).
This efficiency was similar to that found in the kinetic analysis shown in Fig. 3A. For each type of residue found at P1Ј, we therefore plotted in vivo cleavage efficiency as a function of EcMAP1-mediated cleavage efficiency measured in vitro. There was an extremely strong correlation between the two data sets (Fig. 3C). Thus, NME or resistance to NME was limited essentially by MAP catalytic efficiency in vivo, and the data obtained in vitro were reliable for the modeling of NME in vivo. We defined a "twilight cleavage zone" as the catalytic cleavage efficiency leading to partial cleavage efficiency in vivo (Fig. 3). Below this zone, cleavage is considered as inefficient.
The S2′ Subsite of EcMAP1: Pro and Glu Are Inefficiently Cleaved Residues-We first investigated the impact of the P2′ amino acid in the context of two series of tripeptides beginning with Met-Ala, the most efficient substrate of EcMAP1, and Met-Gly, an intermediate substrate for EcMAP1. The P2′ side chain clearly had a strong effect on the catalytic efficiency of Met excision catalyzed by EcMAP1. Some residues, such as Pro and Glu, decreased the kcat/Km ratio by more than an order of magnitude in both series with others, such as Asp, Ile, and Thr, significantly decreasing this value by at least 1 order of magnitude (Fig. 4A). Efficiency was optimal for peptides with Trp, Met, or Ser at P2′ as these residues increased the kcat/Km ratio. Other residues gave intermediate values. We next analyzed the impact of the P2′ position on the efficiency of cleavage in vivo using cleavage data for this position in the 385-sequence data library. The data are displayed in Fig. 4B. They show a clear trend toward difficult cleavage when the P2′ residue is Asp, Glu, Ile, Pro, or Thr. Phe and Tyr were also suggested to decrease cleavage efficiency in vivo, although this prediction was not consistent with the data for catalysis. Nevertheless the small number of peptides containing these residues resulted in the data not being robust enough for confident prediction of the in vivo behavior linked to the presence of these two aromatic amino acids at position P2′ (Fig. 4B). We concluded that the kinetic data were representative of the in vivo efficiency of cleavage.

FIG. 4. Comparison of the impact of the nature of the P2′ position on NME in vitro and in vivo. A, influence of the P2′ position in vitro. The data came from two data sets, one with Met-Ala-X (black bars) and the second with Met-Gly-X (gray bars), where X is any natural amino acid. The relative catalytic efficiency (kcat/Km) is plotted as a function of X. A value of 100 was assigned to X = Ser in both series. A black or gray asterisk (*) indicates that the corresponding peptide in either series was not characterized. B, influence of the P2′ position in vivo. Data were taken from the 862-sequence database of proteins with determined N-terminal sequences. The efficiency of Met cleavage was defined as the percentage of proteins with a <M(A/C/G/P/S/T/V) pattern resistant to Met cleavage (385 instances). The general mean value of 87% was used to display differences optimally. Given the small number of instances, an error bar corresponding to one more or less uncleaved example was introduced. Arrows indicate the amino acids leading to significant decrease of cleavage efficiency in the series.
In a second series of investigations of the impact of position P2′ in EcMAP1 cleavage, we assessed tripeptides with Thr or Val at position P1′, corresponding to substrates for fastidious cleavage by EcMAP1 (i.e. zone leading to the twilight cleavage zone). A similar trend was observed for this series (Fig. 5A): Pro and Glu and, to a lesser extent, Asp and Thr at P2′ were less likely to result in cleavage than other residues. If we considered only those proteins starting with Val or Thr at P1′, the number of examples was significantly reduced, and the statistics were therefore not robust enough for significant conclusions to be drawn, but Glu and Pro seemed to reduce cleavage efficiency (Fig. 5B). We concluded that the cleavage of proteins with <M(V/T)(E/P) patterns was virtually impossible in vivo and that the cleavage of proteins with <M(V/T)(D/T) patterns was difficult (i.e. "inefficient"), resulting in partial Met retention in some cases.
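The P1′ and P2′ rules stated so far can be condensed into a small decision function. The following Python sketch is only a simplified reading of those rules and makes several assumptions: the function name, the four output labels, and the treatment of every other P1′ residue as "retained" are choices made here for illustration. It ignores the P3′ and P4′ effects, signal peptide cleavage, and the rare Glu/Lys exceptions, and it is not the algorithm implemented in TermiNator2.

```python
EFFICIENT_P1_PRIME = set("ACGPS")   # residues described as MAP substrates at P1'
TWILIGHT_P1_PRIME = set("TV")       # poorly cleaved, context dependent
BLOCKING_P2_PRIME = set("EP")       # <M(V/T)(E/P): cleavage virtually impossible
SLOWING_P2_PRIME = set("DT")        # <M(V/T)(D/T): cleavage difficult


def predict_nme(sequence):
    """Classify N-terminal Met excision from the first three residues.

    Returns one of "cleaved", "variable", "inefficient", or "retained".
    Hypothetical helper; labels are assumptions, not the paper's vocabulary.
    """
    seq = sequence.upper()
    if len(seq) < 3 or seq[0] != "M":
        raise ValueError("expected a sequence of >= 3 residues starting with Met")
    p1_prime, p2_prime = seq[1], seq[2]
    if p1_prime in EFFICIENT_P1_PRIME:
        return "cleaved"
    if p1_prime in TWILIGHT_P1_PRIME:
        if p2_prime in BLOCKING_P2_PRIME:
            return "retained"
        if p2_prime in SLOWING_P2_PRIME:
            return "inefficient"
        return "variable"
    return "retained"


if __name__ == "__main__":
    for seq in ("MASKL", "MVEKL", "MTDKL", "MVKKL", "MLSKL"):
        print(seq, predict_nme(seq))
```

Keeping the residue sets as module-level constants makes it straightforward to tighten or relax a rule as more N-terminal sequences become available.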
The S3′ Subsite of EcMAP1 Shows Negative Discrimination Similar to That for the S2′ Subsite-In a final systematic search for EcMAP1 cleavage criteria, we investigated the role of positions P3′ and P4′. Data were obtained with a tetrapeptide, Met-Gly-Met-X, in which position X (P3′) was varied with all natural amino acids (except Cys). Significant differences were observed, but only Asp, Glu, Pro, and Thr, i.e. the same residues as for P2′, resulted in a significantly lower cleavage efficiency (Fig. 6A). Another set of data was obtained with a pentapeptide, Met-Gly-Gly-Gly-X, in which position X (P4′) was varied with a reduced set of 10 amino acids (Fig. 6B). Phe was the most efficient, and Glu (again) was the least efficient residue. Cleavage efficiency was only modestly increased when peptides Met-Ala-Met-Lys and Met-Ala-Met-Lys-Ser were compared (Supplemental Table S1). The impact of P3′ or P4′ in vivo was investigated (Supplemental Fig. S1). Concerning position P4′, a negative influence of Cys and to a lesser extent Glu and His emerged (Supplemental Fig. S1B). The set of fastidious cleavage substrates with Val and Thr at position P1′ was next investigated. The very small size and the composition of the sample analyzed with Val and Thr (only 91 cases covering 18 different residues) resulted in less conclusive data than for position P2′ (Supplemental Fig. S1AB). Nevertheless the negative impact of Cys, Glu, His, and Pro at P4′ appeared clearly (Supplemental Fig. S1C).

FIG. 5. Impact of the nature of P2′ on NME when P1′ is a residue processed inefficiently (i.e. Thr or Val) for MAP. A, influence of the P2′ position in vitro. The data were obtained from two data sets, one with Met-Val-X (black bars) and the second with Met-Thr-X (gray bars), where X is the indicated natural amino acid. The relative catalytic efficiency (kcat/Km) is plotted as a function of X. A value of 100 was assigned to Met-Ala-Ser. A black or gray asterisk (*) indicates that the corresponding peptide in either series was not characterized. B, influence of the P2′ position in vivo. Data were taken from the 862-sequence database of proteins with determined N-terminal sequences. The efficiency of Met cleavage was defined as the percentage of proteins with a <M(A/C/G/P/S/T/V) pattern resistant to Met cleavage. Arrows indicate the amino acids leading to significant decrease of cleavage efficiency in the series.

Cleavage Specificity Is Very Similar for Type-1 and Type-2 MAPs-We extended our analysis to type-2 MAP, performing the same analysis with P. furiosus MAP (PfMAP2), an archaeal enzyme. Little is known about the cleavage capacity of PfMAP2. Cleavage data are reported in the Takara catalog (catalog number 7335), but no detailed protocol is described. We extended our study aiming to confirm the validity of our peptide array by studying PfMAP2-mediated cleavage in detail. This enzyme was characterized under the same conditions as EcMAP1. A catalytic efficiency of 44,275 ± 10,000 M⁻¹·s⁻¹, associated with Km = 0.9 ± 0.2 mM and kcat = 38 ± 3 s⁻¹, was measured for this enzyme with Met-Gly-Met-Met, the most efficiently processed substrate studied. This catalytic efficiency is similar to that reported for EcMAP1 (see above).
We then studied a set of substrates with various residues in the P1′ and P2′ positions (the complete data are reported in Supplemental Table S2 and displayed in Fig. 7). For P1′, the data obtained were similar to those obtained with EcMAP1 (Fig. 7A) and those available from Takara with peptide series MXAAA (where X is any natural amino acid except Cys). Thus, although MAP1 and MAP2 differ in terms of their amino acid sequences, the S1′ binding pockets of these two types of enzyme have similar sieve capacities. Cleavage was optimal for Ala and Pro and minimal for Thr and Val. The enzyme showed a strong preference for large hydrophobic side chains, such as those of Met and Trp, in the P2′ position with strong hindrance observed for residues such as Glu and Pro and, to a lesser extent, Thr, Ile, and Asp (Fig. 7B). Some differences were observed between MAP1 and MAP2. For example, Ser and, to a lesser extent, Lys in P2′ were less optimal for MAP2 cleavage than for MAP1 cleavage, but the presence of these residues is unlikely to have a major effect on cleavage efficiency.
Thus, the substrate specificities of PfMAP2 and EcMAP1 seem to be similar, including for instance difficult cleavage if Thr or Val is at P1′ and if Glu or Pro is at P2′. This similarity was unexpected given the differences in the sequences of the two enzymes. To make these data accessible to the whole community, we created a prediction tool, TermiNator2. This predictor was built as a web tool combining PHP, HTML, MySQL, and JavaScript to predict NME in prokaryotes (i.e. both eubacteria and Archaea) and other associated N-terminal modifications in eukaryotes. It is available online at www.isv.cnrs-gif.fr/terminator2/index.html. This tool also allows complete proteomes to be analyzed (see below).
FIG. 6. [...] The data were obtained with the Met-Gly-Met-X peptide where X is any natural amino acid. The relative catalytic efficiency (kcat/Km) is plotted as a function of X. A value of 100 was assigned to X = Gly. B, influence of the P4′ position in vitro. The data were obtained with the Met-Gly-Gly-Gly-X peptide where X is any natural amino acid. The relative catalytic efficiency (kcat/Km) is plotted as a function of X. A value of 100 was assigned to X = Ser. Arrows indicate the amino acids leading to significant decrease of cleavage efficiency in the series.

The [...] this a unique system for further functional analysis of the impact of the N-terminal residue. In a previous analysis of the E. coli NME machinery, we showed that MAP was the primary determinant of the nature of the N terminus of a protein and that the most sensitive position for EcMAP1 action was position P1′ (Fig. 3). We investigated the frequency and nature of particular N-terminal residues in E. coli. We extracted all proteins with the same penultimate (P1′) residue from the complete proteome of E. coli (4071 entries) and calculated the percentage of proteins with each of the amino acids in the penultimate position. We compared this data set with that calculated only for proteins with N-terminal sequences that had been determined and shown to follow the rules of NME (738 entries; Fig. 8A). All residues predicted to undergo NME, except Val, the least efficiently processed of these substrates, were underrepresented by a factor of 2-3 in the complete proteome with respect to the restricted proteome. These data suggest that the accumulation of proteins sensitive to NME is favored in bacteria. Using the MAP cleavage prediction tool, TermiNator2, and the complete proteomes of E. coli (representative of Gram-negative bacteria), B. subtilis (representative of Gram-positive bacteria), and P. furiosus (representative of Archaea), we predicted the N-terminal amino acid distribution.
In each of these cell types, the N-terminal amino acid distribution was very similar, featuring Met as the major N-terminal residue (Fig. 8B). Residues appearing as a result of NME corresponded to less than 30% of the proteome. These data indicate that (i) E. coli is a good model for NME studies in prokaryotes and that (ii) NME involves only a minority of the proteins in a given proteome in contrast to the general belief.
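The kind of proteome-wide tally described above can be sketched in a few lines. This is a minimal illustration, not the TermiNator2 implementation: it applies only the simple P1′ rule (Met is removed when P1′ is Ala, Cys, Gly, Pro, Ser, Thr, or Val), ignores P2′ to P4′ and LPR, and uses a hypothetical toy list of sequences and hypothetical function names.

```python
from collections import Counter

# Residues at P1' that permit Met removal under the simple rule.
SMALL_P1_PRIME = set("ACGPSTV")


def predicted_n_terminus(sequence):
    """Predict the mature N-terminal residue from P1' alone (simplified rule)."""
    if len(sequence) < 2 or sequence[0] != "M":
        raise ValueError("expected at least two residues, starting with Met")
    p1_prime = sequence[1]
    return p1_prime if p1_prime in SMALL_P1_PRIME else "M"


def n_terminal_distribution(sequences):
    """Return {residue: percentage} of predicted mature N termini."""
    counts = Counter(predicted_n_terminus(s) for s in sequences)
    total = sum(counts.values())
    return {residue: 100.0 * n / total for residue, n in counts.items()}


# Hypothetical toy "proteome", for illustration only.
toy_proteome = ["MKLLT", "MAKQE", "MSDFG", "MNPLR"]
distribution = n_terminal_distribution(toy_proteome)
print(distribution)
```

Applied to a complete set of annotated ORFs, the same tally would give a distribution comparable in form to the one plotted for each organism, although a P1′-only rule will overestimate cleavage for substrates with unfavorable downstream residues.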
In addition, we investigated whether these findings were biased by the lack of consideration of leader peptide removal (LPR). LPR may play an important role, in addition to MAP cleavage, in influencing the exposure of specific and functionally important N-terminal residues. Another possible bias is the dependence of NME on the residue in the P1′ position and also on the residues in the P2′ and P3′ positions (Fig. 3B). Therefore, a second series of predictive analyses was performed based on the 4071 entries of the complete proteome of E. coli, taking into account both (i) the extent of LPR and (ii) Met cleavage efficiency (data shown in Fig. 3B). Statistical analysis of the restricted proteome (862 entries) was used (i) to predict the extent of LPR (14.4%) in a robust manner and (ii) to determine the nature and probability of the appearance of a new N-terminal amino acid as a result of LPR (i.e. Ala >> Asp, Glu, Lys > Val > Gln, His, Tyr; values indicated in the legend to Fig. 8). Our data are consistent with published results (47), which reveal a clear bias in cleavage site selection and in the nature of the amino acid unmasked at the N terminus. We used these data and the associated probabilities to calculate the mean percentage of mature proteins from the complete proteome with a given amino acid at their N terminus (Fig. 8C, black bars). As many as 60% of the proteins considered were predicted to retain their N-terminal Met residue. These data were compared with experimental data obtained by Edman N-terminal sequence determination of crude E. coli extracts (white bars; average of data from Refs. 8-10 and 23). The experimental data set differs from the calculated data in that only 35 ± 5% of the proteins were found to retain their N-terminal Met residue. This value is roughly half that obtained theoretically. In contrast, both Ala and Ser appeared to be overrepresented as N-terminal residues in the experimental data sets.
These data indicate that taking into account LPR is not sufficient for significant modification of the data shown in Fig. 8, A and B, and that there is indeed (i) a strong bias toward the in vivo accumulation of proteins losing their N-terminal Met and/or (ii) a negative bias against proteins resistant to NME.

FIG. 7. Impact of P1′ and P2′ positions on MAP2 cleavage efficiency in vitro. A, influence of P1′ in vitro and correlation with side-chain radius of gyration (19). Two series were tested, Met-X-Gly and Met-X-Ser, where X is any amino acid. "Oth" corresponds to all natural amino acids not indicated in the figure. B, influence of the P2′ position in vitro (black bars) and comparison with EcMAP (white bars). The data were obtained from the Met-Gly-X series where X is any natural amino acid. The relative catalytic efficiency (kcat/Km) is plotted as a function of X. A value of 100 was assigned to X = Met in both series. Arrows indicate the amino acids leading to significant decrease of cleavage efficiency in the series.

DISCUSSION

We report here the first extensive, coherent, and comprehensive analysis of the substrate specificity of MAP in vitro.
Our data indicate that the P1′, P2′, and P3′ positions of the protein substrate (i.e. the three residues following the first Met (P1 position)) have a major effect on cleavage efficiency. Our data were obtained with MAPs of both types: type-1, exemplified by E. coli MAP, and type-2, exemplified by P. furiosus MAP. The E. coli system, with a large number of available protein sequences, has provided unique information, making it possible to demonstrate a direct correlation between MAP activity in vitro and NME in vivo. Together these data indicate that the assessment of MAP catalytic efficiency in vitro is the best way to assess MAP cleavage efficiency in vivo. Based on our data, we also developed a new bioinformatics tool for improving Met cleavage prediction for prokaryotic (eubacterial and archaeal) MAPs that is available from www.isv.cnrs-gif.fr/terminator2/index.html. The prediction of Met cleavage for organellar and nucleus-encoded eukaryotic proteins is also possible via the same site using data for the prokaryotic enzyme (this work and Ref. 48).

FIG. 8. Existence of a strong positive bias between proteins with an N terminus identical to that revealed by NME and proteins accumulating in vivo. A, comparison between the penultimate (P1′) residues predicted in the 4071 ORFs annotated in the E. coli genome (black bars) and those of the 862-protein set for which the N-terminal residue has been determined by sequencing (white bars). The proteins undergoing LPR (14.4%) in this second set were excluded from the analysis. B, amino acids unmasked by NME in various prokaryotic proteomes. The TermiNator2 prediction tool was used to analyze the three complete proteomes from E. coli, B. subtilis, and P. furiosus. LPR was not considered and assumed to be 0 (see C). C, comparison of a model of N-terminal protein maturation in vivo, predicting the nature of the N-terminal residue (calculation involved the 4071 ORFs; black bars), with the average of N termini (white bars) determined from two-dimensional gel electrophoresis (data from Ref. 23) and crude extracts (data from Refs. 8-10). The bioinformatic model took into account the efficiency of NME in vivo as a function of the nature of the residue in the P1′ position, as shown in Fig. 3B, and the LPR of 14.4% of the proteins. In this subset (data available from Ref. 23), we assumed that 57% of the proteins undergoing LPR eventually start with Ala, 12.2% with Asp, 10.2% with Glu, 8.2% with Lys, 4.1% with Val, and 2% with each of Gln, His, Phe, and Tyr, respectively. "Oth" corresponds to the sum of all the other amino acids (0.1%). This assessment was highly consistent with the data set shown in Fig. 1 (for Gram-negative bacteria) of Ref. 47. Data were calculated from the same set but with a probability of LPR of 29% (gray bars, tendency up to 42% is schematized with a black arrow on the bar) rather than 14.4% (black bars).

NME Prediction of Natural Bacterial Substrates and Recombinant Proteins Overproduced in E. coli-The nature of position P1′ in a polypeptide chain affects cleavage efficiency, depending on the nature of the amino acid, from highly efficient cleavage, as observed with Ala, to fastidious cleavage, as seen with Thr and Val. MAP kcat/Km (catalytic efficiency) and in vivo cleavage efficiency were found to be directly correlated (Fig. 3C). This explains why most proteins resistant to NME according to the usual rules of thumb have Val or Thr in the penultimate residue. Positions P2′ and, to a lesser extent, P3′ and P4′ also affected cleavage probability with the same series of amino acids, Asp, Glu, Pro, and Thr, having a negative effect. A Glu or a Pro at P2′ appeared to have the most negative effect. For the prediction of MAP cleavage in vivo directly from analysis of the N terminus of a protein, it is important to take into account whether the protein is (i) a natural protein substrate or (ii) an overproduced recombinant protein. For natural substrates, we suggest that proteins beginning with a <M(^ACGPSTV) or <M(V/T)(E/P) (note that "(^XY)" means that residues X and Y or a subset list is excluded, and "(Y/Z)" means that residue Y or Z or a subset list is included; X, Y, or Z correspond to any normal amino acid) are unlikely to be cleaved. There is a strong risk of lack of cleavage for proteins beginning with <M(G/P/V/T)(D/T)(D/E/P/T), <M(G/P/V/T)(D/T)(^DEPT)(C/E/H/P), and <M(G/P/V/T)(^DEPT)(D/E/P/T)(C/E/H/P). In this last situation, incomplete maturation in vivo is most likely to occur, resulting in two types of the protein: one, major with an N-terminal Met, and the other, minor without an N-terminal Met. There are already a few well documented examples of such behavior (23). These simple rules account for 60% of the proteins displaying as yet unexplained Met retention in the proteome of E. coli.
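The N-terminal patterns proposed above for natural substrates translate directly into regular expressions. The following is a minimal sketch under stated assumptions: the function name, the three labels, and the order of evaluation are hypothetical choices made for illustration, and this is not the TermiNator2 implementation. The "(^XY)" notation is rendered as a negated character class.

```python
import re

# Patterns for which Met cleavage is described as unlikely.
UNLIKELY_PATTERNS = [
    r"^M[^ACGPSTV]",   # <M(^ACGPSTV)
    r"^M[VT][EP]",     # <M(V/T)(E/P)
]

# Patterns associated with a strong risk of lack of cleavage.
RISK_PATTERNS = [
    r"^M[GPVT][DT][DEPT]",          # <M(G/P/V/T)(D/T)(D/E/P/T)
    r"^M[GPVT][DT][^DEPT][CEHP]",   # <M(G/P/V/T)(D/T)(^DEPT)(C/E/H/P)
    r"^M[GPVT][^DEPT][DEPT][CEHP]", # <M(G/P/V/T)(^DEPT)(D/E/P/T)(C/E/H/P)
]


def classify_natural_substrate(sequence):
    """Label a natural protein N terminus as 'unlikely', 'risk', or 'likely'."""
    seq = sequence.upper()
    if len(seq) < 2 or not seq.startswith("M"):
        raise ValueError("expected at least two residues, starting with Met")
    if any(re.match(pattern, seq) for pattern in UNLIKELY_PATTERNS):
        return "unlikely"
    if any(re.match(pattern, seq) for pattern in RISK_PATTERNS):
        return "risk"
    return "likely"


# Hypothetical N termini, for illustration only.
for n_terminus in ("MKLLT", "MTPLG", "MGDDA", "MASKE"):
    print(n_terminus, classify_natural_substrate(n_terminus))
```

Checking the "unlikely" patterns before the "risk" patterns is a design choice of this sketch; the text does not specify a precedence, and sequences shorter than a pattern simply fail to match it.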
E. coli is widely used for the overproduction of recombinant proteins. In the last 20 years, many instances of improperly processed N termini have been reported. This recurrent issue, known as the "methionine problem" (49), has not yet been resolved. MAP cleavage is thought to become limiting because the amount of overproduced protein exceeds the capacity for MAP activity. Our findings are entirely consistent with MAP activity being the limiting factor for NME in vivo (Fig. 3C). They also suggest that the conclusions drawn from studies based on overproduced reporter proteins (17,18) should be interpreted with caution. Unexpected complete or partial Met retention is known to lead to many defects, including an increase in antigenicity or a decrease in biological activity. This is true for the N-terminal nucleophile hydrolase (Ntn) family of enzymes, which use both the free N terminus and the corresponding Thr, Ser, and Cys side chains in the catalytic mechanism (50) as reviewed in Ref. 51. Before envisaging the production of a protein in E. coli, particularly for heterologous proteins, the NME behavior of the protein must be correctly predicted to ensure that the expected protein, with the expected N terminus, is produced. The data reported herein are particularly useful in this respect as illustrated by the following examples concerning the production of human proteins of therapeutic interest by recombinant DNA methods in E. coli.
"Unexpected" blockage of Met cleavage has been reported for granulocyte colony-stimulating factor (Met-Thr-Pro-Leu; see Ref. 52) and α- or β-hemoglobins (Met-Val-His-Leu; see Ref. 53). In all these cases, the overproduced protein contains a fastidious P1′ (Val or Thr) residue as revealed in our study (Fig. 3), and one of these proteins also contains the most negative P2′ residue, Pro. Other well documented examples of Met retention include the cytokine RANTES (regulated on activation normal T cell expressed and secreted) (Met-Ser-Pro-Tyr) and the interleukins 1β and 2 (Met-Ala-Pro-Thr/Val; see Refs. 54 and 55). For the interleukins, Met cleavage is crucial as the N-terminal Ala must interact with the C-terminal Thr-133 (56). Our study provides a rationale for the retention of Met as both Pro and Thr at positions P2′ and P3′ tend to inhibit Met cleavage. Another example is provided by interleukin-6, which begins with Met-Pro-Val-Pro (57). We show here that a Pro at position P3′ significantly decreases the efficiency of the process. We conclude from the aforementioned examples that, when a given protein is overproduced, substrates displaying fastidious cleavage by MAP may not be completely processed by this enzyme. The prediction of MAP cleavage must therefore be more stringent than for intrinsic proteins. We suggest that processing is highly unlikely to occur for overproduced proteins beginning with <M(A/G/P/S/T)(D/E/P/T)(D/E/P/T), <M(A/G/P/S/T)(^DEPT)X(C/E/H/P), <M(A/G/P/S/T)X(^DEPT)(C/E/H/P), or <MV(D/T) and that there is a risk of non-processing for proteins beginning with <M(A/G/P/S/T)(D/E/P/T) or <MV(^DEPT), leading to partial Met retention. This rule should be useful for predicting the likelihood of cleavage for recombinant proteins. To this end, we added an option for overexpressed proteins to the TermiNator2 prediction tool.
If the protein has an N terminus associated with a risk of nonprocessing, various methods may be used to prevent Met retention: (i) adjusting expression conditions to optimize MAP activity according to the rate of protein synthesis in vivo (58), (ii) in vitro processing with purified wild-type or engineered MAP (59-62), (iii) dual co-overproduction of MAP and the protein of interest in vivo (25,63,64), (iv) processing in vitro or in vivo of the N terminus with another amino- or endoprotease (65)(66)(67), or (v) fusion of the ORF to a propeptide or ubiquitin sequence and use of endogenous signal peptidase specificity (68,69). This last strategy has also proved successful for residues that cannot be unmasked by NME (this study, Fig. 3), such as the thrombin inhibitor hirudin, which must have Ile at its N terminus for biological activity, or human growth hormone, which must have Phe at its N terminus (70-73).
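The stricter patterns suggested above for overproduced proteins can be screened in the same spirit before choosing one of these remedies. The sketch below is an illustration under stated assumptions: the function name and labels are hypothetical, only the overproduction-specific patterns are encoded (the patterns for natural substrates would need to be checked separately), and it does not reproduce the TermiNator2 overexpression option.

```python
import re

# Patterns for which processing is described as highly unlikely when overproduced.
HIGHLY_UNLIKELY = [
    r"^M[AGPST][DEPT][DEPT]",     # <M(A/G/P/S/T)(D/E/P/T)(D/E/P/T)
    r"^M[AGPST][^DEPT].[CEHP]",   # <M(A/G/P/S/T)(^DEPT)X(C/E/H/P)
    r"^M[AGPST].[^DEPT][CEHP]",   # <M(A/G/P/S/T)X(^DEPT)(C/E/H/P)
    r"^MV[DT]",                   # <MV(D/T)
]

# Patterns associated with a risk of non-processing (partial Met retention).
AT_RISK = [
    r"^M[AGPST][DEPT]",           # <M(A/G/P/S/T)(D/E/P/T)
    r"^MV[^DEPT]",                # <MV(^DEPT)
]


def classify_overproduced(sequence):
    """Label an overproduced protein N terminus using only the patterns above."""
    seq = sequence.upper()
    if len(seq) < 2 or not seq.startswith("M"):
        raise ValueError("expected at least two residues, starting with Met")
    if any(re.match(pattern, seq) for pattern in HIGHLY_UNLIKELY):
        return "highly_unlikely"
    if any(re.match(pattern, seq) for pattern in AT_RISK):
        return "risk"
    return "not_flagged"


# N-terminal tetrapeptides quoted in the text for recombinant human proteins.
examples = {
    "granulocyte colony-stimulating factor": "MTPL",
    "hemoglobin": "MVHL",
    "RANTES": "MSPY",
    "interleukin (Met-Ala-Pro-Thr)": "MAPT",
}
for name, n_terminus in examples.items():
    print(name, n_terminus, classify_overproduced(n_terminus))
```

With only four residues supplied, the two patterns that extend to P4′ cannot match, so longer N-terminal sequences should be passed when they are available.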
Comparison of NME Rules between Eubacteria and Archaea-In Archaea, most mature proteins have Ala, Met, Gly, Ser, Thr, or Pro at their N terminus (10). The N-terminal Met of proteins is believed to be frequently cleaved by MAP2 in Archaea, although few data are available to confirm this. This assumption is consistent with the in vitro data presented here (Fig. 7A). Specifically we have demonstrated that the rules governing cleavage are similar for MAP2 and MAP1, involving not only the P1′ site but also the true P2′ and P3′ sites (Fig. 7B).
Archaea display 8-9% signal peptide cleavage, a frequency significantly lower than that in eubacteria (Ref. 74). NME is therefore of greater importance for exposing the N termini of most proteins. The consequences of the observed differences in MAP substrate specificity for Met cleavage in vivo should be taken into account. In Archaea, the rules governing NME do not differ significantly from those of eubacteria with similar trends observed at P1′ and P2′. We conclude that the use of identical cleavage rules should make it possible to predict NME accurately in Archaea.
Physiological Relevance of Improved NME Prediction for the E. coli Proteome-By improving NME prediction for natural proteins of the bacterial proteome, we were able to model the effect of this process on the whole proteome (Fig. 8). We observed an overrepresentation of proteins undergoing NME in bacteria: those that resisted the process and had sequences starting with a Met were underrepresented (Fig. 8C).
We tried to account for this discrepancy by modeling the impact of inadequate LPR assessment. The frequency of LPR used (14.4%) was directly deduced from data obtained from the 862-protein database. Interestingly this value is identical to that obtained from bioinformatics analysis using a maximal stringency cutoff applied to the complete proteome of Haemophilus influenzae (14% in Ref. 47). H. influenzae is a Gram-negative bacterium closely related to E. coli but with a significantly lower (by a factor of 2.5) number of associated ORFs (75). Doolittle (75) estimated that the signal peptide cleavage frequency might reach 15-20%, which is slightly higher than the frequency used here. A maximal value of 28% was proposed with lower stringency scoring. This value is consistent with other predictions of 25% in H. influenzae and 29% in E. coli (74). A higher frequency of signal peptide cleavage might therefore account for the bias observed in the distribution of N-terminal amino acids. It should be borne in mind that the proteins analyzed correspond to a class of proteins with a compartmentation bias as these proteins are (i) soluble (as opposed to membrane proteins) and (ii) not secreted outside the bacterium as the only secreted proteins that can be retrieved by the fractionation strategies used are those exported to the periplasmic space. LPR is thought to be frequent in membrane and secreted proteins. We therefore modeled the impact of increasing LPR frequency. Only a huge increase in the proportion of these proteins to 42% (3 times experimental values) gave a satisfactory fit of N-terminal Met frequencies to the experimental values (Fig. 8C, end of arrows in gray bars). However, the correlation with Ala, Ser, and Thr was very weak with this value. Finally this value of 42% is much higher than the upper value of 29% proposed previously (74). A value of 29% gave a much better fit to the data (Fig. 8C, gray bars).
We conclude that the protein pool undergoing signal peptide cleavage is probably underestimated in our study but that this underestimation alone is unlikely to account for the trend observed toward a lower content of proteins retaining their N-terminal Met.
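The effect of raising the LPR frequency on predicted Met retention can be illustrated with a deliberately crude two-class sketch. It assumes, as a simplification that is not the authors' full model, that proteins undergoing LPR never expose an N-terminal Met and that the remaining proteins retain Met at a constant rate, back-calculated from the 60% retention predicted at an LPR frequency of 14.4%. The function and constant names are hypothetical.

```python
def met_retention(lpr_fraction, retention_without_lpr):
    """Fraction of all proteins predicted to keep their N-terminal Met.

    Assumes proteins undergoing LPR never end up with an N-terminal Met
    (a simplifying assumption of this sketch).
    """
    if not 0.0 <= lpr_fraction <= 1.0:
        raise ValueError("lpr_fraction must lie between 0 and 1")
    return (1.0 - lpr_fraction) * retention_without_lpr


# Values quoted in the text: 60% predicted Met retention at 14.4% LPR.
BASELINE_LPR = 0.144
BASELINE_RETENTION = 0.60

# Retention among proteins that do not undergo LPR, back-calculated.
RETENTION_WITHOUT_LPR = BASELINE_RETENTION / (1.0 - BASELINE_LPR)

for lpr in (0.144, 0.29, 0.42):
    retained = met_retention(lpr, RETENTION_WITHOUT_LPR)
    print(f"LPR {lpr:.1%}: predicted Met retention {retained:.1%}")
```

Under these assumptions, an LPR frequency of 29% still leaves predicted Met retention near 50%, and only the 42% value approaches the experimental 35 ± 5%, in line with the qualitative behavior described above.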
Most of the data for the 862-sequence compilation concerned proteins extracted by two-dimensional gel electrophoresis (23). As a result of this, these proteins were probably the most abundant proteins in the bacterium. Another possible reason for the poor correlation between the data sets in Fig. 8C is that the genes for these abundant proteins may have been selected to encode residues such as Ala or Ser at P1′ so that their N-terminal Met is efficiently cleaved and recycled. Met is indeed an "expensive" amino acid to biosynthesize. The capacity to recover a majority of the N-terminal Met (i.e. especially those corresponding to the most abundant proteins) could have been favored throughout bacterial evolution (see also concluding remarks in Ref. 21 for further discussion). One could also suggest that all or some of the proteins retaining their N-terminal Met are likely to be less stable. Recent data obtained in the search for proteins cleaved by the ClpAXP proteolytic machinery of E. coli have indeed identified an N-terminal motif including an N-terminal Met. The recurring N-terminal motif <MKΦΦXΦ (where Φ is any hydrophobic amino acid and X is any amino acid) is involved in cleavage by the bacterial ClpAXP protease (76). This hypothesis seems likely given that Lys is the most frequent amino acid in position 2 (Fig. 8A) and prevents NME. However, a less stringent motif would be required to obtain a better fit to the experimental values. It should be noted that a single proteolytic cleavage of a mature protein by cellular machinery other than a leader peptidase and ClpAXP may lead to enhanced protein degradation. This hypothesis, in which protein half-life and signal peptide prediction account for the amino acid bias observed in experimentally determined protein N termini with respect to proteomic predictions, may also account for the accumulation of fewer proteins with an N-Met than predicted on the basis of the genome sequence (Fig. 8B).
Generalization, Introducing a Bioinformatics Tool to Predict NME-We show here that detailed characterization of the catalytic efficiency of MAP can be used to predict the in vivo Met cleavage of any protein of a given proteome. NME prediction within the proteome is indeed crucial as a database of protein N termini as representative (i.e. about 20%) as that available for E. coli will probably never be available for any other organism. However, published data for the two cytosolic Saccharomyces cerevisiae MAPs (ScMAPs) suggest that the substrate specificity of these enzymes may differ from that of prokaryotic MAPs. For example, studies of ScMAP1 have shown fastidious processing of peptides with Pro at P1′ or with Val or Thr in this position (Fig. 4A in Ref. 40). Studies with ScMAP2 and low concentrations of peptide (i.e. mimicking kcat/Km conditions) have suggested fastidious processing for peptides with Pro at P1′ but not with Val in this position (Table V in Ref. 77). Thus, in higher eukaryotes, prediction is complicated by the occurrence of at least one MAP1 and one MAP2, which do not necessarily have identical specificities. If the two enzymes have similar specific activities in vivo, cleavage efficiency is likely to be increased. However, this does not appear to be the case in yeast in which MAP1 behaves as the most active, major MAP (78). Data on relative MAP expression and regulation are therefore required to complement prediction algorithms.
The associated crucial physiological consequences, such as protein stabilization associated with Met removal, highlight the importance of correctly predicting the N termini of proteins as recently reported for higher eukaryotes (48). Further detailed kinetic characterization of the actual substrate specificity of each eukaryotic MAP is required to increase the confidence of predictions made with this tool.

France) Grant 3603, and National Science Foundation Grant CHE-0549221. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.