Recipient of additional support from the Canada Research Chair program, Alberta Ingenuity-Health Solutions, and the Canada Foundation for Innovation. To whom correspondence should be addressed: Dept. of Biochemistry and Molecular Biology, University of Calgary, Rm. 300 Heritage Medical Research Bldg., 3330 Hospital Dr. NW, Calgary, Alberta T2N4N1, Canada. E-mail:.
* This work was supported in part by Natural Sciences and Engineering Research Council of Canada Discovery Grant 298351-2010 (to D.C.S.). This article contains supplemental material. §§ Recipient of European Union/MEYS Grants CZ.1.05/1.1.00/02.0109 and LQ1604.
Trypsin dominates bottom-up proteomics, but there are reasons to consider alternative enzymes. Improving sequence coverage, exposing proteomic “dark matter,” and clustering post-translational modifications in different ways and at higher order drive the pursuit of reagents complementary to trypsin. Additionally, enzymes that are easy to use and that generate larger peptides able to capitalize on newer fragmentation technologies should have a place in proteomics. We expressed and characterized recombinant neprosin, a novel prolyl endoprotease of the DUF239 family, which preferentially cleaves C-terminal to proline residues under highly acidic conditions. Cleavage also occurs C-terminal to alanine with some frequency, but with an intriguingly high “skipping rate.” Digestion proceeds to a stable end point, resulting in an average peptide mass of 2521 units and a higher dependence upon electron-transfer dissociation for peptide-spectrum matches. In contrast to most proline-cleaving enzymes, neprosin effectively degrades proteins of any size. For 1251 HeLa cell proteins identified in common using trypsin, Lys-C, and neprosin, almost 50% of the neprosin sequence contribution is unique. The high average peptide mass, coupled with cleavage at residues that are not usually modified, provides new opportunities for profiling clusters of post-translational modifications. We show that neprosin is a useful reagent for reading epigenetic marks on histones. It generates peptide 1–38 of histone H3 and peptide 1–32 of histone H4 in a single digest, permitting the analysis of co-occurring post-translational modifications in these important N-terminal tails.
The most widely used “bottom-up” proteomics approach requires the proteolytic digestion of proteins into peptides. These peptides are detected by MS and identified through database searching (
). However, in practice, due to the natural distribution of these amino acids in proteomes, a large segment of the proteome is not identifiable. Most peptides are too small for unambiguous identification (56% ≤6 residues) (
). They all share the property of cleaving C- or N-terminal to charged residues. An enzyme with specificity for neutral amino acids would be a welcome addition. For example, an advantage in membrane protein representation was demonstrated recently using WaLP and MaLP (
). These are lower-specificity enzymes that cleave after Thr, Val, Ala, Ser, and Met (WaLP) and Met, Leu, Phe, Tyr, Thr, and Val (MaLP). It would be useful to have enzymes in this general class with even higher specificity to generate longer sequence “read lengths” and possibly improved representation of heterogeneity in PTMs. Enzymes recognizing dibasic motifs such as Sap9 (
) show promise in this regard, but candidates that avoid tryptic residues may offer superior complementarity.
Of course, proteases like elastase, chymotrypsin, and pepsin cleave after hydrophobic residues, but their low specificity complicates database-driven peptide identification and limits their utility in proteomics (
). It often delineates structural transitions in proteins and is generally unmodified in cellular proteins; thus digestion products could provide a better grouping of PTMs. Unfortunately, there are very few validated prolyl endoproteases (PEPs) currently available. Numerous prolyl oligopeptidases (POPs) are known to cleave after Pro and alanine (Ala), but their activity is restricted to small substrates (
An effective PEP would be particularly useful for characterizing histone modifications. Histones are Lys- and Arg-rich proteins that form the core components of nucleosomes, around which the DNA is wound in the chromatin of eukaryotes (
). Histones are tightly packed into octamers, but they maintain highly modified N-terminal tails that are flexible and integrate a variety of post-translational modifications. Many of these modifications are dynamically installed and removed. They play a key role in epigenetically regulating cell fate (
). GluC and AspN enzymes show more promise, especially for presenting the combinatorial modifications of histones H3 and H4, but they need to be used separately to cluster the most important PTM sites (
). In this study, we produced neprosin recombinantly and characterized the enzyme for use in proteomics. We show that neprosin is a legitimate low-molecular-weight PEP, active at low concentrations and low pH. We demonstrate strong complementarity with conventional enzymes for whole-proteome analysis and histone mapping.
Preparation of the Neprosin Expression Vector
The gene corresponding to pro-neprosin (residues 25–380) from Nepenthes ventrata was synthesized by Genscript with codon optimization for expression in Escherichia coli and yeast (
). The optimized neprosin gene was then inserted in-frame into pET28a(+) at the EcoRI and SalI restriction sites, downstream of an N-terminal His tag, yielding the plasmid pDS36. The MBP gene was amplified from e3884 (generous gift from Dr. A. Schryvers, University of Calgary) with forward primer LSO13F (5′-GGC TCG CATATG AAA ACT GAA GAA GGT AAA CTG GTA ATC TG-3′; NdeI site, CATATG) and reverse primer LSO14R (5′-CAC TCA GCTAGCCCT TCC CTC GAT GTT GTT GTT ATT GTT ATT GTT GTT GTT GTT CG-3′; NheI site, GCTAGC; factor Xa cleavage site, CCT TCC CTC GAT, the reverse complement of the Ile-Glu-Gly-Arg codons), and the product was inserted into pDS36 at the NdeI and NheI restriction sites, in-frame between the His tag and the neprosin gene, to form the His-MBP-neprosin expression plasmid termed pDS42. The constructs were verified by DNA sequencing.
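The primer features listed above can be checked programmatically. The following Python sketch is illustrative only (all helper names are hypothetical): it locates the NdeI (CATATG) and NheI (GCTAGC) recognition sequences in LSO13F and LSO14R and confirms that the reverse complement of LSO14R contains ATC GAG GGA AGG, the codons for Ile-Glu-Gly-Arg, the factor Xa recognition motif.

```python
# Hypothetical helpers for sanity-checking the MBP primers described above.
SITES = {"NdeI": "CATATG", "NheI": "GCTAGC"}
# Codons for Ile-Glu-Gly-Arg (IEGR), the factor Xa recognition motif.
FACTOR_XA_CODONS = "ATCGAGGGAAGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

LSO13F = "GGC TCG CATATG AAA ACT GAA GAA GGT AAA CTG GTA ATC TG"
LSO14R = "CAC TCA GCTAGCCCT TCC CTC GAT GTT GTT GTT ATT GTT ATT GTT GTT GTT GTT CG"


def clean(primer: str) -> str:
    """Remove codon-spacing blanks and normalize case (sequence read 5'->3')."""
    return primer.replace(" ", "").upper()


def reverse_complement(seq: str) -> str:
    return clean(seq).translate(COMPLEMENT)[::-1]


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based start} for each recognition site present."""
    seq = clean(primer)
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def encodes_factor_xa(reverse_primer: str) -> bool:
    """A reverse primer carries the coding sequence on its complementary strand."""
    return FACTOR_XA_CODONS in reverse_complement(reverse_primer)


if __name__ == "__main__":
    print(find_sites(LSO13F))  # {'NdeI': 6}
    print(find_sites(LSO14R))  # {'NheI': 6}
    print(encodes_factor_xa(LSO14R))  # True
```

Reading the reverse primer through its reverse complement mirrors how the insert appears on the sense strand, where the IEGR codons sit immediately upstream of the NheI site.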
Protein Expression and Purification of Recombinant Neprosin
The His-MBP-pro-neprosin expression plasmid (pDS42) was transformed into E. coli ArcticExpress (generous gift from Dr. Peter Facchini, University of Calgary) competent cells. The pDS42-transformed cells (DSB90) were grown in 2YT with kanamycin (50 μg ml−1) at 37 °C until an OD of ∼0.8 was reached, and expression was then induced with 0.3 mM isopropyl β-D-thiogalactopyranoside at 16 °C overnight with shaking. Cells were harvested by centrifugation and frozen at −80 °C. The cells were then thawed and resuspended in lysis buffer (50 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 10% glycerol, 1 mM DTT, 1 mM PMSF, 0.5% Triton X-100, 0.025% sodium azide, 1 μg ml−1 lysozyme, RNase A) supplemented with a mixture of protease inhibitors (Roche Applied Science) at a ratio of 10 ml of buffer per 1 g of wet cells. The cells were lysed by sonication (Qsonica Sonicators) on ice at 30% amplitude using six 30-s pulses with a 30-s rest between pulses. Following centrifugation at 15,000 rpm at 4 °C for 30 min, the supernatant was applied to a 5-ml HisTrap HP column (GE Healthcare) pre-equilibrated with 50 mM Tris-HCl, pH 8.0, 0.5 M NaCl (buffer A). After washing with 20 ml of buffer A, His-MBP-pro-neprosin was eluted with 50 ml of a linear gradient of 0–100% buffer B (50 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 0.5 M imidazole) at a flow rate of 1 ml min−1. Fractions containing His-MBP-pro-neprosin were pooled and dialyzed against 50 mM Tris-HCl, pH 7.5, 150 mM NaCl. Multiple rounds of dialysis against 100 mM Gly-HCl, pH 2.5, at 37 °C for 7 days allowed acid auto-activation to the mature, active neprosin. Neprosin was then concentrated in a 10-kDa molecular mass cutoff filter (Millipore) and analyzed by SDS-PAGE. The concentration of active recombinant neprosin was estimated by comparing its activity with that of native neprosin purified from Nepenthes digestive fluid, using the GFP activity assay described below.
Neprosin Activity Assay via GFP Proteolysis
The green fluorescent protein (GFP) proteolysis assay was performed as reported (
) with minor modifications. Briefly, 0.1 mg ml−1 GFP S65T (in 10 mM Tris-HCl, pH 8) was denatured with 0.1× volume of 1 M Gly-HCl, pH 2.4 (final pH of solution, 2.5), in the absence or presence of neprosin (1 nM) at 37 °C (or the indicated temperature) for 1 h. The reaction was quenched, and the GFP was renatured by addition of 0.25× volume of 1 M Tris-HCl, pH 8.4, and 0.1× volume of 1 M DTT (final pH of solution, 8). Fluorescence of renatured GFP S65T was measured using a Molecular Devices Filter Max F5 microplate spectrophotometer with excitation at 485 nm and emission at 535 nm. To analyze the effect of protease inhibitors on proteolytic activity, neprosin was pre-incubated with the protease inhibitor in the reaction volume at room temperature for 30 min prior to the addition of GFP S65T and acid denaturation. To analyze the effect of pH on neprosin activity, GFP S65T and neprosin were incubated in 100 mM solutions of various buffers (pH 2–9: Gly-HCl, ammonium formate, sodium phosphate, Tris-HCl). Neprosin activity was expressed as the percentage of fluorescence lost (100% − % fluorescence recovery) relative to untreated GFP S65T under the same reaction conditions.
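The activity readout defined above is a simple ratio. The Python sketch below shows the arithmetic; the function names are hypothetical, and the optional buffer-only blank subtraction is an assumption that the text does not state. A second helper rescales activities to the most active condition, as one might when reporting that ∼50% of activity remains at pH 4.5.

```python
# Minimal sketch of the GFP-based activity readout (485/535 nm fluorescence, arbitrary units).
# Blank (buffer-only) subtraction is an assumption, not stated in the protocol.


def percent_recovery(f_digest: float, f_control: float, f_blank: float = 0.0) -> float:
    """Fluorescence recovered after renaturation, relative to untreated GFP S65T."""
    return 100.0 * (f_digest - f_blank) / (f_control - f_blank)


def neprosin_activity(f_digest: float, f_control: float, f_blank: float = 0.0) -> float:
    """Activity = % fluorescence lost = 100% - % fluorescence recovery."""
    return 100.0 - percent_recovery(f_digest, f_control, f_blank)


def relative_activity(activities: dict) -> dict:
    """Rescale activities (e.g. one per pH) so that the most active condition is 100%."""
    top = max(activities.values())
    return {cond: 100.0 * a / top for cond, a in activities.items()}


if __name__ == "__main__":
    # Hypothetical readings: 2,500 counts after digestion vs 10,000 for untreated GFP.
    print(neprosin_activity(2500.0, 10000.0))  # 75.0
```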
Neprosin Activity Assay of Standard Proteins
For gel-based analysis using recombinant neprosin, 1 mg ml−1 BSA was reduced with 10 mM DTT at 50 °C for 30 min. The reduced protein substrate (0.05 mg ml−1) was incubated with recombinant neprosin (Npr1, ∼1 nM) and different concentrations of urea in 100 mM Gly-HCl, pH 2.5, at 37 °C for 1 h and analyzed by SDS-PAGE.
HeLa Cell Lysate Preparation
HeLa S3 cells were grown at the National Cell Culture Centre in Joklik-modified minimum Eagle's medium supplemented with 5% newborn calf serum to high cell density (1 × 10⁶ cells/ml). Cells were collected by centrifugation at 2500 × g, washed twice in warm (37 °C) PBS (Ca/Mg free), and stored at −80 °C. After thawing, the cells were resuspended in 50 mM HEPES buffer (pH 8, 150 mM KCl, 1 mM MgCl₂), 10% glycerol, 0.5% Nonidet P-40, 5 units/ml benzonase supplemented with a mixture of protease inhibitors (Roche Applied Science). The cells were lysed by sonication on ice at 30% amplitude for six rounds of 30-s pulses with a 30-s rest between pulses. Following centrifugation at 37,000 rpm at 4 °C for 30 min, the supernatant lysate was aliquoted, frozen in liquid nitrogen, and stored at −80 °C. Protein concentration was determined by Bradford assay, using BSA as the standard.
HeLa Cell Lysate and Histone Digestion
HeLa whole-cell lysate was digested with recombinant and endogenous neprosin, AN-PEP, LysC, and trypsin using the FASP protocol (
). Briefly, 100 μg of precipitated lysate was loaded onto a 10-kDa filter device and subsequently denatured, reduced, and alkylated at pH 8.5. Depending on the protease, buffer exchange was conducted using 120 μl of either 100 mM Gly-HCl, pH 2.5 (for both neprosin preparations and AN-PEP), or 50 mM ammonium bicarbonate, pH 8.5 (for LysC and trypsin). Each addition of buffer was followed by centrifugation for 30 min at 14,000 × g, and this was repeated three times for complete buffer exchange. The enzyme was then added at an estimated enzyme-to-substrate ratio of 1:50 (w/w) for trypsin and LysC, 1:100 for AN-PEP, and 1:500 for both neprosin preparations. Samples were subsequently incubated overnight at 37 °C. Released peptides were eluted from the filter device in three steps. First, the filter unit was centrifuged for 30 min at 14,000 rpm. Second, 50 μl of the corresponding buffer solution was added to the filter unit, followed by a second centrifugation for 30 min at 14,000 rpm. Finally, 50 μl of 0.5 M NaCl was added to the filter device, followed by centrifugation for 15 min at 14,000 rpm, and the eluates were combined. Prior to mass spectrometric data acquisition, all samples were desalted and concentrated using Stage tips (
). Peptides were eluted with 50% ACN in 0.1% TFA and evaporated to dryness using a SpeedVac. Finally, peptides were reconstituted in 0.1% FA for mass spectrometric analysis. Cleavage specificity experiments using recombinant and endogenous neprosin, AN-PEP, and trypsin were conducted in biological duplicates. Experiments evaluating proteome coverage with recombinant neprosin, trypsin, and LysC were performed as single experiments. To evaluate whether the acidic solution induced nonspecific cleavages during long incubations at pH 2.5, tryptic digests were reconstituted in 20 μl of H2O and split into two 10-μl aliquots. One aliquot was kept at −20 °C until analysis. The other was adjusted to pH 2.5 with 100 mM Gly-HCl buffer and incubated at 37 °C overnight.
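For orientation, the w/w ratios above translate into small absolute enzyme amounts for the 100-μg FASP load. The Python sketch below is plain arithmetic under that assumption; the dictionary and function names are hypothetical. Note that the neprosin amounts are themselves estimates, because the active-enzyme concentration was inferred from the GFP activity assay rather than measured directly.

```python
# Illustrative arithmetic for the enzyme-to-substrate (E:S, w/w) ratios used in the FASP digests.
SUBSTRATE_UG = 100.0  # lysate loaded per filter device
E_TO_S = {"trypsin": 50, "LysC": 50, "AN-PEP": 100, "neprosin": 500}  # ratio 1:N


def enzyme_ug(substrate_ug: float, n: int) -> float:
    """Enzyme mass (ug) needed for an E:S ratio of 1:n (w/w)."""
    return substrate_ug / n


if __name__ == "__main__":
    for enzyme, n in E_TO_S.items():
        print(f"{enzyme}: {enzyme_ug(SUBSTRATE_UG, n):.2f} ug")
```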
For whole histone analysis, unfractionated whole histone from calf thymus (Sigma-Aldrich) was dissolved in Gly-HCl, pH 2.5, to a final concentration of 1 μg μl−1. Recombinant neprosin was added to an estimated enzyme-to-substrate ratio of 1:500. Samples were subsequently incubated overnight at 37 °C.
For the determination of cleavage specificity, HeLa digests were analyzed using an EASY-nLC 1000 nano-LC coupled to an Orbitrap Velos mass spectrometer (Thermo Fisher Scientific, San Jose CA) equipped with a Nanospray Flex Ion Source. Peptides were chromatographically separated using a 15-cm PicoTip fused silica emitter with an inner diameter of 75 μm (New Objective Inc., Woburn MA) packed in-house with reversed-phase Reprosil-Pur C18-AQ 3-μm resin (New Objective). The flow rate was 300 nl min−1, and peptides were eluted using a 140-min gradient running linearly from 5 to 40% B (97% ACN in 0.1% FA). Data were acquired in data-dependent MS/MS mode. Each high-resolution precursor ion scan in the Orbitrap (m/z 300 to 2000, R = 60,000) was followed by high-resolution product ion scans (isolation window 3 Th, R = 7500) in the Orbitrap after HCD fragmentation at 35% NCE. The 10 most abundant signals with a charge state greater than 1 were selected for fragmentation, followed by dynamic exclusion for 60 s. Data acquisition was controlled with Xcalibur software (version 3.0.63).
For the comparative proteomic analysis of HeLa cells digested with LysC, trypsin, or recombinant neprosin, samples were measured using an EASY-nLC 1000 nano-LC coupled to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific) equipped with an EASY-spray source. Peptides were separated using a 25-cm Easy-Spray PepMap analytical column (Thermo Fisher Scientific) packed with reversed-phase C18 beads (2-μm particle diameter, 100-Å pores). The injection volume was adjusted to achieve similar TIC levels for all samples. The flow rate was set to 300 nl min−1. Peptides were eluted using a 110-min gradient running linearly from 5 to 25% B (100% ACN in 0.1% FA) at a spray voltage of 2.4 kV. Each high-resolution precursor ion scan in the Orbitrap (m/z 350 to 1500, R = 120,000) was followed by product ion scans (isolation window 3 Th) in the linear ion trap with a fixed scan cycle time of 3 s. Dissociation in MS2 mode was carried out via CID and ETD using a data-dependent decision tree (
). The fragmentation mode was chosen depending on the nature of the selected ions. Doubly charged ions, triply charged ions with m/z >650, quadruply charged ions with m/z >900, and quintuply charged ions with m/z >950 were triggered with CID. Triply charged ions with m/z ≤650, quadruply charged ions with m/z ≤900, quintuply charged ions with m/z ≤950, and ions with a charge state of six and higher were fragmented using ETD. CID was performed at 30% normalized collision energy (NCE), and ETD was carried out with a maximum injection time of 100 ms. A target value of 20,000 was selected for MS2 automatic gain control, and precursor ions were dynamically excluded for 30 s.
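The precursor-dependent decision tree described above reduces to a small lookup. The Python sketch below encodes the charge and m/z boundaries exactly as listed; the function name is hypothetical, and the handling of singly charged precursors (returned as not selected) is an assumption, because the text does not cover them for this method.

```python
from typing import Optional

# Highest precursor m/z (inclusive) at which ETD rather than CID is triggered, per charge state.
ETD_MAX_MZ = {3: 650.0, 4: 900.0, 5: 950.0}


def choose_activation(charge: int, mz: float) -> Optional[str]:
    """Return 'CID' or 'ETD' for a precursor, or None if no rule applies."""
    if charge < 2:
        return None  # singly charged ions are not covered by the rules in the text (assumption)
    if charge == 2:
        return "CID"  # all doubly charged ions go to CID
    if charge >= 6:
        return "ETD"  # charge six and higher always go to ETD
    return "ETD" if mz <= ETD_MAX_MZ[charge] else "CID"


if __name__ == "__main__":
    for z, mz in [(2, 800.0), (3, 600.0), (3, 700.0), (4, 950.0), (6, 1200.0)]:
        print(z, mz, choose_activation(z, mz))
```

The rule reflects the general observation that ETD performs better for precursors with high charge density (low m/z at a given charge), which is why larger neprosin peptides are more often routed to ETD.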
Whole histone digests were measured on an Orbitrap Fusion Lumos Tribrid mass spectrometer coupled to an EASY-nLC 1200 nano-LC system. Peptides were separated on a 15-cm Easy-Spray PepMap analytical column (Thermo Fisher Scientific) packed with reversed-phase C18 beads (2-μm particle diameter, 100-Å pores), using a 30-min gradient running linearly from 5 to 15% B (80% ACN in 0.015% FA), followed by an increase to 60% B over the next 10 min. The spray voltage was 1.9 kV. Each high-resolution precursor ion scan in the Orbitrap (m/z 350 to 1200, R = 120,000) was followed by product ion scans (isolation window 2 Th) in the Orbitrap (R = 15,000) with a fixed scan cycle time of 3 s. Precursors with charge states of three or lower were excluded. Peptides were fragmented using EThcD (maximum injection time of 70 ms and HCD at 25% NCE). A target value of 50,000 was selected for MS2 automatic gain control, and precursor ions were dynamically excluded for 15 s. Data acquisition was controlled using Xcalibur software (version 4.0).
All unprocessed data files (RAW-format) were directly loaded into PEAKS studio (
) (version 7.5; Bioinformatics Solutions), and precursor masses were subsequently corrected in the software. Peptides were identified by de novo sequencing and database search, with matching to the human Swiss-Prot database (downloaded from uniprot.org in May 2016; containing 20,201 entries). For whole proteome analysis of HeLa digests, cleavage sites were restricted in a number of ways. For trypsin (P1: Lys and Arg, with P1′: any amino acid, according to Rodriguez et al. (
)), we allowed up to two missed cleavages for a conventional search. We also performed a semi-tryptic search (nonspecific cleavage at one end) and a fully nonspecific search. For LysC (P1: Lys, with P1′: any amino acid), we also allowed up to two missed cleavages for a conventional search and performed both semi-LysC and fully nonspecific searches. For neprosin (P1: Pro and Ala, with P1′: any amino acid except Pro; and P1: Asp with P1′: Pro), we allowed up to seven missed cleavages, and we also performed semi-neprosin and fully nonspecific searches separately. Met oxidation was selected as a variable modification, with a maximum of two variable modifications per peptide. Carbamidomethylation of cysteine residues was set as a fixed modification. To decrease the FDR when searching for peptides carrying multiple PTMs, cleavages were restricted to the same rules as stated above for the three proteases, but strict cleavage rules were applied at both ends. Acetylation of the protein's N terminus and methylation and dimethylation of Lys and Arg were chosen as variable modifications, with a maximum of three variable modifications per peptide.
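The neprosin search rule above can be expressed as an in silico digest. The Python sketch below is a minimal illustration (function names hypothetical), not the PEAKS implementation: it cuts after Pro or Ala unless Pro follows, also cuts Asp-Pro to mirror the search rule (even though the Results attribute that bond to acid lability), and enumerates peptides with up to a chosen number of missed cleavages.

```python
def neprosin_sites(seq: str) -> list:
    """0-based cut positions under the search rule (a cut falls before seq[pos])."""
    sites = []
    for i in range(len(seq) - 1):
        p1, p1_prime = seq[i], seq[i + 1]
        pro_ala = p1 in "PA" and p1_prime != "P"  # P1 = Pro/Ala, P1' = anything but Pro
        asp_pro = p1 == "D" and p1_prime == "P"  # acid-labile bond, kept to mirror the search rule
        if pro_ala or asp_pro:
            sites.append(i + 1)
    return sites


def digest(seq: str, max_missed: int = 7) -> list:
    """All (peptide, missed_cleavages) pairs spanning up to max_missed internal sites."""
    bounds = [0] + neprosin_sites(seq) + [len(seq)]
    peptides = []
    for start in range(len(bounds) - 1):
        last = min(start + 1 + max_missed, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append((seq[bounds[start]:bounds[end]], end - start - 1))
    return peptides


if __name__ == "__main__":
    # Toy sequence: the Ala-Pro and Pro-Pro bonds are protected by Pro in P1'.
    print(neprosin_sites("MKPLAPPGDPK"))  # [3, 7, 9, 10]
    print([p for p, _ in digest("MKPLAPPGDPK", max_missed=0)])  # ['MKP', 'LAPP', 'GD', 'P', 'K']
```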
For a minimally biased cleavage site analysis in the proteomics experiments, the enzyme was set to “none,” and a modified database file was used, in which initiator methionine residues and signal peptides (
) were removed, to exclude false-positive N-terminal cleavage sites. The normalized cleavage specificity, taking the natural abundance of corresponding amino acids and dipeptide motifs in Homo sapiens into account, was calculated after Keil (
), which are associated with the proteins commonly identified in the LysC, trypsin, and neprosin digests, were exported with a 1% FDR filter and requiring a minimum of two PSMs per peptide.
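The normalization after Keil can be sketched as the ratio of a residue's share of P1 positions to its share of the background sequence. The Python example below assumes exactly that form (the function name is hypothetical), with the background counted over the same modified database described above. As a consistency check on the Results, 61% of sites after Pro with a normalized value of 9.8 implies a Pro background frequency of about 0.61/9.8 ≈ 6.2%, and 23% after Ala with 3.1 implies about 7.4% for Ala.

```python
from collections import Counter


def normalized_p1_specificity(p1_residues, background_seqs) -> dict:
    """Observed P1 frequency divided by the residue's background frequency.

    p1_residues: the P1 residue of each unique cleavage site.
    background_seqs: protein sequences of the (modified) search database.
    """
    sites = Counter(p1_residues)
    n_sites = sum(sites.values())
    background = Counter()
    for seq in background_seqs:
        background.update(seq)  # counts each residue letter
    n_background = sum(background.values())
    return {
        aa: (count / n_sites) / (background[aa] / n_background)
        for aa, count in sites.items()
        if background[aa]  # skip residues absent from the background
    }


if __name__ == "__main__":
    # Toy data: Pro is 1/8 of the background but 3/4 of P1 positions -> 6.0.
    print(normalized_p1_specificity(["P", "P", "P", "A"], ["PAKKLLGG"]))  # {'P': 6.0, 'A': 2.0}
```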
For the analysis of whole histones from calf thymus, a manually curated database file was used containing all 21 known histone variants from Bos taurus. Cleavage sites were restricted to the same neprosin rules as stated above, allowing one nonspecific cleavage. The following variable modifications were chosen, allowing a maximum of four variable PTMs per peptide: methylation of Lys and Arg, dimethylation of Lys, trimethylation of Lys, and acetylation of Lys and the protein's N terminus. PTMs were chosen based on known frequent PTMs in histones (
). Identified PSMs were manually inspected, and only those for which the PTM site was unambiguously confirmed by the identification of adjacent fragment ions in PEAKS were kept.
Mass error tolerance of precursor ions was set to 10 ppm for all samples. Mass error tolerance of fragment ions was set in general to 0.6 Da for CID and ETD spectra acquired in the linear ion trap or to 0.015 Da for HCD and ETD spectra acquired in the Orbitrap analyzer. The peptide score threshold was decreased until a false discovery rate of ≤1% on the peptide level was reached. A minimum score of 20 together with at least one unique peptide was set as the lower threshold for identification of protein groups. This yielded an FDR between 0.1 and 0.6% at the protein level. Raw files for all MS experiments are available from Chorus (ID 1262), and annotated fragment ion spectra for identified histone tails are available in the supplemental materials.
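Lowering the score threshold until the FDR reaches ≤1% can be illustrated with a generic target-decoy estimate (FDR ≈ decoy hits / target hits at or above a cutoff). The Python sketch below uses that textbook estimator as an assumption; PEAKS applies its own decoy-based procedure, which may differ in detail, and the function name is hypothetical.

```python
from typing import Optional


def score_threshold(target_scores, decoy_scores, max_fdr: float = 0.01) -> Optional[float]:
    """Lowest candidate cutoff whose estimated FDR (decoys/targets at or above it) is <= max_fdr."""
    best = None
    # Walk cutoffs from high to low, i.e. progressively relaxing the threshold.
    for cutoff in sorted(set(target_scores) | set(decoy_scores), reverse=True):
        n_target = sum(s >= cutoff for s in target_scores)
        n_decoy = sum(s >= cutoff for s in decoy_scores)
        if n_target and n_decoy / n_target <= max_fdr:
            best = cutoff
    return best  # None if no cutoff satisfies max_fdr


if __name__ == "__main__":
    targets = [100, 90, 80, 70, 60]
    decoys = [65]
    print(score_threshold(targets, decoys))  # 70
```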
We recently discovered an entire class of protease that appears to possess a novel domain structure (DUF239) (supplemental Fig. S1) (
). We tentatively assigned PEP functionality to this class, based on our preliminary analysis of a 29-kDa enzyme we call neprosin. The DUF239 sequence domain is found in a large family of proteins throughout the plant kingdom. Isolated from carnivorous plants of the Nepenthes genus, neprosin functions under solution conditions that suggest utility in proteomics. Initial screening of protease activity was conducted by following the degradation of simple protein substrates. These digests were carried out at pH 2.5, a level at which other proteases isolated from the Nepenthes fluid also showed high activity (
). Neprosin was weakly inhibited by the aspartic protease inhibitor pepstatin but was not affected by the POP inhibitor Z-Pro-prolinal, and it retained activity under strongly reducing conditions (Fig. 1A). It showed its highest enzymatic activity at pH 2.5 and retained ∼50% of its activity at pH 4.5, whereas further increases in pH led to loss of proteolytic activity (Fig. 1B). It also retained activity in the presence of denaturants (i.e. <2 M urea, supplemental Fig. S3). Temperature significantly influenced its activity, and the optimal range was found to be between 37 and 50 °C (Fig. 1C). Based on these results, we conducted all further experiments at pH 2.5 and 37 °C. Although our production yield was modest, the recombinant form retained the high activity we observed with native neprosin, allowing us to use enzyme-to-substrate (E:S) ratios of ∼1:500.
Neprosin Cleavage Rule
To characterize the cleavage specificity of neprosin, we digested whole-cell lysates followed by single-shot LC-MS/MS experiments. E. coli digests were initially used to evaluate data-dependent acquisition methods that offer high rates of MS2 acquisition, which we then applied to HeLa cell lysate in two biological replicates. Cleavage specificity was calculated according to Keil (
) using sets of 3316 and 3275 unique recombinant neprosin cleavage sites identified from the digests (supplemental Table S1). To validate the method, we determined the cleavage specificities from HeLa digestions using trypsin (supplemental Fig. S4). The observed cleavage specificity was virtually identical to values calculated previously by Huesgen et al. (
). In the neprosin data set, an average of 61% of all cleavage sites were C-terminal to Pro and 23% were C-terminal to Ala (Fig. 2A). Taking the natural occurrences of Pro and Ala in the human proteome into account, the normalized activity of neprosin toward these residues was 9.8 and 3.1, respectively (Fig. 2B). As expected, endogenous neprosin showed the same cleavage specificity as recombinantly produced neprosin (supplemental Fig. S5). Neprosin demonstrates substantially greater specificity for C-terminal Pro cleavage than AN-PEP, the only other characterized prolyl endoprotease (supplemental Fig. S6, A and B). For neprosin, charged residues such as Glu, His, Lys, and Arg were disfavored in the P1 position, accounting for only 2.6% of all cleavages. The most favored dipeptide motifs were Pro-Xaa, followed by Ala-Xaa (Xaa indicates any amino acid except Pro and, to a lesser extent, Trp) (supplemental Fig. S7A). Cleavage of an Asp-Pro motif also seemed prominent, but it was the only motif for which Pro was found in the P1′ position (supplemental Fig. S7, B and C). Because cleavage of Asp-Pro may be chemically induced at pH 2.5, we incubated acidified tryptic HeLa digests overnight at 37 °C to gauge the level of acid-induced cleavage. We observed a significant increase in Asp-Pro cleavage sites and no increase in cleavages C-terminal to Pro or Ala (supplemental Fig. S8), which supports our assertion that Asp-Pro cleavage does not arise from the direct action of neprosin.
Next, we analyzed HeLa digests generated using trypsin, LysC, and recombinant neprosin to evaluate its utility in multienzyme proteomics experiments. We chose a precursor-dependent decision tree using CID and ETD for fragmentation to balance acquisition rate against the need to fragment larger peptides, as discussed below (
). Defining a preliminary cleavage rule based on our previous observations increased the number of peptide-spectrum matches (PSMs) relative to a nonspecific digestion search (supplemental Table S2). The search was established using a “semi-PEP” rule, allowing one site to be nonspecific. With this search output, we calculated missed cleavages throughout the whole data set to evaluate digestion efficiency. For 90% of all identified peptides, we observed at most two missed Pro cleavages, but the number of missed Ala cleavages was considerably higher (supplemental Fig. S9A). Intriguingly, longer incubation times and higher enzyme-to-substrate ratios did not significantly reduce the number of missed cleavages (data not shown). Missed cleavages mostly displayed Pro, Gly, or Ala immediately C-terminal to the Ala or Pro residue (supplemental Fig. S9, B and C). Using neprosin and the semi-PEP rule, we identified 1729 protein groups in a 110-min run, compared with 2826 and 3505 for LysC and trypsin, respectively (Fig. 3A and supplemental Table S3). The sequence coverage contributed by neprosin is strongly complementary to that observed for trypsin and LysC. Of the 1251 proteins identified in common, almost half of the neprosin-generated coverage represents “dark matter” of the proteome sequence and potential PTMs not otherwise found using the two conventional reagents combined (Fig. 3B). When considering a rich data repository like ProteomicsDB (
), which contains a cumulative representation of human proteins identified from numerous projects, the single neprosin data set contributes additional sequence coverage of 6116 amino acids (FDR 1%, supplemental Table S4). We observed no obvious size bias in neprosin protein substrates. Identified proteins showed the same size distribution as proteins identified after digestion with trypsin and LysC (Fig. 4A).
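The “unique neprosin contribution” reported above can be framed as a set difference over covered residue positions. The Python sketch below is a simplified illustration under that assumption (hypothetical function names, exact string matching, no handling of Ile/Leu ambiguity or protein inference); it returns the fraction of neprosin-covered residues not covered by the combined trypsin and LysC peptides.

```python
def covered_positions(peptides, protein: str) -> set:
    """0-based residue positions of `protein` covered by any peptide (all occurrences)."""
    positions = set()
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            positions.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return positions


def unique_fraction(neprosin_peps, other_peps, protein: str) -> float:
    """Share of neprosin coverage not observed with the other enzymes combined."""
    nep = covered_positions(neprosin_peps, protein)
    other = covered_positions(other_peps, protein)
    return len(nep - other) / len(nep) if nep else 0.0


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVK"  # hypothetical toy sequence
    tryptic_and_lysc = ["MK", "TAYIAK"]
    neprosin = ["YIAKQRQISF"]
    print(unique_fraction(neprosin, tryptic_and_lysc, protein))  # 0.6
```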
MS/MS of Neprosin-generated Peptides
The average mass of a triggered peptide after digestion with neprosin was 2521 units, significantly higher than both trypsin and Lys-C (Fig. 4B). This number represents all MS/MS-triggering events. Only 20% of the MS2 spectra were converted to PSMs, a conversion rate considerably lower than either trypsin (52%) or Lys-C (46%) in our experiments (supplemental Table S2). This was somewhat unexpected as peptide feature sets were equally rich, although lower conversion rates for non-tryptic enzymes have been documented (
). We note that more than half of the PSMs were identified by ETD, whereas LysC and tryptic peptides were mainly identified by CID (supplemental Fig. S10). In separate experiments, we further observed that HCD fragmentation of peptides with a C-terminal Pro residue led to the generation of very abundant y1 ions independent of the position of charged residues within the peptide (supplemental Fig. S11, A and C).
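The abundant y1 ion noted above follows directly from the C-terminal residue, so its m/z can be predicted. The Python sketch below computes the singly protonated y1 ion (residue + H2O + proton) from standard monoisotopic masses, giving about m/z 116.071 for C-terminal Pro and about 90.055 for Ala. Whether such an ion is used as a reporter in any particular search engine is an assumption, not something the text establishes.

```python
PROTON = 1.007276  # Da
WATER = 18.010565  # Da, monoisotopic H2O
RESIDUE = {"P": 97.052764, "A": 71.037114}  # monoisotopic residue masses (Da)


def y1_mz(c_terminal_residue: str) -> float:
    """m/z of the singly protonated y1 ion: residue mass + H2O + proton."""
    return RESIDUE[c_terminal_residue] + WATER + PROTON


if __name__ == "__main__":
    for aa in RESIDUE:
        print(aa, round(y1_mz(aa), 4))
```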
The ability to drive digestion to a stable end point while preserving a larger average peptide mass should be useful for PTM mapping exercises. In assessing the potential of neprosin for the analysis of complex PTM patterns, we observed improved coverage of histones, the core components of chromatin known for their highly basic and multiply modified N termini (
). With neprosin, even a single untargeted analysis yielded a strong representation of histone tails from H3 and H4, relative to trypsin and LysC (supplemental Tables S5 and S6). Proline positions in these tails in particular appear well suited for presenting the histone code in a minimally biased manner. To explore this further, we digested and analyzed unfractionated histones from calf thymus via LC-MS/MS to expand our initial analysis, and we used EThcD for fragmentation (
). We observed prominent cleavages after Pro-32 for histone H4 and after Pro-38 in histone H3, which led to the release of the N-terminal tails 1–32 and 1–38, respectively (supplemental Fig. S12).
The release of these peptides highlights the vast number of histone modifications present, and we could identify a variety of co-occurring PTMs in what might be described as an extended bottom-up approach for H3 (Fig. 5A) and H4 (Fig. 5B). In this crude extract, H3 peptide 1–38 showed many co-occurring modifications at Lys-9, -14, -18, -23, -27, and -36, including methylation, dimethylation, trimethylation, and acetylation. Given their hydrophilic character, these peptides elute early in reversed-phase chromatography, mostly independent of the number of modifications (supplemental Fig. S12). H4 peptide 1–32 revealed an acetylated N terminus, together with acetylated Lys-5, -8, -12, and -16 and a methylated or dimethylated Lys-20. Using EThcD, we observed rich fragment ion spectra for these peptides, which allowed us to unambiguously assign the modified amino acids (e.g. Fig. 5C).
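Why fragment ions are needed for site assignment can be seen from the precursor mass shifts alone. The Python sketch below sums standard monoisotopic PTM increments (function and dictionary names are hypothetical). Positional isomers, for example the same marks placed on different Lys residues, give identical totals, and trimethylation differs from acetylation by only about 0.036 Da, so the precursor mass cannot localize these marks. The H4 example combines the marks listed above into one hypothetical proteoform for illustration; the text does not state that they all co-occur on a single molecule.

```python
# Monoisotopic mass shifts (Da) of the PTMs considered in the histone search.
DELTA = {
    "me1": 14.015650,  # + CH2
    "me2": 28.031300,  # + C2H4
    "me3": 42.046950,  # + C3H6
    "ac": 42.010565,  # + C2H2O
}


def mass_shift(mods: list) -> float:
    """Total precursor mass shift for a list of modifications (positions are irrelevant)."""
    return sum(DELTA[m] for m in mods)


if __name__ == "__main__":
    # Hypothetical H4 1-32 proteoform: N-terminal ac, K5/K8/K12/K16 ac, K20 me2.
    print(round(mass_shift(["ac"] * 5 + ["me2"]), 4))  # 238.0841
    # Trimethylation vs acetylation: nearly isobaric.
    print(round(DELTA["me3"] - DELTA["ac"], 4))  # 0.0364
```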
Histone Mapping, Additional Observations
Even though H3 contains two additional Pro residues, at positions 16 and 30, no significant cleavage was observed at these residues (Fig. 5A), which indicates that additional specificity is conferred outside of the P1-P1′ positions. For H1 and H2B, we did not observe such a well defined, near-singular N-terminal peptide, likely because of the higher content of Pro residues in their N-terminal tails. We identified several peptides covering the N terminus of H2A (e.g. peptide 1–26), here mostly due to minor cleavages after residues other than Pro. Nevertheless, the digest still readily demonstrated the presence of an acetylated N terminus and H2AK5Ac (supplemental Fig. S13).
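The Pro and Lys positions discussed above can be read off the tail sequences. The Python sketch below assumes the canonical H3.1 tail (residues 1–40) and H4 tail (residues 1–36), numbered without the initiator Met; these sequences are supplied here for illustration and should be checked against the curated B. taurus database. The listing shows that a single cleavage after Pro-38 (H3) or Pro-32 (H4) keeps every tail Lys, including Lys-36 and Lys-37 of H3, within one peptide, whereas cleavage after Pro-16 or Pro-30 would have split the H3 tail.

```python
# Canonical tail sequences (initiator Met removed; H3.1 numbering). Assumed for illustration.
H3_1_40 = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"
H4_1_36 = "SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRR"


def positions(seq: str, residue: str) -> list:
    """1-based positions of a residue, matching conventional histone numbering."""
    return [i + 1 for i, aa in enumerate(seq) if aa == residue]


def n_terminal_peptide(seq: str, cut_after: int) -> str:
    """Peptide 1..cut_after released by a single cleavage after that residue."""
    return seq[:cut_after]


if __name__ == "__main__":
    print(positions(H3_1_40, "P"))  # [16, 30, 38]
    print(positions(H4_1_36, "P"))  # [32]
    print(positions(H3_1_40, "K"))  # [4, 9, 14, 18, 23, 27, 36, 37]
    print(positions(H4_1_36, "K"))  # [5, 8, 12, 16, 20, 31]
```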
Our data show that neprosin cleaves after Pro and Ala residues at a combined 85%, which provides specificity levels approaching those of trypsin (91%) and its counterpart LysargiNase (92%) (
). This level of specificity permits a restricted search space, which reduces search time compared with other enzymes that cleave at neutral residues. The specificity of neprosin and its ability to digest substrates of any size confirm its characterization as a legitimate PEP. The cleavage of the Asp-Pro motif that we observed is, as we have shown, most likely due to the lability of this peptide bond under acidic conditions (
), rather than a direct result of enzymatic activity. Although we used a “semi-PEP” cleavage rule for our searches, a full PEP cleavage rule was also very effective, and it only reduced the protein group output by 5% (supplemental Table S2).
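The search-space argument can be made concrete by counting candidate peptides per protein. The Python sketch below compares all subsequences (a fully nonspecific search) with specific peptides allowing up to a given number of missed cleavages. The 500-residue protein with 40 neprosin sites is hypothetical, and the counts ignore precursor-mass filtering and modifications, so they indicate scale rather than actual search time.

```python
from typing import Optional


def nonspecific_candidates(n: int, min_len: int = 1, max_len: Optional[int] = None) -> int:
    """Number of subsequences of a length-n protein with min_len <= length <= max_len."""
    max_len = n if max_len is None else min(max_len, n)
    return sum(n - length + 1 for length in range(min_len, max_len + 1))


def specific_candidates(n_sites: int, max_missed: int) -> int:
    """Fully specific peptides: runs of 1..(max_missed + 1) consecutive fragments."""
    fragments = n_sites + 1
    return sum(max(0, fragments - j) for j in range(max_missed + 1))


if __name__ == "__main__":
    # Hypothetical 500-residue protein with 40 cleavage sites, up to seven missed cleavages.
    print(nonspecific_candidates(500))  # 125250
    print(nonspecific_candidates(500, 7, 40))  # 16235
    print(specific_candidates(40, 7))  # 300
```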
The number of missed cleavage sites was remarkably high, particularly for Ala, and was not significantly affected by incubation time or E:S ratio. This suggests that we do not yet fully understand the influence of amino acids distal to the P1 and P1′ positions. We noted that Pro or Ala in the P2 position, as well as Glu in the P1′ or P2′ position, enhanced catalytic activity. There may be other influences that limit activity at the canonical cleavage site, which may only become apparent with much larger data sets and a crystallographic structural model of the enzyme. The “skipping frequency” contributed to an average peptide mass of ∼2500 units, which places neprosin in the category of enzymes ideal for extended bottom-up proteomics (
). There can be many reasons for lower performance. Given their smaller average size, tryptic peptides are well matched to the fragmentation methods conventionally used in mass spectrometry, and search algorithms have been exquisitely tuned over many years to reflect fragmentation bias. Our data show an increased reliance upon ETD in the neprosin digests, which at first glance might suggest lower overall spectral quality. However, the intensity and richness of the unidentified MS2 subset was high, which may be due to a higher frequency of internal fragments. Additionally, as the peptide size increases, there is an increased probability of peptides carrying multiple modifications, and these are often missed by search algorithms (
), which may offer a very useful reporter ion and help support alternative search algorithms. Improving the MS2 conversion rate should become a bioinformatics priority, especially for enzymes that are highly selective and complement trypsin well.
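The skipping frequency and average peptide mass discussed above can both be estimated from a list of identified peptide sequences. The sketch below assumes a simple, hypothetical definition of the skipping rate: internal occurrences of a residue count as skipped sites, and peptides ending in that residue count as cleaved sites (protein C-termini are not treated separately). Masses are neutral monoisotopic masses of unmodified peptides, computed from standard residue masses; the peptide list is invented for illustration and is not data from this study.

```python
from statistics import mean

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def average_mass(peptides: list[str]) -> float:
    """Mean neutral monoisotopic mass over a list of peptides."""
    return mean(peptide_mass(p) for p in peptides)


def skipping_rate(peptides: list[str], residue: str) -> float:
    """Fraction of `residue` sites left uncleaved (hypothetical definition)."""
    skipped = sum(p[:-1].count(residue) for p in peptides)  # internal sites
    cleaved = sum(p.endswith(residue) for p in peptides)    # C-terminal sites
    total = skipped + cleaved
    return skipped / total if total else 0.0


peps = ["KLAEVRP", "TDAGSAP", "MKA"]  # hypothetical identifications
print(skipping_rate(peps, "A"))  # 3 skipped, 1 cleaved -> 0.75
print(skipping_rate(peps, "P"))  # 0 skipped, 2 cleaved -> 0.0
print(round(average_mass(peps), 2))
```

With real search results, the same functions applied per residue would show directly whether Ala sites are skipped more often than Pro sites.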
The complementarity of neprosin with other enzymes is particularly apparent in the analysis of histone tails. Many diseases, including cancer, cognitive dysfunction, and reproductive, respiratory, and cardiovascular illnesses, are strongly linked to epigenetic mechanisms regulated by histone tails, and there is a growing need to understand how these mechanisms drive cellular behavior. Abundant histone tail modifications include the methylation and acetylation of Lys residues and the methylation of Arg residues, although the total number of known histone PTMs is considerably higher. GluC is generally used to produce peptide 1–50 from histone H3, and AspN to produce peptide 1–23 from histone H4. Neprosin provides a single-enzyme solution for near-complete profiling of both tails, with additional advantages. Lys-37 is the most C-terminal of the extensively modified residues of interest in H3, which means that peptide 1–50 is longer than necessary for its analysis. Neprosin reliably generates peptide 1–38, which still contains all relevant modified residues, and its lower mass facilitates efficient fragmentation in ETD mode. Peptide 1–32 from histone H4 is released just as reliably. The PTM patterns we observed are in good qualitative agreement with previously reported values. We therefore conclude that neprosin is a promising alternative to GluC, AspN, and trypsin for the mass spectrometric analysis of histones.
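The comparison between the GluC, AspN, and neprosin products can be checked directly against the tail sequences. The sketch below assumes the standard human histone H3.1 (residues 1–50) and H4 (residues 1–40) sequences with the initiator Met removed, taken from sequence databases rather than from this study; the product boundaries are the ones stated in the text, and `lys_positions` is a hypothetical helper. It shows that H3 1–38 keeps every Lys found in 1–50, and that H4 1–32 extends the AspN product to include Lys-31.

```python
# Assumed N-terminal sequences (initiator Met removed) of human histone
# H3.1 (residues 1-50) and H4 (residues 1-40), from standard databases.
H3 = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALRE"
H4 = "SGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARR"


def lys_positions(peptide: str) -> list[int]:
    """1-based positions of Lys residues in an N-terminal peptide."""
    return [i + 1 for i, aa in enumerate(peptide) if aa == "K"]


# N-terminal products named in the text (1-based, inclusive ends).
products = {
    "H3 GluC 1-50": H3[:50],      # cleavage after Glu-50
    "H3 neprosin 1-38": H3[:38],  # cleavage after Pro-38
    "H4 AspN 1-23": H4[:23],      # cleavage before Asp-24
    "H4 neprosin 1-32": H4[:32],  # cleavage after Pro-32
}

for name, pep in products.items():
    print(f"{name}: {len(pep)} residues, Lys at {lys_positions(pep)}")
# H3 GluC 1-50: 50 residues, Lys at [4, 9, 14, 18, 23, 27, 36, 37]
# H3 neprosin 1-38: 38 residues, Lys at [4, 9, 14, 18, 23, 27, 36, 37]
# H4 AspN 1-23: 23 residues, Lys at [5, 8, 12, 16, 20]
# H4 neprosin 1-32: 32 residues, Lys at [5, 8, 12, 16, 20, 31]
```

The same comparison could be extended to Arg positions to cover the Arg methylation sites mentioned above.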
Neprosin's ability to cleave with high specificity after Pro and Ala under mildly denaturing conditions and at low E:S ratios makes it a useful new tool for proteomics. Further opportunities include mass spectrometry applications in protein structure and function. Neprosin could provide a more selective alternative to aspartic proteases such as pepsin in hydrogen/deuterium exchange experiments or in the analysis of native disulfide bridges, and because Pro and Ala are inert toward most protein-modification chemistries, it should prove useful for processing cross-linked proteins. It remains unclear whether neprosin is the archetype of the entire family of proteins containing DUF239. Other members may offer PEP-like functionality distinct from that of neprosin and should be explored for their utility.
Raw files for all MS experiments are available from Chorus (ID 1262), and annotated fragment ion spectra for identified histone tails are available in the supplemental materials.
Mathias Wilhelm (Technische Universität München, Freising, Germany) is gratefully acknowledged for providing data from ProteomicsDB.
Author contributions: C.U.S. and D.C.S. designed the research; C.U.S., L.L., M.R., and P.M. performed the research; L.L., P.M., S.S., V.Z., and B.L. contributed new reagents or analytic tools; C.U.S., L.L., V.S., B.L., and D.C.S. analyzed the data; and C.U.S. and D.C.S. wrote the paper.