State Key Laboratory of Chemical Biology and Drug Discovery, Department of Applied Biology and Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, SAR, China
• Establishment of an efficient method to improve AltProt identification.
• Method enabled simultaneous enrichment and fractionation from complex proteome.
• Eighty-nine novel AltProts were identified to reveal AltORF translation in liver development.
• Establishment of a combined approach of Ribo-seq prediction and targeted MS detection.
• Differential AltProts analysis reveals involvement in development-related biological pathways.
Abstract
Alternative ORFs (AltORFs) are unannotated sequences in the genome that encode novel peptides or proteins named alternative proteins (AltProts). Although ribosome profiling and bioinformatics predict a large number of AltProts, mass spectrometry, the only direct way of identifying them, is hampered by the short lengths and relatively low abundance of AltProts. There is an urgent need to improve mass spectrometry methodologies for AltProt identification. Here, we report an approach based on size-exclusion chromatography for simultaneous enrichment and fractionation of AltProts from complex proteomes. This method greatly simplifies AltProt discovery by enriching proteins smaller than 40 kDa. In a systematic comparison of 10 methods, the approach we report enabled the discovery of more AltProts with overall higher intensities, at a lower cost of time and effort than other workflows. We applied this approach to identify 89 novel AltProts from mouse liver, 39 of which were differentially expressed between embryonic and adult mice. During embryonic development, the upregulated AltProts were mainly involved in RNA splicing and processing pathways, whereas the AltProts involved in metabolism were more active in adult livers. Our study provides not only an effective approach for identifying AltProts but also novel AltProts that are potentially important in developmental biology.
Alternative ORFs (AltORFs) are unannotated coding sequences that are different from any known protein-coding gene documented in database or reference annotation projects (
). The translation products of AltORFs are termed alternative proteins (AltProts), which have no similarity to canonical reference proteins (RefProts) of the same gene. Unlike short proteins/microproteins that are encoded by small ORFs (sORFs) with restrictions of less than 100 amino acids, AltProts do not have an upper limit on length (
). Therefore, AltProts include proteins of less than and greater than 100 amino acids. Recently, AltProts have turned out to play essential roles in a variety of physiological processes or diseases (
However, the discovery of functional AltProts has been mostly serendipitous. To date, we still lack a systematic approach to directly identify AltProts from biological specimens on a large scale (
). Ribosome profiling (Ribo-seq) sequences ribosome-protected RNA fragments and thus enables the prediction of thousands of AltORFs with bioinformatics pipelines (
). The big difference in the identification number between the two methods calls for urgent improvement on the MS-based methodologies to detect AltProts. The discovery of AltProts by MS is challenging partly due to their short length and interference from large canonical proteins (
). When public databases that combine all translational products from various samples are used, the efficiency of AltProt discovery is far inferior to that of RefProts. Considering the high temporal and spatial specificity of AltProt translation, it is important to use a customized database built from the same specific samples when mining novel AltProts. Although several prior works have improved AltProt sample preparation procedures or database construction, there is still considerable room for improvement (
). While a large number of canonical proteins and their mechanisms in developmental biology have been thoroughly investigated, only a few AltProts have been studied (
). Considering AltProts could also play pivotal roles in development, either independently or through the regulation of canonical proteins, the large scale and accurate identification of AltProts is crucial for our understanding of the mechanisms in embryonic development.
We herein report an optimized approach integrating MS and Ribo-seq techniques to identify AltProts with improved depth and efficiency. With the optimized approach, we were able to discover and quantify stage-dependent AltProts from embryonic and adult livers that were enriched in specific biological pathways. Our study not only provided an approach but also novel AltProts as new players in liver development.
Experimental Procedures
Chemicals and Reagents
Acetonitrile, methanol, formic acid (FA), trichloroacetic acid, water (HPLC grade), 16% Tricine gel, and Tricine SDS running buffer were from Thermo Fisher Scientific. Acetic acid (AA), ethanol, and chloroform were from DUKSAN. Lysyl endopeptidase (Lys-C, mass spectrometry grade) and trypsin (sequencing grade) were purchased from Promega. Ammonium formate, ammonium bicarbonate, DL-DTT, iodoacetamide, and all other reagents were from Sigma-Aldrich.
Animals and Tissue Collection
To compare AltProt enrichment and fractionation methods from liver total lysates, C57BL/6 mice weighing between 18 and 22 g were purchased from Centralized Animal Facilities, The Hong Kong Polytechnic University, Hong Kong. Adult mice were anaesthetized and then perfused with isotonic saline containing protease inhibitors (0.120 mM EDTA, 0.2 mM PMSF, and Roche Complete Protease Inhibitor tablets, pH 7.4) before decapitation. Livers were quickly dissected and immediately snap-frozen in liquid nitrogen. All animal experiments were approved by the Hong Kong Polytechnic University Animal Subjects Ethics Subcommittee (Approval No: 20-21/275-ABCT-R-STUDENT) and were performed in accordance with the Institutional Guidelines and Animal Ordinance of the Department of Health.
For discovery of AltProts in embryonic liver development, livers were harvested separately from embryonic (E15.5) and adult (P42) C57BL/6 mice and immediately snap-frozen in liquid nitrogen. Mice were purchased from the Guangdong Medical Experimental Animal Center (Guangdong, China; License No: SCXK (YUE) 2018 0002). All experimental procedures were approved by the Animal Ethics Committee of the Zhongshan Ophthalmic Center, Sun Yat-sen University (Guangzhou, China; License No: SYXK (YUE) 2018 0189) and in accordance with the institutional animal welfare guidelines and Animal Protection Law of China.
Protein Extraction and AltProt Enrichment
Mouse liver tissues were obtained from The Hong Kong Polytechnic University. Three different AltProt extraction methods were compared: (1) RIPA lysis buffer (50 mM Tris-HCl, 150 mM sodium chloride, 2 mM EDTA, 1% NP40, 1% sodium deoxycholate), (2) acid lysis buffer (50 mM hydrochloric acid (HCl), 0.1% β-mercaptoethanol, 0.05% Triton X-100) (
), and (3) boiling water. The extracts were then centrifuged at 16,000 g for 20 min at 4 °C to remove residual debris.
We tested 10 enrichment methods in triplicate from four categories, (1) precipitation, (2) size selection, (3) solid-phase extraction (SPE), and (4) hexagonal mesoporous silica materials, using equal amounts of lysate. The precipitation methods were based on previously described protocols. For AA 0.25% or AA 25% precipitation, AA (0.25%, v/v) (
) was added to the supernatant, followed by centrifugation at 16,000 g for 20 min at 4 °C. For trichloroacetic acid (TCA) precipitation, 20% TCA was added to the samples at 1:1 (v/v), followed by the addition of chloroform (CHCl3) at 1:1 (v/v). Samples were centrifuged at 1500 g for 10 min at 4 °C, and the supernatant was transferred to a new tube. The lower phases were then washed with 100 μl of Milli-Q water and 100 μl of methanol, followed by vortexing and centrifugation at 1500 g at 4 °C for 10 min. Subsequently, both supernatants were combined (
). For methyl tert-butyl ether (MTBE)-based sequential precipitation, single-phase buffer MTBE/methanol/water (5:3:1, v/v) and two-phase buffer MTBE/methanol/water (5:1:1, v/v) were applied for sequential precipitation and delipidation as described previously (
). The size selection category comprised two methods. For 30-kDa molecular weight cut-off ultrafiltration (30-kDa-MWCO), the lysate was loaded into a 30-kDa-MWCO ultrafiltration tube (Millipore), and the flow-through was collected (
). For size-exclusion chromatography (SEC) enrichment, proteins <30 kDa were isolated from larger proteins in liver lysates using a GE AKTA Explorer FPLC System (GE Healthcare) combined with a Sephadex 75 Increase 5/150 GL column (GE Healthcare), which served for both enrichment and fractionation of small proteins. Low molecular weight standards (GE Healthcare) were used for mass calibration. Each SEC separation was run for 15 min at a flow rate of 0.2 ml/min with detection at a wavelength of 254 nm. Only fractions between 8 min and 15 min of retention time, which corresponded to proteins of molecular weight <30 kDa and had a total volume of 1.6 ml, were collected into a low protein binding tube (Eppendorf); for SEC enrichment, these fractions were combined into one tube and lyophilized before use. In the SPE category, the liver lysates were enriched using C8 SPE cartridges (Agilent Technologies) or hydrophilic-lipophilic-balanced SPE (HLB SPE, Waters) cartridges. For C8 SPE-based enrichment (
), cartridges were activated with one column volume (CV) of methanol and then equilibrated with two CVs of triethylammonium formate (TEAF) buffer (pH 3.0) before the lysate was applied. The cartridges were then washed with two CVs of TEAF buffer (pH 3.0), and the enriched proteins were eluted with ACN:TEAF buffer (3:1, pH 3.0). For HLB SPE-based enrichment, cartridges were activated with methanol and then equilibrated with water before the lysate was applied. The cartridges were then washed with water and eluted with 60% ACN. Lastly, hexagonal mesoporous silica materials MCM-41 were mixed with lysates, and small proteins were extracted as described by Du et al (
). Detailed protocols for the enrichment methods are available in the supplemental materials.
Protein Sample Cleanup with SP3 Method
For each 20 μg of sample, Sera-Mag SpeedBeads Carboxyl Magnetic Beads, hydrophobic and Sera-Mag SpeedBeads Carboxyl Magnetic Beads, hydrophilic (GE Healthcare) were gently combined in a ratio of 1:1 (v/v) and used as described by Hughes et al (
). Samples were reduced and alkylated using DTT and iodoacetamide, respectively. Next, the bead slurries were transferred to the samples. Then, absolute ethanol was added to a final concentration of 50% (v/v) to induce protein binding. Beads were resuspended in 50 mM ammonium bicarbonate supplemented with Lys-C enzymes at an enzyme to protein ratio of 1:100 (w/w). After 4 h incubation, trypsin was added at an enzyme to protein ratio of 1:20 (w/w), as 1:25 was recommended by Hughes et al. for complete digestion, and the sample was incubated at 37 °C for 12 h. Peptide concentration was determined using the Pierce Quantitative Fluorometric Peptide assay (Thermo Fisher Scientific). From each sample, peptides were labeled with TMT-6plex (includes the following channels: 126, 127N, 127C, 128N, 128C, 129N, Thermo Fisher Scientific) according to the manufacturer’s instructions.
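The enzyme amounts implied by the stated w/w ratios can be worked out with a short calculation. The sketch below assumes the digest is performed on the 20 μg of protein mentioned above; the function name `enzyme_mass_ug` is illustrative and not part of the original protocol.

```python
def enzyme_mass_ug(protein_ug: float, enzyme_parts: float, protein_parts: float) -> float:
    """Return the enzyme mass (ug) for an enzyme:protein ratio given in w/w parts."""
    return protein_ug * enzyme_parts / protein_parts


# Assumption: 20 ug of protein per digest, as in the bead preparation step.
protein_ug = 20.0
lys_c_ug = enzyme_mass_ug(protein_ug, 1, 100)   # Lys-C at 1:100 (w/w)
trypsin_ug = enzyme_mass_ug(protein_ug, 1, 20)  # trypsin at 1:20 (w/w)

print(f"Lys-C: {lys_c_ug:.2f} ug, trypsin: {trypsin_ug:.2f} ug")
```

For a different protein input, only `protein_ug` needs to change.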
SDS-PAGE Gel Analysis of Enriched AltProts Samples
After enrichment, protein content was quantified by the Bradford assay and the same amount (25 μg) of protein was loaded on each lane of the gel. Samples were analyzed using 16% tricine-SDS-PAGE and separated at a constant 60 V until they completely entered the separating gel from the stacking gel. Then, a constant 110 V was maintained until the tracking dye reached the bottom of the gel. Finally, the gel was stained with Coomassie brilliant blue R-250 (Bio-Rad).
Comparison of Fractionation Methods after SEC Enrichment
SEC Enrichment into Four Fractions
Mouse liver samples were loaded on the SEC column, and the final four fractions in the low molecular weight range were collected and injected separately into the MS for detection.
High-pH Reversed-Phase Fractionation
After SEC enrichment, the obtained proteins were digested and then peptides were fractionated using a Waters Acquity UPLC Peptide BEH C18 column (2.1 × 100 mm, 1.7 μm, Waters) on an Agilent 1290 Infinity LC system (Agilent Technologies) operating at 50 μl/min. Buffer A consisted of 10 mM ammonium formate and buffer B consisted of 10 mM ammonium formate and 90% ACN; both buffers were adjusted to pH 9 with ammonium hydroxide as described previously (
). Fractions were collected every 1 min from 6 min to 100 min retention time (96 fractions, finally concatenated into eight fractions). Peptides were separated by a linear gradient as follows: 0 to 10 min, 1% B; 10 to 38 min, 1 to 8% B; 38 to 75 min, 8 to 62% B; 75 to 85 min, 62 to 95% B; 85 to 100 min, 95% B. The final eight fractions were concentrated and analyzed by LC-MS/MS.
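The gradient table above can be read as a piecewise-linear function of time. The following sketch, which assumes linear interpolation between the stated breakpoints, returns the %B composition at any retention time; the names `HPRP_GRADIENT` and `percent_b` are illustrative only.

```python
from bisect import bisect_right

# (time in min, %B) breakpoints transcribed from the HpRP gradient in the text.
HPRP_GRADIENT = [(0, 1), (10, 1), (38, 8), (75, 62), (85, 95), (100, 95)]


def percent_b(t_min: float, gradient=HPRP_GRADIENT) -> float:
    """Return %B at time t_min, interpolating linearly between breakpoints."""
    times = [t for t, _ in gradient]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time is outside the programmed gradient")
    i = bisect_right(times, t_min) - 1
    if i == len(gradient) - 1:  # exactly at the final breakpoint
        return float(gradient[-1][1])
    (t0, b0), (t1, b1) = gradient[i], gradient[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


# Midpoint of the 10 to 38 min segment (1 to 8% B).
print(percent_b(24))
```

The same function can be reused for other gradients by passing a different breakpoint list.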
ERLIC Fractionation
After SEC enrichment, the obtained proteins were digested and then peptides were fractionated using an Agilent 1290 Infinity LC system equipped with a PolyWAX ERLIC column (200 × 2.1 mm, 5 μm, 300 Å, PolyLC) as described previously (
). Buffer A consisted of 90% acetonitrile and 0.1% AA and buffer B consisted of 30% acetonitrile, 0.1% FA. From 6 min to 100 min retention time, fractions were collected every 1 min (96 fractions, finally concatenated into eight fractions). Peptides were separated by a stepwise gradient as follows: 0 to 10 min, 0% B; 10 to 22 min, 0 to 8% B; 22 to 38 min, 8 to 45% B; 38 to 50 min, 45 to 80% B; 50 to 68 min, 80 to 98% B; 68 to 100 min, 98% B. The final eight fractions were concentrated and analyzed by LC-MS/MS.
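The text states that 96 fractions were concatenated into eight but does not give the pooling scheme. A minimal sketch of one common choice, interleaved pooling in which every eighth fraction goes into the same pool, is shown below; this scheme is an assumption and may differ from the one actually used.

```python
def concatenate_fractions(n_fractions: int = 96, n_pools: int = 8) -> dict[int, list[int]]:
    """Assign 1-based fraction numbers to pools in an interleaved pattern.

    Assumption: fraction i is pooled with fractions i + n_pools, i + 2 * n_pools, ...
    """
    pools: dict[int, list[int]] = {p: [] for p in range(1, n_pools + 1)}
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools + 1].append(fraction)
    return pools


pools = concatenate_fractions()
for pool, members in pools.items():
    print(pool, members)
```

Interleaving spreads early- and late-eluting peptides across every pool, which is why it is often preferred over pooling adjacent fractions.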
LC-MS/MS Analysis
For data-dependent acquisition, all mass spectrometry data were collected on an Orbitrap Exploris 480 mass spectrometer equipped with the FAIMS interface and coupled with an Ultimate 3000 RSLC nano system (Thermo Fisher Scientific). The digested samples were redissolved in 0.1% FA and separated on a self-packed capillary column packed with Reprosil-Pur C18 1.9 μm particles (Dr Maisch GmbH). Mobile phase A (0.1% FA) and mobile phase B (80% ACN and 0.1% FA) were used to separate peptides for bottom-up proteomics at a constant flow rate of 300 nl/min with the following gradient: 0 to 2 min, 8 to 10% B; 2 to 120 min, 10 to 35% B; 120 to 140 min, 35 to 90% B; 140 to 150 min, 90% B. The full scan spectra were measured with a resolution of 120,000 within 50 ms maximum injection time, followed by MS2 scans with a resolution of 30,000 within 55 ms maximum injection time. The isolation window of the MS2 scan was set to 1.6 m/z, and only ions with charge states of 2 to 6 triggered an MS2 event. The normalized collision energy was set as 32. The dynamic exclusion time was set as 45 s. Compensation voltages were set at -45 V and -65 V to remove singly charged ions.
Construction of AltProts Database
This study used the Ribo-seq dataset we reported previously (
) (v 1.33, with parameters “se -x -t sanger”). rRNA and tRNA contaminants were removed by aligning trimmed reads to mouse tRNA and rRNA sequences (5S, 5.8S, 18S, and 28S) using Bowtie 2 (
) (v1.0.1, with command “-q -L 20 --phred33 --end-to-end”). All remaining reads were mapped to the mouse reference genome GRCm38 with a GTF annotation file (GENCODE vM25) using STAR (v2.7.2a) (
), were used to perform ORF and AltORF detection with the longest strategy under the default threshold setting (supplemental Table S1). The final set of actively translated ORFs, each beginning with a canonical or near-cognate start codon (AUG, UUG, CUG, or GUG) and followed by an in-frame stop codon in an annotated transcript, was stringently filtered by requiring a minimum length of 18 nucleotides and expression of the ORF-containing gene at an above-background level, as described in a previous report (
). ORFs that passed the above filtering criteria were classified into several categories based on their location relative to the nearest annotated coding sequence (CDS), as described previously (
). In the classification result, annotated ORFs were those encoding annotated proteins. Upstream ORFs (uORFs) and downstream ORFs were defined as AltORFs originating from the 5′ untranslated regions (UTRs) and 3′UTRs of annotated protein-coding genes, respectively; long noncoding RNA ORFs (lncRNA-ORFs) were defined as AltORFs originating from transcripts currently annotated as lncRNAs; upstream overlapping ORFs (uoORFs), downstream overlapping ORFs, and internal out-of-frame ORFs were defined as AltORFs that overlap annotated CDSs out of frame and are located upstream of, downstream of, and within the CDS, respectively. Finally, nucleic acid sequences of all actively translated AltORFs were converted into amino acid sequences in the FASTA format for the construction of protein databases.
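The location-based categories described above can be expressed as a small decision function. The sketch below is an illustration under stated assumptions, not the pipeline used in the study: coordinates are 0-based, half-open positions on the transcript, frame is judged from the offset between ORF start and CDS start, and the function name `classify_orf` and the "other" fallback label are hypothetical.

```python
from typing import Optional


def classify_orf(orf_start: int, orf_end: int,
                 cds_start: Optional[int], cds_end: Optional[int],
                 is_lncrna: bool = False) -> str:
    """Classify an ORF by its position relative to the annotated CDS."""
    if is_lncrna:
        return "lncRNA-ORF"
    if cds_start is None or cds_end is None:
        return "other"  # no annotated CDS to compare against
    if (orf_start, orf_end) == (cds_start, cds_end):
        return "annotated ORF"
    if orf_end <= cds_start:
        return "uORF"  # entirely within the 5' UTR
    if orf_start >= cds_end:
        return "dORF"  # entirely within the 3' UTR
    if (orf_start - cds_start) % 3 == 0:
        return "other"  # in-frame overlap, e.g. an extension or truncation
    if orf_start < cds_start:
        return "uoORF"  # starts upstream, overlaps the CDS out of frame
    if orf_end > cds_end:
        return "doORF"  # starts inside the CDS, ends downstream, out of frame
    return "internal out-of-frame ORF"


print(classify_orf(80, 131, 100, 400))
```

Applied to every ORF that passes the filters, this yields the per-category counts of the kind summarized in the classification result.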
Identification of Canonical Proteins and AltProts
The LC-MS/MS raw data were analyzed with MSFragger (version 3.3). The common parameters were set as follows: precursor mass tolerance, 10 ppm; fragment mass tolerance, 0.02 Da; trypsin as enzyme; two missed cleavages; oxidation (methionine), acetyl (protein N-term), and TMT-6plex (N terminus) as variable modifications; carbamidomethylation (cysteine) and TMT-6plex (lysine) as fixed modifications; validation was performed using PeptideProphet; the FDR was set as 1%. Two different protein databases were used in this study: (1) the mouse OpenProt and sORF databases were used for comparison of enrichment methods. The mouse OpenProt protein database was derived from OpenProt (https://openprot.org, version number 1.6, 01 September 2020) (
) and contains 563,275 entries consisting of RefProts, novel isoforms, and AltProts predicted from both Ensembl and RefSeq. There were 503,679 entries in the Mus musculus AltProt protein database from sORF.org (http://www.sorfs.org, downloaded on 01 June 2021) (
); (2) an in-house mouse AltProt database with 146,461 entries was used for AltProt discovery in TMT-labeled embryonic and adult livers. Identification of AltProts was always based on a peptide specific to the AltProt sequence and not shared with RefProts. The results from the custom database search were further filtered against the reference mouse protein database (RefProt, containing Ensembl, NCBI RefSeq, and UniProtKB) using a stringent string-searching-based mapping algorithm to ensure that we did not report any known protein degradation products, mutants, or isoforms.
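The string-searching filter against RefProts can be sketched as a substring check. The example below is a minimal illustration, not the algorithm used in the study: treating isoleucine and leucine as equivalent (they are isobaric) is an added assumption, and the reference sequences are made-up toy strings.

```python
def is_altprot_specific(peptide: str, refprot_sequences: list[str],
                        treat_i_l_equal: bool = True) -> bool:
    """Return True if the peptide occurs in none of the reference sequences."""
    def normalize(seq: str) -> str:
        seq = seq.upper()
        # Assumption: I and L cannot be told apart by mass, so compare them as one.
        return seq.replace("I", "L") if treat_i_l_equal else seq

    target = normalize(peptide)
    return not any(target in normalize(ref) for ref in refprot_sequences)


# Hypothetical reference sequences for illustration only.
toy_refprots = ["MKTAYIAKQRQISFVK", "MSLLAGLQNAGRK"]
print(is_altprot_specific("QLLLAGLQNAGR", toy_refprots))
print(is_altprot_specific("TAYLAK", toy_refprots))
```

For a full proteome, an indexed search (for example a suffix array or a k-mer lookup) would scale better than this linear scan.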
We performed Gene Ontology (GO) analysis mainly based on annotated AltORFs, which are in the same genes that encode the related uORFs, downstream ORFs, and uoORFs, as well as lncRNA-ORFs that were encoded by the retained introns of protein-coding genes with known functions. GO analysis was performed with R package clusterProfiler (v4.0.5).
Validation of Novel AltProts with Parallel Reaction Monitoring
For parallel reaction monitoring (PRM), the samples were separated on the same LC-MS system by a 150 min gradient. Full scan spectra were measured with a resolution of 120,000 within a 50 ms maximum injection time, followed by targeted peptide MS2 scans with a resolution of 30,000 within a 60 ms maximum injection time under the 1.2 m/z isolation window. The normalized collision energy was set as 30. PRM data (tier 3 level) were processed with Skyline (version 21.1) software as described previously (
Identification of More AltProts Using the PRM Method
Twenty-seven AltProts were selected from the Ribo-seq-based AltProt database for targeted PRM analysis (tier 3 level) to identify additional AltProts. Briefly, a fragmentation inclusion list of theoretically predicted tryptic peptides from the selected AltProts was generated to identify more novel AltProts using high-resolution data-dependent scanning. A total of 51 unique peptide targets (corresponding to 27 AltProts) were included based on the following stringent screening criteria: peptides not shared with RefProts, sequence length greater than 7 amino acids, and the absence of methionine oxidation.
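The screening criteria for the inclusion list can be illustrated with an in silico digest. The sketch below rests on several assumptions that are not stated in the text: trypsin cleaves after K or R except before P, no missed cleavages are allowed, and "absence of methionine oxidation" is approximated by excluding methionine-containing peptides. All sequences are hypothetical.

```python
import re


def tryptic_peptides(protein: str) -> list[str]:
    """Fully digest a protein after K/R, but not when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def inclusion_list(altprot: str, refprots: list[str], min_length: int = 8) -> list[str]:
    """Keep peptides longer than 7 residues, without M, and absent from RefProts."""
    selected = []
    for peptide in tryptic_peptides(altprot):
        if len(peptide) < min_length or "M" in peptide:
            continue
        if any(peptide in ref for ref in refprots):
            continue
        selected.append(peptide)
    return selected


# Hypothetical AltProt and RefProt sequences for illustration only.
toy_altprot = "MKTEDSLLAGVQNAGRSMEEVKAAPRGHFLLDSTAVWK"
toy_refprots = ["MAGHFLLDSTAVWKPP"]
targets = inclusion_list(toy_altprot, toy_refprots)
print(targets)
```

In practice the surviving peptides would be paired with their precursor m/z values and charge states before being loaded into the instrument method.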
Experimental Design and Statistical Rationale
To test the performance of different AltProt enrichment methods, we performed triplicates for each enrichment method using adult C57BL/6 mice liver samples. To investigate AltProt expression during liver development, livers of embryonic (E15.5) and adult (P42) C57BL/6 mice in triplicates were used. Data were analyzed by a two-tailed unpaired Student's t test (unless otherwise indicated), and p < 0.05 was selected as the statistical limit of significance. We selected ∗ and ∗∗ for p < 0.05 and p < 0.01, respectively. Unless otherwise stated, all the data in the graphs were expressed as arithmetic mean ± the SD from at least three repeated experiments.
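The two-tailed unpaired Student's t test and the star notation can be sketched with the standard library alone. The example assumes equal variances (the classical Student form) and triplicate groups, so that the degrees of freedom equal 4 and the tabulated critical values 2.776 (p < 0.05) and 4.604 (p < 0.01) apply; the intensity values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical values of Student's t for df = 4 (three vs. three replicates).
T_CRIT_DF4 = {0.05: 2.776, 0.01: 4.604}


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return the pooled-variance t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def significance_stars(t: float, df: int) -> str:
    """Map |t| to '', '*' (p < 0.05) or '**' (p < 0.01); only df = 4 is tabulated."""
    if df != 4:
        raise ValueError("critical values are tabulated for df = 4 only")
    if abs(t) > T_CRIT_DF4[0.01]:
        return "**"
    if abs(t) > T_CRIT_DF4[0.05]:
        return "*"
    return ""


# Hypothetical log-intensities for one AltProt in triplicate.
embryonic = [10.0, 11.0, 12.0]
adult = [14.0, 15.0, 16.0]
t_stat, dof = student_t(embryonic, adult)
print(round(t_stat, 3), dof, significance_stars(t_stat, dof))
```

With other replicate numbers, an exact p value from a statistics package would replace the lookup table.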
Results
Optimization of the Workflow for Microprotein Discovery
Considering the distinct lengths and properties of canonical RefProts and AltProts (
), the identification of AltProts with classical proteomic methods is analytically challenging. Therefore, we sought to improve the proteomics workflow at multiple steps, including protein extraction, AltProt enrichment, and peptide fractionation, by comparing various conditions (Fig. 1). First, three widely employed protein extraction methods, RIPA lysis buffer, acidic lysis buffer, and boiling water, were tested for extracting AltProts from mouse liver homogenates. Significant protein loss was observed with acid lysis buffer and boiling water, although both have been reported to extract small proteins by preferentially causing aggregation of high molecular weight proteins (
). In contrast, RIPA lysis buffer offered much higher efficiency for total protein extraction and therefore was adopted in all following experiments (supplemental Fig. S1A).
Fig. 1. Schematic illustration of the workflow for MS-based discovery of AltORF-encoded AltProts from mouse liver tissues. A, designed workflow including extraction, enrichment, and fractionation methods for the discovery of AltProts. B, construction of AltProt database. The Ribo-seq data were screened by 10 different bioinformatics pipelines to find all possible translational AltORFs, which were then translated into potential translational products (AltProts). AltORFs, alternative ORFs; AltProts, alternative proteins; DDA, data-dependent acquisition; MS, mass spectrometry; PRM, parallel reaction monitoring; SEC, size-exclusion chromatography; SPE, solid-phase extraction.
SEC is the Most Efficient Method to Enrich AltProts
Next, we tested 10 methods from four categories to find the most efficient method for enriching AltProts from total proteins. In the first category, “precipitation”, organic solvents or acids precipitated high molecular weight proteins and thereby enriched AltProts. In the second category, “size selection”, ultrafiltration tubes and SEC separated proteins by size. In the third category, “solid phase separation”, the nonpolar reversed-phase sorbent trapped large hydrophobic proteins, while small and polar proteins were eluted and enriched. The fourth category was hexagonal mesoporous silica materials MCM-41, which selectively enriched peptides and small proteins through size selectivity and an adsorptive mechanism. The efficiency of the methods was compared side by side based on gel images and/or MS analysis of the enriched proteins. Based on the tricine and glycine gel images, most methods were able to remove proteins larger than 40 kDa efficiently. However, the proteins enriched with these methods displayed vastly different profiles (Figs. 2A and S1, B–F). TCA precipitation, AA precipitation, C8 SPE, HLB SPE, 30-kDa-MWCO, and SEC resulted in strong protein bands and were therefore chosen for the following comparison with MS.
Fig. 2. Comparison of different enrichment methods for AltProts in mouse livers. A, liver tissues were lysed in RIPA lysis buffer followed by enrichment by C8 SPE, HLB SPE, SEC, 30-kDa-MWCO, AA precipitation, or TCA precipitation. The results from these enrichments were analyzed by SDS-PAGE (Coomassie stain). B, average number of detected AltProts using different enrichment methods. C, MS intensity of identified AltProts and RefProts in each enrichment method. 30-kDa-MWCO, 30-kDa-molecular weight cut-off ultrafiltration; AltProts, alternative proteins; AA, acetic acid; C8 SPE, C8 solid-phase extraction; HLB SPE, hydrophilic-lipophilic-balanced solid-phase extraction; MS, mass spectrometry; RefProts, reference proteins; SEC, size-exclusion chromatography; TCA, trichloroacetic acid.
With equal protein amounts, the highest identification number was achieved with SEC enrichment, with an average of 51 AltProts identified, more than twice that of the other methods (Fig. 2B). Meanwhile, although the intensity of RefProts was similar across all tested methods, the intensity of AltProts after SEC enrichment was fivefold higher than that of the other methods. SEC greatly reduced the difference between RefProts and AltProts in terms of MS intensity, which demonstrated its effectiveness in concentrating AltProts from total lysates (Fig. 2C).
Characteristics of AltProts Enriched with Various Methods
Given the complementary nature of these enrichment methods, only a few AltProts were commonly identified across different categories of methods (Figs. 3A and S2). Although no individual method yielded a high number of AltProts, the methods collectively contributed a greater variety of AltProts. In our study, we found that the No-enrich and SEC methods were complementary in identifying different categories of AltProts. The reproducibility was higher within the same category than between categories. For example, over 60% of AltProts identified with TCA precipitation were reproducibly identified with AA precipitation. Among all the methods, SEC was found to be the most comprehensive. For AltProts that were identified by multiple enrichment methods, SEC resulted in the highest intensities (highlighted in red in Fig. 3A). Next, we analyzed the hydrophobicity and isoelectric point (pI) to investigate whether AltProt identification was associated with their biophysical properties (Fig. 3, B and C). As expected, the acid precipitation methods enriched more hydrophilic AltProts with lower GRAVY scores (Fig. 3B). TCA precipitation and AA precipitation preferentially enriched more AltProts with a high pI than other methods (Fig. 3C). Such differential biophysical properties partially explained the observation that a complementary pool of AltProts was enriched with different methods. The SEC-based method enriched AltProts with evenly distributed hydrophobicity and pI and was therefore the most efficient method.
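The GRAVY score used above is the mean Kyte-Doolittle hydropathy value over all residues of a sequence, with lower values indicating a more hydrophilic protein. A minimal sketch follows; the function name `gravy` is illustrative, and the example applies it to a peptide reported later in this article rather than to a full AltProt.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathicity for a protein sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)


print(round(gravy("QLLLAGLQNAGR"), 2))
```

Non-standard residue codes raise a `KeyError` here, which keeps ambiguous sequences from silently skewing the average.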
Fig. 3. Performance of different approaches for enriching AltProts. Distribution of MS intensity (A), hydrophobicity (B), and isoelectric point (C) of identified AltProts from different enrichment methods. ∗p < 0.05 versus No-enrich; ∗∗p < 0.01 versus No-enrich. 30-kDa-MWCO, 30-kDa-molecular weight cut-off ultrafiltration; AltProts, alternative proteins; AA, acetic acid; C8 SPE, C8 solid-phase extraction; GRAVY, grand average of hydropathicity index; HLB SPE, hydrophilic-lipophilic-balanced solid-phase extraction; pI, isoelectric point; MS, mass spectrometry; SEC, size-exclusion chromatography; TCA, trichloroacetic acid.
Peptide fractionation using electrostatic repulsion-hydrophilic interaction chromatography (ERLIC) and high pH reverse phase (HpRP) has been reported to improve the discovery of AltProts in prior studies (
). SEC, which was found to be the most efficient and unbiased method for the enrichment of AltProts in our study, could also serve for protein fractionation to obtain four fractions of different molecular weight ranges. Therefore, we evaluated four fractionation methods for improving the depth of AltProt discovery, including SEC enrichment without fractionation (SEC), SEC enrichment into 4 fractions (SEC-fraction), SEC enrichment followed by ERLIC fractionation (SEC-ERLIC), and SEC enrichment followed by HpRP fractionation (SEC-HpRP) (Fig. 4A). SEC-fraction and ERLIC fractionation increased the number of AltProts by 1.4- to 1.6-fold. SEC-ERLIC led to the highest number of AltProts, while SEC-fraction was the most time- and effort-effective, as it could enrich and fractionate AltProts simultaneously within 15 min (Fig. 4A). The intensities of AltProts even showed a slight increase after SEC-fraction and SEC-ERLIC (Fig. 4B).
Fig. 4. The effect of fractionation methods on AltProt discovery. The number of identified AltProts (A) and MS intensity of AltProts (B) before and after fractionation. ∗ indicates p < 0.05 for comparison. AltProts, alternative proteins; MS, mass spectrometry; SEC, size-exclusion chromatography; SEC-fraction, SEC enrichment into 4 fractions; SEC-ERLIC, SEC enrichment followed by ERLIC fractionation; SEC-HpRP, SEC enrichment followed by HpRP fractionation.
Optimized Workflow Enables Discovery of AltProts in Embryonic Liver Development
Next, we applied the optimized workflow in combination with TMT-based quantification to investigate AltProts expression during liver development (Fig. 5A). Total protein lysates were extracted from the livers of embryonic (E15.5) or adult (P42) C57BL/6 mice in triplicates followed by SEC-ERLIC and TMT-based quantification. As we previously studied the protein translation landscape of mouse livers during development by using Ribo-seq, we were able to construct a liver-specific protein database based on Ribo-seq results and search the MS data against it. Although our customized database was much smaller than public databases (
), we were able to detect 5146 RefProts and 89 AltProts reproducibly from embryonic and adult mouse livers (supplemental Table S2). Representative mass spectra of AltProt peptides are shown in supplemental Figs. S3 and S4. Although MS and Ribo-seq are two completely different techniques, the measured fold changes between embryonic and adult livers were positively correlated (R = 0.71; Fig. 5B), indicating that both techniques capture the overall changes of the proteome during development. A large majority of the AltProts identified were encoded by lncRNA-ORFs (74%) and uoORFs (22%), and some were from uORFs, downstream ORFs, and internal out-of-frame ORFs (Fig. 5C). The identified AltProts showed hydrophobicity similar to that of RefProts (supplemental Fig. S5). Furthermore, 39 AltProts were found to be differentially expressed (Fig. 5D). GO analysis of AltORFs showed that AltProts upregulated in embryonic livers were involved in RNA splicing and processing, whereas AltProts upregulated in adult livers were enriched in metabolic pathways (Fig. 5E). These biological pathways were consistent with those of RefProts, suggesting the functional importance of AltProts in liver development. We further employed an alternative MS strategy, PRM, to validate the identification and quantification of novel AltProts (supplemental Figs. S6 and S7). For example, the MS2 spectrum of the noncanonical peptide QLLLAGLQNAGR agreed closely with its predicted spectrum (Fig. 5F). The amount of this peptide was significantly lower in three embryonic livers than in adult livers (Fig. 5G). Finally, we sought to understand the relationship between AltProts and RefProts. We specifically searched for actively translated AltORFs within the 5′- and 3′-UTRs of canonical ORFs. With stringent criteria, six pairs of AltProts and their primary RefProts from the same gene were detected by MS in the same experiment (Fig. 5H and supplemental Table S3). Among them, dihydrofolate reductase, ceruloplasmin, and beta-globin (Hbb-bs) and their corresponding AltProts were significantly changed between embryonic and adult mice, indicating a potential cis gene regulatory effect between AltORFs and the corresponding primary ORFs.
Fig. 5. Discovery of AltProts in embryonic and adult livers. A, SEC-ERLIC workflow for the discovery of AltProts in adult and embryonic livers using a TMT-based MS approach. B, correlation of the protein expression differences between embryonic and adult mice detected using two techniques, Ribo-seq and MS-based proteomics. C, RNA type distribution of identified AltProts. D, volcano plot of identified RefProts and AltProts in adult and embryonic livers. Orange and green dots represent the upregulated and downregulated RefProts, respectively. Red and dark gray dots represent the significantly changed AltProts and stable AltProts, respectively (p values < 0.05; |fold change (FC)| > 1.5). E, GO analysis of the significantly changed RefProts and AltProts. F, an example of the experimental spectrum and the predicted spectrum of the noncanonical peptide QLLLAGLQNAGR. G, the corresponding peak areas of the representative noncanonical peptide QLLLAGLQNAGR in embryonic and adult livers using the PRM method. H, heatmap of MS intensity of pairs of AltProts and their primary RefProts from the same gene. AltProts, alternative proteins; Cp, ceruloplasmin; DHFR, dihydrofolate reductase; ERLIC, electrostatic repulsion-hydrophilic interaction chromatography; Hbb-bs, beta-globin; lncRNA, long noncoding RNA; MS, mass spectrometry; RefProts, reference proteins; Ribo-seq, ribosome profiling; SEC, size-exclusion chromatography.
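The significance thresholds given in the caption of Fig. 5D (p < 0.05; |FC| > 1.5) can be expressed as a simple classification rule. The sketch below is illustrative only and is not the authors' analysis code. It assumes that FC is a linear embryonic/adult intensity ratio and interprets |FC| > 1.5 as a ratio above 1.5 or below 1/1.5; the protein names and values are hypothetical.

```python
import math

P_CUTOFF = 0.05   # p-value threshold from the Fig. 5D caption
FC_CUTOFF = 1.5   # fold-change threshold from the Fig. 5D caption


def classify(fold_change, p_value, p_cutoff=P_CUTOFF, fc_cutoff=FC_CUTOFF):
    """Label one protein as 'up', 'down', or 'stable'.

    fold_change is ASSUMED to be a linear embryonic/adult ratio (> 0).
    The cutoff is applied symmetrically on the log2 scale.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    log2_fc = math.log2(fold_change)
    threshold = math.log2(fc_cutoff)
    if p_value < p_cutoff and log2_fc > threshold:
        return "up"
    if p_value < p_cutoff and log2_fc < -threshold:
        return "down"
    return "stable"


# Hypothetical (fold change, p value) pairs, for illustration only.
examples = {
    "AltProt_A": (2.4, 0.003),
    "AltProt_B": (0.5, 0.010),
    "AltProt_C": (1.2, 0.001),
    "AltProt_D": (3.0, 0.200),
}
labels = {name: classify(fc, p) for name, (fc, p) in examples.items()}
print(labels)
```

Working on the log2 scale keeps the cutoff symmetric, so a 1.5-fold decrease is treated the same way as a 1.5-fold increase.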
Integrating Ribo-seq and PRM to Discover Additional AltProts
It is noteworthy that the number of translated AltORFs predicted by Ribo-seq was dramatically higher than the number detected by MS (Fig. 6A). Therefore, we tested an alternative approach that integrates Ribo-seq with a targeted MS method to discover additional AltProts that were undetectable using conventional shotgun proteomics (Fig. 6B). To provide a precise list of AltORFs, we used 10 different bioinformatics pipelines to predict potentially translated AltORFs and kept only those that were reproducibly reported by at least two pipelines. The full-length sequences of AltProts were subsequently generated by 3-frame translation. Of the 27 selected AltProts with unique peptides, 11 were detectable with PRM (supplemental Table S4). Theoretical and experimental retention times were highly correlated (R = 0.84-0.88), indicating high confidence in AltProt identification (Fig. 6C). Although the identification rate with this approach was approximately 41% (11 of 27), it could serve as a supplement to traditional shotgun proteomics and possibly allow detection of low-abundance AltProts. For example, the peptide LALGPAAR was derived from a novel 50-amino-acid AltProt encoded by the 5′-UTR of the Hnrnpa0 gene, which encodes heterogeneous nuclear ribonucleoprotein A0 (HnRNPA0) (Fig. 6D). Both this AltProt and the RefProt HnRNPA0 were significantly upregulated in embryonic livers (Fig. 6, E and F). Hnrnpa0 plays an important role in myeloid cell differentiation (
). The identification of its upstream AltORF could lead to novel regulatory mechanisms of this important protein.
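The candidate-selection steps described above (retaining AltORFs reported by at least two prediction pipelines, then generating protein sequences by 3-frame translation) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the pipeline names, ORF identifiers, and nucleotide sequence are hypothetical, the standard genetic code is assumed, and only the three forward reading frames are translated.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def consensus_orfs(predictions, min_pipelines=2):
    """Return ORF ids reported by at least `min_pipelines` pipelines."""
    counts = Counter(
        orf for orfs in predictions.values() for orf in set(orfs)
    )
    return sorted(orf for orf, n in counts.items() if n >= min_pipelines)


def translate(seq):
    """Translate codon by codon; '*' marks a stop, 'X' an unknown codon."""
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translation(seq):
    """Translate the three forward reading frames of a sequence."""
    return [translate(seq[frame:]) for frame in range(3)]


# Hypothetical predictions from three pipelines.
predictions = {
    "pipeline_1": ["orf_a", "orf_b"],
    "pipeline_2": ["orf_b", "orf_c"],
    "pipeline_3": ["orf_b", "orf_c", "orf_d"],
}
kept = consensus_orfs(predictions)
frames = three_frame_translation("ATGGCCTAA")  # hypothetical sequence
print(kept, frames)
```

Requiring agreement between independent predictors trades sensitivity for specificity, which suits a targeted method such as PRM, where each candidate peptide must be scheduled and measured individually.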
Fig. 6. Discovery of additional AltProts using the PRM method. A, Venn diagram of AltProts predicted by Ribo-seq and AltProts detected in targeted MS-based proteomics. B, flow chart of AltProt discovery using the PRM method. C, correlation between the experimental retention time and the theoretical retention time of identified AltProts. Red dots represent the peptides found by targeted PRM; black dots represent the peptides found by the SEC-ERLIC workflow and validated by the PRM method. D, schematic diagram of an AltProt encoded by the uORF of the Hnrnpa0 gene, which encodes HnRNPA0. E, an example of the experimental spectrum and the predicted spectrum of the noncanonical peptide LALGPAAR. F, the peak areas of the representative noncanonical peptide LALGPAAR using the PRM method. AltProts, alternative proteins; ERLIC, electrostatic repulsion-hydrophilic interaction chromatography; HnRNPA0, heterogeneous nuclear ribonucleoprotein A0; MS, mass spectrometry; PRM, parallel reaction monitoring; Rt, retention time; SEC, size-exclusion chromatography; uORFs, upstream ORFs.
In this study, we tested various methods and found that “RIPA extraction/SEC enrichment/ERLIC fractionation” was the most efficient strategy for identifying AltProts with MS. With this strategy, we investigated novel AltProts in embryonic and adult mouse livers.
Although a few elegant studies using MS for AltProt detection have been reported in recent years, the number of AltProts identified still varies widely, to our knowledge, from tens (
). This is probably explained by the different enrichment and analysis methods used and by sample variation. In our study, 89 novel AltProts were identified and compared between embryonic and adult mice. Although this number is not the highest reported, it is based on only one sample type. Our study is so far the most comprehensive effort to optimize multiple steps and their combinations for AltProt identification, and we found that different workflows favor different types of AltProts. According to our results, SEC-based enrichment outperformed other methods in terms of identification number, specificity, and reproducibility for low-abundance AltProts. In contrast, sample loss and batch-to-batch variability were observed with the 30-kDa-MWCO method, probably due to nonspecific protein binding to the filter membrane (
Bottom-up and top-down proteomic approaches for the identification, characterization, and quantification of the low molecular weight proteome with focus on short open reading frame-encoded peptides.
), enabling analysis depth comparable to HpRP or ERLIC fractionation without extra cost of time and effort. In addition, SEC does not require specific buffer conditions and is therefore usually compatible with downstream experiments such as top-down MS and functional characterization. The SEC-based approach thus has great potential for future AltProt studies.
We also compared two types of SEC columns for AltProt enrichment, considering that flow rate, particle pore size, sample volume, and CV could all influence the separation efficiency. Conventional SEC requires relatively large amounts of proteins and more time due to the large CV (
). Scaling up the volumes would also dilute the proteins of interest, which impedes detection sensitivity. We found that the SEC column with the smaller CV (3 ml) outperformed the one with the larger CV (24 ml). The smaller column is also more efficient, completing enrichment and fractionation simultaneously within 15 min.
We acknowledge that the number and confidence of AltProt identifications are highly dependent on the size and quality of the database and therefore decided to use only a noninflated, customized database. In this study, 89 AltProts were identified from embryonic and adult mouse livers. Our results showed that many AltProts that were upregulated in embryonic livers were involved in RNA splicing, RNA processing, and regulation of cell cycle transition (Fig. 5E). RNA splicing is a crucial step in converting precursor mRNA into the mature mRNA that is translated into functional protein, a process that is required during mammalian embryogenesis to generate a viable organism from a single cell (
). The translation of uORFs of GCN4 promoted the release of ribosomes from the same transcript, preventing ribosomes from reaching the start codon and subsequently inhibiting translation of the GCN4 gene (
). In our study, two uORFs and the corresponding canonical ORFs of hnRNPA0 and hnRNPA2/B1 showed significant activation in embryonic livers. This observation was highly consistent between the MS and Ribo-seq results. hnRNPA0 was reported to affect myeloid cell differentiation and neurodevelopment (
). We speculate that AltProts encoded by uORFs could promote the expression of the downstream CDS, thereby regulating liver development. The detailed functional and mechanistic relationships will be studied in due course.
Although we have discovered interesting AltProts involved in embryonic development with an optimized approach, the total number of AltProts identified was much lower than that of RefProts. One possible reason is that we used a small, specific database and stringent cut-offs to filter the findings. However, the intrinsically short length and likely low abundance of AltProts are more important factors. Therefore, MS instrumentation with higher sensitivity is needed for future studies of AltProts.
Data Availability
The data that support the findings of this study have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD033940.
The authors declare that there are no conflicts of interest with the contents of this article.
Acknowledgments
We acknowledge the funding support from Research Grants Council-GRF 15305821, CRF Equipment C5033-19E, and RGC-RIF R5050-18, and support from the Laboratory for Synthetic Chemistry and Chemical Biology Limited (LSCCB) and the Centre for Eye and Vision Research (CEVR) under the Health@InnoHK Programme launched by ITC, HKSAR. We thank Prof. Mankin Wong, the PolyU Research Facilities ULS and UCEA, and the research institutes/centers RiFood and RCMI for technical support.
Author contributions
Yi. Y., H. W., L. C., Z. X., and Q. Z. methodology; Yi. Y., Y. Z., L. C., and Ya. Y. investigation; Yi. Y., Y. Z., and Q. Z. formal analysis; Yi. Y. and Q. Z. writing–original draft.