From the ‡The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142; §Klarman Cell Observatory, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142; ¶Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, and Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, Massachusetts; ‖Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215; **Center for Cancer Immunology, Massachusetts General Hospital, Boston, Massachusetts 02114; ‡‡Howard Hughes Medical Institute and Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02140
Proteomic profiling describes the molecular landscape of proteins in cells immediately available to sense, transduce, and enact the appropriate responses to extracellular cues. Transcriptional profiling has proven invaluable to our understanding of cellular responses; however, insights may be lost as mounting evidence suggests transcript levels only moderately correlate with protein levels in steady state cells. Mass spectrometry-based quantitative proteomics is a well-suited and widely used analytical tool for studying global protein abundances. Typical proteomic workflows are often limited by the amount of sample input that is required for deep and quantitative proteome profiling. This is especially true if the cells of interest need to be purified by fluorescence-activated cell sorting (FACS) and one wants to avoid ex vivo culturing. To address this need, we developed an easy-to-implement, streamlined workflow that enables quantitative proteome profiling from roughly 2 μg of protein input per experimental condition. Utilizing a combination of facile cell collection from cell sorting, solid-state isobaric labeling and multiplexing of peptides, and small-scale fractionation, we profiled the proteomes of 12 freshly isolated, primary murine immune cell types. Analyzing half of the 3e5 cells collected per cell type, we quantified over 7000 proteins across 12 key immune cell populations directly from their resident tissues. We show that low input proteomics is precise, and the data generated accurately reflect many aspects of known immunology, while expanding the list of cell-type specific proteins across the cell types profiled. The low input proteomics methods we developed are readily adaptable and broadly applicable to any cell or sample types and should enable proteome profiling in systems previously unattainable.
Proteome-wide measurements provide a more functionally relevant snapshot of cell states than transcriptional profiling alone. There is increasing evidence that steady-state measurements of mRNA levels only partially reflect the functional potential of a cell, whereas proteins are immediately available to sense and transduce extracellular cues and activate transcriptional responses to ultimately remodel the transcriptome/proteome. When used in combination with transcriptional profiling, proteome profiling can reveal insights into regulatory steps at the post-transcriptional, translational, and post-translational levels (referred to hereafter as post-transcriptional) that can be missed with exome sequencing alone.
A major drawback of proteomic analyses is the high amount of protein input required, which can be too demanding for many biological systems. Typical sample preparation for mass spectrometry-based proteomics requires relatively large amounts of protein (≥ 50 μg) per experimental condition. Samples such as cells purified by fluorescence-activated cell sorting (FACS), needle-core biopsies, and laser capture micro-dissected (LCM) tissue samples often yield low micrograms of protein per condition, preventing deep and quantitative global protein measurements using conventional proteomic sample preparation and analysis methods.
Immune cells comprise a wide variety of functionally distinct cell types and are most often characterized and classified by their transcriptional profiles, or a small set of protein surface markers. Previous studies profiling immune cell proteomes with liquid chromatography-mass spectrometry (LC-MS) have either not been input limited (human peripheral blood immune cells) or have expanded and differentiated purified murine immune cells in culture. Having relative protein abundances across the mouse immune system would provide a useful resource for future immunological studies in a genetically tractable organism.
Although powerful alternative approaches have been demonstrated for low input proteomics they require highly specialized equipment or expertise, or fall short of reaching an appreciable depth of coverage (
). To date, no approaches suitable for deep, quantitative profiling of FAC-sorted cells have been reported. Here, we describe a simple to implement sample preparation protocol for TMT-based proteomic analysis of FAC-sorted cells that minimizes sample handling steps and processing time. The method combines efficient cell collection from a cell sorter, improved peptide labeling for sample multiplexing, and small-scale fractionation. Specialized equipment such as liquid handling robotics are not needed for these sample processing steps, making the method adaptable by most proteomics laboratories. Using the approach, we profiled 12 primary, murine immune cell types straight from FACS purification to a depth of over 7000 proteins, collecting 3e5 cells (∼2 μg/sample) and injecting half for analysis. Low input proteomics of freshly isolated cells provides high quality, quantitative proteomic data that strongly reflects known aspects of the immune system. Comparing our proteomic data to publicly available Immunological Genome Consortium (Immgen) transcriptional profiles, we find evidence for post-transcriptional regulation at the global scale.
Five-week-old male C57BL/6J mice were purchased from the Jackson Laboratory (Bar Harbor, ME) and were analyzed at 6 weeks of age. Immunocytes from pools of 3 mice were double-sorted for high purity by flow cytometry on a BD Aria instrument according to ImmGen protocols (
and immgen.org). B cells, T4 cells, T8 cells, regulatory T cells, γδT cells, NKT cells, NK cells, and GNs were all obtained from spleens homogenized through a 100 μm cell strainer and treated with ACK lysing buffer, 1 ml per 10^8 cells. Macrophages and B1a cells were obtained by lavage of the peritoneal cavity with 7 ml PBS. Peritoneal mast cells were obtained by lavage of the peritoneal cavity with 7 ml HBSS containing 1 mM EDTA (
). All cells were stained and washed in DMEM using the indicated antibodies described below. For mass spectrometry, 300,000 cells were sorted into collection microreactors (described below), which were prepared by running 90 μl of phosphate buffered saline (PBS) through the tips by centrifugation. Quartz filter tips with sorted cells were briefly spun on a Galaxy Mini centrifuge (VWR #C1213, Radnor, PA) for 30 s to allow the FACS sheath fluid (PBS) to run through the tip until only approximately one to five microliters of PBS from the cell suspension remained, leaving the cells on the filter. Finally, tips were frozen on dry ice before storage and the cell lysis step. If 300,000 cells could not be collected from one sort, multiple tip digests were spun onto the same Stage tip to achieve this total.
Gel loading tips (Rainin LTS 20 μl) were packed with three to five punches of Whatman QM-A grade SiO2 mesh (GE Life Sciences, Marlborough, MA) with a 16-gauge blunt end needle to act as a small filter onto which cells accumulate (
). For FACS collection, microreactor tips were inserted into the cap of 1.5 ml centrifuge tubes that had a cross cut into them. This allows the tip and tube to be spun in various types of benchtop centrifuges. The quartz mesh was pre-wet with PBS prior to cell collection by spinning in a Galaxy minicentrifuge for a quick pulse. Cells were directly sorted into the microreactor tips and spun as above every 200 μl or so to remove the sheath fluid. This can be repeated two to three times until the tip builds up backpressure from the high number of cells. The cells are then washed with ice cold PBS and spun until near dry. The tip is then kinked right below the quartz membrane and is held shut by a 3–6 mm slice of a 200 μl pipette tip. For storage, the kinked tip's top is covered with parafilm and the entire device is stored in a 1.5 ml microfuge tube.
For cell lysis, the tip remained kinked for the entire digestion protocol. Ten microliters of 8 M urea, 10 mM TCEP and 10 mM iodoacetamide in 50 mM ammonium bicarbonate (ABC) is added using a gel loading tip to the surface of the quartz membrane. Pipetting up and down provided the shear forces to lyse the cells. For many cell types, especially T cells, the lysate becomes viscous because of genomic DNA, indicating cell lysis. Lysis, reduction, and alkylation are allowed to proceed at room temperature for 30 min, shaking in the dark. 50 mM ABC is used to dilute the urea to less than 2 M, and the appropriate amount of trypsin for a 1:100 enzyme to substrate ratio is added and allowed to incubate at 37 °C overnight. Once digestion is completed, the pipette tip ring is removed from the kinked microreactor tip. The tip will remain partially kinked. The kinked end of the tip is inserted into a premade, equilibrated Stage tip. The microreactor-tip-in-a-Stage-tip contraption is spun at 3500 × g until the entire digest passes through the C18 resin. 75 μl 0.1% formic acid (FA) is then added to the microreactor tip, washing the quartz membrane onto the C18 resin. This is repeated a second time. At this stage, the digest has been loaded directly from the microreactor tip to the Stage tip C18 resin and is ready to be washed and eluted or can go directly to on-column TMT labeling, starting at the HEPES/TMT step (see below).
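The dilution and enzyme amounts in the lysis and digestion steps follow from simple arithmetic, which the following minimal sketch works through. The function names are hypothetical, and the 2 μg protein input is an assumption taken from the approximate per-sample yield quoted elsewhere in the text.

```python
def buffer_volume_to_add_ul(lysis_volume_ul, urea_molar, target_urea_molar):
    """Volume of urea-free buffer (e.g. 50 mM ABC) that brings urea to the target molarity."""
    if not 0 < target_urea_molar <= urea_molar:
        raise ValueError("target must be positive and no higher than the starting molarity")
    final_volume_ul = lysis_volume_ul * urea_molar / target_urea_molar
    return final_volume_ul - lysis_volume_ul


def trypsin_mass_ng(protein_ug, enzyme_to_substrate=0.01):
    """Trypsin mass in ng for a given enzyme:substrate mass ratio (default 1:100)."""
    return protein_ug * 1000 * enzyme_to_substrate


# 10 µl of 8 M urea reaches exactly 2 M after adding this much buffer;
# the protocol asks for *less than* 2 M, so somewhat more would be added.
min_buffer_ul = buffer_volume_to_add_ul(10, 8, 2)

# Assumed input of ~2 µg protein per sample (not a measured value).
trypsin_ng = trypsin_mass_ng(2)
```

Under these assumptions, at least 30 μl of 50 mM ABC would be added to the 10 μl lysate, and roughly 20 ng of trypsin would be used per sample.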
On-column TMT Labeling
Stage tips were packed with one punch of C18 mesh (Empore) with a 16-gauge blunt end needle. Resin was conditioned with 50 μl methanol (MeOH), followed by 50 μl 50% acetonitrile (ACN)/0.1% FA, and equilibrated with 75 μl 0.1% FA twice. The digest was loaded by spinning at 3500 × g until the entire digest passed through. If needed, the bound peptides were washed twice with 75 μl 0.1% FA. One μl of TMT reagent (∼19 μg when resuspended according to manufacturer's instructions) in 100% ACN was added to 100 μl freshly made HEPES, pH 8, and passed over the C18 resin at 350 × g until the entire solution had passed through. The HEPES and residual TMT were washed away with two applications of 75 μl 0.1% FA, and peptides were eluted with 50 μl 50% ACN/0.1% FA followed by a second elution with 50% ACN/20 mM ammonium formate (NH4HCO2), pH 10. Peptide concentrations were estimated using an absorbance reading at 280 nm, and label efficiency was checked on 1/20th of the elution. We strongly recommend using freshly prepared HEPES and TMT reagents when possible. Older reagents can cause singly charged contaminants, which become a predominant signal when low amounts of peptides (≤ 1 μg) are loaded onto the LC-MS. These contaminants can be partially avoided via bSDB fractionation, where the contaminants mainly elute above an ACN percentage of ∼33%. After using 1/20th of the elution to test for labeling efficiency, the samples are mixed before fractionation and analysis.
Stage tip bSDB Fractionation
200 μl pipette tips were packed with two punches of sulfonated divinylbenzene (SDB-RPS, Empore) with a 16-gauge needle. The Stage tip was conditioned and equilibrated as described above. After loading ∼20 μg peptides, a pH switch was performed using 25 μl 20 mM NH4HCO2, pH 10, and was considered part of fraction one. Then, step fractionation was performed according to the amount of peptide material and assuming equal mass distribution. For example, 20 μg of peptides were fractionated into nine fractions of 20 mM NH4HCO2, pH 10, with ACN concentrations of 5, 10, 15, 20, 25, 30, 35, 40, and 90%. Each fraction was transferred to autosampler vials, dried via vacuum centrifugation, and stored at −80 °C until needed. As a rule of thumb, we determine the number of Stage tip fractions from the total amount of TMT-labeled peptides (in μg) going into the fractionation: (total peptide input/2) − 1. For example, if one has 2 μg/sample and ten TMT-labeled samples, assuming equal mass distribution, we would have nine final fractions. Injecting half of each fraction leads to ∼1 μg on-column for LC-MS data acquisition.
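The rule of thumb for the number of fractions can be written as a small helper. This is a minimal sketch: the function names are hypothetical, and rounding down for non-integer results and enforcing a minimum of one fraction are assumptions that the text does not specify.

```python
def stage_tip_fraction_count(ug_per_sample, n_samples):
    """Number of Stage tip fractions: (total peptide input in µg / 2) - 1."""
    total_ug = ug_per_sample * n_samples
    # Assumption: round down and never return fewer than one fraction.
    return max(1, int(total_ug / 2) - 1)


def on_column_ug(ug_per_sample, n_samples, injected_share=0.5):
    """Approximate peptide load per injection, assuming equal mass per fraction."""
    total_ug = ug_per_sample * n_samples
    n_fractions = stage_tip_fraction_count(ug_per_sample, n_samples)
    return total_ug / n_fractions * injected_share


n_fractions = stage_tip_fraction_count(2, 10)  # ten samples at 2 µg each
load_ug = on_column_ug(2, 10)                  # half of each fraction injected
```

For ten samples at 2 μg each this gives nine fractions and about 1.1 μg on-column per injection, in line with the ∼1 μg stated above.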
For comparison, bC18 and SCX (both Empore) tips were made and fractionated in the same way as bSDB. SCX resin was conditioned with MeOH, followed by 50% ACN/0.1% FA, 0.1% FA, and NH4OH, pH 11/25% ACN, and was equilibrated with 0.1% FA. Prior to SCX fractionation, digests were desalted via Stage tip and were loaded onto the SCX resin in the C18 elution buffer, 50% ACN/0.1% FA. SCX step fractionation was performed using 50 mM NH4C2H3O2, 25% ACN, pH 4.5, followed by 50 mM NH4C2H3O2, 25% ACN, pH 5.5; 50 mM NH4C2H3O2, 25% ACN, pH 6.5; NH4OH, 25% ACN, pH 8.0; 50 mM NH4HCO2, 25% ACN, pH 9.0; and NH4OH, 25% ACN, pH 11.0. Each SCX fraction was then desalted via C18 Stage tip, dried via vacuum centrifugation, and stored at −80 °C. For gradient extension experiments, only the separation gradient was extended by the stated amounts. A 1× gradient represents a 110-min method with an 84-min separation gradient. For these experiments, we kept the final fractions to six, rather than the nine suggested above. This was to keep the gradient length extension consistent and, thus, comparisons more interpretable. All individual fractions were analyzed using the 1× gradient described above.
Protein Input Determination
We decided that sacrificing enough sample for each cell type analyzed to determine protein levels was undesirable. Instead, we prepared separate sorts for representative cell types for the sole purpose of protein input determination. GN, B, MF and CD4+ T cells were sorted into collection microreactors, lysed and digested as described above. The desalted peptides were measured using absorbance at 280 nm on a Nanodrop spectrophotometer. The 1 μg/μl Jurkat digest we use for instrument QC was serially diluted in half for a five-point standard curve. These protein yields were then assumed for all remaining cell collections.
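A five-point, two-fold standard curve of this kind can be fitted and inverted as in the sketch below. This is an illustration under stated assumptions: the function names are hypothetical, the undiluted 1 μg/μl digest is assumed to be the first point of the series, the response is assumed to be linear, and the absorbance values are made-up placeholders rather than measured data.

```python
def two_fold_dilution_series(start, points=5):
    """Concentrations obtained by serially diluting in half, starting at `start`."""
    return [start / 2 ** k for k in range(points)]


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(a280, slope, intercept):
    """Invert the standard curve to estimate peptide concentration (µg/µl)."""
    return (a280 - intercept) / slope


standards = two_fold_dilution_series(1.0)  # 1, 0.5, 0.25, 0.125, 0.0625 µg/µl
# Placeholder absorbances from an invented linear response, NOT real readings.
absorbances = [1.1 * c + 0.02 for c in standards]
slope, intercept = fit_line(standards, absorbances)
estimate = concentration_from_absorbance(0.57, slope, intercept)
```

With real readings, the fitted slope and intercept would convert each sample's absorbance at 280 nm into an estimated peptide concentration.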
Experimental Design and Statistical Rationale-Common Reference
A “common reference” sample used as an internal standard for each LC-MS run was prepared by mixing 200,000 cells each from the B.1a peritoneal cavity (PC), MF.PC, B.Sp, T4.spleen (Sp), T8.Sp, DC.Sp, GN.Sp, Treg.Sp, γδT.Sp, NK.Sp, and NKT.Sp populations, 100,000 total CD45− mesenteric lymph node cells, 100,000 total CD45+ and total CD45− splenocytes three hours after subcutaneous injection of 10,000 U IFNα, 200,000 total CD45+ bone marrow cells, 200,000 total CD45+ gut cells, 200,000 total CD45+ peritoneal cavity cells, and 500,000 total CD45+ splenocytes. Sample layout for all other TMT channels can be found in supplemental Table S7.
Chromatography was performed using a Proxeon UHPLC at a flow rate of 200 nl/min. Peptides were separated at 50 °C using a 75 μm i.d. PicoFrit (New Objective, Woburn, MA) column packed with 1.9 μm AQ-C18 material (Dr. Maisch, Germany) to 50 cm in length over a 235 min run. The on-line LC gradient went from 6% B at 1 min to 30% B in 204 mins, followed by an increase to 60% B by minute 214, then to 90% by min 215, and finally to 50% B until the end of the run. Mass spectrometry was performed on a Thermo Scientific Q Exactive Plus mass spectrometer. After a precursor scan from 300 to 2000 m/z at 70,000 resolution, the top 12 most intense multiply charged precursors were selected for higher energy collisional dissociation (HCD) at a resolution of 35,000. Precursor isolation width was set to 1.7 m/z and the maximum MS2 injection time was 100 msecs for an automatic gain control of 5e4. Dynamic exclusion was set to 20 s and only charge states two to six were selected for MS2. Half of each fraction was injected for each data acquisition run.
Data were searched all together with Spectrum Mill (Agilent) using the Uniprot Mouse database (17 Oct. 2014, 41,309 entries), containing common laboratory contaminants. A fixed modification of carbamidomethylation of cysteine and variable modifications of N-terminal protein acetylation, oxidation of methionine, and TMT-10plex labels were searched. The enzyme specificity was set to trypsin and a maximum of three missed cleavages was used for searching. The maximum precursor-ion charge state was set to six. The MS1 and MS2 mass tolerance were set to 20 ppm. All TMT reporter ions were ratioed to the common reference channel of 131. Peptide and protein FDRs were calculated to be less than 1% using a reverse, decoy database.
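Ratioing each reporter ion to the common reference channel can be sketched as follows. This is a simplified illustration and not the Spectrum Mill implementation: the function name and channel labels are hypothetical, and the log2 transform is an assumption based on the log2 ratios used later in the analysis.

```python
import math


def log2_ratios_to_reference(reporter_intensities, reference_channel="131"):
    """Log2 ratio of every non-reference TMT channel to the reference channel."""
    reference = reporter_intensities[reference_channel]
    if reference <= 0:
        raise ValueError("reference channel intensity must be positive")
    return {
        channel: math.log2(intensity / reference)
        for channel, intensity in reporter_intensities.items()
        # Skip the reference itself and channels with no usable signal.
        if channel != reference_channel and intensity > 0
    }


# Illustrative intensities only (arbitrary units).
ratios = log2_ratios_to_reference({"126": 400.0, "127N": 100.0, "131": 200.0})
```

A channel with twice the reference signal gets a value of 1, and a channel with half the reference signal gets −1.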
Proteins were only reported if they were identified with at least two distinct peptides and a Spectrum Mill protein-level score ≥ 20. In all supplemental tables, the identifier column includes the Uniprot accession number_Gene symbol_Protein name_#unique peptides. Protein inference was performed in two different ways. For reporting the number of proteins detected, protein subgroups were expanded if a subgroup-specific peptide was identified. This reflects the number of various proteoforms that can be created from a single gene. For downstream analyses, protein subgroups were collapsed to the proteoform with the most or best evidence so that each protein was represented by its gene name only once. Peptides common between subgroups were used for quantitation.
TMT10 reporter ion intensities in each MS/MS spectrum were corrected for isotopic impurities by the Spectrum Mill protein/peptide summary module using the afRICA correction method which implements determinant calculations according to Cramer's Rule (
Fractional intensities (frINT) were calculated as the sum of precursor ion chromatographic peak areas in MS1 spectra for all PSMs contributing to the protein. A similar approach has been previously reported. frINT for each protein was calculated by splitting the combined precursor ion abundance in proportion to its individual normalized protein-level reporter ion ratio. For example, the amount of “fractional intensity” of protein A from one TMT channel (e.g. 126) can be calculated as the summed MS1 intensity for all peptides from protein A multiplied by the fraction that the 126 channel contributed to the summed MS1 intensity. Formally, frINT of 126 is written as:

$$\mathrm{frINT}_{\mathrm{protA}}^{126} = I_{\mathrm{protA\_MS1}} \times \frac{i_{126}}{\sum i_{\mathrm{all}}}$$

where frINTprotA is the fractional intensity of protein A, IprotA_MS1 is the summed MS1 intensity of all peptides attributed to protein A, i126 is the intensity of the 126 reporter ion for all protein A peptides, and Σiall is the summed intensity of all reporter ions used in the experiment.
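The fractional intensity calculation, in which the summed MS1 intensity of a protein is split across TMT channels in proportion to each channel's share of the total reporter ion signal, can be sketched as below. The function name and example values are hypothetical, and the sketch uses raw reporter intensities rather than the normalized ratios produced by the search software.

```python
def fractional_intensities(ms1_intensity, reporter_intensities):
    """Split a protein's summed MS1 intensity across channels by reporter ion share."""
    total_reporter = sum(reporter_intensities.values())
    if total_reporter <= 0:
        raise ValueError("summed reporter ion intensity must be positive")
    return {
        channel: ms1_intensity * intensity / total_reporter
        for channel, intensity in reporter_intensities.items()
    }


# Illustrative values only: one protein measured in two channels.
frint = fractional_intensities(1_000_000.0, {"126": 1.0, "127N": 3.0})
```

By construction, the per-channel fractional intensities add back up to the protein's summed MS1 intensity.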
The peak area for the extracted ion chromatogram (XIC) of each precursor ion subjected to MS/MS was calculated automatically by the Spectrum Mill software in the intervening high-resolution MS1 scans of the LC-MS/MS runs using narrow windows around each individual member of the isotope cluster. Peak widths in both the time and m/z domains were dynamically determined based on MS scan resolution, precursor charge and m/z, subject to quality metrics on the relative distribution of the peaks in the isotope cluster versus theoretical. Although the determined protein abundances are generally reliable, several experimental factors contribute to variability in the determined abundance for a protein. These factors include incomplete digestion of the protein; widely varying response of individual peptides because of inherent variability in ionization efficiency as well as interference/suppression by other components eluting at the same time as the peptide of interest, and sampling of the chromatographic peak between MS/MS scans. The number of observable tryptic peptides/protein can be used to correct summed peptide abundances for protein length. The frINT values used for all analyses were calculated by dividing each protein's frINT value by the median histone intensity (HIST2H2BB, H2AFX, HIST1H4A, HIST1H3A, and HIST1H1C) for that sample/TMT channel, the sum of the TIC for all fractions for a 10-plex, as well as the number of observable tryptic peptides for a given protein. This was then multiplied by 1e6 to give ppm Histone. These values provide rough estimates for abundance differences between proteins. The ratiometric data should be used for comparing a protein's abundance level across cell types.
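One literal reading of the ppm Histone normalization described above is sketched here. It is an interpretation rather than the authors' code: the function and argument names are hypothetical, and the assumption is that the three normalizers (median histone intensity, summed TIC, and number of observable tryptic peptides) are applied as successive divisions.

```python
import statistics


def ppm_histone(frint, histone_frints, tic_sum, n_observable_peptides):
    """Normalize a protein's frINT for one sample/TMT channel to 'ppm Histone'."""
    # Median over the reference histones measured in the same channel.
    median_histone = statistics.median(histone_frints)
    return frint / median_histone / tic_sum / n_observable_peptides * 1e6


# Illustrative numbers only; real inputs would be per-channel frINT values,
# the summed TIC over all fractions of the 10-plex, and the in silico count
# of observable tryptic peptides for the protein.
value = ppm_histone(8.0, [1.0, 2.0, 4.0, 8.0, 16.0], tic_sum=2.0, n_observable_peptides=1)
```

Dividing by the number of observable peptides is what makes the value a rough length-corrected abundance estimate; as the text notes, the ratiometric data remain the better choice for comparing one protein across cell types.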
To generate samples for testing frINT, we cultured E14 mouse embryonic stem cells in serum-containing medium. For differentiation, LIF was removed for 48 h, followed by 500 μM retinoic acid (Sigma) for 48 h more. Cells were lysed and digested as described above. After desalting, the sample was split and 5 μg was desalted for LFQ analysis, or 5 μg was on-column TMT labeled, mixed and analyzed as described above. Roughly 1 μg was analyzed for both sample formats. Label free data was acquired as described above with the exception that the MS2 resolution was set to 17,500 and the normalized collision energy was 27. Data were searched on MaxQuant version 18.104.22.168. against the Uniprot Mouse Canonical database, 2014Apr03 (43,427 entries) with a precursor and product ion tolerance of 20 ppm. Trypsin/P was set for digestion conditions allowing up to two missed cleavages. Variable peptide modifications allowed for Met oxidation and protein N-term acetylation; Cys carbamidomethylation was considered a constant modification. Match between runs within a 2 min RT window was enabled and both the peptide and protein FDR were set to < 0.01.
After replicate recall correlation analysis, log2 FC values were normalized by two component Gaussian mixture model-based normalization (
). Imputed data sets were only used for the moderated F-test and marker selection analysis (described below). This is noteworthy as exclusively expressed proteins may only be detected in a single cell type and would otherwise have been ignored for downstream analysis.
Top 20 Differentially Expressed Genes from Microarray
The top 20 differentially expressed genes were calculated by taking the 20 largest fold change values from one cell type compared with all the others (supplemental Table S2). For protein fold change mapping, duplicate gene names and genes not detected in this study were removed.
Differential Abundance Analysis
The normalized and imputed data set was subjected to a moderated F-test (Smyth, 2004), followed by Benjamini-Hochberg Procedure correcting for multiple hypothesis testing. We drew an arbitrary cutoff at adj. P val < 0.01. The heatmap shows the log2 fold change to the common reference and was clustered using one minus Pearson correlation metric (https://software.broadinstitute.org/morpheus/).
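The Benjamini-Hochberg adjustment applied after the moderated F-test can be sketched in a few lines. This is a generic illustration of the procedure and not the code used in the study; the function name and example p-values are hypothetical, and the moderated F-test itself is not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Illustrative p-values only.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.01 for q in adjusted]
```

Proteins whose adjusted p-value falls below the chosen cutoff (0.01 in the text) would be called differentially abundant.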
Hierarchical Clustering and Marker Selection
Hierarchical clustering and marker selection analysis was performed in Morpheus (https://software.broadinstitute.org/morpheus/). Clustering was performed on rows and/or columns indicated by the dendrogram for individual analyses, using one minus Pearson correlation as the metric. For Marker Selection analysis, the signal to noise metric using a one cell type versus all approach was performed. The number of permutations was set to 10,000 and only proteins with an FDR of less than 0.15 were kept, except for mast cells (MC), where the FDR cutoff to be included in the heatmap was 0.05 for size considerations.
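A one-versus-all signal-to-noise score for a single protein can be sketched as below. This uses the commonly cited definition, the difference of group means divided by the sum of group standard deviations; whether Morpheus applies additional adjustments is not stated in the text, so the exact formula, the use of the sample standard deviation, and the function names are all assumptions. The permutation-based FDR step is not shown.

```python
import statistics


def signal_to_noise(in_group, out_group):
    """(mean_in - mean_out) / (sd_in + sd_out) for one protein."""
    noise = statistics.stdev(in_group) + statistics.stdev(out_group)
    if noise == 0:
        raise ValueError("both groups have zero variance")
    return (statistics.mean(in_group) - statistics.mean(out_group)) / noise


def one_vs_all_scores(values_by_cell_type):
    """Score each cell type against the pooled values of all other cell types."""
    scores = {}
    for cell_type, values in values_by_cell_type.items():
        rest = [
            v
            for other, other_values in values_by_cell_type.items()
            if other != cell_type
            for v in other_values
        ]
        scores[cell_type] = signal_to_noise(values, rest)
    return scores


# Illustrative log2 ratios for one protein across three cell types.
scores = one_vs_all_scores({"A": [3.0, 5.0], "B": [0.0, 2.0], "C": [1.0, 1.0]})
```

A large positive score marks a protein as a candidate marker for that cell type relative to all others.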
Principal Component Analysis
Principal component analysis (PCA) was performed on an arbitrary cutoff of the 666 proteins (top 13%) with the highest F-statistics (highest FC, high precision). 15% is the cutoff typically used in Immgen related studies (
). To analyze gene sets contributing to the resultant projections, the loadings from PC1 and PC2 were analyzed using Gene Set Enrichment Analysis (GSEA v2.2.3, broadinstitute.org/gsea) with the c2.cp.v5.2.symbols.gmt[Curated] gene matrix. Enriched gene sets (adj. P val < 0.05) are displayed with their respective normalized enrichment score (NES).
Analysis of RNA/Protein Relationships
5125 gene products (collapsed to gene name) were identified in both the transcriptomics and proteomics and were used for all subsequent RNA/protein analyses. To calculate the cell type specific RNA/protein correlations, we performed Spearman's rank correlation on the ppm histone and microarray values.
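Spearman's rank correlation between matched RNA and protein values can be computed as the Pearson correlation of their ranks, as in this minimal sketch. The function names and example values are hypothetical, and ties are handled with average ranks, which is the usual convention but is not specified in the text.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values for four genes in one cell type (not real data).
rho = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 100.0])
```

Because only ranks are compared, the two measurement scales (microarray signal and ppm histone) do not need to be on comparable units.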
For analysis of co-regulation, the Euclidean distance between all pairwise mRNA or protein (log2 FC values) measurements was calculated and hierarchically clustered. To visualize discrepancies between co-regulation at the mRNA level and at the protein level, the protein ρ's were mapped in the order of the mRNA clustering.
Expression ranks were calculated by ranking the expression level of each gene product for each cell type using either the microarray data or the frINT values. The most abundant gene product was given a value of 1, the lowest, 5,125. Delta-rank scores were calculated by subtracting the protein rank from the RNA rank. Thus, large negative values suggest relatively high protein levels and low RNA levels, and large positive values suggest high RNA and low protein levels. We calculated the delta-rank score for every gene product for every cell type.
Development of Low Input Proteomic Sample Preparation Methods
To adapt proteomics sample preparation protocols for low cell numbers directly from flow cytometry we focused on four main steps: efficient capturing of FAC-sorted cells, minimizing sample handling, improving sample multiplexing, and optimizing fractionation for low amounts of TMT-labeled peptides (< 10 μg) (Fig. 1). We concentrated on primary immune cells from mice as they are well-characterized at the transcriptome and single protein level with antibodies but are poorly studied at the proteome level because of their relatively poor availability from a small animal.
Because FAC-sorting and washing cells in microfuge tubes can lead to significant cell loss (
), we developed a simple-to-make collection microreactor that enables efficient capture and washing of FAC-sorted cells, as well as lysis and digestion of low cell numbers (Fig. 2A). We fabricated collection microreactors for facile collection and washing of FAC-sorted cells. After the final column-format washing of cells, i.e. not in batch format, cells are lysed, and proteins digested. The digest can then be directly transferred to a C18 Stage tip for sample desalting (
) (Fig. 2B). To compare the collection microreactors to standard centrifuge tubes we sorted 2e5 primary B cells directly into tubes or tips and compared the number of peptides identified with either method. Collection microreactor tips outperformed tubes in distinct peptides identified in single shot, label free LC-MS/MS mode, as well as having greater total ion current (TIC), indicating greater overall peptide intensity (Fig. 2C–2D). Given the improvements in peptide IDs, together with the ease of washing and avoiding disruption of a non-visible cell pellet, we incorporated the collection microreactors into the low input proteomics workflow.
Isobaric labeling of peptides enables multiplexed peptide/protein identification and MS2 level quantitation for up to eleven samples using tandem mass tags (TMT) (
). In-solution TMT labeling requires an additional desalting step post-quenching, which often results in sample loss. To circumvent these sample handling steps we tested whether we could directly couple desalting and TMT labeling of small peptide amounts when adsorbed to the C18 resin (
). Two micrograms of whole cell lysate digest was labeled either in-solution in a microfuge tube or pre-adsorbed to C18 resin. On-column TMT labeling yielded a higher percentage of fully-labeled peptides (all possible primary amines coupled to TMT) and was more reproducible than the standard protocol (Fig. 3A). The on-column labeling method also yielded a 27% increase in peptide spectral matches (PSMs) compared with in-solution labeling (Fig. 3B). Finally, on-column TMT labeling avoided an extra desalting step and reduced the entire labeling protocol from about three hours to less than 10 mins (Fig. 3C). These results demonstrate that the on-column TMT labeling strategy is faster and more efficient for labeling low microgram levels of peptides.
To reduce sample complexity of the full proteome digest for deep proteome profiling, we explored Stage tip fractionation strategies for TMT-labeled peptide mixtures (
). Comparing strong cation exchange (SCX), C18 at pH 10 (bC18), or sulfonated polystyrenedivinylbenzene at pH 10 (bSDB), we found that fractionation improved the number of identified TMT-labeled peptides compared with gradient extension alone (Fig. 4A). Both basic reversed phase formats consistently identified more peptides than SCX fractionation (Fig. 4A). bSDB fractionation outperformed bC18 fractionation in number of PSMs, had higher precursor intensities for analytes identified in later fractions, and greater fraction uniqueness (Fig. 4A–4C). Comparison of retention times (RT) of peptides identified in their respective bC18 or bSDB step fractions with their on-line, acidic RT in LC-MS/MS of an unfractionated sample showed a more even distribution of peptides across fractions with bSDB (Fig. 4D). These data show bSDB is an orthogonal, well-suited method for fractionating small amounts of TMT-labeled peptides.
Deep, Quantitative Proteomic Profiling of 3e5 FAC-sorted Immune Cells
Having developed a streamlined sample preparation protocol, we performed quantitative proteome profiling on FAC-sorted primary murine immune cells. 12 cell types were chosen spanning a range of abundance levels in the mouse (Fig. 5A). To mitigate large numbers of mice and long FACS times, we chose to collect 300,000 cells from each cell type in duplicate or triplicate. 3e5 cells yielded a range of protein amounts after urea lysis and digestion from about 4 μg for neutrophils (GN) and B cells to approximately 2 μg for T cell subtypes. To have a common reference connecting the separate TMT 10-plexes to enable cross-plex analyses, we created a sample comprised of a mixture of immune cell types (supplemental Table S1) prepared at the beginning of the study (
). This common reference sample was labeled with the TMT-131 reagent and was included in all TMT 10-plex experiments (Fig. 5A).
We performed the low input proteomics protocol described above on four TMT 10-plex experiments of ∼2 μg peptides per sample. On average 6427 protein groups (including all proteoforms) were identified in each 10-plex by two or more peptides per protein (peptide and protein FDR < 1%); 7023 protein groups were identified across all samples (Fig. 5B and supplemental Fig. S1A). After performing correlation analysis across biological replicates (defined here as starting from different mice) of the log2 transformed protein ratios of the individual samples to the common reference, we excluded any cell type replicates with a Pearson r < 0.5, except for B1a cells as only duplicates were provided (supplemental Fig. S1B). The median and mean Pearson r across the included biological replicates were 0.90 and 0.87, respectively (Fig. 5C). A recent study from our laboratory showed the median and mean Pearson's r between preparation replicates for high inputs were both 0.9, indicating that our low input data are of high quality, comparable to standard sample preparation methods.
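The replicate quality filter can be illustrated by correlating the log2 ratios of two replicates over the proteins they share and comparing the result with the 0.5 cutoff. This is a sketch under assumptions: the function names are hypothetical, the comparison is shown for a single pair of replicates, and the text does not specify how replicates with more than one partner were scored.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep_a, rep_b):
    """Correlate log2 ratios over proteins quantified in both replicates."""
    shared = sorted(set(rep_a) & set(rep_b))
    return pearson_r([rep_a[p] for p in shared], [rep_b[p] for p in shared])


def passes_filter(rep_a, rep_b, min_r=0.5):
    """True if the replicate pair meets the minimum Pearson r."""
    return replicate_correlation(rep_a, rep_b) >= min_r


# Illustrative log2 ratios keyed by hypothetical protein identifiers.
rep_1 = {"P1": 1.0, "P2": -0.5, "P3": 2.0, "P4": 0.3}
rep_2 = {"P1": 2.0, "P2": -1.0, "P3": 4.0, "P5": 9.9}
r = replicate_correlation(rep_1, rep_2)
```

Restricting the comparison to shared proteins avoids having to impute values for proteins missing from one replicate.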
To test whether protein measurements recapitulated known expression patterns of immune cell types, we mapped the protein fold change values for the top 20 differentially expressed mRNAs for each cell type (Immgen.org) (supplemental Table S2). We found that most canonical markers of specific cell types had the expected exclusive abundances (e.g. CD8a/b in CD8+ T cells) or near-exclusive abundances (e.g. CD19 in B and B1a cells, and CD4 in CD4+ T cells and Treg cells) (Fig. 5D).
Identification of Differentially Abundant and Immune Cell Type-Specific Proteins
To identify differentially abundant (DA) proteins between different cell types, we performed a moderated F-test across all biological replicates. We found 4241 protein groups to be DA in at least one cell type with a false discovery rate (FDR) < 0.01 (Fig. 6A and supplemental Table S3). Principal component analysis (PCA) followed by gene set enrichment analysis (GSEA) was performed to visualize how the cell types clustered and which pathways drove their separation (Fig. 6B). When the top 13% of DA proteins (666 proteins, arbitrary cutoff) were projected into PCA space, there was high overlap between biological replicates, and cell types known to be similar clustered together (e.g. lymphocytes versus myeloid cells, T cells versus B cells). GSEA of the loadings for PC1 and PC2 revealed eleven and eight pathways, respectively (adjusted p value < 0.05), driving the separation. The gene sets separating the cell types along the PC1 axis showed enrichment for T cell receptor (TCR) and B cell-related pathways in the positive direction, whereas pathways associated with extracellular matrix remodeling were enriched in the negative direction. For PC2, gene sets involved in B cell receptor (BCR) signaling and the innate immune system were enriched in the positive direction, whereas TCR signaling was enriched in the negative direction. These data show that immunologically relevant pathways drive the differences between cell types at the protein level, demonstrating the fidelity of low input proteomics.
To identify proteins that may better distinguish closely related cell types, we employed data imputation (supplemental Table S4) followed by marker selection analysis, comparing one cell type to all others in an iterative fashion. Marker selection analysis displayed high cell-type specificity for DA proteins, with γδT cells showing the fewest markers and MCs the most (Fig. 6C and supplemental Table S5). This analysis identified distinct plasma membrane-annotated proteins (Uniprot.org) between molecularly similar cell types that could provide alternative or additional markers (supplemental Fig. S2 and supplemental Table S6). For example, GHDC, LMBRD1, and region 102 of Ig heavy chain V distinguish B cells from B1a cells (both CD19+), and CD5, CNNM4, LANCL2, and SELH distinguish CD4+ T cells from Tregs (both CD4+) (Fig. 6D). These data demonstrate that low input proteomics recapitulates many known aspects of immunology and may provide valuable new insights for future immunological studies.
Protein/RNA Relationships Reveal Evidence for Post-transcriptional Regulation Across the Immune System
We performed FACS for this study based on Immgen's protocols and gating strategies. Using collapsed protein groups (proteoforms from genes with the strongest evidence at the peptide level), we had overlapping protein and RNA measurements for 5,125 gene products. To test the appropriateness of investigating protein/RNA relationships, we calculated the Spearman's rank correlation coefficient (ρ) for all gene products for each individual cell type. We found that all cell types had a correlation coefficient in agreement with previous reports for unperturbed cells, 0.25 < ρ < 0.50 (Fig. 7A).
To find evidence of post-transcriptional regulation across the immune system, we asked whether we could see global trends in altered protein/RNA levels by looking for co-regulation changes between mRNA and protein. Clustering of the ρ values for RNA expression showed blocks of positive correlations, strongly suggesting these transcripts are co-expressed across immune cell types (Fig. 7B). Performing the same analysis in log2 FC protein space, we again saw blocks of positive correlations, though the overall data structure differed markedly between RNA and protein (Fig. 7C). To visualize and better contrast the differences between mRNA and protein co-regulation, we reorganized the matrix of protein ρ values using the ordering derived from hierarchical clustering of the mRNA ρ matrix. This mRNA-ordered protein co-regulation matrix showed a gross loss of off-diagonal blocks of positive correlations (Fig. 7D), implying that protein co-expression is governed by processes distinct from those underlying mRNA co-expression in primary immune cells.
To further investigate protein/RNA relationships, we next asked which pathways show evidence for post-transcriptional regulation across the immune cells analyzed. We calculated the Pearson correlation coefficient (r) for each gene product individually across all cell types using mean mRNA levels and mean relative protein abundances. GSEA of the genes' r values identified nine gene sets with a p value < 0.01 (Fig. 7E). Five of the gene sets were enriched for positive protein/RNA correlation, whereas four were enriched for negative protein/RNA correlation. The positively correlated gene sets included the cell cycle gene set, consistent with previous studies. We also found a BCR signaling pathway gene set to have positively correlated protein/RNA patterns across the 12 immune cell subtypes (Fig. 7E). On the other hand, the negatively correlated gene sets included Toll-like receptor (TLR) and MAPK signaling pathways, suggesting that post-transcriptional regulation may play a role in the innate immune response.
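The per-gene correlation that feeds the enrichment analysis can be illustrated with a short Python sketch. It assumes mean mRNA and mean relative protein values are already available per cell type in matching order. The gene names, values, and function names below are hypothetical, and ranking genes by r as GSEA input is one plausible reading of the text rather than a confirmed detail.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def per_gene_correlation(mean_rna, mean_protein):
    """Pearson r across cell types for every gene present in both tables.

    Each argument maps a gene name to a list of per-cell-type means,
    with cell types in the same order in both tables.
    """
    shared = sorted(mean_rna.keys() & mean_protein.keys())
    return {g: pearson(mean_rna[g], mean_protein[g]) for g in shared}


def ranked_genes(correlations):
    """Genes ordered from most positive to most negative r (assumed GSEA input)."""
    return sorted(correlations, key=correlations.get, reverse=True)


# Hypothetical values for two genes across four cell types.
mean_rna = {"geneA": [1.0, 2.0, 3.0, 4.0], "geneB": [1.0, 2.0, 3.0, 4.0]}
mean_protein = {"geneA": [0.1, 0.2, 0.3, 0.4], "geneB": [4.0, 3.0, 2.0, 1.0]}

r_values = per_gene_correlation(mean_rna, mean_protein)
ranking = ranked_genes(r_values)
```

Here geneA behaves like a transcriptionally driven gene product (protein tracks mRNA), while geneB shows the inverse relationship that would place it among the negatively correlated gene products.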
To gain insight into the directionality of the negatively correlated TLR pathways, i.e. high protein levels/low RNA levels or vice versa, we rank-ordered the abundance level of each gene product within each cell type and calculated the difference in rank (delta-rank) between RNA and protein. First, to estimate relative inter-protein levels, we took the summed precursor intensities for peptides belonging to each individual protein and scaled the individual TMT channel contributions to find how each sample contributed to the overall protein level (supplemental Fig. S3A). These “fractional intensity” (frINT) measurements showed high agreement with MaxQuant's MaxLFQ, a widely used label-free, MS1-based quantitation software package for proteomic data analysis (supplemental Fig. S3B). After applying frINT to our immune cell dataset, we rank-ordered the protein and RNA levels for each individual cell type for the delta-rank analysis, where 1 was the most abundant gene product and 5125 the least abundant. Clustering analysis of the delta-rank values for all gene products detected in the three TLR-related gene sets with negative RNA:protein correlations showed that about one-third had little to no change in rank (|delta| < 2000, i.e. the rank did not change by at least 2000 in either direction) (Fig. 7F). We were also able to identify subclusters of gene products that suggest low RNA/high protein levels (large, positive values, pink) or low protein/high RNA levels (large, negative values, blue). Some gene products also showed cell type-specific delta-ranks, suggesting that different immune cell types regulate specific gene products to establish their respective proteomes.
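The fractional intensity and delta-rank calculations can be sketched as follows. This is an illustrative reading of the description above, not the authors' implementation: frINT is taken here as the summed precursor intensity multiplied by each channel's share of the total reporter signal, and the delta-rank sign (RNA rank minus protein rank) is inferred from the statement that large positive values indicate low RNA/high protein. All names and numbers are hypothetical.

```python
def fractional_intensity(precursor_sum, reporters):
    """Split a protein's summed precursor (MS1) intensity across TMT channels.

    `reporters` maps a channel label to that protein's reporter intensity;
    each channel receives its share of the precursor signal.
    """
    total = sum(reporters.values())
    return {ch: precursor_sum * value / total for ch, value in reporters.items()}


def abundance_ranks(levels):
    """Rank gene products by descending abundance (1 = most abundant).

    Ties are broken by input order; no tie correction is applied.
    """
    ordered = sorted(levels, key=levels.get, reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def delta_rank(rna_levels, protein_levels):
    """RNA rank minus protein rank for gene products measured in both.

    Positive: low RNA / high protein. Negative: high RNA / low protein.
    """
    shared = sorted(rna_levels.keys() & protein_levels.keys())
    rna_rank = abundance_ranks({g: rna_levels[g] for g in shared})
    protein_rank = abundance_ranks({g: protein_levels[g] for g in shared})
    return {g: rna_rank[g] - protein_rank[g] for g in shared}


# Hypothetical two-channel protein: 25% and 75% of the reporter signal.
frint = fractional_intensity(1000.0, {"126": 1.0, "127N": 3.0})

# Hypothetical RNA and protein levels for three gene products in one cell type.
rna = {"a": 10.0, "b": 5.0, "c": 1.0}
protein = {"a": 1.0, "b": 5.0, "c": 10.0}
deltas = delta_rank(rna, protein)
```

In this toy case gene product c has the lowest RNA level but the highest protein level and therefore receives the largest positive delta-rank. With 5125 gene products, the same subtraction yields values that can be compared against the ±2000 cutoff used in the text.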
Proteome profiling offers new levels of information about cellular identity and regulation that can be missed with genomics or transcriptomics alone. The generation of genome-wide protein measurements, however, is often limited by the amount of input needed for deep and high-quality proteomic profiling. Here, we show improved sample preparation strategies for MS-based proteomic analysis of low cellular inputs from FAC-sorted cells. Using these low input proteomics methods, we were able to quantify over 7000 protein groups by analyzing half of the 300,000 freshly isolated mouse immune cells collected per cell type directly from FACS.
The sensitivity of our low input methodology derives from a combination of three main improvements: fabrication of collection microreactors for easy cell collection and digestion; on-column TMT labeling coupled with desalting for faster and more efficient peptide labeling; and optimized small-scale fractionation of TMT-labeled peptides. The microreactors provided a versatile cell capture system with several benefits over microfuge tube or in-StageTip collection, including the ability to wash cells in a column-like format without the need to see a barely visible pellet, smaller lysis and digestion volumes, and a filter that can prevent StageTip clogging. On-column labeling of peptides with TMT reduces the time and number of sample handling steps needed for conventional in-solution labeling because the desalting and labeling occur on the same resin. On-column labeling was also more efficient than in-solution labeling, specifically for fully TMT-labeled spectra at low microgram inputs, even without a desalting step prior to on-column labeling (
), the on-column protocol reported here takes less time and has been optimized for 1000-fold less material. Finally, optimizing small-scale fractionation for TMT-labeled peptides allowed improved depth of coverage for low input proteomics. Previous work has shown that for medium-level inputs (between 5 and 50 μg peptide per state) of unlabeled peptides, SCX outperformed basic reversed-phase (bRP) chemistries in the number of peptide identifications. Our work here shows that both bRP modalities tested outperformed SCX, likely because TMT labeling converts primary amines into amides, dampening the positively charged character of tryptic peptides. Together, the reduced sample handling steps, combined with efficient TMT labeling and off-line, tip-based fractionation, enabled the deep, quantitative proteomic profiling of 300,000 murine primary immune cells isolated directly by FAC-sorting.
We applied our low input proteomic sample preparation to 12 freshly isolated murine immune cell types sorted according to Immgen protocols. To our knowledge, this study provides the first proteome-wide characterization of the major immune cell types from their resident tissues rather than peripheral blood. Over 7000 proteins were quantified with high reproducibility, as seen by the high agreement between mouse replicates. The median Pearson correlation between mouse replicates acquired using low input proteomics protocols matched that of the high input preparation replicates used routinely by our laboratory for longitudinal quality assessments (Fig. 5C).
Differential analysis of protein abundances found that nearly two-thirds of proteins were DA in at least one cell type. This high proportion was expected, as even pairwise comparisons of vastly different cell types at the RNA level reveal large differences genome-wide. When projected into PCA space, we found that the relevant cell types clustered together, and that the clustering was driven by expected pathways, such as BCR and IL12 pathways for B cells, and TCR signaling for T cells (Fig. 6B). This analysis also found that MF, GN, and to a lesser extent MCs, had low relatedness through PC1 to the other cell types, and that proteins associated with extracellular matrix remodeling were the main drivers of this separation. Differentiating cell types based on DA proteins showed better discrimination between closely related cell types than protein levels chosen by their differential mRNA expression. For example, most protein products determined by their differential mRNA levels in B and B1a cells showed little difference between the two cell types (Fig. 5D). A similar situation was seen between T4 and T8 cells, and especially for PTPRC (CD45), which was present in high amounts across NK cells and all T cell subtypes. From the marker selection analysis, we were able to nominate several plasma membrane-annotated proteins that could provide additional or alternative markers to distinguish B cells from B1a cells, or T4 cells from Tregs (Fig. 6D). Consistent with these results, CD5 and LANCL2 have established roles in Treg activity.
To establish evidence for post-transcriptional regulation genome-wide, we looked for changes in co-regulation at the protein or RNA levels. Recent work by Kustatscher and colleagues has shown that transcriptional co-regulation is largely driven by proximity to other actively transcribed genes on their respective chromosome. The phenomenon of proximal genes being co-regulated is commonly observed in topologically associated domains and gene expression neighborhoods molecularly characterized by 3D-interaction mapping of genomic DNA. We looked for mRNA and protein co-regulation by calculating and clustering the ρ values for the 5125 gene products with overlapping measurements. Both mRNA and protein data showed respective and distinct co-clustering blocks, suggestive of co-regulation across the immune cell types profiled in this study (Fig. 7B–7C). Rearranging the protein ρ values in the order determined by hierarchical clustering of the mRNA data revealed a distinct loss in data structure and off-diagonal correlations (Fig. 7D), suggesting that primary mouse immune cells buffer transcriptional co-regulation to some degree in establishing their protein landscape.
The low input proteomics methods described here extend the applicability of previous methodologies to FAC-sorted immune cells and highlight the value of reduced-handling sample preparation, especially for studies requiring analysis of limited amounts of sample. Other applications of the methods we describe include profiling of other cell types, co-immunoprecipitation studies, and post-enrichment labeling. We envision these methods will enable researchers to perform proteomic experiments on scarce samples previously out of reach.
We thank Christophe Benoist (Harvard Medical School) and the Immgen Consortium for providing the samples and experimental input. We also thank Shankha Satpathy, Inbal Benhar, Bea Hamilton, Filip Mundt-Suger, Namrata Udeshi, Philipp Mertins, and Egle Kvedaraite for useful discussions, and Susan Klaeger, John Ray, and Jacob Jaffe for critical assessment of the manuscript.
HHS | NIH | National Cancer Institute (NCI) https://dx.doi.org/10.13039/100000054 1U24CA210986–01 Carr Steven A.
* This work was supported by grants 5U24CA210986 and U01CA214125 (S.A.C.) and 5U24CA210979 (D.R.M.) from the US National Institutes of Health as part of the National Cancer Institute Clinical Proteomics Tumor Analysis Consortium Initiative.
This article contains supplemental Figures and Tables.
1 The abbreviations used are: FACS, fluorescence-activated cell sorting; LCM, laser capture micro-dissected; XIC, extracted ion chromatogram; TIC, total ion current.
Author contributions: S.A.M. and S.A.C. designed research; S.A.M., A. Rhoads, A.R.C., and L.D.S. performed research; S.A.M., A. Rhoads, and K.R.C. contributed new reagents/analytic tools; S.A.M., R.P., A.L.H., K.K., and D.R.M. analyzed data; S.A.M., R.P., A.L.H., K.K., and S.A.C. wrote the paper; O.R.-R., N.H., and A. Regev supervised research.