Novel Proteome Extraction Method Illustrates a Conserved Immunological Signature of MSI-H Colorectal Tumors

The TOP method is a simple, robust, environmentally friendly proteome extraction method with reduced fixation-time bias. Using the TOP method, we analyzed by LC-MS/MS a clinical cohort of microsatellite stable (MSS) and unstable (MSI-H) colorectal carcinomas (CRC). An MSI-H-specific, STAT1-centric immunological signature was identified. In vitro experiments connected this signature to long, but not short, exposure to Interferon-γ. Our data provide an in-depth view of MSI-H immunobiology and suggest that the roles of STAT proteins are context dependent.



Graphical Abstract
Using a simple, environmentally friendly proteome extraction method (TOP), we were able to optimize the analysis of clinical samples. Using the TOP method, we analyzed a clinical cohort of microsatellite stable (MSS) and unstable (MSI-H) colorectal carcinomas (CRC). We identified a tumor cell-specific, STAT1-centered immune signature expressed by the MSI-H tumor cells. We then showed that long, but not short, exposure to Interferon-γ induces a similar signature in vitro. We identified 10 different temporal protein expression patterns, classifying the temporal regulation of proteins by Interferon-γ in CRC. Our data shed light on the changes that tumor cells undergo under long-term immunological pressure in vivo and on the importance of STAT proteins in specific biological scenarios. The data generated could help find novel clinical biomarkers and therapeutic approaches.
Formalin fixation has been the standard in clinical routine for more than 100 years. Indeed, one of the major breakthroughs in molecular pathology was the effective extraction of DNA and then RNA from formalin-fixed and paraffin-embedded (FFPE) clinical tissues. Although some progress was made toward a robust protein extraction method (1,2), the issue was found to be complex (3). One of the major inherent problems is the huge variability of tissue fixation time in the clinic, where fixing tissues for 6 h to 72 h is considered adequate (4). This creates a huge bias for traditional proteomics pipelines, as protein recovery seems to decrease with prolonged fixation time, with yields dropping by ~20% for every 24 h in formalin (5). Thus, a robust, efficient protein extraction method that is insensitive to fixation time would help propel clinical molecular proteomics forward.
Colorectal cancer (CRC) is a common cancer and one of the leading causes of cancer-related death (6). In CRC, genomic instability seems to be a key developmental factor.
The MSI-H tumors, also known as mismatch repair deficient (MMRd), develop because of a deficiency of the MMR proteins. The MMRd might result from epigenetic silencing ("sporadic") or by constitutional mutations ("genetic") (8). In addition, the MSI-H tumors are typically characterized by a high rate of single-nucleotide substitutions (9). On average, these tumors display a high mutation rate of 12-40 mutations/Mb (7). As a result, MSI-H tumors can produce abnormal proteins that might act as neo-antigens and trigger an effective immune response against the tumor (10). There is an increasing body of evidence that shows a unique and complex interplay between the MSI-H tumor cells and the immune microenvironment. Although much is unknown, it has been shown that MSI-H tumors show an immune active tumor microenvironment with high levels of several immune checkpoint proteins (11).
Recently, the Food and Drug Administration (FDA) granted accelerated approval to Pembrolizumab (Keytruda) for the treatment of MSI-H (MMRd) solid tumors arising from different organs, thus granting the first histology-agnostic treatment approval. This was based on data collected from several KEYNOTE clinical trials that enrolled MSI-H cancer patients irrespective of the tumor origin. Thus, clinically and biologically, MSI-H CRC might be considered enriched for responsiveness to PD-1 immune checkpoint inhibition, whereas MSS CRC might be considered unresponsive.
The JAK-STAT axis is one of the most important pathways in cancer (12). Specifically, the pleiotropic transcription factors STAT1 and STAT3 appear to have complex and conflicting roles in CRC (13). In general, STAT1 is assumed to be a tumor suppressor (13). However, although it was shown to correlate with a better prognosis (14,15), it was also claimed to be associated with shorter survival and poor outcome, particularly in the microsatellite instable subtype (16). Moreover, some support the concept of STAT3 as an oncogene, whereas others view it as a tumor suppressor (13). Contradictory reports were published, with some describing oncogenic effects of STAT3 (17-20) and others a positive outcome (15,21).
Here we investigated the proteomes of MSI-H versus MSS CRC using a novel clinical proteome pipeline. We found that the proteins overexpressed by the MSI-H, but not the MSS, tumors formed a dense protein network. The network was highly enriched for immunological processes, and specifically for antigen processing and presentation. Interestingly, only STAT1 was overexpressed by the MSI-H tumors, and only STAT1 showed a strong connection to immune-related processes. Furthermore, we managed to reproduce some of these changes in vitro by long, but not short, exposure to Interferon-γ. We then investigated the temporal proteomics of Interferon-γ exposure. Our data shed new light on the basic immunobiology of MSI-H CRC.

MATERIALS AND METHODS
Xylene-Mediated (Traditional) Proteome Extraction-The traditional protein extraction was done according to the Mann laboratory's publications (22,23). Xylene-mediated deparaffinization was accomplished by incubation in 1 ml xylene at 37°C for one hour. After vortexing and 5 min centrifugation at 20,000 × g, xylene was removed. A second incubation with fresh xylene for 30 min at 37°C followed, again with vortexing, centrifugation and removal of xylene. Then, a wash with 1 ml of absolute ethanol was done for 5 min at room temperature. After vortexing and centrifugation as above, ethanol was removed. The ethanol wash was repeated once. Ethanol was discarded and the remaining pellets were air-dried for 30 min upside down.
For protein recovery, pellets were resuspended in 0.1 M Tris pH 7.5. Subsequently, samples were sonicated at level 5 for 3 min on ice (Fisher Scientific, Massachusetts). After addition of sodium dodecyl sulfate (SDS) to a concentration of 4% and a total buffer volume of 150 µl, samples were heated to 95°C for one hour in a dry bath (Elite EL-02, MS major science, New York). Following 10 min centrifugation at 20,000 × g, supernatants were transferred to new tubes and stored at −70°C. All reagents were from Bio-Lab, unless stated otherwise.
TOP Proteome Extraction-Deparaffinization was conducted using mineral oil (Ventana, Arizona). Sections were incubated in 1 ml of mineral oil at 90°C for 30 min with agitation. Then, 130 µl of 0.1 M Tris pH 7.5 buffer was added. The tubes were vigorously mixed and left for 5 min to start tissue rehydration, then centrifuged at 20,000 × g for 5 min. Mineral oil (the upper phase) was removed carefully. As in the xylene-mediated protocol, the samples were sonicated and SDS was added. Samples were heated for 5 min at 121°C (~15 psi) in an autoclave (wet cycle; Tuttnauer, Jerusalem, Israel). Subsequent centrifugation, transfer of supernatants and storage were performed as stated above.
Clinical Cohort and Proteome Extraction-A colorectal carcinoma tumor cohort was collected under ethical approval of the Tel Aviv Sourasky Medical Center ethics committee (supplemental Table S1). The cohort comprised 19 surgical samples from 17 different patients (8 MSI-H and 9 MSS), including biological duplicates from two of the MSS patients. Blocks were sliced into consecutive 8 µm sections with a microtome (RM2265, Leica Biosystems, Wetzlar, Germany). Macro-dissection was done to ensure a minimum of 50% tumor cells in the sample. The macro-dissected material was moved into 1.5 ml Eppendorf tubes. The proteome was extracted using TOP extraction (see above).
Proteome of Cell Line-As in the western experiments, 0.5 × 10^6 DLD1 cells were seeded in 6-well plates and cultured for a week. Interferon-γ was added immediately for the 1-week group, or after 6 days (for the last day only) for the 1-day group. Control cells were cultured without Interferon-γ. The experiment was done in quadruplicate. At the end of the experiment, cells were washed with 0.9% NaCl and lysed with 100 mM Tris pH 7.5, 4% SDS. The protein extracts were kept at −70°C until analysis.
Sample Preparation for Label-Free Mass Spectrometry-Samples were subjected to in-solution tryptic digestion using a modified filter aided sample preparation protocol (FASP) (24). All chemicals were sourced from Sigma Aldrich unless stated otherwise. 200 µl of UB were added to the filter unit and centrifuged again. Proteins were alkylated using 10 mM iodoacetamide (Sigma) for 45 min at room temperature in the dark. Filters were washed twice with 50 mM ammonium bicarbonate, followed by trypsin digestion. Trypsin was added and samples were incubated at 37°C overnight. The next day, trypsin was added again for 4 h. Digested proteins were then spun down, acidified, desalted, and stored at −80°C until analysis.
NanoLC MS ESI MS/MS Analysis-ULC/MS grade solvents were used for all chromatographic steps. Each sample was loaded using split-less nano-ultra performance liquid chromatography (Waters). The mobile phase was: (a) H2O + 0.1% formic acid and (b) acetonitrile + 0.1% formic acid. Desalting of the samples was performed online using a reversed-phase C18 trapping column (180 µm internal diameter, 20 mm length, 5 µm particle size; Waters). The peptides were then separated using a HSS T3 nano-column (75 µm internal diameter, 250 mm length, 1.8 µm particle size; Waters). Peptides were eluted from the column into the mass spectrometer using the following gradient: 4% to 35% B in 150 min, 35% to 90% B in 5 min, maintained at 90% for 5 min and then returned to initial conditions.
The nanoUPLC was coupled online through a nanoESI emitter (10 µm tip; New Objective, Woburn, MA, USA) to a quadrupole orbitrap mass spectrometer (Q Exactive Plus, Thermo Scientific) using a Flex-Ion nano spray apparatus (Thermo Scientific).
Data were acquired in data dependent acquisition (DDA) mode, using a Top20 method. MS1 resolution was set to 70,000 (at 400 m/z), automatic gain control (AGC) to 3e6, and maximum injection time to 20 ms. The precursor isolation window was set to 1.6 mass units. MS2 resolution was set to 17,500, AGC to 1e6, and maximum injection time to 60 ms.
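The Top20 logic can be pictured as choosing, from each MS1 survey scan, the 20 most intense precursor ions for fragmentation. The following is a minimal sketch of that selection step only; the function name and the toy peak list are hypothetical, and instrument features such as dynamic exclusion or charge-state filtering are deliberately ignored.

```python
import heapq


def select_top_n(ms1_peaks, n=20):
    """Return the n most intense (mz, intensity) peaks, most intense first."""
    return heapq.nlargest(n, ms1_peaks, key=lambda peak: peak[1])


# Toy survey scan: (m/z, intensity) pairs invented for illustration.
scan = [(400.2, 1.0e5), (523.8, 7.5e6), (611.3, 2.2e6), (702.9, 9.0e4)]
top2 = select_top_n(scan, n=2)
```

With fewer than 20 peaks in a scan, the function simply returns all of them in order of decreasing intensity.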
Protein Identification and Quantification-Raw data were processed using MaxQuant (MQ) version 1.6.5.0 (25) and the embedded Andromeda search engine (26). Data were searched against the human sequences in Swissprot human proteome version 2017_01 (20299 entries). Precursor mass and fragment mass were searched with mass tolerance of 4.5 and 20 ppm, respectively. The search included variable modifications of oxidation (methionine) and protein N-terminal acetylation, and fixed modification of carbamidomethyl (cysteine). Enzyme specificity was set to trypsin, a maximum of two miscleavages were allowed, and a minimal peptide length was set to 7 amino acids. The false discovery rate (FDR) for peptide and protein identifications was 0.01. For the conventional versus TOP proteome extraction the defaults were kept, except: deamidation (NQ) was added as variable modification, and match between runs was not enabled.
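To illustrate what the enzyme settings mean for the search space, the sketch below performs a simplified in-silico tryptic digest with up to two missed cleavages and a minimum peptide length of 7. It is an illustration under stated assumptions, not the MaxQuant/Andromeda implementation: cleavage is assumed to occur after every K or R, proline rules are ignored, and the example sequence is invented.

```python
def tryptic_peptides(sequence, max_missed=2, min_length=7):
    """Simplified digest: cleave after K or R, allow missed cleavages."""
    # Split the sequence into fully cleaved fragments.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Join 1 to (max_missed + 1) consecutive fragments into candidate peptides.
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + max_missed + 1, len(fragments))
        for j in range(i + 1, last + 1):
            peptide = "".join(fragments[i:j])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return sorted(peptides)


# Hypothetical sequence, used only to show the behavior.
example = tryptic_peptides("MKTAYIAKQRQISFVK")
```

In this toy case every fully cleaved fragment is shorter than 7 residues, so only peptides carrying one or two missed cleavages pass the length filter.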
The bioinformatics analysis was performed in the Perseus suite (version 1.6.2.3). The data were filtered to remove reverse hits, contaminants, and proteins identified only by site. Then the data were filtered such that a protein had to have a non-zero LFQ intensity in all 19 samples and 3 or more peptides, unless stated otherwise. Gene Ontology annotation was performed using the STRING site version 10.5.
Cells-DLD1 and RKO cells were cultured in DMEM (Biological Industries) supplemented with 10% FCS, 2 mM glutamine and antibiotics. Interferon-γ (Peprotech) was added to the medium as indicated.
Western Blotting-On the day of the experiment, 0.5 × 10^6 DLD1 and RKO cells were seeded on 6-well plates, treated with Interferon-γ and incubated at 37°C in a humidified atmosphere containing 5% CO2. For concentration scale experiments, cells were incubated with different concentrations of Interferon-γ (1, 5, 10, 20 and 40 ng/ml). For time scale experiments, cells were incubated with 10 ng/ml Interferon-γ for 15 min, 1 h, 6 h and 24 h. At the end of the experiment, cells were briefly washed with 0.9% NaCl and lysed with lysis buffer (100 mM Tris pH 7.5, 5% SDS). For Western blot analysis, cells were sonicated and centrifuged at 12,000 × g for 10 min. Protein extract was subjected to 10% SDS/PAGE and transferred to nitrocellulose membranes. After blocking in 0.15% gelatin (Sigma), blots were incubated overnight at 4°C with primary antibodies (Cell Signaling) for total STAT1 (9172) and total STAT3 (8019). Blots were washed three times in TBS with 0.2% Tween 20 (TBS-T) and incubated for one hour at room temperature with HRP-conjugated secondary antibody diluted in 0.15% gelatin. Bands were visualized by ECL and detected using a C600 camera (Azure, Dublin, CA).
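The filtering rules above can be sketched as a single predicate applied to each protein group. This is a minimal illustration rather than the Perseus workflow itself; the record layout and field names (`reverse`, `contaminant`, `only_by_site`, `peptides`, `lfq`) are hypothetical, and the toy table uses three samples instead of 19.

```python
def passes_filters(protein, min_peptides=3):
    """Apply the described filters to one protein-group record."""
    # Drop decoy hits, contaminants, and proteins identified only by site.
    if protein["reverse"] or protein["contaminant"] or protein["only_by_site"]:
        return False
    # Require a minimum number of identified peptides.
    if protein["peptides"] < min_peptides:
        return False
    # Require a non-zero LFQ intensity in every sample.
    return all(value > 0 for value in protein["lfq"])


# Invented records; identifiers and intensities are placeholders.
proteins = [
    {"id": "protein_A", "reverse": False, "contaminant": False,
     "only_by_site": False, "peptides": 5, "lfq": [1.2e7, 9.8e6, 1.1e7]},
    {"id": "protein_B", "reverse": False, "contaminant": False,
     "only_by_site": False, "peptides": 2, "lfq": [3.0e6, 2.5e6, 2.9e6]},
    {"id": "protein_C", "reverse": False, "contaminant": False,
     "only_by_site": False, "peptides": 6, "lfq": [4.1e6, 0.0, 3.8e6]},
    {"id": "protein_D", "reverse": False, "contaminant": True,
     "only_by_site": False, "peptides": 9, "lfq": [8.0e8, 7.5e8, 7.9e8]},
]
kept = [p["id"] for p in proteins if passes_filters(p)]
```

Only `protein_A` survives here: the others fail the peptide count, the complete-quantification requirement, or the contaminant flag, respectively.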
Experimental Design and Statistical Rationale-For the LC-MS/MS analysis comparing the TOP method to the conventional proteome extraction, we used three independent biological replicate experiments. The tissue after 2 days of fixation served as a control.
For the LC-MS/MS analysis comparison of the MSS and MSI-H proteome, we used 19 independent biological extractions. The MSS group served as a control.
For the LC-MS/MS analysis of DLD1 exposure to Interferon-γ we used four independent biological replicate experiments. The DLD1 cells without Interferon-γ exposure served as a control.
The bioinformatics analysis was done in the Perseus suite (version 1.6.2.3). Significantly enriched proteins were identified with a two-sample t test using a permutation-based FDR method, and further selected using an adjusted p value < 0.05. Log2-transformed individual values or triplicate means were z-score-normalized prior to hierarchical clustering. Gene Ontology annotation was performed using the STRING site version 10.5, and the statistical values given by the site are reported.
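The normalization step before clustering can be illustrated with a short sketch: each protein's intensities are log2-transformed and then z-scored across samples. The function name is hypothetical, and the use of the sample standard deviation is an assumption; the exact convention used by Perseus is not stated in the text.

```python
import math
import statistics


def zscore_log2(intensities):
    """Log2-transform one protein's intensities, then z-score across samples."""
    logged = [math.log2(value) for value in intensities]
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)  # sample standard deviation (assumption)
    return [(value - mean) / sd for value in logged]


# Invented intensities that double from sample to sample.
z = zscore_log2([1.0, 2.0, 4.0])
```

After this transformation every protein has mean 0 and unit standard deviation, so clustering reflects the pattern across samples rather than absolute abundance.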

RESULTS
Optimization of FFPE Proteome Extraction-Formalin fixation times vary considerably in routine clinical setting (4). One of the consequences is the deleterious impact of extended fixation time on the amount of protein extracted from clinical FFPE samples, which influences the proteomic coverage achievable by MS analysis (5,27). Therefore, we aimed to minimize the fixation length effects.
To enhance the extraction, we changed several steps in the proteome extraction: (1) deparaffinization using mineral oil, which eliminated toxic organic solvents and multiple washes; and (2) elevated temperature and pressure (121°C, 15 psi) to aid the protein extraction process. We named our proteome extraction the TOP (Temperature-Oil-Pressure) method.
We compared the conventional and the TOP proteome extraction using mass spectrometry. We used a set of matched tissues fixed in formalin for 2 or 8 days, representing clinically adequate fixation or over-fixation, respectively. This created four groups named conventional-2d, conventional-8d, TOP-2d and TOP-8d.
First, MQ iBAQ identified 5240 protein groups, but filtering out site-only, reverse, and contaminant hits and requiring at least 3 peptides per protein left us with 3988 proteins. We identified a comparable number of proteins in the four groups (Fig. 1A). Then, we studied the Pearson's correlation (all 3988 proteins identified). The analysis indicated that the conventional-8d group showed less correlation to the other groups (Fig. 1B).
Because the Conventional-8day groups showed slightly decreased amounts of identified proteins, we wanted to determine if this was the cause of the decreased correlation. Thus, we repeated the same analysis but with the proteins identified in all 12 samples (2350 proteins). As before, the proteomes of the Conventional-8day group showed the lowest correlation. Importantly, the proteomes of this group showed a high correlation to each other, suggesting that there was a consistent bias in this group (Fig. 1C).
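The comparison restricted to proteins found in every sample can be sketched for a pair of samples as follows. This is an illustrative sketch, not the Perseus computation: the helper names and the toy log-intensity values are hypothetical, and only two samples are compared instead of all 12.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def shared_protein_correlation(sample_a, sample_b):
    """Correlate two samples using only proteins quantified in both."""
    shared = sorted(set(sample_a) & set(sample_b))
    return pearson([sample_a[p] for p in shared],
                   [sample_b[p] for p in shared])


# Invented log2 intensities; protein_4 is missing from the second sample.
sample_a = {"protein_1": 20.0, "protein_2": 22.0, "protein_3": 25.0, "protein_4": 30.0}
sample_b = {"protein_1": 21.0, "protein_2": 23.0, "protein_3": 26.0}
r = shared_protein_correlation(sample_a, sample_b)
```

Restricting the comparison to shared proteins removes missing values as a source of reduced correlation, which is the point of repeating the analysis on the common protein set.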
Unsupervised Clustering of CRC TOP Samples Follows CRC Biology-We decided to test our improved TOP proteome extraction method on clinical formalin-fixed paraffin-embedded samples. We focused on microsatellite stable (MSS) and unstable (MSI-H) colorectal carcinoma. The colorectal TCGA study clearly showed that the MSI status has a profound impact on the tumor biology (7), and the FDA granted the first tissue agnostic approval for immunotherapy treatment in MSI-H patients (28,29). Surprisingly, the TCGA proteomics effort identified five proteomic subtypes in the TCGA cohort, with only weak association to MSS/MSI-H groups (30,31).
To investigate this point, the proteomes of 19 samples were extracted by the TOP method, eight of which were MSI-H (42%). We identified 4350 proteins but continued with only the 1664 proteins that were identified in all 19 samples. Our clinical cohort showed a high Pearson's correlation between the samples (supplemental Table S1 and supplemental Fig. S1), with a Pearson's correlation of ~0.98 for biological replicates. This supports the notion that the TOP proteome extraction produces highly consistent proteome results from clinical FFPE samples.
Importantly, in agreement with the original TCGA publication (7), an unsupervised hierarchical clustering of the cohort shows two major clusters (Fig. 2). One of the main clusters contained the MSI-H samples (7 out of 8), whereas the other contained the MSS samples (8 out of 11). We noted no MSI-H in the MSS cluster, and vice versa. Additionally, the two biological duplicates (different areas of the same tumor) cluster tightly together (Samples 750 and 30497; Fig. 2), highlighting again the reproducibility of the pipeline.
These results suggest that, when optimized, modern clinical proteomics can produce consistent, high-quality data and that the tumor proteome reflects the MSI status.
Using the STRING v11 database, we investigated the connections of the proteins (32). We found no statistically significant enrichment in the 35 proteins overexpressed by the MSS tumors. In sharp contrast, the STRING analysis found a highly significant connection enrichment in the MSI-H overexpressed proteins (Fig. 3B; p-value < 10^-16). Furthermore, ~33% of the proteins (22 out of 65) were found to be linked to immune response processes (Fig. 3C), and specifically to the Interferon-γ-mediated signaling pathway (Fig. 3D) and antigen processing and presentation (FDR 7.9e-8). In agreement, the three top-ranked KEGG enrichment pathways were the proteasome (8 proteins), the phagosome (10 proteins) and antigen processing and presentation (8 proteins) (FDR p-value 3.2e-10, 4.06e-9 and 4.06e-9, respectively; full GO and KEGG lists are in supplemental Table S2).
Interestingly, the list included signal transducer and activator of transcription 1 (STAT1), a key transcription factor in many immunological processes, including Interferon-γ signaling (see below) (33-35). It is important to remember that this correlation is to the total amount of STAT1 and not to the phosphorylated (active) form of the STAT1 protein.
Taken together, we show that the proteins overexpressed by MSI-H tumors are highly enriched and related to the immune system. Specifically, we found a significant elevation of STAT1 levels, and proteins related to MHC-II and antigen presentation.
MSI-H Colorectal Tumor Cells, and Not the Stroma Cells, Aberrantly Express MHC-II Receptors-Our proteomics data lack morphological annotation, and it was not clear which component in the tissue overexpresses these proteins: the tumor or the nonmalignant stromal component, which includes many immune cells (36-38).
To differentiate between the possibilities, and to validate our results, we analyzed the spatial distribution of several protein candidates by immunohistochemistry (IHC). MHC-II proteins were chosen as they are not commonly expressed by epithelial cells, and because MSI-H tumors cells have been reported to be positive (39). We analyzed five MSS and four MSI-H tumors, including one MSI-H tumor that our proteomic analysis predicted it to be with medium MHC-II expression (sample 6288; Fig. 4A).
Our IHC assay confirmed that the stromal lymphocytes exhibit HLA-DPB1, HLA-DQB1 and CD74 expression in both MSS and MSI-H tumors, acting as an internal positive control. However, tumor cells with diffuse, cytoplasmatic and membranous positivity were found only in MSI-H samples (Fig. 4).
The IHC results indicate that the immunological signature is driven mainly by the tumor cells. Furthermore, it highlights a robust expression of MHC-II, the ligand of the immune checkpoint LAG-3, by the MSI-H tumor cells (40,41).
Only STAT1 Levels Are Linked to Interferon-γ Signaling-The human STAT protein family has 7 members (42). Upon Interferon-γ exposure, phospho-STAT1 and STAT3 increase in amount and translocate to the cell nucleus, where they act as transcription activators (43,44). However, the connection of total STAT levels to biological processes is mostly unknown (13,44). To identify the proteins that follow the expression patterns of the different STAT proteins, we computed a Pearson's correlation matrix at the individual protein level (1664 proteins).
STAT3 was reported to mediate Interferon-γ activation and immune checkpoint expression (45,46). Using the same correlation matrix, we found 54 proteins that clustered around STAT3. STRING analysis found interaction enrichment between them (enrichment p-value 2.5e-05) (supplemental Fig. S2). However, the biological processes related to STAT3 appear to be different from STAT1's, and no correlation to immunological pathways was noted (supplemental Table S3).
To expand our understanding of STAT protein-related signatures, we created a correlation matrix with 2345 proteins (quantified in 70% of samples). Using this correlation matrix, we were also able to investigate the STAT5 and STAT6 protein clusters. Although STRING analysis found statistically significant interactions within the STAT5 and STAT6 clusters (p-value < 1e-16 and 1.1e-14, respectively), they were linked to biological processes that were very different from STAT1's immune-related processes (supplemental Fig. S3 and supplemental Table S4).
We conclude that the correlation matrix of protein levels contains abundant data about important biological processes in CRC. We showed that the expression of each of the four STAT proteins is associated with different proteins and biological processes. Significantly, our data suggest that in vivo only STAT1 protein levels appear to correlate with the same immunological processes that differentiate MSI-H from MSS tumors.
Long, but Not Short, Exposure to Interferon-γ Differentiates STAT1 from STAT3-Although Interferon-γ is a key factor in MSI-H tumors (39,47,48), the changes Interferon-γ induces in CRC at the proteomic level are still poorly understood.
First, we cultured DLD1 and RKO, both microsatellite instable human CRC cell lines, in the presence of increasing amounts of Interferon-γ (supplemental Fig. S4). Interestingly, we observed an increase in the phospho- and total amounts of STAT1 and STAT3. We decided to continue with 10 ng/ml (50 units/ml).
Because of the data mentioned above, we investigated the temporal proteomic changes Interferon-γ induces in cancer cells, focusing particularly on the long-term changes. To gain a systematic view of the subject, we analyzed the proteomes of DLD1 cells exposed to Interferon-γ for a day or a week, and of control cells. Using label-free shotgun proteomics we quantified 3771 proteins in all samples (with at least 3 peptides), among them STAT1 and STAT3.
We computed the Pearson's correlation between the samples. We noted that the control and the IFN-γ 1-week groups showed the biggest difference, with the IFN-γ 1-day group in between (Fig. 6B). This suggested that the proteomes of the three groups are different, and that there are significant time-related changes. Indeed, unsupervised principal component analysis (PCA; component 1) matched the experimental design and identified 3 groups (Fig. 6C). We found that 9 proteins contributed the most to the PCA (score of more than 1; supplemental Table S5). Significantly, STAT1, TAPBP, TAP-2, HLA-A and HLA-C are among them. STRING analysis found significant interaction enrichment between them (enrichment p-value 1.1e-11). Moreover, antigen processing and presentation processes were highly enriched (supplemental Table S5), resembling the biological processes found to be enriched in the MSI-H versus MSS tumors.
Finally, we again used a Pearson's correlation matrix to investigate which proteins follow the STAT proteins' behavior. We found that the four STAT proteins grouped into one major cluster (containing 277 proteins out of 3771 tested), providing some indication that there is a connection between the STAT proteins. However, STRING reported no significant interaction among the proteins clustered around STAT2, STAT3 and STAT6 (supplemental Table S6), suggesting that they are not well connected to the changes Interferon-γ induces over time. In sharp contrast, STAT1 showed a highly enriched network (supplemental Table S5) (PPI enrichment p-value < 10^-16). Importantly, the proteins correlated to STAT1 were highly enriched for biological processes such as Interferon signaling and antigen processing and presentation (supplemental Table S6).

Interferon-γ Exposure Triggers Complex Temporal Proteome Changes-Our previous findings suggested that protein levels changed differently over time. Thus, we investigated all of the proteins showing a statistically significant change between any two groups (multiple-sample test, ANOVA FDR = 0.01, S = 0.1). Strikingly, we found that 16.1% of the proteome (608/3771 proteins) was expressed differentially.
Hierarchical clustering of these proteins clearly displayed the temporal changes and highlighted a complex temporal behavior. Two main clusters of proteins were identified, with 343 and 265 proteins, respectively (Fig. 7A; supplemental Table S7). However, within these clusters a complex pattern of expression can be seen.

DISCUSSION
Today's proteomics studies have the capacity to capture and analyze the proteome in great depth and accuracy (51). Indeed, significant improvements in MS instruments (52,53) and label-free quantification bioinformatics have been made (54). However, the use of formalin-fixed paraffin-embedded tissues, the most common clinical material in pathology departments, is still complicated. One of the main problems is the wide variability of formalin fixation times observed in the clinic (5).
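For readers less familiar with the multiple-sample test, the sketch below computes the one-way ANOVA F statistic for a single protein across the three experimental arms. It is a simplified illustration: the function name and intensity values are invented, and the permutation-based FDR control and the S parameter applied in the actual analysis are not reproduced.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, means))
    # Variation of the replicates around their own group mean.
    ss_within = sum((value - mean) ** 2
                    for group, mean in zip(groups, means)
                    for value in group)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Invented log2 intensities for one protein: control, 1 day, 1 week.
f_stat = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F value indicates that at least one arm differs from the others, which matches the "change between any two groups" criterion described above.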
For this reason, we optimized a novel proteome extraction method suitable for wide fixation times. Furthermore, sample preparation steps were simplified to increase performance, and the elevated temperature and pressure seem to solubilize proteins very efficiently. Our TOP method uses only nonhazardous, environmentally friendly reagents, as opposed to the traditional method, which uses organic solvents. As a result, proteome extraction is done in about 3 h after starting from FFPE cuts, without the need for a fume hood.
Most researchers divide CRC into two main groups, MSS and MSI-H tumors (30). Indeed, the FDA recently approved immuno-oncology treatments for MSI-H tumors across multiple indications in the first-ever tumor type-agnostic manner (55), suggesting a significant role for the immune system in MSI-H cancers.
Here we demonstrate that optimized clinical proteomics pipeline might help to generate novel fundamental biological insights. Using the TOP method and only nineteen FFPE archival tissues, we were able to identify and quantify over 4100 proteins, of which 1664 were identified in all samples. Unsupervised hierarchical clustering of our data followed the MSS and MSI-H groups, pointing that the proteomes of MSS and MSI-H groups are different.
Further investigation showed that most of the proteins overexpressed by MSI-H tumors are connected to numerous immune system-related processes, among them antigen processing and presentation. Importantly, we noted the overexpressed proteins in most MSI-H patients, suggesting the presence of a preserved biological program. Using immunohistochemistry, we were able to show that the tumor cells, and not the stroma, are responsible for the aberrant expression of MHC-II, a ligand of the immune checkpoint protein LAG-3.
Significantly, we noted that STAT1 is overexpressed by the MSI-H tumor cells. Further insight into STAT1's role came from the protein correlation matrix. We were able to show that in vivo only STAT1 levels, but not levels of other STAT proteins, are correlated to these immunological processes.
FIG. 6. STAT1, but not STAT3, levels continue to rise with long IFN-γ exposure. A, Western blot analysis of the DLD1 cell line; STAT1 and STAT3 levels. Total STAT1 increases with time, whereas total STAT3 seems to plateau after a day of exposure. B, The DLD1 proteome Pearson's correlation matrix. C, PCA (comp. 1) clearly separates the three experimental arms (green, pink and red rectangles denote control, 1-day and 1-week interferon exposure, respectively). D, The proteins that contribute the most to the PCA scoring (marked in red). STAT1 is marked.

These observations were confusing because STAT1 is considered a tumor suppressor in CRC (43). This view stems from a few observations. First, STAT1 was linked to anti-proliferative and pro-apoptotic signaling pathways (56). Specifically, in CRC, nuclear STAT1 was shown to be a good prognostic marker (14), whereas high levels of cytoplasmic STAT1 correlated with shorter survival in early-stage CRC, particularly in the microsatellite instability cases (57). In sharp contrast, STAT1 was shown to be an immune tumor promoter in leukemia development. Importantly, STAT1-null tumor cells showed enhanced natural killer cell lysis because of their low expression of antigen presentation molecules. Furthermore, upon leukemia progression the STAT1-null cells acquired an increased amount of MHC-I (58).
To better understand the roles of Interferon-γ and STAT proteins in CRC, we turned to an in vitro system. Our hypothesis was that MSI-H tumor cells evolve under immunological pressure for long periods. Furthermore, we know that Interferon-γ is important for this system, and that all nucleated cells express Interferon receptors (59). Thus, we decided to investigate the poorly understood temporal proteomic changes induced by Interferon-γ.
PCA of the DLD1 proteomes showed that the proteome changes reflect the length of exposure to Interferon-γ.
FIG. 7. Supervised hierarchical clustering of all differentially expressed proteins (608 proteins). The supervised hierarchical clustering clearly highlights that a significant change in the proteome occurs after a week of exposure to IFN-γ, reflected by the segregation of the 1-week IFN-γ exposure from the other two groups. However, complex temporal behavior of the proteome is clear, especially in the proteins up-regulated by interferon (green cluster).
Significantly, by PCA we identified STAT1 and several other proteins that correlate with these changes. Further substantiating evidence came from the protein correlation matrix of the in vitro experiment. As in the patients, antigen presentation, the proteasome, and various immunological processes were highly enriched. In sharp contrast, the proteins that clustered around the other STAT proteins showed no evidence of interactions, nor linkage to immune-related processes. The fact that these in vitro results emulate the in vivo results substantiates the linkage of STAT1 levels with the time-dependent Interferon-γ proteomic changes and the immunological program seen in MSI-H CRC tumors.
Our data indicate that in CRC, the role of STAT1 is complex and perhaps pathogenesis dependent. The MSI-H neoplastic cells abnormally express the MHC-II complex, and STAT1 modulates the transcription of CIITA, a key inducer of MHC-II expression by Interferon-γ (60,61). The MHC-II complex is a known ligand of the immune checkpoint lymphocyte activation gene-3 (LAG-3, CD223). Furthermore, we observed a high correlation between the mRNA of STAT1 and those of LAG-3 and PD-L1 (TCGA CRC data, data not shown). Taken together, our data suggest a link between high STAT1 levels and immune evasion, which in turn supports the idea that STAT1 is an immune tumor promoter in MSI-H CRC, but not in MSS CRC. However, further studies are needed to substantiate this point.
In summary, clinical proteomics has matured significantly, and optimized pipelines will continue to improve our understanding of human diseases. Using these tools we show that MSI-H CRC tumors express high levels of STAT1. Our findings point toward a preserved, constant primary immune escape mechanism of MSI-H CRC tumors. We hope that our findings will enable further drug development.

DATA AVAILABILITY
The MS proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the data set identifiers PXD019252, PXD019254, and PXD017821.
Funding and additional information-The study was supported by the Israel Cancer Association through the ICA USA, the Israel Science Foundation, and German-Israeli Foundation grants to G.W.V.; a kind donation of the Rosetrees Trust to G.W.V.; and a German-Israeli Foundation grant to E.D.V.