Genetic Profile and Functional Proteomics of Anal Squamous Cell Carcinoma: Proposal for a Molecular Classification

Anal squamous cell carcinoma is a rare tumor, and its standard treatment has not improved since the 1970s. For this reason, a molecular characterization of the disease is still necessary. In this work, two molecular groups with different protein and genetic profiles were described. Additionally, some of these differences pointed to biological processes that may be therapeutic targets.
Highlights: Two molecular groups in anal squamous cell carcinoma according to proteomic profile. Differences in possibly targetable processes such as metabolism or immune response. Different percentage of tumor lymphocyte infiltration. Difference in the frequency of ATM variants, related to PARP inhibitors.
Anal squamous cell carcinoma is a rare tumor. Chemo-radiotherapy yields a 50% 3-year relapse-free survival rate in advanced anal cancer, so improved predictive markers and therapeutic options are needed. High-throughput proteomics and whole-exome sequencing were performed in 46 paraffin samples from anal squamous cell carcinoma patients. Hierarchical clustering was used to establish groups de novo. Then, probabilistic graphical models were used to study the differences between groups of patients at the biological process level. A molecular classification into two groups of patients was established: one group with increased expression of proteins related to adhesion, T lymphocytes and glycolysis, and the other group with increased expression of proteins related to translation and ribosomes. The functional analysis by the probabilistic graphical model showed that these two groups presented differences in metabolism, mitochondria, translation, splicing and adhesion processes. Additionally, these groups showed different frequencies of genetic variants in some genes, such as ATM, SLFN11 and DST. Finally, the genetic and proteomic characteristics of these groups suggested the use of some possible targeted therapies, such as PARP inhibitors or immunotherapy.

years, new therapeutic strategies are needed to improve these outcomes.
Whole-exome sequencing (WES) focused on the identification of disease-causing genes is now being implemented in clinical practice (4). The first study reporting the sequencing of entire exomes was published by Ng et al. (5). Since then, personalized medicine has focused on identifying the causes of rare diseases and cancers.
With the recent improvements in mass-spectrometry (MS) techniques, high-throughput proteomics has made it possible to identify thousands of proteins (6). Proteins are the effectors of biological processes and are closer to the phenotype than genes or transcripts. In addition, probabilistic graphical models (PGMs) have been used successfully in previous studies to characterize tumors from a functional perspective (7)(8)(9). Moreover, when used in combination, proteomics and genomics provide complementary information.
Previous studies in ASCC focused on the characterization of genetic variants in this disease using next-generation sequencing techniques. The most frequently mutated genes, such as PIK3CA, FBXW7, FAT1 or ATM, were characterized (10-13). In addition, Herfs et al. used MS proteomics in microdissected anal samples to establish differential protein expression patterns depending on the location (squamous or transitional) (14). However, to date, a molecular classification of ASCC has not been established. In this study, we combined WES with high-throughput proteomics to further characterize a cohort of 46 ASCC tumors. This is the first time that a combined study of these characteristics has been done in ASCC. Genomics provides information about the genetic causes of disease, and proteins are the ultimate effectors of biological processes. Therefore, a study of these two -omics allows us to obtain a broader picture of the molecular features of ASCC tumors.

EXPERIMENTAL PROCEDURES
Experimental Design and Statistical Rationale-Forty-six paraffin samples from ASCC patients were analyzed by whole-exome sequencing and by mass-spectrometry proteomics. Neither replicate analyses nor control normal tissue samples were necessary because of the nature of the samples and the objectives of the study. Because a large cohort of clinical samples was used, replicates were not necessary. In addition, this study focused on the molecular characterization of the disease rather than on the mechanisms of carcinogenesis (for which a comparison of normal and tumor tissue is necessary); therefore, normal tissue was not used as a control.
Patient Cohort-Forty-six formalin-fixed, paraffin-embedded (FFPE) samples from patients diagnosed with ASCC, obtained before any treatment, were analyzed by WES and MS proteomics. The study was approved by the Ethical Committee of Hospital Universitario La Paz. Informed consent was obtained from all patients in the study. Samples were reviewed by an experienced pathologist and all the samples included at least 70% invasive tumor cells. Patients were required to have a histologically confirmed diagnosis of ASCC; be 18 years of age or older; have an Eastern Cooperative Oncology Group performance status (ECOG-PS) of 0 to 2; have received no prior radiotherapy or chemotherapy for this malignancy; and present with no metastasis. Demographic information and information related to the tumor and the treatments was collected. Human papillomavirus (HPV) infection was determined by CLART® HPV2 (Genomica, Madrid, Spain).
DNA Isolation-One 10 µm section from each FFPE sample was deparaffinized and DNA was extracted using the GeneRead DNA FFPE Kit (Qiagen, Hilden, Germany), in accordance with the manufacturer's instructions. Once eluted, the DNA was frozen at −80 °C until use.
Protein Isolation-Ten to thirty slides of 3 µm (depending on the tumor surface area) were used for protein isolation. Proteins were extracted from FFPE samples as previously described (15). Briefly, FFPE sections were deparaffinized in xylene and washed twice with absolute ethanol. Protein extracts were prepared in a 2% SDS buffer by a protocol based on heat-induced antigen retrieval. Protein concentration was measured using the MicroBCA Protein Assay Kit (Pierce-Thermo Scientific, Massachusetts). Protein extracts (10 µg) were digested with trypsin (1:50) and SDS was removed from the digested lysates using Detergent Removal Spin Columns (Pierce). Peptides were desalted using self-packed C18 stage tips, dried and resolubilized with 10 µL of 3% acetonitrile, 0.1% formic acid.
Whole-exome Sequencing Experiments-WES of 46 ASCC FFPE samples was performed. The isolated DNA was quantified by Picogreen and its mean size was checked by gel electrophoresis. Genomic DNA was fragmented by mechanical methods (Bioruptor) to a mean size of ~200 bp. Then, DNA samples were repaired, phosphorylated, A-tailed and ligated to specific adaptors, followed by PCR-mediated labeling with Illumina-specific sequences and sample-specific barcodes (Kapa DNA library generation kit).
Exome capture was performed using the VCRome system (capture size of 37 Mb, Nimblegen, Roche, Switzerland) under a multiplexing of 8 samples per capture reaction. Capture was performed strictly in accordance with the manufacturer's instructions. After capture, libraries were purified, quantified and titrated using Real Time PCR before sequencing. Samples were then sequenced to an approximate coverage of 4.5 Gb per sample on an Illumina-NextSeq NS500 (Illumina Inc., Cambridge, UK) using 150 cycles (2 × 75) High Output cartridges.
Raw data files are available in the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) under the name PRJNA573670.
Bioinformatics Analyses of Whole-exome Sequencing Data-The quality of the WES experiments was verified using FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). First, primers were removed using Cutadapt. Then, FASTQ files were filtered by quality using PrinSeq. Both tools are available in the GPRO tool (16). Sequence alignment was performed using the human genome hg19 as the reference and BWA tools (17), Samtools (18) and Picard Tools (http://picard.sourceforge.net). Variant calling was performed using the MuTect tool from the GATK4 package (19) combined with Picard Tools, first to create a panel of normal samples (PON) and second to perform the variant calling (20). The PON was built using 11 samples of Iberian exomes from the 1000 Genomes database (http://www.ncbi.nlm.nih.gov/sra/).
1 The abbreviations used are: ASCC, anal squamous cell carcinoma; WES, whole-exome sequencing; MS, mass-spectrometry; 5FU, 5-fluorouracil; DFS, disease-free survival; FFPE, formalin-fixed paraffin-embedded; ECOG-PS, Eastern Cooperative Oncology Group performance status; PON, panel of normal samples; VEP, variant effect predictor; PGM, probabilistic graphical model; SAM, significance analysis of microarrays; HPV, human papillomavirus; PARPi, PARP inhibitors.
Finally, variants were annotated using Variant Effect Predictor (VEP) (21). The information about the genetic variants provided by VEP was used to filter them. The filtering criteria were: a frequency in the general population, according to the gnomAD database, of less than 1%; a high or moderate impact; and the presence of a variant of the gene in at least 10% of the patients in our cohort.
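The three filtering criteria (population frequency below 1%, high or moderate impact, and recurrence in at least 10% of the cohort) can be illustrated with a minimal Python sketch. The record layout and field names (`patient`, `gene`, `impact`, `gnomad_af`) are hypothetical and are not the authors' pipeline; treating a missing gnomAD frequency as zero is also an assumption.

```python
from collections import defaultdict

def filter_variants(variants, n_patients, max_pop_freq=0.01,
                    impacts=("HIGH", "MODERATE"), min_patient_fraction=0.10):
    """Return {gene: set of patient ids} for genes that pass all three criteria."""
    patients_by_gene = defaultdict(set)
    for v in variants:
        # Assumption: a variant absent from gnomAD (None) is treated as frequency 0.
        freq = v.get("gnomad_af") or 0.0
        if freq < max_pop_freq and v["impact"] in impacts:
            patients_by_gene[v["gene"]].add(v["patient"])
    # Keep genes carrying a qualifying variant in at least 10% of the patients.
    return {gene: patients for gene, patients in patients_by_gene.items()
            if len(patients) / n_patients >= min_patient_fraction}

# Hypothetical toy records, for illustration only.
example_variants = [
    {"patient": "P01", "gene": "ATM", "impact": "MODERATE", "gnomad_af": 0.0002},
    {"patient": "P02", "gene": "ATM", "impact": "HIGH", "gnomad_af": None},
    {"patient": "P03", "gene": "ATM", "impact": "MODERATE", "gnomad_af": 0.004},
    {"patient": "P01", "gene": "GENE_X", "impact": "MODERATE", "gnomad_af": 0.20},  # too common
    {"patient": "P02", "gene": "GENE_Y", "impact": "LOW", "gnomad_af": 0.0001},     # low impact
    {"patient": "P04", "gene": "GENE_Z", "impact": "HIGH", "gnomad_af": 0.0005},    # 1 of 20 patients
]
recurrent = filter_variants(example_variants, n_patients=20)
```

In this toy input only ATM survives, because the other genes fail one criterion each.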
Liquid Chromatography-Mass Spectrometry Analysis-MS analysis was performed using a Q Exactive HF-X mass spectrometer (Thermo Scientific) equipped with a Digital PicoView source (New Objective) and coupled to an M-Class UPLC (Waters, Massachusetts). Solvent composition at the two channels was 0.1% formic acid for channel A and 0.1% formic acid, 99.9% acetonitrile for channel B. For each sample, 3 µL of peptides were loaded on a commercial MZ Symmetry C18 Trap Column (100 Å, 5 µm, 180 µm × 20 mm, Waters) followed by a nanoEase MZ C18 HSS T3 Column (100 Å, 1.8 µm, 75 µm × 250 mm, Waters). The peptides were eluted at a flow rate of 300 nL/min by a gradient from 8 to 27% B in 85 min, 35% B in 5 min and 80% B in 1 min. Samples were acquired in a randomized order. The mass spectrometer was operated in data-dependent acquisition (DDA) mode, acquiring full-scan MS spectra (350-1400 m/z) at a resolution of 120,000 at 200 m/z after accumulation to a target value of 3,000,000, followed by HCD (higher-energy collision dissociation) fragmentation on the twenty most intense signals per cycle. HCD spectra were acquired at a resolution of 15,000 using a normalized collision energy of 28 and a maximum injection time of 22 ms. The automatic gain control (AGC) was set to 100,000 ions. Charge state screening was enabled. Singly charged precursors, precursors with unassigned charge states, and charge states higher than seven were rejected. Only those precursors with an intensity above 110,000 were selected for MS/MS. Precursor masses previously selected for MS/MS measurement were excluded from further selection for 30 s, and the exclusion window was set at 10 ppm. The samples were acquired using internal lock mass (22) calibration on m/z 371.1012 and 445.1200.
The MS proteomics results were handled using the local laboratory information management system (LIMS) (22) and all relevant data have been deposited to Chorus under the project name "Anal squamous cell carcinoma proteomics, Project ID: 1578."
Protein Identification and Label-free Protein Quantification-The acquired raw MS data were processed by MaxQuant (version 1.6.2.3), followed by protein identification using the integrated Andromeda search engine (23). Spectra were searched against a Uniprot reference proteome (taxonomy 9606, canonical version from 2016-12-09, 20,913 entries), concatenated to its reversed decoy FASTA database and common protein contaminants. Methionine oxidation and N-terminal protein acetylation were set as variable modifications. No fixed modifications were used. Enzyme specificity was set to trypsin/P, allowing for a minimal peptide length of 7 amino acids and a maximum of two missed cleavages. MaxQuant Orbitrap default search settings were used. Mass tolerance for precursor and fragment ions was fixed to 20 ppm. The maximum false discovery rate (FDR) was set to 0.01 for peptides and 0.05 for proteins. Label-free quantification was enabled and a 2 min window for match between runs was applied. In the MaxQuant experimental design template, each file was kept separate to obtain individual quantitative values.
As quality criteria, a detectable measurement in at least 75% of the samples and the presence of at least two unique peptides were required. Data were log2-transformed and missing values were imputed from a normal distribution using Perseus software (24).
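The quality filter, log2 transformation and imputation steps can be sketched in plain Python as follows. This is only an illustration, not the Perseus implementation: the imputation parameters (`width=0.3`, `downshift=1.8`, applied per sample) are the commonly cited Perseus defaults and are assumed here because the text does not state them, and the input layout is hypothetical.

```python
import math
import random
import statistics

def filter_and_impute(intensities, unique_peptides, min_valid_fraction=0.75,
                      min_unique_peptides=2, width=0.3, downshift=1.8, seed=0):
    """intensities: {protein: [intensity or None per sample]} -> {protein: [log2 values]}."""
    rng = random.Random(seed)
    n_samples = len(next(iter(intensities.values())))
    kept = {}
    for prot, values in intensities.items():
        valid = sum(1 for x in values if x is not None and x > 0)
        # Quality criteria: measured in >= 75% of samples and >= 2 unique peptides.
        if (valid / n_samples >= min_valid_fraction
                and unique_peptides[prot] >= min_unique_peptides):
            kept[prot] = [math.log2(x) if x is not None and x > 0 else None
                          for x in values]
    for j in range(n_samples):
        observed = [vals[j] for vals in kept.values() if vals[j] is not None]
        if len(observed) < 2:
            raise ValueError("need at least two observed values per sample to impute")
        mu, sd = statistics.mean(observed), statistics.stdev(observed)
        for vals in kept.values():
            if vals[j] is None:
                # Draw from a narrowed normal distribution shifted toward low intensities.
                vals[j] = rng.gauss(mu - downshift * sd, width * sd)
    return kept

# Hypothetical toy data: four samples, five proteins.
toy_intensities = {
    "A": [1024, 2048, None, 4096],   # 3 of 4 valid: kept, one value imputed
    "B": [None, None, 512, 256],     # 2 of 4 valid: dropped
    "C": [256, 512, 1024, 2048],     # only one unique peptide: dropped
    "D": [128, 256, 512, 1024],
    "E": [64, 128, 256, 512],
}
toy_peptides = {"A": 3, "B": 4, "C": 1, "D": 2, "E": 5}
processed = filter_and_impute(toy_intensities, toy_peptides)
```

Fixing the random seed keeps the imputed values reproducible between runs.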
Statistical Analyses-Statistical analyses were performed in GraphPad Prism 6 and IBM SPSS Statistics 20. Network analyses were performed using Cytoscape software (25). Hierarchical cluster and Significance Analysis of Microarrays (SAM) were performed using MeV software (26). First, all the identified proteins were used to build a hierarchical cluster based on Pearson correlation. In this hierarchical cluster, two different groups were identified. Then, a Significance Analysis of Microarrays was used to determine those proteins that were differentially expressed between the two identified groups of patients. SAM analysis allows the identification of differential proteins between groups by a t test corrected by permutations over the number of samples. The significance was determined using the False Discovery Rate (FDR) (27). The Genomics of Drug Sensitivity in Cancer database (https://www.cancerrxgene.org/) was used to find possible therapeutic targets. p values are two-sided and considered statistically significant under 0.05.
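As an illustration of the clustering step, the sketch below groups samples by agglomerative clustering on a Pearson correlation distance (1 − r) and stops at two groups. The analysis itself was run in MeV; the average-linkage rule and the toy profiles used here are assumptions, since the text does not specify the linkage method.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cluster_samples(profiles, n_groups=2):
    """profiles: {sample: [protein values]} -> list of sets of sample names."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [{name} for name in names]
    while len(clusters) > n_groups:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                d = (sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                     / (len(clusters[i]) * len(clusters[j])))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Hypothetical toy profiles: two samples with rising values, two with falling values.
toy_profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [1.1, 2.2, 2.9, 4.3],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [4.2, 2.8, 2.1, 0.9],
}
groups = cluster_samples(toy_profiles)
```

With a correlation-based distance, samples are grouped by the shape of their protein profile rather than by absolute abundance.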
Tumor Infiltrating Lymphocyte Quantification-For the quantification of tumor infiltrating lymphocytes (TILs), hematoxylin-eosin preparations were evaluated for those samples in which they were available. The percentage of TILs was quantified in a blinded manner by an observer who did not know the clinical data or the proteomic group classification.
Samples were classified according to the percentage of lymphocytes counted in these preparations into two groups: 0-25% and 25-75% of TILs (28). None of the samples presented a percentage of TILs higher than 75%.
Network Construction and Functional Node Activity-With the aim of studying proteomics data from a functional point of view, probabilistic graphical models (PGMs) compatible with high-dimensional data were used. Briefly, gRapHD (29) and R v3.2.5 were used to generate the PGM. For the construction of this PGM, proteomics expression data without any a priori information were used, and correlation was employed as the associative measurement. The PGM network was built in two steps: first, the spanning tree with maximum likelihood was found and, then, edges were added based on the reduction of the Bayesian Information Criterion (BIC) and the preservation of the decomposability of the graph (30). The resulting network was analyzed to define a functional structure by gene ontology analyses, as in previous works (7)(8)(9). Briefly, the network was split into branches. These branches were analyzed by gene ontology analyses, which allowed us to assign an overrepresented function to each branch, thereby determining different functional nodes in the network. Gene ontology analyses were performed using the DAVID 6.8 webtool (31) using "homo sapiens" as background and GOTERM-FAT, Biocarta and KEGG as categories.
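The first of the two construction steps, finding the maximum-likelihood spanning tree, can be sketched as follows. For Gaussian data this tree is the maximum-weight spanning tree in which each edge is weighted by −0.5·log(1 − r²), where r is the Pearson correlation (the Chow-Liu construction). The code below is a plain-Python illustration using Prim's algorithm, not the gRapHD implementation, and it omits the second step (BIC-guided edge addition that preserves decomposability). The toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_likelihood_tree(data):
    """data: {protein: [values across samples]} -> list of (protein_a, protein_b) edges."""
    names = list(data)

    def weight(a, b):
        r = pearson(data[a], data[b])
        r2 = min(r * r, 1.0 - 1e-12)      # guard against log(0) when |r| == 1
        return -0.5 * math.log(1.0 - r2)  # Gaussian mutual information

    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Prim's algorithm: add the heaviest edge that leaves the current tree.
        _, a, b = max((weight(a, b), a, b)
                      for a in in_tree for b in names if b not in in_tree)
        edges.append((a, b))
        in_tree.add(b)
    return edges

# Hypothetical toy data: A and B co-vary closely, and so do C and D.
toy_data = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "C": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "D": [2.1, 0.9, 4.2, 2.8, 6.1, 5.0],
}
tree_edges = max_likelihood_tree(toy_data)
```

Because the tree has no cycles, removing an edge separates it into branches, which is what makes a branch-wise functional annotation possible.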
Once each branch had been assigned a function, functional node activities were calculated as the mean expression of the proteins of each branch related to the main function of that branch (8,9). Then, the groups were compared using the Mann-Whitney test.
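A functional node activity and its comparison between groups can be sketched as below. The activity is the per-sample mean of the proteins assigned to the node. The Mann-Whitney p value uses a normal approximation without tie or continuity correction, which is a simplification and may differ slightly from the values reported by statistical packages. All names and data are illustrative.

```python
import math

def node_activity(expression, node_proteins):
    """Per-sample mean expression of the proteins assigned to one functional node."""
    n_samples = len(next(iter(expression.values())))
    return [sum(expression[p][j] for p in node_proteins) / len(node_proteins)
            for j in range(n_samples)]

def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p value from a normal approximation)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # tied values share the mean of ranks i+1..j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical toy example: two proteins in a node, measured in two samples.
toy_expression = {"P1": [1.0, 2.0], "P2": [3.0, 4.0]}
activity = node_activity(toy_expression, ["P1", "P2"])
u_low, p_low = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

Summarizing each node by a single value per sample reduces the comparison to one test per functional node.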
Metabolic Modeling and Estimation of Tumor Growth Rate-Flux Balance Analysis is a method used to model the flow of metabolites through biochemical networks (32). It allows the growth rate or production rate of a given metabolite to be estimated using as input gene or protein expression data. In this study we used the whole human reconstruction Recon2 and the biomass reaction included in this model as the objective function and as representative of tumor growth (33). Proteomics data was introduced into the model to make accurate predictions by solving Gene-Protein-Reaction rules (GPRs), which contain the relationships between genes and enzymes, using a modified algorithm of Barker et al. (34) and a modified E-flux (7,35). Flux Balance Analysis calculations were performed using the COBRA Toolbox library, available for MATLAB (36).
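In its generic form, Flux Balance Analysis is a linear program. The formulation below is a standard textbook sketch rather than the exact model used here: the symbols are introduced for illustration, with S the stoichiometric matrix of the reconstruction, v the vector of reaction fluxes, and l_j and u_j the lower and upper bounds of reaction j. In E-flux-type approaches, the bounds of each reaction are scaled according to the expression value obtained from its GPR rule; the precise scaling of the modified algorithm is not specified in this text.

```latex
\begin{aligned}
\max_{v} \quad & v_{\mathrm{biomass}} \\
\text{subject to} \quad & S\,v = 0, \\
& l_j \le v_j \le u_j \quad \text{for every reaction } j .
\end{aligned}
```

The steady-state constraint S v = 0 states that each internal metabolite is produced and consumed at the same rate, and the optimal value of the biomass flux is taken as the estimate of tumor growth rate.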

RESULTS
Patient Cohort-Forty-six patients diagnosed with nonmetastatic ASCC were recruited for this study. Twenty-eight patients came from the VITAL clinical trial (GEMCAD-09-02, NCT01285778), treated with panitumumab, 5FU, Mitomycin C, and radiotherapy. The other 18 patients were included from the routine clinical practice at Hospital Universitario La Paz and Hospital Clinic and were treated with cisplatin-5FU or Mitomycin C-5FU, and concomitant radiotherapy.
For the survival analyses, four patients who could not receive chemo-radiotherapy were excluded (two of them had stage I anal carcinomas, and the other two had stage III tumors). The median follow-up was 33.18 months (5.53-116.4) and there were 13 relapse events. All clinical characteristics are shown in Table I.
Whole-exome Sequencing Experiments-Forty-six FFPE samples were analyzed by WES. The mean coverage of the samples was 42.6x, except for one sample that presented a coverage of 3.57x. This sample was excluded from the subsequent analyses. Once this sample was excluded, all the samples presented a mapping efficiency of 90-98%, except for one sample (with a mapping efficiency of 75.4%). The human exome has > 195,238 exonic regions, of which only 23,021 (11.21% of the human exome) were not mapped in any sample.
After VEP analysis and filtering, 382 genes that presented a genetic variant with high or moderate impact in at least 10% of our cohort were identified (supplemental Table S1). These genes were mostly related to DNA repair, chromatin binding and focal adhesion processes. PIK3CA was mutated in 40% of the patients of our cohort, FBXW7 in 16%, FAT1 in 18%, and ATM in 27%. Fig. 1 summarizes the landscape of high and moderate impact alterations in our ASCC cohort.
Proteomics Experiments-After excluding one sample in the WES experiments, 45 FFPE samples were analyzed by MS and 6,035 proteins were identified. After applying quality criteria (detectable measurement in at least 75% of the samples and at least two unique peptides), 1,954 proteins were used for the subsequent analyses (supplemental Table S2).
De Novo Identification of Groups Based on Differential Proteomics Profiles-With the aim of defining de novo molecular groups of patients, a hierarchical cluster was used. Two different molecular groups of patients were obtained based on their protein profiles (supplemental Fig. S1). After the identification of the two groups of patients, a Significance Analysis of Microarrays (SAM) was performed to define the differential proteins between these two groups, yielding 318 proteins that were differentially expressed between them (supplemental Table S3). Group 1 showed underexpression of proteins related to translation and ribosomal processes and overexpression of proteins related to metabolism (especially glycolysis), T lymphocytes, and adhesion. Conversely, Group 2 showed underexpression of proteins related to metabolism, T lymphocytes, and adhesion processes and overexpression of proteins related to translation and ribosomes (Fig. 2).
With respect to the distribution of clinical data, the two groups were comparable; there were no significant differences in the distribution of clinical parameters (supplemental Table S4). In addition, there were no significant differences in disease-free survival or overall survival (supplemental Fig. S2).
A search in the Genomics of Drug Sensitivity in Cancer database (https://www.cancerrxgene.org/) suggested RAC1 (overexpressed in Group 1 and underexpressed in Group 2, supplemental Fig. S3) as a possible therapeutic target. The drug associated with this gene is EHT-1864.
Tumor Lymphocyte Infiltration-The two proteomics groups of patients presented differential expression of proteins related to T lymphocytes. For this reason, TILs were quantified in each sample to establish whether a relationship exists between the expression of these proteins and tumor lymphocyte infiltration. It was possible to quantify TILs in 39 of the 45 ASCC samples. In Group 1, 13 samples presented a percentage of TILs between 0 and 25%, and 11 samples showed a percentage of TILs between 25 and 75%. In Group 2, 13 samples showed an infiltration between 0 and 25% and only two samples presented a percentage of TILs between 25 and 75% (Fig. 3). Therefore, Group 2, which is the group that underexpressed proteins related to T lymphocytes, presented a lower percentage of TILs in its samples.
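The counts above form a 2 × 2 table (Group 1: 13 low and 11 high; Group 2: 13 low and 2 high). One way to examine such a table is Fisher's exact test, sketched below in plain Python. The text does not state which test, if any, was applied to these counts, so this is an illustrative assumption rather than the authors' analysis.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(a)
    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    # Sum every table that is as probable as, or less probable than, the observed one.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Rows: Group 1 and Group 2; columns: 0-25% TILs and 25-75% TILs.
p_til = fisher_exact_two_sided(13, 11, 13, 2)
```

An exact test is a reasonable choice here because one cell of the table contains only two samples.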
Functional Characterization of Proteomics Data-To study proteomics data from a functional perspective, a probabilistic graphical model network was created using the 1,954 proteins obtained from the MS experiments with no other a priori information. The resulting network was examined for functional structure and was divided into 10 functional nodes, one of them with an overrepresentation of two biological functions (metabolism and mitochondria) (Fig. 4). Then, functional node activities were used, as in previous works (7)(8)(9), to study the differences in biological processes between the two identified groups of patients. There were significant differences between the two groups in the membrane category, in the two functional nodes related to metabolism, and in the functional nodes associated with adhesion, ribosomes, translation, extracellular matrix and splicing (Fig. 5). This analysis offered information complementary to that of the classical analyses.
Metabolism Nodes-We found two different nodes related to metabolism, both of which showed a higher expression in Group 1. The metabolism 1 node was formed by 158 proteins, mostly related to mitochondrial metabolism, especially the tricarboxylic acid cycle. Among them, three were also identified as differentially expressed by the SAM analysis: P09622 (DLD), P06744 (GPI) and P14550 (AKR1A1). The metabolism 2 node included 104 proteins related to mitochondria and metabolism, especially oxidative phosphorylation, such as P04406 (GAPDH), P06733 (ENO1), P07954 (FH), or Q9UI09 (NDUFA12).
Adhesion Node-The adhesion node showed a higher expression in Group 1. The adhesion node included 654 proteins, 25 of them classified by SAM as differentially expressed between the two groups of patients.

Genetic Variants with Different Frequencies Between the Two Groups of Patients Established by Proteomics-The frequencies of genetic variants for each gene were compared to determine whether the groups of patients defined using proteomics data also showed differences in genetic variants. In general, the distribution of the mutations of each gene across the two proteomic groups was homogeneous. Only 12 genes presented different frequencies of genetic variants between the two groups (Fig. 6). Notably, ATM, which presented missense variants, was among them.
Tumor Growth Rate Predicted by Metabolic Modeling-Finally, Flux Balance Analysis allows for the comparison of the estimated tumor growth rate between groups of tumors. The tumor growth rate predicted for Group 1 was significantly higher than the tumor growth rate predicted for Group 2 (Fig. 7). Therefore, Group 1 tumors seem to be more proliferative than tumors of Group 2.

DISCUSSION
ASCC is an infrequent tumor. With no targeted therapy yet established, the molecular characterization of these tumors is still necessary. In this study, we combined the two main -omics, WES and proteomics, to further characterize a cohort of 46 patients diagnosed with primary ASCC. To our knowledge, this is the first study combining WES and proteomics in ASCC. The results of this study allow us to establish two molecular subgroups in ASCC with different molecular features. Moreover, the analyses of these two groups pointed out some drug-susceptible processes, such as metabolism, and suggested other possible therapeutic targets, like ATM and its relationship with PARP inhibitors (PARPi).
Previous studies have analyzed both primary and metastatic ASCC paraffin samples using WES or gene panels (10-13). A previous study used proteomics data to characterize different locations of anal cancer (14). However, this is the first study that performs proteomics experiments in localized ASCC samples and combines proteomics data with WES information. Previous WES studies served to identify frequently occurring mutations in this disease, including mutations in the PIK3CA, FBXW7, FAT1 and ATM genes (11)(12)(13). In our cohort, PIK3CA presented a genetic variant with a high or moderate impact in 40% of the patients, FBXW7 in 16%, FAT1 in 18%, and ATM in 27% of the patients. The mutation landscape identified in these tumors was mostly related to DNA repair and chromatin processes.
In addition, using proteomics data and hierarchical clustering, it was possible to establish two different molecular groups of patients. Differential proteins were mainly related to the metabolism of glucose, translation and ribosomes, tumor lymphocytes and adhesion. Although these molecular groups have not been associated with any clinical or prognostic features, these processes may be relevant in the development of new therapeutic strategies. For instance, those tumors that overexpressed proteins related to glycolysis may be candidates for drugs targeting metabolism such as metformin, which has been shown to have cytostatic effects on other tumors, such as breast or bladder carcinoma (7). Moreover, one of the main differences between these two groups is that Group 1 had a higher expression of proteins related to T lymphocytes. With the rise of immunotherapies, immune proteins have acquired great relevance. Therefore, this group of patients may be good candidates for immunotherapy. In fact, nivolumab has been reported to be an effective therapy in metastatic ASCC and its efficacy is related to the presence of cytotoxic T cells (37). Moreover, pembrolizumab has demonstrated antitumor activity in PD-L1-positive advanced ASCC (38). Strikingly, the two proteomics groups presented a different infiltration of lymphocytes, with Group 2 (which underexpressed proteins related to T lymphocytes) showing a lower percentage of TILs in its samples.
Furthermore, the search in the Genomics of Drug Sensitivity in Cancer database to establish possible therapeutic targets suggested RAC1 (overexpressed in Group 1 and underexpressed in Group 2) as a potential therapeutic target. The drug associated with RAC1 is EHT-1864, which affects the cytoskeleton (39).
In addition, MS experiments and PGMs allow for the functional characterization of these two groups of patients, offering complementary information about the relevant biological processes involved in the disease. In this functional analysis, differences in metabolism were confirmed. There were also differences at the mitochondrial level, and mitochondria are the target of metformin. Metabolism nodes showed a higher expression in Group 1. P06744 (GPI, glucose 6-phosphate isomerase), included in the metabolism 1 node, is the enzyme that converts glucose 6-phosphate into fructose 6-phosphate, and its higher expression has been associated with tumorigenesis and poor prognosis in gastric cancer (40).
The adhesion node also had a higher expression in Group 1 and contained 25 proteins identified by SAM as differentially expressed. P46940 (IQGAP1) has been associated with poor prognosis in head and neck squamous cell carcinoma (41) and has also been associated with response to chemo-radiotherapy in rectal adenocarcinomas (42). P27797 (CALR) induces an immune response in esophageal squamous cell carcinoma (43). P08133 (ANXA6) promotes EGFR deactivation (44). P62829 (RPL23) negatively regulates apoptosis and inhibits growth in colorectal cancer (45). On the other hand, RPL23 has been identified as an oncogene in head and neck squamous cell carcinoma (46). O43707 (ACTN4) increases cell motility and invasion in colorectal cancer (47). Previous studies have described how P84077 (ARF1) forms a complex with EGFR and promotes invasion in head and neck squamous cell carcinoma (48). P35579 (MYH9) plays an important role in adhesion and migration, and its overexpression is correlated with metastasis in colorectal cancer through the MAPK pathway (49). Aberrant activity of P63000 (RAC1), which is involved in metastasis and proliferation, is a hallmark of cancer (50). At the same time, O75131 (CPNE3) promotes cell migration through RAC1 (51). Finally, P61586 (RHOA), a tumor suppressor gene, plays a relevant role in colorectal cancer: it is associated with metastasis and is deactivated in a significant number of colorectal tumors (52). In conclusion, the majority of the proteins included in this adhesion node play well-established roles in metastasis processes.
FIG. 4. Functional network created using the proteomics data from the ASCC patients. Ten nodes with different biological functions were identified.
Moreover, the combination of the proteomics and genetic variant information showed that the two molecular groups defined by proteomics also had a different mutational profile. Group 2 showed a higher frequency of ATM genetic variants. Previous studies have described a high response rate to PARPi, such as olaparib, in prostate tumors with mutations in ATM (53). Therefore, Group 2 patients may also be candidates for treatment with PARPi.
In addition, FBA predicted a higher tumor growth rate for Group 1 than for Group 2. It is possible that, given their higher proliferation, the tumors of Group 1 are also more responsive to chemotherapy. This study has some limitations. The results need to be validated in an independent cohort. Information on ASCC is scarce, so a prospective validation will be needed. The number of proteins detected by MS still needs technical improvement to be at the same level as genomics. However, proteomics offers a more direct measurement of the effectors of biological processes. Finally, a consensus analysis pipeline for cancer sequencing data is still necessary.
In conclusion, two different molecular groups of patients have been proposed based on proteomics expression. This may be the first step toward a personalized therapy approach in ASCC. In addition, the molecular features (genetic and protein-based) defined in the two proteomics groups suggested some possible targeted therapies, such as PARPi or immunotherapy.

DATA AVAILABILITY
Proteomics raw data are available in the Chorus repository (https://chorusproject.org/pages/index.html) under the name "Anal squamous cell carcinoma proteomics, Project ID: 1578" and whole-exome sequencing raw data files are available in the Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra) under the name PRJNA573670.