Quantitative mass spectrometry to interrogate proteomic heterogeneity in metastatic lung adenocarcinoma and validate a novel somatic mutation CDK12-G879V

Lung cancer is the leading cause of cancer death both in men and women. Tumor heterogeneity is an impediment to targeted treatment of all cancers, including lung cancer. Here, we sought to characterize changes in tumor proteome and phosphoproteome by longitudinal, prospective collection of tumor tissue of an exceptional responder lung adenocarcinoma patient who survived with metastatic lung adenocarcinoma for more than seven years with HER2-directed therapy in combination with chemotherapy. We employed “Super-SILAC” and TMT labeling strategies to quantify the proteome and phosphoproteome of a lung metastatic site and ten different metastatic progressive lymph nodes collected across a span of seven years, including five lymph nodes procured at autopsy. We identified specific signaling networks enriched in lung compared to the lymph node metastatic sites. We correlated the changes in protein abundance with changes in copy number alteration (CNA) and transcript expression. To further interrogate the mass spectrometry data, patient-specific database was built incorporating all the somatic variants identified by whole genome sequencing (WGS) of genomic DNA from the lung, one lymph node metastatic site and blood. An extensive validation pipeline was built for confirmation of variant peptides. We validated 360 spectra corresponding to 55 germline and 6 somatic variant peptides. Targeted MRM assays demonstrated expression of two novel variant somatic peptides, CDK12-G879V and FASN-R1439Q, with expression in lung and lymph node metastatic sites, respectively. CDK12 G879V mutation likely results in a nonfunctional CDK12 kinase and chemotherapy susceptibility in lung metastatic sites. Knockdown of CDK12 in lung adenocarcinoma cells results in increased chemotherapy sensitivity, explaining the complete resolution of the lung metastatic sites in this patient.


Introduction
Lung cancer is the leading cause of cancer mortality in men and women. The identification of several actionable targets in lung adenocarcinoma, the commonest histology of non-small cell lung cancer (NSCLC), has resulted in development of targeted treatments against those targets.
Although patients harboring those somatic mutational targets, such as the Epidermal growth factor receptor (EGFR) and EML4-ALK translocation respond well to the targeted agents, they eventually develop resistance. A key determinant of this acquired resistance is tumor evolution and generation of intra-tumor and inter-metastatic heterogeneity (1)(2)(3). The degree of mutational heterogeneity is highly variable within primary tumors and between primary and metastatic or recurrence sites. In some tumor types, such as recurrent glioma, the emergence of heterogeneity may be therapy related (Johnson et al 2014). In certain cancer types that are driven by environmental mutagens such as ultraviolet light in melanoma and smoking in lung cancer, the extent of homogenous mutational burden is greater than other caner types (4).
However, there is a subgroup of lung cancer patients that demonstrate extreme intra-tumor and inter-metastatic genomic mutational heterogeneity (2,3,5). We have demonstrated unprecedented genomic heterogeneity in an exceptional responder lung adenocarcinoma patient who survived with metastatic lung adenocarcinoma for more than 7 years while on treatment with combination HER2-directed targeted treatment and chemotherapy. There was less than 1% similarity of somatic alterations between the lung and lymph node metastatic sites in this patient (5). Multidimensional genomic analyses of the next-generation sequencing data including, for single nucleotide variants (SNVs), insertion-deletions (in-dels), copy number alterations (CNA) and expressed variants from RNAseq can give a comprehensive view of the genomic landscape and its contribution to tumor heterogeneity. However, how this genomic heterogeneity influences the proteome and phosphoproteome is an important question that can be addressed by global mass spectrometry-based proteomics analyses. Recently, the NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) has employed mass spectrometry to analyze the proteome and phosphoproteome of colon, serous ovarian and breast primary tumors that underwent extensive genomic characterization (6-8). However understanding of temporal and spatial proteogenomic tumor evolution in metastatic lung adenocarcinoma is lacking.
Here, we have used quantitative mass spectrometry strategies to identify and quantify the proteome and phosphoproteome of sequentially acquired lung and lymph node metastatic sites across a span of 7 years during which the patient was treated with combination HER-2 directed therapy. The metastatic sites were removed by surgical excision at the time of site-specific progression of disease and the final acquisition was at the time of autopsy. Interrogation of patient-specific databases built using the whole genome sequencing data from the lung and lymph node metastatic sites identified key somatic variants at the peptide-level. We further validated the novel CDK12G879V mutation by functional analyses in lung adenocarcinoma cells and provide evidence that the presence of this mutation only in the lung, and not in lymph nodes might have resulted in sensitivity of the lung metastatic sites to chemotherapy and a "cure" of the lung metastatic sites with combination HER-2 directed targeted and standard chemotherapy in this patient.

Tumor tissue collection by serial biopsies and at autopsy
Index patient was treated with standard of care treatment regimens or specific clinical protocols at the NIH Clinical Center or Walter Reed National Military Medical Center over a period of more than 7 years. At specific times when there was progression, patient underwent excisional biopsy of progressive lymph nodes. Patient also underwent a surgery to remove a progressive lesion in the lung (see time line, Fig 1A). Tumor tissue was frozen immediately in liquid nitrogen before transporting to the laboratory for storage at -80 o C. About 10-15 mg of tumor tissue was cut and lysed in 400µl of urea lysis buffer using a tissue lyser (Qiagen). Lysates were centrifuged at 14,000 rpm at 4 o C for 10 mins and the clear supernatants were transferred to new tubes. Protein concentrations were determined by the Modified Lowry method (BioRad).

Cell Culture and SILAC Labeling
Human lung adenocarcinoma cells were obtained from ATCC. All cells were cultured in RPMI stop codons, alternative splicing, fusion proteins, expression of unannotated parts of the genome.
Our database for this study contained a total of 71,503 entries.

Data Analysis
Peptides and proteins were identified and quantified using the Maxquant software package (version 1.5.3.30) with the Andromeda search engine (10,11). MS/MS spectra were searched against the customized patient specific database and quantification was performed using default parameters for 3s-SILAC or TMT10plex in MaxQuant. The parameters used for data analysis include trypsin as a protease with two missed cleavage sites allowed. Carbamidomethyl cysteine was specified as a fixed modification. Phosphorylation at serine, threonine and tyrosine, deamidation of asparagine and glutamine, oxidation of methionine and protein N-terminal acetylation were specified as variable modifications. The precursor mass tolerance was set to 7 ppm and fragment mass tolerance to 20ppm. False discovery rate was calculated using a decoy database and a 1% cut-off was applied to both peptide table and phosphosite table. Normalized ratios from super-SILAC experiment or corrected intensities of the reporter ions from TMT labels were obtained from the MaxQuant search. For the TMT experiment, relative ratios of each channel to the reference channel were calculated. Perseus (version 1.5.5.3) was used to view and further analyse the data. Hierarchical clustering of proteins and phosphosites were obtained in Perseus using log ratios or log intensities of protein and phosphorylation abundance. The Protein-Protein Interaction (PPI) maps of the phosphosite clusters were imported from the "STRING: protein query" module of the cystoscope software (San Diego, CA, USA, version 3.4.0) (12) with the confidence cutoff of 0.80. These maps were analyzed for functional enrichment of the gene ontology biological process categories using the ClueGO 2.2.6 plugin (13)with the kappa statistic >=0.4, a two-sided hypergeometric test for enrichment with Bonferroni step down method for correction of the multiple hypothesis testing. A p-value of 0.001 was used as the cut-off criterion.

Correlation between mRNA expression and protein abundance
We correlated mRNA expression and protein abundance within each of two metastatic tumor sites (L and LN1) and the PDX. We used RNA-seq FPKM values and protein intensity measurements from TMT and SILAC experiments to estimate mRNA and protein abundance, respectively. For each tumor site, we calculated the Spearman correlation coefficient between log2 values of FPKM and protein intensity values for genes that had measurements in both mRNA and protein (number of genes for TMT experiments were: L, 3853; LN1, 3880; PDX, 3791, and number of genes for SILAC measurements were: L, 5890; LN1, 5920).

Copy number variation analysis
We used OncoScan CNV FFPE Assay Kit (Affymetrix) to perform copy number variation (CNV) analysis. Copy number data was processed and normalized using OSCHP-TuScan algorithm, which determines allele specific copy number and estimates ploidy. Copy number log2 ratios of each tumor site was estimated with Nexus Express for OncoScan. Regions of copy number gains or losses across all metastatic tumor sites were matched based on the location of genes in each region. A CNV heatmap was generated based on hierarchical clustering analysis using Euclidean distance.
All fragmentation spectra from the Super SILAC experiments were searched using Mascot 2.5 (Matrix Science, Ltd., London, UK) and MS-GF+ (v20140716) against the patient-specific database created using the QUILTS pipeline. Mascot searches were run using a 20ppm precursor mass tolerance (monoisotopic) and a 0.05 Da fragment mass tolerance, allowing for up to 2 missed cleavages and considering only fully tryptic peptides. Quantitation mode was set to SILAC K+6; R+6. Carbamidomethylation of C was set as a fixed modification and oxidized methionine as variable. MS-GF+ settings were the following: 10 ppm precursor mass tolerance, -m 3, -inst 1,ntt 2, -tda 1, -ti 0, 0 -max Length 60 -min Length 7. These settings were specific for Orbitrap HCD spectra, performed target-decoy-analysis, did not allow for isotopic matching, and excluded peptides longer than 60 and shorter than 7 amino acids long. Carbamodomethyl of C was used as a fixed modification and heavy versions of K and R (+6.020), oxidation of M, and protein Nterminal acetylation were used as variable modifications. MZID files were converted to TSV using the MzIDToTSV function of the program.
Next, spectra from identifications of variant peptides identified by MaxQuant were extracted following removal of peptides that did not contain variant sequences (e.g., those giving rise to a K or R adjacent to the start of the peptide sequence). In total, 833 spectra corresponding to 105 sequences (of the 198 identified by MaxQuant) were analyzed. All spectra were required to have corresponding identifications better than the Mascot Identity Threshold (95%) or a Q Value < 0.01 peptide. The top transitions were selected based on the presence of intense y-ions at m/z greater than the precursor. In case high m/z y-ions were not seen, other more abundant ions were chosen.
All the raw data were imported in Skyline 3.7 and manually reviewed to delete any poorly performing transitions. Each peptide consisted of four or five transitions. The optimization of collision energies was done by using the values calculated by Skyline for the monoisotopic precursor and product masses for the Agilent 6495 system. The best performing transitions were combined in one scheduled MRM method with a 25-min gradient and a 2-minute retention time window, using retention times extracted during the method refinement stage. For endogenous target peptide quantification, heavy labeled synthesized peptides were spiked-in as an internal standard to the tryptic peptides of different tissue samples. All the raw data files were imported in Skyline 3.7 and data annotations were manually inspected. Peak Area Ratios (PAR) of light endogenous signals to the heavy internal standards were exported to MS excel for further analysis of mean, standard deviation and % co-efficient of variation.

CRISPR-Cas9-mediated knockout of CDK12 and cell growth analysis
A549 cells were infected with lentivirus expressing Cas9. Clones were obtained after antibiotic selection and one of the clones with higher expression of Cas9 was selected for experiments based on Western blot detection of Cas9. A549-Cas9 cells were then transduced with lentivuruses expressing CRISPR sgRNAs that target exon 1 and exon 2 of CDK12. After puromycin selection, clones were picked and expanded. Cells were lysed with 1x modified RIPA buffer, and immunoblotted for CDK12 expression. Two clones, A549-Cas9-261-7 and A549-Cas9-264-5, in which the sgRNAs target exon 1 and exon 2 of CDK12, respectively, were selected for further functional analysis based on the knockdown of CDK12 expression.
For chemotherapy treatment and survival analysis, cells were trypsinized with 0.25% trypsin in EDTA, and spun down at 1,000 rpm for 5 mins at room temperature. Cells were re-suspended in RPMI medium with 10% FBS, 1% pen/strep, counted with trypan blue exclusion and plated 4,000 cells per well in a 96 well-plate. Cells were allowed to settle overnight before drug treatment. 10x stock of camptothecin (Cell Signaling Technology) was prepared fresh for each experiment. 10uL of camptothecin was added to 90uL of RPMI media already present in each well. Cells were incubated at 37°C, 5% CO2 for 48 or 72 hours. After 48 or 72 hours of treatment, all media was removed, and 50uL of 1x Promega CellTiter-Glo® Luminescent Cell Viability Assay reagent was added to each well. Luminescence was measured with a SpectraMax M5 microplate reader and recorded by SoftMax Pro 5.4.1. Raw luminescence was normalized and plotted with MS Excel.

Brief clinical history, tumor tissue procurement and summary of protein identification and quantification
A 50-year old African American never smoker was diagnosed with metastatic lung adenocarcinoma with metastases to lung and lymph nodes. Patient was treated with combination chemotherapy followed by HER2-directed therapy either alone or in combination with various chemotherapy regimens over 7 years. Two case reports (15, 16) and a comprehensive genomics analysis of multiple tumor biopsies across his 7-year treatment course have been reported demonstrating extreme heterogeneity between lung and lymph node metastatic sites in this patient (5). Here, we sought to determine the temporal and spatial heterogeneity of the proteome and report a comprehensive quantitative proteomics analysis on different tumors metastatic to lymph nodes (LN1, LN2 and PDX) obtained by sequential excisional lymph node biopsies, 5 distinct lymph node metastases obtained at autopsy (MET1-5), and a progressive metastatic lung tumor obtained by wedge resection surgery (L) (Fig. 1A). The patient-derived xenograft (PDX) was generated from one of the lymph node biopsies from the left neck. The patient was treated with combination chemotherapy followed by targeted therapy in the form of HER2-directed treatments with/without chemotherapy throughout his course of more than 7 years as depicted in the timeline (Fig. 1A).
Both super-SILAC and TMT10plex-based quantitative proteomics strategies were employed to explore the tumor-specific and temporal heterogeneity of the expressed proteomes. In the super-SILAC strategy, twelve "heavy" amino acids-labelled lung adenocarcinoma and immortalized lung epithelial cell lines were pooled at equal amounts and used as the standard for "spike-in".
Lysates from three distinct metastatic tumors (LN1, Lung and LN2) were combined with the "heavy" standard lysate at equal amounts for further sample processing and mass spectrometry analysis (Fig. 1B). The TMT-based quantitative proteomics strategy was used for the quantification of the proteome and phosphoproteome of nine distinct tumor metastases, including two lymph nodes (LN1 and LN2), one lung tumor (L), one PDX derived from a lymph node metastasis, and five lymph node metastases (MET1-5) procured at autopsy (Fig. 1C). Lysates from the nine samples were pooled at equal amounts and labelled with TMT 126 to be used as the reference. After trypsin digestion and basic reverse-phase HPLC fractionation, tandem mass spectrometry data at high accuracy and resolution was acquired on an Orbitrap Elite mass spectrometer. A total of 6214 and 4061 proteins were identified from super-SILAC and TMT experiments, respectively (Fig. 1D). More than 4000 proteins were quantified from both experiments. 3648 proteins were identified and quantified in both experiments (Fig 1E).
PANTHER analysis gives the protein distribution based on their molecular functions, biological processes and cellular components. More than 2000 proteins had catalytic activity that include kinases, phosphatases and metabolic enzymes (Fig. 1F).
Correlation and distribution of protein abundance ratios and clustering analysis reveals more similarity of protein abundances between the lung and LN2 metastases.
Next, we compared the protein quantification results between the super-SILAC approach and the TMT approach. Correlations between the two different quantification approaches and three different samples were plotted. The protein ratios across three different samples (LN1,L and LN2) were well correlated between super-SILAC and TMT approaches ( Fig. 2A). Correlation of the protein SILAC ratios demonstrated that the protein abundance of LN2 correlated better with lung and LN1 (R 2 = 0.68 and 0.67, respectively). However, protein abundance ratios of LN1 correlated less with that of the lung (R 2 = 0.53), indicating that the lung tumor was more similar to the LN2 lymph node that was procured 2 years after the lung surgery than LN1 that was procured 6 months before the surgery. From the SILAC protein ratio distribution, we observed that the large majority of proteins were unchanged (Fig. 2C). Around 12% of proteins from each tumour biopsy sample were differentially expressed > 2-fold relative to the heavy labelled standard lysate mixture. 8-9% of the proteins were differentially expressed < 2-fold in LN2 and lung; however, in LN1, 19% of the proteins were differentially expressed < 2-fold. The number of proteins differentially expressed <2-fold in LN1 is twice that of LN2 and lung, indicating that more proteins had lower abundance in LN1 compared to LN2 and lung. Hierarchical clustering of proteins based on their "Super-SILAC" abundance ratios showed that LN2 and lung clustered together, providing further evidence of greater heterogeneity in the proteome of LN1 compared to that of Lung and LN2 ( Supplementary Fig. 1A). We identified many extracellular proteins and proteins involved in immune effector processes with increased expression compared to the standard lysate. Some of the membrane proteins involved in protein glycosylation and GTPase activity had lower expression level in LN1 compared to LN2 and lung.
PCA analysis of TMT protein ratios across all metastatic sites demonstrated that lung and LN2 metastatic sites were more similar to each other and LN1. LN1, the first progressive metastatic lesion removed was different than all other metastatic sites (Supplementary Fig. 2A). Pair-wise correlation analysis of the TMT protein ratios showed strong correlation of specific autopsy lymph nodes, MET4 and MET5 (R 2 value 0.845). Overall, there was poor correlation of TMT protein ratios across all metastatic sites except between lung and LN2 and between autopsy lymph nodes ( Supplementary Fig. 2B). Hierarchical clustering of protein abundance ratios of each tumor lysate against the pooled lysate from the TMT experiment also demonstrated that LN2 and lung clustered together. LN1 protein abundance ratios clustered with three lymph nodes obtained at autopsy and was quite separate from the other two lymph node metastases ( Supplementary Fig.  among phosphatases. Among the transcriptional regulators that were significantly upregulated in the lung metastatic site compared to LN1 were JUNB and SMARCD2. Interestingly, the levels of these two proteins were similar in the lung and LN2 that was excised more than two years after the removal of the lung metastasis. Next, we examined the function of proteins differentially expressed greater than 2-fold between either of the two lymph node metastases (LN1 vs LN2) or between either lymph node and lung metastases in the super-SILAC experiment. Ingenuity Pathway Analysis (IPA) was performed on proteins that had a "Super-SILAC" ratio of 3-fold. Among the top activated canonical pathways represented by the proteins which had at least 3 fold change in lung compared to either of the lymph nodes were activation of IRF by cytosolic pattern recognition receptors, Rac signalling, HGF signalling, ERBB and Ephrin B signalling and among the top inhibited pathways were Interferon, Sirtuin, PKA and TP53 signalling pathways (Fig. 3A). The heatmaps of protein ratios of lung versus either of the lymph nodes in three selected pathways namely ERBB, Sirtuin and PKA signalling pathways showed differences in abundance of proteins in these specific pathways between the lung and lymph node sites ( Fig. 3B-D). The expression of ERBB2, MAPK3, MAP2K2 and PAK1 in the ERBB signalling pathway was higher in lung compared to the lymph nodes.

Clustering of metastases based on copy number alteration and correlation of gene expression and protein abundance data
Hierarchical clustering of the CNA data showed that copy number of genes across many chromosomal locations from the lung tumor was quite distinct compared to rest of the lymph nodes, confirming significant metastatic site-specific copy number heterogeneity (Fig. 4A).
Interestingly, certain lymph node metastases had more similar CNA than others, and hence clustered together. The PDX that was generated from a lymph node was more similar to LN2 than other lymph nodes. RNA was available for three samples-LN1, lung and PDX. Transcriptome sequencing (RNA-seq) was performed to quantify the transcript expression in these three metastases. Since we had the protein expression data by super-SILAC approach for LN1 and lung and by TMT approach for LN1, lung and PDX we performed Spearman's correlation of the RNA expression values with either the super-SILAC or TMT protein abundance ratios. The correlation coefficients (R 2 ) between the RNA expression values and protein abundance ratios ranged between 0.5162 for lung (L) using the super-SILAC ratio (Fig. 4B) and 0.281 for LN1 using the TMT protein ratio (Fig. 4C).

Heterogeneity across different metastases based on global protein phosphorylation levels
Next, we examined global protein phosphorylation of TMT-labelled lysates of different metastases using TiO2-based enrichment strategy. The PCA plot of phosphorylation ratios shows that the autopsy lymph nodes cluster together. Similar to the protein quantitation ratios, LN2 was again closer to lung (L) compared to LN1 (Supplementary Fig. 3A). Pair-wise correlation analysis of the phosphorylation ratios showed strong correlation of autopsy lymph nodes (R 2 values ranging from 0.486 to 0.835) (Supplementary Fig. 3B). Hierarchical clustering using the quantitative phosphorylation data across the tissue samples confirmed the similarity of the 5 autopsy lymph nodes. LN1 clustered with LN2, and the lung and the PDX lymph nodes were distinct (Fig 5A).
There was a distinct cluster of proteins whose phosphorylation was inhibited in the autopsy lymph nodes (Cluster 662, Fig. 5A). Network analysis of these proteins showed that regulation of Rho protein signal transduction, cell-cell junction organization, regulation of mRNA splicing via spliceosome, spliceosome complex activity, and nuclear-transcribed mRNA catabolic process were the top networks represented by the hypo-phosphorylated proteins in autopsy samples (Fig.   5B). There were several proteins that had increased phosphorylation in the autopsy lymph nodes (Clusters 633, Fig. 5A). Enriched networks among these proteins included cellular component disassembly involved in execution phase of apoptosis, regulation of mRNA processing and mRNA export from nucleus (Fig. 5C). There were proteins that were hyper-phosphorylated across all the tissue samples (Cluster 616, Fig. 5A) and the networks that were enriched among these proteins included spliceosome complex assembly, cellular component disassembly involved in execution phase of apoptosis, establishment of spindle orientation and mRNA export from nucleus ( Fig. 5D).
Ingenuity Pathway Analysis (IPA) was performed on all identified phosphorylated proteins. Among the top canonical pathways that were activated in the lung compared to lymph nodes based on the TMT phosphorylation ratio (Lung/LN2 or Lung/LN1) were CDK5, PKA, ERK/MAPK, chemokine, androgen and RhoA signalling pathways (Fig. 5E). Among the pathways inhibited in lung compared to the lymph nodes were apoptosis, AMPK, VEGF, PPAR, SAPK/JNK and p38/MAPK signalling pathways.

Identification of variant peptides and mutant proteins by integrating the genomics sequencing and mass spectrometry data.
We constructed patient-specific protein database based on the whole genome sequencing (5) of LN1, lung (L) and blood of this patient. We identified 78 and 23 mutant peptides from super-SILAC and TMT experiments, respectively. We validated further all the mutant peptides identified by the patient-specific database search. The mutant peptides were validated by searching through different search engines and applying a series of criteria (Fig 6A). In total, we validated 360 spectra corresponding to 55 germline and 6 somatic variants. Descriptions of these spectra and peptide assignments can be found in Supplementary Table 3. We confirmed and then annotated spectra for the somatic variants, CDK12-G879V, FASN-R1439Q and HNRNPF-A105T in the tumor lysates ( Fig. 6B, Supplementary Fig. 4, left panel). Heavy labelled mutant peptides were also synthesized and analyzed by mass spectrometry to match the annotated MS2 spectra for further validation (Fig. 6B, Supplementary Fig. 4, right panel). In addition, we searched all variant peptides for reference sequences that matched allowing for I/L isobaric substitutions but none were found.

Targeted quantitative mass spectrometry to identify CDK12-G879V and FASN-R1439Q in corresponding metastatic lysates
We employed multiple reaction monitoring (MRM) in a triple quadrupole (QQQ) mass spectrometer to identify and relatively quantify two of the variant tryptic peptides harboring the mutations, CDK12G879V and FASN-R1439Q from the lung (L) and lymph node (LN2) metastatic sites, respectively. Two mutant peptides CDK12-G879V (LADFVLAR) and FASN-R1439Q (GILADEDSSQPVWLK) labelled with "heavy" amino acids were spiked in the tryptic peptide mix from the L and LN2 samples as internal standards for relative quantitation of their endogenous counterparts. Each sample spiked with the corresponding heavy peptide was analyzed in QQQ mass spectrometer in triplicate using the scheduled MRM method. The normalized peak area ratio of CDK12G879V peptide shows that this mutant CDK12 peptide was significantly more abundant in L compared to LN2 (Fig. 6C). The TICs of the transitions from the endogenous mutated peptide can clearly be seen in L, but only at background levels in LN2 (Fig. 6D, E). In contrast, FASN-R1439Q peptide had higher abundance in LN2 compared to L (Fig. 6F) and the TICs of the transitions were only detected in LN2, but not the L (Fig. 6 G, H).

Functional characterization of CDK12-G879V mutant peptide
The G879V mutation in CDK12 occurs in the DFG motif of the CDK12 kinase that is expected to destabilize the active structure of CDK12 result in a non-functional kinase. CDK12 kinase activity is required for long transcript gene expression such as of the DNA damage response (DDR) genes.
Hence, we hypothesized that the non-functional G879V mutant CDK12 will increase chemotherapy sensitivity in lung adenocarcinoma cells. We knocked out the CDK12 gene in A549 cells stably expressing Cas9 using two CRISPR sgRNAs that target exon 1 and exon 2 of CDK12.
The CDK12 protein expression was significantly lower in both clones compared to the parental cells ( Fig. 7A). Both knock out cell lines were more sensitive to camptothecin, a DNA topoisomerase I inhibitor that inhibits DNA replication, compared to the parental cells ( Fig. 7B), suggesting that CDK12 function is essential for repair from DNA damage.

Discussion
The overall goal of our study was to examine the heterogeneity at the level of the proteome and phosphoproteome across 11 different tumor metastases sequentially acquired by multiple biopsies and surgeries upon progression on combination HER2-directed therapy along with chemotherapy of our patient who survived with metastatic lung adenocarcinoma for over 7 years. We have previously demonstrated unprecedented mutational heterogeneity using whole genome and targeted next generation sequencing (NGS). Only <1% of somatic variants were common between the lymph node and lung metastatic sites in this patient (5). However, we demonstrated similarities in key hallmarks of carcinogenesis, such as proliferation that were manifest by different mechanisms in the lung and lymph node metastatic sites. We hypothesized that analyzing the proteome and the phosphorpoteome will provide more granular details of tumor evolution and natural history of tumor progression in this patient. To our knowledge, this is one of the initial studies to perform comprehensive genomics and mass spectrometry-based quantitative proteomics across multiple sequential biopsies of progressive lesions and an autopsy in an "exceptional responder" lung adenocarcinoma patient. Our patient responded to combination chemotherapy along with HER2-directed targeted therapy over the span of 7 years. We had demonstrated before that our patient harboured L869R ERBB2 mutation in the lymph node metastatic sites, but not the lung metastatic sites. Although ERBB2 was also amplified in both sites, the degree of amplification was far greater in the lung compared to the lymph node sites (5). Here using quantitative mass spectrometry we demonstrate that ERBB2 protein levels were 20-70 fold greater in the lung compared to the two lymph nodes. Interestingly, there was also tumor evolution and selection for higher ERBB2 protein levels in LN2 (LN2/LN1 fold change 3.2) that was procured more than 2 years after the procurement of the progressive lung metastasis by surgery and 2.5 years after procurement of the progressive lymph node LN1. The control of the multiple metastases using HER2-directed combination therapy suggests HER2, through amplification and increased protein levels in the lung and a somatic kinase domain mutation, specifically in the lymph node sites was the major driver of carcinogenesis in this patient.
One major reason why the patient might have survived such a long time with metastatic disease was that his lung metastases were essentially "cured" after the surgical removal of the progressive lung tumor, that was likely the primary lesion. At autopsy, no lung tumor metastases could be detected. One reason could be the significantly increased ERBB2 protein levels in the lung metastases and the absence of the ERBB2 L869R mutation. Hence the lung tumor metastases were more dependent on HER2 inhibition by combination targeted therapy, while the lymph node metastases were relatively resistant and hence continuously progressed. We also demonstrate the effect of the novel CDK12-G879V mutation that was present only in the lung metastasis, but not the lymph nodes in promoting chemotherapy sensitivity of the lung metastases. Here, we show convincingly that the CDK12-G879V mutant is expressed at the protein-level in the lung metastasis, but not the lymph nodes (Fig. 6) Complete knockdown of CDK12 protein expression indeed increased chemotherapy sensitivity of these cells (Fig 7) that correlated with increased DNA damage foci in the knock-down cells treated with chemotherapy suggesting more DNA damage and less repair.
We compared two methodologies to perform quantitative mass spectrometry using DDA data, namely the "super-SILAC" method and the TMT labelling method. The "super-SILAC" method has been used before to quantify the proteome in human breast cancer by spiking "super-SILAC" mix of heavy labelled lysates from breast cancer cell lines (21). We have used a similar strategy, and to our knowledge, first time made a lung adenocarcinoma-specific "super-SILAC" mix of heavy labelled lysates. We developed a pooled sample of heavy labelled lysates of 12 cell lines, including 2 immortalized normal lung epithelial cell lines, 1 lung adenocarcinoma (LUAD) cell line without EGFR or KRAS mutations, 6 EGFR mutant LUAD cell lines harbouring the most common EGFR mutations, and 3 KRAS mutant LUAD cell lines. Since this pool uses some of the major subtypes of lung adenocarcinoma, this can be used to compare mass spectrometry-based proteomics experiments performed in different laboratories. The problem with "super-SILAC" data is that of missing data. Our super-SILAC strategy identified slightly more number of proteins, however, quantified proteins were similar to the TMT labelling strategy (Fig. 1D, E). More importantly, the correlation of the quantitation data overall was quite good between the two strategies with R 2 values between 0.64 and 0.79 ( Fig. 2A).
One purpose of this study was to correlate genomic and proteomic data using copy number analysis (CNA), transcriptome analysis (RNA-seq) and mass spectrometry-based quantitative proteomics.
CNA clearly showed that the lung metastatic site was significantly different than 8 other lymph node metastases (Fig. 4A). This reinforces the unprecedented genomic tumor heterogeneity already demonstrated in this patient at the level of somatic variants, in which we had shown that there were <1% similarity of somatic variants between the lung and lymph node metastases by whole genome sequencing (WGS) (5). Adequate quality of RNA sample was available from three distinct metastases, namely LN1, L and PDX. Spearman's correlation (range-0.28-0.52) of the RNA-seq-based FPKM values for transcript expression and either the super-SILAC or the TMT protein ratio across datasets demonstrated that there is low correlation of the transcript and protein data from the same sample and this is in the range demonstrated before in various systems.
We made patient-specific database using the WGS data published before (5)  Proteomic and phosphoproteomic heterogeneity has been demonstrated by quantitative mass spectrometry in pancreatic cancer from metastatic sites from a single patient procured at autopsy.
Interestingly, an AXL inhibitor was more effective in lung and liver metastatic sites compared to the one from the peritoneum, suggesting that there was heterogeneity in activation of signalling pathways in different metastatic sites (22). Our study involves multiple metastatic sites procured during the treatment course over 7 years, including at autopsy. We also demonstrate the importance of integrated proteogenomic analyses in such studies to identify variant peptides in mass spectrometry data. We demonstrate how CDK12G879V mutant specific to the lung metastatic site likely contributed to the superior sensitivity of the lung metastatic sites to combination of chemotherapy along with HER2-targeted therapy. We also developed an MRM assay to quantify the CDK12G879V mutant peptide. We could identify this variant peptide in the lung lysates without any enrichment, likely due to high mutant protein expression and confirming the amplification of CDK12 from whole genome sequencing and CNV-seq data.
In summary, here, we have shown, the feasibility of multiple biopsies through a patient's treatment course culminating in an autopsy upon death of an "exceptional responder" lung adenocarcinoma patient. The global proteomics and phosphoproteomics performed here along with CNA and transcript expression described in this study and the NGS analyses published previously illuminate the unique tumor evolution of this patient. Most importantly, our study highlights the functional significance of one of the most relevant somatic variants, the CDK12-G879V mutation that could be identified directly at the peptide-level by both DDA and MRM mass spectrometry. This variant may have influenced the overall prognosis and treatment response in this patient. 12 heavy labeled ( 13 C6 Arg, 13 C6 Lys) lung adenocarcinoma cell lines were mixed at equal amount and used as standard. Then the three tumor biopsies (LN1, Lung, and LN2) were combined with the standard at equal amount, followed by tryptic digestion, high pH-RPLC fractionation, and then subjected to LC-MS/MS analysis on an Orbitrap Elite. (C) Experimental workflow of TMT based quantitative mass spectrometry. Three tumor biopsies (LN1, Lung, and LN2), one PDX sample, and five tumor autopsies from the patient were used in this experiment. All the samples were pooled at equal amount to make the reference. The digested peptides or TiO 2 enriched phosphopeptides were TMT labelled and then combined at equal amount, followed by high pH-    Correlation of tissue LN1, Lung, and PDX between proteome results of TMT and RNA-seq results.
X axis is the Log 2 intensity; Y axis is the Log 2 RPKM value.  On the left side are spectra of the endogenous "light" peptides identified in the lung (CDK12-G879V) and lymph node (FASN-R1439Q) lysates; on the right side are spectra of the synthetic "heavy" peptides labelled with 13 C 6 Arg and 13 C 6 Lys.    Sirtuin Signaling PKA Signaling