Arginine in C9ORF72 Dipolypeptides Mediates Promiscuous Proteome Binding and Multiple Modes of Toxicity

C9ORF72-associated Motor Neuron Disease patients feature abnormal expression of five dipeptide repeat (DPR) polymers. We found that the most toxic DPRs, PR and GR, were particularly promiscuous binders to endogenous proteins. These included ribosomal proteins, translation initiation factors and translation elongation factors. The corresponding biological impacts were multipronged and included stalling of ribosomes during translation, hypomethylation of endogenous proteins, and destabilization of the actin cytoskeleton. The findings point to new mechanisms of toxicity in disease caused by Arg-rich DPRs.


Experimental animal and cell culture models expressing the DPRs have revealed poly-GR and poly-PR to be particularly toxic, with the others being comparatively inert (15-18). Furthermore, although all DPRs are widely distributed in the brains of patients with ALS, only poly-GR correlates with clinically related regions (19). Various interactome studies have indicated that the poly-GR and poly-PR DPRs engage with RNA binding proteins, ribosome machinery and proteins with low complexity domains, which mediate the formation of membrane-less organelles by phase separation (18,20,21). These interactions have been proposed to impair ribosome biogenesis (22), ribosome activity (23), nucleolus function (22,24), nucleocytoplasmic transport (25,26) and stress granule dynamics (18,20,24).
Here we sought to probe the role of the poly-GR and poly-PR DPRs expressed in a simple cell model by defining what they interact with using quantitative proteomics and examining how short DPR lengths (10× repeats) differed from longer lengths (101× repeats). Our proteomics data suggested potent engagement of poly-PR and poly-GR with ribosome and translational machinery. Given that poly-GR and poly-PR suppress protein translation (18,20,21,23), we sought to explore the role of their interactions with ribosomes further. Here we provide evidence that disruption of protein translation may arise by Arg-rich peptides stalling on ribosomes during their synthesis. We also reveal other key mechanisms mediating poly-PR and poly-GR toxicity. These include destabilization of the actin cytoskeleton and proteome arginine hypomethylation. Our findings point to the repetitive arginine sequences in the DPRs promoting promiscuous binding to the proteome, which in turn enacts multiple modes of toxicity.

EXPERIMENTAL PROCEDURES
Experimental Design and Statistical Rationale-The current study was designed with a focus on identifying how expression of the DPRs altered the abundance of the proteome and to define which proteins engaged with the DPRs. We were also interested in how DPR length affected these properties. We undertook this design by using GFP-tagged forms to understand the abundances of the DPRs in the cells and as a handle to capture interactors using GFP trap. The general design was to perform biological replicates for statistical quantitation (between 3 and 4 depending on the sample, which was deemed enough based on the expectation that substantial changes in the proteome would arise) as matched pairs with a GFP construct versus a DPR-fusion. These constructs were transfected into cells. We compared GFP alone with GFP fusions of 10× DPR repeats and 101× repeats and used 3-way dimethyl labeling to enable quantitative comparisons between these treatments. The details of the statistical tests that we performed to compare the differences are given in the results section and the figures.
Plasmids-Synthetic genes for short (10×) and long (101×) dipeptide repeats were synthesized by GeneArt (Life Technologies, Regensburg, Germany). The full sequence information is available (supplemental Table S1). These constructs were flanked 5′ by XhoI and PstI recognition sites and 3′ by BclI and BamHI recognition sites. XhoI and BamHI enzymes (NEB, Ipswich, Massachusetts) were used to introduce the gene cassette into the mammalian expression vector pEGFP-C2 so that the GFP tag was on the N terminus of the 10× repeat sequence. The 101× DPR sequences were inserted between the PstI and BclI sites in the backbone of the previously developed pEGFP-C2-10× DPR via classical restriction digest cloning following the manufacturers' recommendations. Transformations were performed using recombination-deficient Stbl3 E. coli (Life Technologies) at 30°C.
Cell Culture-Neuro-2a and HEK293T cells, obtained originally from the American Type Culture Collection (ATCC, Manassas, Virginia), were maintained in Opti-MEM (Life Technologies) and Dulbecco's modified Eagle medium (DMEM) (Life Technologies), respectively. The medium was supplemented with 10% v/v fetal calf serum, 1 mM glutamine, 100 units ml⁻¹ penicillin and 100 μg ml⁻¹ streptomycin, and cells were kept in a humidified incubator with 5% v/v atmospheric CO₂ at 37°C.
Flow Cytometry-For analysis of DPR expression levels, cells were harvested 48 h post-transfection in 500 μl PBS containing 0.5 μl of 5 μM SYTOX Red dead cell stain (Invitrogen). Cells were analyzed using an LSRFortessa X-20 flow cytometer (BD Biosciences). 120,000 events per sample were collected at a high flow rate. Side and forward scatter height, width, and area were collected to gate for the single live cell population. GFP fluorescence was collected with the 488-nm laser and FITC (530/30) filter to gate for transfected cells. For SYTOX Red dead cell stain, fluorescence was collected using the 640-nm laser and APC (670/14) filter. Flow cytometric gating and data analysis were performed using FlowJo software (v10.5.3) and graphs were analyzed in GraphPad Prism 7.05.
Confocal Imaging-Cells expressing GFP-tagged DPRs were fixed 48 h after transfection in 4% paraformaldehyde for 15 min at room temperature. Nuclei were counterstained with Hoechst 33342 at 1:200 dilution (Thermo Fisher Scientific, San Jose, CA) for 30 min then washed twice in PBS. Fixed cells were imaged on a Leica SP5 confocal microscope using an HCX PL APO CS 40× or 63× oil-immersion objective lens (NA 1.4) at room temperature. The Hoechst 33342 channel was collected with an excitation wavelength of 405 nm and emission wavelengths of 445-500 nm; EGFP was collected by excitation at 488 nm and emission from 520-570 nm. Single color controls were used to establish and adjust the emission filter bandwidths to remove bleed-through. The FIJI version of ImageJ (27) and Inkscape were used for image processing.
Longitudinal Live Cell Imaging-Neuro-2a cells in 12-well plate format were co-transfected with individual GFP-tagged DPRs along with mCherry in a pT-Rex vector (Life Technologies). The media was refreshed 24 h after transfection and cells were then imaged longitudinally with a JuLI stage live cell imaging system with fluorescent images acquired at 15 min intervals for 96 h (Nanoentek, Seoul, South Korea). Channels used: GFP for EGFP (Excitation: 466/40, Emission: 525/50), RFP for mCherry (Excitation: 525/50, Emission: 580 LP).
Death was recorded as the time point at which mCherry fluorescence was lost. This event corresponded to the loss of membrane integrity and cell death and was found to be a highly sensitive and specific assay of cell death through different pathways and in different types of cell (28,29). Cells that drifted from focus were censored. For statistical analysis, survival time was defined as the imaging time point at which a cell was last seen alive. Kaplan-Meier curves were used to estimate survival and hazard functions with GraphPad Prism software. Differences in Kaplan-Meier curves were assessed with the log-rank (Mantel-Cox) test.
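As an illustration of the survival analysis described above, the Kaplan-Meier estimate can be sketched in a few lines of Python. This is a minimal sketch and not the GraphPad Prism implementation used in the study; the function name `kaplan_meier` and the example time points are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time point with at least one observed death.

    times  -- time point (h) at which each cell was last seen alive
    events -- True if death was observed, False if the cell was censored
              (for example, because it drifted out of focus)
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    removed = Counter(times)      # cells leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(removed):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: fraction of at-risk cells surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= removed[t]     # deaths and censored cells both leave
    return curve

# Hypothetical data for five tracked cells
times = [10, 20, 20, 30, 40]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
```

Censored cells contribute to the number at risk up to the time they are lost, but never count as deaths, which is why censoring rather than discarding out-of-focus cells preserves information.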
Sample Preparation for Proteome Analysis of GFP-immunoprecipitated Samples-6 × 10⁶ Neuro-2a cells were seeded into 75 cm² flasks and transfected the following day with either GFP-tagged DPRs or GFP-only constructs (24 μg DNA and 60 μl Lipofectamine 2000) according to the manufacturer's instructions (Life Technologies). The experiment was designed as 3 or 4 biological replicates. Media was refreshed 24 h after transfection. At 48 h post-transfection, cells were gently rinsed with PBS and harvested in PBS by gentle pipetting. Cells were pelleted (120 × g; 6 min; room temperature), resuspended in 1 ml PBS and pelleted again (400 × g; 6 min; room temperature). The pellet was resuspended in 200 μl ice-cold lysis buffer (10 mM Tris-HCl, pH 7.4; 150 mM NaCl; 0.5 mM EDTA; 0.5% v/v NP-40; 1 mM PMSF; 10 units/ml DNase I) supplemented with EDTA-free Complete protease inhibitor mixture (Roche Diagnostic, Basel, Switzerland). The cell suspensions were then passed through a 27 Gauge syringe needle 25 times, followed by a 31 Gauge needle 10 times, and incubated on ice for 30 min. The resultant lysates were clarified by centrifugation (21,000 × g; 10 min; 4°C). Protein concentrations were quantified by the Pierce BCA Protein Assay (Catalogue Number: 23225, Thermo Fisher Scientific, MA) using bovine serum albumin (BSA) as the mass standard. 0.5 mg of cellular protein was added to 25 μl of GFP-Trap MA beads (ChromoTek) pre-washed and equilibrated in the wash buffer (10 mM Tris/Cl pH 7.4; 150 mM NaCl; 0.5 mM EDTA; 1 mM PMSF; EDTA-free Protease inhibitor mixture). The solution was incubated for 2 h at 4°C (end-over-end rotation). Magnetically separated beads were then washed 3 times with wash buffer and 2 times more with 25 mM triethylammonium bicarbonate (TEAB) buffer. The immunoprecipitated proteins were eluted by the addition of 100 μl of 50% v/v aqueous 2,2,2-trifluoroethanol (TFE), 25 mM TEAB.
The supernatant was collected after pelleting (2000 × g; 2 min; room temperature) and adjusted to a final concentration of 100 mM TEAB by addition of 1 M stock solution (and the pH was validated to be ~7 after this treatment). The samples were further processed for mass spectrometry analysis.
Sample Preparation for Whole Proteome Analysis-Neuro2a cells expressing GFP-tagged 101× DPRs were harvested 48 h post-transfection in PBS with a cell scraper and gentle pipetting. Cells were pelleted (120 × g; 6 min) and resuspended in 2 ml PBS supplemented with 10 units/ml DNase I and filtered through 100-μm nylon mesh before analysis by flow cytometry. Just before sorting, 2 μl of the nuclear marker DAPI (1:1000, D1306, Thermo Fisher Scientific) was spiked into cell suspensions to stain dead cells. Cells were sorted using a FACS ARIA III cell sorter (BD Biosciences) equipped with 405-nm, 488-nm, 561-nm and 640-nm lasers using a 100-μm nozzle. Gating was performed with BD FACS Diva software (BD Biosciences). Cells (1,000,000) of each population of interest were sorted at a speed of 1500 cells/s. Side scatter (SSC) and forward scatter (FSC) height, width, and area were collected to gate for the single cell population. DAPI area was collected to gate out dead cells. Data were also collected for pulse height, width, and area of GFP with the FITC filter. To match for expression, cells were further gated to the same median GFP intensity of 2200 fluorescence units by varying the window of expression. Samples were sorted in parallel on each of 3 days to give three matched replicates. Cells were kept on ice for all steps of the sorting preparation and handling. The targeted population was directly sorted into PBS, pelleted (120 × g, 6 min), and snap frozen in liquid nitrogen then kept at −80°C until use.
Cell pellets were thawed and resuspended in 100 μl RIPA lysis buffer (25 mM Tris-HCl, pH 7.4, 150 mM NaCl, 1% NP40, 0.1% SDS, 1% sodium deoxycholate, 1× complete mini-protease mixture; Roche), and incubated on ice for 30 min. The concentration of proteins was measured by the Pierce BCA Protein Assay according to the manufacturer's instructions (Thermo Fisher Scientific). Equal amounts of protein for each sample were precipitated with six volumes of prechilled (−20°C) acetone and incubated overnight. Samples were pelleted at 21,000 × g at 4°C for 10 min. Acetone was decanted without disturbing the protein pellet. The pellets were washed once with prechilled acetone then allowed to dry for 10 min. The protein precipitates were resuspended in 100 μl 0.1 M TEAB and were vortexed and sonicated 3 times for 30 s to help solubilize the pellet. The samples were further processed for mass spectrometry analysis.
Mass Spectrometry Analysis-Proteins were subjected to reduction with 10 mM tris(2-carboxyethyl)phosphine (TCEP), pH 8.0, and alkylation with 55 mM iodoacetamide for 45 min, followed by trypsin digestion (0.25 μg, 37°C, overnight). The resultant peptides were then desalted by solid-phase extraction following acidification in 1% v/v formic acid; the cartridge (Oasis HLB 1 cc Vac Cartridge, product number 186000383, Waters Corp., Milford, Massachusetts) was prewashed with 1 ml of 80% v/v acetonitrile (ACN) containing 0.1% v/v trifluoroacetic acid (TFA) and equilibrated with 1.2 ml of 0.1% v/v TFA three times. Samples were then loaded on the cartridge and washed with 1.5 ml of 0.1% v/v TFA before being eluted with 0.8 ml of 80% v/v ACN containing 0.1% v/v TFA and collected in 1.5 ml microcentrifuge tubes. Peptides were then lyophilized by freeze drying (Virtis, SP Scientific, Warminster, PA). The peptides were resuspended in 100 μl distilled water and quantified using the microBCA assay (Catalogue Number: 23235, Thermo Fisher Scientific) with BSA as the mass standard. Then, 10 μg of each sample (in a volume of 50 μl containing 100 mM TEAB) were differentially labeled by reductive dimethyl labeling using equal volumes (2 μl) of 4% light formaldehyde (CH₂O), 4% medium formaldehyde (CD₂O, 98% D) or heavy formaldehyde (¹³CD₂O, 99% ¹³C, 98% D) and 0.6 M sodium cyanoborohydride (NaCNBH₃, for light and medium label) or sodium cyanoborodeuteride (NaCNBD₃, 96% D, for heavy label) added in sequence. The peptide solutions were incubated on an Eppendorf Thermomixer (Eppendorf South Pacific Pty. Ltd., Macquarie Park, NSW, Australia) at room temperature for 1 h. After quenching with 8 μl of 1% v/v ammonium hydroxide followed by 8 μl of neat formic acid, dimethyl-labeled peptides were mixed in equal volumes before LC-MS/MS analysis.
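The three labeling channels are distinguished by the mass each adds to every labeled amine (peptide N terminus and lysine side chain). The following sketch derives those mass additions from monoisotopic atomic masses, assuming complete labeling in which two methyl groups replace two amine hydrogens. The function name `dimethyl_shift` is hypothetical and the calculation is illustrative rather than part of the reported workflow.

```python
# Monoisotopic atomic masses (Da)
H, D, C12, C13 = 1.00782503, 2.01410178, 12.0, 13.00335484

def dimethyl_shift(carbon, n_h, n_d):
    """Mass added per labeled amine.

    Two methyl groups, each with one carbon, n_h hydrogens and n_d deuteriums,
    replace the two hydrogens of a primary amine.
    """
    methyl = carbon + n_h * H + n_d * D
    return 2 * methyl - 2 * H

shifts = {
    "light": dimethyl_shift(C12, 3, 0),   # CH2O + NaCNBH3 gives CH3
    "medium": dimethyl_shift(C12, 1, 2),  # CD2O + NaCNBH3 gives CHD2
    "heavy": dimethyl_shift(C13, 0, 3),   # 13CD2O + NaCNBD3 gives 13CD3
}

for label, shift in shifts.items():
    print(f"{label}: +{shift:.4f} Da per labeled amine")
```

Adjacent channels are therefore separated by roughly 4 Da per labeled amine, which is what allows light, medium and heavy forms of the same peptide to be resolved as a multiplet at the MS1 level.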
Samples were analyzed by liquid chromatography-nano electrospray ionization-tandem mass spectrometry (LC-nESI-MS/MS) using an Orbitrap Lumos mass spectrometer (Thermo Fisher Scientific) fitted with nanoflow reversed-phase HPLC (Ultimate 3000 RSLC, Dionex, Thermo Fisher Scientific). The nano-LC system was equipped with an Acclaim Pepmap nano-trap column (Dionex C18, 100 Å, 75 μm × 2 cm) and an Acclaim Pepmap RSLC analytical column (Dionex C18, 100 Å, 75 μm × 50 cm, Thermo Fisher Scientific). For each LC-MS/MS experiment, 1 μg (whole proteome) or [...] μl (0.135 μg peptide) of the peptide mix was loaded onto the enrichment (trap) column at an isocratic flow of 5 μl/min of 3% CH₃CN containing 0.1% formic acid for 6 min before the enrichment column was switched in-line with the analytical column. The eluents used for the LC were 5% DMSO/0.1% v/v formic acid (solvent A) and 100% CH₃CN/5% DMSO/0.1% v/v formic acid (solvent B). The gradient used was 3% B to 20% B for 95 min, 20% B to 40% B in 10 min, 40% B to 80% B in 5 min and maintained at 80% B for the final 5 min before equilibration for 10 min at 3% B prior to the next analysis.
The mass spectrometer was operated in positive-ionization mode with spray voltage set at 1.9 kV and source temperature at 275°C. A lockmass of 401.92272 from DMSO was used. The mass spectrometer was operated in the data-dependent acquisition mode, with MS spectra scanning from m/z 400-1500 at 120,000 resolution with an AGC target of 5e5. The "top speed" acquisition method mode (3 s cycle time) on the most intense precursor was used, whereby peptide ions with charge states 2-5 were isolated with an isolation window of 1.6 m/z and fragmented by higher-energy collisional dissociation (HCD) with stepped collision energy of 30 ± 5%. Fragment ion spectra were acquired in the Orbitrap at 15,000 resolution. Dynamic exclusion was activated for 30 s.
Proteomic Data Analysis-For GFP-immunoprecipitated samples, raw MS data were analyzed using Proteome Discoverer (version 2.3.0.81; Thermo Fisher Scientific) with the Mascot search engine (Matrix Science version 2.4.1). Data were searched against the SwissProt Mus musculus database (version 2016_07; 16794 proteins) combined with common contaminant proteins. The GFP sequence (UniProt ID: P42212) was also added to the database. For protein identification, the search was conducted with 20 ppm MS tolerance and 0.8 Da MS/MS tolerance. The enzyme specificity was set as trypsin. The maximum number of missed cleavage sites permitted was two, and the minimum peptide length required was six. The following modifications were allowed: Oxidation (M), Acetylation (Protein N-term), Dimethylation (K), Dimethylation (N-term), Dimethylation: 2H(4) (K), Dimethylation: 2H(4) (N-term), 2H(6)13C(2) Dimethylation (K), 2H(6)13C(2) Dimethylation (N-term) (Variable); Carbamidomethyl (C) (Fixed). The false discovery rate (FDR) was calculated by the Percolator node in Proteome Discoverer v 2.3.0.81. Peptide identifications were accepted with an FDR threshold value of 0.01. Protein identifications were accepted with an FDR threshold of 0.05. Proteins were filtered for those identified by at least two peptides, one of which was unique, in all three replicates. The common contaminant, keratin, was excluded from the data set.
Peptide quantitation was performed in Proteome Discoverer v.2.3.0.81 using the precursor ion quantifier node. Dimethyl labeled peptide pairs (between two comparisons of light, medium or heavy) were established with a 2 ppm mass precision and a signal to noise threshold of 3. A retention time tolerance of isotope pattern multiplets was set to 0.8 min. Three single peak or missing channels were allowed for peptide identification. The protein abundance in each replicate was calculated by summation of the unique peptide abundances that were used for quantitation (light, medium and/or heavy dimethyl derivatives). Missing quantitation values were replaced with a constant (zero-filling). The peptide group abundance and protein abundance values were normalized to account for sample loading. In brief, the total peptide abundance for each sample was calculated and the maximum sum across all files was determined. The normalization factor for each sample was the ratio of that sample's summed abundance to the maximum sum across all files. After calculating the normalization factors, the Peptide and Protein Quantifier node normalized peptide group abundances and protein abundances by dividing the abundances in each sample by that sample's normalization factor.
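The loading normalization can be illustrated with a short sketch. This is our reading of the procedure described above, written as plain Python rather than as the Proteome Discoverer node itself; the function name `normalize_abundances` and the example abundances are hypothetical.

```python
def normalize_abundances(samples):
    """Scale every sample so that its total abundance equals the largest total.

    samples -- dict mapping sample name to a dict of peptide -> abundance
    Returns (normalized abundances, normalization factor per sample).
    """
    totals = {name: sum(peps.values()) for name, peps in samples.items()}
    max_total = max(totals.values())
    # Factor = sample total / maximum total, so the largest sample has factor 1
    factors = {name: total / max_total for name, total in totals.items()}
    normalized = {
        name: {pep: value / factors[name] for pep, value in peps.items()}
        for name, peps in samples.items()
    }
    return normalized, factors

# Hypothetical two-channel example in which the medium channel is under-loaded
samples = {
    "light": {"pepA": 100.0, "pepB": 300.0},
    "medium": {"pepA": 50.0, "pepB": 150.0},
}
normalized, factors = normalize_abundances(samples)
```

In this example the medium channel has half the total signal of the light channel, so its factor is 0.5 and its abundances are doubled, after which both channels sum to the same total.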
The normalized protein abundances were imported into Perseus software (v 1.6.5.0). Protein abundances were transformed to log₂ scale. The samples were then grouped according to the replicates. For pairwise comparison of proteomes and determination of significant differences in protein abundances, Welch's t test based on permutation-based FDR statistics was then applied (250 permutations; FDR = 0.01; S0 = 1). This, and all other t tests below, are justified on the basis that the proteomics abundance data are normally distributed.
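For orientation, the test statistic underlying this comparison can be sketched as follows. The sketch assumes that S0 acts as a constant added to the standard error in the denominator, as in the significance analysis of microarrays approach; this is an assumption about the software's behavior, and the permutation-based FDR step is not reproduced. The function name `welch_t` and the abundances are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b, s0=0.0):
    """Welch's t statistic for two groups with unequal variances.

    s0 -- assumed fudge constant added to the standard error, which damps
          the statistic for proteins with small differences between groups
    """
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / (standard_error + s0)

# Hypothetical abundances for one protein in three replicates per group
dpr = [math.log2(x) for x in (400.0, 420.0, 380.0)]
gfp = [math.log2(x) for x in (100.0, 110.0, 90.0)]

t_plain = welch_t(dpr, gfp)
t_damped = welch_t(dpr, gfp, s0=1.0)
```

With a nonzero S0, a protein needs both a small variance and a sizeable fold change to reach a large statistic, which is the usual motivation for setting it.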
For whole proteome data analysis, the conditions were like those above, but with the following differences: Raw MS data were analyzed using Proteome Discoverer (version 2.2.0.388; Thermo Fisher Scientific). For protein identification, the search was conducted with 20 ppm MS tolerance and 0.6 Da MS/MS tolerance. The maximum number of missed cleavage sites permitted was three. Additional variable modifications for mono- and dimethylation of arginine were included. Peptide quantitation was performed in Proteome Discoverer v.2.2.0.388, and the retention time tolerance of isotope pattern multiplets was set to 0.6 min. After grouping samples, proteins with at least four valid values across all groups were retained for the subsequent analysis. Missing values were imputed from the distribution of all other log₂-transformed protein values from that sample, using the default settings in Perseus (downshift of 1.8 standard deviations, width of 0.3 standard deviations). For pairwise comparison of proteomes and determination of significant differences in protein abundances, a two-sample Student's t test based on permutation-based FDR statistics was applied (250 permutations; FDR = 0.05; S0 = 0.1).
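The imputation step can be sketched as drawing replacement values from a normal distribution that is shifted down by 1.8 standard deviations and narrowed to 0.3 of the standard deviation of the observed values in that sample, which is our reading of the settings quoted above. The sketch is illustrative and is not the Perseus code; the function name `impute_missing`, the fixed seed and the example values are hypothetical.

```python
import random
from statistics import mean, stdev

def impute_missing(values, width=0.3, downshift=1.8, seed=0):
    """Replace None entries with draws from a downshifted normal distribution.

    values -- log2 protein abundances for one sample, None where missing
    """
    observed = [v for v in values if v is not None]
    mu, sd = mean(observed), stdev(observed)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        v if v is not None else rng.gauss(mu - downshift * sd, width * sd)
        for v in values
    ]

# Hypothetical log2 abundances with two missing values
sample = [20.0, 22.0, None, 24.0, 26.0, None]
imputed = impute_missing(sample)
```

Drawing from the low end of the distribution reflects the assumption that values are missing mainly because the protein fell below the detection limit.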
Dual Fluorescence Translation Stall Assay-Genes were synthesized to produce the dual-fluorescence translation stall reporter as described previously (30), except we used mCherry as the red fluorescent protein. DPR constructs or Httex1 with different polyQ expansions (25Q, 72Q and 97Q) were cloned to replace the linker region using PstI and BamHI restriction sites. Frame shifts were corrected using standard PCR-based strategies and were validated by sequencing. Dual fluorescence reporter plasmids were transfected into cells using Lipofectamine 2000 (Life Technologies) according to the manufacturer's guidelines. Two days after transfection, cells were harvested in PBS containing SYTOX Blue dead cell stain (S34857, Thermo Fisher Scientific) to exclude dead cells from subsequent analysis. Cellular fluorescence of 100,000 events per sample was analyzed on an LSRFortessa X-20 flow cytometer (BD Biosciences) using the 488-nm laser and 530/30 filter for GFP and the 561-nm laser and 610/20 bandpass filter for mCherry. Subsequent analysis of flow cytometry data was done using FlowJo software (v10.5.3) and graphs were analyzed in GraphPad Prism 7.05.
Flow Cytometry Analysis of G- and F-actin-F- and G-actin levels in Neuro2a cells were measured as described (31). Briefly, Neuro2a cells seeded into 12-well plates were harvested 48 h following transfection of different GFP-tagged DPRs. Cells were treated with 2 μM cytochalasin-D in 0.1% DMSO/PBS for 1 h for the positive control. Cells incubated with 0.1% DMSO for 1 h were used as the negative control (untreated cells). Cells were fixed with 4% paraformaldehyde for 15 min, then permeabilized with 0.2% Triton X-100 for 5 min. After washing in PBS, cells were blocked with 1% BSA in 0.1% v/v Triton X-100/PBS solution for 15 min, and then incubated in the dark at room temperature for 30 min with Alexa Fluor 594 deoxyribonuclease I (DNase I) conjugate (10 μg ml⁻¹, Invitrogen Molecular Probes, D12372) for G-actin detection and Alexa Fluor 405 phalloidin (1:1000, Invitrogen Molecular Probes, A30104) for F-actin detection. After thorough washing with 1× PBS, fluorescence was measured using blue (BV421), green (FITC) and red (PE-Texas Red) channels on a FACSCanto flow cytometer (BD Bioscience). A total of 1 × 10⁴ cells were analyzed per acquisition. Unstained cells were used to set the baseline. G- and F-actin contents were determined from the respective fluorescence and the ratio of F/G was calculated from the mean values as determined with FlowJo software.
Confocal Microscopy for F-actin Analysis-Cells were grown in 8-well ibidi culture chambers (Sarstedt, Nümbrecht, Germany). Cells were washed with PBS once before being fixed with 4% paraformaldehyde for 15 min at room temperature, then washed with PBS 3 times and permeabilized in 0.2% w/v Triton X-100/PBS for 5 min. The cells were then blocked with 1% w/v BSA in 0.1% w/v Triton X-100/PBS for 15 min at room temperature. F-actin was stained with Alexa Fluor 594 phalloidin (1:1000, Invitrogen, A12381) for 30 min and nuclei were stained with Hoechst 33342 (1:200, Thermo Fisher Scientific) for 30 min in the dark. Cells were imaged on a Leica SP5 confocal microscope (Leica Microsystems, Macquarie Park, Australia) using an HCX PL APO CS 40× or 63× oil-immersion objective (NA 1.4) at room temperature. Fluorescence intensity profiles of phalloidin in >50 cells expressing GFP-tagged DPRs and untransfected cells were analyzed using ImageJ software.
Bioinformatic Analysis-Protein interaction networks were generated using Cytoscape 3.7.1 (32) with built-in String (v11.0) (33) and a minimum required interaction score setting of 0.9 for interactome data or 0.7 for whole proteome data. Protein abundance changes were used as node color attributes and the node size reflected the Welch's t test p value. Nodes were manually re-arranged based on GO terms. Adobe Illustrator was used to annotate GO terms on the protein interaction network. The Cytoscape 3.7.1 plugin BiNGO was used to perform GO term analyses on proteins significantly enriched in the long versus short Arg-rich DPR immunoprecipitates. Overrepresented categories were chosen after Benjamini-Hochberg false discovery rate correction. The hypergeometric statistical test was used to ascertain the p value of GO term enrichment. The annotation plots show the GO terms that comprise at least three proteins, have a Benjamini-Hochberg adjusted p < 0.05, and whose associated proteins have a log₂ fold enrichment significantly different from zero as assessed by a one-sample Student's t test.
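The two statistical steps of the GO term analysis, a hypergeometric test followed by Benjamini-Hochberg adjustment, can be sketched as below. This is a generic illustration and not the BiNGO implementation; the function names and the example p values are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated proteins in a hit list.

    k -- hits annotated with the GO term      n -- size of the hit list
    K -- annotated proteins in the background N -- size of the background
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values for four GO terms
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
```

A GO term would then be reported only if its adjusted p value is below 0.05 and it also passes the additional criteria stated above.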
A Python algorithm developed by Mitrea et al. (34) was employed in this study to identify proteins exhibiting multivalent arginine-rich motifs (R-motifs) with the sequence pattern RXₙ₁RXₙ₂RXₙ₃R, where n₁, n₃ ≤ 2 and n₂ ≤ 20. This algorithm was applied to the DPR-interacting proteins as well as proteins up- or downregulated upon DPR expression. The background proteins observed in the whole proteome analysis that were not deemed significantly affected in abundance by DPR expression were used as a control. For determining significant differences, the Fisher exact test was used.
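The motif definition can be expressed compactly as a regular expression. The sketch below is not the algorithm of Mitrea et al. (34); it is a simplified stand-in that assumes X may be any residue (including arginine) and that each spacer may have zero length. The names `R_MOTIF` and `has_r_motif` and the test sequences are hypothetical.

```python
import re

# R, up to 2 residues, R, up to 20 residues, R, up to 2 residues, R
R_MOTIF = re.compile(r"R.{0,2}R.{0,20}R.{0,2}R")

def has_r_motif(sequence):
    """Return True if the protein sequence contains at least one R-motif."""
    return R_MOTIF.search(sequence.upper()) is not None

def fraction_with_motif(sequences):
    """Fraction of sequences in a protein set that contain an R-motif."""
    hits = sum(has_r_motif(s) for s in sequences)
    return hits / len(sequences)

# Hypothetical sequences
examples = ["RGRGRGRG", "MKRAAAAR", "RAAARAARAAR"]
flags = [has_r_motif(s) for s in examples]
```

The resulting counts of proteins with and without a motif in each set form the 2 × 2 table that a Fisher exact test would compare against the background set.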

RESULTS
Characterization of the Cell Model Expressing the DPRs-Expression constructs of the DPRs were synthesized using mixed codons and an ATG start codon, designed to minimize the influence of RNA-mediated effects from the repeat sequences. 10× and 101× repeat lengths of the toxic poly-PR and poly-GR, as well as the less toxic poly-GA and poly-AP, were prepared.
These constructs displayed patterns of localization and toxicity in the mouse neuroblastoma cell line Neuro2a similar to those described previously in other models (15-18, 35, 36) and observed in vivo for cases of MND with C9ORF72 mutation (5,14,37,38). This included a predominantly nuclear punctate pattern of localization for PR₁₀ and PR₁₀₁ with residual expression in the cytosol (Fig. 1A). The longer repeat length accentuated the localization into the nuclear foci. GR₁₀ also formed nuclear foci and appeared almost identical to PR₁₀; however, GR₁₀₁ was excluded from the nucleus and had a mostly diffuse cytoplasmic distribution. By contrast, GA₁₀, AP₁₀ and AP₁₀₁ were slightly enriched in the nucleus without forming distinct puncta. Similarly, GA₁₀₁ was enriched in the nucleus but also formed large cytosolic (and less commonly nuclear) inclusions in many cells. All DPR constructs expressed at levels comparable to that of GFP except for PR₁₀₁ and GR₁₀₁, which had very weak expression levels; about 30% lower than that of either GFP or other long DPRs (Fig. 1B). These constructs also had a lower transfection efficiency (supplemental Fig. S1). According to cell survival rates when expressing the DPRs, the poly-PR and poly-GR constructs were most toxic and this was true in both 10× and 101× lengths (Fig. 1C). In contrast, poly-GA was only toxic in 101× lengths and the other DPRs were not toxic.
Arginine-rich DPRs Attract Promiscuous Proteome Interactions-With the model system established, we next sought to investigate how the toxic 101× length DPRs (poly-GR, poly-PR and poly-GA) interacted with the proteome. To do this we transiently transfected cells with GFP or the 101× DPRs fused to GFP and then captured proteins that bound to the DPRs by immunoprecipitation with GFP trap. Proteins were assessed by quantitative proteomics by comparing the pulldowns to GFP-only transfected cells, with protein levels normalized to protein mass recovered from the immunoprecipitation. Under these conditions, GFP levels were anticipated to differ in the pulldown because of different expression levels (Fig. 1B). Indeed, the amount of GFP appeared heavily enriched in the GFP-only control for the toxic DPRs (PR₁₀₁, GR₁₀₁ and to a lesser extent GA₁₀₁) (Fig. 2A). Yet, despite the larger amount of GFP coming from the control GFP-only transfected cells (which would enrich for nonspecific interactors to GFP), we observed many proteins strongly enriched to the Arg-rich DPRs and comparatively few for GA₁₀₁ and AP₁₀₁ (Fig. 2A; supplemental Table S2). The result suggested two conclusions. One was that the Arg appears to mediate promiscuous binding to the proteome and the second was that these interactions are responsible for mediating toxicity. The Arg-rich DPRs enriched for proteins involved in ribosome biogenesis and RNA splicing machinery, which is consistent with prior findings (18,36,39,40). However, we also found additional novel interactions with proteins involved in ribosome-translation, cytoskeleton and chromatin machineries (Fig. 2B). GR₁₀₁ also enriched specifically for methylosome proteins and PR₁₀₁ for mitochondrial proteins. GA₁₀₁, which is the only DPR that formed large cytosolic inclusions, was enriched for a very distinct proteome. These interactions may be indicative of a distinct set of molecular mechanisms involved in its more modest toxicity.
Arginine-rich DPRs Lead to Ribosome Stalling-It was previously reported that Arg-rich DPRs can cause translational suppression, although the mechanism remains undetermined (21). Our proteomics data revealed that the ATP-binding cassette sub-family E member 1 (ABCE1) was enriched in the PR₁₀₁ interactome. ABCE1 is involved in translation termination and ribosomal recycling (41), which led us to wonder whether synthesis of the Arg-rich DPRs impairs translation. To examine this possibility, we employed a previously established assay to measure the ability of a protein sequence to stall or delay protein synthesis rates (30). This assay involves a cassette containing two fluorescent reporters, one on each side of the peptide sequence to be tested for stalling (GFP at the N terminus and mCherry at the C terminus) (Fig. 3A). Each construct is encoded in frame without stop codons. However, the test sequence is flanked by viral P2A sequences, which cause the ribosome to skip the formation of a peptide bond but otherwise continue translation elongation uninterrupted. This means that complete translation of the cassette from one ribosome will generate three independent proteins (GFP, test protein, and mCherry) in an equal stoichiometry. However, should the ribosome stall during synthesis, mCherry is produced at lower stoichiometries than the GFP (Fig. 3A).
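The readout of this assay reduces to a per-cell mCherry:GFP ratio, summarized as a median over the transfected population and compared with a read-through control. The sketch below illustrates that calculation on made-up numbers; it is not the FlowJo workflow used in the study, and the function names and fluorescence values are hypothetical.

```python
from statistics import median

def median_ratio(events):
    """Median mCherry:GFP ratio over cells given as (gfp, mcherry) pairs."""
    return median(mcherry / gfp for gfp, mcherry in events if gfp > 0)

def relative_readthrough(test_events, control_events):
    """Ratio for a test sequence, normalized to the non-stalling control."""
    return median_ratio(test_events) / median_ratio(control_events)

# Hypothetical per-cell fluorescence values (arbitrary units)
control = [(100.0, 100.0), (200.0, 210.0), (150.0, 140.0)]
stalling = [(100.0, 30.0), (200.0, 50.0), (150.0, 60.0)]

readthrough = relative_readthrough(stalling, control)
```

A value near 1 indicates that ribosomes reach mCherry as often as they complete GFP, whereas a value well below 1 indicates that a fraction of ribosomes fail to translate past the test sequence.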
Compared with previously validated FLAG-tagged stalling reporters that either contain 21 AAA codons (which therefore encode poly-lysine) to stall, or no AAA codons to allow read-through (30), GR₁₀₁ and PR₁₀₁ constructs induced marked stalling (Fig. 3B). This was DPR-length-dependent in that the 10× repeats showed no stalling (Fig. 3C). In addition, AP₁₀₁ and GA₁₀₁ did not lead to stalling (Fig. 3C). Indeed, there was a significantly increased ratio of mCherry to GFP (Fig. 3C). A likely explanation for this increased ratio comes from intermolecular fluorescence resonance energy transfer (FRET) arising from low rates of translational read-through of the stall construct and self-association of these read-through protein products. This was more evident in another control construct of mutant Huntingtin exon 1 (Httex1), which when containing a polyglutamine (polyQ) expansion above 36 glutamines causes Huntington's disease and becomes highly aggregation prone (42,43). The expanded polyQ forms of Httex1 did not cause stalling (Fig. 3D). However, increased pathological lengths of polyglutamine increased the ratio of mCherry to GFP. Microscopic images confirmed the presence of GFP and mCherry in aggregates in cells expressing the GA₁₀₁ and Httex1 reporters (supplemental Fig. S2).

FIG. 3. Long Arg-rich DPRs stall ribosomes during translation.
A, Schematic of the reporter construct design. The P2A sequence causes the ribosome to skip the formation of a peptide bond but otherwise continue translation elongation uninterrupted. Complete translation of the cassette from one ribosome will generate three independent proteins (GFP, test protein, and mCherry). However, should the ribosome stall during synthesis (such as through the previously established positive stall reporter sequence with poly-lysine; K20, +ve sequence), mCherry is produced at a lower stoichiometry than GFP. The -ve sequence is the stall reporter sequence without lysine residues. B, Flow cytograms of Neuro2a cells 48 h after transfection with the indicated test sequences inserted into the reporter. C, Median mCherry:GFP ratios calculated from the transfected cell population (from 100,000 analyzed cells). Error bars indicate standard deviations from three independent transfections and flow cytometry measurements. p values were determined by one-way ANOVA and Dunnett's post hoc test using the -ve sequence as the control. ****, p < 0.0001; ***, p < 0.001; ns, p > 0.05. D, The same assay (and statistical tests) using huntingtin exon 1 (Httex1) transfected in HEK293 cells with the indicated polyglutamine (polyQ) lengths.
Arginine-rich DPRs Alter the State of the Actin Cytoskeleton-We next sought to examine whether the interactomes of the Arg-rich DPRs were influenced by the length of the repeat sequence (Fig 4A). For this analysis, we measured the relative [...] (supplemental Fig. S3). Almost all GO terms were significantly enriched for the longer Arg-rich DPRs and all were enriched once the relative abundance of the DPRs in the immunoprecipitations was considered (Fig 4A). These data therefore suggest that arginine content generally mediates the binding, which may arise through increased charge per molecule (i.e. Arg valency) or greater aggregation capacity. We also saw enrichment patterns consistent with the different cellular localizations of the 10× and 101× PR constructs (as shown in Fig. 1). In particular, the PR 101 DPR revealed a substantial enrichment of proteins in GO terms relevant to nuclear localization (including DNA replication and positive regulation of transcription), which was consistent with the enhanced localization of PR 101 to nucleolar substructures compared with PR 10 (Fig 4A). Although the actin-related GO terms were also enriched for the Arg-rich DPRs, the enrichment correlated with DPR localization in the cytosol, where most actin is expected to reside. Namely, there was a lesser enrichment for PR 101 compared with PR 10, in accordance with the shift from a more diffuse nuclear and cytoplasmic localization into the nucleolar substructures. GR 101 was also excluded from the nucleus compared with GR 10, and this was reflected in the greater enrichment of GR 101 with actin-related GO terms.
Next we examined whether the enrichment with actin GO terms was indicative of changes in the actin cytoskeleton. Using a flow cytometry protocol for measuring the ratio of filamentous (F) to globular (G) actin, we found that the Arg-rich DPRs had a significant impact on the formation of F actin compared with the other DPRs and the GFP-alone control (Fig 4B). There was no apparent colocalization of actin with the DPRs, and certainly not with punctate structures of the DPRs (Fig 4C). However, it was clear that F actin was reduced in individual cells expressing the Arg-rich DPRs (Fig 4C). Therefore, the results collectively suggested that the Arg-rich DPRs, either directly or indirectly, lead to destabilization of the machinery involved in actin filament assembly without binding to the actin filaments directly.
Next, we examined how proteome abundances changed in cells expressing the 100× DPRs. Because the DPRs were toxic and had variable expression levels, we sorted transfected cells into similar levels of expression by virtue of GFP levels before analysis. Cell lysates were collected and quantitatively analyzed by a reductive dimethyl labeling proteome analysis approach (Fig 5A; supplemental Table S3). The poly-GR and poly-PR DPRs resulted in the most changes to the proteome, in accordance with their more potent toxicity and promiscuous pattern of interactions. Poly-GR and poly-PR also had many overlapping GO term annotations, consistent with the Arg content driving a similar pathological consequence (Fig 5B). Conversely, the non-toxic poly-AP resulted in few changes to proteome abundance, and poly-GA had a distinct signature consistent with a different mechanism of toxicity.

Arginine-rich DPRs Lead to Altered Arginine Methylation Patterns-Arginine methylation has been reported to be abnormal in patients with C9ORF72 mutations, including the presence of arginine-dimethylated enriched inclusions (44,45). Arginine residues are commonly methylated to regulate the biological activity of many cellular processes, and this modification is important in histones, which feature among the enriched GO terms in our data sets. In addition, abnormal histone methylation has previously been reported in a mouse model of PR 50 (46). Furthermore, prior work has suggested that the arginine methylase PRMT5 is important in regulating stress granule function in C9ORF72 models of disease and methylates the ALS-gene risk factor FUS (44). Here, the GR 101 interactome was found to be significantly enriched for 3 proteins in the Methylosome GO term (GO: 0034709), including WDR77 and the arginine methylases PRMT1 and PRMT5 (Fig 2B). PRMT1 appears particularly important, accounting for 85% of the methylation activity in mammalian cells (47). Analysis of the interactome data showed that the Arg-rich DPRs significantly enriched for other arginine-enriched proteins that are common substrates for methylases (Fig 6A and 6B). This led us to hypothesize that the Arg-rich DPRs interfere with endogenous arginine methylation activity, which links to the mechanisms of ALS toxicity. To test this hypothesis, we examined whether the 101× DPRs affected proteome levels and the corresponding levels of arginine methylation (supplemental Table S4). Because of our reductive dimethylation proteomics workflow, we could only observe a minor fraction of the arginine methylation patterns that could exist. However, enough information was obtained to reveal that GR 101 leads to a significantly lower level of arginine methylation relative to the GFP-only control (Fig 6C; supplemental Table S3). These same proteins did not show a significant change in abundance (Fig 6C).
Examination of these peptides revealed that many come from Hnrnp family proteins including Hnrnpa1, Hnrnpab and Hnrnph1 (supplemental Table S3). Hnrnpa1 showed a statistically significant reduction in Arg-methylation in the GR 101 treatment (Fig 6D). Hnrnp family proteins are well-known substrates of PRMT family proteins (48). Mutations in Hnrnpa1 also cause ALS (49). Altogether, these data suggest the possibility that the Arg-rich DPRs act as substrate sinks for arginine methylases, which results in a broader deficiency in arginine methylase modification of endogenous proteins.

DISCUSSION
Here we show that the Arg in the Arg-rich DPRs promotes widespread interactions with the proteome relative to the other less toxic DPRs. These interactions center on various hubs of cellular activity including translation, ribosome biogenesis, chromatin, mitochondria, cytoskeleton, RNA splicing and the methylosome. The effects may be driven by the valency of arginine as well as changes in the cellular localization, and potentially aggregation state, of the different lengths of DPRs. In contrast, the inert and non-toxic AP DPRs showed few interactions. GA also showed relatively few interactors and unmasked a distinct interactome, consistent with a different mechanism of toxicity to the potently toxic Arg-rich DPRs.
(34) in the DPR interactomes versus Control, which are the background proteins observed in the whole proteome analysis and which were not deemed significantly affected in abundance by DPR expression. Numbers of proteins are indicated. B, As for A, but for proteins seen significantly up-regulated and downregulated in total protein lysate compared with Control. C, Abundances of methylated peptides seen in the whole proteome, and the corresponding protein abundances from which they derive. The data are plotted as matched pairs of peptides (or proteins) with differences evaluated by 2-tailed Wilcoxon signed-rank test. p values are coded as ns, p > 0.05; **, p < 0.01; ***, p < 0.001. The mean difference in abundances of matched pairs of methylated peptides (GR 101 - GFP) is -399,018. D, Shown are the means ± S.E. of peptide abundance ratios of Hnrnpa1 from the PR101 and GFP samples. Shown are unmodified (n = 24) and arg-methylated peptides (n = 5). Significance of difference was assessed with an unpaired t test with Welch's correction. p value is coded as *, p < 0.05.
Overall our data are consistent with the Arg-rich DPRs manifesting toxicity through multiple mechanisms. One mechanism appears to arise from the Arg-rich sequences causing translation to stall. Our results suggested that lengths greater than 10 repeats are needed to induce stalling. Given that the ribosome exit tunnel holds around 33 amino acids and is lined with negative charges, a plausible explanation for stalling is that DPR lengths approaching 16-17 repeats would be sufficient to fill this cavity and lock the peptide in place by electrostatic interactions. Previously it was suggested that a canonical polyadenylate tail on mRNA used to stall ribosomes is translated to lysine and that the poly-lysine sequence is recognized as aberrant by ribosomes and results in translation repression (50). Additional experiments showed that repeating sequences of Lys and Arg in proteins both slowed translation, which supports the mechanism of electrostatic interactions jamming the emergent poly-basic chain in the negatively charged ribosome exit tunnel (51). Interestingly, antimicrobial peptides enriched for PR-containing motifs have been demonstrated to bind to the ribosomal exit tunnel and inhibit bacterial protein synthesis (52). This raises the possibility that emergent DPRs can also re-enter and plug the exit tunnel through electrostatic interactions. Further support for this additional mechanism comes from in vitro translation assays showing that poly-PR and poly-GR peptides formed insoluble complexes with mRNA, restricted the access of translation factors to mRNA, and blocked protein translation (23). This study showed that poly-PR and poly-GR inhibit protein translation by binding to the translational complex and ribosomal proteins, leading to neurotoxicity (23).
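The exit-tunnel estimate in the Discussion (a tunnel holding around 33 amino acids, filled by roughly 16-17 dipeptide repeats) is simple arithmetic, and it can be checked with the sketch below. The constant and function names are hypothetical. The calculation counts only the residues of the repeat itself and ignores flanking residues from the reporter, which is a simplifying assumption.

```python
import math

TUNNEL_CAPACITY_RESIDUES = 33  # approximate figure quoted in the text
RESIDUES_PER_REPEAT = 2        # one dipeptide repeat (e.g. PR or GR) adds two residues


def repeats_to_fill_tunnel(capacity=TUNNEL_CAPACITY_RESIDUES,
                           per_repeat=RESIDUES_PER_REPEAT):
    """Number of dipeptide repeats whose residues equal the tunnel capacity."""
    return capacity / per_repeat


def fills_tunnel(repeats, capacity=TUNNEL_CAPACITY_RESIDUES,
                 per_repeat=RESIDUES_PER_REPEAT):
    """True if a DPR with `repeats` units is long enough to occupy the whole tunnel."""
    return repeats * per_repeat >= capacity


needed = repeats_to_fill_tunnel()
print(math.floor(needed), math.ceil(needed))  # bounds of the 16-17 repeat estimate
print(fills_tunnel(10), fills_tunnel(101))    # the two repeat lengths tested in the stalling assay
```

Under these assumptions the 10-repeat constructs (20 residues) fall short of the tunnel capacity, whereas the 101-repeat constructs (202 residues) exceed it several times over, in line with the length dependence seen in the stalling assay.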
Our data also suggested that the Arg-rich DPRs impede assembly of the actin cytoskeleton. Recently it was observed that promoting actin filament assembly in cell models of ALS can alleviate defects in nuclear-cytoplasmic transport (53), which supports the conclusion that destabilization of the actin cytoskeleton is pertinent to disease pathomechanisms. We observed a significant reduction in F-actin in cells with Arg-rich DPRs. Whether this effect is a consequence of other cellular effects, such as hypomethylation or ribosome stalling, is unclear. Interestingly, it has been reported that arginine methylation by an arginine methylase (PRMT2) regulates the activity of the actin nucleator protein Cobl, which suggests that arginine methylation defects could be an upstream mediator of effects on the actin cytoskeleton (54). In addition, it is thought that dysregulation of actin is a key process in ALS (55). Of note, mutations in profilin-1, which mediates the conversion of G-actin to F-actin, are linked to ALS (56). Other cytoskeletal genes have also been linked to ALS, including TUBA4A, DCTN1 and KIF5A (57,58).
Another mechanism of toxicity attributable to the Arg-rich DPRs was through a broader hypomethylation of the proteome. Previous studies have shown that PRMT1 colocalizes with GR and PR in a Drosophila model and that knockdown of PRMT family members enhanced toxicity (59). It was also found that C9ORF72-related brain samples had abundant methylated inclusions (59). Thus, the data raise the possibility that the Arg-rich motifs attract and alter the endogenous methylation activity, leading to pathological outcomes. The substrates of PRMT family proteins contain glycine- and arginine-rich (GAR) sequences that include multiple arginines in RGG or RXR contexts, which bear resemblance to the Arg-rich DPRs (60). It follows that many of the key pathways seen in our data set are affected by altered arginine methylase activity, including proteins that are methylated for functional regulation such as histones, proteins involved in mRNA splicing, and ribosomes (61,62).
Also of note is that other genes that are risk factors for ALS when mutated have activity regulated by arginine methylation and show abnormal methylation patterns in disease. In particular, FUS has been reported to interact with PRMT1 and PRMT8 and undergo asymmetric dimethylation in cultured cells (63). Importantly, PRMT1 and PRMT8 localized to mutant FUS-positive inclusion bodies in ALS (63). It has also been reported that arginine methylation modulates the nuclear import of FUS and that inclusions in ALS-FUS patients contain methylated FUS (64). We observed hypomethylation of Hnrnpa1 caused by the Arg-rich DPRs, which indicates a possible link to ALS arising from distinct gene mutations. Hnrnpa1, the Arg-rich DPRs and other ALS-associated proteins are known to form molecular condensates by phase separation (39,58). Arginine methylation of Hnrnpa1 reduces its ability to phase separate, suggesting that an imbalance in molecular condensate mechanisms contributes to the pathogenic response (65). PR DPRs can also promote the aggregation of ALS-related proteins containing prion-like domains, which are involved in mediating phase separation into molecular condensates (39). Hence our data indicate a possible convergence of multipronged mechanisms involving methylation, phase separation and the cytoskeleton as important contributors to the toxicity of the Arg-rich DPRs.

DATA AVAILABILITY
The mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium database via the PRIDE (66) partner repository with the data set identifiers PXD015177 (for the interactome data, http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD015177) and PXD015180 (for the whole proteome data, http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD015180). Information on the identified proteins is provided as supplemental data.
The authors declare that they have no conflicts of interest with the contents of this article.