PROBING GENUINE STRONG INTERACTIONS AND POST-TRANSLATIONAL MODIFICATIONS IN THE HETEROGENEOUS YEAST EXOSOME PROTEIN COMPLEX

The characterization of heterogeneous multi-component protein complexes, which goes beyond identification of protein subunits, is a challenging task. Here we describe and apply a comprehensive method that combines a mild affinity purification procedure with a multiplexed mass spectrometry approach for the in-depth characterization of the exosome complex from Saccharomyces cerevisiae expressed at physiologically relevant levels. The exosome is an ensemble of primarily 3’->5’ exoribonucleases and plays a major role in RNA metabolism. The complex has been reported to consist of 11 proteins, ranging in molecular weight from 20 to 120 kDa. By using native macromolecular mass spectrometry we measured accurate masses (around 400 kDa) of several (sub)-exosome complexes. Combination of these data with proteolytic peptide LC tandem mass spectrometry using an LTQ-FT-ICR and intact protein LC mass spectrometry provided us with the identity of the different exosome components and (sub)-complexes, including the subunit stoichiometry. We hypothesize that the observed complexes provide information about strongly and weakly interacting exosome-associated proteins. In our analysis we also identified for the first time phosphorylation sites in seven different exosome subunits. The phosphorylation site in the Rrp4 subunit is fully conserved in the human homologue of Rrp4, which is the only previously reported phosphorylation site in any of the human exosome proteins. The described multiplexed mass spectrometry-based procedure is generic and thus applicable to many different types of cellular molecular machineries, even if they are expressed at endogenous levels.


INTRODUCTION
One of the most intriguing views that have emerged out of large-scale proteome-wide analyses of protein-protein interactions in yeast and other organisms (1)(2)(3)(4)(5)(6) is that most proteins do not "act on their own".
Consequently, it has been proposed that a cell may be better described as a network of interlocking assembly lines (3,7,8), each of which is composed of large protein machinery complexes, interacting with DNA, RNA, lipids, carbohydrates and other biomolecules. The components of these assemblies may vary over time as a function of the environment, induced by for instance protein post-translational modifications or signaling molecules. Up to now these large-scale studies of protein-protein interactions have contributed a quite static picture; the dynamic (spatial or temporal) nature has been less reported. It is now accepted that protein complexes are composed of "core" highly co-expressed and tightly interacting components that are decorated by transcriptionally and differentially regulated proteins. The monitoring of the dynamic and temporal assembly/disassembly or the recruitment of specific components of protein complexes requires their isolation from their physiological environment. In that respect, affinity purification coupled to mass spectrometry has proven to be a very powerful approach (9,10); it allows the analysis of protein complexes, which are expressed at physiological levels from endogenous promoters and are assembled in vivo (3,6,11).
Besides its great benefits in the large-scale analysis of protein networks, the affinity purification coupled to mass spectrometry strategy also has some disadvantages. On the one hand, it only pulls down proteins that have a reasonably strong physical interaction with the bait protein; on the other hand, the strategy also leads to the non-specific binding of proteins, which for instance bind strongly to the beads or are highly abundant in the cells studied. Additionally, unknowingly and unintentionally, a mixture of different complexes may be isolated at the same time. Finally, the conventional mass spectrometry approach does not provide information about stoichiometry, dynamics, sub-complexes and three-dimensional structure of the protein complex. Therefore, results from affinity purified pull-downs still need to be validated by other methods (3,12). Ideally, one would like to determine the structure of these complexes by high-resolution structural biology approaches, such as electron microscopy (13,14), NMR (15,16) and X-ray crystallography (17,18), which provide supreme detail on molecular structure. Although these three techniques have become, to different degrees, amenable to larger proteins and even protein complexes, they are still somewhat limited in their application to very large and/or very heterogeneous protein complexes.
Downloaded from https://www.mcponline.org by guest on May 8, 2020
In recent years it has become apparent that the gentle nature of electrospray ionization enables the analysis of intact non-covalent structures, often referred to as native or macromolecular mass spectrometry (macromolecular MS). For this method, biomolecules are directly electrosprayed from aqueous solutions kept at physiologically relevant pH conditions. Coupling of electrospray ionization with time-of-flight mass analysis has greatly increased the mass-to-charge (m/z) range attainable (19), and has thus extended the realm of mass spectrometry also to the field of macromolecular non-covalent complexes such as protein oligomers, chaperone machineries, small viruses and even bacterial ribosome complexes (20)(21)(22)(23)(24)(25)(26)(27). With these capabilities mass spectrometry has now been used to analyze quaternary structures and changes therein that occur upon binding of cofactors, metal ions, nucleotides, ligands etc., information which is essential for understanding the cellular functions of protein machineries. Here we report on the use of macromolecular MS to characterize the genuine stronger physical interactions and subunit stoichiometry of the exosome complex from Saccharomyces cerevisiae, which was expressed at endogenous levels and purified using the tandem affinity purification (TAP) procedure (9,10). In combination with denaturing gel liquid chromatography tandem mass spectrometry (1D gel LC MS/MS) and intact protein liquid chromatography mass spectrometry (LC MS) our data allowed a comprehensive analysis of the exosome.
The exosome is a conserved multi-protein complex that functions in both processing of 3' extended precursor molecules to mature stable RNAs and complete degradation of other RNAs. The complex was originally discovered in S. cerevisiae, but has more recently also been identified in humans, plants, parasites, flies and archaea (28). The archetypal eukaryotic exosome from S. cerevisiae consists of nine core components. Six proteins are homologous to bacterial RNase PH (Rrp41, Rrp42, Rrp43, Rrp45,
Additional components include the RNase D homologue Rrp6 (32), which is only found in the nuclear exosome, and the RNase R homologue Dis3 (31). Four of the exosome proteins (Rrp4, Rrp41, Dis3 and Rrp6) have been demonstrated to have 3'->5' exoribonuclease activity in vitro. In recent years a number of exosome-associated proteins have been identified (Lrp1, Ski7, Ski2, Ski3, Ski8, Mtr4, Gsp1 and Nip7), which probably participate in the regulation and coordination of the exosome activity in different subcellular compartments (28). There is evidence that the six RNase PH-type proteins of the exosome ensemble form a ring-shaped structure with the three S1 RNA-binding domain containing proteins binding on top of this ring (33)(34)(35)(36). The recent determination of the X-ray structure of an archaeal exosome, consisting of only Rrp41 and Rrp42 components, confirmed this ring-like arrangement of the RNase PH proteins (37). However, the mode of interaction of the exosome-associated proteins has not been identified yet.
Here we set out to further define and analyze the components, protein stoichiometry, post-translational modifications and relative strength of physical interactions of the exosome complexes using a generic multiplexed mass spectrometry approach. First, we analyzed the exosome-associated proteins by 1D gel LC MS/MS using a linear ion trap Fourier transform ion cyclotron resonance mass spectrometer (LTQ-FT-ICR). Second, we analyzed the intact individual protein components by protein LC MS. Third, we measured accurate masses of (sub)-exosome complexes by macromolecular MS. Our analysis also revealed for the first time phosphorylation sites in yeast exosome subunits. We compared our findings with suggested models for the exosome structure based on biochemical assays, homology modelling and electron microscopy.

S. cerevisiae strain, cultivation and protein purification
The S. cerevisiae strain MGD35313D, BSY17 containing Csl4 or Rrp42 as the C-terminal tagged entry point was kindly provided by Cellzome AG (Heidelberg, Germany) (3). 2 L of cell culture of S. cerevisiae was grown at 30 ºC in yeast extract-peptone-dextrose medium to an optical density at 600 nm of 3.8. The cell pellets were lysed mechanically with glass beads resulting in 25 mL cell lysate. TAP purifications were performed essentially as described previously (9,10). In the first affinity purification step, 10 mL cell lysate (~200 µg/ml total protein) in 50 mM Tris/hydrochloride buffer, pH 7.5 containing 100 mM NaCl, 1.  The Netherlands) and a combination of trypsin (10 ng/μL) and endoproteinase Glu-C (20 ng/μL) (Roche, The Netherlands) digests for each gel as described previously (38) for 8 h at 37 °C. The digestion was stopped by the addition of 2.5% (v/v) formic acid. LC MS/MS was performed using an LTQ-FT-ICR (Thermoelectron, Bremen, Germany) coupled to an Agilent 1100 Series LC system (vacuum degasser, auto sampler, and one high-pressure mixing binary pump without static mixer). Peptide mixtures were delivered to a trap column (Reprosil C18AQ; 20 mm x 100 µm, packed in-house) at 5 µl/min 100% eluent A (0.1 M acetic acid). After reducing the flow to 100 nl/min by using a splitter, the peptides were phosphorylated serine/tyrosine/threonine as variable modifications. The peptide tolerance was fixed to 10 ppm and the MS/MS tolerance to 0.9 Da. Individual ion scores of > 5 indicated identity or extensive homology (p<0.05); in practice, a minimum score of 20 was used as the threshold because the calculated confidence was based on such a small protein database. The Mascot score and sequence coverage were used as an indication of the relative abundance of a protein. Primary sequence alignment of yeast exosome proteins with their human homologues was performed using CLUSTAL W, version 1.82 (EMBL-EBI, UK) (40).
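To make the 10 ppm peptide tolerance concrete, the following minimal sketch converts a ppm tolerance into an absolute mass window. It is an illustration only and not part of the search engine used here; the function names and the 1500 Da example mass are assumptions chosen for the example.

```python
def ppm_window(mass: float, tol_ppm: float) -> tuple[float, float]:
    """Return the (low, high) mass bounds for a symmetric ppm tolerance."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta


def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the observed mass deviates from the theoretical mass by at most tol_ppm."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# Hypothetical peptide mass of 1500 Da with the 10 ppm tolerance quoted in the text:
# the window is +/- 0.015 Da around the theoretical mass.
low, high = ppm_window(1500.0, 10.0)
print(f"{low:.4f} to {high:.4f} Da")
```

At this tolerance a 1500 Da peptide must be measured to within 0.015 Da, which is far tighter than the 0.9 Da window applied to the MS/MS fragment ions.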

Intact protein LC MS analysis
Before sample injection intact exosome complex (23 pmol) was subjected to 0.5 % (v/v) formic acid.
Protein chromatography was performed using an adapted Agilent 1100 Series LC system (vacuum degasser, auto sampler, and one high-pressure mixing binary pump) (Agilent, Palo Alto, CA, USA).
Proteins were delivered to a trap column (Poros10 R2; 19 mm x 150 µm, 10 µm particle size (Applied Biosystems, Framingham, MA); packed in-house) at 5 µl/min 100% eluent A (0.05% (v/v) trifluoroacetic acid). After reducing the flow to 1 μl/min by using a splitter, the proteins were transferred to the analytical column (Vydac TP214 C4 RP; 123 mm x 150 μm, 5 μm particle size; packed in-house) with a linear gradient from 0 to 80% eluent B (0.05% (v/v) trifluoroacetic acid, 80% (v/v) acetonitrile) for 40 min. The column eluent was directly introduced into a modified electrospray ionization time-of-flight instrument (Waters, Micromass LC-T, Manchester, UK) equipped with a Z-spray nanoflow electrospray source. The instrument settings were adjusted for optimal transfer of the ions into the mass spectrometer (capillary voltage 3,000 V, sample cone 80 V, desolvation gas 180 L/hr, desolvation temperature 120 °C). The mass spectra were externally calibrated with 4 mg/ml cesium iodide and analyzed by MassLynx 4.0 software (Waters).

Macromolecular MS analysis
For the macromolecular MS experiments exosome samples were prepared in 50 mM aqueous ammonium acetate, pH 6.8 by using ultra-filtration units with a cut-off of 100 kDa (Millipore, Bedford, UK).
Exosome sample (1 pmol; concentration 0.5 μM) was introduced into the modified electrospray ionization time-of-flight mass spectrometer using nanoflow electrospray glass capillaries. The instrument was modified by introducing a speedivalve between the sample cone and extraction cone. To produce intact ions in vacuo from large complexes in solution the ions were cooled by increasing the pressure in the first vacuum stages of the mass spectrometer. In addition, efficient desolvation was needed to sharpen the ion signals in order to determine the stoichiometry of the complexes from the mass spectrum. Therefore, source pressure conditions were raised and nanoflow electrospray voltages were

Multiplexed mass spectrometry approach to characterize the exosome complex
Here we describe a comprehensive method that combines a mild affinity purification procedure with a multiplexed mass spectrometry approach for the in-depth characterization of endogenously expressed exosome complexes from S. cerevisiae. The procedure is schematically outlined in Fig. 1.

Identification of exosome components by LC MS/MS analysis
For the present study we used both Csl4 and Rrp42 as the tagged entry points to purify the exosome complex with the TAP procedure. Unless stated otherwise the results described in this report are from the measurements with the Csl4 TAP-tagged exosome. We introduced a few modifications to the standard affinity purification procedure to obtain pure exosome complexes with a minimum of non-specifically The database searches identified 13 unique proteins, which were either exosome core proteins (Rrp41, Rrp42, Mtr3, Rrp43, Rrp46, Rrp45, Rrp4, Rrp40 and Csl4) or proteins known to be associated with the exosome (Rrp6, Dis3, Lrp1 and Ski7) (28,31). All proteins, apart from Lrp1, had good sequence coverage between 47% (Mtr3) and 91% (Dis3 and Csl4) (supplementary tables I and II). The low sequence coverage of Lrp1 (19%) may indicate that it is only present in sub-stoichiometric amounts within the purified exosome complex. Lrp1 is known to be present only in the nuclear exosome, which provides a rationale for the sub-stoichiometric presence of this protein (32). Rrp46 also had relatively low sequence coverage (47%), which may be explained by the relatively low content of lysine and arginine residues in Rrp46. The resulting large tryptic peptides are detected less efficiently in our LC MS/MS procedure.
From the previously identified associated proteins we only detected Rrp6, Dis3, Lrp1 and Ski7. We did not detect Ski2, Ski3, Ski8, Mtr4, Gsp1 and Nip7 or any other new exosome-associated proteins. Yeast two-hybrid experiments have shown that the Ski complex (Ski2, Ski3, and Ski8) interacts with the exosome via Ski7 (43). Mtr4 is known to be associated only with the nuclear exosome (29,44) and Gsp1 can interact with Dis3 from the exosome (45). In two-hybrid screens Nip7 has been shown to interact with Rrp43p (46). The low number of exosome-associated proteins is explained by our purification procedure, which leads to relatively pure exosome complexes, but may also lead to dissociation of weakly bound proteins.
The phosphorylation-specific staining of the denaturing gel of Csl4-tagged exosome suggested the presence of phosphorylation sites in several exosome proteins (Fig. 2A). By using LC MS/MS, and in agreement with the phosphorylation-specific stain, one or more phosphorylation sites could be identified in Ski7, Rrp6, Rrp43, Rrp4, Csl4, Mtr3 and Rrp46 (Table I,  On the basis of the sequence data of the N- and C-termini of the identified proteins we could conclude that all proteins except Mtr3 lacked the N-terminal methionine residue (Table II). Moreover, all proteins except Rrp46 and Lrp1 contained an N-terminal acetyl moiety. There was no evidence for a modified C-terminus for any of the proteins. Csl4 contained the calmodulin-binding peptide part of the TAP-tag (5,077 Da) and therefore it was not possible to draw any conclusions about the C-terminal region of this protein. In databases it is reported that Rrp46 exists in two forms: a short (24,407.3 Da) and long (28,331.9 Da) form (29,47). Our LC MS/MS data did not reveal any peptides originating from the long form of Rrp46, suggesting that our purified exosome complexes contain only the short form of Rrp46.
Moreover, the exact masses of exosome proteins and (sub)-complexes also pointed to the exclusive presence of the short form of Rrp46 (see section Analysis and mass measurement of intact proteins of the exosome).
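The effect of the modifications described above on an intact protein mass can be sketched as follows. This is a minimal illustration, not the calculation used for Table II: the function name and the 30,000 Da sequence mass are hypothetical, and the mass shifts are standard average (not monoisotopic) values rounded to two decimals.

```python
# Average mass shifts in Da (standard chemical values, rounded)
MET_RESIDUE = 131.19  # methionine residue, C5H9NOS, lost on N-terminal Met removal
ACETYL = 42.04        # acetylation, net addition of C2H2O
PHOSPHO = 79.98       # phosphorylation, net addition of HPO3


def processed_mass(sequence_mass: float,
                   met_removed: bool = True,
                   acetylated: bool = True,
                   n_phospho: int = 0) -> float:
    """Predicted average mass after N-terminal processing and phosphorylation."""
    mass = sequence_mass
    if met_removed:
        mass -= MET_RESIDUE
    if acetylated:
        mass += ACETYL
    return mass + n_phospho * PHOSPHO


# Hypothetical protein with a gene-predicted average mass of 30,000 Da
unmodified = processed_mass(30000.0)
singly_phosphorylated = processed_mass(30000.0, n_phospho=1)
print(f"{unmodified:.2f} Da, {singly_phosphorylated:.2f} Da")
```

Removal of the initiator methionine combined with N-terminal acetylation gives a net shift of about -89 Da relative to the gene-predicted mass, and each phosphate adds about 80 Da.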

Analysis and mass measurement of intact proteins of the exosome
The LC MS/MS data identified 13 exosome proteins and provided us with information about N- and C-termini and phosphorylation sites on the proteins. These data, however, did not provide so-called full coverage of the proteins and therefore also did not reveal exact masses of the different proteins. To complete the dataset on the individual proteins we performed intact protein LC MS experiments. The components of the purified exosome complex were first dissociated with 0.5% (v/v) formic acid. The dissociated proteins (13 pmol) were then injected onto a Poros10 trap column after which proteins were transferred to the C4 RP column (Fig. 3A). The subsequent on-line analysis of the eluted proteins using an electrospray ionization time-of-flight mass spectrometer allowed us to identify eight exosome proteins (Dis3, Rrp45, Csl4, Rrp42, Mtr3, Rrp41, Rrp40, Rrp46) with a mass error between 0.4 and 3.5 Da, i.e. mass error < 0.003% (Table II; Fig 3B). Unfortunately, in this assay we were unable to detect the exosome-associated proteins Rrp6, Ski7 and Lrp1, which was probably due to their sub-stoichiometric presence within the complex. Furthermore, we could not detect Rrp4 and Rrp43, which may be related to their stability in the unfolded form.
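The conversion between an absolute mass error in Da and a relative error in percent or ppm can be sketched as below. The function name is an assumption for this illustration, and the 120,000 Da protein mass is a hypothetical value taken from the upper end of the subunit mass range quoted in the abstract, not a value from Table II.

```python
def relative_error(measured: float, expected: float) -> dict:
    """Express the deviation of a measured mass as Da, percent and ppm of the expected mass."""
    diff = measured - expected
    return {
        "da": diff,
        "percent": 100.0 * diff / expected,
        "ppm": 1e6 * diff / expected,
    }


# Hypothetical 120 kDa protein measured 3.5 Da too high
err = relative_error(120003.5, 120000.0)
print(f"{err['da']:.1f} Da = {err['percent']:.4f} % = {err['ppm']:.1f} ppm")
```

Because the relative error scales inversely with mass, the same absolute deviation corresponds to a larger percentage for the smaller subunits.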
The determined molecular masses using this protein LC MS approach differed by 2 to 43 Da as compared to the masses predicted from both the gene sequences and the peptide LC MS/MS data (Table II). We did not take into account the mass increment due to the phosphorylation sites as all these sites were present in sub-stoichiometric amounts. Most predicted and determined masses were with the given (28)). This expected mass was only 0.13% lower than the measured mass for this complex (402,678 Da). We also calculated expected masses of exosome complexes having different stoichiometries for the non-ring proteins Csl4, Rrp4, Rrp40 or Dis3 (33), but these calculations did not yield molecular masses close to the measured mass. Therefore, we concluded that all components of the cytoplasmic exosome were present in stoichiometric amounts. The second ion series centered around Next, we measured protein complex mass spectra of the Csl4-tagged exosome complexes from a second purification from the same yeast cells (Fig. 4B). The mass spectra of the two purifications were very similar, but also showed some interesting differences, which are likely related to the higher temperature at which the exosome from the second purification was analyzed (25 ºC versus 20 ºC). The mass spectrum of the exosome from the second purification revealed three charge state distributions centered around m/z 7,500, 9,100, and 9,800. Mass determination of the ion series around m/z 9,100 and 9,800 yielded masses of 327,059 ± 1,000 Da and 366,160 ± 1,000 Da, respectively. Although the mass accuracy was somewhat lower than in the first experiment (Fig. 4A) we could unambiguously assign the complexes to exosome 2 and 4, respectively (Fig. 5) Calculations with all possible permutations showed that this protein composition was the only possibility for exosome 5. In our macromolecular MS approach we never observed smaller complexes or complexes including the components Rrp6, Lrp1 and/or Ski7.
Together with the peptide LC MS/MS results we may conclude from this that these proteins were present only in sub-stoichiometric amounts, such that the amounts were too low to be detected by the macromolecular MS approach. However, as an alternative we cannot exclude that the two nuclear proteins and Ski7 disassembled from the exosome during our final preparations.
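The way a complex mass follows from a charge state distribution such as those described above can be sketched with the standard adjacent-peak relation for electrospray ion series. This is an illustration under stated assumptions and not the procedure implemented in the instrument software: the helper names are hypothetical, and the peak list is synthetic, generated from the measured mass of 402,678 Da with an assumed charge range of 39+ to 43+.

```python
from statistics import mean, pstdev

PROTON = 1.00728  # mass of a proton in Da


def mz(mass: float, z: int) -> float:
    """m/z of an [M + zH]z+ ion."""
    return (mass + z * PROTON) / z


def deconvolute(peaks: list[float]) -> tuple[float, float, int]:
    """Estimate the mass from consecutive charge states of one ion series.

    Returns (mean mass, standard deviation, charge of the highest-m/z peak).
    For neighbouring peaks p_low (charge z+1) and p_high (charge z):
        z = (p_low - PROTON) / (p_high - p_low)
    """
    peaks = sorted(peaks)
    if len(peaks) < 2:
        raise ValueError("at least two adjacent charge states are required")
    z_last = round((peaks[-2] - PROTON) / (peaks[-1] - peaks[-2]))
    n = len(peaks)
    # Charges increase by one for each step towards lower m/z
    masses = [(z_last + (n - 1 - i)) * (p - PROTON) for i, p in enumerate(peaks)]
    return mean(masses), pstdev(masses), z_last


# Synthetic ion series (assumed charge states, not the recorded spectrum)
synthetic_peaks = [mz(402678.0, z) for z in range(39, 44)]
mass, spread, z_last = deconvolute(synthetic_peaks)
print(f"mass = {mass:.0f} Da, charge of highest-m/z peak = {z_last}+")
```

With real data the individual charge states give slightly different masses, and the spread between them provides an estimate of the uncertainty of the kind quoted above (for example ± 1,000 Da).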
As one can argue that the TAP-tag to Csl4 may weaken the interaction with the core exosome, we also performed macromolecular MS experiments with the Rrp42 TAP-tagged exosome (Fig. 4C). In confirmed that Csl4 only weakly interacts with the core exosome. Therefore, we concluded that the TAP-tag did not affect the association of Csl4 with the exosome.

DISCUSSION
By using the multiplexed MS approach described here we characterized yeast (sub)-exosome complexes, including protein stoichiometry, protein modifications and strong and weak interacting exosome proteins.
The affinity purification of S. cerevisiae exosome yielded the core exosome complex (Rrp41, Rrp42, Rrp43, Rrp45, Rrp46, Mtr3, Csl4, Rrp4, and Rrp40) and the known associated proteins Dis3, Rrp6, Ski7 and Lrp1. Our analysis revealed for the first time phosphorylation sites in yeast exosome proteins (Table I; for detailed MS analyses see supplemental data). Intriguingly, the serine phosphorylation at position 152 of Rrp4 is conserved within the human homologue of Rrp4 (Ser124) (49) (Fig. 6), and is the only phosphorylation site previously identified in any of the human exosome components. This serine phosphorylation site is located within the S1 RNA-binding domain of the protein (residues 107-187 in yeast Rrp4). The in vivo relevance of this phosphorylation site remains to be elucidated, but its conservation within the human homologue strongly indicates that it has an important function in the cell.
Ptacek et al. (50) have performed a global analysis of protein phosphorylation in yeast and they suggest several protein kinases may have exosome components as substrates. The two kinases Sky1 and Pho85 recognize Dis3 as their substrate, Atg1 and Swe1 recognize Lrp1 and the serine/threonine specific Pkh3 can phosphorylate Csl4 in vitro. We could not confirm any phosphorylation sites in Dis3 and Lrp1.
Several exosome proteins in which we detected phosphorylation sites (Rrp6, Ski7, Rrp43, Rrp4 and Rrp46) were not tested in the global analysis (50). Our data provide us with the exact phosphorylation sites of several exosome proteins and may give us an idea of the phosphorylation status of the yeast exosome in vivo. Our phosphorylation map of the exosome may not be complete but provides a good starting point to probe the relevance of phosphorylation for the exosome in vivo.
By combining the peptide and protein LC MS data with native macromolecular MS we obtained data on intact exosome complexes and sub-complexes thereof. The largest complex that could be observed theoretically would have a mass of 591,710 Da (Fig. 5), assuming that all components were present in a 1:1 stoichiometry. That we have never observed this complex by macromolecular MS is likely due to the sub-stoichiometric amounts of Rrp6, Ski7 and Lrp1 and/or to the weak interactions of these proteins with the core exosome. Rrp6 and Lrp1 are known to be present only in the nuclear exosome (32) and previous studies have indicated that Ski7 interacts only weakly with the exosome core complex (51). The global analysis of protein expression in yeast has provided the abundance of the yeast proteome in the logarithmic growth phase (5). The number of protein molecules per cell may be related to the expected sub-stoichiometric amounts of Rrp6, Ski7 and Lrp1 in the exosome. The core exosome components have between 3,180 and 10,800 copies per cell, but Dis3 has a reported copy number of only 606 (Supplementary table 1). The copy numbers of Rrp6 and Ski7 are 2,160 and 233, respectively (Lrp1 was not determined in (5)). Thus, the low abundance of Rrp6 and Ski7 is in line with their sub-stoichiometric presence within (sub)-exosome complexes. The 1D gel also indicates that Rrp6 and Ski7 are only present in sub-stoichiometric amounts, as the protein band containing both proteins has a somewhat lower intensity than most single protein bands. On the other hand, the low abundance of Dis3 was not in agreement with our results. Our macromolecular MS and 1D gel LC MS/MS analyses clearly showed that Dis3 was present in each exosome complex.
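The comparison of theoretical and measured complex masses used in this work can be illustrated with a small enumeration over candidate subunit compositions. The sketch below is not the calculation behind Fig. 5: the function name, the subunit labels and all masses are hypothetical placeholders, because the individual subunit masses of Table II are not reproduced in this text.

```python
from itertools import combinations


def candidate_compositions(core_mass, optional, measured, tol):
    """List every subset of optional subunits whose summed mass, added to the
    core mass, matches the measured mass within the given tolerance (Da)."""
    hits = []
    names = sorted(optional)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            total = core_mass + sum(optional[name] for name in subset)
            if abs(total - measured) <= tol:
                hits.append((subset, total))
    return hits


# Hypothetical masses in Da (placeholders, not values from this study)
core = 180_000  # stable ring, assumed always present
optional_subunits = {
    "subunit_A": 30_000,
    "subunit_B": 40_000,
    "subunit_C": 27_000,
    "subunit_D": 114_000,
}

hits = candidate_compositions(core, optional_subunits, measured=361_000, tol=1_000)
for subset, total in hits:
    print(subset, total)
```

With these placeholder values only one composition (the core plus subunits B, C and D) falls within ± 1,000 Da of the target mass, which illustrates how a sufficiently accurate mass measurement can single out one stoichiometry among all permutations.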
The largest complex we observed by macromolecular MS was the core cytoplasmic exosome including the commonly associated protein Dis3, having a measured mass of 402,678 Da. The most abundant complex (exosome 2) that we detected lacks Csl4. From two-hybrid studies, affinity purifications, RNA interference studies (34)(35)(36) and a recent X-ray structure of the Sulfolobus solfataricus exosome (37) it is now well accepted that the S. cerevisiae exosome consists of a doughnut-shaped ring structure composed of Rrp43, Rrp41, Mtr3, Rrp45, Rrp46 and Rrp42 with other proteins associated with one or more of the ring proteins (48). Our multiplexed mass spectrometry approach provides evidence that only the ring structure is very stable in solution. Electron microscopy data in combination with structure predictions have suggested that Csl4, Rrp40 and Rrp4 are positioned on top of the doughnut-shaped ring (33). This representation is in agreement with our data, which showed that Csl4 only weakly interacts and that Rrp40 and Rrp4 moderately interact with the ring structure. We also found that Dis3 has a similar affinity to the purified exosome complex as Rrp40 and Rrp4. Our observation that Rrp40, Rrp4 and Dis3 only dissociated in combination with Csl4 may suggest that Csl4 stabilizes the quaternary structure of the exosome.
We conclude that the yeast exosome does not necessarily behave as a single static complex. The exosome complexes are all organized around a stable hexameric ring to which association and dissociation of proteins takes place. Several exosome proteins are sub-stoichiometrically phosphorylated and we found that the phosphorylation site in yeast Rrp4 is conserved within the human homologue of Rrp4. The in vivo significance of our data needs to be established, but the data may give an indication about the assembly and disassembly of the exosome in vitro. The observation that the macromolecular mass spectra from different purifications and different TAP-tagged target proteins were very similar also revealed the high reproducibility of the current method. The described method is generic and thus applicable to many protein complexes, even when they have been expressed at endogenous (picomole) levels.

show the raw LC MS data from these two proteins. The spectra were deconvoluted by using a maximum entropy algorithm and yielded exact masses of the proteins (error less than 3.5 Da).

a Exosome (sub)-complexes 1-5 are schematically presented in Fig. 5.
b The mass difference of 5,077 Da between the exosome 2 species is due to the use of the two different TAP-tagged proteins (Csl4 and Rrp42).