An interaction network of RNA-binding proteins involved in Drosophila oogenesis

During Drosophila oogenesis, the localization and translational regulation of maternal transcripts rely on RNA-binding proteins (RBPs). Many of these RBPs localize several mRNAs and may have additional direct interaction partners to regulate their functions. Using immunoprecipitation from whole Drosophila ovaries coupled to mass spectrometry, we examined protein-protein associations of 6 GFP-tagged RBPs expressed at physiological levels. Analysis of the interaction network and further validation in human cells allowed us to identify 26 previously unknown associations, besides recovering several well characterized interactions. We identified interactions between RBPs and several splicing factors, providing links between nuclear and cytoplasmic events of mRNA regulation. Additionally, components of the translational and RNA decay machineries were selectively co-purified with some baits, suggesting a mechanism for how RBPs may regulate maternal transcripts. Given the evolutionary conservation of the studied RBPs, the interaction network presented here provides the foundation for future functional and structural studies of mRNA localization across metazoans.

Lysates were cleared by centrifugation at 21,000 × g for 20 min at 4 °C and 5 µl/ml of RNase A/T1 (Thermo Fisher Scientific) was added to the supernatants. After incubation at 4 °C for 30 min, lysates were cleared again and 30-60 µl of GFP-TRAP MA beads (Chromotek) were added. The mixtures were incubated for 1 hour at 4 °C with rotation. Beads were washed with lysis buffer and proteins were eluted as described above.

Western blotting and detection
Eluates were separated on 10% polyacrylamide gels and transferred to a nitrocellulose membrane.

Experimental design and statistical rationale
For analysis of proteins interacting with each tagged RBP, both label-free and dimethyl labeling MS experiments were performed and raw data were processed with the MaxQuant software as described below.
Proteome data comprised a total of 21 raw files (3 biological replicates from each sample) for label-free MS and 2 raw files (2 biological replicates from each sample) for dimethyl labeling MS. Tag alone was used as a negative control for both analyses.

Mass spectrometry measurements
For proteome measurements, eluates were separated on a NuPAGE Bis-Tris precast 4-12% gradient gel (Invitrogen). Samples were run approximately 2 cm into the gel and bands were visualized with a 0.1% Colloidal Coomassie Blue stain (Serva). Proteins were digested in-gel using trypsin. Peptides were desalted and purified on C18 StageTips (68). LC-MS analyses were performed on a nanoLC (Easy-nLC 1200, Thermo Fisher Scientific) coupled to a Q Exactive HF mass spectrometer (Thermo Fisher Scientific) through a nanoelectrospray ion source (Thermo Fisher Scientific), as described previously (69). In brief, peptides were eluted using a segmented gradient of 10%-50% HPLC solvent B (80% ACN in 0.1% formic acid) at a flow rate of 200 nL/min over 46 min. MS data acquisition was conducted in the positive ion mode. The mass spectrometer was operated in a data-dependent mode, switching automatically between one full scan and subsequent MS/MS scans of the 12 most abundant peaks selected with an isolation window of 1.4 m/z (mass/charge ratio). Full-scan MS spectra were acquired in a mass range from 300 to 1650 m/z at a target value of 3 × 10^6 charges with a maximum injection time of 25 ms and a resolution of 60,000 (defined at m/z 200). The higher-energy collisional dissociation MS/MS spectra were recorded with a maximum injection time of 45 ms at a target value of 1 × 10^5 and a resolution of 30,000 (defined at m/z 200). The normalized collision energy was set to 27%, and the intensity threshold was kept at 1 × 10^5. The masses of sequenced precursor ions were dynamically excluded from MS/MS fragmentation for 30 s. Ions with single, unassigned, or six and higher charge states were excluded from fragmentation selection.
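The precursor selection rules described above (top 12 peaks, intensity threshold, charge-state filter and 30 s dynamic exclusion) can be illustrated with a minimal sketch. This is not the instrument firmware logic: the `Peak` class, the `select_precursors` function and the m/z matching tolerance used for exclusion are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

TOP_N = 12                  # MS/MS scans triggered per full scan
INTENSITY_THRESHOLD = 1e5   # minimum precursor intensity
EXCLUSION_SECONDS = 30.0    # dynamic exclusion window
MZ_TOLERANCE = 0.01         # ASSUMPTION: match width for exclusion, not stated in the text


@dataclass
class Peak:
    mz: float
    intensity: float
    charge: Optional[int]   # None means the charge state was not assigned


def select_precursors(peaks, now, excluded):
    """Pick up to TOP_N precursors from one full scan.

    `excluded` maps an m/z value to the time (in seconds) at which it was
    last fragmented; it is updated in place.
    """
    # Forget exclusion entries that are older than the exclusion window.
    for mz in [m for m, t in excluded.items() if now - t >= EXCLUSION_SECONDS]:
        del excluded[mz]

    def eligible(p):
        # Single, unassigned, or six and higher charge states are skipped.
        if p.charge is None or p.charge == 1 or p.charge >= 6:
            return False
        if p.intensity < INTENSITY_THRESHOLD:
            return False
        # Skip masses that were fragmented within the exclusion window.
        return all(abs(p.mz - m) > MZ_TOLERANCE for m in excluded)

    chosen = sorted((p for p in peaks if eligible(p)),
                    key=lambda p: p.intensity, reverse=True)[:TOP_N]
    for p in chosen:
        excluded[p.mz] = now
    return chosen
```

Calling `select_precursors` once per full scan with a shared `excluded` dictionary reproduces the behavior in which less abundant precursors get sequenced while the most abundant ones are temporarily excluded.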
For dimethyl labeling, after tryptic in-gel digestion, the derived peptides were loaded on C18 StageTips and labeled as described (70). Measurements were done the same way as for the unlabeled samples.

Data Processing and analysis
For label-free MS, raw data files were processed using the MaxQuant software suite v.1.6.0.1 (71) at default settings. Using the Andromeda search engine (72) integrated in the software, the spectra were searched against the UniProt D. melanogaster (taxonomy ID 7227) complete proteome database (11/07/2017; 23300 protein entries; https://www.uniprot.org/), a database comprising the sequence of the tag alone and a file containing 245 common contaminants. In the database search, trypsin was defined as the cleaving enzyme and up to two missed cleavages were allowed. Carbamidomethylation (Cys) was set as fixed modification, whereas oxidation (Met) and acetylation (protein N termini) were set as variable modifications. The mass tolerances for precursor and fragment ions were set to default values of 20 ppm and 0.5 Da, respectively.
The MaxLFQ algorithm was activated and the minimum peptide ratio count was set to 1 (73). All peptide and protein identifications were filtered using a target-decoy approach with a false discovery rate (FDR) set to 0.01 at peptide and protein level (Supplementary peptides table) (74). A valid protein identification required at least one peptide with a posterior error probability (PEP) value of 0.01 or smaller. To transfer peptide identifications to unidentified or unsequenced peptides between samples, for quantification, the matching between runs option was selected, with a match time window of 0.7 min and an alignment time window of 20 min. Matching was performed only between replicates by controlling the fraction numbers. The same parameters were used to process the raw data from the experiments applying dimethyl labeling, except for of the enrichment ratios of duplicates was calculated for each protein. If a protein was found to be associated with more than one bait or identified in both label-free and labeled MS data, the highest fold change value was considered, irrespective of the experiment type. In cases where several nodes were combined into one, the highest value among the respective individual components was considered. Fig. 4 was created using Gephi v. 0.9.2 (79). Modules were detected with an algorithm described in (80), with randomization on, using edge weights and a resolution of 0.5. Force-field-based clustering was performed using the Force Atlas 2 Plugin. Baits were re-positioned manually for clarity.
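The rule for assigning a single fold change value to each protein in the network (the highest value across baits and experiment types, and across the components of a combined node) can be sketched as follows. The function name, the input tuple layout and the example identifiers are hypothetical, and the numbers are invented for illustration only.

```python
def node_scores(observations, merged_nodes=None):
    """Return one fold change value per network node.

    observations: iterable of (bait, protein, experiment, fold_change).
    merged_nodes: optional mapping protein -> name of the combined node
                  (ASSUMPTION: how combined nodes are encoded is not given
                  in the text).
    """
    merged_nodes = merged_nodes or {}
    scores = {}
    for _bait, protein, _experiment, fold_change in observations:
        node = merged_nodes.get(protein, protein)
        # Keep the highest value, irrespective of bait or experiment type.
        if node not in scores or fold_change > scores[node]:
            scores[node] = fold_change
    return scores


# Hypothetical example data.
example = [
    ("baitA", "protX", "label-free", 3.2),
    ("baitB", "protX", "dimethyl", 5.1),
    ("baitA", "subunit1", "label-free", 2.4),
    ("baitA", "subunit2", "label-free", 4.0),
]
print(node_scores(example, {"subunit1": "complexY", "subunit2": "complexY"}))
```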
For GO term analysis, the DAVID (Database for Annotation, Visualization and Integrated Discovery) v. 6

Tagged proteins recapitulate the endogenous localization patterns
To purify RBP complexes from fly ovaries under native conditions, we used transgenic fly lines generated by recombineering. We used the "Tagged FlyFos TransgeneOme" resource (fTRG) (66) expressing C-terminally tagged proteins under the regulation of their endogenous promoters. These lines carry a 40 kDa tagging cassette consisting of "2XTY1-sGFP-V5-preTEV-BLRP-3XFLAG" that can be used for both in vivo visualization and affinity purification. From the fTRG library, we selected six RBPs for IP-MS: eIF4AIII, Glo, Hrp48, Nos, Stau and Vas. To serve as a control, we generated a transgenic line expressing the tag alone (hereafter referred to as GFP), under the promoter of a moderately expressed gene (exu). To ensure that the RBP fusions are functional in vivo, we checked their localization patterns at different stages of oogenesis.
All the proteins were found to be localized as expected (Fig. S1a, b). We also checked their ability to rescue the effects of mutations that cause either lethality or sterility, as summarized in Fig. S1c.
Although only 3 out of 6 transgenes assayed were able to fully substitute for the endogenous copy, their localization in the endogenous patterns suggests that their interactions driving localization during oogenesis have been maintained.

Label-free MS combined with statistical analysis recovers known associations
For IP, we lysed whole ovaries in mild conditions of salt and detergent, and purified the complexes using the GFP-TRAP system (Chromotek) (Fig. 1a). An IP from flies expressing GFP alone was used as a negative control to identify proteins binding nonspecifically to the tag. Since we were interested in identifying RNA-independent protein-protein interactions, the experiments were carried out in the presence of RNases. As the transgenes are regulated by their endogenous promoters, they had varying levels of expression, as observed by Western blotting (Fig. 1b). To compensate for this variability in the IP-MS analysis, we adjusted the number of flies dissected for each transgene (Table S2).
All the samples were prepared in biological triplicates and the resulting spectra were searched against the Drosophila melanogaster proteome database (Fig. 1a, Fig. S2a). For confident identification of proteins and accurate intensity-based Label-Free Quantification (LFQ), we processed the raw data using the MaxLFQ module of the MaxQuant software (71,73). Additionally, we activated the "matching between runs" algorithm to quantify unidentified or un-sequenced peptides in the samples, by transferring peptide identifications among replicates. The global analysis of the proteomes resulted in the identification of 15,005 peptides mapping to 1878 protein groups, at an FDR of 1% at the peptide and protein level. Of these, 1841 unique protein groups were quantified in at least one of the 21 samples, which account for 87.5% of the total ovary proteome of Drosophila (84). The average correlation within replicates ranged from 0.71 (Glo) to 0.92 (Hrp48), suggesting overall good reproducibility of the data (Fig. 1c). In addition, the visualization of LFQ intensities of all the samples as a heat map demonstrates that all the baits were consistently enriched (Fig. S2c). The replicate profiles looked largely similar, with only minor differences. However, the number of proteins quantified with each bait varied considerably, as marked by the absence of information in the heat map (Fig. S2c).
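As an illustration of how an average within-replicate correlation such as the values quoted above could be obtained, the sketch below averages the pairwise Pearson correlations of the replicates of one bait. It assumes Pearson correlation on log2-transformed LFQ intensities, restricted to proteins quantified in both replicates of a pair; the text does not specify these details, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_replicate_correlation(replicates):
    """Average pairwise correlation between replicates of one sample.

    replicates: list of dicts mapping protein -> LFQ intensity (> 0).
    ASSUMPTION: only proteins quantified in both replicates of a pair are
    compared, on a log2 scale.
    """
    coefficients = []
    for rep_a, rep_b in combinations(replicates, 2):
        shared = sorted(set(rep_a) & set(rep_b))
        log_a = [math.log2(rep_a[p]) for p in shared]
        log_b = [math.log2(rep_b[p]) for p in shared]
        coefficients.append(pearson(log_a, log_b))
    return sum(coefficients) / len(coefficients)
```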
For statistical analysis, we considered only those proteins that were quantified in all three replicates of a given sample. To identify significantly enriched proteins, we employed the Welch's t-test (with the 5-10% For all the baits, we found several known interactants to be reproducibly enriched over control, mostly with statistical significance (Fig. 2a). For example, we co-purified all the other core components of the Exon Junction Complex (EJC) with eIF4AIII-GFP (25) and the NOT proteins with Nos-GFP (85,86). We also detected known partners of Vas, involved in both pole plasm assembly in the oocyte (Oskar (Osk); Gustavus (Gus); Fat facets (Faf); F-box synaptic protein (Fsn); Fmr1) and production of germline piRNAs in the nuage (Tejas (Tej); Spindle-E (SpnE); Kumo (Qin); Tapas), with high confidence (87). This indicates that our experimental conditions and analysis pipeline can preserve and identify true interactions. In addition, we also identified proteins that are known to be indirectly associated with Vas, such as the Cullin proteins (88) and Tudor-domain proteins Tudor (Tud) (89) and Krimper (Krimp) (90), suggesting that we not only recovered direct interactants, but whole complexes functional in distinct pathways (Fig. S3). With the Glo-GFP bait, we consistently identified other hnRNPs, including Hrp48 and the splicing factor pUF68, consistent with previous reports (38, 91). However, the enrichment was not significant for these proteins, suggesting weak or transient interactions.
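The filtering and test described at the start of this section can be sketched as follows: proteins are kept only if they have values in all three replicates, and a Welch's t statistic is computed for bait versus control. This is a minimal illustration, not the pipeline used in the study. The function names are hypothetical, the input is assumed to be log2 LFQ intensities with `None` for missing values, and control values are assumed to be complete as well. Converting the statistic to a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(bait, control):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(bait), len(control)
    v1, v2 = variance(bait) / n1, variance(control) / n2
    if v1 + v2 == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(bait) - mean(control)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def welch_for_complete_proteins(bait_lfq, control_lfq, n_replicates=3):
    """Apply welch_t to proteins quantified in all replicates.

    bait_lfq, control_lfq: dicts mapping protein -> list of log2 LFQ
    intensities, with None marking a missing value (ASSUMPTION: this data
    layout is for illustration only).
    """
    def complete(values):
        return len(values) == n_replicates and all(v is not None for v in values)

    results = {}
    for protein, values in bait_lfq.items():
        control = control_lfq.get(protein, [])
        if complete(values) and complete(control):
            results[protein] = welch_t(values, control)
    return results
```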

Quantitative analysis complements the label-free MS data
To detect fold changes in protein abundances with high precision, we used dimethyl labeling MS. This approach is advantageous for Drosophila, where metabolic isotopic labeling is still challenging. To confirm the results of our label-free analysis, we carried out dimethyl labeling MS for the Hrp48- and Vas-GFP samples. We chose Hrp48 and Vas as these proteins express well and they are involved in the translational regulation of the same set of transcripts (35, 36, 40, 45-47). We carried out the experiments in duplicates and purified the samples the same way as for label-free MS (Fig. S2b). After in-gel digestion, we labeled the peptides with heavy, medium or light isotopes and inverted the labels in the replicate, to minimize the variability due to the labeling procedures. As before, the raw data were processed with the MaxQuant software (71), providing confident identification of proteins (1% FDR) and normalized protein-abundance ratios. The analysis of the Vas- and Hrp48-associated proteomes resulted in the identification of 4027 peptides, mapping to 615 protein groups. The replicates showed high correlation and the calculated abundance ratios were well reproduced. We considered as a hit those proteins that we identified with an abundance ratio of >2 in both replicates. Consistent with the label-free analysis, we found several known interactors, most of them reproducibly enriched (Fig. 2b). To check how the two analyses relate to each other, we mapped the proteins identified in labeled MS onto the label-free MS data. As shown in Fig. 3, the proteins that were significantly enriched in the labeled MS followed the same distribution profile and showed up to 47% overlap (for Vas) with those enriched in the label-free MS analysis. Background proteins identified in labeled MS (<2 fold in both replicates) showed a similar profile when graded on the corresponding label-free MS data (Fig. 3).
To get a comprehensive view of the proteomes associated with Hrp48 and Vas, we combined the enriched proteins from both analyses.
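The hit criterion for the labeled data (abundance ratio >2 in both replicates) and the combination with the label-free hits can be sketched as below. The function names and example identifiers are hypothetical, the ratios are assumed to be bait over control after accounting for the label swap, and the numbers are invented for illustration.

```python
def labeled_hits(ratios, threshold=2.0):
    """Proteins with a bait/control ratio above the threshold in both replicates.

    ratios: dict mapping protein -> (ratio_replicate1, ratio_replicate2).
    """
    return {p for p, (r1, r2) in ratios.items()
            if r1 > threshold and r2 > threshold}


def combine_hits(label_free, labeled):
    """Return (union, overlap) of the two sets of enriched proteins."""
    return label_free | labeled, label_free & labeled


# Hypothetical example data.
ratios = {"protA": (3.0, 2.5), "protB": (4.0, 1.5), "protC": (2.0, 5.0)}
union, overlap = combine_hits({"protA", "protD"}, labeled_hits(ratios))
print(sorted(union), sorted(overlap))
```

In this sketch only `protA` passes the labeled criterion: `protB` fails in the second replicate and `protC` has a ratio of exactly 2, which does not exceed the threshold.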

Global analysis of RBP interactomes reveals novel protein interactions
To understand how the proteomes identified with each bait interact with each other, we built a composite network of all statistically significant interactants. While each bait has its distinct proteome, the network is also connected, in agreement with the spatial and temporal distribution of the transcripts they regulate. In particular, we observed a considerable overlap between functionally related proteins such as Hrp48 and Glo (Fig. S4). To gain a systemic understanding of the network, we also added the known protein-protein associations from the String v. 11.0 (77) and FlyBase (78) databases. Next, we carried out a modularity analysis to identify highly connected communities of proteins (Fig. 4a). As expected, many proteins involved in oogenesis, mRNA localization, translational regulation and germ cell formation were selectively enriched with all the baits. Additionally, proteins involved in neurogenesis and splicing were also overrepresented, consistent with the well-studied function of selected RBPs in these processes (Fig. 4b).
Interestingly, we observed ribosomal proteins (components of both large and small subunits) to be significantly enriched with Vas-GFP and Hrp48-GFP, but not with the other RBPs analyzed (Fig. 5, Fig. S4).
Previous studies have shown the requirement of Vas in translational activation of osk, nos and grk mRNAs (35, 36, 40, 49). However, the molecular mechanism by which Vas activates translation is unclear. Studies in Drosophila have shown that Vas directly binds the translation initiation factor eIF5B (dIF2) to positively regulate grk and mei-P26 translation and possibly other germline-specific transcripts (92-94). In addition, Vas also interacts genetically with the translation initiation factor eIF4A for efficient germ cell formation (95).
However, neither eIF5B nor eIF4A was detected or enriched in our dataset. Instead, other translation initiation factors involved in the formation of the pre-initiation complex, such as eIF2, eIF3 and the cap-binding complex of eIF4E-4G, were selectively co-purified (Fig. 5). This is consistent with the recently reported interaction of eIF3 subunits with Vas in Drosophila oocytes (56).
Hrp48 is required for the translational repression of osk and grk mRNAs (45-47). Consistent with this, we copurified with Hrp48-GFP (in both label-free and labeled MS experiments) several P-body-components, associated with the RNA repression/decay machinery, most notably the deadenylase and decapping complexes (Fig. 6). Ribosomal proteins and translation initiation factors were also significantly enriched with the Hrp48-GFP bait, similar to Vas-GFP (Fig. 5). This includes eIF3d, which has recently been reported to interact with Hrp48 to translationally repress the msl-2 mRNA (96).

In vitro validation of protein-protein interactions
To validate the results of our IP-MS analysis, we co-expressed bait-candidate pairs in cultured mammalian HEK293 cells. This system can be effectively used to study direct interactions of Drosophila proteins because it reduces the likelihood of endogenous proteins mediating indirect associations. For validation, we selected significantly enriched candidates identified in the label-free MS analysis. Additionally, we also considered functionally relevant partners, enriched with a >2 times fold change but excluded by statistical filtering. For Hrp48 and Vas, where information from differential labeling was also available, we selected candidates among the interacting proteins identified in both datasets.
Typically, we co-expressed an EGFP (referred to as GFP)-tagged bait with an HA-tagged candidate protein and used the GFP-TRAP system (Chromotek) for IPs. We observed that small proteins (<25 kDa) expressed poorly as fusions with HA or HA-Flag. Substitution with the GFP tag improved the expression in all the cases tested. To be able to validate the interactions of such small proteins, we co-expressed an HA-Flag-tagged candidate with a GFP-tagged bait and performed the IP with anti-Flag (Fig. 7f, h). Since Vas does not express well as an HA- or HA-Flag fusion, the small proteins could not be assayed efficiently for interaction with Vas.
As negative controls, we used MBP or GFP. We also included known interactions as positive controls, wherever possible. All the tested candidates are indicated in Fig. S5.
Out of 90 protein-protein interactions assayed, we could confirm 32 interactions (35%), of which 26 were found to be novel (summarized in Fig. S6a). All positive interactions were confirmed at least 3 times, in independent experiments. In addition, we were also able to validate some of the interactions by reciprocal IP, as shown in Fig. 7i. To further confirm that our MS analysis pipeline effectively separated background binders from true interactants, we also tested Nucleophosmin (Nph), which was depleted in all IP-MS datasets. We could not detect interactions of Nph with any of the 4 baits tested in vitro (data not shown), in line with the MS data. Additionally, we also tested Sqd that has been reported to interact with Hrp48 in an RNA-dependent manner (47). The negative results further confirm that our experimental conditions effectively disrupted RNA-mediated associations and only protein-protein interactions were confidently identified in our analyses. To visualize the co-IP results, we integrated the validated interactions with the IP-MS data (both labeled and label-free) and information from the literature to create a subnetwork (Fig. S6b). As the majority of the validated interactants are known regulators of maternal mRNAs, this subnetwork highlights a general machinery involved in oocyte development in Drosophila.

Discussion
This study presents a proteome interaction network of six RBPs (eIF4AIII, Hrp48, Glo, Nos, Stau and Vas) required for the localization of maternal mRNAs in Drosophila. To construct this network, we purified complexes associated with bait proteins and used the MaxLFQ algorithm (73) for label-free relative protein quantification. The accuracy of this approach is comparable to labeled MS techniques such as SILAC (97).
By statistical filtering, we could separate background and specific binders for each bait. Several well characterized interactions were significantly enriched with most baits, indicating the efficacy of our workflow.
To complement these data, we also obtained MS data from dimethyl labeling for a subset of the baits. We were able to validate 32 of the interactions assayed, including several novel associations.
In addition to the known regulators of mRNA localization and oocyte patterning, we co-purified nuclear and

Nuclear processing is intrinsically linked to cytoplasmic targeting of maternal mRNAs
In addition to their crucial role in pre-mRNA processing, splicing factors also affect the cytoplasmic fates of mRNAs. SmB, a spliceosomal Sm protein, is a known osk mRNP component. SmB contributes to germ cell specification, at least in part by facilitating osk mRNA localization (103), and fails to localize to the posterior of the oocyte in the absence of Vas (104). Several Sm proteins have also been detected in association with Vas in oocytes (56). Consistent with this, we co-purified SmB with Stau, Glo, Hrp48 and Vas. We also purified other splicing regulators with several of our baits, including pUF68 (105) and the SR family proteins SC35 and SF2 (106,107). Both SC35 and SF2 could be validated for their interaction with Glo and Stau, while pUF68 bound eIF4AIII and Stau in vitro. pUF68 was previously shown to interact with Hrp48 and Glo (38) and SF2 co-purifies with the short isoform of Osk (108). These results, together with the well-studied role of SR proteins in cytoplasmic regulation of gene expression including mRNA export, decay and translation in mammalian systems (109,110), suggest that splicing factors are bona fide components of the mRNA localization machinery.

Translational regulation of maternal mRNAs by Hrp48 and Vas
In agreement with the function of Vas in enhancing the translation of maternal mRNAs (35, 36, 40, 49), we co-purified several ribosomal proteins and translation initiation factors with Vas-GFP. Our results suggest that Vas may recruit factors involved in translation initiation. This is also supported by the interaction of Vas with eIF5B and eIF3 subunits, as previously reported (56, 92-94). In contrast to Vas, Hrp48 is a known translational repressor (45-47). In line with the localization of Hrp48 to P-bodies (111), we co-purified several components of the mRNA decay machinery, including the CCR4-NOT deadenylase complex, with Hrp48-GFP. It is possible that these interactions are indirect and mediated by BicaudalC (BicC) and Belle (Bel). These proteins negatively regulate target mRNAs together with the CCR4-NOT complex (112,113). We demonstrated in vitro binding of Hrp48 with both BicC and Bel. This suggests that by recruiting these proteins, Hrp48 may regulate the nos and osk mRNAs, possibly via CCR4-NOT mediated deadenylation (Fig. 8; refs. 112-115). The function of Hrp48 in nos regulation remains to be investigated (Fig. 8). However, the parallel enrichment of ribosomal proteins and P-body components with Hrp48-GFP indicates its bifunctional role in modulating translation. Several lines of evidence from Drosophila, including binding of Hrp48 to a derepressor element in the osk 5'UTR (45), identification of Hrp48 as a part of a protein complex functioning in translational enhancement of Hsp83 mRNA (116) and interaction of Hrp48 with the CPEB (cytoplasmic polyadenylation element binding) protein Orb (this study), support the dual nature of Hrp48.
hnRNP A2, the mammalian homolog of Hrp48, also exhibits the ability to mediate both translational stimulation (117) as well as repression (118), further strengthening the argument.

Identification of novel genes with a potential role in Drosophila oogenesis
Along with several known regulators of maternal mRNAs, we identified the protein products of many previously uncharacterized genes. One such example is CG5726, which encodes a protein with a MIF4G-like domain. This domain is found in many proteins involved in RNA metabolism including translation initiation factors, NMD factors and nuclear cap-binding proteins (65, 119-122). With no identifiable orthologs in humans, CG5726 protein shows up to 50% sequence identity among Drosophilids. In early embryos of Drosophila melanogaster, CG5726 interacts with short Osk (108). In this study, we found CG5726 to be interacting with multiple RBPs: Glo, Hrp48 and Vas, suggesting its possible role in translational regulation.

Normalized ratios (Log2) of both the replicates are plotted against each other. Dotted lines mark the proteins with more than 2-fold change over control, in each replicate. IP from GFP sample served as a control. "N" denotes the number of protein groups plotted and "r" denotes the Pearson correlation coefficient. Each identified protein is represented as a dot in light grey; each bait is highlighted in green; significantly enriched proteins are highlighted in pink; known interactants are highlighted in blue; background binders are highlighted in dark grey; empty circle represents control.

GFP tag served as a control. Cell lysates were immunoprecipitated with anti-Flag antibody and analyzed by western blotting. For HA-Flag-tagged proteins, 3-4% of the input and 10% of the eluates were loaded, whereas for GFP-tagged proteins, 3-4% of the input and 90% of the eluates were analyzed. (i) Cells were transfected the same way as in panel h, and the lysate was immunoprecipitated with an anti-GFP nanobody coupled to magnetic beads. For GFP-tagged proteins, 3.4% of the input and 10% of the eluates were loaded, whereas for HA-Flag-tagged protein, 3.4% of the input and 90% of the eluates were analyzed.
In each panel, cell lysates were treated with RNases before immunoprecipitation. Novel interactions are highlighted in red.

Significantly enriched proteins in labeled MS data (ratio >2 in both replicates): Vas-GFP, Hrp48-GFP