A Genome-wide Screen for Site-specific DNA-binding Proteins

We used a biochemical genomics method of assaying Saccharomyces cerevisiae proteins, derived from a nearly complete set of glutathione S-transferase fusions, to develop an approach that is able to identify proteins that bind to a DNA element. Using the upstream activation sequence (UAS) of the promoter for the invertase gene, SUC2, we identified both specific and nonspecific binding activities, which could be classified based on whether they bound with equivalent affinity to a nonspecific DNA competitor. Three transcription factors, Mig1, Yer028c, and Rgt1, were found to be binding activities specific to the SUC2 UAS. Mig1 and Yer028c had been reported previously to bind to elements within the SUC2 UAS, validating the ability of the method to identify sequence-specific factors. The third activity, Rgt1, had not been reported previously to bind to SUC2. Additional gel shift assays narrowed the Rgt1 binding site to the SUC2-B element within the SUC2 UAS, which is similar to previously identified Rgt1 binding sites present in other genes. In vivo levels of invertase activity in an rgt1Δ strain were reduced relative to an isogenic RGT1 strain when these strains were grown under inducing (low glucose) conditions, suggesting that Rgt1 may have a role in the activated transcription of SUC2. This report demonstrates the feasibility of identifying DNA binding activities by rapidly assaying a large fraction of the predicted open reading frames of an organism for binding to a regulatory DNA motif.

The determination of transcription factor regulatory networks is a crucial component of the effort to understand biological pathways. For example, the Saccharomyces cerevisiae cell cycle is controlled by a series of transcription factors that form an underlying circuit (1). The development of chromatin immunoprecipitation in conjunction with the use of intergenic DNA arrays has allowed the genome-wide localization of transcription factor binding sites (1-3). This method allows the pairing of a large set of promoter regions to a specific transcription factor. Conversely, computational analyses are identifying many putative regulatory sequences within the intergenic regions of the yeast genome (4-6); however, in many cases, the corresponding DNA-binding proteins are unknown. Thus, the development of new methods to pair transcription factors with their target genes is needed.
We sought to develop an approach to determine the set of proteins that bind to a specific DNA sequence by combining a classical biochemical technique, the gel shift assay, with purified protein pools representing the yeast proteome (7). The upstream activation sequence (UAS) for the SUC2 gene, which encodes invertase, a periplasmic enzyme that hydrolyzes external sucrose into glucose and fructose, was chosen as a test case for our genome-wide screen. This sequence has been subjected to numerous genetic experiments to identify proteins involved in its regulation (8-10). In addition, the UAS has been delineated precisely by mutation and deletion (11-14). Analysis of the UAS has implicated Mig1, Mig2 (15), and Yer028c (16), transcription factors that bind to a pair of GC-rich sequence motifs in the SUC2 UAS termed the SUC2-A and SUC2-B elements (see Fig. 1). Mig1 and Mig2 are repressors of SUC2 expression at high levels of glucose (repressed growth conditions). Furthermore, there is evidence for an unknown activator(s) that functions at very low levels of glucose (derepressed growth conditions) (12,14,16).
We report here the use of a genome-wide gel shift assay to identify proteins capable of binding to the SUC2 UAS. Additional gel shift assays, under high stringency conditions, allowed us to sort the activities into specific and nonspecific DNA-binding factors. In addition to identifying two transcription factors known to bind to the SUC2 UAS, we identified a novel binding activity, Rgt1. Further evidence for the role of Rgt1 as a transcriptional activator comes from the observation that an rgt1Δ strain had reduced SUC2 expression.

EXPERIMENTAL PROCEDURES
Strain Growth and Protein Purification-An array of 6144 individual yeast strains, each containing a different yeast ORF fused to GST, was obtained from Eric Phizicky. Yeast strains were grown in yeast extract, peptone, 2% glucose medium (YEPD) (17) to saturation in a 96-well microtiter plate format before pooling and storage in glycerol at −70°C. Pools of 96 strains were grown overnight in 2 ml of SD Ura− (17) liquid medium and washed with an equal volume of SD Leu− Ura−, and the resuspended cells were used to inoculate 400-ml cultures of SD Leu− Ura−. Cultures were grown for 18-22 h to an A600 of 0.8-1 and induced with 0.5 mM copper sulfate for 2 h. Cells were harvested, and protein was extracted using Y-PER [...]. Whole cell extracts were passed over a 2-ml glutathione-agarose (Sigma) column, followed by three washes with 10 ml of wash buffer (20 mM Tris-HCl (pH 8), 50 mM NaCl, 1 mM dithiothreitol, 1 mM EDTA, 0.5% N-octyl-glucoside, and 10% glycerol). Proteins were eluted by addition of 2 ml of elution buffer (20 mM Tris-HCl (pH 8), 50 mM NaCl, 1 mM dithiothreitol, 1 mM EDTA, 0.5% N-octyl-glucoside, 10% glycerol, and 25 mM glutathione). Eluted protein samples were adjusted to 40% glycerol and stored at −20°C. Typically, the total protein concentrations in the eluted samples were ~0.2 mg/ml as determined by Bio-Rad protein assay reagent.
For the deconvolution of the pools, the growth conditions were as above except that the strains were grown in 50-ml cultures, and the purification was scaled accordingly. Finally, individual yeast strains corresponding to DNA binding activities were grown in 50-ml cultures and purified accordingly. To confirm the identities of the GST-ORFs encoded by yeast strains that yielded gel shift activities, plasmids from yeast cells within individual wells were extracted and transformed into Escherichia coli, followed by sequencing.
PCR-generated products were agarose gel-purified, and the DNA was extracted with a Qiagen kit before labeling. DNA was end-labeled in 10 µl containing 50 nM DNA fragment (60 ng of 145-bp DNA fragment), 5 units of T4 polynucleotide kinase (Invitrogen), 1× polynucleotide kinase exchange buffer, and 70 µCi of ATP (3000 Ci/mmol) and incubated at 37°C for 1 h. Unincorporated nucleotide was removed by a gel filtration column (Centrisep 20) equilibrated with TES buffer (Tris-HCl (pH 8.0), 1 mM EDTA, and 100 mM NaCl). The SUC2-A and SUC2-B probes were generated by labeling oligonucleotides as above followed by annealing a 2-fold excess of cold complementary oligonucleotide. Unincorporated nucleotide was removed by gel filtration.
Gel Shift Assay-Protein-DNA complexes were formed by mixing equal volumes of purified protein eluates with 0. [...]. Protein-DNA complexes were fractionated on non-denaturing 5% (49:1) polyacrylamide gels containing 0.5× Tris borate EDTA buffer (5% glycerol, 1 mM MgCl2, 1 mM CaCl2, and 50 µM ZnCl2) at 5 volts/cm at 4°C. The gel was dried onto filter paper and exposed overnight to a phosphorimaging screen followed by scanning of the screen with a Molecular Dynamics Storm phosphorimager. Other gel shift assays were performed as above except with a 10% (49:1) polyacrylamide gel.
Invertase Activity-The yeast strains used to measure invertase activity were derived from the Saccharomyces genome deletion project (19). The rgt1Δ strain (ATCC number 4004887) and its isogenic parent strain (BY4741 MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) were obtained from the American Type Culture Collection. Periplasmic invertase activity was measured from whole cells in log phase based on a method described previously (20). At least three individual colonies from each strain were grown in YEPD overnight, and the cells were derepressed by inoculation into yeast extract, peptone, glycerol medium (YEPG) containing 0.1% glucose followed by growth for 3 h at 30°C. One unit of invertase-specific activity is 1 µmol of glucose released/min/1 absorbance unit of cells at 600 nm.
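The unit definition above can be written out as a short calculation. The following is a minimal sketch in Python, not the procedure of reference 20; the function names and all numeric readings are hypothetical and serve only to show how the unit is assembled and how a mutant can be expressed relative to its parent strain.

```python
def invertase_specific_activity(glucose_umol, minutes, a600_units):
    """Return µmol glucose released per minute per A600 unit of cells."""
    if minutes <= 0 or a600_units <= 0:
        raise ValueError("minutes and a600_units must be positive")
    return glucose_umol / minutes / a600_units


def percent_of_parent(mutant_units, parent_units):
    """Express mutant specific activity as a percentage of the parent strain."""
    if parent_units <= 0:
        raise ValueError("parent_units must be positive")
    return 100.0 * mutant_units / parent_units


# Hypothetical readings for illustration only (not data from this study).
parent = invertase_specific_activity(glucose_umol=4.0, minutes=10.0, a600_units=2.0)
mutant = invertase_specific_activity(glucose_umol=3.0, minutes=10.0, a600_units=2.0)
relative = percent_of_parent(mutant, parent)
print(parent, mutant, relative)
```

Normalizing to absorbance units of cells makes cultures of different densities comparable before the ratio is taken.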

Establishment of Conditions for the Gel Shift Assay-To screen the yeast proteome, we generated pools of purified proteins with defined constituent proteins in each pool (7). An array of 6144 yeast strains, each overexpressing a single yeast ORF fused to GST, was split into 64 pools of 96 strains each. Protein pools were then generated by purification of the GST fusion proteins from whole cell extracts of each pool of strains. The protein pools were used in a radioactive gel shift assay, a sensitive method to detect the binding of proteins to a radiolabeled DNA fragment in which protein-DNA complexes migrate with reduced electrophoretic mobility in a nondenaturing gel compared with the unbound DNA. Various parameters of the gel shift assay were investigated, including the size of the probe DNA, assay buffer components, and gel conditions, resulting in the use of a 145-bp fragment corresponding to nucleotides −401 to −546 relative to the SUC2 transcriptional start (Fig. 1). This fragment includes two GC-rich elements, SUC2-A and SUC2-B, which are bound by Mig1, Mig2, and Yer028c (15,16).
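The pooling arithmetic can be made concrete with a small sketch. The Python below assumes, purely for illustration, that strains are numbered consecutively and laid out row by row on 8 × 12 plates; the actual ordering of the array is not stated in the text, and the function name `locate` is hypothetical.

```python
import string

N_STRAINS = 6144
PLATE_ROWS = 8
PLATE_COLS = 12
WELLS_PER_PLATE = PLATE_ROWS * PLATE_COLS  # 96 strains per pool


def locate(strain_index):
    """Map a 0-based strain index to (pool number, row letter, column number).

    Assumes a row-major layout, which is an illustrative convention only.
    """
    if not 0 <= strain_index < N_STRAINS:
        raise ValueError("strain_index out of range")
    plate, well = divmod(strain_index, WELLS_PER_PLATE)
    row, col = divmod(well, PLATE_COLS)
    return plate + 1, string.ascii_uppercase[row], col + 1


n_pools = N_STRAINS // WELLS_PER_PLATE
print(n_pools)          # number of 96-strain pools
print(locate(0))        # first well of the first pool
print(locate(6143))     # last well of the last pool
```

Because each pool corresponds to one microtiter plate, a positive pool immediately restricts the search to 96 defined GST-ORF candidates.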
Seven protein pools, including ones known to contain transcriptional regulators of SUC2, were selected to assess the effect of unlabeled nonspecific competitor on protein binding to the SUC2 UAS (Fig. 2). Our aim was to establish a level of nonspecific competitor that eliminated the majority of observed binding activities, thereby enriching for specific interactions. In the absence of unlabeled nonspecific competitor, binding activities could be observed in nearly every pool tested, reflecting the many DNA binding activities found in yeast. Upon the addition of increasing amounts of poly[d- [...] (Fig. 2), which are likely SUC2 UAS-specific binding activities. Each activity had a distinct electrophoretic signature of mobility, signal intensity, and banding pattern, dependent on factors such as the efficiency of protein expression and purification, possible proteolytic degradation during purification, DNA binding affinity, and the mass and charge of each protein. The electrophoretic signature of each activity was consistent regardless of whether the protein was present in a pool of 96 or was purified individually (see below).
Identification of SUC2 UAS Binding Activities-Of the 64 pools representing the yeast proteome, eight pools with protein-DNA complexes of varying electrophoretic mobilities and radioactive intensities were investigated to determine the identities of the GST-ORFs responsible for the observed complex. Additional pools with protein-DNA complexes that are faintly visible in Fig. 3, but do not have their lane labeled, were not deconvoluted, because it became apparent that faint bands are more likely to be nonspecific complexes (see experiments below). The identities of the GST-ORFs in these eight pools were determined by a deconvolution strategy in which yeast strains from each 96-well microtiter plate were combined systematically as pools of those in each of the eight rows and each of the 12 columns. For example, purification of the GST fusion proteins from the row and column pools of plate 61 showed that protein-DNA complexes of an identical pattern and electrophoretic mobility as that present in the plate 61 pool were also present in the pools of column 7 and of row D (Fig. 4). Based on these coordinates, the relevant protein in pool 61 was identified as Rgt1. Similarly, other protein-DNA complexes were deconvoluted and yielded the identities observed in Fig. 3. Deconvolution of pool 53 identified two closely migrating protein-DNA complexes yielding a total of nine complexes from the eight pools.
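The row and column deconvolution described above amounts to intersecting the positive row pools with the positive column pools of a plate. The following Python sketch illustrates that logic under the assumption of a standard 8 × 12 plate labeled A-H and 1-12; the function name and the two-hit example coordinates are hypothetical.

```python
from itertools import product

ROWS = "ABCDEFGH"          # eight row pools per plate
COLS = range(1, 13)        # twelve column pools per plate


def deconvolute(positive_rows, positive_cols):
    """Return candidate wells at the intersection of positive row and column pools."""
    bad_rows = set(positive_rows) - set(ROWS)
    bad_cols = set(positive_cols) - set(COLS)
    if bad_rows or bad_cols:
        raise ValueError("unknown row or column label")
    return [f"{r}{c}" for r, c in product(sorted(positive_rows), sorted(positive_cols))]


# Plate 61 in the text: the activity appeared in row D and column 7.
single_hit = deconvolute({"D"}, {7})

# Hypothetical plate with two activities: two rows and two columns are positive,
# so four wells are candidates until the band patterns are matched up.
double_hit = deconvolute({"B", "F"}, {3, 9})

# Purifications needed per plate with this scheme versus testing every well.
purifications = len(ROWS) + len(COLS)
print(single_hit, double_hit, purifications)
```

With a single activity the intersection is one well. With two activities on a plate the intersection alone is ambiguous, and the distinct electrophoretic signature of each complex is what allows each row pool to be matched to the right column pool.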
At least three transcription factors have been reported previously to bind the SUC2 UAS: Yer028c, Mig1, and Mig2. Pool 12 contained the Yer028c complex. Pool 14 contained the Mig1 complex as observed in previous assays (Fig. 2, lane 3), but this complex was not clearly evident in Fig. 3 because of variability in the assay. However, the Mig1-DNA complex was clearly visible upon deconvolution of pool 14 (data not shown). Mig2 binding activity was not detected in our assays, although a correct size insert with proper junction sequence was present in the appropriate transformant of pool 9.

FIG. 3. A genome-wide gel shift assay using 64 pools of purified proteins representing the yeast proteome. The pools with protein-DNA complexes whose identities were deconvoluted are labeled with the pool number and the identity of the GST-ORF responsible for the electrophoretic mobility shift (above the lane). GST-ORF identities were determined by a deconvolution procedure as in Fig. 4. Two GST-ORFs were responsible for the gel shift signal observed in pool 53. Mig1 binding activity was observed previously in pool 14 but is not readily apparent in this image.
The functional annotations for the other activities indicated that some nonspecific DNA binding activities were detected even in the presence of unlabeled competitor DNA. For example, the protein-DNA complex from pool 33 corresponded to Yku80, a protein that binds to double-stranded ends of DNA with little or no sequence specificity (21). The protein-DNA complex observed in pool 29 was due to Apn1, an apurinic/apyrimidinic endonuclease (22) that binds DNA under our experimental conditions. The functional annotations for Arc40, Mek1, Mrt4, Smm1, and Stb3 make them also unlikely to be sequence-specific activities for the SUC2 UAS. The assumption that these activities are not sequence-specific for the SUC2 UAS is corroborated by experimental results below and in Fig. 5.
Additional gel shift assays were performed to determine which binding activities identified in the initial genome-wide gel shift screen were specific for the SUC2 UAS. To this end, purified proteins were compared in their binding to the SUC2 UAS probe and a 272-bp non-homologous sequence. The non-homologous sequence is derived from an upstream region of the SUC2 promoter (Fig. 1) that lacks the GC elements present in the UAS. Additionally, high levels of nonspecific competitor were used to sort activities into specific and nonspecific categories. Fig. 5B shows that Rgt1, Mig1, and Yer028c exhibited specific binding to the SUC2 UAS, with minimal binding to the non-homologous sequence even at low levels of nonspecific competitor and significant binding to the SUC2 UAS even at high levels of nonspecific competitor. In contrast, Stb3, Mrt4, and Mek1 bound similarly to both the SUC2 UAS and to the non-homologous sequence probe at low levels of nonspecific competitor, and the addition of a 5-fold higher level of nonspecific competitor abolished binding to both probes (Fig. 5A). These results indicate that these proteins bound DNA nonspecifically. Other activities identified in the genome-wide screen also exhibited similar nonspecific activity (data not shown).
Regulation of the SUC2 Promoter by Rgt1-Although previous studies identified Rgt1 binding sites in promoters of the HXT (hexose transporter) genes (23,24), the binding of this factor to SUC2 had not been observed. We performed gel shift assays with annealed oligonucleotide probes corresponding to the SUC2-B and SUC2-A elements and found that Rgt1 binds to only the SUC2-B probe (Fig. 6A). The sequence of the SUC2-B element (Fig. 1) includes a site with similar sequence and spacing from the transcriptional start site as the Rgt1 binding sites observed in HXT promoters. Mig1 and Yer028c bound specifically to both GC-rich elements (Fig. 6A), consistent with previous results indicating the presence of binding sites within these fragments.
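The criteria used to sort activities into specific and nonspecific categories (binding to the SUC2 UAS probe versus the non-homologous probe, at low and high levels of nonspecific competitor) can be summarized as a decision rule. The Python sketch below is only an illustration of that logic: the idea of scoring each lane as a fraction of probe shifted, the detection threshold of 0.1, and all example values are assumptions, not measurements or cutoffs from this study.

```python
from dataclasses import dataclass


@dataclass
class ShiftSignals:
    """Hypothetical fraction of probe shifted in each of four lanes."""
    uas_low: float       # SUC2 UAS probe, low competitor
    uas_high: float      # SUC2 UAS probe, high competitor
    control_low: float   # non-homologous probe, low competitor
    control_high: float  # non-homologous probe, high competitor


def classify(s, detect=0.1):
    """Apply the qualitative specificity criteria; `detect` is an assumed cutoff."""
    binds_uas_high = s.uas_high >= detect
    binds_control_low = s.control_low >= detect
    # Specific: UAS binding survives high competitor, control probe barely bound.
    if binds_uas_high and not binds_control_low:
        return "specific"
    # Nonspecific: both probes bound at low competitor, both lost at high competitor.
    if (s.uas_low >= detect and binds_control_low
            and not binds_uas_high and s.control_high < detect):
        return "nonspecific"
    return "unclassified"


# Illustrative values only.
example_specific = classify(ShiftSignals(0.60, 0.40, 0.02, 0.00))
example_nonspecific = classify(ShiftSignals(0.50, 0.01, 0.45, 0.02))
example_mixed = classify(ShiftSignals(0.50, 0.30, 0.40, 0.20))
print(example_specific, example_nonspecific, example_mixed)
```

The "unclassified" branch is kept deliberately so that any activity not matching either pattern is flagged for further testing rather than forced into a category.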
We also investigated the effect of an rgt1 deletion on the expression of invertase to address whether Rgt1 plays a role in the regulation of the SUC2 gene. We compared invertase expression under inducing conditions in an rgt1Δ yeast strain to the isogenic parent strain. Inducing (low glucose) conditions inactivate the Mig1 and Mig2 repressors and stimulate unknown activators for SUC2 expression. Under these conditions, invertase expression in the rgt1Δ strain was reduced to 62% of that in the parent strain, indicating that Rgt1 is at least partially responsible for fully induced levels of SUC2 (Fig. 6B).

DISCUSSION
We used the gel shift assay on a genome-wide basis to identify yeast proteins that bind in a sequence-specific manner to a DNA element, in this case a regulatory region of the SUC2 gene. Two homologous transcription factors, Mig1 and Yer028c, were found that had been demonstrated previously to bind a pair of GC-rich elements in the SUC2 UAS. Whereas Mig1 is required for glucose repression of SUC2 expression, Yer028c does not appear to be involved in the glucose-mediated regulation of SUC2 despite its ability to bind sites similar to those bound by Mig1 (16). The target genes of Yer028c are unknown. Mig2 is a transcription factor homologous to Mig1 that is known to bind the SUC2 UAS but was not identified in our assay even though the gene fusion was present in the array. The inability to detect Mig2 could be because of poor growth of the strain carrying the GST-Mig2 fusion or reduced expression or inefficient purification of the fusion.
We also identified the Rgt1 transcription factor as a novel SUC2 UAS binding activity. Rgt1 is a key regulator of the HXT genes (24) and is classically viewed as having three transcriptional modes, depending on the level of glucose in the growth media, that are as follows: (i) activation in high glucose, (ii) neutral activity in low glucose, and (iii) repression in the absence of glucose. Invertase expression is induced in low glucose media, and interestingly we observed a reduced level of induced invertase expression in an rgt1Δ strain. Although similar results had been observed for invertase expression in an rgt1Δ strain (11), the effect had not been considered significant. However, the invertase expression results, our identification of an Rgt1 binding site in the SUC2 promoter, and the similarity of this site to sites in the HXT genes support the idea that Rgt1 may be an activator of SUC2 transcription at low glucose levels. In addition, a LexA-Rgt1 hybrid can activate a reporter gene with LexA binding sites under low glucose conditions, suggesting that Rgt1 can act as an activator at this level of glucose (23). More rigorous validation of the role of Rgt1 in the regulation of SUC2 requires the measurement of mRNA levels or in vivo binding by chromatin immunoprecipitations.
The assay conditions used in the genome-wide gel shift assay (Fig. 3) were not stringent enough to completely prevent nonspecific proteins from binding to the SUC2 UAS probe. These activities were classified easily as nonspecific for the SUC2 UAS by comparing binding to a non-homologous probe. The advantage of the less stringent conditions is that DNA binding activities may be detected from proteins that were unknown to bind DNA. An example of this is the Stb3 protein, which interacts with the transcriptional repressor Sin3 (25), but whose molecular function is unknown.

FIG. 6. A, Rgt1 binds specifically to the SUC2-B element and not to the SUC2-A. Mig1 and Yer028c bind to both SUC2-B and SUC2-A elements. B, invertase activity in an rgt1Δ strain compared with the parent isogenic strain (RGT1).
Previous use of the biochemical genomics strategy with pooled yeast GST fusions has identified enzyme activities (7), whereas the activities identified in this report are observed under equilibrium binding conditions. The ability to detect such binding events is in large part because of the sensitivity of the gel shift assay. Another advantageous aspect of the gel shift assay is that even two binding activities in a pool can be deconvoluted readily, because each protein has a unique charge and size, and thus the complexes migrate differently through a native gel. Nevertheless, improvements of the approach are possible. Specifically, concentrating purified protein samples and removing proteins with nonspecific binding activities should improve the robustness of the procedure. More stringent initial assays that used high levels of nonspecific competitor DNA would streamline the procedure.
Computational analyses of yeast intergenic regions have identified many regulatory motifs in promoters from diverse sets of genes. In many cases, the binding factors for these motifs are unknown. Similar experimental and computational approaches are being applied to gene sets from other organisms. The combination of the gel shift assay and pools of GST-ORFs provides a convenient method to characterize these binding activities.