Interrogating Yeast Surface-displayed Human Proteome to Identify Small Molecule-binding Proteins

Identifying proteins that interact with small molecules is often a challenging step in understanding cellular signaling pathways or molecular mechanisms of drug action. In this report, we describe the construction of libraries displaying human protein fragments on the surface of yeast cells and demonstrate the utility of these libraries for the study of small molecule/protein interactions. The libraries were used to select protein fragments with affinity for the phosphatidylinositides phosphatidylinositol 4,5-bisphosphate (PtdIns(4,5)P2) and phosphatidylinositol 3,4,5-trisphosphate (PtdIns(3,4,5)P3). We recovered cDNA inserts encoding pleckstrin homology domains, a phosphotyrosine-binding domain, and a fragment of apolipoprotein H. The pleckstrin homology and phosphotyrosine-binding domains are known phosphatidylinositide-binding domains, demonstrating the effectiveness of our approach. Binding of apolipoprotein H to PtdIns(4,5)P2 and PtdIns(3,4,5)P3 has not been reported previously and thus represents novel interactions. We expect that this method will be generally applicable to the study of small molecule/protein interactions and may facilitate the study of cellular signaling pathways and mechanisms of drug action or toxicity.

Small molecules are important regulators of many diverse cellular processes, and the majority of drugs in use and in development are small molecules. The biological activity of small molecules is usually dependent on specific interactions with cellular protein targets. Identifying proteins that interact with small bioactive molecules is an important but often challenging and rate-limiting step in understanding cellular signaling pathways or molecular mechanisms of drug action. Several methods for detecting small molecule-interacting proteins have been described, including affinity chromatography (1,2), small molecule or protein microarrays (3)(4)(5), phage display (6), and yeast and mammalian three-hybrid assays (7,8). Although all of these techniques have been used successfully, each has limitations. Affinity chromatography methods, although conceptually simple, are in practice most successful for high affinity interactions with abundant protein targets. Although having enormous potential, the application of human protein arrays has been limited by incomplete proteome coverage and the difficulties and high cost of producing high quality protein arrays. Phage display is limited by potential expression bias against eukaryotic proteins expressed in a prokaryotic host and the low number of fusion proteins displayed on each phage particle (9,10). Promising results have been obtained recently with yeast (7) and mammalian (8) three-hybrid assays, but these techniques require the production of a hybrid ligand that must be able to gain access to the cell interior.
Using yeast surface display techniques, heterologous protein fragments can be efficiently displayed on the surface of Saccharomyces cerevisiae yeast cells (11). In this system, protein fragments are displayed in a galactose-inducible manner on the yeast surface as C-terminal fusions to the yeast a-agglutinin subunit, Aga2p (11). We have previously described the construction of a yeast surface-displayed human testis cDNA library, which was used to identify human protein fragments with affinity for tyrosine-phosphorylated peptides derived from the major autophosphorylation sites of the epidermal growth factor receptor and focal adhesion kinase (12). When coupled with fluorescence-activated cell sorting (FACS), yeast surface-displayed cDNA libraries can theoretically be used to identify protein fragments with affinity for any soluble molecule that can be fluorescently detected.
Phosphatidylinositides are specialized lipids that function as important regulators of many cellular processes, including signal transduction, membrane trafficking, cytoskeletal organization, and nuclear events (13)(14)(15)(16)(17)(18)(19)(20)(21). The biological activity of phosphatidylinositides is mediated by protein binding, and several phosphatidylinositide-binding protein domains have been identified, including the pleckstrin homology (PH) domain (22). The identification and characterization of protein domains that bind phosphatidylinositides is therefore important for understanding the mechanisms by which they regulate cellular processes.
In this study, we sought to construct improved yeast surface-displayed human cDNA libraries with increased coverage of the human proteome and demonstrate their effectiveness for the study of small molecule/protein interactions by identifying phosphatidylinositide-binding proteins. To increase library coverage of the human proteome, we constructed frameshifted versions of the original pYD1 yeast surface display vector and used them to generate new libraries incorporating cDNA fragments from several additional human tissue sources. The expanded yeast surface-displayed human cDNA library was used for FACS-based selection of protein fragments with affinity for the phosphatidylinositides PtdIns(4,5)P2 and PtdIns(3,4,5)P3. We recovered cDNA inserts encoding PH domains, a phosphotyrosine-binding (PTB) domain, and a fragment of apolipoprotein H (apoH). The PH and PTB domains are known phosphatidylinositide-binding domains, demonstrating the effectiveness of our approach for the study of small molecule/protein interactions. In the future, this method could be applied to the study of drug/protein interactions, which may facilitate our understanding of the molecular mechanisms behind drug efficacy or toxicity.

Construction of Yeast Surface-displayed Human cDNA Libraries-The pYD1 yeast display vector (Invitrogen) was digested with EcoRI, and the 5′ overhang was either filled in with Klenow fragment (New England Biolabs, Ipswich, MA) or chewed back with mung bean nuclease (New England Biolabs) and religated to generate pYD1(+1) and pYD1(−1), respectively. Random primed, size-selected human cDNA libraries containing 2.4 × 10^6 to 8.9 × 10^6 primary clones derived from testes, brain, fetal liver, and breast tumor tissue (Invitrogen) were digested with EcoRI; ligated into EcoRI-digested and dephosphorylated pYD1, pYD1(+1), and pYD1(−1) yeast display vectors; and transformed into 10G Supreme competent bacteria (Lucigen, Middleton, WI). Greater than 5 × 10^7 transformants for each library were pooled, aliquoted, and stored at −80°C. At least 15 clones from each new library were analyzed by restriction digestion to verify cloning efficiency and insert diversity. All libraries analyzed had a diverse population of inserts in the expected size range, and no clones without inserts were observed (data not shown). Plasmid DNA was prepared for each of the 12 libraries using a Qiagen Maxiprep kit (Qiagen, Hilden, Germany) and transformed into the S. cerevisiae strain EBY100 (Invitrogen). Transformants were selected on SD-CAA medium (2% dextrose, 0.67% yeast nitrogen base, 0.5% casamino acids). Greater than 4 × 10^7 transformants for each version of the display vector (pYD1, pYD1(+1), and pYD1(−1)) were pooled to generate libraries for each tissue source and stored in aliquots at −80°C. An additional library containing an equal mixture of all four libraries was prepared, and aliquots were stored at −80°C. Expression of the V5 epitope, located downstream of the cDNA cloning site in the pYD1 vectors, was detected by staining with Alexa Fluor 647-conjugated anti-V5 monoclonal antibody.
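The reading-frame logic behind the three vectors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the restriction site is taken to leave a 4-nucleotide 5′ overhang (as EcoRI does), fill-in followed by religation is taken to add exactly 4 nucleotides, and chew-back followed by religation is taken to remove exactly 4. The function and variable names are hypothetical.

```python
OVERHANG_NT = 4  # assumption: 4-nt 5' overhang, as left by EcoRI


def frame_shift(delta_nt: int) -> int:
    """Reading-frame offset (0, 1, or 2) caused by a net change of delta_nt nucleotides."""
    return delta_nt % 3


# Net nucleotide change upstream of the cloning site for each vector (idealized).
vectors = {
    "pYD1": 0,                 # unmodified vector
    "pYD1(+1)": +OVERHANG_NT,  # overhang filled in, then religated
    "pYD1(-1)": -OVERHANG_NT,  # overhang chewed back, then religated
}

frames = {name: frame_shift(delta) for name, delta in vectors.items()}

for name, frame in frames.items():
    print(f"{name}: frame offset {frame}")
```

Under these assumptions the three vectors occupy the three distinct frame offsets, so any given insert is in frame with AGA2 in exactly one of them.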
FACS Selection of Phosphatidylinositide-binding Clones-The mixed library was grown in SR-CAA (SD-CAA with 2% raffinose in place of dextrose) at 25°C to an A600 of ≈5. To induce expression, the library was reinoculated at an A600 of 1.0 in 200 ml of SRG-CAA (SR-CAA + 2% galactose) and grown at 25°C for 24 h. For the first round of selection, 3 × 10^8 induced yeast cells were collected by centrifugation, washed twice with PBS, and incubated in 600 μl of PBS with 2 μM biotinylated derivatives of either PtdIns(4,5)P2 or PtdIns(3,4,5)P3 for 4 h at 4°C. Cells were washed twice with 1 ml of ice-cold PBS and incubated with 600 μl of 1:500 diluted phycoerythrin-conjugated streptavidin (SA-PE) (Invitrogen/BIOSOURCE, Camarillo, CA) for 20 min at 4°C. Cells were washed twice with ice-cold PBS, and binding cells were immediately sorted by flow cytometry (FACSAria, BD Biosciences) and recovered on SD-CAA plates. Approximately 2 × 10^8 cells were analyzed in the first round selections. In subsequent rounds, Alexa Fluor 647-conjugated streptavidin (Invitrogen/Molecular Probes, Eugene, OR) was alternated with SA-PE to minimize selection of clones that bind the detection reagents. After three rounds of sorting, individual clones were picked, induced, and tested by FACS (LSRII, BD Biosciences) for binding to 500 nM PtdIns(4,5)P2-biotin or PtdIns(3,4,5)P3-biotin. Plasmids were recovered from yeast clones exhibiting phosphatidylinositide binding using a modified QIAprep Spin Miniprep protocol incorporating a glass bead cell lysis step (Qiagen). The isolated plasmid was transformed into DH5α cells and purified, and the cDNA inserts were sequenced. Public gene and protein databases were searched for matches to each cDNA insert.
Affinity and Specificity Analysis-Induced phosphatidylinositide-binding clones were incubated with various concentrations of either PtdIns(4,5)P2-biotin or PtdIns(3,4,5)P3-biotin for 6 h at 4°C. After two ice-cold PBS washes, cells were incubated with 1:500 SA-PE for 20 min at 4°C. After two ice-cold PBS washes, cells were immediately analyzed by FACS. All experiments were performed in duplicate. Mean fluorescence intensity data were plotted and analyzed with Kaleidagraph software (Synergy Software, Reading, PA) using a one-site binding, non-linear regression curve fit algorithm.
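For readers without access to Kaleidagraph, a one-site binding fit of this kind can be approximated with a short Python sketch. It is not the procedure used in the study: it assumes the simple model MFI = Bmax × [L] / (KD + [L]) with no background term, and it replaces the commercial non-linear regression routine with a log-spaced grid search over KD refined by golden-section search. All function names, parameter names, and the synthetic data are hypothetical.

```python
import math


def one_site(conc, bmax, kd):
    """One-site binding model: signal = Bmax * [L] / (KD + [L])."""
    return bmax * conc / (kd + conc)


def best_bmax(concs, signals, kd):
    """Least-squares Bmax for a fixed KD (the model is linear in Bmax)."""
    x = [c / (kd + c) for c in concs]
    return sum(xi * yi for xi, yi in zip(x, signals)) / sum(xi * xi for xi in x)


def sse(concs, signals, kd):
    """Sum of squared residuals at a given KD, using the best Bmax for that KD."""
    bmax = best_bmax(concs, signals, kd)
    return sum((y - one_site(c, bmax, kd)) ** 2 for c, y in zip(concs, signals))


def fit_one_site(concs, signals, log_kd_min=-3.0, log_kd_max=6.0, grid_points=181):
    """Return (Bmax, KD): coarse search over log10(KD), then golden-section refinement."""
    step = (log_kd_max - log_kd_min) / (grid_points - 1)
    grid = [log_kd_min + i * step for i in range(grid_points)]
    errors = [sse(concs, signals, 10 ** g) for g in grid]
    best = errors.index(min(errors))
    a = grid[max(best - 1, 0)]
    b = grid[min(best + 1, grid_points - 1)]
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(60):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(concs, signals, 10 ** c) < sse(concs, signals, 10 ** d):
            b = d
        else:
            a = c
    kd = 10 ** ((a + b) / 2)
    return best_bmax(concs, signals, kd), kd


# Synthetic, noise-free titration (concentrations in nM); not data from the study.
concs = [10, 30, 100, 300, 1000, 3000, 10000]
signals = [one_site(c, 1000.0, 500.0) for c in concs]
bmax_fit, kd_fit = fit_one_site(concs, signals)
print(f"Bmax = {bmax_fit:.1f}, KD = {kd_fit:.1f} nM")
```

With real duplicate measurements, the mean fluorescence intensities would replace the synthetic `signals`. When the titration does not approach saturation, the fitted KD is poorly constrained and should not be reported.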

Construction of Yeast Surface-displayed Human cDNA Libraries-We previously described the construction of a galactose-inducible yeast surface-displayed library of human protein fragments derived from human testis cDNA (12). In this system, protein fragments are displayed on the yeast surface as C-terminal fusions to the yeast a-agglutinin subunit, Aga2p (11). This library was successfully utilized to identify protein fragments with affinity for tyrosine-phosphorylated peptides (12). We have sought to improve coverage of the human proteome using two methods. First, two new yeast surface display vectors, pYD1(+1) and pYD1(−1), were derived from the original pYD1 by introducing frameshifts into the polylinker just upstream of the cloning site used for inserting cDNA fragments (Fig. 1A). The use of these three vectors allows all cDNA fragments to be expressed in the correct reading frame and should therefore increase the chance that cDNA fragments derived from low abundance mRNAs will be successfully displayed in the library. Second, cDNA from additional tissue sources was used to construct new yeast surface display libraries. We cloned cDNA fragments from random primed and size-selected human cDNA libraries derived from breast tumor, brain, and fetal liver tissues into all three frameshift derivatives of the pYD1 plasmid. In addition, cDNA fragments from the original testis cDNA library were cloned into the two new frameshifted vectors, pYD1(−1) and pYD1(+1). At least 15 clones from each new library were analyzed by restriction digestion to verify cloning efficiency and insert diversity. All libraries analyzed had a diverse population of inserts in the expected size range, and no clones without inserts were observed (data not shown). These 11 new libraries were transformed into the yeast strain EBY100 to generate 11 new yeast surface-displayed human cDNA libraries. The three different frameshift libraries for each tissue source (testis, breast tumor, brain, and fetal liver) were pooled to create four master libraries. Subsequently these four libraries were pooled to create a mixed human cDNA yeast surface display library with ~2 × 10^7 human cDNA fragments. Upon induction of the library with galactose, human protein fragments are displayed on the yeast surface as C-terminal fusions to the yeast a-agglutinin subunit, Aga2p (Fig. 1B).
FIG. 1. Construction of yeast surface-displayed cDNA libraries. A, frameshifted versions of the original pYD1 yeast display vector were created by digesting pYD1 with BamHI and either chewing back (pYD1(−1)) or filling in (pYD1(+1)) the overhanging nucleotides before religating. B, upon induction with galactose, human protein fragments derived from human testis, brain, fetal liver, and breast tumor tissue cDNAs are displayed on the yeast cell surface as fusions with Aga2p. Cell surface tethering of Aga2p fusion proteins is accomplished by disulfide bonds with Aga1p, which is covalently linked into the cell wall by phosphatidylinositol glycan linkages. The cDNA inserts are flanked by epitope tags (Xpress and V5) that can be used to monitor expression. C, expression of the V5 epitope was monitored for the testis, fetal liver, brain, breast tumor, and mixed libraries by staining with Alexa Fluor 647-conjugated anti-V5 monoclonal antibody. As a negative control, EBY100 yeast was also stained. The cDNA insert of a clone must have an ORF that spans its entire length and is in-frame with both the upstream AGA2 coding region and the downstream V5 coding region for the V5 epitope to be expressed. Because only one-third of clones with an ORF in-frame with AGA2 running the full length of the insert will also be in-frame with the V5 epitope, the actual number of clones with an AGA2-fused ORF that spans the full length of the insert will be approximately 3 times the number of V5-positive clones. APC, allophycocyanin channel used to measure Alexa Fluor 647 signal.
To verify the display of human protein fragments on the yeast surface, the four individual tissue-derived libraries (testis, breast tumor, brain, and fetal liver) and the mixed library were induced with galactose, and expression of the V5 epitope was monitored. The fraction of V5-positive cells ranged from 3.2 to 6.4% for the individual tissue libraries, whereas 4.0% of cells in the mixed library were V5-positive (Fig. 1C). For the V5 epitope to be expressed, the cDNA insert must have an ORF that spans its entire length and is in-frame with both the upstream AGA2 coding region and the downstream V5 coding region. Because only one-third of clones with an ORF in-frame with the AGA2 coding region and spanning the entire length of the insert will also be in-frame with the V5 epitope, the actual number of clones with an AGA2-fused ORF that spans the entire length of the insert will be approximately 3 times the number of V5-positive clones. Thus, we estimated that about 12% of the induced cells in the mixed library express an AGA2-fused ORF that spans the entire length of the insert.
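The estimate above is a single multiplication, shown here as a small Python sketch so the assumption is explicit: a full-length ORF that is in frame with AGA2 is taken to have a one-in-three chance of also being in frame with the downstream V5 tag. The function name is hypothetical.

```python
READING_FRAMES = 3  # assumption: each of the 3 exit frames is equally likely


def full_length_orf_fraction(v5_positive_fraction: float) -> float:
    """Estimate the fraction of cells displaying an AGA2-fused, insert-spanning ORF."""
    return READING_FRAMES * v5_positive_fraction


mixed_library = full_length_orf_fraction(0.040)  # 4.0% V5-positive (Fig. 1C)
print(f"Estimated full-length ORF fraction: {mixed_library:.0%}")
```

The correction scales a measured lower bound (V5-positive cells) rather than measuring displayed ORFs directly, so it should be read as an approximation.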
Plasmids from phosphatidylinositide-binding clones were recovered, retransformed into EBY100 to verify the results of the initial screening, and sequenced to determine the identity of their cDNA inserts. A total of 55 binding clones were sequenced from the third round output of the PtdIns(4,5)P2 sort, yielding 11 unique cDNA inserts in seven different genes (Table I). Nine of the 11 cDNA inserts contain PH domains from SBF1, PDK1, β-Spectrin, PSD, and OSBP2 (Table I and Fig. 3). The other two inserts encode the PTB domain of DAB2 and the majority of the coding region of the apolipoprotein apoH (Table I and Fig. 3). A total of 38 binding clones were sequenced from the third round output of the PtdIns(3,4,5)P3 sort, yielding nine unique cDNA inserts in four different genes (Table I). The nine cDNAs recovered from the PtdIns(3,4,5)P3 selection contain the PH domains from PDK1, ARNO, GAB2, and SBF1 (Table I and Fig. 3). Altogether we recovered 17 unique cDNA inserts representing nine different genes (Table I and Fig. 3). FACS binding data for representative clones of each of these nine genes are shown in Fig. 4. Fourteen of the 17 recovered cDNAs were inserted into the new frameshifted vectors, pYD1(+1) and pYD1(−1) (Table I), suggesting that the utilization of these vectors successfully increased library coverage of the human proteome. The PTB domain as well as most of the PH domains we recovered have been experimentally shown to bind phosphatidylinositides (23)(24)(25)(26)(27)(28), validating the effectiveness of our selection method. Although apoH has been shown to bind negatively charged phospholipids (29), binding to PtdIns(4,5)P2 and PtdIns(3,4,5)P3 has not been reported previously and thus represents novel interactions.
FIG. 2. Selection of phosphatidylinositide-binding clones from a yeast surface-displayed human cDNA library by FACS. A, diagram of selection strategy. Biotinylated derivatives of either PtdIns(4,5)P2 or PtdIns(3,4,5)P3 were incubated with yeast displaying human protein fragments, and binding clones were enriched by three rounds of FACS. Individual clones were screened for binding to PtdIns(4,5)P2-biotin and PtdIns(3,4,5)P3-biotin, and plasmids were recovered and sequenced. B, enrichment of PtdIns(4,5)P2-biotin-binding clones through three rounds of FACS. In each round, induced yeast were incubated with 2 μM PtdIns(4,5)P2-biotin for 4 h at 4°C. Bound PtdIns(4,5)P2-biotin was detected with either SA-PE or Alexa Fluor 647-conjugated streptavidin. Sort windows are shown, and the fraction of the population within the sort window is indicated. The output of the third round was used for individual clone analysis. APC, allophycocyanin channel used to measure Alexa Fluor 647 signal; Rd, round.
Affinity and Specificity Analysis of Phosphatidylinositide-binding Clones-To determine the affinity and specificity of binding of the recovered clones, the apparent equilibrium dissociation binding constants (KD) of representative clones for each of the nine genes were measured for both PtdIns(4,5)P2-biotin and PtdIns(3,4,5)P3-biotin (Fig. 5 and Table II). The four clones recovered from the PtdIns(3,4,5)P3 sort (PDK1, ARNO, GAB2, and SBF1) bind very specifically to PtdIns(3,4,5)P3-biotin with apparent affinities ranging from 30 nM to 10 μM. Although most of these clones exhibited some binding above background to PtdIns(4,5)P2-biotin, no saturation was observed at the highest concentration tested, and therefore affinity values could not be determined. The OSBP2 clone recovered from the PtdIns(4,5)P2 sort also binds much better to PtdIns(3,4,5)P3-biotin than PtdIns(4,5)P2-biotin (630 nM versus 13 μM, respectively). Three clones recovered from the PtdIns(4,5)P2 sort (DAB2, apoH, and β-Spectrin) bind with similar affinities, ranging from 360 nM to 3 μM, to both PtdIns(3,4,5)P3-biotin and PtdIns(4,5)P2-biotin. Although measurable binding above background to both PtdIns(4,5)P2-biotin and PtdIns(3,4,5)P3-biotin was observed for the PSD clone (Fig. 5), no binding saturation was evident at the highest concentrations tested, and affinity values could not be calculated. Thus, FACS analysis of the yeast surface-displayed phosphatidylinositide-binding protein fragments identified by the selections reveals a wide range of apparent affinities and specificity for the two tested phosphatidylinositides.
(3,4,5)P3 sorting results. NCBI Entrez protein database accession numbers for human protein matches and the expressed regions are shown for recovered clones. The frameshift version of the pYD1 display vector is indicated along with the frequency of recovery for each clone. The total amino acid length for each protein is shown on the diagram in Fig. 3.
Mapping the Phosphatidylinositide-binding Region of ApoH-To demonstrate the effectiveness of yeast display for mapping binding domains in proteins, we sought to further delineate the region in apoH required for binding to PtdIns(4,5)P2 and PtdIns(3,4,5)P3. Previously it has been shown that the C terminus of apoH is important for binding to anionic phospholipids (30). To determine whether this region is sufficient for binding to phosphatidylinositides, we constructed a clone displaying the C-terminal region of apoH, apoH-(244-326) (Fig. 7A, mature peptide amino acid numbering), and tested for binding to PtdIns(4,5)P2-biotin and PtdIns(3,4,5)P3-biotin. The binding of apoH-(244-326) to both PtdIns(4,5)P2-biotin and PtdIns(3,4,5)P3-biotin was similar to that observed for the original apoH clone, indicating that the C-terminal region is sufficient for binding to these phosphatidylinositides (Fig. 7B). Codon mutations in the C-terminal domain of apoH that replace Trp-316 with Ser-316 eliminate binding to phosphatidylserine (31). To determine whether this residue is also important for binding to phosphatidylinositides, we engineered a yeast clone with this mutation, apoH(W316S) (Fig. 7A), and tested for binding to the phosphatidylinositides. Mutation of this residue completely eliminated binding of apoH to both PtdIns(4,5)P2-biotin and PtdIns(3,4,5)P3-biotin, indicating that Trp-316 is critical for the interaction of apoH with PtdIns(4,5)P2 and PtdIns(3,4,5)P3 (Fig. 7B). Thus, yeast surface display can be effectively utilized for fine mapping of small molecule-binding domains in proteins.

DISCUSSION
We have demonstrated that FACS-based screening of yeast surface-displayed human cDNA libraries can be an effective means of identifying protein fragments that interact with small molecules by successfully selecting protein fragments with affinity for the phosphatidylinositides PtdIns(4,5)P2 and PtdIns(3,4,5)P3. Seventeen unique phosphatidylinositide-binding clones were recovered, encoding fragments from nine different proteins. All but one of the protein fragments identified have been shown experimentally or are strongly predicted based on the presence of a PH domain to bind phosphatidylinositides (23)(24)(25)(26)(27)(28)31). Although apoH has been reported to bind negatively charged phospholipids (29), binding to PtdIns(4,5)P2 and PtdIns(3,4,5)P3 has not been reported previously and thus represents novel interactions. Several clones, including those containing PH domains from ARNO, PDK1, and GAB2, bind with significantly higher affinity to PtdIns(3,4,5)P3 than to PtdIns(4,5)P2. The observed phosphatidylinositide binding specificity of these PH domains is qualitatively similar to results obtained by other groups (25,26,28), demonstrating that these yeast surface-displayed protein fragments maintain the binding properties of their native proteins. The inserts that we identified closely define phosphatidylinositide-binding domains such as the PH and PTB domains. Thus, our fragmented, domain-sized cDNA display library can provide clues to functional domains involved in direct binding to small molecule ligands. In addition, the results show that these domains are folded properly on the yeast cell surface. It is worth noting that phosphatidylinositide-binding proteins are intracellular proteins. Thus, the protein domains we recovered retain their ability to bind phosphatidylinositides after passage through the yeast secretory pathway. This is consistent with our previous study where we used the yeast cDNA display approach to identify phosphopeptide-binding proteins, which are also intracellular proteins (12). Thus, we believe that the yeast display method does not introduce significant bias against intracellular proteins.
Because library diversity is critically important for the success of any cDNA display screening method, we sought to improve coverage of the human proteome by cloning cDNA fragments from libraries derived from breast tumor, brain, and fetal liver tissues into three frameshift derivatives of the pYD1 plasmid. By using three different frameshift versions of pYD1, each protein-encoding cDNA fragment should be able to be expressed as an in-frame fusion to AGA2. We reasoned that this strategy would increase library diversity and increase the representation of cDNAs derived from rare transcripts. Analysis of the selection output showed that cDNA inserts in all three frameshift versions of pYD1 were recovered, validating our methods of expanding the diversity of the cDNA library. In the future, normalized cDNA libraries from various tissue sources may also be used to create an even larger library, thereby increasing the likelihood of recovering low abundance gene products.
Our selection was performed using FACS, which allowed real time monitoring of the selection process. FACS-based selection also allows fine discrimination of clones with differences in binding characteristics, which may reflect unique binding kinetics and affinity. Ligand concentrations can also be adjusted to more specifically target clones with different affinities in the binding population. These flexibilities are not readily available to other screening methods such as yeast and mammalian two- or three-hybrid assays and phage display. Notably the affinity of the identified phosphatidylinositide-binding clones spans a wide range from 30 nM to >13 μM. The selection output is thus diverse and is not dominated by a few high affinity binders. This is in contrast to methods that are based on affinity purification of target proteins directly from cell lysates, which are prone to dominance by cellular proteins with high affinities and high abundance.
FIG. 6. Binding of recombinant apoH to PtdIns(3,4,5)P3. A, wells were coated with apoH and control proteins and probed with PtdIns(3,4,5)P3-biotin. Binding was detected with streptavidin-horseradish peroxidase. B, graph of PtdIns(3,4,5)P3-biotin binding data. bFGF, basic fibroblast growth factor; TGFβ, transforming growth factor β.
Yeast surface-displayed cDNA libraries can theoretically be used to identify protein fragments with affinity for any soluble molecule that can be fluorescently detected. Using FACS, yeast surface-displayed cDNA libraries can be screened at a rate of up to 50,000-70,000 cells/s, allowing the full diversity of large (50 million) libraries to be practically screened. In the future, this method could be applied to the study of drug/protein interactions, which may facilitate our understanding of the molecular mechanisms behind drug efficacy or toxicity.
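The practicality claim can be checked with a back-of-the-envelope calculation, sketched here in Python. The oversampling factor is an illustrative assumption (screening several cells per library member is common practice but is not specified for this rate), and the function name is hypothetical.

```python
def sort_time_seconds(library_size: float, cells_per_second: float,
                      oversampling: float = 1.0) -> float:
    """Time to pass `oversampling` library equivalents through the sorter."""
    return library_size * oversampling / cells_per_second


LIBRARY_SIZE = 50e6  # "large (50 million) libraries"

for rate in (50_000, 70_000):
    minutes = sort_time_seconds(LIBRARY_SIZE, rate) / 60
    print(f"{rate} cells/s: {minutes:.1f} min per library equivalent")
```

At these rates a single pass over 50 million cells takes roughly 12 to 17 minutes, so even a 10-fold oversampling of the library (an assumed factor) would remain within a few hours of instrument time.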