GYF Domain Proteomics Reveals Interaction Sites in Known and Novel Target Proteins

GYF domains are conserved eukaryotic adaptor domains that recognize proline-rich sequences. Although the structure and function of the prototypic GYF domain from the human CD2BP2 protein have been characterized in detail, very little is known about GYF domains from other proteins and species. Here we describe the binding properties of four GYF domains of various origins. Phage display in combination with SPOT analysis revealed the PPG(F/I/L/M/V) motif as a general recognition signature. Based on these results, the proteomes of human, yeast, and Arabidopsis thaliana were searched for potential interaction sites. Binding of several candidate proteins was confirmed by pull-down experiments or yeast two-hybrid analysis. The binding epitope of the GYF domain from the yeast SMY2 protein was mapped by NMR spectroscopy and led to a structural model that accounts for the different binding properties of SMY2-type GYF domains and the CD2BP2-GYF domain.

The GYF domain is a protein interaction domain ubiquitously expressed in eukaryotic species (1,2). It belongs to the functional class of proline-rich sequence (PRS) recognition domains such as SH3 (3,4), WW (5), EVH1 (6), and UEV domains (7,8), as well as profilin (9). Although the sequence requirements have been investigated in detail for most PRS-binding domains, little is known about the recognition code of GYF domains, and the biological role of GYF domain-containing proteins remains largely unknown. So far, an in-depth analysis has been performed only for the GYF domain of CD2BP2. For this GYF domain, a role in CD2 receptor-dependent T cell signaling has been observed (1,10), and the identification of CD2BP2 as part of the U5 snRNP (11,12) and as an interaction partner of the core splicing protein SmB/B′ (13) suggests an independent role in splicing or splicing-associated processes. The structure of the complex of the CD2BP2-GYF domain with a CD2-derived proline-rich peptide identified a set of mostly conserved aromatic amino acids of the domain as the primary contact site for the interaction (2,10). Analysis of the binding properties revealed two classes of ligands for CD2BP2-GYF. The CD2 class shows a charge dependence of binding and is characterized by the recognition motif PPGX(R/K), whereas the so-called PPGW class requires a hydrophobic residue directly C-terminal to the PPG core, and its recognition motif was identified as PPG(W/F/Y/M/L) (39). Sequence alignment suggests that the GYF domain of CD2BP2 belongs to a GYF domain subfamily characterized by a tryptophan at position 8 and an extended loop between β-strands 1 and 2 (10) (see Fig. 1). Most GYF domains contain an aspartate at position 8 and a shorter β1-β2 loop (Fig. 1), forming a second subfamily. Here we describe the analysis of four GYF domains from yeast, plant, and human that belong to this second subfamily, which we name the SMY2 subfamily of GYF domains.
SMY2 was originally cloned as a suppressor of the myo2-66 mutation in the motor protein MYO2 in Saccharomyces cerevisiae (14). Our investigation of the GYF domains of SMY2 and three other proteins shows that all sequences recognized by the different GYF domains converge on the PPG motif followed by one of the hydrophobic residues Phe, Ile, Leu, Met, or Val (Φ). Apart from this PPGΦ motif, there is little dependence on flanking residues within the respective ligands. The consensus motifs, derived by a combination of phage display and substitution analysis, allowed us to identify natural target sites. Peptides comprising these sites were tested for binding by membrane SPOT analysis. Several interactions were further investigated by fluorescence and NMR titration experiments, revealing binding epitopes and affinities for the novel ligands. Yeast two-hybrid analysis and pull-down experiments confirmed the association of selected proteins with GYF domains under more physiological conditions and allowed GYF domain-containing proteins to be placed into known functional contexts. In S. cerevisiae, the GYF domains of SMY2 and its paralog YPL105C (named SYH1, for SMY2 homolog 1) interact with proteins that have been implicated in pre-mRNA branch point binding (MSL5) or regulation of translation (EAP1). For the human O75137 protein (also referred to as PERQ2 because of its homology to human PERQ amino acid-rich with GYF domain protein 1 (PERQ1)), our work identified the core splicing proteins SmB/B′ and snRNP-N as the most likely interaction partners, whereas the GYF domain of the A. thaliana protein Q9FMM3 (accordingly termed GYN4, for GYF domain-containing protein binding to Not4 homolog) selects for sequences in the homolog of the human transcription regulator CNOT4. GYN4 and SYH1 contain internal binding sites for their respective GYF domains, and we show that intramolecular association precludes intermolecular binding in the case of GYN4.

MATERIALS AND METHODS
Constructs-The estimated domain borders of the GYF domains of proteins PERQ2 (Swiss-Prot accession number O75137; residues 531-596), GYN4 (Swiss-Prot accession number Q9FMM3; residues 546-604), SMY2 (Swiss-Prot accession number P32909; residues 243-340), and SYH1 (Swiss-Prot accession number Q02875; residues 150-226) were obtained from sequence alignments, and the corresponding fragments were subcloned into expression vectors. For GYN4 and SYH1, longer constructs containing an internal PRS (GYN4 residues 546-619, termed GYN4-PR, and SYH1 residues 141-226, termed PR-SYH1; see Fig. 1) were also cloned into protein expression vectors. All constructs were amplified from the following DNA clones. PERQ2 was derived from the cDNA clone HJ03496 and GYN4 from the genomic P1 clone MBD2 (kind gifts from Takahiro Nagase and Satoshi Tabata, respectively, Kazusa DNA Research Institute), whereas SMY2 and SYH1 were amplified from genomic DNA of yeast strain S288C. For GST fusion protein expression, the fragments of PERQ2, SYH1, and SMY2 were cloned into pGEX4T-1 (Amersham Biosciences) via BamHI and XhoI restriction sites, and the fragments of GYN4 via BamHI and NotI restriction sites. For yeast two-hybrid screens, the GYF domains of PERQ2 and GYN4 were cloned into the bait vector pGBKT7 (Clontech) via NcoI and BamHI and via NcoI and NotI restriction sites, respectively. For the yeast two-hybrid analysis of selected candidates, the same fragment of PERQ2 was also introduced into the prey vector pGADT7 (Clontech). Fragments of the candidates NPWBP (Swiss-Prot accession number Q9Y2W2; residues 388-551), SmB (Swiss-Prot accession number P14678-2; residues 148-231), and SWAN (Swiss-Prot accession number Q9NTZ6; residues 700-869) were amplified from the following I.M.A.G.E. Consortium (Lawrence Livermore National Laboratory) cDNA clones (15) obtained from the Deutsches Ressourcenzentrum für Genomforschung GmbH: IRALp962P0114Q2 (NPWBP), IMAGp998D118415Q3 (SmB), and IRALp962K1725Q2 (SWAN). The respective I.M.A.G.E. Consortium clone identification numbers are 3829990, 3445210, and 3956772. These fragments, comprising the proline-rich regions of the respective proteins, were inserted into pGBKT7 via NcoI and NotI restriction sites, and the cytoplasmic tail of CD2 (Swiss-Prot accession number P06729; residues 245-351) via EcoRI and BamHI restriction sites.
The His6-tagged GYF domain of SMY2 was expressed from a modified pET28 vector containing an N-terminal His6 tag followed by a thrombin cleavage site and BamHI and XhoI restriction sites, which allowed cloning similar to that of the GST fusion construct. For His6-tagged PERQ2, the corresponding fragment was cloned into the pTFT74 vector (16) via NcoI and HindIII sites, with an N-terminal His6 tag introduced by PCR.
Cloning of the focused library RKRSHRXXPPPXXXVQ into PC89 followed a procedure similar to that described elsewhere (17). PC89 and the PC89 nonapeptide library (X9) were a gift from Gianni Cesareni (Dipartimento di Biologia, Università di Roma).
Protein Preparation-Proteins were expressed in Escherichia coli BL21(DE3)pLysS and purified from the soluble fraction after sonication. GST-GYF and His6-tagged GYF domains were purified by affinity chromatography using glutathione-Sepharose and Ni2+-nitrilotriacetic acid-agarose, respectively, according to the manufacturer's instructions (Amersham Biosciences), and dialyzed against PBS. To obtain NMR samples for titration experiments and backbone resonance assignments of the SMY2-GYF domain, cells were grown on defined media supplemented with [15N]NH4Cl and/or [13C]glucose. The His6 tag of SMY2-GYF and the GST tag of SYH1-GYF were removed by thrombin cleavage (Calbiochem; 10 units/mg of protein, overnight in PBS at 4°C and 16°C, respectively), and the domains were purified by subsequent gel filtration (Superdex® 75, Amersham Biosciences) in PBS.
Phage Display-Phage displaying the nonapeptide (X9) or the focused peptide library fused to the major capsid protein were produced by transforming E. coli XL1-Blue cells with PC89 constructs, followed by superinfection with the VCS-M13 helper phage (Stratagene). After overnight incubation in 2× YT medium in the presence of ampicillin and kanamycin (30°C, 270 rpm), phage particles were purified by three successive polyethylene glycol/NaCl precipitations (18). Library screening was performed as follows. Glutathione-Sepharose 4B beads (30-50 µl; Amersham Biosciences) loaded with GST-GYF were incubated with 5 × 10⁹ to 5 × 10¹¹ infectious particles at 4°C overnight in PBS. After washing three times with PBS, bound phage were eluted with 100 mM glycine-HCl, pH 2.2, and neutralized with 2 M Tris. For phage amplification, E. coli XL1-Blue cells were infected with eluted phage, followed by superinfection with helper phage and incubation as described above. After six rounds of panning, the inserts of selected phage were sequenced to identify their displayed peptides.
NMR Spectroscopy-The NMR experiments were performed at 297 or 299 K on either a Bruker DRX600 or DMX750 instrument equipped with standard triple-resonance probes. Data processing and analysis were carried out with the XWINNMR (Bruker) and SPARKY (40) software packages. Backbone assignment of the SMY2-GYF domain was based on the CBCA(CO)NH (21), CBCANH (22), and HNCO (23) experiments of SMY2-GYF in PBS. In the NMR titration experiments, increasing amounts of synthetic peptides (see Table IV) were added to a 0.2 mM sample of the 15N-labeled CD2BP2- or SMY2-GYF domain. The gradual change of chemical shifts in the heteronuclear single quantum coherence spectra allowed the resonances of the ligand-bound GYF domain to be unambiguously assigned. The sum of the chemical shift changes for 15N […] Table IV) at 25°C on a PerkinElmer Life Sciences LS-50B fluorometer, and the emission spectra were recorded between 300 and 400 nm. Centroid shifts were calculated using the software SpecWin (a kind gift of Sebastian Modersohn). Binding data were analyzed as described above.
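The centroid shifts used as the fluorescence binding signal can be illustrated with a short sketch. It assumes that the centroid is the intensity-weighted mean emission wavelength, a common definition; how SpecWin computes it is not described here, and the function names and spectra below are hypothetical.

```python
def spectral_centroid(wavelengths, intensities):
    """Intensity-weighted mean wavelength (assumed centroid definition)."""
    if not wavelengths or len(wavelengths) != len(intensities):
        raise ValueError("need equal-length, non-empty spectra")
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return sum(w * i for w, i in zip(wavelengths, intensities)) / total


def centroid_shift(wavelengths, free_spectrum, bound_spectrum):
    """Centroid of the bound spectrum minus centroid of the free spectrum (nm)."""
    return (spectral_centroid(wavelengths, bound_spectrum)
            - spectral_centroid(wavelengths, free_spectrum))


# Hypothetical three-point spectra within the 300-400 nm recording window
wl = [300.0, 350.0, 400.0]
free = [1.0, 2.0, 1.0]   # centroid 350 nm
bound = [2.0, 2.0, 0.0]  # centroid 325 nm
print(centroid_shift(wl, free, bound))  # -25.0 (blue shift on binding)
```

In a titration, the centroid shift at each peptide concentration would then serve as the signal fitted with a binding model.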
Yeast Two-hybrid Experiments-Yeast two-hybrid experiments were performed using the MATCHMAKER GAL4 Two-Hybrid System 3 according to the manufacturer's manual (Clontech). For library screens, the pGBKT7 bait construct encoding the GAL4 DNA-binding domain fused to the GYF domain of PERQ2 or GYN4 (the latter with or without its PRS extension) was introduced into the yeast strain AH109, followed by transformation with the human lung cDNA library in the pGAD vector (Clontech) or the Horwitz and Ma A. thaliana two-hybrid library (Arabidopsis Biological Resource Center) (24), respectively. Plasmids of cotransformants growing on synthetic defined medium lacking His, Leu, and Trp or Ade, His, Leu, and Trp were rescued from yeast according to a modified MATCHMAKER protocol. After incubation with lyticase and lysis of cells with SDS and freeze-thawing, lysates were mixed with N3 buffer of the Qiagen plasmid isolation kit, and plasmids were prepared according to the kit protocol. Selected candidates were sequenced to identify the polypeptide interacting with the respective GYF domain. For the analysis of suggested interaction partners and for reconfirmation of candidates from the yeast two-hybrid screen, the corresponding bait/prey vector combinations were introduced into yeast and cultured on synthetic defined medium as described above.
25 µl of glutathione-Sepharose 4B beads (Amersham Biosciences), loaded with GST or GST fusion proteins of the respective GYF domains (SYH1 and SMY2), were incubated with 100-µl lysates at 4°C overnight in the absence or presence of 1 mM competing peptide MSL5L1 (see Table IV). Beads were washed three times with PBS. Bound proteins were eluted in SDS sample buffer, separated by SDS-PAGE, and transferred onto a nitrocellulose membrane. HA-tagged MSL5 or EAP1 was detected by probing the membrane with an anti-HA antibody (BD Biosciences) and a horseradish peroxidase-conjugated secondary antibody. The immunoblots were then developed as described for the SPOT analysis.

RESULTS
Phage Display-GYF domains from evolutionarily distant eukaryotic species were chosen for analysis. The sequence alignment of these domains is shown in Fig. 1 and highlights the conserved amino acids in the N-terminal part of the domains. In the case of GYN4 and SYH1, two GYF domain constructs of different length were used to account for the presence of an internal PRS located directly adjacent to the anticipated domain borders (Fig. 1). The phage libraries used for the screen were of the format X2PPPX3 or X9, and the peptides were expressed as gene VIII fusion proteins (17). Individual clones were sequenced after six rounds of panning, and the results are summarized in Table I. The sequence motifs obtained from both libraries are similar, and a preference for the PPG core motif is observed for all of the GYF domains. A strict requirement for a third consecutive proline is not seen with the X9 library, in agreement with previous structural data on the CD2BP2-GYF domain·CD2 peptide complex showing that the PPG motif contributes most of the van der Waals interactions (10). The general requirement for the PPG motif suggests that the binding mode of the four GYF domains is similar to that of CD2BP2-GYF. The requirement for only two prolines raises the question, however, whether formation of a polyproline II helix is a prerequisite for the binding of ligands to GYF domains in general.
FIG. 1. Sequence alignment of GYF domains. Conserved residues are highlighted as white letters on a black background. Residues characteristic of the two subfamilies of GYF domains are shown as white letters on a gray background. Sequences and numbers in brackets belong to the respective single-chain constructs used in this study (GYN4-PR and PR-SYH1) or analyzed previously (CD2BP2). Note that for CD2BP2, two single-chain constructs are displayed, with either an N-terminal or a C-terminal extension (27). Numbers above the alignment refer to the CD2BP2-GYF domain; numbers flanking the sequences indicate the position within the respective full-length proteins. For the CD2BP2-GYF single-chain constructs, linker residues and the GYF domain-binding motif from CD2 are depicted as small italicized letters in regular type and in bold on a gray background, respectively. Residues depicted as bold capitals on a gray background comprise intramolecular binding motifs for the GYF domains of GYN4 and SYH1. Secondary structure elements are depicted above the alignment.
For the position directly C-terminal to the PPG motif, the GYF domains of SMY2, SYH1, and PERQ2 seem to accept most hydrophobic amino acids, with a preference for leucine or isoleucine. For the GYN4-GYF domain (amino acids 546-604 of the full-length protein), phenylalanine is the preferred amino acid at this position, whereas a longer construct (GYN4-PR; amino acids 546-619) that includes the proline-rich sequence KSGPPPGFTG failed to select peptides defining a consensus (Table I and data not shown). Interestingly, this internal PRS comprises the consensus motif PPGF for this GYF domain (Table I). Because previous experiments from our group have shown that a C-terminally linked proline-rich ligand can bind intramolecularly to the CD2BP2-GYF domain (Fig. 1) (27), we suggest that intramolecular binding of the ligand in GYN4-PR masks the GYF domain binding site in the same way. In contrast, the presence of the internal PRS directly N-terminal to the GYF domain in SYH1 does not interfere with phage selection (Table I). This is in agreement with experiments on CD2BP2-GYF domain single-chain constructs showing that a short linker sequence is not sufficient to allow the intramolecular interaction between an N-terminal binding motif and the GYF domain (27).
TABLE I. Phage display results for various GYF domains after six rounds of panning. Glutathione bead-coupled GST-GYF domain fusions from the proteins indicated in the first row were used for the selection procedure. PR-SYH1 includes the N-terminal intramolecular PRS. The triple proline stretch in the focused library as well as the prolines of the binding motif from the X9 library are depicted as white letters on a gray background. Residues other than proline that show significant presence in both screens, or that have chemical properties similar to the frequently found amino acids, are shown as white letters on a black background. Based on this classification, the strict consensus sequence for each GYF domain was defined as shown in the row "Strict Consensus." For the relaxed consensus, additional amino acid types at the last position of the consensus were included to also comprise the essential binding motifs derived from substitution analysis (last row). For both the strict and the relaxed consensus, the second position following the PPG motif was excluded from all consensus sequences on the basis of the substitution analysis results (see Fig. 2).
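How the preference at the position directly C-terminal to PPG can be read off sequenced phage peptides is sketched below: the residue following the first PPG in each peptide is tallied. This is an illustrative assumption about the bookkeeping only; the function name and peptides are hypothetical (not the clones of Table I), and the consensus in Table I additionally groups residues by chemical similarity.

```python
from collections import Counter


def residue_after_ppg(peptides):
    """Count the residue directly following the first PPG in each peptide."""
    counts = Counter()
    for peptide in peptides:
        idx = peptide.upper().find("PPG")
        # Skip peptides without PPG or with PPG at the very C terminus
        if idx != -1 and idx + 3 < len(peptide):
            counts[peptide[idx + 3].upper()] += 1
    return counts


# Hypothetical selected peptides (not real phage display data)
selected = ["RSPPGLSW", "KPPGIRA", "APPGLQ", "SGSGSGS"]
print(residue_after_ppg(selected).most_common())  # [('L', 2), ('I', 1)]
```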
Analysis of Phage Display Peptides-Substitution analysis of individual peptides selected by phage display was conducted to identify residues within the peptides that are important for binding. For this purpose, peptides were synthesized on a cellulose membrane, and binding to GST-GYF domains was analyzed by anti-GST antibody detection (Fig. 2). The substitution analysis highlights the requirement for the PPG(F/I/L/V) motif for the GYF domains of PERQ2, SMY2, and SYH1. In these cases a large hydrophobic amino acid is clearly preferred as the most C-terminal amino acid of the motif, whereas GYN4-GYF binds exclusively to the PPGF motif. For all four domains, amino acids outside the central PPGΦ motif of the phage display-derived peptides are not critical for binding. This suggests that hydrophobic interactions account almost entirely for the observed binding behavior. In summary, these results identify the general recognition code for the SMY2 subfamily of GYF domains as PPG(F/I/L/M/V), with subtle domain-dependent variations in specificity (Table I and Fig. 2).
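The recognition code PPG(F/I/L/M/V) translates directly into a regular expression. The sketch below scans a sequence for the general SMY2-subfamily motif and for the strict motifs used for PERQ2 (PPGL) and GYN4 (PPGF) in the proteome search; the dictionary and function names are hypothetical, and a sequence match alone ignores the context that SPOT analysis actually tests.

```python
import re

# General SMY2-subfamily recognition code: PPG followed by F, I, L, M, or V
GENERAL_MOTIF = re.compile(r"PPG[FILMV]")

# Strict motifs named in the text for individual domains (hypothetical mapping)
STRICT_MOTIFS = {
    "PERQ2": re.compile(r"PPGL"),
    "GYN4": re.compile(r"PPGF"),
}


def find_motifs(sequence, pattern=GENERAL_MOTIF):
    """Return (0-based start, matched motif) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


# Internal PRS of GYN4 and the CD2 peptide, both quoted in this article
print(find_motifs("KSGPPPGFTG"))                          # [(4, 'PPGF')]
print(find_motifs("KSGPPPGFTG", STRICT_MOTIFS["GYN4"]))   # [(4, 'PPGF')]
print(find_motifs("SHRPPPPGHRV"))  # [] (PPGH fits the CD2-class PPGX(R/K), not PPGΦ)
```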
Proteome Analysis for GYF Domain Binding Sites-The phage display results allowed us to define the recognition signature for each individual domain (Table I, strict consensus). Strict consensus sequences included the most frequently found PPGX motifs or motifs with amino acids at position X that have physicochemical properties similar to those of the frequently found amino acid. The relaxed consensus motifs comprised the essential binding motifs derived from the substitution analysis. Additional positions were considered for neither the strict nor the relaxed consensus because the substitution analysis showed that the PPGX motif is the major determinant of binding. We searched the Swiss-Prot/TrEMBL, The Arabidopsis Information Resource (TAIR), and SGD databases for the different GYF domains with the relaxed and strict consensus motifs (Table I), following a strategy that has also been used by others (28). In the latter two cases, motifs were only selected if present twice in the protein with a maximal linker distance of 40 amino acids. This accounts for avidity effects of tandem binding motifs and reduces the number of database hits to below 1000. Based on our phage display results, database searches were also performed with the strict motifs PPGL and PPGF for the GYF domains of PERQ2 and GYN4, respectively, to identify optimal individual binding motifs. All human proteins (except for procollagens, collagens, and hypothetical proteins) were considered in the search for PERQ2; 349 non-redundant motifs in 152 proteins were found for the relaxed and 166 non-redundant motifs in 157 proteins for the strict consensus. Correspondingly, for A. thaliana the relaxed motif was found 34 times in 15 proteins, and the strict consensus was found 227 times in 221 proteins of the A. thaliana proteome. For S. cerevisiae, 168 motifs in 153 sequences containing the relaxed consensus were analyzed. To test binding, the peptides obtained from the database searches were synthesized on a membrane and incubated with the respective GST-GYF domain constructs. Fig. 3 shows the results for the various domains and indicates that many of the sequences deduced from the database bind to the GYF domains (Supplemental Table 1). To validate the results of the proteome analysis, several interaction candidates for the different GYF domains were chosen for further analysis according to signal strength and the number of consensus motifs present within the proteins (Table II). In addition, fluorescence and NMR titrations of GYF domains with selected peptides were performed. Assuming a simple two-state binding model, the dissociation constants for the GYF domain-peptide interactions are in the range of 4-300 µM (see Table IV and Supplemental Fig. 1).
FIG. 2. Substitution analysis of peptides that were selected by phage display. All possible single substitution analogs of the peptide were synthesized on a membrane. The single-letter code above each column indicates the amino acid that replaces the corresponding wild-type residue; rows define the position of the substitution within the peptide. Spots in the leftmost column (WT) are identical and represent the wild-type peptide. The membranes were incubated with GST-GYF fusion constructs of the indicated proteins and processed as described under "Materials and Methods." PR-SYH1 includes the N-terminal intramolecular PRS of the protein (Fig. 1).
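The tandem-motif criterion applied in the TAIR and SGD searches (two motifs in one protein separated by at most 40 residues) can be expressed as a filter over motif hits. This sketch assumes that the "linker distance" is the number of residues between the end of one motif and the start of the next; the function names and test sequences are hypothetical.

```python
import re


def tandem_motif_pairs(sequence, motif=r"PPG[FILMV]", max_linker=40):
    """Return (start1, start2, linker) for adjacent motif hits within max_linker.

    Only adjacent hits are compared: if two neighbouring hits are too far
    apart, any non-neighbouring pair spanning them is farther still.
    """
    hits = [m.span() for m in re.finditer(motif, sequence.upper())]
    pairs = []
    for (start1, end1), (start2, _end2) in zip(hits, hits[1:]):
        linker = start2 - end1  # residues between the two motifs
        if linker <= max_linker:
            pairs.append((start1, start2, linker))
    return pairs


def passes_tandem_filter(sequence, **kwargs):
    """True if the protein contains at least one qualifying motif pair."""
    return bool(tandem_motif_pairs(sequence, **kwargs))


# Hypothetical sequences: two PPGF motifs spaced by 6 residues, then by 41
close_pair = "PPGF" + "A" * 6 + "PPGF"
distant_pair = "PPGF" + "A" * 41 + "PPGF"
print(tandem_motif_pairs(close_pair))      # [(0, 10, 6)]
print(passes_tandem_filter(distant_pair))  # False
```

Passing a different `motif` (for example a strict single-residue variant) reuses the same filter for each domain-specific consensus.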
Pull-down Experiments: Novel Interaction Partners of the Yeast GYF Domains-The proteome analysis yielded a large set of putative interactions for the individual GYF domains. For the two yeast GYF domains of SMY2 and SYH1, the most likely interaction candidates are MSL5, EAP1, and PRP8, with seven, four, and three motifs matching the relaxed consensus, respectively (Table II), whereas three other proteins contain the motif twice. SPOT analysis confirmed that several of the PRSs in the yeast proteins MSL5 and EAP1 are bona fide binding sites for SMY2 and SYH1 in vitro, and avidity effects likely contribute to enhanced binding in vivo. To validate MSL5 and EAP1 as interaction partners, we performed pull-down experiments with the two GST-tagged yeast GYF domains and lysates of cells overexpressing HA-tagged MSL5 or EAP1.
TABLE II. Selected candidates obtained from the proteome SPOT analysis of proline-rich peptides comprising the identified recognition motifs. For the different GYF domains, candidates were selected from the respective proteome according to binding strength and number of motifs present. Portions of the proteins that comprise the putative binding sites are shown, and motifs matching the indicated relaxed or strict consensus motifs are depicted as underlined or bold and underlined letters, respectively. The indicated fragments of candidates for PERQ2 were tested for binding in the yeast two-hybrid system with CD2 and lamin C as controls. Numbers in parentheses below the protein names refer to the position of the cloned region within the full-length protein. The two candidates MSL5 and EAP1 were subjected to pull-down analysis with SYH1 and SMY2.
The experiment confirmed the interaction of the yeast GYF domains with both MSL5 and EAP1 (Fig. 4, lanes 3 and 5), and the interactions could be competed out with the proline-rich peptide MSL5L1 (Fig. 4, lanes 4 and 6; see Table IV). This establishes that the PRSs of MSL5 and EAP1 mediate the direct interaction with SMY2 and SYH1. Interestingly, MSL5 has been reported as an interaction partner of both SMY2 and SYH1 in a yeast two-hybrid screen (see the SGD at www.yeastgenome.org), further supporting our findings. EAP1 was described as a protein binding to the cap-binding protein eIF4E (25), and a role in translational attenuation in response to vesicular transport defects was subsequently reported (29). Because SMY2 was cloned as a suppressor of the myo2-66 mutation of type V myosin (14), a motor protein playing a decisive role in vesicular transport, this interaction establishes a particularly interesting link between transport processes and splicing or translational control. From a biochemical point of view, the binding of identical target proteins by SMY2- and SYH1-GYF is not surprising because the two GYF domains are highly homologous. The biological implications of this redundancy are not clear, but the identified interaction partners hint at a role of SMY2 and/or SYH1 in processes that are coupled to the transport of splicing factors and translational attenuators.
Yeast Two-hybrid Analysis: Novel Interaction Partners of the PERQ2- and GYN4-GYF Domains-For the GYF domain of the human PERQ2 protein, we chose the three nuclear proteins SmB, NPWBP, and SWAN for further investigation by yeast two-hybrid analysis. As can be seen from Fig. 5, an interaction with PERQ2-GYF could be established for all three proteins, indicating that the proposed interactions can take place under more physiological conditions. In addition to this knowledge-driven identification of binding partners, we performed yeast two-hybrid screens for the human PERQ2-GYF and the A. thaliana GYN4-GYF domains. Unbiased screening of a human lung cDNA library for interaction partners of the PERQ2-GYF domain identified a number of putative interactors, including the small nuclear ribonucleoproteins SmB/B′, the closely related snRNP-associated protein N (snRNP-N, SmN), U1 snRNP protein C, and the SWAN protein (Fig. 6A and Table III). These findings are in good agreement with the interaction partners proposed by the knowledge-based strategy (Figs. 3 and 5) described elsewhere (28), because all of the above-mentioned proteins contain the phage display-derived recognition motif (Table I). SmB/B′ was previously identified as a binding partner of CD2BP2 (13), and it will be interesting to investigate whether SmB/B′ and SmN also attract the PERQ2-GYF domain in vivo. Such an interaction would imply that the two GYF domain-containing proteins converge in the recognition of proteins of the SmB/SmN family. Because U1 snRNP protein C and SWAN are also nuclear proteins and because NPWBP colocalizes with splicing factors (30), a function of PERQ2-GYF in splicing or splicing-associated processes is conceivable. Initial localization experiments with GFP-PERQ2-GYF support this notion, showing a predominantly nuclear localization (data not shown).
However, the mouse homolog of PERQ2, GIGYF2, was identified as a binding partner of the cytosolic adaptor protein Grb10 in a yeast two-hybrid screen. Because human Grb10, in contrast to mouse Grb10, lacks a PPGΦ recognition motif, the functions of human PERQ2 and mouse GIGYF2 may well differ.
For the GYN4-GYF domain of A. thaliana, the yeast two-hybrid screen resulted in the selection of At2g28540.1, At3g45630.1, At5g60170.1, and At5g65410.1 (Fig. 6B and Table III). All of these proteins contain the PPGF motif that was also selected by the phage display screen (Table I). Interestingly, the first three proteins obtained from the yeast two-hybrid analysis contain two PPGF motifs spaced by six or seven amino acids (Table III), a pattern previously found in the CD2BP2-GYF ligand CD2 (1). These proteins were also identified by our proteome analysis as potential interaction partners (Fig. 3, strict consensus), showing that the two complementary methods identify an overlapping set of proteins. One protein, At3g45630.1, has the same domain composition as the CNOT4 protein but low overall sequence homology to it (39.5% similarity). CNOT4 has been described as part of the CCR4·NOT transcription complex (31); it is a potential transcriptional repressor and possesses E3 ubiquitin ligase activity. The biochemical characterization of this complex in plants remains elusive, but our data hint at an involvement of GYF domain-containing proteins in the transcriptional regulation of A. thaliana gene expression. Because the human CNOT4 protein contains the PERQ2-GYF domain recognition motif PPGL, its binding to GYF domain-containing proteins should be tested further for potential biological relevance.
NMR Analysis of the SMY2/MSL5 Interaction-To obtain a detailed molecular picture of the interaction between a GYF domain of the SMY2 subfamily and a PRS, the backbone resonances of the 13C/15N-labeled SMY2-GYF domain were assigned from heteronuclear triple-resonance NMR spectra. This allowed us to map the binding site for the MSL5S1 peptide (Table IV). The overlay of the spectrum of isolated SMY2-GYF (red) with those of the SMY2-GYF domain in the presence of equimolar amounts (green) and a 10-fold excess (blue) of MSL5S1 peptide is shown in Fig. 7A. The resulting NH chemical shift changes of residues of the SMY2- and CD2BP2-GYF domains upon addition of MSL5S1 and CD2S peptides, respectively, are depicted in Fig. 7B. Most of the residues that show large chemical shift changes are part of the characteristic bulge-helix-bulge motif within the N-terminal half of the domains and are indicated as green bars. The three residues Leu-289, Gln-290, and Ile-291 in the C-terminal half of SMY2-GYF display significant chemical shift changes, whereas the homologous residues in CD2BP2-GYF show only small changes (Fig. 7B, blue bars). Because large chemical shift changes are indicative of residues that are close in space to the ligand, the data can be used qualitatively to compare the binding epitopes of SMY2 (or homologous SMY2-type GYF domains) and CD2BP2-GYF. The structure of one SMY2-type GYF domain was recently deposited in the Protein Data Bank (GYF domain from the A. thaliana protein Q9FT92, Protein Data Bank code 1WH2), and based on sequence comparisons (Fig. 1) we mapped the results for the SMY2-GYF domain (Fig. 7B) onto the surface of the GYF domain of Q9FT92 (Fig. 7C). Residues forming the novel binding epitope, depicted in blue in Fig. 7, B and C, are partially solvent-exposed in SMY2-like GYF domains but not in CD2BP2-GYF, which might explain the differences in binding preferences between the two types of GYF domains (see "Discussion").
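The comparison of per-residue chemical shift changes can be sketched as follows. The exact combination of 1H and 15N changes used in this study is not stated in the surviving Methods text, so this sketch assumes a weighted sum |ΔδH| + w·|ΔδN| with a hypothetical weight w = 0.2. It also flags residues whose combined change exceeds the mean plus one standard deviation, which is likewise an assumption and not the criterion behind Fig. 7B. All names and peak positions are hypothetical.

```python
from statistics import mean, pstdev


def combined_shift(delta_h, delta_n, n_weight=0.2):
    """Weighted sum of absolute 1H and 15N shift changes (ppm); weight is assumed."""
    return abs(delta_h) + n_weight * abs(delta_n)


def perturbed_residues(shifts, n_weight=0.2, n_sd=1.0):
    """shifts maps residue label -> (free_H, free_N, bound_H, bound_N) in ppm.

    Returns the combined changes and the residues above mean + n_sd * SD.
    """
    combined = {
        res: combined_shift(bound_h - free_h, bound_n - free_n, n_weight)
        for res, (free_h, free_n, bound_h, bound_n) in shifts.items()
    }
    values = list(combined.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return combined, [res for res, value in combined.items() if value > cutoff]


# Hypothetical free/bound peak positions for four residues
peaks = {
    "res1": (8.00, 120.0, 8.01, 120.0),
    "res2": (8.20, 118.0, 8.21, 118.0),
    "res3": (7.90, 121.0, 7.91, 121.0),
    "res4": (8.10, 119.0, 8.40, 120.0),
}
combined, large = perturbed_residues(peaks)
print(large)  # ['res4']
```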
Intramolecular Recognition of Proline-rich Motifs by GYF Domains-The two proteins GYN4 and SYH1 display internal sequence motifs that match the consensus of the respective GYF domain. Fig. 3 shows that the two GYF domains recognize these motifs when synthesized on a membrane. The presence of the internal PRS in the GYF domain construct PR-SYH1 had no influence on phage display (Table I) or pull-down experiments (Fig. 4), whereas the KSGPPPGFTG sequence of GYN4 interfered with binding in phage display (data not shown) and yeast two-hybrid experiments (Fig. 6). In contrast, the shorter GYN4-GYF construct allowed us to define a recognition motif (Table I) and to identify interaction partners (Fig. 6). Misfolding of the domain due to the proline-rich extension could be ruled out because the GST fusion protein of GYN4-PR was able to bind to peptides synthesized on a membrane, where concentrations are high enough to compete out the intramolecular interaction (data not shown). Taken together, these findings suggest that interaction in cis between the GYN4-GYF domain and the internal PRS masks its binding site, whereas in SYH1 the intramolecular sequence does not influence the interaction competence of the GYF domain. Further support for an intramolecular encounter between GYN4-GYF and its internal PRS comes from fluorescence titration experiments (Fig. 8). For the GYF domain-only construct (residues 546-604 of GYN4), a KD value of ~4 µM for binding to the GYN4 peptide was obtained (Table IV). The presence of the internal recognition motif within the construct GYN4-PR decreased the affinity for the titrated GYN4 peptide by a factor of 8. In addition, the spectrum of GYN4-PR and that of the shorter construct in its bound form display similar centroid positions (Fig. 8). The most likely explanation for the different behavior of GYN4-PR is that a large fraction of its GYF domain is bound intramolecularly to the internal PRS.
Additional binding is observed only at a large excess of externally added peptide. The assumption of an intramolecular interaction is substantiated by studies on artificial CD2BP2-GYF single-chain constructs with N- or C-terminally linked interaction sites (Fig. 1) that were used to obtain NMR restraints for structural investigations of the CD2BP2/CD2 interaction (27). Only C-terminal linkage yielded a soluble construct with an intramolecular interaction, supporting the view that the CD2 peptide SHRPPPPGHRV binds to the CD2BP2-GYF domain in only one orientation (27). Similarly, the linker between the GYF domain and the C-terminal interaction site within the GYN4 protein is long enough to permit the autoinhibited conformation, whereas a PRS attached directly N-terminal to the GYF domain, as present within the SYH1 protein, precludes such an encounter. This intramolecular masking is likely to be functionally relevant and might prevent unwanted low-affinity interactions from taking place.

Co-compartmentalization with a protein containing a high-affinity binding site could outcompete the intramolecular interaction under appropriate conditions. In addition, post-translational modification of residues close to the binding site, for example phosphorylation or methylation, might act as a regulatory switch for GYF domain-mediated interactions.

Table caption. Interaction partners for PERQ2- and GYN4-GYF identified by yeast two-hybrid screening of a human lung and an A. thaliana cDNA library, respectively. Protein codes refer to Swiss-Prot/TrEMBL and TAIR accession codes and names. The column "Total/Individual" lists the total number of clones encoding the same interaction partner found on selective plates and the number of individual clones of the respective interaction partner. Residue numbers and sequences of the protein fragments identified in all clones for the individual proteins are listed in the columns "Region" and "Sequence," respectively. For 10 of the 15 clones of At2g28540.1, the insert was too long to allow complete sequencing of the indicated fragments with the standard forward sequencing primer.

DISCUSSION
In this first comprehensive study of GYF domains of the SMY2 subfamily, we show that PRS recognition by these domains converges on the four-amino acid recognition motif PPGΦ (Φ = F, I, L, M, or V). Because all of the analyzed GYF domains contain the conserved signature (Y/F/W)X(Y/F)X₆₋₁₁GPFX₄(M/V/I)X₂WX₃-GYF, it is very likely that the hydrophobic cavity formed mainly by the aromatic side chains of this signature constitutes the main site of interaction with the PPGΦ moiety. Although no three-dimensional complex structure is available for a GYF domain of the large SMY2 subfamily, the high conservation of the peptide-binding signature of GYF domains allows us to place our new results in a structural context and to compare the recognition code defined in this study with the well characterized CD2BP2-GYF·CD2 peptide complex (see below). Defining the recognition rules sets the stage for further functional investigations of GYF domain-containing proteins. Our results reveal a considerable number of possible protein targets for GYF domains, and yeast two-hybrid analysis and pull-down experiments indicate that selected interactions can also take place under more physiological conditions.
PERQ2-It was somewhat surprising to find the core splicing protein SmB/B′ as the most prominent interaction partner in the yeast two-hybrid analysis of the PERQ2-GYF domain. SmB/B′ contains numerous proline-rich motifs in its C-terminal tail that represent several potential binding sites for GYF domains. It was previously shown that CD2BP2 and SmB/B′ interact and colocalize within the same nuclear compartment and that the PPPGMR motifs within the SmB/B′ tail are the target sites for CD2BP2-GYF (2,13). According to our data, PERQ2-GYF can also bind to these PPPGMR motifs. It is therefore plausible that CD2BP2 and PERQ2 can interact with SmB/B′ simultaneously under conditions where SmB/B′ is not limiting. Alternatively, CD2BP2 could be the major interaction partner for SmB/B′ in most cells, whereas PERQ2 could preferentially bind to SmN, the brain-specific variant of SmB/B′. Because the two human proteins CD2BP2 and PERQ2 display no sequence homology outside the GYF domain, functional redundancy of the two proteins is unlikely. It is more likely that the two proteins use SmB/B′ recruitment in different functional contexts. The third human GYF domain-containing protein is PERQ1. Its GYF domain shows a high degree of sequence homology to PERQ2-GYF and is therefore expected to exhibit very similar binding specificities. Competition for the same interaction partners, for example SmB/B′, is a possible scenario for PERQ1 and PERQ2, and the presence of homologous regions in the two proteins suggests similar but not necessarily identical functions. The role of SmB/B′ as part of the seven-protein core of all snRNP complexes is firmly established and supports its role as an important structural component of the spliceosomal machinery (32). A recent report identified SmB/B′ at the sites of initial adhesion of certain primary cells, so-called spreading initiation centers (33).
Strikingly, no mRNA was found at these sites, raising the possibility that SmB/B′ and other Sm proteins play important roles in processes other than splicing. It is also known that snRNPs undergo a complex maturation process in Homo sapiens that includes shuttling between the nuclear and cytoplasmic compartments (32), and the GYF domain-containing proteins could be involved in the transport and/or localization of SmB/B′ and associated proteins. Interestingly, Bedford et al. (34,35) have described the proline-rich motifs RPP and PPR within NPWBP and SmB/B′ as interaction sites for the WW domains of FBP21 and FBP30. Furthermore, these motifs are also recognized by the SH3 domains of Fyn and p85, and assembly of different signaling proteins on these sequences has been suggested (34). Our results extend the list of proline-rich sequence recognition domains with similar or overlapping binding motifs, which now includes subgroups of WW, SH3, and GYF domains. Several different signaling pathways could converge on SmB/B′ through the interactions of the FBP21 and FBP30 WW domains, SH3 domain-containing proteins, and CD2BP2- and PERQ2-GYF. However, additional information on the subcellular localization of PERQ2 is required to validate the proposed functions of PERQ2 in association with SmB/B′ or its variants.
Yeast GYF Domains-In yeast, two-hybrid experiments have previously identified MSL5 as an interaction partner of the homologous proteins SMY2 and SYH1 (see www.yeastgenome.org/). Our analysis shows that the GYF domains of the two proteins interact strongly with the PPGΦ motif-containing C-terminal tail of MSL5. MSL5 is the yeast branch point-binding protein involved in spliceosome assembly (36). Global analysis of yeast protein localization suggests a predominantly nuclear localization of MSL5, whereas SMY2 and SYH1 are mostly found in the cytoplasm (37). A candidate cytoplasmic interaction partner of SMY2 and SYH1 is EAP1, which was also detected in pull-down experiments with cellular lysates (Fig. 4). EAP1 was originally identified as a translational inhibitor that competes with eIF4G and p20 for interaction with the cap-binding protein eIF4E. In addition, EAP1 has a separate function in mediating genetic stability (38), and a recent report highlights the involvement of EAP1 in the attenuation of translation in cells with mutations in the secretory pathway (29). Further investigations should address whether the two GYF domain-containing proteins are involved in transport processes that depend on the cap structure of mRNA.
Comparing Different Classes of GYF Domains-The SMY2- and CD2BP2-GYF domains represent two subfamilies of GYF domains. All the GYF domains investigated here belong to the SMY2 type. This large subfamily is characterized by a short β1-β2 loop and an aspartate as the last residue of the β1 strand (position 8), whereas the CD2BP2-GYF domain contains a tryptophan (Trp-8) at this position. Trp-8 contributes to binding of the CD2 ligand SHRPPPPGHRV by directly contacting glycine 8 and arginine 10 of this peptide. At the same time, the bulky Trp-8 side chain shields the conserved Tyr-6 and Phe-34 residues of the domain from solvent. A recent NMR structure of the A. thaliana Q9FT92-GYF domain (Protein Data Bank code 1WH2) shows that Asp-8 allows partial solvent exposure of the conserved aromatic residues Tyr-6 and Phe-34, which can then contribute to the hydrophobic surface that most likely represents the interaction site for the proline-rich ligand. Fig. 7C compares the molecular surface of the CD2BP2-GYF domain with that of the Q9FT92-GYF domain from A. thaliana. The binding epitope for Q9FT92-GYF/ligand interactions is not known; however, an alignment of the SMY2- and Q9FT92-GYF domains (Fig. 1) was used to color the suggested Q9FT92 epitope based on the results for SMY2 (Fig. 7B). The aspartate found at position 8 in Q9FT92 in place of Trp-8 of CD2BP2-GYF (W8D) extends the hydrophobic cleft in the A. thaliana GYF domain, probably allowing the ligand to bind along its main axis as indicated by the red line (Fig. 7C). This mode of binding is supported by the significant chemical shift changes observed for residues 38-40 of the SMY2-GYF domain (indicated as a blue surface) upon interaction with the MSL5S1 peptide.

Fig. 7. A, [...] NH resonances of residues with weighted geometrical differences of the chemical shifts colored in green or blue in B are labeled according to residue type and number. B, combined chemical shift changes of CD2BP2-GYF and SMY2-GYF for their respective ligands, CD2S and MSL5S1. The weighted geometrical differences of the chemical shifts for each residue of the domains upon addition of a 10-fold excess of peptide are plotted against the corresponding sequence. Residues of both domains (black for CD2BP2-GYF and red for SMY2-GYF) have been aligned for better comparison. The key residues that define the conserved GYF domain signature are depicted as white letters on a black background. Weighted geometrical differences of the chemical shifts of residues comprising the common binding epitope of CD2BP2-GYF and SMY2-GYF are depicted as green bars. Blue bars belong to residues that extend the binding epitope in SMY2-GYF. The NMR titration experiments were performed at 297 K with 0.2 mM samples of the ¹⁵N-labeled CD2BP2- or SMY2-GYF domain in 50 mM sodium phosphate buffer, pH 6.3, and PBS, pH 7.3, respectively. C, proposed model of ligand binding for the SMY2 subfamily in comparison with the known orientation of the CD2BP2-GYF ligand CD2. Because the structure of the SMY2-GYF domain is not yet known, the binding epitope of SMY2-GYF·MSL5S1 was plotted onto the structure of Q9FT92-GYF based on sequence homology (Fig. 1). The known orientation of a CD2 class ligand bound to CD2BP2-GYF (left) and the suggested orientation of a ligand for the SMY2 domain (right) are depicted as a red line above the indicated epitopes. The binding epitopes are color-coded as in B. Residues Lys-7, Pro-19, and Pro-33 in CD2BP2-GYF and the corresponding residues in Q9FT92 are also depicted in green because they are part of the binding epitope. Prolines cannot be observed in ¹⁵N heteronuclear single quantum coherence spectra, and the backbone NH resonance of Lys-7 displayed line broadening and could therefore not be followed during the titration.
A reorganization of the binding pocket by the W8D replacement could explain the preference for the large hydrophobic residues Phe, Leu, and Ile found in the consensus motifs of all GYF domains analyzed in this study. On the other hand, our results with CD2BP2-GYF show that the PPGW motif represents an optimal binding motif (39) that is rarely found among the phage display sequences selected by the SMY2 subfamily of GYF domains. We propose that the tryptophan of the ligand forms a "stacked" aromatic interaction with Trp-8 of CD2BP2-GYF at the surface of the domain, whereas it is too bulky to be optimally placed in the deep binding pocket of the SMY2-GYF subfamily. In conclusion, the data suggest a common binding mode for PPGΦ-containing ligands of the SMY2 subfamily of GYF domains. For the CD2BP2-GYF domain, two classes of ligands were identified: the PPGW class and the PPGX(R/K) class (CD2 class), the latter requiring a positively charged amino acid in the region flanking the PPG core. The PPGW class of the CD2BP2-GYF domain and the PPGΦ class of ligands are charge-independent and dominated by hydrophobic interactions. However, whereas stacked aromatic interactions at the surface of the domain probably characterize the CD2BP2-GYF/ligand interaction, the hydrophobic residue of the PPGΦ motif is expected to insert into the extended hydrophobic cleft of the SMY2-GYF domain family (Fig. 7C).
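The ligand classes compared above can be written as simple sequence patterns, which is also the basis for searching proteomes for candidate sites. The sketch below uses Python's re module; the patterns follow the motifs given in the text (PPGΦ with Φ = F/I/L/M/V for the SMY2 subfamily, PPG(W/F/Y/M/L) for the CD2BP2 PPGW class, and PPGX(R/K) for the CD2 class), while the dictionary and function names are hypothetical. Pattern matching only nominates candidate sites: it ignores flanking residues, charge effects, and accessibility, all of which influence binding.

```python
import re

# Consensus patterns taken from the text; lookaheads let overlapping sites be reported.
LIGAND_CLASSES = {
    "PPGPhi (SMY2 subfamily)": re.compile(r"(?=(PPG[FILMV]))"),
    "PPGW class (CD2BP2)": re.compile(r"(?=(PPG[WFYML]))"),
    "CD2 class (CD2BP2)": re.compile(r"(?=(PPG.[RK]))"),
}


def scan(sequence: str) -> list[tuple[str, int, str]]:
    """Return (class, 1-based start, matched motif) for every candidate site."""
    hits = []
    for name, pattern in LIGAND_CLASSES.items():
        for match in pattern.finditer(sequence.upper()):
            hits.append((name, match.start() + 1, match.group(1)))
    return hits


# Sequences quoted in the text: CD2 peptide, internal GYN4 motif, SmB/B' tail motif.
for peptide in ("SHRPPPPGHRV", "KSGPPPGFTG", "PPPGMR"):
    print(peptide, scan(peptide))
```

For the CD2 peptide SHRPPPPGHRV only the CD2-class pattern matches, the internal GYN4 sequence KSGPPPGFTG matches both hydrophobic patterns, and the SmB/B′ motif PPPGMR matches all three, illustrating how a single PRS can satisfy overlapping recognition codes.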