Defining the Specificity Space of the Human Src Homology 2 Domain

Src homology 2 (SH2) domains are the largest family of interaction modules encoded by the human genome to recognize tyrosine-phosphorylated sequences and thereby play pivotal roles in transducing and controlling cellular signals emanating from protein-tyrosine kinases. Different SH2 domains select for distinct phosphopeptides, and the function of a given SH2 domain is often dictated by the specific motifs that it recognizes. Therefore, deciphering the phosphotyrosyl peptide motif recognized by an SH2 domain is the key to understanding its cellular function. Here we cloned all 120 SH2 domains identified in the human genome and determined the phosphotyrosyl peptide binding properties of 76 SH2 domains by screening an oriented peptide array library. Of these 76, we defined the selectivity for 43 SH2 domains and refined the binding motifs for another 33 SH2 domains. We identified a number of novel binding motifs, which are exemplified by the BRDG1 SH2 domain that selects specifically for a bulky, hydrophobic residue at P + 4 relative to the Tyr(P) residue. Based on the oriented peptide array library data, we developed scoring matrix-assisted ligand identification (or SMALI), a Web-based program for predicting binding partners for SH2-containing proteins. When applied to SH2D1A/SAP (SLAM-associated protein), a protein whose mutation or deletion underlies the X-linked lymphoproliferative syndrome, SMALI not only recapitulated known interactions but also identified a number of novel interacting proteins for this disease-associated protein.

Abbreviations used: NEMO, NF-kappa-B essential modulator; RIN, Ras and Rab interactor.

The SH2 domain is a prototypical protein interaction module initially identified as a conserved, non-catalytic region in the Src family of cytoplasmic kinases (1). By virtue of their ability to bind tyrosine-phosphorylated polypeptides (2-6), SH2 domains are involved in signaling processes that regulate cell growth or differentiation, protein ubiquitination and/or degradation, gene transcription, and cytoskeletal rearrangement (7,8). Although SH2 domains have been identified in some protozoa (9), they have undergone a marked expansion in multicellular animals. A human cell is estimated to harbor 120 nonredundant SH2 domains in 110 unique proteins. These proteins include kinases, phosphatases, cytoskeletal proteins, regulators of small GTPases, and E3 ubiquitin ligases among others. Not surprisingly, mutations in genes encoding some SH2 domain proteins are associated with human diseases (7). For instance, missense mutations within the N-terminal SH2 domain of PTPN11/SHP2 are associated with the Noonan syndrome (10), nonsense mutations in the SH2 domain of RASA1/RasGAP underlie basal cell carcinoma (11), and a mutation or deletion of the SH2D1A SH2 domain causes the XLP syndrome (12).
All characterized SH2 domains share the same structure of a central antiparallel β-sheet flanked by two α-helices (13-16). All but a few SH2 domains contain a conserved Arg residue (corresponding to Arg 175 in v-Src) at the ligand-binding site that is essential for Tyr(P) recognition. Despite these common features, different SH2 domains recognize distinct peptide sequences. In general, the specificity of an SH2 domain is governed by 3-5 residues C-terminal to the Tyr(P). Because a peptide usually lies perpendicular to the central β-sheet of the bound SH2 domain in an extended conformation, the interaction between the SH2 domain and the peptide is largely independent of the context of the native protein from which the peptide is taken. This makes it possible to study the specificity and binding mechanism of an SH2 domain using short synthetic peptides. Indeed much of our current knowledge on SH2 domain specificity has been gleaned from the use of peptides or peptide libraries (17,18). Cantley and colleagues (18) carried out the first large scale characterization of SH2 domain specificity more than a decade ago for a group of 14 SH2 domains. An oriented synthetic peptide library containing the degenerate sequence GDGpoYXXXSPLL, where X represents a mixture of 19 naturally occurring amino acids (all except Cys) and poY represents phosphotyrosine, was used to identify residues at the Tyr(P) +1 to Tyr(P) +3 positions that were preferred by an SH2 domain (18). A subsequent study by the same group led to the characterization of an additional eight SH2 domains (17). Recently an oriented peptide array library (OPAL) approach, a variation of the original synthetic library method developed by Songyang et al. (18), was shown to be a simple, yet powerful tool for charting the specificity of interaction domains and for identifying kinase substrates (19,20).
In the past 2 decades since the discovery of the SH2 domain, numerous experiments have explored the specificity of individual SH2 domains, using both biochemical and biophysical approaches as well as screening peptide libraries. Together these studies have delineated the sequence motifs selected by 38 mammalian SH2 domains (37 of which are human). Because the function of an SH2 domain is dictated by the specific proteins with which it interacts, knowledge on the specificity of an SH2 domain is invaluable to understanding the function of the corresponding SH2 protein in cells.
Here we report an extended analysis of human SH2 domains. We cloned all 120 SH2 domains identified in the human proteome and determined the specificity of 76 SH2 domains using the OPAL approach. Our study led to the initial characterization of specificity for 43 SH2 domains and refinement of motifs selected by 33 others. This expands our knowledge of binding specificity to include approximately two-thirds of the human SH2 domains and allows us to gauge the specificity space used by the complement of human SH2 domains. Scoring matrices derived from the OPAL binding data were calculated and used in scoring matrix-assisted ligand identification (SMALI), a Web-based program, to predict interacting proteins for an SH2 domain. When applied to SH2D1A, SMALI identified not only known interactors but also novel binding partners. We provide evidence showing that a number of predicted interactions mediated by the BRDG1 SH2 domain indeed occur at the intact protein level. Besides aiding in the identification of novel binding partners for individual SH2 proteins, the results presented here will facilitate systematic exploration of signaling networks mediated by the human SH2 proteins in normal cells and in disease pathophysiology.

EXPERIMENTAL PROCEDURES
Cloning and Expression of Human SH2 Domains-An initial survey of the various protein databases by Huang et al. (21) led to the identification of 119 non-redundant SH2 domains in humans. This number was revised to 120 in a recent comprehensive search of the GenBank™, Simple Modular Architecture Research Tool (SMART), and Swiss-Prot databases (7). These 120 SH2 domains were aligned using ClustalX, and the conserved SH2 region was identified. For each SH2 domain, the conserved region plus a 10-residue overhang at either terminus was defined as the SH2 domain boundary. Of the 120 human SH2 domains cloned, 111 were obtained by PCR amplification of the corresponding IMAGE cDNA clones and nine were synthesized chemically (for lack of cDNA clones). All SH2 domains were expressed as GST fusions in Escherichia coli and purified on glutathione-Sepharose beads. A number of SH2 domains, notably those from the SOCS, STAT, JAK, and RIN families, formed inclusion bodies in bacteria and failed to produce soluble proteins (supplemental Table S1). In summary, 98 GST-SH2 domains were expressed in soluble forms, of which 76 produced defined sequence motifs in subsequent OPAL screens. These latter SH2 domains were each purified to homogeneity by fast protein liquid chromatography before the OPAL screens and were found to be properly folded by CD spectroscopy (supplemental Figs. S3-S5).
OPAL Membrane Synthesis-The OPAL peptide array was synthesized on a TFA-soluble cellulose membrane (Intavis AG) by the synthetic peptide arrays on membrane supports (SPOT) technology (22) on an Auto-Spot ASP 222 Robot (Abimed). Each peptide spot was cut from the membrane and incubated in a TFA mixture (88% TFA, 4% trifluoromethanesulfonic acid, 4% triisopropylsilane, and 4% H2O) until the cellulose disc was completely dissolved. Dissolved peptide-cellulose conjugates were then precipitated and washed three times with tert-butyl methyl ether. The dried peptide-cellulose conjugate corresponding to each OPAL spot was dissolved in 100 μl of dimethyl sulfoxide and diluted into 500 μl of SSC buffer (15 mM sodium citrate, pH 7.0, and 150 mM sodium chloride) to make the OPAL printing stock. Multiple copies of the OPAL membranes were produced by printing the OPAL stocks on Brady labels (Connex Electronics Inc.).
Determination of SH2 Domain Specificity by Screening OPAL Membranes-The OPAL membrane was blocked with 5% skim milk in TBST (0.1 M Tris-HCl, pH 7.4, 150 mM NaCl, and 0.1% Tween 20) for 1 h. GST-SH2 protein was added directly in the blocking buffer to a final concentration of 1 μg/ml and incubated with the OPAL membrane at room temperature for 1 h. The membrane was then washed 3 × 5 min with TBST before a rabbit anti-GST antibody (Santa Cruz Biotechnology) was added. The membrane was allowed to incubate at room temperature for 30 min prior to 3 × 5-min washes with TBST. A goat anti-rabbit GST-horseradish peroxidase conjugate was then added and incubated with the membrane for another 30 min. After final 3 × 5-min washes in TBST, the OPAL membrane was visualized by enhanced chemiluminescence.
Consensus Binding Motif and SMALI Matrix-The OPAL membrane was scanned and quantified on a Bio-Rad FluoroImager. To generate consensus motifs based on the specific binding signals, the background signal for the membrane was subtracted from each data spot. For a number of SH2 domains, a conspicuous column of Asp or Glu was detected in the corresponding OPAL screen. Because these strong Asp or Glu signals are position-independent, they might have originated from the mimetic effect of an Asp or a Glu residue for Tyr(P) rather than being indicative of true selectivity. To minimize this position-independent effect, the value for each Asp or Glu spot was readjusted by subtracting the average of the Asp or Glu signals in the column.
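The background subtraction and the Asp/Glu adjustment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the array layout (one row per randomized position, one column per residue type), the residue ordering, the function name correct_opal_signals, and the clipping of negative values to zero are all hypothetical choices that the text does not specify.

```python
import numpy as np

# Assumed column order: the 19 residue types used in the OPAL (Cys excluded).
AMINO_ACIDS = list("ADEFGHIKLMNPQRSTVWY")

def correct_opal_signals(raw, background, acidic=("D", "E")):
    """Subtract membrane background, then remove the position-independent
    Asp/Glu component by subtracting each acidic column's own mean.

    raw: array of shape (positions, 19), rows = P-2 ... P+4 (pTyr omitted).
    """
    signal = np.asarray(raw, dtype=float) - background
    for aa in acidic:
        col = AMINO_ACIDS.index(aa)
        signal[:, col] -= signal[:, col].mean()
    # Assumption: negative intensities carry no meaning and are set to zero.
    return np.clip(signal, 0.0, None)
```

In this sketch the adjustment is applied to both acidic columns unconditionally; in practice it would presumably be applied only to screens in which a conspicuous Asp or Glu column was observed.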
Next the signal for each residue in a row was normalized so that the sum of all signals equaled 19 (i.e. the number of substituting amino acids in a row). A residue with a normalized signal >1 was considered a positive selection. To derive motifs listed in Table I, only residues with a normalized value of >1.8 were considered. To generate a SMALI matrix, we followed an information-entropy algorithm that was used to generate a sequence logo (23,24). In particular, residue conservation at a given position in a peptide, $R_{seq}$, was defined as the difference between the maximum entropy and the entropy of the observed residue distribution, $R_{seq} = \log_2 N - \left(-\sum_{i=1}^{N} p_i \log_2 p_i\right)$, where $p_i$ is the fraction of the normalized value contributed by residue type i at the position and N is the number of residue types in the OPAL array (N = 19). Therefore, the maximum residue conservation per site is $\log_2 19 \approx 4.25$. The entropy of residue type i at the position, given by the product of $R_{seq}$ and $p_i$, was entered as an element of the SMALI matrix. The entropy of Cys, which was not included in the OPAL, was set equal to the mean value for a position. The SMALI score for a given peptide sequence was calculated by the summation of values for residues (P − 2 through P + 4) in the matrix. Based on this calculation, a peptide with a higher SMALI score has a greater propensity to bind a query SH2 domain. The SMALI site for predicting SH2 domain-target interaction is currently accessible through the Li lab homepage.
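The normalization, matrix construction, and scoring steps just described can be illustrated with a short Python sketch. It is not the SMALI implementation; the function names, the residue ordering, the array layout (six randomized positions by 19 residue types), and the convention that a zero-probability residue contributes zero entropy are assumptions made for the example.

```python
import numpy as np

# Assumed column order: the 19 residue types in the OPAL (Cys excluded).
AMINO_ACIDS = list("ADEFGHIKLMNPQRSTVWY")

def normalize_rows(signal):
    """Scale each position (row) so that its signals sum to 19."""
    signal = np.asarray(signal, dtype=float)
    return signal * (signal.shape[1] / signal.sum(axis=1, keepdims=True))

def smali_matrix(normalized):
    """Element (position, residue) = R_seq(position) * p_i."""
    n_types = normalized.shape[1]                      # N = 19
    p = normalized / normalized.sum(axis=1, keepdims=True)
    # Assumption: 0 * log2(0) is treated as 0.
    plogp = np.where(p > 0, p * np.log2(np.where(p > 0, p, 1.0)), 0.0)
    r_seq = np.log2(n_types) + plogp.sum(axis=1)       # log2 N - H(p)
    return r_seq[:, None] * p

def smali_score(matrix, residues):
    """Sum matrix values for residues at P-2, P-1, P+1 ... P+4 (pTyr omitted).
    Cys receives the mean value of its position, as described in the text."""
    row_mean = matrix.mean(axis=1)
    total = 0.0
    for row, aa in enumerate(residues):
        total += row_mean[row] if aa == "C" else matrix[row, AMINO_ACIDS.index(aa)]
    return float(total)
```

A position with no selectivity (all residues equal) yields $R_{seq} = 0$ and contributes nothing to the score, whereas a position that admits a single residue contributes the maximum of $\log_2 19$.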
GST Pulldown and Western Blot-Lysate of ~20 × 10^6 A20 cells was incubated with 20 μg of GST or GST-BRDG1 SH2 domain at 4°C for 30 min. Glutathione-Sepharose beads (20 μl) were subsequently added and incubated with the lysate for an additional 30 min. The beads were collected by centrifugation and washed in TBST. Proteins precipitated with the beads were resolved by 10% SDS-PAGE before Western blotting using appropriate antibodies against NTAL, TRAF7, Cbl-b, EPS-15R, and NEMO, respectively (Santa Cruz Biotechnology). Phosphorylation status of a precipitated protein was assessed by an anti-Tyr(P) Western blot.

RESULTS
Identification of Specific Phosphopeptide Motifs Recognized by the Human SH2 Domains by Screening a Tyr(P) OPAL-Oligonucleotides encoding the 120 human SH2 domains were obtained either by PCR amplification of available cDNA clones or through chemical synthesis (supplemental Table S1). To eliminate end effects on protein folding, the boundaries of an SH2 domain were extended by 10 amino acids at either terminus beyond the conserved region identified by multiple sequence alignment. Each SH2 domain was expressed as a GST fusion in E. coli and purified to homogeneity before its characterization by screening against an OPAL (supplemental Fig. S1). Folding of an SH2 domain was confirmed by CD spectroscopy measured using the corresponding isolated SH2 domain (supplemental Fig. S2). The peptide library contained the degenerate sequence KGXXpoYXXXXGD with an invariant Tyr(P) residue to orient the peptide and 2 and 4 randomized residues, respectively, at the N and C terminus of the peptide (19). The flanking residues Lys, Asp, and Gly were chosen to make the peptide more soluble in an aqueous solution. To facilitate comparison, multiple copies of the same OPAL were produced by printing a premade array library onto a solid support (see "Experimental Procedures" for details). We obtained specific OPAL binding patterns for 76 SH2 domains of which 33 had been analyzed previously and 43 were novel (supplemental Fig. S3). The former group of SH2 domains with described specificity was included in this study for confirmation and to produce comparative data. Moreover the current OPAL examined a longer region (from P − 2 to P + 4) of the peptide ligand, which yielded additional information on SH2 specificity. Table I lists phosphopeptide motifs obtained in the current OPAL screen as well as those garnered from the literature. For the SH2 domains that were re-examined here, the OPAL binding motifs agreed, in most cases, with the literature for the equivalent positions (i.e. P + 1 to P + 3) of the same domain.
For example, the CRK SH2 domain was shown previously to recognize a poY(D/K/N)(H/F/R)(P/V/L) motif where po indicates phospho (18); the current OPAL screen produced a similar motif, poY(M/D/K/N/S)(T/M/S)(P/L/V)(R/M/A/S), with almost identical selections for positions P + 1 and P + 3, respectively. The SH2D1A SH2 domain was analyzed using a peptide derived from its physiological ligand SLAM, which led to the identification of a consensus motif, (S/T)XpoYXX(V/I) (25). An unbiased OPAL screen produced a related, yet more comprehensive motif that bore distinct selectivity for positions P − 1, P + 2, and P + 4 (Fig. 1). The CSK SH2 domain was shown to recognize a poY(T/A/S)(K/R/Q/N)(M/I/V/R) consensus (17). The OPAL screen identified a highly specific motif, poY(A/S/T)N(V/P), that matches perfectly with the literature at P + 1 and overlaps at P + 2 and P + 3 (Fig. 1).
In certain cases, the OPAL-derived motifs differ notably from the literature. For example, the FES SH2 domain was shown previously to recognize a poYEX(V/I) motif with no apparent selectivity observed for P + 2 (17). In contrast, the OPAL screen identified Asn as a strongly favored amino acid for P + 2 (Fig. 1). In agreement with this observation, FER, a kinase belonging to the same family as FES, was found to also be biased for an Asn at P + 2 (supplemental Fig. S3). Although it is likely that differences in the libraries used in the two studies might have contributed to this discrepancy, the identification of additional selectivity at P + 2 for the FES and FER SH2 domains suggests that the OPAL screen is highly sensitive in defining SH2 domain specificity.
We mapped the specificity of 43 SH2 domains whose binding motifs were not known prior to this study. Whereas certain motifs uncovered for this group of SH2 domains conform to known sequences recognized by related SH2 domains, others are entirely novel. Generally SH2 domains belonging to the same protein family, such as the Shc family members SHC1, SHC2, SHC3, and SHC4, exhibited a preference for similar motifs. However, subtle differences were detected. For instance, the SHC3 SH2 domain displayed narrower selectivity for positions P + 1 through P + 3 than did the other members of the SHC family (Table I and supplemental Fig. S3), suggesting that the former may target a relatively small pool of proteins in the cell. Slight differences in specificity were also observed for the SH2 domains from the SRC kinase family and the VAV family of guanine nucleotide exchange factors that play important roles in cytoskeleton organization and immune functions (26,27).
Novel motifs were identified for a number of SH2 domains (Table I). In particular the BRDG1 SH2 domain selected for a minimal motif poY(S/E/D)XX(I/L/F) (Fig. 1). Whereas numerous SH2 domains prefer a hydrophobic residue at P + 3 (Table I), the BRDG1 SH2 is unusual in showing an exclusive selectivity for a hydrophobic residue (in particular Ile or Leu) at P + 4. The SH2 domains of the SHB/SHD/SHE/SHF family adaptors with unknown function were found to select for a poY(E/D)(N/E)L motif that overlaps with the specificity of the Shc1 SH2 domains, suggesting that the functions of these two distinct families of adaptors may be related. Of the top 25 proteins predicted (by SMALI, see later sections) to bind to the Shc1 SH2 domain, seven are also favored by the Shd SH2 domain (supplemental Table S2). The SH2 domain of BMX/ETK, a tyrosine kinase up-regulated in human prostate cancers (28), selected for a unique sequence, (P/E)poY(E/D)N(E/D) (Table I).
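As an illustration of how a consensus motif written in this notation might be used to scan a protein sequence for candidate sites, the following Python sketch translates a motif such as poY(S/E/D)XX(I/L/F) into a regular expression. The helper names motif_to_regex and find_candidate_sites are hypothetical, and the example sequence is invented. Because a plain sequence carries no phosphorylation information, any hit is only a candidate that would still require a phosphorylated tyrosine; this simple matching is also distinct from the matrix-based SMALI scoring.

```python
import re

def motif_to_regex(motif):
    """Translate motif notation (poY = Tyr(P), X = any residue,
    (A/B/C) = alternatives) into a compiled regular expression."""
    body = motif.replace("poY", "Y")
    body = re.sub(r"\(([A-Z/]+)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]", body)
    body = body.replace("X", ".")
    # A lookahead lets overlapping candidate sites be reported.
    return re.compile("(?=(" + body + "))")

def find_candidate_sites(sequence, motif):
    """Return (1-based start position, matched segment) for every hit."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

# Invented sequence, for illustration only.
hits = find_candidate_sites("MAAYEGALKKYSPPL", "poY(S/E/D)XX(I/L/F)")
```

The same helper accepts motifs with N-terminal selectivity, such as (P/E)poY(E/D)N(E/D); in that case the reported position is the start of the matched segment rather than the tyrosine itself.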
Of note, the Tyr 194 site in BMX is predicted to bind to its own SH2 domain, suggesting that the kinase activity of BMX may be regulated by an intramolecular interaction (supplemental Table S2). Besides a strong inclination for acidic residues at the C terminus, the BMX SH2 domain also exhibited N-terminal selectivity for a Pro or a Glu at P − 1. Similarly SH2 domains from BTK, HSH2D, GRB10, TENC1, TNS4, PTPN11/SHP2, and SH2D1A each displayed a strong N-terminal selectivity (Table I and supplemental Fig. S2). N-terminal residues can increase the contact area of an SH2 domain with its ligand as is the case for the PTPN11/SHP2 and SH2D1A SH2 domains (25,29).
Functional Classification of the Human SH2 Domains-The binding motifs for 85 mammalian (83 human) SH2 domains are now known when results from the current study are combined with those from the literature. The availability of these motifs enables the classification of human SH2 domains according to specificity and, therefore, function. Songyang et al. (17) categorized SH2 domains into four groups on the basis of the amino acid at position βD5 (nomenclature according to the structure of Src SH2 domain), a residue that shows contacts with both the P + 1 and P + 3 residues of the phosphopeptide in many SH2-ligand complexes and therefore plays an important role in determining SH2 domain specificity. Taking advantage of our increased knowledge of binding motifs while keeping with tradition, we propose to reclassify the human SH2 domains into three major groups according to the βD5 identity. Group I SH2 domains contain an aromatic residue such as Tyr or Phe at this position; group II SH2 domains each harbor a hydrophobic residue such as Ile, Leu, Val, Cys, or Met at the same position; whereas group III, which is composed solely of the STAT family of transcription regulators, has a hydrophilic βD5 such as Glu, Gln, or Lys (Table I). The first two groups correspond to the groups I and III SH2 domains, respectively, in the earlier classification by Songyang et al. (17,18). The lone member, VAV1 of the group II SH2 domain of Songyang et al. (17,18), is now reclassified into the new group II according to its binding site characteristics (βD5 = Ile) and specificity. For the same reason, SH2 domains belonging to the old group IV, such as those from SHB and PTPN11C/SHP2C, are now placed in the new group II (Fig. 2 and Table I).

Table I legend. The amino acids preferentially selected by the indicated SH2 domains at each position N- and C-terminal to the fixed poY residue. The values in parentheses indicate relative importance of a residue. The total value for each position of P − 2 to P + 4 is normalized to 19 (the number of natural amino acids used in the mixture to generate the OPAL). A value greater than 1.0 denotes a positive selection, while residues with values greater than 1.8 are listed (details of the calculation are described under "Experimental Procedures").
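The three-group scheme based on the βD5 residue can be expressed as a simple lookup, sketched below in Python. The residue sets contain only the examples named in the text ("such as" Tyr or Phe, and so on), so any other residue is returned as unclassified rather than guessed; the function name classify_by_bd5 is hypothetical.

```python
# Residues named in the text as examples for each group (one-letter codes).
GROUP_I_AROMATIC = set("YF")        # Tyr, Phe
GROUP_II_HYDROPHOBIC = set("ILVCM")  # Ile, Leu, Val, Cys, Met
GROUP_III_HYDROPHILIC = set("EQK")   # Glu, Gln, Lys (STAT family)

def classify_by_bd5(residue):
    """Assign an SH2 domain to group I, II, or III from its betaD5 residue."""
    residue = residue.upper()
    if residue in GROUP_I_AROMATIC:
        return "I"
    if residue in GROUP_II_HYDROPHOBIC:
        return "II"
    if residue in GROUP_III_HYDROPHILIC:
        return "III"
    return "unclassified"  # residue not listed among the text's examples

# VAV1 has Ile at betaD5 and therefore falls into the new group II.
vav1_group = classify_by_bd5("I")
```

Subgroup assignment (for example IA versus IC) depends on the measured binding motif as well and is not captured by this lookup.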
Group I SH2 domains, which prefer a general consensus poY where and denote a hydrophilic and a hydrophobic residue, respectively (30), can be further divided into four subgroups according to the fine selectivity displayed by each subgroup and the identities of the corresponding interface residues on the SH2 domain. Subgroup IA, which consists of members of the SRC, SYK/ZAP-70, and TEC kinase families as well as the adaptor proteins NCK1 and NCK2, selects for the common motif poY-where "-" denotes a negatively charged residue. Subgroup IB, including SH2 domains from SH2D1A, SHIP1/2, and CRK/CRKL (Fig. 2 and Table I), are related to one another by a shared propensity for a hydrophobic residue at P + 3. Selectivity at P + 1 or P + 2 for this group of SH2 domains is wider than for subgroup IA. Subgroup IC, identifiable by a strong proclivity for an Asn residue at P + 2, forms the second largest subgroup within group I with 18 members. It includes not only the GRB2/GRAP/GADS family but also the GRB7/10/14 family, the tensin family, and the Fes/Fer family. That such a diverse array of SH2 domains select for an Asn at P + 2 is unexpected because only the GRB2 family members contain a Trp at the EF1 position. This bulky Trp occupies part of the conventional P + 3-binding pocket and prevents the SH2 domain from binding to an extended peptide and selects exclusively for a β-turn structure mediated by an Asn(P + 2). Indeed a single mutation of a Thr(EF1) residue to a Trp switches the specificity of the Src SH2 domain to one that is akin to the GRB2 SH2 domain (31). The lack of a Trp(EF1) residue in the majority of SH2 domains in subgroup IC suggests that alternative mechanisms exist to facilitate the selection of an Asn residue at P + 2. To explore this possibility, we examined the structures of three representative SH2 domains in this subgroup and compared them with that of a GRB2-peptide complex. As shown in Fig. 3A, the GRB2 SH2 domain contains a bulky Trp residue at the EF1 position that sits in the P + 3-binding pocket and forces the peptide backbone to reverse and adopt a type I β-turn conformation. For the GRB7 SH2 domain, the equivalent position is occupied by a smaller Gln residue. However, the EF loop in GRB7 SH2 is longer than that in GRB2 SH2, and structural superposition suggests that this EF loop occupies the P + 3 pocket in GRB7 SH2 and therefore plays a role similar to that of the Trp(EF1) in GRB2 SH2 (Fig. 3B). The Fes and HSH2D SH2 domains, on the other hand, may exploit yet another unique mechanism to force the peptide ligand to adopt a β-turn conformation. When the FES SH2 domain structure was superimposed on the GRB2 SH2 domain, it became evident that a Thr residue in the BG loop (BG5) of the FES SH2 domain filled in the P + 3 pocket, in effect blocking the extension of the peptide chain beyond P + 2 (Fig. 3C). Although the structure of an FES SH2-peptide complex is not yet available, it is likely that this BG loop swing accounts for the selection of an Asn residue at P + 2. Similarly the HSH2D SH2 domain, which also selects exclusively for an Asn at P + 2, may use the Glu (BG5) to occupy the P + 3-binding pocket (Fig. 3D). Nevertheless it should be pointed out that not all peptides containing an Asn(P + 2) necessarily adopt a β-turn conformation. Instead a peptide ligand may assume an extended structure as long as specific interactions mediated by Asn(P + 2) are fulfilled.
Group II SH2 domains contain a hydrophobic but nonaromatic residue at βD5 and are further divided into four subgroups according to specificity (Table I, Fig. 2). Subgroup IIA loosely selects for the degenerate motif poYX. This subgroup is represented by the SH2 domains from several protein families that include VAV, phosphatidylinositol 3-kinase, PLCγ, PTPN, and SOCS. The PLCγ1-C SH2 domain was shown to engage residues from P + 1 to P + 6 of the peptide using an extended groove (32). Nevertheless no defined binding pocket was identified for residues beyond P + 3. Screening of the domain against a longer OPAL (i.e. X3poYX6) did not reveal defined selectivity for the P + 4 to P + 6 positions (data not shown). Similarly the SOCS3 SH2 domain, which requires the presence of extra residues beyond the N-terminal boundary of the SH2 domain for stability (33), bound to a peptide from the cytokine receptor gp130 using an extended hydrophobic groove. However, the presence of a Val residue at P + 4 of the gp130 peptide was believed to facilitate the extended backbone conformation of the peptide rather than to enlarge van der Waals contact with the SOCS3 SH2 domain (33).
Similar to subgroup IIA, subgroup IIB also selects for a hydrophobic residue at P + 3 within the general consensus poY(E/D/X)X. The SHC and SHB families of adaptor proteins, BLNK, and SLNK all belong to this subgroup. We assigned the BRDG1 SH2 domain to a separate subgroup within group II because it specifically selected for a hydrophobic residue at P + 4 (Table I). It is likely that this SH2 domain contains a defined binding pocket for P + 4. The last major group, group III, comprises the STAT family of SH2 domains. These SH2 domains are unique not only in the βD5 residue but also in specificity. Structural analysis demonstrated that the STAT1 SH2 domain contained a unique binding site for P + 4 (34). Nevertheless in contrast to BRDG1, binding sites identified in physiological ligands of STAT1 and STAT3, such as IL-9R, IL-10R, and gp130, are characterized by a hydrophilic residue such as Thr or Asp at P + 4 (35). Moreover because a STAT protein functions as a dimer, it is unclear whether an isolated STAT SH2 domain selects for the same type of ligands as its dimeric form (36). Thus, it seemed appropriate to assign the STAT SH2 domains to a separate group.
SMALI, a Facile Method for the Prediction of Binding Partners for Human SH2 Proteins-Having now defined phosphopeptide motifs for the majority of SH2 domains, we sought to use this information to predict potential physiological binding partners for the corresponding SH2 proteins. To this end, we quantified the OPAL binding data for the 76 SH2 domains screened in this study and constructed corresponding weighted data matrices. We then developed a Web-based program called SMALI for the prediction of SH2 domain-binding proteins. The first version of the SMALI program is accessible via the World Wide Web. Fig. 4 shows the SMALI interface for SH2 domain-binding ligand prediction and the top 10 peptides predicted to bind to the GRB2 SH2 domain. Details of the SMALI program and its usage are described elsewhere. Compared with Scansite (37), a widely used Web tool for predicting protein-protein interactions based on short peptide motifs or the corresponding scoring matrices, the current version of SMALI is dedicated to the prediction of SH2 domain-mediated interactions and contains matrices for 76 SH2 domains (versus 14 in Scansite). In addition, SMALI contains built-in filters to narrow the prediction against phosphorylated Tyr sites identified in the literature and/or to match the query and subject by function and/or cellular localization (Fig. 4). To demonstrate the utility of SMALI in recapitulating known interactions and predicting novel ones, we applied it to the prediction of binding partners for a representative member of each class of human SH2 domain except for classes IE and III, for which no OPAL matrices are currently available (supplemental Table S2). For this small group of representative SH2 domains, especially for GRB2 and VAV1, SMALI recapitulated many interactions listed in the I2D and IntAct protein-protein interaction databases (38,53).
To establish a correlation between a SMALI score and the probability of occurrence for the corresponding SH2-ligand interaction, we performed systematic binding studies on SH2D1A and BRDG1 SH2 domains with their respective peptide ligands (Table II). SH2D1A is a small protein containing an SH2 domain and a short C-terminal tail. Since the discovery of SH2D1A in the pathogenesis of the XLP syndrome, a number of SH2D1A SH2-interacting proteins, most of which belong to the SLAM family of immune receptors (39), have been identified. Many of these receptors, including CD84, SLAMf6, SLAM, 2B4, and Ly9, are ranked at the top by SMALI as predicted binding partners for SH2D1A (Table II). To test whether the SMALI score correlated with the binding energy of the corresponding peptide, we synthesized 22 peptides with SMALI scores ranging from 1.27 to 2.93 and measured their respective dissociation constants (Kd) in solution. The majority of proteins represented by this assay were immune receptors because SH2D1A functions as a modulator of immune receptor signaling (40). The Kd value was then used to calculate the dissociation free energy (ΔG) of the corresponding peptide-SH2 complex. Peptides with greater SMALI scores generally bound more tightly (i.e. smaller Kd and larger ΔG) to the SH2D1A SH2 domain. Plotting the ΔG values against the corresponding SMALI scores produced a linear correlation (R²) of 0.74, suggesting that a SMALI score correlates positively with the affinity of the peptide (Fig. 5A). Besides recapitulating known targets, SMALI prediction and the peptide binding assay identified a number of proteins that likely interact with SH2D1A. These include LILRB1/2, KIR2DL, and FcγRIIb that act as inhibitory receptors in leukocytes, natural killer cells, and B cells, respectively (Table II), suggesting that SH2D1A may play a much wider role in regulating immune function than currently known.
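The conversion from a dissociation constant to a dissociation free energy, and the linear fit against SMALI scores, can be sketched as follows. The sketch assumes the standard relation ΔG = −RT ln(Kd) with Kd expressed in molar units relative to a 1 M standard state and a temperature of 298.15 K; the temperature used for the measurements is not stated here, and the function names are hypothetical. No values from Table II are reproduced.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def dissociation_free_energy(kd_molar, temperature_k=298.15):
    """Delta G of dissociation (kcal/mol); larger means tighter binding.
    Assumes Kd in mol/L and a 1 M standard state."""
    return -R_KCAL * temperature_k * np.log(np.asarray(kd_molar, dtype=float))

def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# A 1 micromolar Kd corresponds to roughly 8.2 kcal/mol at 298.15 K.
dg_1um = float(dissociation_free_energy(1e-6))
```

Given arrays of SMALI scores and measured Kd values, r_squared(scores, dissociation_free_energy(kd_values)) would give the kind of R² statistic quoted in the text.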
Indeed SH2D1A was shown to bind physically to FcγRIIb and promote negative signaling by the latter in B cells. For BRDG1, a lymphocyte-specific protein (41), very little is known of its physiological binding partners. To test whether SMALI could identify proteins that interact with BRDG1, we performed a set of experiments similar to those done for SH2D1A. A total of 22 peptides that represented proteins ranked by SMALI at the top (e.g. NEMO, STS1, TRAF7/3, and NTAL), in the middle (e.g. ITK and linker for activation of T-cells (LAT)), or near the bottom (e.g. CD28 and platelet-derived growth factor receptor) were synthesized and assayed for binding to purified BRDG1 SH2. As shown in Table II, peptides ranked at the top generally displayed greater affinities for BRDG1 SH2 with Kd values in the submicromolar to low micromolar range. In contrast, peptides with SMALI scores lower than 1.0 exhibited weak or no binding. Plotting the corresponding dissociation free energy of the peptide-SH2 complex against the SMALI score produced a linear correlation with R² = 0.73 (Fig. 5B).
To investigate whether proteins ranked at the top by SMALI represented potential physiological targets, we performed pulldown assays using GST-BRDG1 SH2. Because BRDG1 is expressed primarily in B cells (41), we focused on proteins that have high endogenous expression levels in A20 murine B cells such as NTAL, Cbl-b, NEMO, EPS15R, and TRAF7. As shown in Fig. 5C, of the five proteins examined, four showed binding to the BRDG1 SH2 domain in a phosphorylation-dependent manner. NEMO, the only protein that failed to bind, did not appear to be tyrosine-phosphorylated as verified by an anti-phosphotyrosine blot. This result suggests that BRDG1 may regulate B cell receptor signaling through NTAL, a non-T cell activation linker (42), and that the level of BRDG1 or an associated protein could be regulated by an E3 ligase such as TRAF7 and Cbl-b through the ubiquitin-interacting protein EPS15R. These possibilities are currently under investigation.
Inspection of the peptides listed in Table II revealed that those displaying strong binding to the BRDG1 SH2 domain invariably contained a Leu or an Ile residue at P + 4. To further define the role of Leu(P + 4) in the binding of the NTAL peptide to the BRDG1 SH2 domain, we synthesized an analog of the NTAL peptide by replacing Leu(P + 4) with an Ala. The resulting peptide, L+4A, bound to the BRDG1 SH2 domain with a markedly reduced affinity (Kd = 8.61 μM) compared with the parent peptide (Kd = 0.67 μM) (Fig. 5D). In contrast, substitution of any other residue in the NTAL peptide (except for the Tyr(P)) with an Ala had only a minor effect on peptide binding (data not shown). These results are in excellent agreement with the OPAL-derived motif for the BRDG1 SH2 domain that strongly favors a bulky, aliphatic residue at P + 4.
DISCUSSION
We report herein the cloning of 120 human SH2 domains and the characterization of 76 domains by OPAL screening. To the best of our knowledge, this work represents the first effort in cloning the complete set of SH2 domains encoded by the human genome and the most comprehensive survey of their binding specificity to date. The OPAL screen results reported herein were highly reproducible due to optimization of the peptide library and the ability to print identical copies of a premade OPAL. The uniformity of the OPAL used in the screens also ensured that data generated for different SH2 domains were comparable. The validity of the OPAL approach was established by re-examining a group of 33 SH2 domains with known specificity. For the majority of them, the OPAL-derived motifs agreed predominantly with the literature. Importantly we defined the specificity for a group of 43 uncharacterized human SH2 domains. A phosphopeptide derived from NTAL containing the defined motif for the BRDG1 SH2 domain binds strongly to the latter. Ala-scanning studies further suggest that the novel specificity determined for this SH2 domain is authentic.
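To put the effect of the Leu(P + 4) to Ala substitution on an energy scale, the two dissociation constants quoted above (8.61 and 0.67, in the same units) can be converted into a change in binding free energy. The sketch below assumes the relation ΔΔG = RT ln(Kd,analog / Kd,parent) and a temperature of 298.15 K; neither the temperature nor this calculation is stated in the text, so the result is indicative only. Only the ratio of the two constants enters, so the units cancel.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def ddg_from_kd(kd_analog, kd_parent, temp_k=298.15):
    """Return RT ln(kd_analog / kd_parent) in kcal/mol; both Kd values share one unit."""
    if kd_analog <= 0 or kd_parent <= 0:
        raise ValueError("Kd values must be positive")
    return R_KCAL * temp_k * math.log(kd_analog / kd_parent)


# Kd values quoted in the text for the L+4A analog and the parent NTAL peptide.
KD_ANALOG, KD_PARENT = 8.61, 0.67
fold_change = KD_ANALOG / KD_PARENT
ddg = ddg_from_kd(KD_ANALOG, KD_PARENT)
print(f"{fold_change:.1f}-fold weaker binding, ddG of about {ddg:.2f} kcal/mol")
```

Under these assumptions the substitution corresponds to roughly a 13-fold loss in affinity, or about 1.5 kcal/mol.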
Combining the OPAL data with the literature resulted in the determination of 85 mammalian SH2 domain-binding motifs, including 83 from human (Table I). Using this broad information, we reclassified the human SH2 domains into functional groups that are related by not only binding site characteristics (i.e. βD5 identity) but also specificity. Exploiting the rich information contained in the OPAL binding profiles, we developed SMALI, an easy-to-use Web tool for the prediction of SH2 domain-mediated interactions in silico. Combining SMALI prediction with peptide binding assays carried out in solution, we identified a number of potential binding partners for BRDG1, a protein whose function is largely unknown. Moreover we identified a group of potential interacting proteins, many of which are receptors, for SH2D1A, an SH2-only adaptor protein whose mutations directly cause the XLP syndrome. Studies on other disease-associated SH2 proteins using the same approach will likely yield valuable insights into the mechanisms underlying a number of human conditions.
See Table I for a complete list of OPAL-derived SH2-binding motifs.
FIG. 2. Classification of the complement of human SH2 domains. The 120 human SH2 domains are classified into three major groups based on identity of residue βD5 (group I: Y/F; group II: C/I/L/V/M; group III: others). Within each major class, the SH2 domains are further divided into subgroups based on binding specificity from the OPAL screens and the literature. A circle denotes that an OPAL-derived motif is available for that SH2 domain. A square indicates that the specificity is based on literature. No binding motif has been described for subgroup IE SH2 domains. The JAK family SH2 domains may not be bona fide Tyr(P)-binding modules; they may instead play a structural role in interacting with receptors (51). *, JAK3 and TYK2 have a Cys residue at the βD5. This figure was generated using Molecular Evolutionary Genetics Analysis (MEGA) (52).
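The first-level grouping given in the Fig. 2 legend (group I: Y/F; group II: C/I/L/V/M; group III: others) amounts to a simple lookup on the βD5 residue. The following sketch illustrates that rule only; the function name is hypothetical, identifying which residue occupies βD5 in a given domain is assumed to have been done beforehand (for example from an alignment), and the subgroup assignment by binding specificity is not modeled.

```python
GROUP_I_RESIDUES = frozenset("YF")
GROUP_II_RESIDUES = frozenset("CILVM")
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def beta_d5_group(residue):
    """Map the one-letter code of the betaD5 residue to major group I, II, or III."""
    code = residue.upper()
    if code not in STANDARD_AMINO_ACIDS:
        raise ValueError(f"not a standard one-letter amino acid code: {residue!r}")
    if code in GROUP_I_RESIDUES:
        return "I"
    if code in GROUP_II_RESIDUES:
        return "II"
    return "III"


for code in "YFCVH":
    print(code, beta_d5_group(code))
```

Note that the legend flags JAK3 and TYK2 separately because of their Cys at βD5, so a residue-only rule such as this one should be treated as a first pass rather than the final placement of every domain.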
Comparison of OPAL-derived Motifs with the Literature - Although the specificity of 37 human SH2 domains had been characterized prior to the current study, the identified motifs generally covered only a short region, typically from P + 1 to P + 3 relative to the Tyr(P). The current OPAL screen used a longer peptide library, covering residues from P − 2 to P + 4. Although most SH2 domains primarily recognize residues from P + 1 to P + 3, residues beyond this region can play a role in fine tuning the specificity and modifying the affinity of an SH2 domain. This has been demonstrated for SH2 domains from proteins such as SH2D1A, PLCγ1, PTPN11 (SHP2), and STAT1. The identification of an Ile or a Leu favored by the BRDG1 SH2 domain at P + 4 lent further support to this notion. In addition, the N-terminal region, which had generally been overlooked in previous studies, displayed distinct selectivity for a number of SH2 domains. For instance, the BTK SH2 domain preferred a Gln or Val at P − 2, whereas the TNS4 SH2 domain selected for a Gly at P − 1. To confirm the observed N-terminal selectivity, we screened the SH2D1A, BTK, and TNS4 SH2 domains, respectively, against an N-terminal OPAL bearing the degenerate sequence GXXXpYAG. However, except for SH2D1A that displayed strong selectivity for Thr/Ser at P − 2, mirroring its KD-X6 OPAL result, the other SH2 domains failed to produce clear binding patterns. These results suggest that, in most cases, the N-terminal residues alone may not be sufficient to confer ligand binding for an SH2 domain and that their role in an SH2-ligand interaction may be more ancillary than primary. Nevertheless the precise role of the N-terminal residues for the interaction between individual SH2 domains and their cognate ligands remains an interesting topic for further investigation.
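The position numbering used above (P − 2 to P + 4 relative to the tyrosine) can be made concrete with a short sketch that extracts such windows from a protein sequence and checks a single positional preference, here the Leu/Ile at P + 4 reported for the BRDG1 SH2 domain. This is not the SMALI algorithm: the function names and the demonstration sequence are hypothetical, and sequence alone says nothing about whether a given tyrosine is actually phosphorylated.

```python
def tyr_windows(sequence, n_term=2, c_term=4):
    """Yield (index, window) for each Tyr that has a complete P-n_term..P+c_term window.

    The window is a dict mapping the offset from the Tyr (e.g. -2, 0, 4) to a residue.
    """
    seq = sequence.upper()
    for i, residue in enumerate(seq):
        if residue != "Y":
            continue
        if i - n_term < 0 or i + c_term >= len(seq):
            continue  # skip tyrosines too close to either end of the sequence
        yield i, {off: seq[i + off] for off in range(-n_term, c_term + 1)}


def has_aliphatic_p4(window):
    """True if the residue at P + 4 is Leu or Ile."""
    return window[4] in "LI"


# Hypothetical sequence for illustration; it is not taken from any real protein.
demo_sequence = "GSDEYVNVLAAYQ"
candidates = [i for i, w in tyr_windows(demo_sequence) if has_aliphatic_p4(w)]
print(candidates)
```

A full predictor would score every position in the window against a domain-specific matrix rather than test one position, but the indexing convention is the same.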
Molecular Basis of SH2-ligand Interactions - The discovery of distinct peptide motifs offers an unprecedented opportunity to gauge the specificity space of human SH2 domains. Because SH2-containing proteins play diverse roles in the cell, one would assume that the specificity of different SH2 domains differs significantly. However, our work indicates that the specificity space of the human SH2 domain is limited. That is, despite their large number, SH2 domains appear to select for only a few distinct consensus motifs. This is reflected not only in the similar sequence motifs favored by SH2 domains from the same protein family but also in the recognition of a common consensus sequence by apparently unrelated SH2 domains. The latter is exemplified by subgroup IC, in which 18 SH2 domains are related to one another by the common recognition of an Asn at P + 2. The relatively narrow specificity space of the SH2 domain may be a consequence of the few conserved modes of SH2-ligand binding (43,44).
Interactions listed in I2D intact (38,53) and/or in the Phospho.ELM (54) database are identified. See supplemental Table S2 for application of SMALI to a representative set of SH2 domains.
Variations to the canonical schemes of SH2-ligand interaction do, however, exist. Whereas most SH2 domains bind to their cognate ligands in a phosphorylation-dependent manner, some SH2 domains recognize unphosphorylated sequences or sequences that are phosphorylated on Ser or Thr instead of Tyr. The SH2D1A SH2 domain was shown to bind to an unphosphorylated peptide with low micromolar affinity and to the unphosphorylated SLAM receptor in cells (12,25). Recently the CTEN or Tensin 4 SH2 was found to interact with deleted in liver cancer 1 (DLC1) protein in a phosphorylation-independent manner (45). Although this interaction needs to be investigated more quantitatively by measuring the corresponding affinity using synthetic peptides and purified Tensin 4 SH2 domain, it reinforces the notion that certain SH2 domains possess the ability to bind Tyr-containing peptides or the corresponding unphosphorylated proteins. Binding of an SH2 domain to a target that is phosphorylated on Ser instead of Tyr was recently reported for the Spt6 (Supt6H in human) SH2 domain (46). Interestingly the same SH2 domain also demonstrated an ability to bind to the Tyr(P) OPAL (supplemental Fig. S3). Although it remains to be seen whether the Spt6 SH2 domain selects for a Ser(P)-containing motif similar to the Tyr(P)-based motif, it is likely that the ability to bind specifically to a Ser(P)- or Thr(P)-containing ligand is an idiosyncratic rather than a general phenomenon for human SH2 domains.
Specificity Control in SH2 Domain-mediated Interactions - The few common consensus motifs selected by the human SH2 domains suggest that proteins targeted by different SH2 domains may overlap. This raises the question of how the 110 human SH2 domain-containing proteins exert their specific functions in a cell.
Although the relative affinity of a particular SH2-ligand pair may play an important part in dictating the physiological relevance of that interaction, it is likely that the cellular as well as the protein context also plays a role. In general, a protein must be phosphorylated to engage a specific SH2 domain, and the state of the cell or the identity of a stimulus often dictates which kinases are activated to phosphorylate a distinct set of substrate proteins. This, in turn, will govern which particular SH2-target interactions will occur in the cell. Protein context can also play a critical role in regulating SH2-ligand interactions. In the case of SRC and many SH2-containing kinases and/or phosphatases, the presence of an SH2 domain and a Tyr(P) within the same polypeptide chain ensures that an intramolecular interaction will take place in preference to intermolecular interactions (47). The presence of multiple modular interaction domains in a single polypeptide chain could also contribute to specificity in SH2 domain signaling. In this regard, it is notable that receptor tyrosine kinase-initiated interactions are often aided by lipid interaction modules such as the pleckstrin homology and PTB domains (48).
Another mechanism by which the signaling specificity of an SH2 domain is controlled is by coupling the interaction to a tyrosine kinase. These two important classes of modular domains apparently co-evolved (9), and their specificity is closely related. As observed by Songyang and Cantley (18), the subgroup IA SH2 domains, typified by those found in the SRC family of cytosolic kinases, recognize motifs that are also preferred by the kinase domains from the same class of proteins, suggesting that the activity and interactions of the Src kinase family are governed by their own SH2 domains.

FIG. 5. SMALI prediction and validation of SH2 domain-ligand interactions.
A, correlation of SMALI score with dissociation free energy for peptides predicted to bind the SH2D1A SH2 domain. See also Table II for prediction and binding data. B, correlation of free energy with SMALI scores for peptides predicted to bind the BRDG1 SH2 domain based on data shown in Table II. C, GST or GST-BRDG1 SH2 domain pulldown of endogenous proteins from A20 B cells. Cells were treated with either PBS or sodium pervanadate to preserve phosphorylation. Phosphorylated bands were detected using the anti-Tyr(P) antibody 4G10. WCL, whole cell lysate. D, isothermal binding curves to the BRDG1 SH2 domain for an NTAL-derived phosphopeptide and a Leu(P + 4) to Ala analog. Peptide sequences and Kd values are as indicated in the diagram. p, phospho.
The tyrosine kinase family comprises ~90 proteins that have overlapping substrate specificities (49). The highly conserved structure of the kinase domain suggests that an even smaller number of conserved substrate motif classes may exist for protein-tyrosine kinases than for the SH2 domains. Because protein-tyrosine kinases generate sites for SH2 or PTB domains to bind and because Tyr(P)-binding PTB domains usually recognize an NXXpY motif, it is likely that only a small number of distinct codes exist that involve phosphotyrosine signaling.
Systematic identification of SH2 domain-binding motifs is a necessary step toward mapping the phosphotyrosine signaling space and provides a basis for analysis of Tyr(P)-dependent signaling events under normal and perturbed conditions. It is conceivable that by mapping the substrate specificity of all human protein-tyrosine kinases using the OPAL screen and/or complementary approaches and by combining this knowledge with the specificity of SH2 domains one would be able to more reliably predict SH2 domain-ligand interactions.
Recently it was shown that network context is essential for accurate systems-wide modeling of phosphorylation-mediated signaling (50). Therefore, by making our data available through the Web-based prediction program SMALI and by entering our data into the motif repository powering NetworKIN (50), we anticipate the research community will have powerful tools for future studies of cellular signaling processes. This systems-wide modeling of cellular signaling, which is based, in part, on the specific linear motifs determined in this study, will provide the foundation for accurate disease modeling and for unraveling the molecular networks underlying many important biological processes.