Molecular & Cellular Proteomics 7:181-192, 2008.
| ABSTRACT |
|---|
Recent methodological developments in MS-based proteomics have made it possible to identify hundreds to thousands of protein phosphorylation sites in a single project (6–14). Extensive mapping of the phosphoproteome is an important step toward analyzing the regulatory components of the cell. Because the majority of newly identified phosphopeptides are uncharacterized with respect to signaling context, there is now a unique opportunity to mine the phosphoproteome for novel phosphorylation motifs. Methods have been developed that successfully mine for overrepresented motifs in large protein data sets in general (15–17) and, more recently, in phosphoproteomics data sets (18). However, these methods do not partition the data set into smaller subsets of high sequence similarity prior to motif extraction. Because sequence patches flanking the motif also govern phosphorylation-dependent recognition (19), extracting motifs from an unpartitioned data set risks producing false positive motifs from functionally unrelated peptides. Furthermore, the above-mentioned methods are purely in silico and do not combine prediction with experimental validation.
To overcome such limitations in Tyr(P) motif discovery and classification, one may partition the data set into smaller subsets prior to motif extraction, e.g. by sequence similarity with known kinase or binding substrates. This minimizes the risk of retrieving false positive motifs because overrepresented motifs are extracted from peptides that are closely related in sequence and function. Besides Tyr(P) recognition motifs for kinases and interaction domains, there may also be Tyr(P) motifs that mediate processes other than binding, such as enzyme activation and nucleic acid binding. It is therefore essential to validate the extracted motifs by both experimental and bioinformatic means to obtain a functional classification.
With this in mind, we developed a motif extraction and validation methodology and classified Tyr(P) motifs on a proteome level. Operating in sequence space, we mapped the MS-identified Tyr(P) peptides onto a backbone of ligands already known to be involved in Tyr(P)-dependent interactions. Using experimentally verified Tyr(P) ligands of the PTB and SH2 domains both as a clustering backbone and as a control for the partitioning, we split a literature-extracted data set of mammalian Tyr(P) peptides into 20 clusters. The clustering was meaningful in that the controls partitioned correctly into separate clusters.
From the 20 clusters we extracted both known and unknown phosphorylation motifs. Peptides matching these motifs were synthesized and assayed for phosphorylation-specific interaction partners using a peptide pulldown assay based on quantitative proteomics (20). In contrast to the oriented peptide library approach, which uses artificial degenerate peptides, we used naturally occurring peptides as baits to pull down binding partners from cell lysate. Moreover, because the interaction partners compete for binding, mimicking the in vivo situation, the risk of finding kinetically unfavorable interaction motifs is minimized. Finally, this technique can potentially identify new types of domains with modification-specific binding capability.
Using the pulldown assay we identified the expected binding partners for numerous known C-terminally directed SH2 domain motifs. We also found 15 new phosphorylation-dependent interactions mediated by phosphosites not previously shown to direct interaction. Unexpectedly, we identified a new N-terminal hydrophobic motif, (L/V/I)(L/V/I)pY (where pY is phosphotyrosine), for the SH2 domain-containing inositol phosphatase SHIP2, and confirmed its specificity by mutational analysis. The N-terminal direction of this motif contrasts with previous observations showing that binding of prototypical SH2 domains is directed by C-terminal recognition (21). Until now, the only other known SH2 domain binding motif that is partly directed by N-terminal recognition has been the immunoreceptor tyrosine-based inhibition motif (ITIM), (I/L/V)XpYXX(I/L/V) (22, 23).
On a proteome level, we analyzed which Gene Ontology (GO) categories were overrepresented among proteins matching the extracted motifs. We found that motifs that mediate interaction in the pulldown assay are typically found in proteins involved in signal transduction, whereas non-binding motifs are found in enzymes and in ion- and nucleic acid-binding proteins. We therefore estimate that one-third of the in vivo Tyr(P) sites are not directly involved in interaction via domains such as SH2 and PTB but are rather sites that could alter the catalytic activity of enzymes or modulate the DNA binding affinity of, e.g., transcription factors.
| EXPERIMENTAL PROCEDURES |
|---|
Generation of Weight Matrices—
Thirteen-residue windows (13-mers) of the 162 phosphopeptide ligands of the 11 PTB and SH2 domains (see Table I) were used to create 11 weight matrices, one per domain, using the weight matrix mode of EasyGibbs 1.0 (27). Default settings were used, except that the motif length was fixed at 13 residues centered on the Tyr(P) residue. All phosphopeptides in the MS-based data set (481) and the positive control data set (162) were then scored with each of the 11 weight matrices, so that each phosphopeptide could be represented as a vector of 11 weight matrix scores.
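The scoring step can be illustrated with a deliberately simplified log-odds weight matrix. This is a sketch under stated assumptions, not EasyGibbs: it assumes a uniform background of 1/20 and a flat pseudocount, with no sequence weighting or substitution-matrix pseudocounts, and the function names (`build_pssm`, `score`, `score_vector`) are hypothetical. It shows how a set of aligned 13-mer ligands becomes a matrix and how each phosphopeptide becomes a vector with one score per matrix.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(peptides, pseudocount=1.0):
    """Log2-odds matrix from equal-length aligned peptides against a uniform 1/20 background."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        # Smoothed frequency of each residue at this position, as log-odds vs. background.
        matrix.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                       for aa in AMINO_ACIDS})
    return matrix

def score(matrix, peptide):
    """Sum of per-position log-odds; residues outside the 20-letter alphabet contribute 0."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must equal matrix length")
    return sum(column.get(aa, 0.0) for column, aa in zip(matrix, peptide))

def score_vector(matrices, peptide):
    """One score per weight matrix, e.g. 11 scores for 11 domain matrices."""
    return [score(m, peptide) for m in matrices]
```

With 11 matrices, `score_vector(matrices, window)` yields the 11-dimensional representation that is then clustered.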
The choice of an appropriate clustering algorithm is not straightforward because no algorithm is universally superior (29, 30). Rather, the best choice depends on the data set and in particular on what constitutes a good distance measure for it. Another relevant concern is the desired outcome and whether a hierarchical or a partitional result is preferable. Many sophisticated methods can automatically determine the number of "natural clusters" in the data, such as the popular density-based clustering algorithms, which can describe very complex non-circular relations in the data (31). It is, however, not clear whether the ability to recognize non-circular structures is beneficial in this case. Proteins that share the same features are likely to be related and will form a compact, circular group in feature space, whereas an elongated cluster will contain proteins that share only some features but not others, and the biological implications thereof can be quite diverse. Besides being computationally efficient and easy to implement, the PAM (partitioning around medoids) algorithm was selected because it is robust and because its use of a Euclidean distance measure makes the result easy to interpret. The primary weakness of PAM is that the number of clusters must be chosen in advance, which in this case is addressed by the hypergeometric test (see "Statistics").
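A textbook version of PAM, with a greedy BUILD phase followed by SWAP steps that exchange a medoid for a non-medoid whenever this lowers the total distance of points to their nearest medoid, is sketched below on score vectors with Euclidean distance. It is a minimal, unoptimized illustration under these assumptions, not the implementation used in this study; the function name `pam` is hypothetical, and k would be 20 for the clustering reported here.

```python
import math

def pam(points, k, max_iter=100):
    """Partitioning around medoids: greedy BUILD, then best-improvement SWAP steps."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    dist = [[math.dist(a, b) for b in points] for a in points]

    def cost(medoids):
        # Total distance from every point to its nearest medoid.
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    # BUILD: add medoids one at a time, each time choosing the point that lowers the cost most.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))

    # SWAP: apply the single best medoid/non-medoid exchange until none lowers the cost.
    current = cost(medoids)
    for _ in range(max_iter):
        best_cost, best = current, None
        for j in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:j] + [c] + medoids[j + 1:]
                trial_cost = cost(trial)
                if trial_cost < best_cost - 1e-12:
                    best_cost, best = trial_cost, trial
        if best is None:
            break
        medoids, current = best, best_cost

    # Assign each point to the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels
```

Applied to the 11-dimensional score vectors, a call such as `pam(vectors, 20)` would return 20 medoid indices and a cluster label per peptide. The cost is recomputed from scratch for every candidate swap, which keeps the code short but scales poorly with data set size.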
Dendrogram and Sequence Logo Plots—
Weight matrices of the peptides in each of the 20 clusters were constructed using positional weighting of the three residues flanking the central Tyr(P) residue (27) and used to calculate distance matrices as described previously (32). The distance matrices were used as input to the program neighbor from PHYLIP (Phylogeny Inference Package) version 3.5. To estimate the significance of the neighbor-joining clustering, we computed a consensus tree from 1000 bootstrap replicates as described earlier (32).
The frequencies of amino acids at each position in each cluster were calculated and visualized as sequence logo plots (33). Each position in the aligned sequences corresponds to a column in the logo plot. The height of a column represents the degree of conservation at that position, whereas the height of each letter within it is proportional to the relative frequency of that amino acid residue. For the 20-amino acid alphabet, the maximal column height is log₂ 20 ≈ 4.32 bits.
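The column and letter heights described above follow the usual information-content definition of sequence logos: the column height is log₂ 20 minus the Shannon entropy of the column, and each letter's height is its relative frequency times the column height. The sketch below omits the small-sample correction applied by some logo programs, and the function names are hypothetical.

```python
import math
from collections import Counter

def column_information(residues, alphabet_size=20):
    """Column height in bits: log2(alphabet_size) minus the Shannon entropy of the column."""
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return math.log2(alphabet_size) - entropy

def letter_heights(residues, alphabet_size=20):
    """Height of each letter: its relative frequency times the column height."""
    n = len(residues)
    info = column_information(residues, alphabet_size)
    return {aa: (c / n) * info for aa, c in Counter(residues).items()}
```

A fully conserved column reaches the 4.32-bit maximum, while a column using all 20 residues equally has height 0.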
Extraction of Motifs and Selection of Peptides to Synthesize—
Phosphomotifs in each of the 20 clusters were identified using the publicly available TEIRESIAS pattern discovery tool from IBM Bioinformatics (17). Parameters were set so that the extracted motifs fell within a window of 13 residues centered on the phosphoresidue. The minimal number of literals in a motif was set to 4, and the amino acids were grouped according to their chemical nature (Ala/Gly, Asp/Glu, Phe/Tyr, Lys/Arg, Ile/Leu/Met/Val, Gln/Asn, Ser/Thr, Pro, Trp, His, and Cys) (17). For each of the 20 clusters the most abundant motif was selected, and one peptide matching the motif was then chosen from the respective cluster. Because multiple peptides in each cluster matched the extracted motif, peptides from mouse and peptides not previously known to be involved in phosphorylation-dependent interaction were preferred. In the few cases (three) where mouse sequences could not be obtained, human peptides with high homology to the mouse sequence were chosen.
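Once a motif has been extracted, checking whether a 13-residue window matches it, with the motif's pY aligned to the central residue, is straightforward. The sketch below assumes the motif notation used in this article (X for any residue, (S/T) for alternatives, pY for the phosphotyrosine) and one-letter sequences with the Tyr at index 6; it does not reproduce TEIRESIAS or its residue grouping, and the function names are hypothetical.

```python
import re

TOKEN = re.compile(r"pY|\([A-Z](?:/[A-Z])*\)|[A-Z]")

def parse_motif(motif):
    """Turn e.g. 'NPXpYX(S/T)' into per-position allowed-residue sets (None = any) plus the pY index."""
    tokens = TOKEN.findall(motif)
    if "".join(tokens) != motif:
        raise ValueError(f"cannot parse motif {motif!r}")
    positions, anchor = [], None
    for i, tok in enumerate(tokens):
        if tok == "pY":
            anchor = i
            positions.append({"Y"})
        elif tok == "X":
            positions.append(None)
        elif tok.startswith("("):
            positions.append(set(tok[1:-1].split("/")))
        else:
            positions.append({tok})
    if anchor is None:
        raise ValueError("motif must contain pY")
    return positions, anchor

def matches_centered(motif, window, center=6):
    """True if the motif matches the window with its pY aligned to window[center]."""
    positions, anchor = parse_motif(motif)
    start = center - anchor
    if start < 0 or start + len(positions) > len(window):
        return False
    return all(allowed is None or window[start + i] in allowed
               for i, allowed in enumerate(positions))
```

For example, `matches_centered("NPXpYX(S/T)", "AAANPEYLSAAAA")` is true, whereas replacing the Ser at +2 with Ala makes it false.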
Gene Ontology Analysis—
Gene Ontology categories were obtained from the Gene Ontology Annotation mouse database, version 29.0. The extracted motifs were matched to proteins in the International Protein Index (IPI) mouse proteome, version 3.20. Using a hypergeometric test (see "Statistics") with the total proteome as background, we determined the 10 most overrepresented GO terms among the retrieved proteins. The hits were inspected manually, and a consensus GO term was assessed for each motif. For the hypergeometric test, each annotated GO category was taken to include all of its ancestral terms to avoid problems with diverging levels of annotation.
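Including all ancestral terms of each annotation amounts to taking the closure of the annotated terms over the GO parent relations before counting. A minimal sketch, assuming a map from each term to its direct parents; the hierarchy shown is a simplified toy example, not the actual GO graph, and the function names are hypothetical.

```python
from collections import Counter

def with_ancestors(terms, parents):
    """Return the given GO terms together with all of their ancestors."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents.get(term, ()))
    return closed

def propagated_counts(annotations, parents):
    """Count proteins per GO term after propagating each annotation to its ancestors."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(with_ancestors(terms, parents))  # each protein counted once per term
    return counts

# Toy hierarchy for illustration only; not the real GO graph.
TOY_PARENTS = {
    "zinc ion binding": ["metal ion binding"],
    "calcium ion binding": ["metal ion binding"],
    "metal ion binding": ["ion binding"],
    "ion binding": ["binding"],
}
```

With propagation, a protein annotated only at a specific level (e.g. zinc ion binding) still counts toward the broader parent terms, so proteins annotated at different depths become comparable.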
Statistics—
To determine whether the positive controls were significantly overrepresented in specific clusters compared with the whole data set, a hypergeometric test (sampling without replacement) was performed (34). The hypergeometric test assesses how likely it is that a sample drawn without replacement from a background of true and false examples contains a given number of true examples by chance. The probability p of observing the given or a more extreme outcome by pure coincidence is given by the hypergeometric distribution,
$$p = \sum_{i=x}^{\min(K,M)} \frac{\binom{M}{i}\binom{N-M}{K-i}}{\binom{N}{K}}$$
where N is the total number of peptides, M is the number of peptides in the given set, K is the number of peptides in a particular cluster, and x is the number of the K peptides that belong to M. A Bonferroni correction was applied to correct for multiple comparisons. In the GO analysis, we performed the test once for each GO category present in the data and evaluated the probability of sampling the set of retrieved proteins from the background of the total proteome by chance, considering a protein true or false depending on whether it had been assigned the category in question. The result was one p value per GO category, describing the degree of overrepresentation of that annotation in the retrieved set against the background of the entire proteome.
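The upper-tail hypergeometric probability and the Bonferroni adjustment can be computed exactly with integer binomial coefficients. This sketch assumes the symbols defined above (N, M, K, x) and caps adjusted p values at 1; the function names are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N, M, K, x):
    """P(X >= x): chance that at least x of K items drawn without replacement from N are among the M 'true' items."""
    if not (0 <= M <= N and 0 <= K <= N):
        raise ValueError("require 0 <= M <= N and 0 <= K <= N")
    # math.comb returns 0 when K - i exceeds N - M, so impossible terms drop out.
    hits = sum(comb(M, i) * comb(N - M, K - i) for i in range(max(x, 0), min(K, M) + 1))
    return hits / comb(N, K)

def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For the GO analysis, N would be the number of proteins in the proteome, M the number annotated with the category, K the number of retrieved proteins, and x the number of retrieved proteins carrying the category; the correction is then applied across all tested categories.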
Cell Culture—
Mouse C2C12 muscle cells were grown in arginine- and lysine-deficient Dulbecco's modified Eagle's medium with 10% dialyzed fetal bovine serum for at least five passages and then switched to 2% dialyzed fetal bovine serum for 8 days to induce differentiation. In accordance with the stable isotope labeling by amino acids in cell culture (SILAC) procedure, one cell population was supplemented with L-arginine (Sigma) and L-lysine of normal isotopic abundance, and the other was supplemented with >99% isotopic abundance [13C6,15N4]arginine and [13C6,15N2]lysine (Aldrich) as described previously (35). This yielded full labeling of all proteins.
Peptide Synthesis and Pulldowns—
Desthiobiotinylated peptides were synthesized on a solid-phase peptide synthesizer using amide resin (Intavis, Koeln, Germany). All peptides were designed as 15-mers with the Tyr(P) residue placed centrally at position 7 or 8, except for one peptide from cluster 1 (see Table I) that was 20 amino acids long. The peptides were synthesized with an N-terminal desthiobiotin and an SG dipeptide linker. Each peptide was synthesized as a pair, in a phosphorylated and a non-phosphorylated ("control") form, and the identity and purity of the synthesized peptides were confirmed by mass spectrometric analysis. Dynabeads MyOne Streptavidin were saturated with the desthiobiotinylated peptides prior to incubation with cell lysates. Cells were lysed as described previously (36), and equal amounts of protein (on average 1.5 mg of cell lysate) were incubated overnight at 4 °C with the respective immobilized peptides (1.5 nmol per pulldown). After three rounds of washing with lysis buffer, the beads of each pulldown pair (phosphorylated form and control) were combined (20), and bound proteins were eluted with 16 mM biotin. Eluted proteins were precipitated and subsequently digested with trypsin for LC-MS/MS analysis.
LC/MS/MS, Database Searching, and Quantitation—
After reduction with 1 µg of DTT and alkylation with 5 µg of iodoacetamide, the eluted proteins were digested in solution with 1 µg of endoproteinase Lys-C (Wako) for 3 h at room temperature. The samples were then diluted with 4 volumes of 50 mM NH4HCO3 and further digested with 1 µg of trypsin (Promega) overnight at room temperature. Peptide mixtures were desalted on stop and go extraction tips (37) and loaded onto reversed phase analytical columns for liquid chromatography (38). Peptides were eluted from the analytical column with a multistep linear gradient from 2 to 40% acetonitrile over 100 min and sprayed directly into the orifice of an LTQ-FT or an LTQ-Orbitrap mass spectrometer (Thermo Electron, Bremen, Germany). Proteins were identified by MS/MS using information-dependent acquisition of fragmentation spectra of multiply charged peptides. Peak lists were generated using in-house software (raw2msm version 1.2) with default settings and searched against the mouse IPI database using the Mascot algorithm (version 2.1.0) (39). The MS/MS ion search parameters were as follows: enzyme specificity for trypsin, trypsin/Pro + AspPro; maximum number of missed cleavages, 2; fixed modification, carbamidomethylcysteine; variable modifications, oxidation (Met), N-acetyl (protein), deamidation (NQ), [13C6,15N4]arginine, [13C6,15N2]lysine, and pyro (N-terminal QC); precursor ion mass tolerance, 5 ppm; fragment ion mass tolerance, 0.5 Da; database version, IPI_mouse mouse_v314 with 68,655 entries. Common contaminants such as human keratins were added manually to the database. No species restriction was applied. MSQuant (SourceForge) was used for quantitation and spectra validation; it quantifies peptides from the peak areas of extracted ion chromatograms.
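The 5 ppm precursor tolerance is a relative criterion, so the allowed absolute deviation grows with m/z. A minimal sketch of the check, assuming the usual definition of mass error in parts per million; the function names are hypothetical and do not describe Mascot's internals.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def within_ppm(observed, theoretical, tol_ppm=5.0):
    """True if the observed value lies within +/- tol_ppm of the theoretical value."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm
```

At m/z 1000, for instance, a 5 ppm window corresponds to ±0.005, whereas the 0.5 Da fragment tolerance is absolute.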
Determination of Significant Binding Partners—
Intensity ratios of the labeled to unlabeled forms of each validated tryptic peptide, and the associated average ratio for the whole protein, were obtained with MSQuant. We used crossover experiments in which specific interaction partners were required to have inverse ratios compared with the normal experiment (20). A significant binder was defined as a protein with a log ratio at least three standard deviations above the log average ratio of all proteins identified in a pulldown experiment (>3 log(σ) + log(µ), where σ is the standard deviation and µ the average ratio). Furthermore, the binder had to be confirmed in at least two pulldown experiments (normal and crossover), and we report only specific binders of the phosphopeptide, not of the non-phosphorylated peptides. Finally, at least one peptide had to have a Mascot score above 30, corresponding to p < 0.05. In the 64 pulldowns performed we identified a few sequence-unspecific binders with high affinity for either the phosphorylated or the non-phosphorylated peptides (staphylococcal nuclease domain-containing protein, eukaryotic translation initiation factor, peptidylprolyl isomerase B, RNA-binding protein SiahBP, and RIKEN cDNA 2410104I19). These proteins were excluded as false positive binders because they bind in a sequence-unspecific manner and occasionally bind most strongly to the non-phosphorylated peptide. Across all pulldown experiments, an average of 140 ± 41 proteins was quantified, with an average ratio of 1.217 ± 0.529.
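The significance criterion can be sketched as follows, assuming natural-log ratios, a sample standard deviation computed over all quantified proteins in a pulldown, and ratios oriented so that phosphopeptide-specific binders have high ratios in the normal experiment and low ratios in the crossover. The Mascot score threshold and the exclusion of known sequence-unspecific binders are not modeled, and the function names are hypothetical.

```python
import math
import statistics

def outliers(ratios, n_sd=3.0):
    """Proteins whose log ratio exceeds mean + n_sd * SD of all log ratios in one pulldown."""
    logs = {name: math.log(r) for name, r in ratios.items()}
    values = list(logs.values())
    threshold = statistics.mean(values) + n_sd * statistics.stdev(values)
    return {name for name, value in logs.items() if value > threshold}

def significant_binders(forward, crossover, n_sd=3.0):
    """Proteins passing in both the normal and the label-swapped (crossover) pulldown.

    In the crossover the phosphopeptide pulldown carries the other label, so its
    ratios are inverted before applying the same threshold.
    """
    inverted = {name: 1.0 / r for name, r in crossover.items()}
    return outliers(forward, n_sd) & outliers(inverted, n_sd)
```

Because the threshold is applied only to high ratios, proteins that prefer the non-phosphorylated control are not reported, in line with reporting only phosphopeptide-specific binders.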
| RESULTS |
|---|
The model fit well in that the positive controls were distributed such that each set was statistically overrepresented in a separate cluster (p < 0.05) (see Table I). For example, all 10 ligands of the PTB domains were grouped in cluster 1 (p < 7.34e–12). Only the 31 ligands of the Grb2 SH2 domain were split into two significant groups (clusters 6 and 7). Eight of the 20 clusters contained no significant overrepresentation of the known ligands used.
Motif Extraction—
From each of the 20 clusters the most conserved motif was extracted with the TEIRESIAS pattern discovery tool from IBM Bioinformatics (Fig. 1D) (17) as described under "Experimental Procedures." The 20 identified motifs are presented in Table I together with the number of matches to peptides in the particular cluster and in the total data set. For example, a unique motif, NPXpYX(S/T), was extracted from cluster 1: all seven peptides that matched this motif were in this cluster. To compare the 20 identified motifs with already characterized interaction motifs, we matched each motif to a comprehensive library of Tyr(P) interaction motifs in the Human Protein Reference Database (HPRD) (41). Continuing the example, the motif extracted from cluster 1 matched the NPXpY motif described for the SHC and IRS-1 PTB domains (1), although our extended motif contains a Ser or Thr residue at position +2 from the Tyr(P) residue. Given that the ligands of the PTB domain (the positive controls) were grouped in cluster 1, it is not surprising that we extracted the NPXpYX(S/T) motif from this cluster; however, only three of the seven matching peptides in cluster 1 were among these PTB ligands (data not shown). This was a general trend, and it shows the ability of the clustering method to gather previously uncharacterized peptides with high sequence similarity to known ligands of the different Tyr(P) binding domains. In total, 16 of the 20 identified motifs could be matched to the motif library, showing an overall consistency between the positive controls and the extracted and matched motifs. In the remaining four clusters, new Tyr(P) motifs were identified.
Gene Ontology Analysis—
We used a GO analysis of the 20 extracted motifs on a proteome level to determine in which types of proteins the motifs are generally present (Fig. 1E). We retrieved all proteins in the mouse proteome that matched the motifs and, using a hypergeometric test with the total proteome as background, identified the 10 most overrepresented GO terms among the retrieved proteins (see "Experimental Procedures" for details). Returning to the example, the NPXpYX(S/T) motif from cluster 1 was overrepresented in proteins annotated with terms such as receptor activity and intrinsic to membrane, and the consensus parent GO term was assessed to be transmembrane receptor activity (see supplemental Table 2). Analyzing the motif on a proteome level thus indicates that proteins containing it are involved in early signal transduction. This is consistent with its being a PTB domain motif, since the PTB domain is found in proteins that function as molecular scaffolds and adaptors in signaling pathways (1).
Tyr(P)-specific Interaction Partners—
To experimentally verify the 20 extracted motifs, we used a phosphorylation-specific peptide-protein interaction screen (Fig. 1F) (20, 36). This assay is based on differential labeling of proteins by stable isotope labeling by amino acids in cell culture (SILAC), which makes it possible to distinguish specific binders from background binders by their isotope ratios determined by quantitative mass spectrometry (35, 42). The peptides are synthesized in a phosphorylated and a non-phosphorylated form and used as baits to pull down competing binding partners from cell lysate, thus mimicking the in vivo binding situation.
We synthesized peptide pairs matching the 20 extracted motifs. Where multiple peptides matched, we chose peptides with Tyr(P) sites not previously known to mediate interaction. This experimental approach allowed us to test our clustering and motif extraction method, to investigate the relevance of known motifs in a near in vivo situation (i.e. endogenous proteins competing for binding), and potentially to discover binding partners for novel motifs. Again using cluster 1 as an example, we synthesized a peptide pair from engulfment and cell motility protein 2 in which Tyr-48 was either phosphorylated or non-phosphorylated. This uncharacterized phosphosite was identified in a large scale phosphoproteomics study (12) and has not previously been shown to direct Tyr(P)-dependent interaction. Using this peptide pair as bait, we retrieved one specific binder with a ratio more than three standard deviations above the log mean of a total of 162 background binders. This protein, SHC-transforming protein 1 (SHC), which contains both a PTB and an SH2 domain, had a total Mascot score of 1654 with 15 identified peptides, of which nine were quantifiable (see supplemental Table 1). In principle, either the SH2 or the PTB domain could bind the bait phosphopeptide; however, because the peptide matches the consensus NPXpY motif known to direct PTB domain binding, SHC most likely binds to the peptide through its PTB domain.
Assaying the Tyr(P) Sequence Space for Interaction Partners—
The phosphospecific binding partners identified for the representative peptides from each cluster are listed in Table I. Apart from the aforementioned SHC protein, these proteins all contain SH2 domains, making it very likely that this domain governed the phosphospecific interaction. The majority of the peptides (13 of 20) retrieved one or more interaction partners. Because some proteins were identified several times, only seven of these proteins were unique. This is not surprising because some clusters were close to each other in sequence space, resulting in the extraction of similar motifs and ultimately in the retrieval of the same interaction partners.
To obtain an overview of the results of the pulldown experiments, the GO analysis, and the sequence similarity between clusters, we generated weight matrices of the peptides in the individual clusters and constructed a dendrogram based on an alignment of these matrices (32). The tree is shown in Fig. 2 together with sequence logo plots in which the height of each position represents its degree of conservation (33). The logo plots illustrate a successful partitioning: each cluster has a distinct pattern in which particular amino acids are highly abundant at specific positions flanking the central Tyr(P) residue. The tree is colored according to the interaction partners retrieved by peptides matching the motifs of the different clusters. For example, clusters 8 and 11 are close in sequence space, with an overrepresentation of hydrophobic residues, especially methionine, at position +3 from the central tyrosine residue. Rather than being distinct clusters, these are more likely to be subsets of the same motif. Accordingly, the motifs extracted from these two clusters, pY(I/L/M/V)XMXP and pY(I/L/M/V)PMXP, both matched the consensus pYXXM in the motif library, and the two peptide pairs synthesized from these clusters both retrieved the PI3K-p85 protein. In the same manner, the majority of the expected interaction partners were identified using peptides matching the well characterized C-terminally directed Tyr(P) motifs, such as pYXN, pYXXP, and pY(I/V/L)X(I/V/L), which retrieved Grb2, RasGAP, and SHP2, respectively. This illustrates the clear consistency between the sequence similarity of the clusters, the conserved residues in the motifs, and the interaction partners identified.
Non-binding Tyr(P) Motifs—
We observed that motifs that do not mediate interaction in the pulldown assay are typically found in proteins involved in processes other than signal transduction (see Fig. 2). Of particular interest, we extracted a highly conserved H(S/T)GXKPpYXCXXXCG motif from a number of closely related peptides from Cys2-His2 zinc finger (zf-C2H2) proteins concentrated in cluster 19, and the peptide matching this motif did not pull down any phosphorylation-specific interaction partner. The phosphosite at the first tyrosine residue of the zf-C2H2 domain was also described recently in a study that mined for novel motifs in the phosphoproteome (18), although the motif identified in that work (EXXpY) differed from our top scoring motif (H(S/T)GXKPpYXCXXXCG), which does not contain an acidic residue at position –3. However, our second best motif in cluster 19 (HXGEXXpY) closely resembles the one reported by Schwartz and Gygi (18).
The H(S/T)GXKPYXCXXXCG motif is extremely specific for proteins containing the zf-C2H2 domain: of 33,758 proteins, 656 matched, all of which had the GO term nucleic acid binding (GO:0003676) (see supplemental Table 2), and 647 of the 656 matches had the term zinc ion binding (GO:0008270) (p < 1e–100).
A role for phosphorylation of zf-C2H2 domains in inhibition of transcription has recently been suggested (43, 44), presumably because the negatively charged phosphomoiety reduces DNA affinity (45). Our results indirectly support this: because we did not retrieve any interaction partner for the synthesized phosphopeptide matching the zinc finger motif, it is unlikely that this motif directs protein-protein interaction; rather, phosphorylation of this motif may modulate nucleic acid binding.
Similarly, the novel motifs from clusters 16 and 20 could mediate mechanisms other than protein-protein interaction, for example enzyme activation (e.g. via a kinase motif), nucleic acid binding, or protein folding. Supporting this, proteins containing these motifs were overrepresented for the GO terms ion binding and kinase activity, respectively. Likewise, the motifs from the clusters in which we did not find the expected partners (clusters 3, 9, 10, and 18) are, with the exception of motif 3, overrepresented in proteins involved in enzymatic processes and ion binding (see Fig. 2 or supplemental Table 2).
This indicates that motifs not mediating protein binding could govern processes such as phosphorylation-dependent enzyme activation and nucleic acid binding. Taken together with the results of the pulldown experiments, in which 13 of 20 motifs mediated interaction, we estimate that one-third of the Tyr(P) motifs in the proteome mediate processes other than interaction through prototypic SH2 and PTB domains.
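The one-third figure is consistent with a simple reading of the pulldown result, assuming the estimate is taken as the fraction of tested motifs that retrieved no binding partner:

$$1 - \frac{13}{20} = \frac{7}{20} = 0.35 \approx \frac{1}{3}$$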
Identification of a New N-terminal Hydrophobic Tyr(P) Binding Motif for SHIP2—
From cluster 4 we extracted a new N-terminal hydrophobic motif, (D/E)XXX(I/L/V)(I/L/V)pY. We synthesized a peptide pair from the neural Wiskott-Aldrich syndrome protein (N-WASP) matching the motif, in which Tyr-253 was either phosphorylated or non-phosphorylated. Phospho-Tyr-253 was identified in a large scale phosphoproteomics experiment (12), and phosphorylation of Tyr-253 has previously been shown to modulate the localization of N-WASP from the nucleus to the cytoplasm, thereby possibly stimulating cell migration (46). Using the N-WASP peptide pair in our quantitative proteomics experiment, we identified the SH2 domain-bearing inositol 5-phosphatase 2 (SHIP2) as a significant binding partner with 12 peptides (ratio, 9.5 ± 1.5) among 131 background proteins (ratio, 1.2 ± 0.9) (see supplemental Table 1). To address the specificity of this N-terminal hydrophobic phosphomotif, we repeated the experiment with the two hydrophobic residues mutated to alanines (AApY) and paired this phosphopeptide with the wild type N-WASP phosphopeptide (VIpY) (Fig. 3). SHIP2 bound specifically to the wild type phosphopeptide (ratio, 10.2 ± 3) but not to the mutated phosphopeptide, confirming the specificity of the hydrophobic motif. In a third interaction experiment we obtained similar results using a phosphopeptide in which all three residues of the extracted motif were mutated (see supplemental Table 1). This supports the notion that recognition is based on the two hydrophobic amino acids adjacent to the Tyr(P), and we could therefore confine the motif to (I/L/V)(I/L/V)pY.
As in previous studies using this phosphospecific pulldown method (20, 36), in this work we only retrieved specific interaction partners containing either an SH2 or a PTB domain. Because SHIP2 contains an SH2 domain, it is highly unlikely that the interaction between SHIP2 and the phosphopeptides is mediated by a domain other than the SH2 domain.
Because the observed interaction appears to be one of the first examples in which N-terminal residues of the ligand guide SH2 domain binding, we wanted to exclude the possibility that the peptide was bound in a reverse orientation. We synthesized the N-WASP peptide pair with the reversed sequence (GTKEIFDpYIVKSTER) and found that SHIP2 was not retrieved using this reversed peptide pair as bait (see supplemental Table 1). Thus, the binding is direction-specific: only the two hydrophobic residues N-terminal, and not those C-terminal, of the Tyr(P) direct SH2 domain-mediated SHIP2 binding.
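The logic of the reversal control can be checked directly on the sequence given above. Assuming the listed reversed bait is simply the wild type 15-mer read backward, with pY kept as a single residue, reversing it recovers a forward sequence with VI immediately N-terminal of pY, whereas in the reversed bait those positions hold F and D and the hydrophobic pair lies C-terminal of pY. The helper names are hypothetical.

```python
import re

HYDROPHOBIC = {"I", "L", "V"}

def tokenize(peptide):
    """Split a peptide into residues, keeping 'pY' (phosphotyrosine) as one token."""
    toks = re.findall(r"pY|[A-Z]", peptide)
    if "".join(toks) != peptide:
        raise ValueError(f"unexpected characters in {peptide!r}")
    return toks

def reverse_peptide(peptide):
    """Read the peptide backward, residue by residue."""
    return "".join(reversed(tokenize(peptide)))

def has_n_terminal_pair(peptide):
    """True if the two residues immediately N-terminal of pY are both I, L, or V."""
    toks = tokenize(peptide)
    i = toks.index("pY")
    return i >= 2 and toks[i - 2] in HYDROPHOBIC and toks[i - 1] in HYDROPHOBIC
```

Under this assumption, `reverse_peptide("GTKEIFDpYIVKSTER")` gives `"RETSKVIpYDFIEKTG"`, which carries the (I/L/V)(I/L/V)pY pair, while the reversed bait does not.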
To investigate the significance of the motif on a proteome level, we used the GO analysis described above and found that proteins containing the motif are indeed involved in signal transduction. The (I/L/V)(I/L/V)Y motif is not particularly specific in itself: it matches 3618 proteins, about 10% of all proteins in the mouse proteome. However, 2495 of these 3618 proteins can be traced back to the term cell surface receptor-linked signal transduction (GO:0007166) (p < 1e–100). In summary, the N-terminal hydrophobic motif we characterized mediates SHIP2 interaction and is generally overrepresented in proteins involved in signal transduction.
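The proteome-level count is a plain sequence scan. A minimal sketch, assuming protein sequences held in a dictionary keyed by protein name and counting each protein once regardless of how many matches it contains; phosphorylation status is not encoded in the sequence, and the names are hypothetical.

```python
import re

# Sequence-level pattern for (I/L/V)(I/L/V)Y.
MOTIF = re.compile(r"[ILV][ILV]Y")

def proteins_with_motif(proteome):
    """Names of proteins containing at least one match to MOTIF."""
    return [name for name, seq in proteome.items() if MOTIF.search(seq)]

def motif_fraction(proteome):
    """Fraction of proteins in the proteome with at least one match."""
    return len(proteins_with_motif(proteome)) / len(proteome)
```

Assuming the same 33,758-protein mouse proteome as in the zinc finger analysis, 3618 matching proteins correspond to a fraction of about 0.107, consistent with the roughly 10% stated above.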
| DISCUSSION |
|---|
Methodological Considerations—
Unlike previously published methods that mine for motifs in large scale phosphoproteomics data sets, we partitioned the data by similarity to functionally characterized peptides prior to motif extraction. This clustering approach provides high resolution and allows meaningful patterns to be extracted from functionally and physically related groups of peptides. We used the binding ligands of PTB and SH2 domains both as a clustering backbone and as positive controls for the clustering, and consequently obtained a satisfactory fit of the model. Ideally, as more interaction data become available, independent controls should be used; this was not possible here because of data limitations. For each of the 20 extracted motifs, we could detect a match to an existing Tyr(P) motif, retrieve a binding partner, or obtain a meaningful GO term for the proteins containing the motif; we therefore estimate that the false positive rate of motif extraction in our method is negligible.
In the few cases where we did not retrieve the expected binding partner from the Tyr(P) motif alone, i.e. Fps/Fes, STAT, and Vav, possible explanations are that 1) the motifs, originally defined in vitro, are low affinity motifs in vivo, 2) the proteins are not expressed in the C2C12 muscle cell line used in this study, or 3) a technical limitation of our assay.
Implications for a New Motif for SHIP2—
One of our findings was that the SH2 domain-containing inositol 5-phosphatase SHIP2 binds a novel N-terminal hydrophobic motif, (I/L/V)(I/L/V)pY. The specificity of this motif was confirmed by mutational analysis. A scrambled control peptide pair with the reversed sequence, which places the motif C-terminal of the Tyr(P), did not retrieve SHIP2, confirming that recognition depends on the N-terminal side of the Tyr(P). SHIP2 was also retrieved by two other peptide pairs matching the motif, suggesting that this is a generic SHIP2 binding motif.
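The logic of the reversed-sequence control can be expressed as a check on which side of the Tyr(P) the hydrophobic pair sits. The sketch below assumes a notation in which lowercase `y` marks phosphotyrosine; the peptide is hypothetical, not one of the peptides used in the study.

```python
import re

# Assumed notation: lowercase "y" marks the phosphotyrosine.
N_TERMINAL = re.compile(r"[ILV][ILV]y")  # hydrophobic pair at Tyr(P) -2/-1
C_TERMINAL = re.compile(r"y[ILV][ILV]")  # the same pair at Tyr(P) +1/+2


def motif_side(peptide: str) -> str:
    """Report where the hydrophobic pair lies relative to the Tyr(P).
    The N-terminal check runs first if a peptide matches both."""
    if N_TERMINAL.search(peptide):
        return "N-terminal"
    if C_TERMINAL.search(peptide):
        return "C-terminal"
    return "none"


peptide = "EGSLVyAPSR"      # hypothetical motif-containing peptide
reverse = peptide[::-1]     # "RSPAyVLSGE": the pair now follows the Tyr(P)
print(motif_side(peptide), motif_side(reverse))  # N-terminal C-terminal
```

Reversing keeps the residue composition identical while moving the hydrophobic pair to the C-terminal side, which is why loss of binding points to positional, N-terminal recognition rather than to composition alone.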
In general, the C-terminal residues at Tyr(P) +1 and +3 are considered the most important determinants of binding specificity for prototypic SH2 domains (1, 2). Interestingly, the SHIP2 SH2 binding motif that we describe lies N-terminal of the Tyr(P), indicating that the peptide-binding groove of some SH2 domains also accommodates residues N-terminal of the Tyr(P). In agreement with this, the binding properties of the tandem SH2 domains of the protein-tyrosine phosphatase SHP-2 are governed by residues Tyr(P) –2 to +5 (47). SHP-1 and SHP-2 both bind the (I/L/V)XpYXX(I/L/V) ITIM motif in the cytoplasmic part of Fc receptors, and the hydrophobic residue at Tyr(P) –2 has specifically been shown to mediate binding (23, 48, 49). Whereas prototypic SH2 domains have an Arg at position αA2 of the binding pocket, the tandem SH2 domains of the SHP-1 and SHP-2 phosphatases instead have a Gly (50). Presumably this creates a cavity that is filled by the side chain of the Tyr(P) –2 residue of the bound peptide. In support of this hypothesis, a single αA2 Gly→Arg point mutation has been shown to disrupt the Tyr(P) –2-mediated binding specificity of SHP-2 (48). Furthermore, the SH2 domains of signaling lymphocytic activation molecule-associated protein (SAP) and Eat2 bind ITIM motifs partly through residues N-terminal of the Tyr(P) (51).
The binding motif of the SHIP2 SH2 domain has not previously been investigated in degenerate peptide binding experiments; however, the ITIM motif has also been reported as a docking site for the SHIP2 SH2 domain (52, 53). Together with our observations, this indicates that the binding specificity of SHIP2 may be conferred by hydrophobic residues immediately upstream of the Tyr(P), with contributions from downstream hydrophobic patches. To our knowledge, apart from the SHP motif, this is the only reported N-terminally directed Tyr(P) binding motif.
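Written as patterns, the relationship between the ITIM motif and the SHIP2 motif becomes explicit. Any peptide matching (I/L/V)(I/L/V)pY already satisfies the ITIM requirement at Tyr(P) –2; the SHIP2 motif additionally constrains –1 and leaves +3 unconstrained. The sketch uses the ITIM definition as given in the text, lowercase `y` for phosphotyrosine (an assumed notation), and hypothetical peptides.

```python
import re

# Assumed notation: lowercase "y" marks the phosphotyrosine.
ITIM = re.compile(r"[ILV].y..[ILV]")  # (I/L/V)XpYXX(I/L/V), as defined in the text
SHIP2_N = re.compile(r"[ILV][ILV]y")  # (I/L/V)(I/L/V)pY


def classify(peptide: str) -> dict[str, bool]:
    """Report which of the two motif patterns a peptide contains."""
    return {
        "ITIM": ITIM.search(peptide) is not None,
        "SHIP2_N": SHIP2_N.search(peptide) is not None,
    }


# Hypothetical peptides illustrating the four combinations.
for pep in ["AVTyAELK", "GLIyAEVK", "GLIyDDDK", "SEEyDDDK"]:
    print(pep, classify(pep))
```

A peptide such as the second example satisfies both patterns, which is consistent with, though not proof of, the suggestion that upstream hydrophobic residues and downstream hydrophobic patches can both contribute to SHIP2 binding.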
One might speculate that the SHIP inositol phosphatases share a binding mechanism with the SHP protein-tyrosine phosphatases. However, unlike those of SHP, the SH2 domains of the SHIP phosphatases resemble the prototypic SH2 domain in having the highly conserved αA2 Arg (50), indicating that the mechanisms of N-terminally directed binding differ between SHIP and SHP phosphatases. Because no crystal structure of SHIP2 with a bound ligand has been solved, the specific binding mechanism remains to be described.
SHIP2 is involved in membrane signaling by removing the 5'-phosphate group of the key second messenger phosphatidylinositol 3,4,5-trisphosphate (PIP3). Because PIP3 is generated by PI3K, this activity allows SHIP2 to inhibit PI3K-mediated receptor tyrosine kinase signaling (54). The new motif we describe for the SHIP2 SH2 domain fits this overall function: a GO term analysis of all proteins containing the (I/L/V)(I/L/V)pY motif revealed that they are involved in cell surface receptor-linked signal transduction. SHIP2 may bind a number of as yet unidentified membrane signaling proteins through its SH2 domain and thereby be translocated to the membrane or act cooperatively with these proteins. Negative regulators of PI3K and PIP3 are attractive drug targets for obesity and diabetes because PI3K is the main effector of insulin signaling. Recently, studies of SHIP2 knock-out mice have proposed SHIP2 as a candidate for such therapeutic intervention (55, 56). Thus, the new motif described in this report may not only be involved in regulating SHIP2-mediated signal transduction but could also be of therapeutic relevance.
Conclusions—
This work presents the first system-wide approach to mining the proteome for Tyr(P) interaction motifs that combines bioinformatics methods with experimental validation. Strikingly, 16 of the 20 extracted motifs could be matched to previously described interaction motifs. Our experimental validation shows that the majority of the Tyr(P) interaction motifs previously defined in vitro can also pull down interaction partners from complex lysates. The GO analysis revealed that motifs that mediate interaction in the pulldown assay are found in proteins involved in signal transduction, whereas the remaining, non-binding motifs are found in enzymes and in ion- and nucleic acid-binding proteins. This raises the intriguing possibility that about one-third of in vivo Tyr(P) sites are not directly involved in interactions via domains such as SH2 and PTB but are rather sites that could alter the catalytic activity of enzymes or modulate the DNA binding affinity of, e.g., transcription factors.
Perspectives—
The developed clustering method is applicable to other complex large-scale data sets of post-translational modifications in which a substantial number of peptide-protein interactions have been identified. As MS-based methods map more modifications, such as acetylation and methylation sites, the resulting interactomes could be mined for conserved motifs and assayed for binding partners in a similar manner. Combining proteomics and bioinformatics makes it possible to perform large-scale screens in an unbiased way, reconfirming previous knowledge and discovering new mechanisms at the same time. As streamlining and automation of mass spectrometry advance, entire post-translational modification proteomes can be mapped and their binding motifs identified, so that ultimately the specificity of signaling networks can be unraveled.
FOOTNOTES
Published, MCP Papers in Press, October 15, 2007, DOI 10.1074/mcp.M700241-MCP200
1 The abbreviations used are: PTB, phosphotyrosine binding; PAM, partitioning around medoids; ITIM, immunoreceptor tyrosine-based inhibition motif; SH2, Src homology 2; GO, Gene Ontology; IPI, International Protein Index; zf-C2H2, Cys2-His2 zinc finger protein; N-WASP, neural Wiskott-Aldrich syndrome protein; PI3K, phosphatidylinositol 3-kinase; GAP, GTPase-activating protein; STAT, signal transducers and activators of transcription; PIP3, phosphatidylinositol 3,4,5-trisphosphate.
* This work was supported by Interaction Proteome, FP6, Contract LSHG-CT-2003-505520, a grant from the European Commission in the 6th framework program; by the Danish Platform for Integrative Biology, a grant from the Danish National Research Foundation (Danmarks Grundforskningsfond); and by the Danish Research Agency (Forskningsstyrelsen). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
¶ To whom correspondence should be addressed. Tel.: 45-4525-2477; Fax: 45-4593-1585; E-mail: nikob{at}cbs.dtu.dk or nblm{at}novozymes.com
REFERENCES
…signaling pathway using a combination of immunoprecipitation and immobilized metal affinity chromatography. Mol. Cell. Proteomics 4, 721–730
…RIIB in B cells under negative signaling. Immunol. Lett. 72, 7–15
…RIIB. J. Biol. Chem. 275, 37357–37364