
A Proteome-wide Domain-centric Perspective on Protein Phosphorylation*

  • Antonio Palmeri
    Affiliations
    Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
  • Gabriele Ausiello
    Affiliations
    Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
  • Fabrizio Ferrè
    Affiliations
    Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
  • Manuela Helmer-Citterich
    Correspondence
    To whom correspondence should be addressed: Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy. Tel.: +39 067259-4324.
    Affiliations
    Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
  • Pier Federico Gherardini
    Footnotes
    Affiliations
    Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
  • Author Footnotes
    * This work was supported by the EPIGEN flagship project and PRIN 2010 (prot. 20108XYHJS_006) to M.H.C.
    This article contains supplemental Tables S1 to S3.
    ¶ Present address: Baxter Laboratory for Stem Cell Biology, Department of Microbiology & Immunology, Stanford University School of Medicine, Stanford, California, USA.
Open Access. Published: May 15, 2014. DOI: https://doi.org/10.1074/mcp.M114.039990
      Phosphorylation is a widespread post-translational modification that modulates the function of a large number of proteins. Here we show that a considerable proportion of the domains in the human proteome is significantly enriched or depleted in phosphorylation events. A substantial improvement in phosphosite prediction is achieved by leveraging this observation, which has not been exploited by existing methods. Phosphorylation sites are often not shared between multiple occurrences of the same domain in the proteome, even when the phosphoacceptor residue is conserved. This is partly because of different functional constraints acting on the same domain in different protein contexts. Moreover, by augmenting domain alignments with structural information, we were able to provide direct evidence that phosphosites in protein-protein interfaces need not be positionally conserved, likely because they can modulate interactions simply by sitting in the same general surface area.
      Phosphorylation, the most widespread protein post-translational modification, is an important regulator of protein function. The addition of phosphate groups on serine, threonine, and tyrosine residues can modulate the activity of the target protein by inducing complex conformational changes, by modifying protein electrostatics, and by regulating domain-peptide interactions, as with 14-3-3 or SH2 domains, which specifically recognize phosphorylated residues. The standard experimental technique for the high-throughput identification of phosphorylation sites is mass spectrometry (
      • Stirnimann C.U.
      • Petsalaki E.
      • Russell R.B.
      • Müller C.W.
      WD40 proteins propel cellular networks.
      ).
      Phosphorylation is catalyzed by protein kinases, a family that in humans comprises ∼540 members (
      • Cohen P.
      The regulation of protein function by multisite phosphorylation–a 25 year update.
      ,
      • Manning G.
      • Whyte D.B.
      • Martinez R.
      • Hunter T.
      • Sudarsanam S.
      The protein kinase complement of the human genome.
      ). It is well understood that these enzymes recognize specific sequence motifs in their substrates (
      • Bridges D.
      • Moorhead G.B.G.
      14-3-3 proteins: a number of functions for a numbered protein.
      ,
      • Miller M.L.
      • Jensen L.J.
      • Diella F.
      • Jørgensen C.
      • Tinti M.
      • Li L.
      • Hsiung M.
      • Parker S.A.
      • Bordeaux J.
      • Sicheritz-Ponten T.
      • Olhovsky M.
      • Pasculescu A.
      • Alexander J.
      • Knapp S.
      • Blom N.
      • Bork P.
      • Li S.
      • Cesareni G.
      • Pawson T.
      • Turk B.E.
      • Yaffe M.B.
      • Brunak S.
      • Linding R.
      Linear motif atlas for phosphorylation-dependent signaling.
      ). Accordingly, the sequence around the phosphorylation site is widely regarded as the most important feature for phosphosite prediction (
      • Gao J.
      • Thelen J.J.
      • Dunker A.K.
      • Xu D.
      Musite: a tool for global prediction of general and kinase-specific phosphorylation sites.
      ,
      • Gnad F.
      • Ren S.
      • Cox J.
      • Olsen J.V.
      • Macek B.
      • Oroshi M.
      • Mann M.
      PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites.
      ). However, the “context,” in a broad sense, in which these motifs occur is also important, as sequence alone is not enough to achieve the observed specificity of phosphorylation. Therefore, several studies have characterized multiple aspects of phosphosites, such as their preference for loops and disordered regions (reviewed in (
      • Via A.
      • Diella F.
      • Gibson T.J.
      • Helmer-Citterich M.
      From sequence to structural analysis in protein phosphorylation motifs.
      )), or the tendency of phosphoserines and phosphothreonines to occur in clusters (
      • Schweiger R.
      • Linial M.
      Cooperativity within proximal phosphorylation sites is revealed from large-scale proteomics data.
      ), and these features have been used to improve the performance of phosphosite predictors (
      • Gao J.
      • Thelen J.J.
      • Dunker A.K.
      • Xu D.
      Musite: a tool for global prediction of general and kinase-specific phosphorylation sites.
      ,
      • Gnad F.
      • Ren S.
      • Cox J.
      • Olsen J.V.
      • Macek B.
      • Oroshi M.
      • Mann M.
      PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites.
      ,
      • Iakoucheva L.M.
      • Radivojac P.
      • Brown C.J.
      • O'Connor T.R.
      • Sikes J.G.
      • Obradovic Z.
      • Dunker A.K.
      The importance of intrinsic disorder for protein phosphorylation.
      ,
      • Palmeri A.
      • Gherardini P.F.
      • Tsigankov P.
      • Ausiello G.
      • Späth G.F.
      • Zilberstein D.
      • Helmer-Citterich M.
      PhosTryp: a phosphorylation site predictor specific for parasitic protozoa of the family trypanosomatidae.
      ,
      • Moses A.M.
      • Hériché J.K.
      • Durbin R.
      Clustering of phosphorylation site recognition motifs can be exploited to predict the targets of cyclin-dependent kinase.
      ). Moreover, placing kinases and substrates in the context of protein interaction networks has been shown to improve the prediction of phosphorylation by specific kinases (
      • Linding R.
      • Jensen L.J.
      • Ostheimer G.J.
      • van Vugt M.A.
      • Jørgensen C.
      • Miron I.M.
      • Diella F.
      • Colwill K.
      • Taylor L.
      • Elder K.
      • Metalnikov P.
      • Nguyen V.
      • Pasculescu A.
      • Jin J.
      • Park J.G.
      • Samson L.D.
      • Woodgett J.R.
      • Russell R.B.
      • Bork P.
      • Yaffe M.B.
      • Pawson T.
      Systematic discovery of in vivo phosphorylation networks.
      ).
      Perhaps one of the most puzzling observations when looking at the phosphoproteome as a whole is that a large proportion of phosphorylation sites is poorly conserved. This has led to various hypotheses. First, some sites may represent nonfunctional, possibly low-stoichiometry, phosphorylation events that are picked up because of the sensitivity of mass spectrometry (
      • Lienhard G.E.
      Non-functional phosphorylations?.
      ,
      • Landry C.R.
      • Levy E.D.
      • Michnick S.W.
      Weak functional constraints on phosphoproteomes.
      ). Indeed, functionally characterized sites and those matching known kinase motifs are more conserved on average (
      • Landry C.R.
      • Levy E.D.
      • Michnick S.W.
      Weak functional constraints on phosphoproteomes.
      ,
      • Nguyen Ba A.N.
      • Moses A.M.
      Evolution of characterized phosphorylation sites in budding yeast.
      ,
      • Beltrao P.
      • Albanèse V.
      • Kenner L.R.
      • Swaney D.L.
      • Burlingame A.
      • Villén J.
      • Lim W.A.
      • Fraser J.S.
      • Frydman J.
      • Krogan N.J.
      Systematic functional prioritization of protein posttranslational modifications.
      ). However, although in biology function often equates with conservation, there could be genuinely functional fast-evolving phosphosites that are responsible for species-specific differences in signaling and regulation. Moreover, in some cases, especially in the regulation of protein-protein interactions, the exact position of the phosphosites may be unimportant (
      • Tan C.S.
      • Jørgensen C.
      • Linding R.
      Roles of “junk phosphorylation” in modulating biomolecular association of phosphorylated proteins?.
      ,
      • Serber Z.
      • Ferrell J.E.
      Tuning bulk electrostatics to regulate protein function.
      ).
      Here we explore the issues of “context” and “conservation” of phosphorylation sites from the perspective of protein domains. To this end, we assembled a comprehensive database of phosphosites from publicly available sources and studied their proteome distribution with respect to the location and identity of protein domains. We focus on the human phosphoproteome because it has been very well characterized in a multitude of low- and high-throughput experiments, thus providing the opportunity for a comprehensive, proteome-wide, study. In particular, the issues we want to address are the following:
      • Are specific domain types preferentially phosphorylated? Or conversely are some domains specifically depleted of phosphorylation sites?
      • Can the domain context be used to improve the prediction of phosphorylation sites?
      • What is the conservation pattern of phosphosites when looking at multiple instances of the same domain in the proteome?

      MATERIALS AND METHODS

      We collected human phosphorylation sites from the following databases: Phospho.ELM (
      • Dinkel H.
      • Chica C.
      • Via A.
      • Gould C.M.
      • Jensen L.J.
      • Gibson T.J.
      • Diella F.
      Phospho.ELM: a database of phosphorylation sites–update 2011.
      ), PhosphoSitePlus (
      • Hornbeck P.V.
      • Kornhauser J.M.
      • Tkachev S.
      • Zhang B.
      • Skrzypek E.
      • Murray B.
      • Latham V.
      • Sullivan M.
      PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse.
      ), UniProt (
      • UniProt Consortium
      Reorganizing the protein space at the Universal Protein Resource (UniProt).
      ), and PHOSIDA (
      • Gnad F.
      • Ren S.
      • Cox J.
      • Olsen J.V.
      • Macek B.
      • Oroshi M.
      • Mann M.
      PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites.
      ). All the phosphorylation sites were mapped on UniProt sequences, checking for the identity of a 10-residue window centered on the phosphosite. Phosphosites on different isoforms were mapped on the UniProt reference isoform using the program water from the EMBOSS package. HMMs for the identification of protein domains were downloaded from the PFAM database (
      • Punta M.
      • Coggill P.C.
      • Eberhardt R.Y.
      • Mistry J.
      • Tate J.
      • Boursnell C.
      • Pang N.
      • Forslund K.
      • Ceric G.
      • Clements J.
      • Heger A.
      • Holm L.
      • Sonnhammer E.L.
      • Eddy S.R.
      • Bateman A.
      • Finn R.D.
      The Pfam protein families database.
      ), selecting only the PFAM-A entries. The human proteome was scanned against this collection of HMMs using the pfam_scan.pl program.

      Phosphorylation Propensity of Domains and Inter Domain Regions

      We first estimated an average phosphorylation propensity by pooling all the domain types together and calculating the ratio of phosphorylated residues to the total number of phosphorylatable residues in the proteome. If a specific domain is not more or less phosphorylated than the average domain we expect this ratio, when calculated for a single domain type, to be similar to the value obtained by pooling all the domains together. The difference between these two proportions can be quantified with a Fisher test, that is, by asking what would be the probability of obtaining the observed phospho/nonphospho domain counts for a specific domain type if its probability of phosphorylation was equal to the overall phosphorylation propensity. The p values were adjusted for multiple testing by controlling the False Discovery Rate. We considered a domain as significantly enriched or depleted in phosphorylation when the adjusted p value was less than 0.05.
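      The enrichment test described above can be sketched as follows. This is a minimal illustration in Python (standard library only), not the authors' implementation: the function names, the layout of the 2x2 table, the choice of a two-sided test, and the use of the Benjamini-Hochberg procedure to control the False Discovery Rate are all assumptions made for the example.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative):
      a, b = phosphorylated / non-phosphorylated acceptor residues in one domain type
      c, d = the same two counts for all the other domains pooled together
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of seeing x phosphorylated residues in row 1.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum the probabilities of every table that is at most as likely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      Under these assumptions, a domain type would be called enriched or depleted when its adjusted p value falls below 0.05, with the direction given by comparing its own propensity a / (a + b) with the pooled propensity.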
      We performed a similar procedure for Inter Domain Regions (IDR), defined as the sequence regions lying between two domains of a given type, or a single domain and the N- or C-terminus of the protein (irrespective of the ordering). Thus, we obtained an overall propensity for IDRs that was compared with the propensity of each specific IDR in order to identify IDRs enriched or depleted in phosphorylation.
      The abbreviations used are: IDR, interdomain region; SVM, support vector machines.

      Extracellular Domains Filtering in Phosphorylation Data Set

      Extracellular domains are expected to be depleted in phosphorylation and they were excluded from the data set to avoid introducing biases. To this end, we first predicted signal peptides in the whole proteome using SignalP (
      • Petersen T.N.
      • Brunak S.
      • von Heijne G.
      • Nielsen H.
      SignalP 4.0: discriminating signal peptides from transmembrane regions.
      ). Thereafter, we predicted transmembrane (TM) segments with TOPCONS-single (
      • Hennerdal A.
      • Elofsson A.
      Rapid membrane protein topology prediction.
      ), after removing the predicted signal peptides from the sequences. All the proteins having a signal peptide were discarded, whereas the other proteins containing TM sequences were considered for further filtering. We tested the reliability of these predictions using the set of proteins from SwissProt annotated with the GO term “cell.” There were 13,253 proteins that have the GO term “cell” and are intracellular according to our procedure. Only 705 have the GO term “cell” but are not predicted to be intracellular, and 5171 are predicted to be intracellular, that is, they do not possess a signal peptide, or have a TM region, but are not annotated with the GO term “cell.” Despite the good reliability of signal peptide and transmembrane segment predictions, these predictors are not perfect. Therefore, we calculated an “intracellular propensity” for each domain/IDR, as the ratio between the number of residues predicted as intracellular and the total number of residues in the domain/IDR, and we used this measure to discard nonintracellular domains/IDRs. The majority of domains score either 1 or 0 on this intracellular propensity, highlighting the sharp distinction between the two classes. We concluded that a threshold of 0.7 was not overly stringent, while at the same time allowing us to eliminate from the data set almost all the domains annotated as extracellular.
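      The intracellular propensity filter can be illustrated with a short sketch. The function names, the 1-based inclusive coordinates, and the decision to keep regions scoring at or above the threshold (rather than strictly above it) are assumptions made for this example; the text only specifies the ratio and the 0.7 threshold.

```python
def intracellular_propensity(region_start, region_end, intracellular_positions):
    """Fraction of residues in a domain/IDR predicted to be intracellular.

    region_start, region_end: 1-based inclusive coordinates (assumed convention).
    intracellular_positions: set of sequence positions predicted as intracellular,
    for example derived from signal peptide and topology predictions.
    """
    length = region_end - region_start + 1
    inside = sum(
        1 for pos in range(region_start, region_end + 1)
        if pos in intracellular_positions
    )
    return inside / length


def keep_region(propensity, threshold=0.7):
    # Assumption: regions scoring at or above the threshold are retained.
    return propensity >= threshold
```

      For instance, a 10-residue region with eight residues predicted as intracellular scores 0.8 and would be retained under this convention.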

      Domains Alignments and Conservation Scores

      The sequence ranges corresponding to each domain were aligned on the Hidden Markov Model (HMM) describing the domain using HMMER 3.0 (
      • Eddy S.R.
      A new generation of homology search tools based on probabilistic inference.
      ). These alignments were then used to map the phosphorylation sites and interface residues (see below) derived from each sequence on a common domain reference. We used two different measures of conservation throughout the paper. The conservation of the alignment columns was calculated by taking a sequence as reference and then counting the percentage of residues in each column having a BLOSUM62 substitution score ≥ 1 with the residue in the reference sequence. In order to compare different alignments, we normalized these conservation scores by calculating the empirical percentile of the conservation of each column with respect to the distribution of all the columns in the alignment. The percentile was then used as the conservation score. To obtain a single value for each alignment column, we calculated the average of all the conservation scores obtained when using each sequence in turn as reference. The conservation of the phosphorylation event was defined as the proportion of phosphorylatable residues that are actually phosphorylated in each column containing at least one phosphosite. Only columns with at least ten aligned sequences were considered in this analysis.
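      The second measure, the conservation of the phosphorylation event, lends itself to a compact sketch. The data layout (a list of equal-length aligned strings with '-' for gaps, and phosphosites given as sequence index and column index pairs) and the reading of "aligned sequences" as non-gap residues in the column are assumptions of this example, not details stated in the text.

```python
PHOSPHORYLATABLE = set("STY")


def phospho_conservation(alignment, phosphosites, min_sequences=10):
    """Per-column conservation of the phosphorylation event.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    phosphosites: set of (sequence_index, column_index) pairs known to be phosphorylated.
    Returns {column_index: phosphorylated acceptors / all acceptors} for columns
    that contain at least one phosphosite and at least min_sequences residues.
    """
    scores = {}
    for col in range(len(alignment[0])):
        residues = [(i, seq[col]) for i, seq in enumerate(alignment) if seq[col] != "-"]
        if len(residues) < min_sequences:
            continue  # too few aligned residues in this column
        acceptors = [i for i, aa in residues if aa in PHOSPHORYLATABLE]
        phospho = [i for i in acceptors if (i, col) in phosphosites]
        if phospho:
            scores[col] = len(phospho) / len(acceptors)
    return scores
```

      A column in which eight of ten sequences carry Ser or Thr, three of them phosphorylated, would therefore score 3/8.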

      Paralogs Identification

      We used EnsemblCompara GeneTrees (
      • Vilella A.J.
      • Severin J.
      • Ureta-Vidal A.
      • Heng L.
      • Durbin R.
      • Birney E.
      EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates.
      ) to obtain the paralogy relationships between all the human genes. We considered both the within_species_paralogs and other_paralogs homology relationships. We then clustered the proteins in paralogy groups, according to the relationships contained in the Gene Trees, and obtained 3203 paralogous protein clusters. The number of paralogy groups in which a domain appears provides an estimate of the number of different families (as opposed to single proteins) in which each domain is present.
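      One way to turn pairwise paralogy relationships into groups is to take the connected components of the paralogy graph. The sketch below does this with a union-find structure; treating paralogy groups as connected components is an assumption of this example (the text does not state the clustering rule), and the function names and input layout are hypothetical.

```python
def cluster_paralogs(pairs):
    """Group proteins into paralogy clusters (connected components).

    pairs: iterable of (protein_a, protein_b) paralogy relationships.
    Returns a list of sets, one per cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for protein in list(parent):
        clusters.setdefault(find(protein), set()).add(protein)
    return list(clusters.values())


def paralogy_groups_per_domain(clusters, domains_by_protein):
    """Count, for each domain type, the number of clusters in which it occurs."""
    counts = {}
    for cluster in clusters:
        domains_in_cluster = set()
        for protein in cluster:
            domains_in_cluster |= domains_by_protein.get(protein, set())
        for domain in domains_in_cluster:
            counts[domain] = counts.get(domain, 0) + 1
    return counts
```

      Note that proteins without any paralog never appear in the pair list; whether such singletons count as groups of their own is left open here.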

      Structure-based Surface Clustering of Phosphorylation Sites

      In order to obtain a representative structure for each domain we used all the sequences in the alignment to perform a BLAST search against the Protein Data Bank. We then extracted the sequences from the matching structure files and identified the domain boundaries using pfam_scan.pl. The phosphorylation sites from each sequence were projected on all the sequences in the domain alignment. We selected as representative the structure that provided the highest coverage in terms of phosphosite positions and domain sequence.
      In order to cluster the phosphorylation sites we calculated the geodesic distance between all the pairs of residues in the structure corresponding to a phosphosite-containing column of the domain alignment. We used UCSF Chimera (
      • Pettersen E.F.
      • Goddard T.D.
      • Huang C.C.
      • Couch G.S.
      • Greenblatt D.M.
      • Meng E.C.
      • Ferrin T.E.
      UCSF Chimera–a visualization system for exploratory research and analysis.
      ) to calculate the molecular surface of the protein, described as a triangle mesh. In order not to assign buried phosphosites to the surface of the protein, each site was associated with the closest surface vertex only if its distance from that vertex was less than 7.5 Angstroms. A visual inspection of a large number of cases showed that this procedure is effective in assigning phosphosites to surface vertices, while at the same time discarding buried sites. The geodesic distance is then defined as the length of the shortest path in a graph where the nodes represent the mesh vertices and the edges connect adjacent vertices and are weighted according to their distance in Angstroms. The shortest weighted path was calculated using an implementation of Dijkstra's algorithm (
      • Csardi G.
      • Nepusz T.
      The igraph software package for complex network research.
      ). The resulting matrix of residue distances was clustered using affinity propagation (
      • Bodenhofer U.
      • Kothmeier A.
      • Hochreiter S.
      APCluster: an R package for affinity propagation clustering.
      ,
      • Frey B.J.
      • Dueck D.
      Clustering by passing messages between data points.
      ).
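      The geodesic distance computation can be sketched in a few lines. The paper relies on UCSF Chimera for the surface mesh and on igraph for the shortest paths; the standalone Python below is only an illustration of the same idea, with hypothetical function names and a mesh given as a list of vertex coordinates plus a list of triangles (vertex index triples). The affinity propagation step is not shown.

```python
import heapq
from math import dist


def mesh_graph(vertices, triangles):
    """Adjacency map of a triangle mesh: vertex -> {neighbor: Euclidean edge length}."""
    graph = {i: {} for i in range(len(vertices))}
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (a, c)):
            w = dist(vertices[u], vertices[v])
            graph[u][v] = w
            graph[v][u] = w
    return graph


def geodesic_distances(graph, source):
    """Dijkstra's algorithm: shortest weighted path length from source to each vertex."""
    distances = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > distances.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < distances.get(v, float("inf")):
                distances[v] = nd
                heapq.heappush(queue, (nd, v))
    return distances


def closest_surface_vertex(site_xyz, vertices, cutoff=7.5):
    """Index of the nearest mesh vertex, or None if the site is farther than cutoff."""
    best = min(range(len(vertices)), key=lambda i: dist(site_xyz, vertices[i]))
    return best if dist(site_xyz, vertices[best]) < cutoff else None
```

      Running geodesic_distances from the vertex assigned to each phosphosite yields the pairwise distance matrix that is then passed to the clustering step.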

      Protein Interfaces and Phosphorylation

      The data set of interface residues was derived by collecting all the pairs of different chains in the same PDB structure that could both be mapped on UniProt. We only retained pairs of chains that had the same relative orientation in both the asymmetric and biological units. We considered two residues (one from each chain) as interacting if their distance was less than 0.5 Angstroms plus the sum of their van der Waals radii. Interfaces consisting of fewer than five residues on either chain were discarded. The interface residues were mapped on the corresponding domain alignment using the UniProt accessions.
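      A minimal sketch of this contact criterion follows. It assumes that the distance is evaluated between pairs of atoms, that two residues interact when any atom pair satisfies the criterion, and that Bondi van der Waals radii are used; none of these details is stated in the text, and the function names and data layout are hypothetical.

```python
from math import dist

# Bondi van der Waals radii in Angstroms (assumed choice of radii).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def atoms_in_contact(atom_a, atom_b, tolerance=0.5):
    """atom = (element, (x, y, z)); contact if distance < r_a + r_b + tolerance."""
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    return dist(xyz_a, xyz_b) < VDW_RADII[elem_a] + VDW_RADII[elem_b] + tolerance


def interface_residues(chain_a, chain_b, min_residues=5):
    """chain = {residue_id: [atom, ...]}.

    Returns (residues of chain_a, residues of chain_b) at the interface,
    or None if either side has fewer than min_residues residues.
    """
    side_a, side_b = set(), set()
    for res_a, atoms_a in chain_a.items():
        for res_b, atoms_b in chain_b.items():
            if any(atoms_in_contact(x, y) for x in atoms_a for y in atoms_b):
                side_a.add(res_a)
                side_b.add(res_b)
    if len(side_a) < min_residues or len(side_b) < min_residues:
        return None  # interface too small, discarded
    return side_a, side_b
```

      The all-against-all loop is quadratic in the number of atoms; a spatial index would be preferable for whole structures, but the brute-force form keeps the criterion easy to read.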

      Phosphosite Predictor

      We built separate predictors for pSer, pThr, and pTyr. All predictors are SVM-based classifiers. The training and testing procedures were written in R, using the R package LiblineaR.
      The data sets for each predictor were derived as follows. We extracted a window of −5/+5 residues around the phosphorylation sites in our data set to obtain the positive set. The negative set was derived by extracting the same −5/+5 window around all the phosphorylatable amino acids in the proteome, after excluding known phosphosites. We used 90% of the positive set for training and the remainder for testing. We used a 50% sequence identity threshold to reduce the redundancy between the training and test sets (both positives and negatives) and also within each of the two sets. We then resized the negative training and test sets in order to have an equal number of negatives and positives.
      For each residue type we built two predictors, one including only the sequence around the phosphosite (in standard orthogonal binary encoding), and the other including the information related to the domain composition of the protein, simply encoded as the domain propensity.
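      The feature encoding can be sketched as follows. The authors' pipeline was written in R with LiblineaR; this Python fragment only illustrates the orthogonal binary (one-hot) encoding of the −5/+5 window and the optional domain propensity feature. The handling of windows that run past the protein termini (padding with 'X', encoded as all zeros) and the function names are assumptions of this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_window(sequence, position, flank=5, pad="X"):
    """Window of 2*flank+1 residues centered on a 0-based position.

    Positions beyond the termini are filled with the pad symbol (assumed convention).
    """
    padded = pad * flank + sequence + pad * flank
    return padded[position:position + 2 * flank + 1]


def encode_window(window):
    """Orthogonal binary encoding: 20 indicator values per window position."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def feature_vector(sequence, position, domain_propensity=None):
    """Sequence-only features, optionally extended with the domain propensity."""
    features = encode_window(extract_window(sequence, position))
    if domain_propensity is not None:
        features.append(domain_propensity)
    return features
```

      With this layout the sequence-only predictor sees 220 binary features per site, and the domain-aware predictor sees one additional real-valued feature.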

      RESULTS AND DISCUSSION

      Data Set Composition

      We collected 65,239 human phosphorylation sites from the Phospho.ELM (
      • Dinkel H.
      • Chica C.
      • Via A.
      • Gould C.M.
      • Jensen L.J.
      • Gibson T.J.
      • Diella F.
      Phospho.ELM: a database of phosphorylation sites–update 2011.
      ), PhosphoSitePlus (
      • Hornbeck P.V.
      • Kornhauser J.M.
      • Tkachev S.
      • Zhang B.
      • Skrzypek E.
      • Murray B.
      • Latham V.
      • Sullivan M.
      PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse.
      ), UniProt (
      • UniProt Consortium
      Reorganizing the protein space at the Universal Protein Resource (UniProt).
      ), and PHOSIDA (
      • Gnad F.
      • Ren S.
      • Cox J.
      • Olsen J.V.
      • Macek B.
      • Oroshi M.
      • Mann M.
      PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites.
      ) databases (see Methods). We then identified PFAM (
      • Punta M.
      • Coggill P.C.
      • Eberhardt R.Y.
      • Mistry J.
      • Tate J.
      • Boursnell C.
      • Pang N.
      • Forslund K.
      • Ceric G.
      • Clements J.
      • Heger A.
      • Holm L.
      • Sonnhammer E.L.
      • Eddy S.R.
      • Bateman A.
      • Finn R.D.
      The Pfam protein families database.
      ) domains across the human proteome and investigated the distribution of phosphorylation sites with respect to these protein domains. We found that 6710 sites were either located in proteins with no PFAM domain or assigned to multiple domains because of overlaps in the domain definitions. These sites were discarded and not considered further. We expect extracellular phosphorylation sites to be both rarer and under-represented in databases because of the experimental setup. In order to eliminate this bias, we constructed an “intra-cellular proteome” by predicting the cellular localization of all the proteins in our data set (see Methods). Without this step, domains that are predominantly extracellular would appear as depleted in phosphorylation. Our final data set comprises 48,252 phosphorylation sites, of which 29,837 are serines, 9479 threonines, and 8936 tyrosines (supplementary Table S1). These sites map to 6880 proteins (∼56% of the predicted human intracellular proteome). The average number of phosphosites per protein is seven, and almost 19% of phosphoproteins contain more than 10 phosphoresidues. Twelve percent of the sites were identified in low-throughput experiments, whereas the remainder were derived from high-throughput data sets.
      The majority of phosphorylatable residues (i.e. Ser/Thr/Tyr) are in N- and C-terminal regions, accounting for 53% of the total, whereas 26% are in domains and 21% in Inter-Domain Regions (IDRs). However, IDRs and N- and C-terminal regions both have a higher phosphorylation density (5.6%), that is, the proportion of phosphorylatable residues that are actually phosphorylated, than domains (3.6%).
      Interestingly, sites from high-throughput experiments are preferentially located outside protein domains (76% versus 64% of low-throughput sites, Chi-square test p < 2.2e-16). Protein regions outside globular domains are more exposed to solvent and therefore more likely to be recognized by kinases. Recently, a number of authors have suggested that a proportion of phosphorylation sites may result from random encounters between kinase and substrate and have no functional meaning (
      • Landry C.R.
      • Levy E.D.
      • Michnick S.W.
      Weak functional constraints on phosphoproteomes.
      ), representing only the “noise” in the system. In general, we can assume that sites from low-throughput experiments are more likely to have a functional meaning, because they were derived from studies investigating single sites of interest. The observed enrichment would therefore lend support to this hypothesis, as mass spectrometry could be picking up low-stoichiometry nonfunctional phosphorylation events, which are more likely to happen in highly accessible regions of the protein (i.e. outside globular domains).

      The Domain Context of Protein Phosphorylation

      In order to explore the domain context of phosphorylation, we investigated whether specific domain types are significantly enriched or depleted in phosphorylation, that is, whether phosphorylation specifically modulates certain domains. We estimated the average propensity of each residue type (Ser/Thr/Tyr) to be phosphorylated by pooling all the domain types together and calculating the ratio of phosphorylated residues to the total number of phosphoacceptor residues in the proteome. If a specific domain is not more or less frequently phosphorylated than the average domain, we expect this ratio, when calculated for a single domain type, to be similar to the value obtained by pooling all the domains together. Conversely, domains enriched or depleted in phosphorylation will display a higher or lower propensity. The significance of these differences in propensity was evaluated with a statistical test (see Methods).
      Following this analysis, we obtained, for each residue type, a list of domains significantly enriched or depleted in phosphorylation. In total, 151 domains were significantly enriched in pSer phosphorylation and 33 were depleted (see Table IA,B); 55 were enriched in pThr and 11 depleted (see Table IIA,B). Finally, for pTyr, we found 39 domains enriched and eight depleted (see Table IIIA,B). We observed that the significantly enriched domain types represent 6% of the 3131 domain types and the significantly depleted domain types represent 1.1%. If we consider the total number of domain instances in the proteome (i.e. accounting for multiple copies of the same domain), the significantly enriched and depleted domains respectively represent 12% and 17% of the total. This difference is mainly due to pSer and pThr, as shown in Table IV.
      Table IA–B: Domains enriched or depleted in Ser phosphorylation (p value < 0.05). Domain types significantly (p value < 0.05) enriched (Table IA) and depleted (Table IB) in Ser phosphorylation, sorted by pSer propensity (for clarity only the first 15 rows are shown; the full data are available in supplementary Table S2). Length indicates the length of the alignment. Adjusted p value is the p value for the Fisher test (see Methods), after False Discovery Rate correction. pSer Propensity is the proportion of serine residues in the alignment that are phosphorylated. Ser Content is the proportion of serine residues in the alignment.
      A
      Name | Description | Paralogy groups | Length | Adjusted p value | Ser content | pSer propensity
      K167R | K167R (NUC007) repeat | 1 | 111 | 1.68E-38 | 1.10 | 0.49
      Synaptobrevin | Synaptobrevin | 2 | 79 | 6.22E-11 | 0.87 | 0.42
      Histone | Core histone H2A/H2B/H3/H4 | 14 | 72 | 1.01E-16 | 1.11 | 0.20
      BEX | Brain expressed X-linked like family | 3 | 132 | 1.21E-05 | 0.76 | 0.18
      Tubulin_C | Tubulin C-terminal domain | 5 | 125 | 1.32E-08 | 0.86 | 0.17
      Linker_histone | Linker histone H1 and H5 family | 3 | 70 | 2.22E-03 | 1.64 | 0.14
      Ubiquitin | Ubiquitin family | 13 | 65 | 1.21E-05 | 0.82 | 0.13
      Band_3_cyto | Band 3 cytoplasmic domain | 2 | 244 | 1.30E-05 | 1.26 | 0.13
      HSP90 | Hsp90 protein | 1 | 307 | 8.67E-05 | 0.89 | 0.12
      Vinculin | Vinculin family | 1 | 351 | 4.29E-05 | 1.00 | 0.11
      RRM_6 | RNA recognition motif (a.k.a. RRM, RBD, or RNP domain) | 19 | 68 | 9.48E-03 | 0.74 | 0.09
      HMG_box | HMG (high mobility group) box | 11 | 67 | 2.60E-02 | 0.78 | 0.09
      Tubulin | Tubulin/FtsZ family, GTPase domain | 5 | 212 | 3.82E-04 | 1.07 | 0.09
      HSP70 | Hsp70 protein | 2 | 514 | 4.83E-03 | 0.86 | 0.07
      Pkinase | Protein kinase domain | 89 | 255 | 2.93E-05 | 0.84 | 0.05
      B
      PMG | PMG protein | 4 | 150 | 2.8E-04 | 3.55 | 0.00
      DUF1220 | Repeat of unknown function (DUF1220) | 1 | 65 | 3.4E-08 | 1.95 | 0.00
      DENN | DENN (AEX-3) domain | 3 | 188 | 1.2E-02 | 1.25 | 0.00
      PI-PLC-X | Phosphatidylinositol-specific phospholipase C, X domain | 4 | 138 | 4.0E-02 | 1.20 | 0.00
      PLAT | PLAT/LH2 domain | 5 | 107 | 2.2E-02 | 0.92 | 0.00
      RhoGEF | RhoGEF domain | 20 | 178 | 2.4E-09 | 0.89 | 0.00
      BACK | BTB And C-terminal Kelch | 14 | 98 | 8.0E-04 | 0.89 | 0.00
      PDEase_I | 3′5′-cyclic nucleotide phosphodiesterase | 4 | 235 | 4.3E-03 | 0.84 | 0.00
      RabGAP-TBC | Rab-GTPase-TBC domain | 8 | 198 | 1.7E-06 | 0.81 | 0.00
      PI3_PI4_kinase | Phosphatidylinositol 3- and 4-kinase | 5 | 233 | 3.1E-02 | 0.71 | 0.00
      BTB | BTB/POZ domain | 32 | 104 | 6.2E-11 | 1.06 | 0.00
      Oxysterol_BP | Oxysterol-binding protein | 3 | 332 | 1.1E-02 | 1.13 | 0.00
      MAGE | MAGE family | 3 | 166 | 1.1E-02 | 0.77 | 0.00
      SEA | SEA domain | 7 | 96 | 8.1E-05 | 1.15 | 0.00
      NACHT | NACHT domain | 7 | 163 | 2.7E-02 | 0.98 | 0.00
      Table IIA–B: Domains enriched or depleted in Thr phosphorylation (p value < 0.05). Domain types significantly (p value < 0.05) enriched (Table IIA) or depleted (Table IIB) in Thr phosphorylation, sorted by pThr propensity (for clarity only the first 10 rows are shown; the full data are available in supplementary Table S2). See the legend of Table I for a description of the columns.
      A
      Name | Description | Paralogy groups | Length | Adjusted p value | Thr content | pThr propensity
      K167R | K167R (NUC007) repeat | 1 | 111 | 7.1E-51 | 2.29 | 0.38
      Histone | Core histone H2A/H2B/H3/H4 | 14 | 72 | 1.7E-09 | 0.97 | 0.15
      Ubiquitin | Ubiquitin family | 13 | 65 | 4.2E-08 | 1.44 | 0.12
      GTP_EFTU_D2 | Elongation factor Tu domain 2 | 6 | 71 | 2.3E-02 | 1.33 | 0.10
      Tubulin_C | Tubulin C-terminal domain | 5 | 125 | 1.6E-03 | 1.30 | 0.09
      HSP70 | Hsp70 protein | 2 | 514 | 3.4E-09 | 1.27 | 0.08
      Pkinase | Protein kinase domain | 89 | 255 | 1.1E-72 | 0.87 | 0.08
      Tubulin | Tubulin/FtsZ family, GTPase domain | 5 | 212 | 4.3E-06 | 1.37 | 0.08
      Vinculin | Vinculin family | 1 | 351 | 1.4E-02 | 1.10 | 0.07
      Pkinase_Tyr | Protein tyrosine kinase | 25 | 260 | 1.5E-02 | 0.79 | 0.04
      B
      SPRYSPRY domain201173.72E-041.030.00
      RhoGEFRhoGEF domain201781.80E-030.810.00
      RabGAP-TBCRab-GTPase-TBC domain81982.78E-020.730.00
      SEASEA domain7968.64E-041.890.00
      HADHaloacid dehalogenase-like hydrolase34332.77E-021.340.00
      HydrolaseAnkyrin repeats (3 copies)6811.94E-090.910.00
      Hydrolase_3Homeobox domain1571.00E-031.370.00
      Ank_2Neurotransmitter-gated ion-channel transmembrane region661682.77E-021.390.00
      HomeoboxBTB/POZ domain401043.26E-020.900.00
      Neur_chan_membMyosin head (motor domain)96231.50E-020.980.01
      Table IIIA–B: Domains enriched or depleted in Tyr phosphorylation (p value < 0.05): domain types significantly (p value < 0.05) enriched or depleted in Tyr phosphorylation, sorted by pTyr propensity (for clarity only the first eight rows are shown; the full data are available in supplementary Table S2). See the legend of Table I for a description of the columns
      A
      NameDescriptionParalogy groupsLengthAdjusted p valueTyr contentpTyr propensity
      GlobinGlobin21046.02E-030.480.42
      Linker_histoneLinker histone H1 and H5 family3706.79E-030.910.35
      Tubulin_CTubulin C-terminal domain51259.60E-100.900.34
      Myosin_tail_1Myosin tail57722.07E-080.210.30
      Myosin_TH1Cofilin/tropomyosin-type actin-binding protein21205.87E-051.320.29
      Cofilin_ADFCore histone H2A/H2B/H3/H44721.21E-111.410.27
      HistoneUbiquitin family14651.63E-030.590.23
      UbiquitinProtein tyrosine kinase132604.63E-651.190.22
      B
      DUF1220Repeat of unknown function (DUF1220)1651.3E-031.360.00
      adh_shortShort chain dehydrogenase131624.7E-020.630.00
      HECTHECT-domain (ubiquitin-transferase)93053.7E-051.360.00
      Hormone_recepLigand-binding domain of nuclear hormone receptor111803.1E-020.740.00
      BTBBTB/POZ domain321042.1E-051.090.01
      SPRYSPRY domain201175.7E-041.470.01
      KinesinKinesin motor domain93231.9E-021.000.02
      UCHUbiquitin carboxyl-terminal hydrolase124173.7E-051.270.02
      Table IV. Abundance of the significantly enriched or depleted domain types or copies in the human proteome: This table details the proportion of all the domain types that are significantly enriched or depleted in phosphorylation (Significantly enriched domain types, Significantly depleted domain types). We also calculated the proportion of all the domain occurrences in the proteome that belong to an enriched or depleted type (Significantly enriched domains, Significantly depleted domains). Phospho propensity is the overall proportion of all Ser/Thr or Tyr that are phosphorylated
      Phospho residue | Phospho propensity | Significantly enriched domain types | Significantly depleted domain types | Significantly enriched domains | Significantly depleted domains
      P-Ser | 0.035 | 0.048 | 0.011 | 0.064 | 0.15
      P-Thr | 0.023 | 0.018 | 0.0035 | 0.050 | 0.078
      P-Tyr | 0.055 | 0.012 | 0.0026 | 0.066 | 0.033
      The different types of kinase domains represent a specific case that deserves to be discussed separately. These proteins often participate in signaling pathways where phosphorylation is used by upstream kinases to regulate the activity of downstream ones. Therefore, this protein family is both responsible for phosphorylation, and finely modulated by it. Furthermore, there is a clear tendency for these domains to be phosphorylated on the same type of residues for which they catalyze the reaction, especially for the Tyr-Kinase domain. Indeed, the Protein Kinase domain (which includes mostly Ser/Thr kinases) shows a significant enrichment for pSer and pThr, while it is not enriched for pTyr. Similarly, the Protein Tyrosine Kinase domain is significantly enriched for pTyr, but not for pSer and pThr.
      The domain showing the highest mean propensity across all the phospho-modifications is the Paxillin Family domain. This domain is found in adaptor proteins and the phosphosites act as docking sites for other proteins (
      • Schaller M.D.
      Paxillin: a focal adhesion-associated adaptor protein.
      ). Another interesting domain that has been described as highly modulated by phosphorylation is Core histone H2A/H2B/H3/H4, reflecting the role of phosphorylation in regulating the cellular response to DNA damage, histone turnover, chromatin architecture, and oncogenesis (
      • Musselman C.A.
      • Lalonde M.E.
      • Côté J.
      • Kutateladze T.G.
      Perceiving the epigenetic landscape through histone readers.
      ,
      • Singh R.K.
      • Gunjan A.
      Histone tyrosine phosphorylation comes of age.
      ). The Ubiquitin family domain is also massively targeted by phosphorylation on all three residue types. Phosphorylation has been reported to influence the degradation of proteins, preventing it via ubiquitination (
      • Ju D.
      • Xu H.
      • Wang X.
      • Xie Y.
      Ubiquitin-mediated degradation of Rpn4 is controlled by a phosphorylation-dependent ubiquitylation signal.
      ,
      • Lin H.K.
      • Wang L.
      • Hu Y.C.
      • Altuwaijri S.
      • Chang C.
      Phosphorylation-dependent ubiquitylation and degradation of androgen receptor by Akt require Mdm2 E3 ligase.
      ,
      • Welcker M.
      • Orian A.
      • Jin J.
      • Grim J.E.
      • Grim J.A.
      • Harper J.W.
      • Eisenman R.N.
      • Clurman B.E.
      The Fbw7 tumor suppressor regulates glycogen synthase kinase 3 phosphorylation-dependent c-Myc protein degradation.
      ,
      • Yada M.
      • Hatakeyama S.
      • Kamura T.
      • Nishiyama M.
      • Tsunematsu R.
      • Imaki H.
      • Ishida N.
      • Okumura F.
      • Nakayama K.
      • Nakayama K.I.
      Phosphorylation-dependent degradation of c-Myc is mediated by the F-box protein Fbw7.
      ). Further evidence of the cross-talk between ubiquitination and phosphorylation is provided by phosphodegrons, which are phosphorylation sites recognized by ubiquitin ligases. They serve as markers for the destruction of inhibitors of cyclin-dependent kinases at the initiation of DNA replication (
      • la Cova, de C.
      • Greenwald I.
      SEL-10/Fbw7-dependent negative feedback regulation of LIN-45/Braf signaling in C. elegans via a conserved phosphodegron.
      ,
      • Feldman R.M.
      • Correll C.C.
      • Kaplan K.B.
      • Deshaies R.J.
      A complex of Cdc4p, Skp1p, and Cdc53p/cullin catalyzes ubiquitination of the phosphorylated CDK inhibitor Sic1p.
      ,
      • Jackson P.K.
      Ubiquitinating a phosphorylated Cdk inhibitor on the blades of the Cdc4 beta-propeller.
      ,
      • Lyons N.A.
      • Fonslow B.R.
      • Diedrich J.K.
      • Yates J.R.
      • Morgan D.O.
      Sequential primed kinases create a damage-responsive phosphodegron on Eco1.
      ,
      • Rossi M.
      • Duan S.
      • Jeong Y.T.
      • Horn M.
      • Saraf A.
      • Florens L.
      • Washburn M.P.
      • Antebi A.
      • Pagano M.
      Regulation of the CRL4(Cdt2) ubiquitin ligase and cell-cycle exit by the SCF(Fbxo11) ubiquitin ligase.
      ,
      • Skowyra D.
      • Craig K.L.
      • Tyers M.
      • Elledge S.J.
      • Harper J.W.
      F-box proteins are receptors that recruit phosphorylated substrates to the SCF ubiquitin-ligase complex.
      ,
      • Minguez P.
      • Parca L.
      • Diella F.
      • Mende D.R.
      • Kumar R.
      • Helmer-Citterich M.
      • Gavin A.C.
      • van Noort V.
      • Bork P.
      Deciphering a global network of functionally associated post-translational modifications.
      ).
      To evaluate the number of different protein families in which a domain appears, we counted the paralogy groups (as opposed to the individual proteins) containing each domain. As shown by the size of the red bubbles in Figs. 1A–1C, many depleted domains are widespread in the proteome and occur in a large number of different families. Conversely, as one moves to regions of higher propensity, the domains are more restricted to specific protein families (small bubbles). A notable exception is the kinase domain. The y axis represents the fraction of domain instances that are phosphorylated. Interestingly, this fraction varies widely, even for domains with comparable propensities.
      Fig. 1. Relationship between the phosphorylation propensity of the domain (the proportion of Ser/Thr/Tyr that are phosphorylated) and the proportion of domain copies that are phosphorylated in at least one domain instance. Each circle represents a domain. The size of the point is proportional to the number of different paralogy groups in which a domain is found. A, pSer; B, pThr; C, pTyr. For clarity, only domains with at least ten occurrences in the proteome are shown.
      As stated above, our data set only contains proteins predicted to be intracellular. Accordingly, we do not find among the depleted domains those occurring predominantly in extracellular proteins or in the extracellular portion of membrane proteins (e.g. Cadherin, Fibronectin type III, EGF-like, and Immunoglobulin). On one hand, extracellular proteins are less phosphorylated in general. On the other hand, depending on the experimental setup, these sites might be completely left out of the analysis. For instance if the data have been collected in cell cultures and the medium is discarded, then obviously secreted proteins will not appear in the data. Recently a number of works have investigated the phosphoproteome of several body fluids (
      • Bahl J.M.
      • Jensen S.S.
      • Larsen M.R.
      • Heegaard N.H.
      Characterization of the human cerebrospinal fluid phosphoproteome by titanium dioxide affinity chromatography and mass spectrometry.
      ,
      • Carrascal M.
      • Gay M.
      • Ovelleiro D.
      • Casas V.
      • Gelpí E.
      • Abian J.
      Characterization of the human plasma phosphoproteome using linear ion trap mass spectrometry and multiple search engines.
      ,
      • Stone M.D.
      • Chen X.
      • McGowan T.
      • Bandhakavi S.
      • Cheng B.
      • Rhodus N.L.
      • Griffin T.J.
      Large-scale phosphoproteomics analysis of whole saliva reveals a distinct phosphorylation pattern.
      ), but certainly these sites have received much less attention than the ones in intracellular proteins.

      Inter Domain Regions Significantly Enriched or Depleted in Phosphorylation

      Eighty percent of pSer, 69% of pThr, and 54% of pTyr map outside protein domains (similar figures were reported in (
      • Hornbeck P.V.
      • Kornhauser J.M.
      • Tkachev S.
      • Zhang B.
      • Skrzypek E.
      • Murray B.
      • Latham V.
      • Sullivan M.
      PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse.
      )). For all three residue types the phosphosites are preferentially located in Inter Domain Regions (IDRs) compared with nonmodified residues of the same type (all Fisher's exact test p values < 2.2e-16). The preference is most pronounced for pSer, followed by pThr and then pTyr (data not shown).
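      The preference for IDRs can be tested on a 2x2 contingency table (phosphorylated versus nonmodified residues, inside versus outside IDRs). The following is a minimal sketch of a one-sided Fisher's exact test using only the Python standard library. The function name and the counts are hypothetical, and the one-sided alternative is an assumption, since the text does not state which alternative was used.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: phosphorylated residues in IDRs      b: phosphorylated residues in domains
    c: nonmodified residues in IDRs         d: nonmodified residues in domains
    Returns P(X >= a) under the hypergeometric null hypothesis.
    """
    row1 = a + b        # all phosphorylated residues
    col1 = a + c        # all residues located in IDRs
    n = a + b + c + d   # all residues of this type
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)

# Hypothetical counts, for illustration only (not taken from the data set).
p_value = fisher_exact_greater(80, 20, 500, 500)
print(f"one-sided p value: {p_value:.3g}")
```

      A small p value indicates that phosphorylated residues fall in IDRs more often than expected from the distribution of all residues of that type.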
      In order to include these sites in the analysis we analyzed the distribution of phosphorylations with respect to the identity of the two domains flanking the Inter Domain Region (IDR). Thus, similarly to what we did for sites located in domains, we determined whether each IDR is enriched or depleted in phosphorylation. In defining the IDR, we did not take into account the ordering of the two domains, as this would excessively reduce the cardinality of each case. As we did with domains, we excluded from the analysis extracellular IDRs.
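      The order-independent definition of an IDR type can be sketched as follows. This is an illustrative example, not the authors' implementation: the function names and the input tuple layout are hypothetical. The propensity follows the definition used for IDRs in this work, the fraction of phosphorylatable residues that are phosphorylated, pooled over all instances of the same IDR type.

```python
from collections import defaultdict

def idr_key(domain_a, domain_b):
    """Order-independent identifier for the IDR between two flanking domains."""
    return tuple(sorted((domain_a, domain_b)))

def idr_propensities(instances):
    """Pool IDR instances by flanking-domain pair and compute their propensity.

    instances: iterable of (domain_a, domain_b, n_phosphorylated, n_phosphorylatable)
    """
    totals = defaultdict(lambda: [0, 0])
    for dom_a, dom_b, n_phos, n_sites in instances:
        key = idr_key(dom_a, dom_b)
        totals[key][0] += n_phos
        totals[key][1] += n_sites
    return {key: phos / sites for key, (phos, sites) in totals.items() if sites > 0}

# Hypothetical instances: SH3-SH2 and SH2-SH3 are counted as the same IDR type.
example = [("SH3", "SH2", 2, 10), ("SH2", "SH3", 1, 10), ("PH", "PH", 0, 5)]
propensities = idr_propensities(example)
```

      Ignoring the domain order pools more instances per IDR type, which is the reason given above for this choice.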
      There are 179 IDRs enriched in pSer and 76 depleted, whereas for pThr, 59 are enriched and 11 are depleted. For pTyr there are 39 enriched IDRs and only five depleted, consistent with the observation that pTyr is less often found in IDRs than pSer/pThr (Tables V–VII).
      Table VIIA–B: Interdomain Regions enriched or depleted in Tyr phosphorylation (p value < 0.05): IDR types significantly (p value < 0.05) enriched (Table VIIA)/depleted (Table VIIB) in Tyr phosphorylation, sorted by pTyr propensity (for clarity only the first four rows are shown; the full data are available in supplementary Table S3). See the legend of Table V for a description of the columns
      A
      Name 1 | Name 2 | Domain 1 | Domain 2 | Average length | Adjusted p value | Tyr content | pTyr propensity
      SH2 | SH2 | SH2 domain | SH2 domain | 160 | 3.75E-10 | 2.31 | 0.58
      SH3 | SH3_2 | SH3 domain | Variant SH3 domain | 118 | 9.89E-03 | 0.66 | 0.47
      PH | SH2 | PH domain | SH2 domain | 143 | 2.75E-02 | 0.86 | 0.46
      Pkinase | CNH | Protein kinase domain | CNH domain | 502 | 9.98E-09 | 0.76 | 0.38
      B
      SEA | SEA | SEA domain | SEA domain | 105 | 5.5E-03 | 1.50 | 0.00
      zf-H2C2_2 | zf-H2C2_2 | Zinc-finger double domain | Zinc-finger double domain | 194 | 4.7E-02 | 0.93 | 0.01
      KRAB | zf-H2C2_2 | KRAB box | Zinc-finger double domain | 153 | 8.5E-09 | 1.27 | 0.02
      WD40 | WD40 | WD domain, G-beta repeat | WD domain, G-beta repeat | 135 | 1.4E-03 | 1.21 | 0.02
      Table VIA–B: Interdomain Regions enriched or depleted in Thr phosphorylation (p value < 0.05): IDR types significantly (p value < 0.05) enriched (Table VIA)/depleted (Table VIB) in Thr phosphorylation, sorted by pThr propensity (for clarity only the first six rows are shown; the full data are available in supplementary Table S3). See the legend of Table V for a description of the columns
      A
      Name 1 | Name 2 | Domain 1 | Domain 2 | Average length | Adjusted p value | Thr content | pThr propensity
      TPR_2 | TPR_1 | Tetratricopeptide repeat | Tetratricopeptide repeat | 186 | 8.08E-07 | 1.26 | 0.26
      Ran_BP1 | GRIP | RanBP1 domain | GRIP domain | 240 | 5.82E-04 | 1.42 | 0.16
      PBD | Pkinase | P21-Rho-binding domain | Protein kinase domain | 233 | 2.11E-03 | 1.22 | 0.15
      GTF2I | GTF2I | GTF2I-like repeat | GTF2I-like repeat | 121 | 3.84E-02 | 1.12 | 0.14
      BAR | SH3_2 | BAR domain | Variant SH3 domain | 167 | 3.83E-02 | 1.66 | 0.13
      Pkinase | CNH | Protein kinase domain | CNH domain | 502 | 5.82E-04 | 0.80 | 0.12
      B
      NACHT | LRR_6 | NACHT domain | Leucine Rich repeat | 420 | 2.00E-03 | 0.92 | 0.00
      SEA | SEA | SEA domain | SEA domain | 105 | 6.34E-09 | 2.90 | 0.00
      KRAB | zf-H2C2_2 | KRAB box | Zinc-finger double domain | 153 | 1.54E-19 | 0.94 | 0.00
      KRAB | zf-C2H2 | KRAB box | Zinc finger, C2H2 type | 150 | 4.88E-02 | 0.88 | 0.00
      Ion_trans | Ion_trans | Ion transport protein | Ion transport protein | 183 | 2.89E-02 | 0.87 | 0.01
      WD40 | WD40 | WD domain, G-beta repeat | WD domain, G-beta repeat | 135 | 7.72E-07 | 1.16 | 0.01
      Table VA–B: Interdomain Regions enriched or depleted in Ser phosphorylation (p value < 0.05): IDR types significantly (p value < 0.05) enriched (Table VA)/depleted (Table VB) in Ser phosphorylation, sorted by pSer propensity (for clarity only the first 15 rows are shown; the full data are available in supplementary Table S3). Average Length is the average length of the IDR instances. Adjusted p value is the p value for the Fisher test (see Methods), after False Discovery Rate correction. Ser Content is the proportion of serine residues in the alignment. pSer Propensity is the proportion of serines in IDR occurrences that are actually phosphorylated
      A
      Name 1 | Name 2 | Description 1 | Description 2 | Average length | Adjusted p value | Ser content | pSer propensity
      C1_1 | Pkinase | Phorbol esters/diacylglycerol binding domain (C1 domain) | Protein kinase domain | 78 | 4.84E-12 | 1.19 | 0.52
      PH | PH | PH domain | PH domain | 106 | 1.03E-07 | 1.44 | 0.25
      Pkinase | CNH | Protein kinase domain | CNH domain | 502 | 1.87E-23 | 1.03 | 0.25
      PHD | Bromodomain | PHD-finger | Bromodomain | 105 | 5.75E-05 | 1.46 | 0.24
      CAP_GLY | CAP_GLY | CAP-Gly domain | CAP-Gly domain | 102 | 6.56E-06 | 2.12 | 0.23
      Ran_BP1 | Ran_BP1 | RanBP1 domain | RanBP1 domain | 176 | 6.73E-06 | 1.21 | 0.23
      TPR_1 | TPR_2 | Tetratricopeptide repeat | Tetratricopeptide repeat | 186 | 6.05E-04 | 0.88 | 0.22
      dsrm | dsrm | Double-stranded RNA binding motif | Double-stranded RNA binding motif | 79 | 2.67E-02 | 0.87 | 0.22
      Bromodomain | Bromodomain | Bromodomain | Bromodomain | 130 | 5.80E-07 | 1.00 | 0.22
      WW | WW | WW domain | WW domain | 157 | 2.22E-05 | 1.16 | 0.21
      CH | LIM | Calponin homology (CH) domain | LIM domain | 270 | 2.08E-05 | 1.26 | 0.18
      BAR | SH3_2 | BAR domain | Variant SH3 domain | 167 | 1.34E-02 | 0.93 | 0.18
      GTF2I | GTF2I | GTF2I-like repeat | GTF2I-like repeat | 121 | 7.09E-03 | 1.19 | 0.17
      KH_1 | KH_1 | KH domain | KH domain | 115 | 2.58E-03 | 1.02 | 0.17
      C1_1 | MA3 | MIF4G domain | MA3 domain | 188 | 3.58E-02 | 0.91 | 0.16
      B
      DUF1053 | Guanylate_cyc | Domain of Unknown Function (DUF1053) | Adenylate and Guanylate cyclase catalytic domain | 226 | 6.00E-03 | 0.67 | 0
      NACHT | LRR_6 | NACHT domain | Leucine Rich repeat | 420 | 7.48E-12 | 0.70 | 0
      PLAT | PKD_channel | PLAT/LH2 domain | Polycystin cation channel | 500 | 9.09E-06 | 0.97 | 0
      Na_Ca_ex | Na_Ca_ex | Sodium/calcium exchanger protein | Sodium/calcium exchanger protein | 223 | 3.35E-02 | 0.72 | 0
      zf-C2H2 | zf-C2H2_6 | Zinc finger, C2H2 type | C2H2-type zinc finger | 129 | 2.39E-02 | 0.88 | 0
      zf-C3HC4 | SPRY | Zinc finger, C3HC4 type (RING finger) | SPRY domain | 242 | 3.14E-02 | 0.75 | 0
      Ank_2 | Ank_2 | Ankyrin repeats (3 copies) | Ankyrin repeats (3 copies) | 117 | 3.15E-03 | 0.79 | 0
      PYRIN | NACHT | PAAD/DAPIN/Pyrin domain | NACHT domain | 105 | 3.28E-02 | 0.64 | 0
      Homeobox | Homeobox | Homeobox domain | Homeobox domain | 155 | 1.13E-03 | 1.16 | 0
      Calx-beta | Calx-beta | Calx-beta domain | Calx-beta domain | 298 | 2.54E-05 | 0.82 | 0
      SEA | SEA | SEA domain | SEA domain | 105 | 4.15E-16 | 1.23 | 0
      WW | HECT | WW domain | HECT-domain (ubiquitin-transferase) | 152 | 2.39E-02 | 0.74 | 0
      LRR_6 | LRR_6 | Leucine Rich repeat | Leucine Rich repeat | 119 | 3.60E-05 | 1.09 | 0
      Beach | WD40 | Beige/BEACH domain | WD domain, G-beta repeat | 144 | 8.75E-03 | 1.08 | 0
      DUF1220 | DUF1220 | Repeat of unknown function (DUF1220) | Repeat of unknown function (DUF1220) | 195 | 3.05E-03 | 0.68 | 0
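      The adjusted p values in Tables V to VII are Fisher test p values after False Discovery Rate correction. The section does not name the correction procedure, so the sketch below assumes the widely used Benjamini-Hochberg method. The function name is hypothetical and the input values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values for four IDR types.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

      An IDR type would then be reported as enriched or depleted only if its adjusted value falls below the 0.05 threshold used in the tables.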
      Interestingly, almost all pTyr-enriched IDRs involve at least one protein-protein interaction domain (SH2, SH3, WW, PDZ, PX, etc.). These are very likely to be high-density regions where multiple signals are integrated.
      Eleven IDRs are simultaneously enriched for pSer, pThr, and pTyr. The IDR with the highest phosphopropensity for all phospho-modifications is flanked by the domains DNA gyrase/topoisomerase IV, subunit A, and DTHCT (NUC029) region.
      Fig. 2 shows the propensity of significantly enriched or depleted IDRs, together with the propensities of the flanking domains (for clarity, only IDRs with at least 10 occurrences are shown; moreover, the figure does not include regions between a domain and the N or C terminus of the protein).
      Fig. 2. Phosphorylation propensity of IDRs and their flanking domains. The IDR propensity is calculated as the fraction of phosphorylated residues out of all the phosphorylatable residues in each specific IDR. The color indicates the propensity of the IDR. The figure is symmetric in color because the propensity does not take into account the ordering of the two domains. Once a point is located, the size of the plotting symbol shows the propensity of the domain on the x axis. By moving to the symmetric point along the diagonal, it is possible to determine the propensity of the other domain defining the IDR. (a): pSer, (b): pThr, (c): pTyr. For clarity, only significant IDRs (p value < 0.05) with at least ten occurrences are shown for pSer, whereas for pThr and pTyr only IDRs with at least five occurrences are shown. Moreover, the plot does not contain IDRs where one of the two domains is the N or C terminus.
      There are a number of cases where the propensity of the IDR differs from that of the flanking domains. These are represented by small blue dots, indicating low propensity of the flanking domain and high propensity of the IDR. For instance, even though the SH3 domain has a low domain propensity, the IDRs that combine SH3 with Spectrin and with SH2 have a very high IDR propensity for pSer and pTyr, respectively, suggesting that IDRs flanking the SH3 domain are highly modulated by phosphorylation.

      Using Domain Information for the Improvement of Phosphosite Prediction

      Following the observation that phosphorylation is influenced by the domain context, we next tested whether this information could be used to improve the prediction of phosphosites. The rationale for this is that it would seem desirable to assign a higher score to sites predicted in a domain that is enriched in phosphorylation and conversely reduce the score of those predicted in a depleted domain.
      We used a machine-learning approach based on Support Vector Machines (SVM) to build three predictors, one each for pSer, pThr, and pTyr.
      For each residue type we built two predictors, one including only the sequence around the phosphosite (in standard orthogonal binary encoding), and the other also including the phosphorylation propensity of the domain or IDR. For Ser, the predictor with all the features obtained an AUC of 0.72, 2% higher than the sequence-only predictor (see Table VIII). For Tyr the inclusion of the domain features affords an improvement of 7%, reaching an AUC of 0.66. For Thr we observe an improvement of 4%, reaching 0.72. Note that the identity of the domain is not included in any way, as only the propensity was given as input to the SVM.
      Table VIII. Predictor performances: performance of the pSer/pThr and pTyr predictors when using the sequence only, or including the additional domain feature
      Res | Sequence | Sequence + domain context
      Y | 0.58 | 0.66
      T | 0.68 | 0.72
      S | 0.70 | 0.73
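      The two feature sets compared in Table VIII can be illustrated with a short sketch. Orthogonal binary encoding maps each residue in the window to a 20-element vector with a single 1, and the domain or IDR propensity is appended as one extra number. The window half-width (`flank`), the function name, and the example sequence are assumptions made for illustration; the section does not specify them, and the SVM training step itself is omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_site(sequence, position, flank=7, propensity=None):
    """Orthogonal binary (one-hot) encoding of the window around a candidate site.

    position is 0-based. Window positions outside the sequence, or residues
    outside the 20 standard amino acids, are encoded as all zeros.
    If propensity is given, it is appended as a single extra feature.
    """
    features = []
    for i in range(position - flank, position + flank + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= i < len(sequence):
            idx = AMINO_ACIDS.find(sequence[i])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    if propensity is not None:
        features.append(propensity)
    return features

# Hypothetical peptide and propensity value.
sequence_only = encode_site("MKSPT", 2, flank=1)
with_domain = encode_site("MKSPT", 2, flank=1, propensity=0.25)
```

      Because only a single propensity value is added, the encoding carries no information about which domain the site belongs to, in line with the design choice described above.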
      Clearly more elaborate encoding schemes are possible, based for instance on the domain signature of the protein (
      • Liu Y.
      • Tozeren A.
      Modular composition predicts kinase/substrate interactions.
      ). However, such an encoding would increasingly bias the predictor toward recognizing specific families of proteins (defined by their domain signatures), thus providing an unfair advantage. Moreover, our encoding is general enough to be applied to any protein, irrespective of whether its domain composition is unique or whether any domain is present at all.
      The reason for this improvement is that the two sources of information, the sequence of the peptide and the domain propensity, are completely independent, yet both are related to the probability of a site being phosphorylated.
      We want to stress that the data set does not include extracellular proteins, as we filtered out predicted extracellular domains. Therefore, the improvement afforded by the domain information is not trivially due to the fact that the predictor is down-scoring extracellular proteins.

      Conservation of Phosphorylation Sites in Different Instances of the Same Domain

      Different domains vary considerably in the conservation of their phosphorylation sites. Both for pSer/Thr and for pTyr, a number of domains have a very small number of highly conserved phosphorylation sites. The fact that several of these domains have extremely low propensities means that, even though the alignment column is conserved, only a small number of residues are actually phosphorylated (at least in the conditions tested in the experiments from which our data set is derived). Therefore, these residues represent either cases where the phosphorylation has a functional effect that is specific to a limited number of the proteins containing the domain, or possibly nonfunctional phosphorylation events.
      We analyzed the proportion of phosphorylatable residues that are actually phosphorylated in the alignment columns containing at least one phosphosite and at least ten phosphorylatable residues. Interestingly, for a large share of columns (77% for Ser, 81% for Thr, and 67% for Tyr) this proportion is less than 10%. This is partly because, by aligning all the copies of a given domain in the human proteome, we are comparing domain instances located in proteins with different functions and different regulation. We therefore repeated the analysis by grouping together domains contained in proteins belonging to the same family (see Methods). The proportion of sites with a phosphorylated/phosphorylatable ratio below 10% decreases, reaching 48% for Ser, 56% for Thr, and 27% for Tyr. However, these figures still represent a serious caveat against inferring that a site is phosphorylated simply because the same site is phosphorylated in another domain of the same family. Moreover, these results indicate that, inside protein domains, Tyr phosphorylation is more conserved than Ser/Thr phosphorylation.
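      The column-level statistic described above can be sketched as follows. Each alignment column is summarized by two counts, and the filters (at least one phosphosite, at least ten phosphorylatable residues) and the 10% threshold are taken from the text. The function name and the example counts are hypothetical.

```python
def low_conservation_fraction(columns, min_residues=10, threshold=0.10):
    """Fraction of eligible alignment columns whose phosphorylated share is below threshold.

    columns: list of (n_phosphorylated, n_phosphorylatable) per alignment column.
    A column is eligible if it has at least one phosphosite and at least
    min_residues phosphorylatable residues. Returns None if no column is eligible.
    """
    ratios = [
        phos / total
        for phos, total in columns
        if phos >= 1 and total >= min_residues
    ]
    if not ratios:
        return None
    return sum(ratio < threshold for ratio in ratios) / len(ratios)

# Hypothetical columns: (0, 30) has no phosphosite and (1, 5) has too few residues.
columns = [(1, 20), (5, 10), (0, 30), (1, 5), (2, 40)]
fraction = low_conservation_fraction(columns)
```

      In this toy input, three columns are eligible and two of them fall below the 10% threshold. Repeating the calculation on alignments restricted to one protein family corresponds to the second analysis described above.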

      Phosphorylation and Protein-protein Interfaces

      A number of reports have shown that phosphorylation sites are often not conserved in position, although sometimes different phosphorylation sites are clustered in the same region of the alignment (
      • Tan C.S.H.
      • Bodenmiller B.
      • Pasculescu A.
      • Jovanovic M.
      • Hengartner M.O.
      • Jørgensen C.
      • Bader G.D.
      • Aebersold R.
      • Pawson T.
      • Linding R.
      Comparative analysis reveals conserved protein phosphorylation networks implicated in multiple diseases.
      ). In these cases, the exact position of the phosphorylation site may not be important as long as the same region of the protein is phosphorylated. It has been proposed that this phenomenon preferentially occurs at protein-protein interfaces, where phosphorylation of any residue in a given surface region may regulate the formation of the complex (
      • Serber Z.
      • Ferrell J.E.
      Tuning bulk electrostatics to regulate protein function.
      ). Accordingly Tan et al. (
      • Tan C.S.
      • Jørgensen C.
      • Linding R.
      Roles of “junk phosphorylation” in modulating biomolecular association of phosphorylated proteins?.
      ) observed that proteins displaying this pattern of phosphosite conservation are enriched in protein- and DNA-binding annotations and frequently interact with other proteins. We therefore set out to verify this hypothesis with our data set. We mapped all the phosphosites of a domain on a reference domain structure and clustered the sites according to the geodesic distance between the residues on the surface of the protein (i.e. the distance “walking” along the surface and not “cutting” through it).
      Each cluster therefore represents a set of phosphosite positions in the domain that are located in the same surface region, although not necessarily close in sequence.
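      The idea of clustering by surface distance can be illustrated with a minimal sketch. It makes several assumptions that are not stated in this section: the protein surface is represented as a weighted graph of neighboring surface points, the geodesic distance is approximated by the shortest path on that graph (Dijkstra's algorithm), and sites are grouped by single linkage with a hypothetical distance cutoff. The clustering method actually used may differ.

```python
import heapq

def geodesic_distances(graph, source):
    """Shortest-path lengths from source over a surface graph {node: {neighbor: length}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in graph[node].items():
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

def cluster_sites(graph, sites, cutoff):
    """Single-linkage clustering: sites within cutoff along the surface are merged."""
    parent = {site: site for site in sites}

    def find(site):
        while parent[site] != site:
            parent[site] = parent[parent[site]]  # path compression
            site = parent[site]
        return site

    for site in sites:
        dist = geodesic_distances(graph, site)
        for other in sites:
            if other != site and dist.get(other, float("inf")) <= cutoff:
                parent[find(site)] = find(other)
    clusters = {}
    for site in sites:
        clusters.setdefault(find(site), set()).add(site)
    return sorted(sorted(cluster) for cluster in clusters.values())

# Toy surface: a chain of five points with one long edge between C and D.
surface = {
    "A": {"B": 3.0},
    "B": {"A": 3.0, "C": 3.0},
    "C": {"B": 3.0, "D": 20.0},
    "D": {"C": 20.0, "E": 3.0},
    "E": {"D": 3.0},
}
clusters = cluster_sites(surface, ["A", "C", "E"], cutoff=8.0)
```

      In the toy example, sites A and C lie 6 units apart along the surface and are merged, while E is 23 units from C and stays on its own.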
      We next calculated for each cluster of phosphosites the average conservation and the proportion of phosphorylation sites that are located in a protein-protein interface. Interestingly, we found a negative correlation between these two variables so that clusters of phosphosites localized in interface regions tend to be less conserved (Kendall's correlation test p < 9.4e-9, Wilcoxon test between the two extreme bins in Fig. 3 p < 9.3e-8).
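      The relationship between the two per-cluster variables can be explored with a rank correlation. The sketch below computes Kendall's tau-b in plain Python and assigns interface propensities to five equally spaced bins, as in Fig. 3. The function names and the example values are hypothetical, and the p value calculation is omitted.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equally long sequences."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied_pairs = concordant + discordant
    denom = sqrt((untied_pairs + tied_x_only) * (untied_pairs + tied_y_only))
    return (concordant - discordant) / denom if denom else 0.0

def interface_bin(propensity, n_bins=5):
    """Index of the equally spaced bin on [0, 1] that contains propensity."""
    return min(int(propensity * n_bins), n_bins - 1)

# Hypothetical clusters: interface propensity and average phosphosite conservation.
interface = [0.1, 0.3, 0.5, 0.7, 0.9]
conservation = [0.9, 0.8, 0.6, 0.4, 0.2]
tau = kendall_tau_b(interface, conservation)
```

      A negative tau, as in this made-up example, corresponds to the trend reported above: clusters with a larger share of interface residues tend to have less conserved phosphosites.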
      Fig. 3. Relationship between interface propensity (the proportion of interface residues for each structural cluster of phosphorylation sites) and the average phosphosite conservation. Interface propensity was binned in five equally spaced intervals. Points indicate outliers.
      This observation provides direct and independent evidence in support of the hypothesis that clusters of nonpositionally conserved phosphosites modulate protein-protein interactions. Fig. 4 shows four examples of surface clusters of poorly conserved phosphoresidues that have a good overlap with protein-protein interface regions. A visual inspection of the alignments shows how distant in sequence phosphosites belonging to the same surface cluster can be, which clearly precludes the identification of these cases by sequence analysis only.
      Fig. 4. Example structures showing the overlap between clusters of poorly conserved phosphorylation sites and protein-protein interface regions. All the domains are represented as white molecular surfaces with phosphosites colored according to their original cluster. The interacting structures are represented as transparent ribbons. The phosphosites in the domain alignment are highlighted with the same color used in the structure representation. A, Variant SH3 domain in a homodimeric complex (PDB: 3a98). B, APOBEC-like N-terminal domain from Probable C->U-editing enzyme APOBEC-2 in a homotetrameric complex (PDB: 2nyt). C, SH2 domain (PDB: 1fyr). D, RNA recognition motif (PDB: 3k0j).
      We briefly discuss four examples of phosphorylation sites that are not positionally conserved, but cluster together on the domain structure and are also located in protein-protein interaction interfaces. The first example involves the Variant SH3 domain, a signaling module involved in domain-ligand interactions. We mapped the phosphosite clusters to the Variant SH3 domain of the protein DOCK2 in a structure that describes its interaction with ELMO1 (
      • Hanawa-Suetsugu K.
      • Kukimoto-Niino M.
      • Mishima-Tsumagari C.
      • Akasaka R.
      • Ohsawa N.
      • Sekine S.
      • Ito T.
      • Tochio N.
      • Koshiba S.
      • Kigawa T.
      • Terada T.
      • Shirouzu M.
      • Nishikimi A.
      • Uruno T.
      • Katakai T.
      • Kinashi T.
      • Kohda D.
      • Fukui Y.
      • Yokoyama S.
      Structural basis for mutual relief of the Rac guanine nucleotide exchange factor DOCK2 and its partner ELMO1 from their autoinhibited forms.
      ). Fig. 4A shows the remarkable overlap between the phospho-cluster in orange and the surface region of DOCK2 that interacts with the C-terminal Proline-rich region of ELMO1.
      The family of apolipoprotein B messenger RNA-editing enzyme catalytic (APOBEC) proteins deaminates mRNA and single-stranded DNA (
      • Conticello S.G.
      • Thomas C.J.F.
      • Petersen-Mahrt S.K.
      • Neuberger M.S.
      Evolution of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases.
      ) (see Fig. 4B). Ser38 of Activation-induced cytidine deaminase (Q9GZX7) (
      • Pham P.
      • Smolka M.B.
      • Calabrese P.
      • Landolph A.
      • Zhang K.
      • Zhou H.
      • Goodman M.F.
      Impact of phosphorylation and phosphorylation-null mutants on the activity and deamination specificity of activation-induced cytidine deaminase.
      ) and Ser47/72 of APOBEC-1 (P41238) (
      • Chen Z.
      • Eggerman T.L.
      • Patterson A.P.
      Phosphorylation is a regulatory mechanism in apolipoprotein B mRNA editing.
      ) have been characterized as modulators of the enzymatic activity of the respective proteins. The interface shown in the picture is from the structure of APOBEC-2 (
      • Prochnow C.
      • Bransteitter R.
      • Klein M.G.
      • Goodman M.F.
      • Chen X.S.
      The APOBEC-2 crystal structure and functional implications for the deaminase AID.
      ), for which no phosphosites are present in our data set. The structure is a homotetramer and many other APOBEC enzymes have been reported to form multimers. Interestingly this raises the possibility that the phosphosites in this surface cluster may modulate the activity of these proteins by affecting their oligomerization, even though they are not positionally conserved.
      Phosphorylation of the SH2 domain can tune its affinity for phosphotyrosine substrates, and can also affect the localization of SH2-containing proteins (
      • Comb W.C.
      • Hutti J.E.
      • Cogswell P.
      • Cantley L.C.
      • Baldwin A.S.
      p85α SH2 domain phosphorylation by IKK promotes feedback inhibition of PI3K and Akt in response to cellular starvation.
      ,
      • Couture C.
      • Songyang Z.
      • Jascur T.
      • Williams S.
      • Tailor P.
      • Cantley L.C.
      • Mustelin T.
      Regulation of the Lck SH2 domain by tyrosine phosphorylation.
      ,
      • Huang H.
      • Li L.
      • Wu C.
      • Schibli D.
      • Colwill K.
      • Ma S.
      • Li C.
      • Roy P.
      • Ho K.
      • Songyang Z.
      • Pawson T.
      • Gao Y.
      • Li S.S.-C.
      Defining the specificity space of the human SRC homology 2 domain.
      ,
      • Kaneko T.
      • Huang H.
      • Zhao B.
      • Li L.
      • Liu H.
      • Voss C.K.
      • Wu C.
      • Schiller M.R.
      • Li S.S.C.
      Loops govern SH2 domain specificity by controlling access to binding pockets.
      ,
      • Stover D.R.
      • Furet P.
      • Lydon N.B.
      Modulation of the SH2 binding specificity and kinase activity of Src by tyrosine phosphorylation within its SH2 domain.
      ). Fig. 4C shows different phospho-clusters mapped on the surface of Grb2 in a homodimeric complex.
      The last example (Fig. 4D) involves the RNA recognition motif domain, here mapped on the structure of the U1 small nuclear ribonucleoprotein A in complex with the E. coli ThiM riboswitch (
      • Kulshina N.
      • Edwards T.E.
      • Ferré-D'Amaré A.R.
      Thermodynamic analysis of ligand binding and ligand binding-induced tertiary structure formation by the thiamine pyrophosphate riboswitch.
      ). The different phospho-clusters map to distinct interaction surfaces and they may modulate the affinity of the protein for RNA as well as other proteins.

      CONCLUSIONS

      In this work, we provide a proteome-wide assessment of the relationship between protein domains and phosphorylation in the human intracellular proteome. Our results show that 7% of the domain types in the proteome are significantly enriched or depleted in phosphorylation. Interestingly, a number of these domains, such as Ankyrin repeats, zinc fingers, and WD repeats, constitute a significant fraction of all the domain instances in the human proteome.
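      As an illustration of what "significantly enriched or depleted" can mean for a single domain type, the sketch below compares the number of phosphorylated residues in a domain type against a background rate using an exact two-sided binomial test. The choice of test, the function name `binomial_two_sided_p`, and all numbers are assumptions made for this example; this section does not state which test was actually used, and a real analysis over many domain types would also need a multiple-testing correction.

```python
from math import comb

def binomial_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value.

    k: observed phosphorylated residues in the domain type (hypothetical count)
    n: phosphorylatable residues in the domain type
    p: background phosphorylation rate
    Sums the probability of every outcome that is no more likely than k.
    Intended for small n only (large n can overflow float conversion).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that ties with the observed outcome are included.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, total)

# Made-up numbers: 30 of 100 residues phosphorylated, background rate 0.15.
p_value = binomial_two_sided_p(30, 100, 0.15)
direction = "enriched" if 30 / 100 > 0.15 else "depleted"
print(direction, p_value)
```

      A small p-value together with the direction of the deviation would flag the domain type as enriched or depleted relative to the chosen background.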
      We showed that information about the domain composition of a protein, and about the specific domain or IDR in which a putative phosphosite is located, can be used to improve the prediction of phosphorylation sites. This information was coded as a propensity value, defined as the proportion of domains or IDRs of each type that are phosphorylated in the training set. We achieved a 2%, 4%, and 7% improvement in the prediction of pSer, pThr, and pTyr, respectively, when compared with a predictor using sequence information only. This improvement is comparable to those reported in other studies including features such as conservation, secondary structure, disorder and local amino acid composition (
      • Gao J.
      • Thelen J.J.
      • Dunker A.K.
      • Xu D.
      Musite: a tool for global prediction of general and kinase-specific phosphorylation sites.
      ,
      • Gnad F.
      • Ren S.
      • Cox J.
      • Olsen J.V.
      • Macek B.
      • Oroshi M.
      • Mann M.
      PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites.
      ,
      • Iakoucheva L.M.
      • Radivojac P.
      • Brown C.J.
      • O'Connor T.R.
      • Sikes J.G.
      • Obradovic Z.
      • Dunker A.K.
      The importance of intrinsic disorder for protein phosphorylation.
      ,
      • Palmeri A.
      • Gherardini P.F.
      • Tsigankov P.
      • Ausiello G.
      • Späth G.F.
      • Zilberstein D.
      • Helmer-Citterich M.
      PhosTryp: a phosphorylation site predictor specific for parasitic protozoa of the family trypanosomatidae.
      ). Importantly, the domain propensity value we use represents orthogonal information, whereas features such as disorder and secondary structure are already quite effectively captured in the sequence data. Our method does not explicitly encode the domain composition of the protein, which would bias the predictor too much toward the recognition of known examples, and is general enough to be applied to any protein.
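      The propensity value described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the input layout (one `(region_type, is_phosphorylated)` pair per domain or IDR instance), the function names, the default value for unseen region types, and the toy data are all hypothetical.

```python
from collections import defaultdict

def region_propensities(training_regions):
    """Proportion of instances of each region type that are phosphorylated.

    training_regions: iterable of (region_type, is_phosphorylated) pairs,
    one pair per domain or IDR instance in the training set (assumed layout).
    """
    totals = defaultdict(int)
    phosphorylated = defaultdict(int)
    for region_type, is_phosphorylated in training_regions:
        totals[region_type] += 1
        if is_phosphorylated:
            phosphorylated[region_type] += 1
    return {t: phosphorylated[t] / totals[t] for t in totals}

def site_feature(region_type, propensities, default=0.0):
    """Feature for a candidate site: the propensity of the region it lies in.

    The default for region types absent from the training set is an assumption.
    """
    return propensities.get(region_type, default)

# Made-up training instances, for illustration only.
toy_training = [
    ("SH2", True), ("SH2", True), ("SH2", False),
    ("WD40", False), ("WD40", False), ("WD40", True),
    ("IDR", True), ("IDR", True),
]
propensities = region_propensities(toy_training)
print(site_feature("SH2", propensities))
```

      Because the feature depends only on the type of region that hosts the candidate site, it can be computed for any protein whose domains and IDRs have been annotated.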
      We also used our data set of domain alignments to study the conservation of phosphorylation sites. There are conflicting reports in the literature about this issue, with some authors reporting phosphorylation sites as more conserved (
      • Minguez P.
      • Parca L.
      • Diella F.
      • Mende D.R.
      • Kumar R.
      • Helmer-Citterich M.
      • Gavin A.C.
      • van Noort V.
      • Bork P.
      Deciphering a global network of functionally associated post-translational modifications.
      ,
      • Malik R.
      • Nigg E.A.
      • Körner R.
      Comparative conservation analysis of the human mitotic phosphoproteome.
      ), or not (
      • Landry C.R.
      • Levy E.D.
      • Michnick S.W.
      Weak functional constraints on phosphoproteomes.
      ,
      • Beltrao P.
      • Albanèse V.
      • Kenner L.R.
      • Swaney D.L.
      • Burlingame A.
      • Villén J.
      • Lim W.A.
      • Fraser J.S.
      • Frydman J.
      • Krogan N.J.
      Systematic functional prioritization of protein posttranslational modifications.
      ,
      • Jiménez J.L.
      • Hegemann B.
      • Hutchins J.R.
      • Peters J.M.
      • Durbin R.
      A systematic comparative and structural analysis of protein phosphorylation sites based on the mtcPTM database.
      ) than corresponding nonmodified residues. The issue is undoubtedly confounded by the different criteria used to score conservation and also by the over-representation of phosphorylation sites in disordered regions. However, when the analysis is restricted to sites that are likely to be functional, a conservation signal clearly emerges (
      • Landry C.R.
      • Levy E.D.
      • Michnick S.W.
      Weak functional constraints on phosphoproteomes.
      ,
      • Nguyen Ba A.N.
      • Moses A.M.
      Evolution of characterized phosphorylation sites in budding yeast.
      ,
      • Beltrao P.
      • Albanèse V.
      • Kenner L.R.
      • Swaney D.L.
      • Burlingame A.
      • Villén J.
      • Lim W.A.
      • Fraser J.S.
      • Frydman J.
      • Krogan N.J.
      Systematic functional prioritization of protein posttranslational modifications.
      ). These considerations notwithstanding, the possibility that nonconserved sites represent species-specific differences in regulation cannot be ruled out. Whatever the answer to this question, the inclusion of conservation provides only a very modest increment in phosphorylation site prediction (
      • Gnad F.
      • Ren S.
      • Cox J.
      • Olsen J.V.
      • Macek B.
      • Oroshi M.
      • Mann M.
      PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites.
      ). This could be explained by the fact that, in testing over a complete data set, one is also trying to predict nonconserved, possibly nonfunctional phosphorylation sites.
      By augmenting domain alignments with structural information, we were able to provide novel and direct evidence for the notion that phosphorylation sites that regulate protein-protein interfaces need not be positionally conserved (
      • Tan C.S.
      • Jørgensen C.
      • Linding R.
      Roles of “junk phosphorylation” in modulating biomolecular association of phosphorylated proteins?
      ,
      • Serber Z.
      • Ferrell J.E.
      Tuning bulk electrostatics to regulate protein function.
      ). This mechanism can explain a portion, though obviously not all, of the observed “nonconservation”. Moreover, we observe that sites from high-throughput experiments are more likely to be located outside protein domains. As a proxy for functionality, we can use the fact that a phosphosite has been identified in a low-throughput study, as these sites are extremely likely to be functional, whereas the others may not be. The regions outside domains are more solvent accessible and therefore more likely to be recognized by protein kinases, possibly resulting in a higher proportion of nonfunctional phosphorylation events.
      In terms of conservation of the phosphorylation event (i.e. as opposed to simply the phosphoacceptor residue), even after grouping together paralogous proteins, 48% of pSer-, 56% of pThr-, and 27% of pTyr-containing domain alignment columns are phosphorylated on less than 10% of the phosphorylatable residues. Even though any data set is necessarily incomplete, this observation should elicit caution when using sequence conservation to transfer phosphosites between different proteins. This is especially true in light of the fact that these figures refer to alignments of domains, which are more likely to be correct than those of unstructured regions.
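      The column-level statistic quoted above can be made concrete with a small sketch. It is only an illustration under assumptions: each alignment column is represented as a list of `(amino_acid, is_phosphorylated)` pairs, "phosphorylatable residues" is read as residues of the same type as the phosphorylated one, and the function names and toy columns are hypothetical.

```python
def phospho_fraction(column, residue):
    """Fraction of `residue` positions in one alignment column that are phosphorylated.

    column: list of (amino_acid, is_phosphorylated) pairs, gaps given as '-'.
    Returns None when the column contains no `residue` at all.
    """
    flags = [is_phos for aa, is_phos in column if aa == residue]
    return sum(flags) / len(flags) if flags else None

def share_of_sparse_columns(columns, residue, threshold=0.10):
    """Among columns with at least one phosphorylated `residue`, the share
    in which fewer than `threshold` of the `residue` positions are phosphorylated."""
    fractions = [phospho_fraction(col, residue) for col in columns]
    containing = [f for f in fractions if f is not None and f > 0]
    if not containing:
        return None
    return sum(f < threshold for f in containing) / len(containing)

# Made-up columns, for illustration only.
col_a = [("S", True)] + [("S", False)] * 19                  # 1 of 20 Ser phosphorylated
col_b = [("S", True), ("S", True), ("S", False), ("S", False),
         ("A", False), ("-", False)]                         # 2 of 4 Ser phosphorylated
col_c = [("S", False)] * 5                                   # no pSer, so not counted
print(share_of_sparse_columns([col_a, col_b, col_c], "S"))
```

      Columns without any phosphorylated residue of the given type are excluded from the denominator, mirroring the restriction to pSer-, pThr-, or pTyr-containing columns.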
      In conclusion, our work offers a new perspective on proteome-wide studies of phosphorylation. By studying the distribution of phosphorylation sites with respect to protein domains, we were able to derive an informative measure for phosphosite prediction that is independent of other features commonly used for this task. Finally, we showed that phosphosites in protein-protein interfaces need not be positionally conserved, and we shed new light on a number of other issues pertaining to their general characteristics.

      Acknowledgments

      We thank Prof. Gianni Cesareni for critically reading the manuscript, Alessio Colantoni for providing the data set of protein–protein interfaces, Prof. Gianpaolo Scalia Tomba for help with the statistical analysis and Luca Parca for helpful discussion.

      REFERENCES

        • Stirnimann C.U.
        • Petsalaki E.
        • Russell R.B.
        • Müller C.W.
        WD40 proteins propel cellular networks.
        Trends Biochem. Sci. 2010; 35: 565-574
        • Cohen P.
        The regulation of protein function by multisite phosphorylation–a 25 year update.
        Trends Biochem. Sci. 2000; 25: 596-601
        • Manning G.
        • Whyte D.B.
        • Martinez R.
        • Hunter T.
        • Sudarsanam S.
        The protein kinase complement of the human genome.
        Science. 2002; 298: 1912-1934
        • Bridges D.
        • Moorhead G.B.G.
        14-3-3 proteins: a number of functions for a numbered protein.
        Science's STKE. 2004; 2004: re10
        • Miller M.L.
        • Jensen L.J.
        • Diella F.
        • Jørgensen C.
        • Tinti M.
        • Li L.
        • Hsiung M.
        • Parker S.A.
        • Bordeaux J.
        • Sicheritz-Ponten T.
        • Olhovsky M.
        • Pasculescu A.
        • Alexander J.
        • Knapp S.
        • Blom N.
        • Bork P.
        • Li S.
        • Cesareni G.
        • Pawson T.
        • Turk B.E.
        • Yaffe M.B.
        • Brunak S.
        • Linding R.
        Linear motif atlas for phosphorylation-dependent signaling.
        Sci. Signal. 2008; 1: ra2
        • Gao J.
        • Thelen J.J.
        • Dunker A.K.
        • Xu D.
        Musite: a tool for global prediction of general and kinase-specific phosphorylation sites.
        Mol. Cell. Proteomics. 2010; 9: 2586-2600
        • Gnad F.
        • Ren S.
        • Cox J.
        • Olsen J.V.
        • Macek B.
        • Oroshi M.
        • Mann M.
        PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites.
        Genome Biol. 2007; 8: R250
        • Via A.
        • Diella F.
        • Gibson T.J.
        • Helmer-Citterich M.
        From sequence to structural analysis in protein phosphorylation motifs.
        Front. Biosci. 2011; 16: 1261-1275
        • Schweiger R.
        • Linial M.
        Cooperativity within proximal phosphorylation sites is revealed from large-scale proteomics data.
        Biol. Direct. 2010; 5: 6
        • Iakoucheva L.M.
        • Radivojac P.
        • Brown C.J.
        • O'Connor T.R.
        • Sikes J.G.
        • Obradovic Z.
        • Dunker A.K.
        The importance of intrinsic disorder for protein phosphorylation.
        Nucleic Acids Res. 2004; 32: 1037-1049
        • Palmeri A.
        • Gherardini P.F.
        • Tsigankov P.
        • Ausiello G.
        • Späth G.F.
        • Zilberstein D.
        • Helmer-Citterich M.
        PhosTryp: a phosphorylation site predictor specific for parasitic protozoa of the family trypanosomatidae.
        BMC Genomics. 2011; 12: 614
        • Moses A.M.
        • Hériché J.K.
        • Durbin R.
        Clustering of phosphorylation site recognition motifs can be exploited to predict the targets of cyclin-dependent kinase.
        Genome Biol. 2007; 8: R23
        • Linding R.
        • Jensen L.J.
        • Ostheimer G.J.
        • van Vugt M.A.
        • Jørgensen C.
        • Miron I.M.
        • Diella F.
        • Colwill K.
        • Taylor L.
        • Elder K.
        • Metalnikov P.
        • Nguyen V.
        • Pasculescu A.
        • Jin J.
        • Park J.G.
        • Samson L.D.
        • Woodgett J.R.
        • Russell R.B.
        • Bork P.
        • Yaffe M.B.
        • Pawson T.
        Systematic discovery of in vivo phosphorylation networks.
        Cell. 2007; 129: 1415-1426
        • Lienhard G.E.
        Non-functional phosphorylations?
        Trends Biochem. Sci. 2008; 33: 351-352
        • Landry C.R.
        • Levy E.D.
        • Michnick S.W.
        Weak functional constraints on phosphoproteomes.
        Trends Genet. 2009; 25: 193-197
        • Nguyen Ba A.N.
        • Moses A.M.
        Evolution of characterized phosphorylation sites in budding yeast.
        Mol. Biol. Evol. 2010; 27: 2027-2037
        • Beltrao P.
        • Albanèse V.
        • Kenner L.R.
        • Swaney D.L.
        • Burlingame A.
        • Villén J.
        • Lim W.A.
        • Fraser J.S.
        • Frydman J.
        • Krogan N.J.
        Systematic functional prioritization of protein posttranslational modifications.
        Cell. 2012; 150: 413-425
        • Tan C.S.
        • Jørgensen C.
        • Linding R.
        Roles of “junk phosphorylation” in modulating biomolecular association of phosphorylated proteins?
        Cell Cycle. 2010; 9: 1276-1280
        • Serber Z.
        • Ferrell J.E.
        Tuning bulk electrostatics to regulate protein function.
        Cell. 2007; 128: 441-444
        • Dinkel H.
        • Chica C.
        • Via A.
        • Gould C.M.
        • Jensen L.J.
        • Gibson T.J.
        • Diella F.
        Phospho.ELM: a database of phosphorylation sites–update 2011.
        Nucleic Acids Res. 2011; 39: D261-D267
        • Hornbeck P.V.
        • Kornhauser J.M.
        • Tkachev S.
        • Zhang B.
        • Skrzypek E.
        • Murray B.
        • Latham V.
        • Sullivan M.
        PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse.
        Nucleic Acids Res. 2012; 40: D261-D270
        • UniProt Consortium
        Reorganizing the protein space at the Universal Protein Resource (UniProt).
        Nucleic Acids Res. 2012; 40: D71-D75
        • Punta M.
        • Coggill P.C.
        • Eberhardt R.Y.
        • Mistry J.
        • Tate J.
        • Boursnell C.
        • Pang N.
        • Forslund K.
        • Ceric G.
        • Clements J.
        • Heger A.
        • Holm L.
        • Sonnhammer E.L.
        • Eddy S.R.
        • Bateman A.
        • Finn R.D.
        The Pfam protein families database.
        Nucleic Acids Res. 2012; 40: D290-D301
        • Petersen T.N.
        • Brunak S.
        • von Heijne G.
        • Nielsen H.
        SignalP 4.0: discriminating signal peptides from transmembrane regions.
        Nature Methods. 2011; 8: 785-786
        • Hennerdal A.
        • Elofsson A.
        Rapid membrane protein topology prediction.
        Bioinformatics. 2011; 27: 1322-1323
        • Eddy S.R.
        A new generation of homology search tools based on probabilistic inference.
        Genome Inform. 2009; 23: 205-211
        • Vilella A.J.
        • Severin J.
        • Ureta-Vidal A.
        • Li H.
        • Durbin R.
        • Birney E.
        EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates.
        Genome Res. 2009; 19: 327-335
        • Pettersen E.F.
        • Goddard T.D.
        • Huang C.C.
        • Couch G.S.
        • Greenblatt D.M.
        • Meng E.C.
        • Ferrin T.E.
        UCSF Chimera–a visualization system for exploratory research and analysis.
        J. Comput. Chem. 2004; 25: 1605-1612
        • Csardi G.
        • Nepusz T.
        The igraph software package for complex network research.
        InterJournal Complex Systems. 2006; : 1695
        • Bodenhofer U.
        • Kothmeier A.
        • Hochreiter S.
        APCluster: an R package for affinity propagation clustering.
        Bioinformatics. 2011; 27: 2463-2464
        • Frey B.J.
        • Dueck D.
        Clustering by passing messages between data points.
        Science. 2007; 315: 972-976
        • Schaller M.D.
        Paxillin: a focal adhesion-associated adaptor protein.
        Oncogene. 2001; 20: 6459-6472
        • Musselman C.A.
        • Lalonde M.E.
        • Côté J.
        • Kutateladze T.G.
        Perceiving the epigenetic landscape through histone readers.
        Nat. Struct. Mol. Biol. 2012; 19: 1218-1227
        • Singh R.K.
        • Gunjan A.
        Histone tyrosine phosphorylation comes of age.
        Epigenetics. 2011; 6: 153-160
        • Ju D.
        • Xu H.
        • Wang X.
        • Xie Y.
        Ubiquitin-mediated degradation of Rpn4 is controlled by a phosphorylation-dependent ubiquitylation signal.
        Biochim. Biophys. Acta. 2007; 1773: 1672-1680
        • Lin H.K.
        • Wang L.
        • Hu Y.C.
        • Altuwaijri S.
        • Chang C.
        Phosphorylation-dependent ubiquitylation and degradation of androgen receptor by Akt require Mdm2 E3 ligase.
        EMBO J. 2002; 21: 4037-4048
        • Welcker M.
        • Orian A.
        • Jin J.
        • Grim J.E.
        • Grim J.A.
        • Harper J.W.
        • Eisenman R.N.
        • Clurman B.E.
        The Fbw7 tumor suppressor regulates glycogen synthase kinase 3 phosphorylation-dependent c-Myc protein degradation.
        Proc. Natl. Acad. Sci. U.S.A. 2004; 101: 9085-9090
        • Yada M.
        • Hatakeyama S.
        • Kamura T.
        • Nishiyama M.
        • Tsunematsu R.
        • Imaki H.
        • Ishida N.
        • Okumura F.
        • Nakayama K.
        • Nakayama K.I.
        Phosphorylation-dependent degradation of c-Myc is mediated by the F-box protein Fbw7.
        EMBO J. 2004; 23: 2116-2125
        • de la Cova C.
        • Greenwald I.
        SEL-10/Fbw7-dependent negative feedback regulation of LIN-45/Braf signaling in C. elegans via a conserved phosphodegron.
        Genes Dev. 2012; 26: 2524-2535
        • Feldman R.M.
        • Correll C.C.
        • Kaplan K.B.
        • Deshaies R.J.
        A complex of Cdc4p, Skp1p, and Cdc53p/cullin catalyzes ubiquitination of the phosphorylated CDK inhibitor Sic1p.
        Cell. 1997; 91: 221-230
        • Jackson P.K.
        Ubiquitinating a phosphorylated Cdk inhibitor on the blades of the Cdc4 beta-propeller.
        Cell. 2003; 112: 142-144
        • Lyons N.A.
        • Fonslow B.R.
        • Diedrich J.K.
        • Yates J.R.
        • Morgan D.O.
        Sequential primed kinases create a damage-responsive phosphodegron on Eco1.
        Nat. Struct. Mol. Biol. 2013; 20: 194-201
        • Rossi M.
        • Duan S.
        • Jeong Y.T.
        • Horn M.
        • Saraf A.
        • Florens L.
        • Washburn M.P.
        • Antebi A.
        • Pagano M.
        Regulation of the CRL4(Cdt2) ubiquitin ligase and cell-cycle exit by the SCF(Fbxo11) ubiquitin ligase.
        Mol. Cell. 2013; 49: 1159-1166
        • Skowyra D.
        • Craig K.L.
        • Tyers M.
        • Elledge S.J.
        • Harper J.W.
        F-box proteins are receptors that recruit phosphorylated substrates to the SCF ubiquitin-ligase complex.
        Cell. 1997; 91: 209-219
        • Minguez P.
        • Parca L.
        • Diella F.
        • Mende D.R.
        • Kumar R.
        • Helmer-Citterich M.
        • Gavin A.C.
        • van Noort V.
        • Bork P.
        Deciphering a global network of functionally associated post-translational modifications.
        Mol. Syst. Biol. 2012; 8: 599
        • Bahl J.M.
        • Jensen S.S.
        • Larsen M.R.
        • Heegaard N.H.
        Characterization of the human cerebrospinal fluid phosphoproteome by titanium dioxide affinity chromatography and mass spectrometry.
        Anal. Chem. 2008; 80: 6308-6316
        • Carrascal M.
        • Gay M.
        • Ovelleiro D.
        • Casas V.
        • Gelpí E.
        • Abian J.
        Characterization of the human plasma phosphoproteome using linear ion trap mass spectrometry and multiple search engines.
        J. Proteome Res. 2010; 9: 876-884
        • Stone M.D.
        • Chen X.
        • McGowan T.
        • Bandhakavi S.
        • Cheng B.
        • Rhodus N.L.
        • Griffin T.J.
        Large-scale phosphoproteomics analysis of whole saliva reveals a distinct phosphorylation pattern.
        J. Proteome Res. 2011; 10: 1728-1736
        • Liu Y.
        • Tozeren A.
        Modular composition predicts kinase/substrate interactions.
        BMC Bioinformatics. 2010; 11: 349
        • Tan C.S.H.
        • Bodenmiller B.
        • Pasculescu A.
        • Jovanovic M.
        • Hengartner M.O.
        • Jørgensen C.
        • Bader G.D.
        • Aebersold R.
        • Pawson T.
        • Linding R.
        Comparative analysis reveals conserved protein phosphorylation networks implicated in multiple diseases.
        Sci. Signal. 2009; 2: ra39
        • Hanawa-Suetsugu K.
        • Kukimoto-Niino M.
        • Mishima-Tsumagari C.
        • Akasaka R.
        • Ohsawa N.
        • Sekine S.
        • Ito T.
        • Tochio N.
        • Koshiba S.
        • Kigawa T.
        • Terada T.
        • Shirouzu M.
        • Nishikimi A.
        • Uruno T.
        • Katakai T.
        • Kinashi T.
        • Kohda D.
        • Fukui Y.
        • Yokoyama S.
        Structural basis for mutual relief of the Rac guanine nucleotide exchange factor DOCK2 and its partner ELMO1 from their autoinhibited forms.
        Proc. Natl. Acad. Sci. U.S.A. 2012; 109: 3305-3310
        • Conticello S.G.
        • Thomas C.J.F.
        • Petersen-Mahrt S.K.
        • Neuberger M.S.
        Evolution of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases.
        Mol. Biol. Evol. 2005; 22: 367-377
        • Pham P.
        • Smolka M.B.
        • Calabrese P.
        • Landolph A.
        • Zhang K.
        • Zhou H.
        • Goodman M.F.
        Impact of phosphorylation and phosphorylation-null mutants on the activity and deamination specificity of activation-induced cytidine deaminase.
        J. Biol. Chem. 2008; 283: 17428-17439
        • Chen Z.
        • Eggerman T.L.
        • Patterson A.P.
        Phosphorylation is a regulatory mechanism in apolipoprotein B mRNA editing.
        Biochem. J. 2001; 357: 661-672
        • Prochnow C.
        • Bransteitter R.
        • Klein M.G.
        • Goodman M.F.
        • Chen X.S.
        The APOBEC-2 crystal structure and functional implications for the deaminase AID.
        Nature. 2007; 445: 447-451
        • Comb W.C.
        • Hutti J.E.
        • Cogswell P.
        • Cantley L.C.
        • Baldwin A.S.
        p85α SH2 domain phosphorylation by IKK promotes feedback inhibition of PI3K and Akt in response to cellular starvation.
        Mol. Cell. 2012; 45: 719-730
        • Couture C.
        • Songyang Z.
        • Jascur T.
        • Williams S.
        • Tailor P.
        • Cantley L.C.
        • Mustelin T.
        Regulation of the Lck SH2 domain by tyrosine phosphorylation.
        J. Biol. Chem. 1996; 271: 24880-24884
        • Huang H.
        • Li L.
        • Wu C.
        • Schibli D.
        • Colwill K.
        • Ma S.
        • Li C.
        • Roy P.
        • Ho K.
        • Songyang Z.
        • Pawson T.
        • Gao Y.
        • Li S.S.-C.
        Defining the specificity space of the human SRC homology 2 domain.
        Mol. Cell. Proteomics. 2008; 7: 768-784
        • Kaneko T.
        • Huang H.
        • Zhao B.
        • Li L.
        • Liu H.
        • Voss C.K.
        • Wu C.
        • Schiller M.R.
        • Li S.S.C.
        Loops govern SH2 domain specificity by controlling access to binding pockets.
        Sci. Signal. 2010; 3: ra34
        • Stover D.R.
        • Furet P.
        • Lydon N.B.
        Modulation of the SH2 binding specificity and kinase activity of Src by tyrosine phosphorylation within its SH2 domain.
        J. Biol. Chem. 1996; 271: 12481-12487
        • Kulshina N.
        • Edwards T.E.
        • Ferré-D'Amaré A.R.
        Thermodynamic analysis of ligand binding and ligand binding-induced tertiary structure formation by the thiamine pyrophosphate riboswitch.
        RNA. 2010; 16: 186-196
        • Malik R.
        • Nigg E.A.
        • Körner R.
        Comparative conservation analysis of the human mitotic phosphoproteome.
        Bioinformatics. 2008; 24: 1426-1432
        • Jiménez J.L.
        • Hegemann B.
        • Hutchins J.R.
        • Peters J.M.
        • Durbin R.
        A systematic comparative and structural analysis of protein phosphorylation sites based on the mtcPTM database.
        Genome Biol. 2007; 8: R90