
GPS 2.0, a Tool to Predict Kinase-specific Phosphorylation Sites in Hierarchy

  • Yu Xue §
  • Jian Ren §
  • Xinjiao Gao
  • Changjiang Jin
  • Longping Wen
    Correspondence: To whom correspondence may be addressed. Tel.: 86-551-3600051; Fax: 86-551-3600426
  • Xuebiao Yao ¶
    Correspondence: To whom correspondence may be addressed. Tel.: 86-551-3606304; Fax: 86-551-3607141

    Affiliations
    All authors: Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China
    Xuebiao Yao, additionally: Department of Physiology and Cancer Biology Program, Morehouse School of Medicine, Atlanta, Georgia 30310

    Footnotes
    § Both authors contributed equally to this work.
    ¶ A Georgia Cancer Coalition Eminent Scholar.
Open Access. Published: May 06, 2008. DOI: https://doi.org/10.1074/mcp.M700574-MCP200
      Identification of protein phosphorylation sites with their cognate protein kinases (PKs) is a key step to delineate molecular dynamics and plasticity underlying a variety of cellular processes. Although nearly 10 kinase-specific prediction programs have been developed, numerous PKs have been casually classified into subgroups without a standard rule. For large scale predictions, the false positive rate has also never been addressed. In this work, we adopted a well established rule to classify PKs into a hierarchical structure with four levels, including group, family, subfamily, and single PK. In addition, we developed a simple approach to estimate the theoretically maximal false positive rates. The on-line service and local packages of the GPS (Group-based Prediction System) 2.0 were implemented in Java with the modified version of the Group-based Phosphorylation Scoring algorithm. As the first stand-alone software for predicting phosphorylation, GPS 2.0 can predict kinase-specific phosphorylation sites for 408 human PKs in hierarchy. In a large scale prediction of more than 13,000 mammalian phosphorylation sites, GPS 2.0 exhibited high performance and accuracy. Using Aurora-B as an example, we also conducted a proteome-wide search and provided a systematic prediction of Aurora-B-specific substrates including protein-protein interaction information. Thus, GPS 2.0 is a useful tool for predicting protein phosphorylation sites and their cognate kinases and is freely available on line.
      Post-translational modification of proteins provides reversible means to regulate the function of a protein in space and time. Recently computational studies of post-translational modifications (PTMs)¹ of proteins have attracted much attention. Various PTMs regulate the functions and dynamics of proteins through specific modifications and are implicated in almost all cellular processes. In contrast to the labor-intensive and expensive experimental methods, in silico prediction of PTM-specific substrates with their sites has emerged as a popular alternative approach. To date, more than 32 computational prediction tools have been developed (Zhou et al., 2006).
      ¹The abbreviations used are: PTM, post-translational modification; PK, protein kinase; FPR, false positive rate; GPS, Group-based Prediction System; Sn, sensitivity; Sp, specificity; Pr, precision; LOO, leave-one-out validation; PSP, phosphorylation site peptide; PKA, protein kinase A; PKB, protein kinase B; BLAST, Basic Local Alignment Search Tool; CDK, cyclin-dependent kinase; MAPK, mitogen-activated protein kinase; AUR, Aurora; GRK, G-protein-coupled receptor kinase; CaMK, Ca2+/calmodulin-dependent protein kinase; TK, tyrosine kinase; PIKK, phosphoinositide 3-kinase-related kinase; ATM, ataxia telangiectasia mutated; PEK, pancreatic eukaryotic initiation factor-2a kinase; sub., substrate; PPI, protein-protein interaction; OS, operating system; TP, true positive; TN, true negative; FP, false positive; FN, false negative; PPSP, prediction of PK-specific phosphorylation; AGC, protein kinase A, G and C family; CMGC, CDKs, MAPKs, GSKs and CLKs kinase family.
      In the field of computational PTMs, protein phosphorylation is the most studied example. To predict general phosphorylation sites, several tools have been developed, such as DISPHOS (Iakoucheva et al., 2004), NetPhos (Blom et al., 1999), NetPhosYeast (Ingrell et al., 2007), and GANNPhos (Tang et al., 2007). As the need for performing large scale predictions and constructing reliable phosphorylation networks grows, robust prediction of kinase-specific phosphorylation sites has become necessary and challenging. For example, Neuberger et al. (2007) used pkaPS to predict potential protein kinase A (PKA) sites in the human proteome directly. With Predikin, Brinkworth et al. (2006) predicted cognate PKs for 383 unannotated phosphorylation sites of 216 peptide sequences in yeast. Chang et al. (2007) predicted 91 highly probable CDK substrates in budding yeast using the position-specific scoring matrix motif approach. Recently Linding et al. (2007) developed NetworKIN and constructed a human phosphorylation network, which has attracted broad interest not only for human phosphorylation network prediction but also for its general implications in cell biology. To predict kinase-specific phosphorylation sites, several on-line Web services have been implemented using various algorithms, including our previous work of GPS (Xue et al., 2005; Zhou et al., 2004) and PPSP (Xue et al., 2006), NetPhosK (Blom et al., 2004), ScanSite (Obenauer et al., 2003), KinasePhos (Huang et al., 2005; Wong et al., 2007), PredPhospho (Kim et al., 2004), Predikin (Brinkworth et al., 2003), PhoScan (Li et al., 2008), and pkaPS (Neuberger et al., 2007), among others.
      Although ∼10 predictors are already available, two essential issues have remained unresolved. First, in previous work, there was no standard rule for protein kinase (PK) classification. We and others clustered PKs into subgroups casually by sequence similarity from BLAST results (Linding et al., 2007; Xue et al., 2005; Zhou et al., 2004; Xue et al., 2006; Blom et al., 2004; Huang et al., 2005; Wong et al., 2007; Kim et al., 2004; Li et al., 2008). The thresholds of PK classification varied among previous publications, and the final PK subgroups were also quite different. The second issue is control of the false positive rate (FPR) for large scale predictions. Usually the bona fide phosphorylation sites are only a small proportion of the total Ser/Thr or Tyr residues present within a protein sequence. Thus, even a small FPR can generate many false positive hits in the total prediction results.
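      The base-rate effect described above can be illustrated with a short calculation. The following is a minimal sketch, not part of GPS 2.0; the function name and all input numbers (number of candidate residues, fraction of true sites, sensitivity) are hypothetical values chosen only to show the arithmetic.

```python
def expected_hits(n_sites, true_fraction, sensitivity, fpr):
    """Return expected (true_hits, false_hits, false_share) for one prediction run.

    n_sites       -- total candidate residues (Ser/Thr or Tyr) scanned
    true_fraction -- assumed fraction of residues that are real sites
    sensitivity   -- assumed fraction of real sites that are predicted
    fpr           -- false positive rate applied to the non-sites
    """
    n_true = n_sites * true_fraction
    n_false = n_sites - n_true
    true_hits = n_true * sensitivity
    false_hits = n_false * fpr
    total = true_hits + false_hits
    false_share = false_hits / total if total else 0.0
    return true_hits, false_hits, false_share


# Hypothetical scenario: 100,000 residues, 1% real sites, Sn = 80%, FPR = 2%.
tp, fp, share = expected_hits(100_000, 0.01, 0.8, 0.02)
print(f"true hits: {tp:.0f}, false hits: {fp:.0f}, false share: {share:.2f}")
```

      Under these assumed numbers, false hits outnumber true hits even at an FPR of 2%, which is why the FPR has to be controlled explicitly in large scale predictions.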
      In this work, we refined the GPS software (Group-based Prediction System, version 2.0) for predicting kinase-specific phosphorylation sites in hierarchy. We adopted a PK classification established by Manning et al. (2002) as the standard rule to cluster the human PKs into a hierarchical structure with four levels, including group, family, subfamily, and single PK. The training data were taken from Phospho.ELM 6.0 (Diella et al., 2004), and the modified version of the Group-based Phosphorylation Scoring algorithm (Xue et al., 2005; Zhou et al., 2004) was used. We also defined a simple rule to calculate the theoretically maximal FPRs. Three cutoffs of high, medium, and low thresholds were established with FPRs of 2, 6, and 10% for serine/threonine kinases and 4, 9, and 15% for tyrosine kinases, respectively. The performance and robustness of the prediction system were extensively evaluated by self-consistency, leave-one-out validation, and 4-, 6-, 8-, and 10-fold cross-validations. Compared with other existing tools, GPS 2.0 carries greater computational power with superior performance. The on-line Web server version and local packages of GPS 2.0 were implemented in Java and can predict kinase-specific phosphorylation sites for 408 human PKs. Moreover we used GPS 2.0 to conduct a large scale prediction of more than 13,000 mammalian phosphorylation sites in which GPS 2.0 exhibited remarkable performance. Finally we demonstrated the accuracy of GPS 2.0 prediction based on a proteome-wide search for Aurora-B cognate substrates. Taken together, GPS 2.0 offers greater precision and computing power for predicting protein phosphorylation and enzyme-substrate relationships.
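      The four-level hierarchy (group, family, subfamily, single PK) can be pictured as a set of paths from the broadest cluster down to one kinase. The sketch below is only an illustration of that data structure, not the GPS 2.0 kinase tree: the handful of entries and their labels are assumptions that should be checked against the published classification, and the function names are hypothetical.

```python
LEVELS = ("group", "family", "subfamily", "kinase")

# Each path is (group, family, subfamily, kinase). The entries are
# illustrative assumptions; None marks a level with no assigned cluster.
KINASE_PATHS = [
    ("AGC", "PKA", None, "PKACa"),
    ("AGC", "AKT", None, "AKT1"),
    ("CMGC", "CDK", "CDC2", "CDK2"),
    ("CMGC", "MAPK", "ERK", "ERK1"),
    ("Other", "AUR", None, "AurB"),
]


def clusters_for(kinase):
    """Return the (level, name) clusters that contain a kinase, broadest first."""
    for path in KINASE_PATHS:
        if path[-1] == kinase:
            return [(lvl, name) for lvl, name in zip(LEVELS, path) if name is not None]
    raise KeyError(kinase)


def members(level, name):
    """Return all single kinases that fall under the given cluster."""
    idx = LEVELS.index(level)
    return [path[-1] for path in KINASE_PATHS if path[idx] == name]


print(clusters_for("AurB"))
print(members("group", "AGC"))
```

      With such a structure, a predictor can be trained for any cluster by pooling the known sites of its member kinases, which is one way to read "prediction in hierarchy".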

      DISCUSSION

      In this work, we refined our previously established protein phosphorylation prediction program GPS 1.10 (Group-based Phosphorylation Scoring) into a new version, 2.0. In addition, the software was renamed Group-based Prediction System because numerous PKs were clustered into a hierarchical structure with four levels, including group, family, subfamily, and single PK (Manning et al., 2002). The on-line server and local packages of GPS 2.0 were then implemented in Java with a modified version of the Group-based Phosphorylation Scoring algorithm (Xue et al., 2005; Zhou et al., 2004). The GPS 2.0 Web server was tested on several Internet browsers, including Internet Explorer 6.0, Netscape Browser 8.1.3, and Firefox 2 under the Windows XP operating system (OS), Mozilla Firefox 1.5 under Fedora Core 6 OS (Linux), and Safari 3.0 under Apple Mac OS X 10.4 (Tiger) and 10.5 (Leopard). For Windows and Linux systems, the Java Runtime Environment (JRE) package of Sun Microsystems (Java 1.4.2 or later) should be preinstalled to use the GPS 2.0 program. For Mac OS, however, GPS 2.0 can be used directly without any additional packages. Furthermore users can install the local packages of GPS 2.0 on their own computers; these packages support three major OSs: Windows, Unix/Linux, and Mac.
      The performance and robustness of the prediction system were extensively evaluated by self-consistency, leave-one-out validation, and 4-, 6-, 8-, and 10-fold cross-validations. We then compared the prediction performance of GPS 2.0 with that of several other existing tools, including ScanSite (Obenauer et al., 2003), KinasePhos 1.0 and 2.0 (Huang et al., 2005; Wong et al., 2007), NetPhosK (Blom et al., 2004), and pkaPS (Neuberger et al., 2007). ScanSite constructs a position-specific scoring matrix for each kinase based on its known phosphorylation sites (Obenauer et al., 2003). KinasePhos 1.0 uses a maximal dependence decomposition strategy and constructs a profile hidden Markov model for each kinase (Huang et al., 2005), whereas KinasePhos 2.0 retrieves the coupling patterns (XdZ where amino acid types X and Z are separated by d amino acids) from the known phosphorylation sites and uses the Support Vector Machines algorithm to train the model (Wong et al., 2007). NetPhosK uses an artificial neural network method for training (Blom et al., 2004). These tools first retrieve the information from each position flanking the modified residue (Ser/Thr or Tyr). A hidden hypothesis in their models is that the information/function/evolution of each position is independent of its nearby residues. However, the information/function/evolution of each position is not entirely independent. GPS 1.0 and 1.10 (Xue et al., 2005; Zhou et al., 2004), GPS 2.0, and pkaPS (Neuberger et al., 2007) hypothesize that if two PSPs share high sequence homology they may also bear similar three-dimensional structures and biological functions. Thus, the information of the whole PSP is considered rather than that of single positions. In this regard, the methods used in GPS 1.0 and 1.10, GPS 2.0, and pkaPS should be superior to other strategies. Prediction performance is also enhanced by a larger training data set, and the training data set of GPS 2.0 was much larger than that of the other tools. Furthermore we noticed that the prediction performances based on different amino acid matrices were not identical. BLOSUM62 and other matrices are optimized to evaluate the similarity between homologous proteins but may not be optimal for the similarity of two PSPs. To find an optimal or near-optimal matrix for each PK group that improves system stability without significantly affecting prediction performance, we developed a simple method to automatically mutate BLOSUM62 into a near-optimal matrix for each PK group. The prediction performance of GPS 2.0 was further improved by this approach. By comparison, the method of GPS 2.0 was better than or at least comparable with previous approaches on several well studied PKs. Moreover, GPS 2.0 can predict kinase-specific phosphorylation sites for 408 human PKs, demonstrating comprehensive coverage and computational power.
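      The idea of scoring a whole peptide against known PSPs, rather than scoring positions independently, can be sketched as follows. This is a simplified illustration under stated assumptions and not the GPS 2.0 implementation: the toy substitution function stands in for a matrix such as BLOSUM62 (its values and residue groupings are made up), the peptides are hypothetical, and averaging over the known sites is one plausible way to combine the pairwise similarities.

```python
# Toy stand-in for a substitution matrix such as BLOSUM62.
# The residue groupings and score values are invented for illustration.
SIMILAR_GROUPS = [set("KRH"), set("DE"), set("ST"), set("AVLIM"), set("FYW")]


def toy_substitution(a, b):
    """Score one residue pair: identical > chemically similar > unrelated."""
    if a == b:
        return 2
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 1
    return -1


def peptide_similarity(p, q, score=toy_substitution):
    """Sum substitution scores over aligned positions of two equal-length peptides."""
    if len(p) != len(q):
        raise ValueError("peptides must have the same length")
    return sum(score(a, b) for a, b in zip(p, q))


def site_score(candidate, known_sites, score=toy_substitution):
    """Average similarity of a candidate peptide to the known site peptides."""
    total = sum(peptide_similarity(candidate, k, score) for k in known_sites)
    return total / len(known_sites)


def predict(candidate, known_sites, cutoff, score=toy_substitution):
    """Call a site positive when its score reaches the chosen cutoff."""
    return site_score(candidate, known_sites, score) >= cutoff


# Hypothetical 5-residue peptides centred on the candidate Ser.
known = ["RRASV", "KRSSL"]
print(site_score("RKASL", known))   # similar to the known sites
print(site_score("GGGSG", known))   # dissimilar flanking residues
```

      Passing a different `score` function corresponds to swapping the substitution matrix, which is where a per-group matrix derived from BLOSUM62 would plug in.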
      Previously the control and calculation of FPR were never addressed. Here we developed a simple approach to estimate the theoretically maximal FPR for each PK cluster. We also defined the Pr factor to estimate the proportion of real phosphorylation sites in the predicted results. Previously the precision was defined as TP/(TP + FP) (Huang et al., 2005). However, the TP is usually unknown when an unknown data set is used for prediction. Thus, a hidden hypothesis for such a precision is that the ratio of calculated TP:FP does not change across data sets, so the precision is precalculated from the training data set. However, when the composition of a given data set differs from that of the training data set, such a precision is no longer useful or valid. In this regard, the Pr value should be flexible and reflect the enrichment of substrates of the subject kinase in any given data set. Given a data set for prediction (N sites), if all of the sites were true negative sites, we can easily calculate the theoretically maximal number of false positive hits as N × FPR. The Pr value can then be calculated as (M − (N × FPR))/M, where M is the total number of predicted hits. Because the data set may contain real phosphorylation sites, our approach will underestimate the real precision.
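      The Pr estimate defined above is simple enough to check by hand. The following minimal sketch implements the two formulas from the text, N × FPR and (M − (N × FPR))/M; the function names and the example numbers are hypothetical.

```python
def max_false_hits(n_sites, fpr):
    """Theoretically maximal false positive hits if every site were a true negative."""
    return n_sites * fpr


def pr_estimate(n_sites, n_hits, fpr):
    """Pr = (M - N * FPR) / M, with N candidate sites and M predicted hits.

    The value is a conservative estimate; it can drop below zero when the
    number of hits is smaller than the maximal number of false hits.
    """
    if n_hits <= 0:
        raise ValueError("n_hits must be positive")
    return (n_hits - max_false_hits(n_sites, fpr)) / n_hits


# Hypothetical run: 1,000 candidate sites, 50 predicted hits, FPR = 2%.
print(max_false_hits(1000, 0.02))
print(pr_estimate(1000, 50, 0.02))
```

      In this assumed example at most 20 of the 50 hits are expected to be false, so Pr is about 0.6, meaning at least roughly 60% of the predicted hits are estimated to be real.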
      As an application to demonstrate the computational power, we performed a large scale prediction of more than 13,000 phosphorylation sites in mammals with high precision. The high threshold was chosen with an FPR of 2% for serine/threonine kinases and 4% for tyrosine kinases. In addition, we provided a proteome-wide prediction for Aurora-B-specific substrates including protein-protein interaction information. As the first stand-alone software for computational phosphorylation prediction, GPS 2.0 will accelerate experimentation for delineating kinase-coupled phosphoregulatory networks and pathways underlying cellular plasticity and dynamics.

      Acknowledgments

      We thank the anonymous reviewer, whose suggestion has greatly improved the presentation of this manuscript.

      REFERENCES

        • Zhou F.F.
        • Xue Y.
        • Yao X.
        • Xu Y.
        A general user interface for prediction servers of proteins' post-translational modification sites.
        Nat. Protocol. 2006; 1: 1318-1321
        • Iakoucheva L.M.
        • Radivojac P.
        • Brown C.J.
        • O'Connor T.R.
        • Sikes J.G.
        • Obradovic Z.
        • Dunker A.K.
        The importance of intrinsic disorder for protein phosphorylation.
        Nucleic Acids Res. 2004; 32: 1037-1049
        • Blom N.
        • Gammeltoft S.
        • Brunak S.
        Sequence and structure-based prediction of eukaryotic protein phosphorylation sites.
        J. Mol. Biol. 1999; 294: 1351-1362
        • Ingrell C.R.
        • Miller M.L.
        • Jensen O.N.
        • Blom N.
        NetPhosYeast: prediction of protein phosphorylation sites in yeast.
        Bioinformatics. 2007; 23: 895-897
        • Tang Y.R.
        • Chen Y.Z.
        • Canchaya C.A.
        • Zhang Z.
        GANNPhos: a new phosphorylation site predictor based on a genetic algorithm integrated neural network.
        Protein Eng. Des. Sel. 2007; 20: 405-412
        • Neuberger G.
        • Schneider G.
        • Eisenhaber F.
        pkaPS: prediction of protein kinase A phosphorylation sites with the simplified kinase-substrate binding model.
        Biol. Direct. 2007; 2: 1
        • Brinkworth R.I.
        • Munn A.L.
        • Kobe B.
        Protein kinases associated with the yeast phosphoproteome.
        BMC Bioinformatics. 2006; 7: 47
        • Chang E.J.
        • Begum R.
        • Chait B.T.
        • Gaasterland T.
        Prediction of cyclin-dependent kinase phosphorylation substrates.
        PLoS ONE. 2007; 2: e656
        • Linding R.
        • Jensen L.J.
        • Ostheimer G.J.
        • van Vugt M.A.
        • Jorgensen C.
        • Miron I.M.
        • Diella F.
        • Colwill K.
        • Taylor L.
        • Elder K.
        • Metalnikov P.
        • Nguyen V.
        • Pasculescu A.
        • Jin J.
        • Park J.G.
        • Samson L.D.
        • Woodgett J.R.
        • Russell R.B.
        • Bork P.
        • Yaffe M.B.
        • Pawson T.
        Systematic discovery of in vivo phosphorylation networks.
        Cell. 2007; 129: 1415-1426
        • Xue Y.
        • Zhou F.
        • Zhu M.
        • Ahmed K.
        • Chen G.
        • Yao X.
        GPS: a comprehensive www server for phosphorylation sites prediction.
        Nucleic Acids Res. 2005; 33: W184-W187
        • Zhou F.F.
        • Xue Y.
        • Chen G.L.
        • Yao X.
        GPS: a novel group-based phosphorylation predicting and scoring method.
        Biochem. Biophys. Res. Commun. 2004; 325: 1443-1448
        • Xue Y.
        • Li A.
        • Wang L.
        • Feng H.
        • Yao X.
        PPSP: prediction of PK-specific phosphorylation site with Bayesian decision theory.
        BMC Bioinformatics. 2006; 7: 163
        • Blom N.
        • Sicheritz-Ponten T.
        • Gupta R.
        • Gammeltoft S.
        • Brunak S.
        Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence.
        Proteomics. 2004; 4: 1633-1649
        • Obenauer J.C.
        • Cantley L.C.
        • Yaffe M.B.
        Scansite 2.0: proteome-wide prediction of cell signaling interactions using short sequence motifs.
        Nucleic Acids Res. 2003; 31: 3635-3641
        • Huang H.D.
        • Lee T.Y.
        • Tzeng S.W.
        • Horng J.T.
        KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites.
        Nucleic Acids Res. 2005; 33: W226-W229
        • Wong Y.H.
        • Lee T.Y.
        • Liang H.K.
        • Huang C.M.
        • Wang T.Y.
        • Yang Y.H.
        • Chu C.H.
        • Huang H.D.
        • Ko M.T.
        • Hwang J.K.
        KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylation sites based on sequences and coupling patterns.
        Nucleic Acids Res. 2007; 35: W588-W594
        • Kim J.H.
        • Lee J.
        • Oh B.
        • Kimm K.
        • Koh I.
        Prediction of phosphorylation sites using SVMs.
        Bioinformatics. 2004; 20: 3179-3184
        • Brinkworth R.I.
        • Breinl R.A.
        • Kobe B.
        Structural basis and prediction of substrate specificity in protein serine/threonine kinases.
        Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 74-79
        • Li T.
        • Li F.
        • Zhang X.
        Prediction of kinase-specific phosphorylation sites with sequence features by a log-odds ratio approach.
        Proteins. 2008; 70: 404-414
        • Manning G.
        • Whyte D.B.
        • Martinez R.
        • Hunter T.
        • Sudarsanam S.
        The protein kinase complement of the human genome.
        Science. 2002; 298: 1912-1934
        • Diella F.
        • Cameron S.
        • Gemund C.
        • Linding R.
        • Via A.
        • Kuster B.
        • Sicheritz-Ponten T.
        • Blom N.
        • Gibson T.J.
        Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins.
        BMC Bioinformatics. 2004; 5: 79
        • Caenepeel S.
        • Charydczak G.
        • Sudarsanam S.
        • Hunter T.
        • Manning G.
        The mouse kinome: discovery and comparative genomics of all mouse protein kinases.
        Proc. Natl. Acad. Sci. U. S. A. 2004; 101: 11707-11712
        • Puntervoll P.
        • Linding R.
        • Gemund C.
        • Chabanis-Davidson S.
        • Mattingsdal M.
        • Cameron S.
        • Martin D.M.
        • Ausiello G.
        • Brannetti B.
        • Costantini A.
        • Ferrè F.
        • Maselli V.
        • Via A.
        • Cesareni G.
        • Diella F.
        • Superti-Furga G.
        • Wyrwicz L.
        • Ramu C.
        • McGuigan C.
        • Gudavalli R.
        • Letunic I.
        • Bork P.
        • Rychlewski L.
        • Küster B.
        • Helmer-Citterich M.
        • Hunter W.N.
        • Aasland R.
        • Gibson T.J.
        ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins.
        Nucleic Acids Res. 2003; 31: 3625-3630
        • Gorbsky G.J.
        Mitosis: MCAK under the aura of Aurora B.
        Curr. Biol. 2004; 14: R346-R348
        • Lan W.
        • Zhang X.
        • Kline-Smith S.L.
        • Rosasco S.E.
        • Barrett-Wilt G.A.
        • Shabanowitz J.
        • Hunt D.F.
        • Walczak C.E.
        • Stukenberg P.T.
        Aurora B phosphorylates centromeric MCAK and regulates its localization and microtubule depolymerization activity.
        Curr. Biol. 2004; 14: 273-286
        • Honda R.
        • Korner R.
        • Nigg E.A.
        Exploring the functional interactions between Aurora B, INCENP, and survivin in mitosis.
        Mol. Biol. Cell. 2003; 14: 3325-3341
        • Kawajiri A.
        • Yasui Y.
        • Goto H.
        • Tatsuka M.
        • Takahashi M.
        • Nagata K.
        • Inagaki M.
        Functional significance of the specific sites phosphorylated in desmin at cleavage furrow: Aurora-B may phosphorylate and regulate type III intermediate filaments during cytokinesis coordinatedly with Rho-kinase.
        Mol. Biol. Cell. 2003; 14: 1489-1500
        • Biondi R.M.
        • Nebreda A.R.
        Signalling specificity of Ser/Thr protein kinases through docking-site-mediated interactions.
        Biochem. J. 2003; 372: 1-13
        • Holland P.M.
        • Cooper J.A.
        Protein modification: docking sites for kinases.
        Curr. Biol. 1999; 9: R329-R331
        • Yaffe M.B.
        • Leparc G.G.
        • Lai J.
        • Obata T.
        • Volinia S.
        • Cantley L.C.
        A motif-based profile scanning approach for genome-wide prediction of signaling pathways.
        Nat. Biotechnol. 2001; 19: 348-353
        • Salwinski L.
        • Miller C.S.
        • Smith A.J.
        • Pettit F.K.
        • Bowie J.U.
        • Eisenberg D.
        The Database of Interacting Proteins: 2004 update.
        Nucleic Acids Res. 2004; 32: D449-D451
        • Stark C.
        • Breitkreutz B.J.
        • Reguly T.
        • Boucher L.
        • Breitkreutz A.
        • Tyers M.
        BioGRID: a general repository for interaction datasets.
        Nucleic Acids Res. 2006; 34: D535-D539
        • Zanzoni A.
        • Montecchi-Palazzi L.
        • Quondam M.
        • Ausiello G.
        • Helmer-Citterich M.
        • Cesareni G.
        MINT: a Molecular INTeraction database.
        FEBS Lett. 2002; 513: 135-140
        • Alfarano C.
        • Andrade C.E.
        • Anthony K.
        • Bahroos N.
        • Bajec M.
        • Bantoft K.
        • Betel D.
        • Bobechko B.
        • Boutilier K.
        • Burgess E.
        • Buzadzija K.
        • Cavero R.
        • D'Abreo C.
        • Donaldson I.
        • Dorairajoo D.
        • Dumontier M.J.
        • Dumontier M.R.
        • Earles V.
        • Farrall R.
        • Feldman H.
        • Garderman E.
        • Gong Y.
        • Gonzaga R.
        • Grytsan V.
        • Gryz E.
        • Gu V.
        • Haldorsen E.
        • Halupa A.
        • Haw R.
        • Hrvojic A.
        • Hurrell L.
        • Isserlin R.
        • Jack F.
        • Juma F.
        • Khan A.
        • Kon T.
        • Konopinsky S.
        • Le V.
        • Lee E.
        • Ling S.
        • Magidin M.
        • Moniakis J.
        • Montojo J.
        • Moore S.
        • Muskat B.
        • Ng I.
        • Paraiso J.P.
        • Parker B.
        • Pintilie G.
        • Pirone R.
        • Salama J.J.
        • Sgro S.
        • Shan T.
        • Shu Y.
        • Siew J.
        • Skinner D.
        • Snyder K.
        • Stasiuk R.
        • Strumpf D.
        • Tuekam B.
        • Tao S.
        • Wang Z.
        • White M.
        • Willis R.
        • Wolting C.
        • Wong S.
        • Wrong A.
        • Xin C.
        • Yao R.
        • Yates B.
        • Zhang S.
        • Zheng K.
        • Pawson T.
        • Ouellette B.F.
        • Hogue C.W.
        The Biomolecular Interaction Network Database and related tools 2005 update.
        Nucleic Acids Res. 2005; 33: D418-D424
        • Mishra G.R.
        • Suresh M.
        • Kumaran K.
        • Kannabiran N.
        • Suresh S.
        • Bala P.
        • Shivakumar K.
        • Anuradha N.
        • Reddy R.
        • Raghavan T.M.
        • Menon S.
        • Hanumanthu G.
        • Gupta M.
        • Upendran S.
        • Gupta S.
        • Mahesh M.
        • Jacob B.
        • Mathew P.
        • Chatterjee P.
        • Arun K.S.
        • Sharma S.
        • Chandrika K.N.
        • Deshpande N.
        • Palvankar K.
        • Raghavnath R.
        • Krishnakanth R.
        • Karathia H.
        • Rekha B.
        • Nayak R.
        • Vishnupriya G.
        • Kumar H.G.
        • Nagini M.
        • Kumar G.S.
        • Jose R.
        • Deepthi P.
        • Mohan S.S.
        • Gandhi T.K.
        • Harsha H.C.
        • Deshpande K.S.
        • Sarker M.
        • Prasad T.S.
        • Pandey A.
        Human protein reference database—2006 update.
        Nucleic Acids Res. 2006; 34: D411-D414
        • von Mering C.
        • Jensen L.J.
        • Snel B.
        • Hooper S.D.
        • Krupp M.
        • Foglierini M.
        • Jouffre N.
        • Huynen M.A.
        • Bork P.
        STRING: known and predicted protein-protein associations, integrated and transferred across organisms.
        Nucleic Acids Res. 2005; 33: D433-D437
        • Mollinari C.
        • Reynaud C.
        • Martineau-Thuillier S.
        • Monier S.
        • Kieffer S.
        • Garin J.
        • Andreassen P.R.
        • Boulet A.
        • Goud B.
        • Kleman J.P.
        • Margolis R.L.
        The mammalian passenger protein TD-60 is an RCC1 family member with an essential role in prometaphase to metaphase progression.
        Dev. Cell. 2003; 5: 295-307
        • Obuse C.
        • Iwasaki O.
        • Kiyomitsu T.
        • Goshima G.
        • Toyoda Y.
        • Yanagida M.
        A conserved Mis12 centromere complex is linked to heterochromatic HP1 and outer kinetochore protein Zwint-1.
        Nat. Cell Biol. 2004; 6: 1135-1141
        • Arnaud L.
        • Pines J.
        • Nigg E.A.
        GFP tagging reveals human Polo-like kinase 1 at the kinetochore/centromere region of mitotic chromosomes.
        Chromosoma. 1998; 107: 424-429