Predicting Protein Post-translational Modifications Using Meta-analysis of Proteome Scale Data Sets*S

  1. Daniel Schwartz§,
  2. Michael F. Chou and
  3. George M. Church
  1. From the Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115
  1. §To whom correspondence should be addressed: Dept. of Genetics, New Research Bldg., Rm. 238, Harvard Medical School, 77 Ave. Louis Pasteur, Boston, MA 02115. Tel.: 617-432-6510; Fax: 617-432-6513; E-mail: dschwartz{at}genetics.med.harvard.edu

Abstract

Protein post-translational modifications are an important biological regulatory mechanism, and the rate of their discovery using high throughput techniques is rapidly increasing. To make use of this wealth of sequence data, we introduce a new general strategy designed to predict a variety of post-translational modifications in several organisms. We used the motif-x program to determine phosphorylation motifs in yeast, fly, mouse, and man and lysine acetylation motifs in man. These motifs were then scanned against proteomic sequence data using a newly developed tool called scan-x to globally predict other potential modification sites within these organisms. Ten-fold cross-validation was used to determine the sensitivity and minimum specificity for each set of predictions, all of which showed improvement over other available tools for phosphoprediction. New motif discovery is a byproduct of this approach, and the phosphorylation motif analyses provide strong evidence of evolutionary conservation of both known and novel kinase motifs.

Footnotes

  • Published, MCP Papers in Press, October 28, 2008, DOI 10.1074/mcp.M800332-MCP200

  • 1 The abbreviations used are: PTM, post-translational modification; PKA, protein kinase A; PWM, position weight matrix; CK II, casein kinase II; CK I, casein kinase I; TP, true positive; FP, false positive; TN, true negative; FN, false negative; ROC, receiver operating characteristic; PSSM, position-specific scoring matrix; RalBP1, RalA-binding protein 1; MAPK, mitogen-activated protein kinase; CDK, cyclin-dependent kinase; SGD, Saccharomyces genome database; IPI, International Protein Index.

  • 2 M. F. Chou, D. Schwartz, and G. M. Church, manuscript in preparation.

  • 3 It is impossible at this time to determine true negative data sets. We have tried a number of approaches to this problem, but ultimately any method overestimates the number of actual negatives. This is readily apparent from the fact that each new mass spectrometry study reveals a significant number of new modifications in each organism under study. Therefore, all specificity numbers are underestimates of the actual specificity and should not be taken to be absolutely quantitative. Nevertheless, they allow relative comparisons between algorithms and between parameter choices for a given algorithm.

  • 4 Empirical studies showed that increasing the stringency of the so-called residual motif (the catchall motif for all peptides that cannot otherwise be deconvoluted into a motif class) by adding a constant offset of +30 to its threshold cutoff value yielded more specific predictions than using a threshold identical to that of the other motifs. Therefore, thresholds for residual motifs actually ranged from −70 to +129, whereas thresholds for all other motifs ranged from −100 to +99. Thus, when interpreting data in supplemental Table 8, a row for threshold value t should be read as applying threshold t to all motifs except the residual motif, for which the threshold was instead set to t + 30.

  • * This work was supported, in whole or in part, by National Institutes of Health Grant GM068763 and Grant EY07110-17 from the NEI. This work was also supported by the United States Department of Energy Genomes to Life program. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

  • S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.

  • Both authors contributed equally to this work.

    • Received July 22, 2008.
    • Revision received October 27, 2008.
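
The threshold adjustment described in footnote 4 can be illustrated with a minimal sketch. It is not taken from scan-x itself: the names `RESIDUAL_OFFSET` and `effective_threshold` are hypothetical and chosen only for this illustration. The only values taken from the text are the +30 offset and the −100 to +99 range.

```python
# Illustrative only: names are hypothetical, not part of scan-x.
RESIDUAL_OFFSET = 30  # constant offset applied to the residual motif (footnote 4)


def effective_threshold(t: int, is_residual: bool) -> int:
    """Return the cutoff actually applied for a tabulated threshold value t."""
    return t + RESIDUAL_OFFSET if is_residual else t


# Tabulated thresholds for ordinary motifs span -100 to +99.
table_thresholds = range(-100, 100)

residual_cutoffs = [effective_threshold(t, True) for t in table_thresholds]
ordinary_cutoffs = [effective_threshold(t, False) for t in table_thresholds]

print(min(ordinary_cutoffs), max(ordinary_cutoffs))  # -100 99
print(min(residual_cutoffs), max(residual_cutoffs))  # -70 129
```

Under these assumptions, a single tabulated value t always corresponds to two cutoffs, t for ordinary motifs and t + 30 for the residual motif, which reproduces the two ranges quoted in the footnote.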