
Functional Module Search in Protein Networks based on Semantic Similarity Improves the Analysis of Proteomics Data*

  • Desislava Boyanova
    Footnotes
    Affiliations
    From the Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany;
  • Santosh Nilla
    Footnotes
    Affiliations
    From the Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany;
  • Gunnar W. Klau
    Affiliations
    Life Sciences, Centrum Wiskunde & Informatica (CWI), Science Park 123, 1098 XG Amsterdam, The Netherlands
  • Thomas Dandekar
    Affiliations
    From the Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany;
  • Tobias Müller
    Affiliations
    From the Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany;
  • Marcus Dittrich
    Correspondence
    To whom correspondence should be addressed: Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany. Tel.:+49 931-84557; E-mail: .
    Affiliations
    From the Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany;
  • Author Footnotes
    * This work was supported by Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research)/Systems Biology of ADP Receptor Activation (D.B. and S.N.) and the DFG (Transregio 124, TP B2).
    This article contains supplemental Figs. S1 to S4 and Tables S1 to S6.
    ‖ Both authors contributed equally to this work.
Open Access. Published: May 07, 2014. DOI: https://doi.org/10.1074/mcp.M113.032839
      The continuously evolving field of proteomics produces increasing amounts of data while improving the quality of protein identifications. Although quantitative measurements are becoming more popular, many proteomic studies are still based on non-quantitative methods for protein identification. These studies result in potentially large sets of identified proteins, whose biological interpretation can be challenging. Systems biology develops innovative network-based methods, which allow an integrated analysis of these data. Here we present a novel approach, which combines prior knowledge of protein-protein interactions (PPI) with proteomics data using functional similarity measurements of interacting proteins. This integrated network analysis exactly identifies network modules with a maximal consistent functional similarity reflecting biological processes of the investigated cells. We validated our approach on small (H9N2 virus-infected gastric cells) and large (blood constituents) proteomic data sets. Using this novel algorithm, we identified characteristic functional modules in virus-infected cells, comprising key signaling proteins (e.g. the stress-related kinase RAF1), and demonstrated that this method allows a module-based functional characterization of cell types. Analysis of a large proteome data set of blood constituents resulted in clear separation of blood cells according to their developmental origin. A detailed investigation of the T-cell proteome further illustrates how the algorithm partitions large networks into functional subnetworks, each representing specific cellular functions. These results demonstrate that the integrated network approach not only allows a detailed analysis of proteome networks but also yields a functional decomposition of complex proteomic data sets and thereby provides deeper insights into the underlying cellular processes of the investigated system.
      Proteome studies using mass spectrometry are a widely used technique for the analysis of protein samples and investigation of signaling pathways (
      • Choudhary C.
      • Mann M.
      Decoding signalling networks by mass spectrometry-based proteomics.
      ,
      • Preisinger C.
      • von Kriegsheim A.
      • Matallanas D.
      • Kolch W.
      Proteomics and phosphoproteomics for the mapping of cellular signalling networks.
      ). Although quantitative measurements are possible (
      • Preisinger C.
      • von Kriegsheim A.
      • Matallanas D.
      • Kolch W.
      Proteomics and phosphoproteomics for the mapping of cellular signalling networks.
      ,
      • Bachi A.
      • Bonaldi T.
      Quantitative proteomics as a new piece of the systems biology puzzle.
      ), they are technically demanding and more costly than simple qualitative approaches, which usually produce a list of proteins identified in the analyzed sample. These lists are often hard to interpret and may include incomplete data because of the insufficient coverage of MS-based proteomics, for example for low-abundance proteins or proteins with specific physical characteristics (
      • Goh W.W.
      • Lee Y.H.
      • Chung M.
      • Wong L.
      How advancement in biological network analysis methods empowers proteomics.
      ). For subsequent analysis, single proteins or small subsets of proteins of interest are usually selected from the data sets on the basis of biological background knowledge, mostly because of their known association with particular biological pathways.
      Protein-protein interaction (PPI) networks, which contain experimentally validated interactions between proteins, can supply a basic structure for the network context in and between pathways. Using PPI information to create biological networks of the analyzed data and searching for functional modules can therefore be a powerful tool for the identification of key pathways in cellular systems. In this context, a functional module is defined as a group or cluster of proteins having similar biological function in the same network neighborhood (
      • Barabasi A.L.
      • Gulbahce N.
      • Loscalzo J.
      Network medicine: a network-based approach to human disease.
      ), which act in a concerted manner to accomplish specialized cellular functions. Much effort has been spent in recent years to optimize methods for identification of proteins ensuring the validity of measurements (
      • Reiter L.
      • Claassen M.
      • Schrimpf S.P.
      • Jovanovic M.
      • Schmidt A.
      • Buhmann J.M.
      • Hengartner M.O.
      • Aebersold R.
      Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry.
      ,
      • Xiao C.L.
      • Chen X.Z.
      • Du Y.L.
      • Sun X.
      • Zhang G.
      • He Q.Y.
      Binomial probability distribution model-based protein identification algorithm for tandem mass spectrometry utilizing peak intensity information.
      ,
      • Haas W.
      • Faherty B.
      • Gerber S.
      • Elias J.
      • Beausoleil S.
      • Bakalarski C.
      • Li X.
      • Villén J.
      • Gygi S.
      Optimization and use of peptide mass measurement accuracy in shotgun proteomics.
      ). The development of functional analysis methods and further biological interpretation of results, however, has just started. Previously, some approaches have already focused on the network investigation of proteomic data (
      • Antonov A.V.
      • Dietmann S.
      • Rodchenkov I.
      • Mewes H.W.
      PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks.
      ).
      Data-driven approaches to analyze large-scale data sets can produce an unbiased view on the entire data set and help to focus on small subsets of functionally relevant proteins (
      • Beisser D.
      • Klau G.W.
      • Dandekar T.
      • Muller T.
      • Dittrich M.T.
      BioNet: an R-Package for the functional analysis of biological networks.
      ,
      • Dittrich M.T.
      • Klau G.W.
      • Rosenwald A.
      • Dandekar T.
      • Müller T.
      Identifying functional modules in protein-protein interaction networks: an integrated exact approach.
      ,
      • Huang S.S.
      • Fraenkel E.
      Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks.
      ,
      • Ideker T.
      • Ozier O.
      • Schwikowski B.
      • Siegel A.F.
      Discovering regulatory and signalling circuits in molecular interaction networks.
      ,
      • Scott M.S.
      • Perkins T.
      • Bunnell S.
      • Pepin F.
      • Thomas D.Y.
      • Hallett M.
      Identifying regulatory subnetworks for a set of genes.
      ,
      • Zheng S.
      • Zhao Z.
      GenRev: Exploring functional relevance of genes in molecular networks.
      ). Further methods for investigating functional and disease-related subnetworks of genes or proteins have also been introduced in recent years (
      • Zhao X.M.
      • Wang R.S.
      • Chen L.
      • Aihara K.
      Uncovering signal transduction networks from high-throughput data by integer linear programming.
      ,
      • Ulitsky I.
      • Krishnamurthy A.
      • Karp R.M.
      • Shamir R.
      DEGAS: de novo discovery of dysregulated pathways in human diseases.
      ,
      • Vandin F.
      • Upfal E.
      • Raphael B.J.
      Algorithms for detecting significantly mutated pathways in cancer.
      ,
      • Qiu Y.Q.
      • Zhang S.
      • Zhang X.S.
      • Chen L.
      Detecting disease associated modules and prioritizing active genes based on high throughput data.
      ,
      • Su J.
      • Yoon B.J.
      • Dougherty E.R.
      Identification of diagnostic subnetwork markers for cancer in human protein-protein interaction network.
      ,
      • Chowdhury S.A.
      • Koyuturk M.
      Identification of coordinately dysregulated subnetworks in complex phenotypes.
      ,
      • Dao P.
      • Wang K.
      • Collins C.
      • Ester M.
      • Lapuk A.
      • Sahinalp S.C.
      Optimally discriminative subnetwork markers predict response to chemotherapy.
      ). However, these approaches are based on protein/gene information on the nodes and the mere topology of the PPI network and although some of them use edge weights (
      • Huang S.S.
      • Fraenkel E.
      Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks.
      ,
      • Vandin F.
      • Upfal E.
      • Raphael B.J.
      Algorithms for detecting significantly mutated pathways in cancer.
      ,
      • Qiu Y.Q.
      • Zhang S.
      • Zhang X.S.
      • Chen L.
      Detecting disease associated modules and prioritizing active genes based on high throughput data.
      ), functional information for the interaction connecting these nodes is not used.
      Large-scale bioinformatics analyses profit heavily from the development of a standardized functional annotation of the gene ontology (GO)
      The abbreviations used are: GO, Gene Ontology; PPI, Protein-Protein Interactions; BP, Biological Process; MF, Molecular Function; CC, Cellular Component; DAG, Directed Acyclic Graph; MICA, Most Informative Common Ancestor; BUM, Beta Uniform Mixture; ILP, Integer Linear Programming; PBMC, Peripheral Blood Mononuclear Cell.
      (
      • Ashburner M.
      • Ball C.A.
      • Blake J.A.
      • Botstein D.
      • Butler H.
      • Cherry J.M.
      • Davis A.P.
      • Dolinski K.
      • Dwight S.S.
      • Eppig J.T.
      • Harris M.A.
      • Hill D.P.
      • Issel-Tarver L.
      • Kasarskis A.
      • Lewis S.
      • Matese J.C.
      • Richardson J.E.
      • Ringwald M.
      • Rubin G.M.
      • Sherlock G.
      Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.
      ). GO is a hierarchical structure containing functional terms, which are grouped into so-called ontologies. There are three main ontologies, Biological Process (BP), Cellular Component (CC), and Molecular Function (MF), each containing a set of hierarchically organized terms. Proteins are assigned GO terms according to their function, either manually or automatically. Two genes can then be compared using the information from their assigned GO annotations; the resulting measure is called semantic similarity. Several similarity measures have been defined between GO terms and have been extended to similarity measures between two or more proteins (
      • Guzzi P.H.
      • Mina M.
      • Guerra C.
      • Cannataro M.
      Semantic similarity analysis of protein data: assessment with biological features and issues.
      ). Functional similarity between two proteins is an adequate measurement for how well they interact in a biological network based on the hypothesis that proteins with related function tend to interact with each other (
      • Barabasi A.L.
      • Gulbahce N.
      • Loscalzo J.
      Network medicine: a network-based approach to human disease.
      ).
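      To make the notion of semantic similarity concrete, the following is a minimal sketch of one common measure, Resnik similarity, which scores two terms by the information content of their most informative common ancestor (MICA). The toy ontology, the term names, and the annotation probabilities are hypothetical and chosen only for illustration; the text above does not state which measure is used, so this should not be read as the authors' scoring.

```python
import math

# Hypothetical toy ontology: child -> set of parents (a small DAG, not real GO).
PARENTS = {
    "root": set(),
    "signaling": {"root"},
    "metabolism": {"root"},
    "kinase_cascade": {"signaling"},
    "stress_response": {"signaling"},
}

# Hypothetical annotation probabilities p(term): fraction of proteins annotated
# with the term or any of its descendants (the root is always 1.0).
PROB = {
    "root": 1.0,
    "signaling": 0.4,
    "metabolism": 0.5,
    "kinase_cascade": 0.1,
    "stress_response": 0.2,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); rarer terms are more informative."""
    return -math.log(PROB[term])


def resnik(term_a, term_b):
    """Resnik similarity: IC of the most informative common ancestor (MICA)."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


def protein_similarity(terms_a, terms_b):
    """Best-match score between two annotation sets (maximum over term pairs)."""
    return max(resnik(a, b) for a in terms_a for b in terms_b)
```

      In this toy example, two terms that share only the root score zero, whereas two children of "signaling" score the information content of "signaling". Taking the maximum over term pairs is only one of several ways to lift a term-level measure to proteins.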
      Using the GO semantic similarity of two genes as a basis and integrating this knowledge to a predefined human PPI network of experimentally detected interactions from literature, we have developed an exact functional module detection algorithm for the analysis of qualitative proteomics data sets. Our investigation involves five important steps: a) the development of a functional interaction score for each edge based on the GO semantic similarity of the interacting proteins; b) the testing and validation of the information signal contained in the functionally enhanced human PPI network; c) the application of the new algorithm to a small proteomic data set of H9N2 virus-infected gastric cells (
      • Liu N.
      • Song W.
      • Wang P.
      • Lee K.
      • Chan W.
      • Chen H.
      • Cai Z.
      Proteomics analysis of differential expression of cellular proteins in response to avian H9N2 virus infection in human cells.
      ); d) the extension of the method for the decomposition of networks into functional modules; and finally e) the characterization of different blood constituents based on a large proteome data set (
      • Haudek V.J.
      • Slany A.
      • Gundacker N.C.
      • Wimmer H.
      • Drach J.
      • Gerner C.
      Proteome maps of the main human peripheral blood constituents.
      ). Thus we demonstrate the benefits of combining proteome data with functionally scored PPI networks and illustrate multiple options for network analysis of qualitative proteomic data sets. To make this method available to all proteome researchers, we have created a website (http://gosim.bioapps.biozentrum.uni-wuerzburg.de/gosim.php) that provides a detailed tutorial of the algorithm, including all required scripts and programs. This helps users to understand the method through examples and allows them to apply the procedure step by step to their own sample proteins.

      DISCUSSION

      Fast-evolving mass spectrometry techniques generate large proteome data sets, and only systematic analysis methods can reveal new insights into functional properties of the cellular system (
      • Pan C.
      • Olsen J.V.
      • Daub H.
      • Mann M.
      Global effects of kinase inhibitors on signaling networks revealed by quantitative phosphoproteomics.
      ,
      • Zanivan S.
      • Meves A.
      • Behrendt K.
      • Schoof E.M.
      • Neilson L.J.
      • Cox J.
      • Tang H.R.
      • Kalna G.
      • van Ree J.H.
      • van Deursen J.M.
      • Trempus C.S.
      • Machesky L.M.
      • Linding R.
      • Wickstrom S.A.
      • Fassler R.
      • Mann M.
      In Vivo SILAC-Based Proteomics Reveals Phosphoproteome Changes during Mouse Skin Carcinogenesis.
      ). In contrast to transcriptomics, where often large numbers of biological replicates from genome-wide measurements are available along with a set of well-established analytical methods, proteomics still faces challenges with genome-wide quantitative measurements and low numbers of replicates. This is particularly true for qualitative proteomics, where only the presence or absence of proteins is determined without quantifying the abundance or change of protein expression. The biological interpretation of lists containing identified proteins in the sample is often challenging. Moreover, because of technical difficulties in protein detection (e.g. transmembrane proteins in gel-based systems) the probability of false negative results (i.e. missed proteins) is relatively high, which makes the meaningful interpretation of these results even more difficult.
      Here, integrated network analysis can be particularly useful to overcome some of the experimental and statistical limitations. Interaction networks provide a wealth of information, which can be exploited by integrated network analyses. In particular, these molecular networks typically provide a functional scaffold on which the systemic analysis and interpretation of proteome data can be founded. Integrated network analysis can pick up proteins, which have not been directly detected in a sample (potentially false negatives) but are part of functional complexes or pathways; the presence of these proteins is inferred by their functional network context. This is one of the central points, where integrated network analysis also provides the potential to improve the protein detection error rate (
      • Goh W.W.
      • Lee Y.H.
      • Chung M.
      • Wong L.
      How advancement in biological network analysis methods empowers proteomics.
      ). For the analysis of a proteome data set it is therefore highly desirable to integrate and examine the interaction partners and network properties (
      • Antonov A.V.
      • Dietmann S.
      • Rodchenkov I.
      • Mewes H.W.
      PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks.
      ,
      • Gwinner F.
      • Acosta-Martin A.E.
      • Boytard L.
      • Chwastyniak M.
      • Beseme O.
      • Drobecq H.
      • Duban-Deweer S.
      • Juthier F.
      • Jude B.
      • Amouyel P.
      • Pinet F.
      • Schwikowski B.
      Identification of additional proteins in differential proteomics using protein interaction networks.
      ).
      However, merely mapping the identified proteins in a sample to a protein-protein interaction network is not enough for profound biological investigation and interpretation of the data. A minimal analysis could simply focus on the connected components of the proteins in the network. However, if the number of identified proteins is large, the derived connected components may become huge and thus will not be amenable to biological interpretation. Moreover, the modules might become “functionally diluted” in the sense that they comprise different parts of many diverse pathways. Furthermore, because of the high connectivity of interactome components, often more than one possible path can be found to connect the sample proteins. The resulting modules might therefore not be unique, as there might be more than one possibility for connecting the positive nodes with a maximum-scoring subgraph.
      One solution to this problem is the integration of quantitative edge weights. A weighting score of the edges allows concentrating the resulting submodule onto paths with higher functional resemblance of interacting proteins. Here we introduced a scoring based on GO similarity, which ensures that proteins are connected by functionally similar and thus biologically more relevant interactions and it can also serve as an indicator of interaction confidence in the network. This is additionally supported by the observation that high scoring interactions have a higher literature support (supplemental Fig. S3).
      Protein interactions are weighted based on GO annotations, which comprise information on three main ontologies. Based on this annotation functional similarity between proteins can be quantified on each of the three ontologies separately (
      • Schlicker A.
      • Lengauer T.
      • Albrecht M.
      Improving disease gene prioritization using the semantic similarity of Gene Ontology terms.
      ). However, to obtain a combined score it is necessary to aggregate the similarity of the three ontologies into one similarity measure. Previously published methods essentially average the score over all three categories for an aggregated score (
      • Schlicker A.
      • Domingues F.S.
      • Rahnenfuhrer J.
      • Lengauer T.
      A new measure for functional similarity of gene products based on Gene Ontology.
      ,
      • Schlicker A.
      • Albrecht M.
      FunSimMat: a comprehensive functional similarity database.
      ). The main problem with this aggregation is that if GO information is missing for even one ontology, the final score is not defined. To circumvent this problem we introduce a new aggregation statistic based on an order statistic, which intrinsically handles missing ontology information (
      • Dittrich M.T.
      • Klau G.W.
      • Rosenwald A.
      • Dandekar T.
      • Müller T.
      Identifying functional modules in protein-protein interaction networks: an integrated exact approach.
      ). Furthermore, this scoring is adapted to the context of network analysis based on a statistical model that measures the joint signal content of the network topology and GO similarity.
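      The difference between averaging and an order-statistic aggregation can be sketched as follows. The function names and the choice of the k-th largest available score (with k = 1, the maximum, as default) are assumptions made for illustration; the text above does not specify which order statistic is used.

```python
def mean_aggregate(scores):
    """Average over BP, MF and CC; undefined (None) if any ontology is missing."""
    if any(s is None for s in scores.values()):
        return None
    return sum(scores.values()) / len(scores)


def order_statistic_aggregate(scores, k=1):
    """Return the k-th largest of the available scores, skipping missing ones.

    k=1 (the maximum) is an illustrative choice, not necessarily the
    statistic used in the paper. If fewer than k scores are available,
    the smallest available score is returned.
    """
    available = sorted(
        (s for s in scores.values() if s is not None), reverse=True
    )
    if not available:
        return None  # no ontology annotated at all
    return available[min(k, len(available)) - 1]


# Hypothetical per-ontology similarities for one interaction; MF is missing.
example = {"BP": 0.8, "MF": None, "CC": 0.3}
```

      For the hypothetical `example` above, the mean is undefined because the MF score is missing, whereas the order statistic still returns a value computed from the BP and CC scores.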
      As GO annotation is an important measure to validate PPIs detected in high-throughput interaction studies and, vice versa, PPI information contributes to GO annotation, the use of GO as a basis for functional interaction scores may lead to a bias through redundancy. This might partially contribute to the signal content in terms of functional similarity for the interacting proteins in the network (see Fig. 2). Interestingly, a recent study analogously found that the number of PPI interactions between genes within a GO term is significantly higher than would be expected for random networks (
      • Dutkowski J.
      • Kramer M.
      • Surma M.A.
      • Balakrishnan R.
      • Cherry J.M.
      • Krogan N.J.
      • Ideker T.
      A gene ontology inferred from molecular networks.
      ).
      Once these scores are integrated into the proteomic network, the network can be decomposed into functional modules using an exact algorithm (
      • Dittrich M.T.
      • Klau G.W.
      • Rosenwald A.
      • Dandekar T.
      • Müller T.
      Identifying functional modules in protein-protein interaction networks: an integrated exact approach.
      ). Thus our algorithm provably identifies the maximal scoring subgraph in networks, where nodes and edges are scored (
      • Beisser D.
      • Klau G.W.
      • Dandekar T.
      • Muller T.
      • Dittrich M.T.
      BioNet: an R-Package for the functional analysis of biological networks.
      ,
      • Dittrich M.T.
      • Klau G.W.
      • Rosenwald A.
      • Dandekar T.
      • Müller T.
      Identifying functional modules in protein-protein interaction networks: an integrated exact approach.
      ,
      • Beisser D.
      • Brunkhorst S.
      • Dandekar T.
      • Klau G.W.
      • Dittrich M.T.
      • Muller T.
      Robustness and accuracy of functional modules in integrated network analysis.
      ). This optimal solution contains the maximum positive scoring nodes (proteins) and interactions connecting them. Modules identified by the algorithm can represent parts of protein complexes or pathways with the same function making this approach highly suitable to identify protein complexes or pathways.
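      The objective behind such a search can be illustrated on a toy network. The sketch below finds the maximum-scoring connected subgraph by exhaustive enumeration, which is only feasible for a handful of edges; it is not the exact algorithm cited above, and the node names and all scores are hypothetical. Positive node scores stand for detected proteins, negative node scores for undetected connector proteins, and edge scores for functional similarity shifted so that weakly supported interactions become negative.

```python
from itertools import combinations

# Hypothetical toy network (all names and scores are illustrative).
NODE_SCORE = {"A": 2.0, "B": -1.0, "C": 2.0, "D": -1.0, "E": 1.0}
EDGE_SCORE = {
    ("A", "B"): 0.5,
    ("B", "C"): 0.5,
    ("A", "D"): -0.5,
    ("D", "C"): -0.5,
    ("C", "E"): -2.0,
}


def is_connected(edges):
    """Check that the given (non-empty) edge set forms one connected subgraph."""
    nodes = {n for e in edges for n in e}
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for u, v in edges:
            if current in (u, v):
                other = v if current == u else u
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen == nodes


def best_module():
    """Exhaustively score every single node and every connected edge subset."""
    best = (float("-inf"), frozenset(), ())
    # Single nodes are valid, edge-free modules.
    for node, score in NODE_SCORE.items():
        if score > best[0]:
            best = (score, frozenset([node]), ())
    all_edges = list(EDGE_SCORE)
    for size in range(1, len(all_edges) + 1):
        for edges in combinations(all_edges, size):
            if not is_connected(edges):
                continue
            nodes = frozenset(n for e in edges for n in e)
            score = sum(NODE_SCORE[n] for n in nodes)
            score += sum(EDGE_SCORE[e] for e in edges)
            if score > best[0]:
                best = (score, nodes, edges)
    return best


score, nodes, edges = best_module()
```

      In this toy network the two detected proteins A and C can be linked through either B or D. Both connectors carry the same node penalty, so the edge scores decide: the path through B wins because its interactions are functionally better supported, which mirrors the role of the edge weights described above.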
      Although several methods of module detection have been published in recent years, most of them focus on the analysis of gene expression data in the context of PPI networks pioneered by the work of Ideker et al. (
      • Ideker T.
      • Ozier O.
      • Schwikowski B.
      • Siegel A.F.
      Discovering regulatory and signalling circuits in molecular interaction networks.
      ). Most of these works use heuristic algorithms which generally only give approximate instead of exact optimal solutions, often resulting in different modules (for an evaluation see (
      • Beisser D.
      • Brunkhorst S.
      • Dandekar T.
      • Klau G.W.
      • Dittrich M.T.
      • Muller T.
      Robustness and accuracy of functional modules in integrated network analysis.
      )). As these approaches usually require quantitative measurements at the network nodes, they do not easily transfer to proteome data, where often only protein detection calls are available. Several methods implemented in the MATISSE framework (
      • Ulitsky I.
      • Krishnamurthy A.
      • Karp R.M.
      • Shamir R.
      DEGAS: de novo discovery of dysregulated pathways in human diseases.
      ,
      • Ulitsky I.
      • Shamir R.
      Identification of functional modules using network topology and high-throughput data.
      ,
      • Ulitsky I.
      • Shamir R.
      Identifying functional modules using expression profiles and confidence-scored protein interactions.
      ) search for modules in PPI networks using gene expression data. Based on a heuristic search algorithm submodules in a given PPI network showing highly correlated expression profiles are identified (
      • Ulitsky I.
      • Shamir R.
      Identification of functional modules using network topology and high-throughput data.
      ). Although probabilities for individual interactions might be integrated (
      • Ulitsky I.
      • Shamir R.
      Identifying functional modules using expression profiles and confidence-scored protein interactions.
      ), the explicit use of GO similarity as an additional metric for scoring interactions is not available. Methods that directly investigate proteome data (without the requirement for quantitative measurements) include the PPI spider algorithm by Antonov et al. (
      • Antonov A.V.
      • Dietmann S.
      • Rodchenkov I.
      • Mewes H.W.
      PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks.
      ), which incrementally adds interconnecting network nodes to the list of identified input proteins in order to link the entire set of input nodes together. This approach does not use edge weights to prefer edges with potentially higher functional relevance and consequently does not try to identify an “optimal” module in terms of a clearly defined objective function. Thus, a particular advantage of our algorithm is the combination of optimized and improved functional scores with an exact algorithm for the identification of provably optimal solutions. In fact, the method not only uses GO scores but also optimizes these scores by adapting them to the network context after a signal and noise decomposition, thereby focusing primarily on relevant interactions.
      Using this integrated network approach we first studied a small data set derived from virus infected human gastric cells and compared it to an approach without functional score information. Clear differences in the topology and biological interpretation of the two resulting networks became apparent: Although the main functional complex of keratin proteins was maintained in both, the connecting path to the other proteins in the sample was different.
      A detailed comparison between some of the given interactions is presented in the supplement (supplemental Table S6). The most prominent example of a different path chosen by the functionally enhanced algorithm is the interaction of the keratin complex (KRT18) with YWHAB (a 14-3-3 protein) and the kinase RAF1. The solution without interaction scores contains protein kinase C (PRKCE) as the link to the keratin complex. Although the phosphorylation of KRT18 at Ser53 by PRKCE is already known, the functional role of this phosphorylation has not yet been confirmed. It has been speculated that it may play an in vivo role in filament reorganization (
      • Ku N.O.
      • Omary M.B.
      Identification of the major physiologic phosphorylation site of human keratin 18: potential kinases and a role in filament reorganization.
      ,
      • Omary M.B.
      • Baxter G.T.
      • Chou C.F.
      • Riopel C.L.
      • Lin W.Y.
      • Strulovici B.
      PKC epsilon-related kinase associates with and phosphorylates cytokeratin 8 and 18.
      ). Focusing on the solution with functional scores, the association of keratin 18 with 14-3-3 proteins and thereby with Raf1 kinase is known to regulate cell signaling (
      • Ku N.O.
      • Fu H.
      • Omary M.B.
      Raf-1 activation disrupts its binding to keratins during cell stress.
      ). During cell stress the binding between Raf1 and the keratin cluster is disrupted (
      • Ku N.O.
      • Fu H.
      • Omary M.B.
      Raf-1 activation disrupts its binding to keratins during cell stress.
      ), which might be a useful hint, as a virus infection causes stress to the gastric cells. KRT18 phosphorylation is reported to regulate various keratin functions, including binding to 14-3-3 proteins, modulation of cell cycle progression, and organization of keratin filaments (
      • Ku N.O.
      • Azhar S.
      • Omary M.B.
      Keratin 8 phosphorylation by p38 kinase regulates cellular keratin filament reorganization: modulation by a keratin 1-like disease causing mutation.
      ,
      • Ku N.O.
      • Michie S.
      • Resurreccion E.Z.
      • Broome R.L.
      • Omary M.B.
      Keratin binding to 14-3-3 proteins modulates keratin filaments and hepatocyte mitotic progression.
      ) along with a role in keratin protein turnover by ubiquitination (
      • Ku N.O.
      • Omary M.B.
      Keratins turn over by ubiquitination in a phosphorylation-modulated fashion.
      ) or during apoptosis (
      • Ku N.O.
      • Omary M.B.
      Effect of mutation and phosphorylation of type I keratins on their caspase-mediated degradation.
      ). There is also evidence of a possible phosphorylation of KRT18 by Raf1, which causes the disruption in the complex (
      • Ku N.O.
      • Fu H.
      • Omary M.B.
      Raf-1 activation disrupts its binding to keratins during cell stress.
      ). Hence, including functional scores enhances the biological insights of the resulting functional modules as shown here for gastric cells.
      Based on functional modules we characterized a large-scale proteomic data set of blood constituents. An interesting aspect was the enrichment of proteasome and ubiquitin processes in three cell types in particular: T cells, monocytes and PBMC. The existence and role of the proteasome in immune as well as non-immune cells has recently been reviewed by Ebstein et al. (
      • Ebstein F.
      • Kloetzel P.M.
      • Kruger E.
      • Seifert U.
      Emerging roles of immunoproteasomes beyond MHC class I antigen processing.
      ). Interestingly, these three cell types constitutively express three proteasome subunits (PSMB8, PSMB10, and PSMB9). These are normally not included in the proteasome but are induced after stimulation, building the so-called immunoproteasome, which plays an important role in antigen presentation by MHC class I molecules, cell proliferation, cell signaling and cytokine production (
      • Ebstein F.
      • Kloetzel P.M.
      • Kruger E.
      • Seifert U.
      Emerging roles of immunoproteasomes beyond MHC class I antigen processing.
      ). This type of proteasome has been identified mainly in professional antigen-presenting cells such as macrophages (
      • Haorah J.
      • Heilman D.
      • Diekmann C.
      • Osna N.
      • Donohue Jr., T.M.
      • Ghorpade A.
      • Persidsky Y.
      Alcohol and HIV decrease proteasome and immunoproteasome function in macrophages: implications for impaired immune function during disease.
      ), activated or resting B cells (
      • Frisan T.
      • Levitsky V.
      • Masucci M.G.
      Variations in proteasome subunit composition and enzymatic activity in B-lymphoma lines and normal B cells.
      ) and monocyte-derived dendritic cells (
      • Macagno A.
      • Gilliet M.
      • Sallusto F.
      • Lanzavecchia A.
      • Nestle F.O.
      • Groettrup M.
      Dendritic cells up-regulate immunoproteasomes and the proteasome regulator PA28 during maturation.
      ). In our results, monocytes are enriched for the GO term “proteasome.” As monocytes replenish the pool of macrophages and dendritic cells in the body, we can assume that the immunoproteasome is present in these cells as well. T cells constitutively express all three subunits, and the immunoproteasome facilitates protein homeostasis and cell proliferation in these cells (
      • Zaiss D.M.
      • de Graaf N.
      • Sijts A.J.
      The proteasome immunosubunit multicatalytic endopeptidase complex-like 1 is a T-cell-intrinsic factor influencing homeostatic expansion.
      ). Because PBMCs are mainly a mixture of macrophages, monocytes, and lymphocytes, we would expect to identify the immunoproteasome here as well. The exclusive expression of the immunoproteasome subunits in the monocyte, T cell, and PBMC samples could explain the relative overrepresentation of “proteasome”-related GO terms when compared with the profiles of the other blood constituents.
      Whereas qualitative proteomic data sets only record the presence of proteins in a sample, quantitative proteomics can reveal exact changes in the abundance of signaling proteins. As technologies are constantly evolving, more and more quantitative data are becoming available. Hence, our future work will focus on extending the algorithm to the analysis of quantitative proteome data and, in particular, to the more complex analysis of phosphoproteome data.
      In summary, we have demonstrated in this study that a detailed investigation of integrated protein-protein interaction networks can be accomplished by using a functional module-based network algorithm. The presented approach allows a functional decomposition of complex proteomic data sets and offers a systems biological perspective on proteome data in the context of cellular interaction networks. It provides insights into pathway structures beyond the scope of traditional analysis approaches, in particular for large-scale proteomics, and describes networks, modules, and submodules with specific functions.
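      The notion of GO term overrepresentation used in this comparison can be illustrated with a minimal sketch. It assumes an upper-tail hypergeometric test followed by Benjamini-Hochberg adjustment, a common choice in enrichment tools; it is not necessarily the exact procedure applied in this study. The function names and all counts are hypothetical and serve only as an illustration.

```python
from math import comb


def hypergeom_enrichment_p(n_population, n_annotated, n_sample, n_hits):
    """Upper-tail hypergeometric p-value P(X >= n_hits).

    n_population: proteins in the reference set (hypothetical background)
    n_annotated:  background proteins carrying the GO term
    n_sample:     proteins in the module or cell-type sample
    n_hits:       sample proteins carrying the GO term
    """
    upper = min(n_annotated, n_sample)
    total = comb(n_population, n_sample)
    tail = sum(
        comb(n_annotated, k) * comb(n_population - n_annotated, n_sample - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: 40 of 5000 background proteins carry the term,
    # and 6 of the 150 sample proteins carry it.
    p = hypergeom_enrichment_p(5000, 40, 150, 6)
    print(f"raw p-value: {p:.3g}")
    print(benjamini_hochberg([p, 0.2, 0.5]))
```

      In such a setup, one p-value would be computed per GO term and cell type, and the adjusted values compared across the blood constituents.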

      REFERENCES

        • Choudhary C.
        • Mann M.
        Decoding signalling networks by mass spectrometry-based proteomics.
        Nat. Rev. Mol. Cell Biol. 2010; 11: 427-439
        • Preisinger C.
        • von Kriegsheim A.
        • Matallanas D.
        • Kolch W.
        Proteomics and phosphoproteomics for the mapping of cellular signalling networks.
        Proteomics. 2008; 8: 4402-4415
        • Bachi A.
        • Bonaldi T.
        Quantitative proteomics as a new piece of the systems biology puzzle.
        J. Proteomics. 2008; 71: 357-367
        • Goh W.W.
        • Lee Y.H.
        • Chung M.
        • Wong L.
        How advancement in biological network analysis methods empowers proteomics.
        Proteomics. 2012; 12: 550-563
        • Barabasi A.L.
        • Gulbahce N.
        • Loscalzo J.
        Network medicine: a network-based approach to human disease.
        Nat. Rev. Genet. 2011; 12: 56-68
        • Reiter L.
        • Claassen M.
        • Schrimpf S.P.
        • Jovanovic M.
        • Schmidt A.
        • Buhmann J.M.
        • Hengartner M.O.
        • Aebersold R.
        Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry.
        Mol. Cell. Proteomics. 2009; 8: 2405-2417
        • Xiao C.L.
        • Chen X.Z.
        • Du Y.L.
        • Sun X.
        • Zhang G.
        • He Q.Y.
        Binomial probability distribution model-based protein identification algorithm for tandem mass spectrometry utilizing peak intensity information.
        J. Proteome Res. 2013; 12: 328-335
        • Haas W.
        • Faherty B.
        • Gerber S.
        • Elias J.
        • Beausoleil S.
        • Bakalarski C.
        • Li X.
        • Villén J.
        • Gygi S.
        Optimization and use of peptide mass measurement accuracy in shotgun proteomics.
        Mol. Cell. Proteomics. 2006; 5: 1326-1337
        • Antonov A.V.
        • Dietmann S.
        • Rodchenkov I.
        • Mewes H.W.
        PPI spider: a tool for the interpretation of proteomics data in the context of protein-protein interaction networks.
        Proteomics. 2009; 9: 2740-2749
        • Beisser D.
        • Klau G.W.
        • Dandekar T.
        • Müller T.
        • Dittrich M.T.
        BioNet: an R-Package for the functional analysis of biological networks.
        Bioinformatics. 2010; 26: 1129-1130
        • Dittrich M.T.
        • Klau G.W.
        • Rosenwald A.
        • Dandekar T.
        • Müller T.
        Identifying functional modules in protein-protein interaction networks: an integrated exact approach.
        Bioinformatics. 2008; 24: i223-i231
        • Huang S.S.
        • Fraenkel E.
        Integrating proteomic, transcriptional, and interactome data reveals hidden components of signaling and regulatory networks.
        Sci. Signal. 2009; 2: ra40
        • Ideker T.
        • Ozier O.
        • Schwikowski B.
        • Siegel A.F.
        Discovering regulatory and signalling circuits in molecular interaction networks.
        Bioinformatics. 2002; 18: S233-S240
        • Scott M.S.
        • Perkins T.
        • Bunnell S.
        • Pepin F.
        • Thomas D.Y.
        • Hallett M.
        Identifying regulatory subnetworks for a set of genes.
        Mol. Cell. Proteomics. 2005; 4: 683-692
        • Zheng S.
        • Zhao Z.
        GenRev: Exploring functional relevance of genes in molecular networks.
        Genomics. 2011; 99: 183-188
        • Zhao X.M.
        • Wang R.S.
        • Chen L.
        • Aihara K.
        Uncovering signal transduction networks from high-throughput data by integer linear programming.
        Nucleic Acids Res. 2008; 36: e48
        • Ulitsky I.
        • Krishnamurthy A.
        • Karp R.M.
        • Shamir R.
        DEGAS: de novo discovery of dysregulated pathways in human diseases.
        PloS one. 2010; 5: e13367
        • Vandin F.
        • Upfal E.
        • Raphael B.J.
        Algorithms for detecting significantly mutated pathways in cancer.
        J. Comput. Biol. 2011; 18: 507-522
        • Qiu Y.Q.
        • Zhang S.
        • Zhang X.S.
        • Chen L.
        Detecting disease associated modules and prioritizing active genes based on high throughput data.
        BMC Bioinformatics. 2010; 11: 26
        • Su J.
        • Yoon B.J.
        • Dougherty E.R.
        Identification of diagnostic subnetwork markers for cancer in human protein-protein interaction network.
        BMC Bioinformatics. 2010; 6: S8
        • Chowdhury S.A.
        • Koyuturk M.
        Identification of coordinately dysregulated subnetworks in complex phenotypes.
        Pacific Symposium on Biocomputing. 2010; : 133-144
        • Dao P.
        • Wang K.
        • Collins C.
        • Ester M.
        • Lapuk A.
        • Sahinalp S.C.
        Optimally discriminative subnetwork markers predict response to chemotherapy.
        Bioinformatics. 2011; 27: i205-i213
        • Ashburner M.
        • Ball C.A.
        • Blake J.A.
        • Botstein D.
        • Butler H.
        • Cherry J.M.
        • Davis A.P.
        • Dolinski K.
        • Dwight S.S.
        • Eppig J.T.
        • Harris M.A.
        • Hill D.P.
        • Issel-Tarver L.
        • Kasarskis A.
        • Lewis S.
        • Matese J.C.
        • Richardson J.E.
        • Ringwald M.
        • Rubin G.M.
        • Sherlock G.
        Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.
        Nat. Genet. 2000; 25: 25-29
        • Guzzi P.H.
        • Mina M.
        • Guerra C.
        • Cannataro M.
        Semantic similarity analysis of protein data: assessment with biological features and issues.
        Briefings in bioinformatics. 2012; 13: 569-585
        • Liu N.
        • Song W.
        • Wang P.
        • Lee K.
        • Chan W.
        • Chen H.
        • Cai Z.
        Proteomics analysis of differential expression of cellular proteins in response to avian H9N2 virus infection in human cells.
        Proteomics. 2008; 8: 1851-1858
        • Haudek V.J.
        • Slany A.
        • Gundacker N.C.
        • Wimmer H.
        • Drach J.
        • Gerner C.
        Proteome maps of the main human peripheral blood constituents.
        J. Proteome Res. 2009; 8: 3834-3843
        • Keshava Prasad T.S.
        • Goel R.
        • Kandasamy K.
        • Keerthikumar S.
        • Kumar S.
        • Mathivanan S.
        • Telikicherla D.
        • Raju R.
        • Shafreen B.
        • Venugopal A.
        • Balakrishnan L.
        • Marimuthu A.
        • Banerjee S.
        • Somanathan D.S.
        • Sebastian A.
        • Rani S.
        • Ray S.
        • Harrys Kishore C.J.
        • Kanth S.
        • Ahmed M.
        • Kashyap M.K.
        • Mohmood R.
        • Ramachandra Y.L.
        • Krishna V.
        • Rahiman B.A.
        • Mohan S.
        • Ranganathan P.
        • Ramabadran S.
        • Chaerkady R.
        • Pandey A.
        Human Protein Reference Database–2009 update.
        Nucleic Acids Res. 2009; 37: D767-D772
        • Hornbeck P.V.
        • Chabra I.
        • Kornhauser J.M.
        • Skrzypek E.
        • Zhang B.
        PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation.
        Proteomics. 2004; 4: 1551-1561
        • Maere S.
        • Heymans K.
        • Kuiper M.
        BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks.
        Bioinformatics. 2005; 21: 3448-3449
        • Shannon P.
        • Markiel A.
        • Ozier O.
        • Baliga N.S.
        • Wang J.T.
        • Ramage D.
        • Amin N.
        • Schwikowski B.
        • Ideker T.
        Cytoscape: a software environment for integrated models of biomolecular interaction networks.
        Genome Res. 2003; 13: 2498-2504
        • Benjamini Y.
        • Hochberg Y.
        Controlling the false discovery rate: a practical and powerful approach to multiple testing.
        J. Roy. Stat. Soc. B Met. 1995; : 289-300
        • Frohlich H.
        • Speer N.
        • Poustka A.
        • Beissbarth T.
        GOSim–an R-package for computation of information theoretic GO similarities between terms and gene products.
        BMC Bioinformatics. 2007; 8: 166
        • Schlicker A.
        • Domingues F.S.
        • Rahnenfuhrer J.
        • Lengauer T.
        A new measure for functional similarity of gene products based on Gene Ontology.
        BMC Bioinformatics. 2006; 7: 302
        • R Development Core Team
        R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. 2011
        • Pounds S.
        • Morris S.W.
        Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values.
        Bioinformatics. 2003; 19: 1236-1242
        • Dittrich M.T.
        • Klau G.W.
        • Rosenwald A.
        • Dandekar T.
        • Müller T.
        Identifying functional modules in protein-protein interaction networks: an integrated exact approach.
        Bioinformatics. 2008; 24: i223-i231
        • Ljubić I.
        • Weiskircher R.
        • Pferschy U.
        • Klau G.W.
        • Mutzel P.
        • Fischetti M.
        An algorithmic framework for the exact solution of the prize-collecting Steiner tree problem.
        Mathematical Programming. 2006; : 427-449
        • Tuncbag N.
        • McCallum S.
        • Huang S.S.
        • Fraenkel E.
        SteinerNet: a web server for integrating ‘omic’ data to discover hidden components of response pathways.
        Nucleic Acids Res. 2012; 40: W505-W509
        • Ku N.O.
        • Fu H.
        • Omary M.B.
        Raf-1 activation disrupts its binding to keratins during cell stress.
        J. Cell Biol. 2004; 166: 479-485
        • Avruch J.
        • Khokhlatchev A.
        • Kyriakis J.M.
        • Luo Z.
        • Tzivion G.
        • Vavvas D.
        • Zhang X.F.
        Ras activation of the Raf kinase: tyrosine kinase recruitment of the MAP kinase cascade.
        Recent Prog. Horm. Res. 2001; 56: 127-155
        • Serhan C.N.
        • Savill J.
        Resolution of inflammation: the beginning programs the end.
        Nat. Immunol. 2005; 6: 1191-1197
        • Nathan C.
        Neutrophils and immunity: challenges and opportunities.
        Nat. Rev. Immunol. 2006; 6: 173-182
        • Boyanova D.
        • Nilla S.
        • Birschmann I.
        • Dandekar T.
        • Dittrich M.
        PlateletWeb: a systems biologic analysis of signaling networks in human platelets.
        Blood. 2012; 119: e22-e34
        • Pan C.
        • Olsen J.V.
        • Daub H.
        • Mann M.
        Global effects of kinase inhibitors on signaling networks revealed by quantitative phosphoproteomics.
        Mol. Cell. Proteomics. 2009; 8: 2796-2808
        • Zanivan S.
        • Meves A.
        • Behrendt K.
        • Schoof E.M.
        • Neilson L.J.
        • Cox J.
        • Tang H.R.
        • Kalna G.
        • van Ree J.H.
        • van Deursen J.M.
        • Trempus C.S.
        • Machesky L.M.
        • Linding R.
        • Wickstrom S.A.
        • Fassler R.
        • Mann M.
        In Vivo SILAC-Based Proteomics Reveals Phosphoproteome Changes during Mouse Skin Carcinogenesis.
        Cell reports. 2013; 3: 552-566
        • Gwinner F.
        • Acosta-Martin A.E.
        • Boytard L.
        • Chwastyniak M.
        • Beseme O.
        • Drobecq H.
        • Duban-Deweer S.
        • Juthier F.
        • Jude B.
        • Amouyel P.
        • Pinet F.
        • Schwikowski B.
        Identification of additional proteins in differential proteomics using protein interaction networks.
        Proteomics. 2013; 13: 1065-1076
        • Schlicker A.
        • Lengauer T.
        • Albrecht M.
        Improving disease gene prioritization using the semantic similarity of Gene Ontology terms.
        Bioinformatics. 2010; 26: i561-i567
        • Schlicker A.
        • Albrecht M.
        FunSimMat: a comprehensive functional similarity database.
        Nucleic Acids Res. 2008; 36: D434-D439
        • Dutkowski J.
        • Kramer M.
        • Surma M.A.
        • Balakrishnan R.
        • Cherry J.M.
        • Krogan N.J.
        • Ideker T.
        A gene ontology inferred from molecular networks.
        Nat. Biotech. 2013; 31: 38-45
        • Beisser D.
        • Brunkhorst S.
        • Dandekar T.
        • Klau G.W.
        • Dittrich M.T.
        • Müller T.
        Robustness and accuracy of functional modules in integrated network analysis.
        Bioinformatics. 2012; 28: 1887-1894
        • Ulitsky I.
        • Shamir R.
        Identification of functional modules using network topology and high-throughput data.
        BMC Syst. Biol. 2007; 1: 8
        • Ulitsky I.
        • Shamir R.
        Identifying functional modules using expression profiles and confidence-scored protein interactions.
        Bioinformatics. 2009; 25: 1158-1164
        • Ku N.O.
        • Omary M.B.
        Identification of the major physiologic phosphorylation site of human keratin 18: potential kinases and a role in filament reorganization.
        J. Cell Biol. 1994; 127: 161-171
        • Omary M.B.
        • Baxter G.T.
        • Chou C.F.
        • Riopel C.L.
        • Lin W.Y.
        • Strulovici B.
        PKC epsilon-related kinase associates with and phosphorylates cytokeratin 8 and 18.
        J. Cell Biol. 1992; 117: 583-593
        • Ku N.O.
        • Azhar S.
        • Omary M.B.
        Keratin 8 phosphorylation by p38 kinase regulates cellular keratin filament reorganization: modulation by a keratin 1-like disease causing mutation.
        J. Biol. Chem. 2002; 277: 10775-10782
        • Ku N.O.
        • Michie S.
        • Resurreccion E.Z.
        • Broome R.L.
        • Omary M.B.
        Keratin binding to 14-3-3 proteins modulates keratin filaments and hepatocyte mitotic progression.
        Proc. Natl. Acad. Sci. U.S.A. 2002; 99: 4373-4378
        • Ku N.O.
        • Omary M.B.
        Keratins turn over by ubiquitination in a phosphorylation-modulated fashion.
        J. Cell Biol. 2000; 149: 547-552
        • Ku N.O.
        • Omary M.B.
        Effect of mutation and phosphorylation of type I keratins on their caspase-mediated degradation.
        J. Biol. Chem. 2001; 276: 26792-26798
        • Ebstein F.
        • Kloetzel P.M.
        • Kruger E.
        • Seifert U.
        Emerging roles of immunoproteasomes beyond MHC class I antigen processing.
        Cell. Mol. Life Sci. 2012;
        • Haorah J.
        • Heilman D.
        • Diekmann C.
        • Osna N.
        • Donohue Jr., T.M.
        • Ghorpade A.
        • Persidsky Y.
        Alcohol and HIV decrease proteasome and immunoproteasome function in macrophages: implications for impaired immune function during disease.
        Cell. Immunol. 2004; 229: 139-148
        • Frisan T.
        • Levitsky V.
        • Masucci M.G.
        Variations in proteasome subunit composition and enzymatic activity in B-lymphoma lines and normal B cells.
        Int. J. Cancer. 2000; 88: 881-888
        • Macagno A.
        • Gilliet M.
        • Sallusto F.
        • Lanzavecchia A.
        • Nestle F.O.
        • Groettrup M.
        Dendritic cells up-regulate immunoproteasomes and the proteasome regulator PA28 during maturation.
        Eur. J. Immunol. 1999; 29: 4037-4042
        • Zaiss D.M.
        • de Graaf N.
        • Sijts A.J.
        The proteasome immunosubunit multicatalytic endopeptidase complex-like 1 is a T-cell-intrinsic factor influencing homeostatic expansion.
        Infect. Immun. 2008; 76: 1207-1213