
Graph Algorithms for Condensing and Consolidating Gene Set Analysis Results*

  • Sara R. Savage‖
    Affiliations
    Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
  • Zhiao Shi‖
    Affiliations
    Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
  • Yuxing Liao
    Affiliations
    Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
  • Bing Zhang
    Correspondence
    To whom correspondence should be addressed: Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX. Tel.: 713-798-1443;
    Affiliations
    Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, Texas
    Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  • Author Footnotes
    * This work was supported by grant U24CA210954 from the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC), by grant CPRIT RR160027 from the Cancer Prevention & Research Institutes of Texas, and by funding from the McNair Medical Institute at The Robert and Janice McNair Foundation.
    This article contains supplemental material.
    ‖ These authors contributed equally to this work.
Open Access. Published: May 29, 2019. DOI: https://doi.org/10.1074/mcp.TIR118.001263
      Gene set analysis plays a critical role in the functional interpretation of omics data. Although this is typically done for one omics experiment at a time, there is an increasing need to combine gene set analysis results from multiple experiments performed on the same or different omics platforms, such as in multi-omics studies. Integrating results from multiple experiments is challenging, and annotation redundancy between gene sets further obscures clear conclusions. We propose to use a weighted set cover algorithm to reduce redundancy of gene sets identified in a single experiment. Next, we use affinity propagation to consolidate similar gene sets identified from multiple experiments into clusters and to automatically determine the most representative gene set for each cluster. Using three examples from over-representation analysis and gene set enrichment analysis, we showed that weighted set cover outperformed a previously published set cover method and reduced the number of gene sets by 52–77%. Focusing on overlapping genes between the list of input genes and the enriched gene sets in over-representation analysis and leading-edge genes in gene set enrichment analysis further reduced the number of gene sets. A use case combining enrichment analysis results from RNA-Seq and proteomics data comparing basal and luminal A breast cancer samples highlighted the known difference in proliferation and DNA damage response. Finally, we used these algorithms for a pan-cancer survival analysis. Our analysis clearly revealed prognosis-related pathways common to multiple cancer types or specific to individual cancer types, as well as pathways associated with prognosis in different directions in different cancer types. We implemented these two algorithms in an R package, Sumer, which generates tables and static and interactive plots for exploration and publication. Sumer is publicly available at https://github.com/bzhanglab/sumer.

      The abbreviations used are: BLCA, bladder urothelial cancer; BRCA, breast invasive carcinoma; COADREAD, colorectal adenocarcinoma; CPTAC, Clinical Proteomic Tumor Analysis Consortium; FDR, false discovery rate; GO, gene ontology; GSEA, gene set enrichment analysis; KIPAN, pan kidney cancer cohort; LAML, acute myeloid leukemia; LUAD, lung adenocarcinoma; ORA, over-representation analysis; PID, Pathway Interaction Database; TCGA, The Cancer Genome Atlas; UCEC, uterine endometrial carcinoma.
      The generation of large omics datasets is increasingly popular for studying biological and pathological systems. Analysis of these datasets frequently involves the identification of biological pathways, or more broadly defined gene sets, that are associated with the biological or clinical features of interest. To perform this analysis, predefined gene sets can be downloaded from a variety of databases, such as the Gene Ontology (GO) (Ashburner et al., 2000; The Gene Ontology Consortium, 2017), KEGG (Kanehisa et al., 2017), WikiPathways (Slenter et al., 2018), Reactome (Fabregat et al., 2018), and the Pathway Interaction Database (PID) (Schaefer et al., 2009). In addition, meta-databases combining multiple databases to generate large collections of gene sets, such as the MSigDB (Liberzon et al., 2011), have also been developed. The two most popular gene set analysis methods are over-representation analysis (ORA) (Zhang et al., 2005) and gene set enrichment analysis (GSEA) (Subramanian et al., 2005). Application of these methods to data generated from a single omics experiment is well standardized with many available tools (Wang et al., 2017; Huang et al., 2009; Kuleshov et al., 2016). However, integrating gene set analysis results from multiple experiments performed on the same or different omics platforms, such as multi-omics or pan-cancer studies, remains an open challenge. This problem is further complicated by redundancy in gene set databases.
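      As a rough illustration of ORA, the enrichment of a gene set in an input gene list is commonly scored with an upper-tail hypergeometric test. The Python sketch below shows that calculation under stated assumptions: the function name `ora_p_value` and the example counts are hypothetical, and the code is not taken from WebGestalt or any other tool cited here.

```python
from math import comb


def ora_p_value(universe_size, set_size, list_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of annotated background genes
    set_size:      number of background genes in the gene set
    list_size:     number of background genes in the input list
    overlap:       number of input genes that fall in the gene set
    """
    max_overlap = min(set_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 200 input genes, a 100-gene set, 10 genes in common.
p = ora_p_value(universe_size=20000, set_size=100, list_size=200, overlap=10)
print(f"ORA p value: {p:.3g}")
```

      In practice the resulting p values would still need multiple-testing correction (for example an FDR procedure) across all tested gene sets.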
      Gene set redundancy is common both within a single database and across databases. Within a single database, some gene sets may be more specific subsets of larger gene sets. This is particularly evident in GO, which is set up as hierarchical sets with increasing functional specialization. Crosstalk between biological processes and pathways can also result in shared genes between different gene sets within a single database. Across databases, redundancy can occur when the same gene set is included in multiple databases, or similar but non-identical sets of genes were associated with the same pathway in different database annotations, typically because of different perspectives in defining pathway boundaries. For example, the Reactome and KEGG databases contain two overlapping but different gene sets for the apoptosis pathway, and both of these gene sets are included in the MSigDB C2 collection of curated pathways.
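      Redundancy of this kind is often quantified with the Jaccard index or the overlap coefficient between two gene sets. The following Python sketch, using made-up gene sets, shows how a nested child set can look only moderately similar to its parent by Jaccard while the overlap coefficient flags it as fully contained. The function names and gene lists are illustrative assumptions.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Size of the intersection divided by the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Toy example: a specific set that is a strict subset of a general one.
parent = {"TP53", "BAX", "BCL2", "CASP3", "CASP8", "CASP9"}
child = {"CASP3", "CASP8", "CASP9"}

print(jaccard(parent, child))              # 0.5
print(overlap_coefficient(parent, child))  # 1.0
```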
      Several methods have been developed to handle gene set redundancy. PathCards and ReCiPa both use algorithms to combine similar sets into larger super-sets (Belinky et al., 2015; Vivar et al., 2013). However, this approach must be applied before enrichment analysis and may favor large, general pathways over highly specialized functional sets, although the latter provide more precise biological mechanisms. Recently, Stoney et al. developed a method using a modified set cover algorithm to select gene sets without changing the genes in the sets (Stoney et al., 2018). Set cover seeks the smallest sub-collection of sets that covers all elements in the entire collection, but it is biased toward selecting the largest sets (Stoney et al., 2018). Stoney et al. modified the set cover algorithm to select gene sets in order of increasing significance until all genes are covered. Because gene set prioritization is driven only by statistical significance, this method is not optimized for removing redundancy. In this study, we used a weighted set cover algorithm, which allows simultaneous consideration of both gene set size and significance.
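      The idea of weighing set size against significance can be sketched with the textbook greedy heuristic for weighted set cover, which repeatedly picks the set with the lowest weight per newly covered gene. This is an illustrative Python sketch and not the Sumer implementation. The weights, set names, and gene symbols are assumptions, and a lower weight here stands for a set that should be preferred (for example, a more significant one).

```python
def greedy_weighted_set_cover(gene_sets, weights):
    """Greedy heuristic: pick the set with the lowest weight per newly covered gene."""
    sets = {name: set(genes) for name, genes in gene_sets.items()}
    uncovered = set().union(*sets.values())
    chosen = []
    while uncovered:
        candidates = [name for name in sets if sets[name] & uncovered]
        # Ties are broken alphabetically so the result is deterministic.
        best = min(
            candidates,
            key=lambda name: (weights[name] / len(sets[name] & uncovered), name),
        )
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Toy gene sets: one general set and two specific subsets of it.
toy_sets = {
    "cell_cycle": {"A", "B", "C", "D"},
    "dna_replication": {"A", "B"},
    "mitosis": {"C", "D"},
    "dna_repair": {"E", "F"},
}

# Uniform weights reduce to ordinary greedy set cover, which favors large sets.
uniform = {name: 1.0 for name in toy_sets}
print(greedy_weighted_set_cover(toy_sets, uniform))
# ['cell_cycle', 'dna_repair']

# Penalizing the general set lets the specific sets win instead.
penalized = dict(uniform, cell_cycle=10.0)
print(greedy_weighted_set_cover(toy_sets, penalized))
# ['dna_repair', 'dna_replication', 'mitosis']
```

      With uniform weights the heuristic behaves like unweighted set cover and selects the largest set first; changing the weights changes which sets survive without altering the genes in any set.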
      The integration of enrichment analysis results from multiple experiments is a further challenge. PaintOmics 3 uses a joint p value to combine results from different omics platforms (Hernández-de-Diego et al., 2018). RAMONA can compare two enrichment analysis results using Bayesian networks (Sass et al., 2015). These methods focus on common rather than platform-specific gene sets. Moreover, their implementations only work for a certain type of enrichment analysis method or specific gene set databases. Network-based methods, such as ClueGO and Enrichment Map (Bindea et al., 2009; Merico et al., 2010), connect similar gene sets into a network and then rely on network clustering to consolidate enrichment analysis results. Enrichment Map, for example, connects gene sets from any number and type of enrichment analyses into a network based on the Jaccard or overlap similarity and colors gene sets (i.e. nodes in the network) based on the results from each experiment (Merico et al., 2010). Although network visualization and the gene set grouping achieved by the graph layout algorithms are very helpful, clusters of functionally related gene sets need to be manually identified and interpreted. This quickly becomes infeasible if many significant gene sets need to be consolidated. In this study, we used the affinity propagation algorithm (Frey and Dueck, 2007), which not only groups functionally related gene sets identified from multiple experiments or omics platforms into clusters but also automatically determines the most representative gene set for each cluster.
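      To make the exemplar idea concrete, the Python sketch below implements the basic message-passing updates of affinity propagation (responsibilities and availabilities with damping) on a Jaccard similarity matrix. It is a minimal sketch under stated assumptions, not Sumer's code: the toy gene sets, the `preference` value of 0.3 on the diagonal, the damping factor, and the fixed iteration count are all illustrative choices.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each item i, the index of its exemplar.

    S is a square similarity matrix (list of lists, at least 2 x 2) whose
    diagonal holds the "preference" of each item for being an exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with the best rival.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: accumulated support for k from the other items.
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]  # self-responsibility is not clipped
            total = sum(support)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy gene sets forming two groups of overlapping sets.
gene_sets = {
    "cell_cycle": {"A", "B", "C", "D"},
    "g1_s_transition": {"A", "B", "C"},
    "mitosis": {"B", "C", "D"},
    "dna_repair": {"W", "X", "Y", "Z"},
    "fanconi": {"W", "X", "Y"},
    "atr": {"X", "Y", "Z"},
}
names = list(gene_sets)
preference = 0.3  # assumed value; larger preferences yield more clusters
S = [
    [preference if i == j else jaccard(gene_sets[a], gene_sets[b])
     for j, b in enumerate(names)]
    for i, a in enumerate(names)
]

labels = affinity_propagation(S)
clusters = {}
for name, k in zip(names, labels):
    clusters.setdefault(names[k], []).append(name)
print(clusters)
```

      Each key of `clusters` is an exemplar gene set chosen by the algorithm rather than by manual inspection, which is the property that distinguishes this approach from layout-based grouping.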
      We implemented both weighted set cover and affinity propagation algorithms into an R package named Sumer. Sumer first reduces annotation redundancy in the results from an individual enrichment analysis using weighted set cover. It then clusters the results from multiple enrichment analyses using affinity propagation and provides tables, static and interactive plots, and downloadable results for exploration and publication. Sumer is flexible in allowing results from any gene set and any type of enrichment analysis. We use multiple examples to demonstrate its efficiency in gene set redundancy removal and its application to multi-omics and pan-cancer studies.

      DISCUSSION

      We have shown here a method to reduce gene set redundancy after enrichment analysis and to consolidate results from multiple enrichment analyses. First, removing annotation redundancy vastly reduces the overwhelming number of significant sets from a single enrichment analysis when the original database contained significant gene set redundancy and allows focused analyses on the most interesting gene sets. Affinity propagation clustering can then identify common themes from the remaining gene sets both in a single enrichment analysis and across multiple experiments. Although other related algorithms require manual interpretation of each gene set cluster, affinity propagation automatically recommends one exemplar for each gene set cluster.
      We demonstrated the utility of Sumer to integrate enrichment analyses from multi-omics data in a single study. Breast cancer can be separated into several subtypes based on gene expression. The basal and luminal subtypes have very different prognoses, treatment options, and outcomes. The enrichment analyses identified the well-known differences between the groups. The luminal A subtype tends to slowly proliferate and has better outcomes, whereas the basal subtype has an impaired DNA damage response and poor clinical outcomes (Bertucci et al., 2009). This was recapitulated by the enrichment of cell cycle-related pathways (i.e. KEGG DNA Replication, Reactome Cell Cycle, KEGG Cell Cycle, and PID E2F Pathway) and the DNA damage-related pathways (i.e. PID Fanconi Pathway, PID ATR pathway, and Reactome DNA Repair) in the basal subtype. Integrating enrichment results from proteomics data emphasized the common up-regulation of cell cycle genes and DNA repair genes at both the transcription and translation levels. Finally, pathways specific to an omics type were highlighted by the clustering analysis. The Core Matrisome pathway, which contains core extracellular matrix genes, was more highly enriched in luminal samples solely in the proteomic data. This might suggest post-translational regulation of the proteins in this pathway.
      Sumer can further be used to integrate results from multiple studies. Poor survival in several cancer types correlated with high expression of ECM-related genes, indicating this may be a common mechanism across cancer. However, there were also differences among cancer types. Interestingly, the Retinoblastoma (RB) in Cancer Pathway was correlated with poor survival in lung adenocarcinoma and kidney cancer, but it correlated with good survival in colorectal cancer. The retinoblastoma gene product, RB, is a classic tumor suppressor and master cell cycle regulator. The gene is frequently mutated or deleted in cancer, including lung adenocarcinoma (Greulich, 2010). However, the RB gene is frequently amplified in colorectal cancer and the protein is often overexpressed (Yamamoto et al., 1999; Oliveira et al., 2018). This may indicate differing function of the RB pathway in these different cancer types.
      A unique strength of affinity propagation is to automatically identify an exemplar for each gene set cluster. Our analysis in Fig. 6 demonstrated the statistical appropriateness of the selected exemplars. Nevertheless, the chosen exemplars may not always have the most biologically relevant names. For example, an exemplar in the pan-cancer survival study was the Human Immune Response to Tuberculosis pathway enriched from breast cancer. The response to tuberculosis may not describe the function of those genes in breast cancer. However, the other sets in the cluster clarify that the genes in that pathway are likely related to interferon signaling, which has been linked to cancer prognosis and survival (American Association for Cancer Research, 2014).
      Importantly, Sumer allows for significant customization based on the user's preferences. We demonstrated the case of using Sumer to consolidate and aggregate gene sets based on user-defined gene sets. This provides focused analysis of the genes most significantly associated with gene sets, such as using only the leading-edge genes from GSEA or the overlapping genes between the submitted list and the gene set from ORA. Furthermore, Sumer accepts a user-defined weight for the weighted set cover algorithm to prioritize sets for consolidation and aggregation. This provides a significant advantage over the original set cover algorithm (i.e. weighted set cover with uniform weights), which prioritizes the largest gene sets (Stoney et al., 2018). However, the choice is left to the user to decide the best prioritization for their analysis. Furthermore, Sumer provides downloadable figures and result tables, allowing users to perform additional analyses or figure customization. Because Sumer simply takes tables of scores associated with gene sets and corresponding GMT files as input, it is compatible with different enrichment analysis tools.
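      For readers unfamiliar with the GMT format, each line holds a gene set name, a description, and then the member genes, all separated by tabs. The Python sketch below reads such a file and trims each set to a user-chosen subset of genes, such as leading-edge genes from GSEA. It is an assumption-laden illustration: the helper names `read_gmt` and `restrict_to`, the toy file content, and the example URL are hypothetical, and the sketch does not reproduce how Sumer itself loads its input.

```python
import io


def read_gmt(handle):
    """Parse GMT lines: set name, description, then one gene per tab-separated field."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


def restrict_to(gene_sets, kept_genes):
    """Keep only the genes of interest in each listed set, e.g. leading-edge genes."""
    return {
        name: genes & kept_genes[name]
        for name, genes in gene_sets.items()
        if name in kept_genes
    }


# Toy GMT content; a real file would be opened with open(path) instead.
gmt_text = (
    "KEGG_CELL_CYCLE\thttp://example.org/cell_cycle\tCDK1\tCCNB1\tE2F1\n"
    "PID_ATR_PATHWAY\tNA\tATR\tCHEK1\n"
)
all_sets = read_gmt(io.StringIO(gmt_text))

# Hypothetical leading-edge genes reported for one enriched set.
leading_edge = {"KEGG_CELL_CYCLE": {"CDK1", "CCNB1"}}
focused_sets = restrict_to(all_sets, leading_edge)
print(focused_sets)
```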
      In summary, Sumer is a flexible tool for condensing and consolidating gene set analysis results from multi-omics or other types of integrative studies.

      REFERENCES

        • Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., Harris M.A., Hill D.P., Issel-Tarver L., Kasarskis A., Lewis S., Matese J.C., Richardson J.E., Ringwald M., Rubin G.M., Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000; 25: 25-29
        • The Gene Ontology Consortium. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 2017; 45: D331-D338
        • Kanehisa M., Furumichi M., Tanabe M., Sato Y., Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017; 45: D353-D361
        • Slenter D.N., Kutmon M., Hanspers K., Riutta A., Windsor J., Nunes N., Mélius J., Cirillo E., Coort S.L., Digles D., Ehrhart F., Giesbertz P., Kalafati M., Martens M., Miller R., Nishida K., Rieswijk L., Waagmeester A., Eijssen L.M.T., Evelo C.T., Pico A.R., Willighagen E.L. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res. 2018; 46: D661-D667
        • Fabregat A., Jupe S., Matthews L., Sidiropoulos K., Gillespie M., Garapati P., Haw R., Jassal B., Korninger F., May B., Milacic M., Roca C.D., Rothfels K., Sevilla C., Shamovsky V., Shorser S., Varusai T., Viteri G., Weiser J., Wu G., Stein L., Hermjakob H., D'Eustachio P. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018; 46: D649-D655
        • Schaefer C.F., Anthony K., Krupa S., Buchoff J., Day M., Hannay T., Buetow K.H. PID: the Pathway Interaction Database. Nucleic Acids Res. 2009; 37: D674-D679
        • Liberzon A., Subramanian A., Pinchback R., Thorvaldsdóttir H., Tamayo P., Mesirov J.P. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011; 27: 1739-1740
        • Zhang B., Kirov S., Snoddy J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 2005; 33: W741-W748
        • Subramanian A., Tamayo P., Mootha V.K., Mukherjee S., Ebert B.L., Gillette M.A., Paulovich A., Pomeroy S.L., Golub T.R., Lander E.S., Mesirov J.P. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 2005; 102: 15545-15550
        • Wang J., Vasaikar S., Shi Z., Greer M., Zhang B. WebGestalt 2017: a more comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. Nucleic Acids Res. 2017; 45: W130-W137
        • Huang D.W., Sherman B.T., Lempicki R.A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009; 37: 1-13
        • Kuleshov M.V., Jones M.R., Rouillard A.D., Fernandez N.F., Duan Q., Wang Z., Koplev S., Jenkins S.L., Jagodnik K.M., Lachmann A., McDermott M.G., Monteiro C.D., Gundersen G.W., Ma'ayan A. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016; 44: W90-W97
        • Belinky F., Nativ N., Stelzer G., Zimmerman S., Iny Stein T., Safran M., Lancet D. PathCards: multi-source consolidation of human biological pathways. Database. 2015; (pii): bav006
        • Vivar J.C., Pemu P., McPherson R., Ghosh S. Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and “Big data” biology. OMICS. 2013; 17: 414-422
        • Stoney R.A., Schwartz J.-M., Robertson D.L., Nenadic G. Using set theory to reduce redundancy in pathway sets. BMC Bioinformatics. 2018; 19: 386
        • Hernández-de-Diego R., Tarazona S., Martínez-Mira C., Balzano-Nogueira L., Furió-Tarí P., Pappas G.J., Conesa A. PaintOmics 3: a web resource for the pathway analysis and visualization of multi-omics data. Nucleic Acids Res. 2018; 46: W503-W509
        • Sass S., Buettner F., Mueller N.S., Theis F.J. RAMONA: a Web application for gene set analysis on multilevel omics data. Bioinformatics. 2015; 31: 128-130
        • Bindea G., Mlecnik B., Hackl H., Charoentong P., Tosolini M., Kirilovsky A., Fridman W.-H., Pagès F., Trajanoski Z., Galon J. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009; 25: 1091-1093
        • Merico D., Isserlin R., Stueker O., Emili A., Bader G.D. Enrichment map: A network-based method for gene-set enrichment visualization and interpretation. PLoS ONE. 2010; 5: e13984
        • Frey B.J., Dueck D. Clustering by passing messages between data points. Science. 2007; 315: 972-976
        • Jourquin J., Duncan D., Shi Z., Zhang B. GLAD4U: deriving and prioritizing gene lists from PubMed literature. BMC Genomics. 2012; 13: S20
        • Vasaikar S.V., Straub P., Wang J., Zhang B. LinkedOmics: analyzing multi-omics data within and across 32 cancer types. Nucleic Acids Res. 2018; 46: D956-D963
        • Hochbaum D.S. PWS Publishing Co., Boston, MA. 1997: 94-143
        • Golab L., Korn F., Li F., Saha B., Srivastava D. In: 2015 IEEE 31st International Conference on Data Engineering. IEEE, Seoul, South Korea. 2015: 879-890
        • Bertucci F., Finetti P., Birnbaum D. Basal breast cancer: A complex and deadly molecular subtype. Curr. Mol. Med. 2012; 12: 96-110
        • Bertucci F., Finetti P., Cervera N., Charafe-Jauffret E., Buttarelli M., Jacquemier J., Chaffanet M., Maraninchi D., Viens P., Birnbaum D. How different are luminal A and basal breast cancers? Int. J. Cancer. 2009; 124: 1338-1348
        • Greulich H. The Genomics of Lung Adenocarcinoma. Genes Cancer. 2010; 1: 1200-1210
        • Yamamoto H., Soh J.W., Monden T., Klein M.G., Zhang L.M., Shirin H., Arber N., Tomita N., Schieren I., Stein C.A., Weinstein I.B. Paradoxical increase in retinoblastoma protein in colorectal carcinomas may protect cells from apoptosis. Clin. Cancer Res. 1999; 5: 1805-1815
        • Oliveira D.M., Santamaria G., Laudanna C., Migliozzi S., Zoppoli P., Quist M., Grasso C., Mignogna C., Elia L., Faniello M.C., Marinaro C., Sacco R., Corcione F., Viglietto G., Malanga D., Rizzuto A. Identification of copy number alterations in colon cancer from analysis of amplicon-based next generation sequencing data. Oncotarget. 2018; 9: 20409-20425
        • American Association for Cancer Research. Type I IFN Signaling in Cancer Cells Enhances Chemotherapy Responses. Cancer Discov. 2014; 4: 1365
