A Web-based Tool for in Silico Biomarker Discovery Based on Tissue-specific Protein Profiles in Normal and Cancer Tissues

Here we report the development of a publicly available Web-based analysis tool for exploring proteins expressed in a tissue- or cancer-specific manner. The search queries are based on the human tissue profiles in normal and cancer cells in the Human Protein Atlas portal and rely on the individual annotation performed by pathologists of images representing immunohistochemically stained tissue sections. Approximately 1.8 million images representing more than 3000 antibodies directed toward human proteins were used in the study. The search tool allows for the systematic exploration of the protein atlas to discover potential protein biomarkers. Such biomarkers include tissue-specific markers, cell type-specific markers, tumor type-specific markers, markers of malignancy, and prognostic or predictive markers of cancers. Here we show examples of database queries to generate sets of candidate biomarker proteins for several of these different categories. Expression profiles of candidate proteins can subsequently be validated by examination of the underlying high resolution images. The present study shows examples of search strategies revealing several potential protein biomarkers, including proteins specifically expressed in normal cells and in cancer cells from specified tumor types. The lists of candidate proteins can be used as a starting point for further validation in larger patient cohorts using both immunological approaches and technologies utilizing more classical proteomics tools.

given tumor has (6). In typical cases a basic phenotype can be defined by using crude differentiation markers, e.g. cytokeratins for epithelial cells, vimentin for mesenchymal cells, and the leukocyte common antigen for cells of hematopoietic origin. A more in-depth analysis can furthermore be performed using available diagnostic antibodies; however, only a few cell type-specific antibodies have been developed to date. The most widely used cell type-specific antibody detects the prostate-specific antigen (PSA) protein and is an excellent marker for tumors originating from the prostate (7). Other examples include tyrosinase-related proteins for melanocytic tumors (8), thyroglobulin as a marker for thyroid carcinomas (9), and chromogranin as a marker for various endocrine tumors (10).
Although a multitude of antibodies directed toward different protein targets exists today, the actual number of antibodies used in clinical diagnostics is very limited. For most common cell types and corresponding tumors there are no available antibodies. To compensate for this, a panel of antibodies is often used to exclude certain tumors and to narrow down the possible differential diagnoses, e.g. various keratin type-specific antibodies to differentiate between epithelial tumors (6). To determine a more precise diagnosis and facilitate optimal therapeutic intervention, IHC is also used in clinical diagnostics when the type of tumor is known. This has been developed furthest within the field of hematology where patterns of immunoreactivity from panels of antibodies are crucial for lymphoma and leukemia diagnostics. The knowledge base for solid tumors is more limited, and in clinical practice IHC-based diagnostics has been best developed for breast cancer, e.g. proliferation markers, hormonal receptors, and so-called predictive-type markers such as HER-2/neu status (11).
It can be anticipated that many more proteins are over- or underexpressed in various cancer cells and that the knowledge of specific protein expression patterns in a cancer would enable more precise and stratified diagnoses for most forms of cancer. The new Web-based tool described here is therefore intended to aid in this process by making it possible to find proteins over- or underexpressed in specified normal or tumor cell populations using the protein atlas portal. In addition, it might be possible to search for proteins with differential immunoreactivity within a specified type of cancer, i.e. potential prognostic or predictive markers. The search functionality will be publicly available through the HPA portal and will be released together with a substantial increase in the number of proteins and images in the underlying database.

EXPERIMENTAL PROCEDURES
Antibodies and Immunohistochemistry-A total of 3015 antibodies (Supplemental Table 1) were used for immunohistochemical staining on sections from eight different TMAs including 576 tissue cores (8 × 72). All spots of immunostained tissue are published in a publicly available database, The Human Protein Atlas (www.proteinatlas.org). All antibodies were tested and validated with respect to performance for immunohistochemical staining on formalin-fixed and paraffin-embedded tissues. TMAs were generated, and tissue sections were immunostained as described previously (4). To provide a broad basis for the analysis, a total of 48 different normal tissue samples from 144 individuals and 432 tumor samples from 216 different cancer patients, with tumors representing the 20 most common cancer types, were included in the study. Following tests for antigen retrieval and antibody dilution, immunohistochemical staining was performed using a system for automated immunohistochemistry (Autostainer, DakoCytomation, Glostrup, Denmark).
Data Collection-To determine the level of expression for each protein in this study, antibodies were used to immunohistochemically stain human tissues assembled in TMA blocks. The stained TMA sections were scanned in high resolution scanners (ScanScope T2, Aperio Technologies, Vista, CA) and separated into individual spot images representing the different cores in the TMAs. All spot images were viewed and evaluated by pathologists using a Web-based annotation system. Parameters annotated included (i) intensity of immunoreactivity, (ii) fraction of positive cells, (iii) subcellular localization of the staining, and (iv) a free text box allowing for comments on the particular staining pattern. Annotation was performed for defined cell populations present in the different tissues, e.g. neurons and glial cells in brain tissues and squamous epithelial cells, germinal center cells, and other lymphoid cells in tonsil. All included normal cell types are listed in Table I. Cancer cells were annotated in all tumor tissues. Information regarding tissues, annotations, and images was stored in a MySQL database.
Data Extraction-The annotation parameters for protein expression levels corresponding to each cell type were extracted from the annotation database together with parameters from the biobank database regarding the age and sex of the patient, tissue type, and disease type of the tissue blocks. The annotation parameters for intensity and quantity (fraction of positive cells) were combined into a four-grade scale represented by the colors white for negative, yellow for weak, orange for moderate, and red for strong level of protein expression (Table II). The protein expression levels transformed into color codes for each cell type for each antibody were finally assembled in a new database optimized for queries on expression patterns.
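The combination of intensity and quantity into a four-grade color scale can be illustrated with a minimal Python sketch. The actual rules are those of Table II, which is not reproduced in this text, so the downgrade rule and the 25% threshold below are assumptions made purely for illustration, as are the names `expression_color`, `COLORS`, and `INTENSITY_RANK`.

```python
# Illustrative sketch only: Table II defines the real mapping.
# The threshold and downgrade rule used here are ASSUMED, not taken from the study.
COLORS = ("white", "yellow", "orange", "red")  # negative, weak, moderate, strong
INTENSITY_RANK = {"negative": 0, "weak": 1, "moderate": 2, "strong": 3}


def expression_color(intensity: str, fraction_positive: float) -> str:
    """Combine annotated intensity and fraction of positive cells into a color code.

    Assumed rule: staining present in fewer than 25% of the cells is
    downgraded by one step, but never below "weak" while any cell is positive.
    """
    if not 0.0 <= fraction_positive <= 1.0:
        raise ValueError("fraction_positive must be between 0 and 1")
    rank = INTENSITY_RANK[intensity]
    if rank == 0 or fraction_positive == 0.0:
        return "white"
    if fraction_positive < 0.25:  # hypothetical threshold
        rank = max(rank - 1, 1)
    return COLORS[rank]


print(expression_color("strong", 0.80))
print(expression_color("strong", 0.10))
```

Storing a single ordinal code per cell type and antibody, rather than two separate annotation parameters, is what allows later queries to compare expression levels with simple equality or ordering tests.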
Search Functions in the Database-The new advanced search function in the protein atlas was developed for users to enter queries based on protein expression levels in all the included normal and cancerous tissues. The users can also enter multiple criteria to find proteins with a high expression level in one tissue type but low or negative expression level in another tissue type. The end user's selection of criteria is converted into a structured query language (SQL) query that is sent to the MySQL database. The result of the SQL query (hits) is presented to the end user as a list of antibodies matching the requested pattern.
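How a selection of criteria might be converted into an SQL query can be sketched as follows. This is not the portal's implementation: the table layout (`expression` with columns `antibody`, `tissue`, `cell_type`, `level`), the antibody identifiers, and the function `build_query` are hypothetical, and Python's built-in SQLite module stands in for the MySQL database so that the example is self-contained.

```python
import sqlite3

# Ordinal encoding of the four-grade scale (assumed representation).
LEVELS = {"negative": 0, "weak": 1, "moderate": 2, "strong": 3}
ALLOWED_COMPARISONS = {"=", "<=", ">="}


def build_query(criteria):
    """Turn (tissue, cell_type, comparison, level_name) tuples into SQL plus parameters.

    Every criterion must hold for an antibody to be returned. Values are passed
    as bound parameters; only the whitelisted comparison operator is interpolated.
    """
    clauses, params = [], []
    for tissue, cell_type, comparison, level_name in criteria:
        if comparison not in ALLOWED_COMPARISONS:
            raise ValueError(f"unsupported comparison: {comparison!r}")
        clauses.append(
            "EXISTS (SELECT 1 FROM expression e"
            " WHERE e.antibody = a.antibody AND e.tissue = ? AND e.cell_type = ?"
            f" AND e.level {comparison} ?)"
        )
        params.extend([tissue, cell_type, LEVELS[level_name]])
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT DISTINCT a.antibody FROM expression a WHERE {where} ORDER BY a.antibody"
    return sql, params


# Hypothetical toy data: one summarized level per antibody and cell population.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expression (antibody TEXT, tissue TEXT, cell_type TEXT, level INTEGER)"
)
conn.executemany(
    "INSERT INTO expression VALUES (?, ?, ?, ?)",
    [
        ("AB1", "breast cancer", "tumor cells", 3),
        ("AB1", "breast", "glandular cells", 0),
        ("AB2", "breast cancer", "tumor cells", 3),
        ("AB2", "breast", "glandular cells", 2),
    ],
)

sql, params = build_query(
    [
        ("breast cancer", "tumor cells", ">=", "strong"),
        ("breast", "glandular cells", "=", "negative"),
    ]
)
hits = [row[0] for row in conn.execute(sql, params)]
print(hits)  # ['AB1']
```

Binding the user-supplied values as parameters rather than pasting them into the SQL text is a common precaution for a public Web form; whether the portal does so is not stated in the text.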

Extended Content in the Human Protein Atlas Database
The protein atlas portal (www.proteinatlas.org) has been extended to include more than double the number of images and antibodies compared with version 2.0 of the atlas, which contained 1,238,760 annotated, high resolution images corresponding to 1514 different antibodies. The new version 3.0 includes 3015 antibodies directed toward 2618 human protein targets and comprises altogether 2,827,440 annotated images. Approximately 50% of the antibodies have been generated through the HPA program as affinity-purified (monospecific) antibodies, whereas the remaining antibodies have been obtained from commercial antibody providers and include both monoclonal and polyclonal antibodies. The number of gene/protein targets present on each human chromosome in the new version is displayed in Supplemental Fig. 1.

Development of a Web-based Search Tool
The new search tool was developed using an open source architecture (Java, PHP, and MySQL). A single criterion search gives end users the possibility to query for e.g. "which proteins are strongly expressed in breast cancer," and such a query presents a list of proteins matching this criterion. However, a multiple criteria search adds considerable power to in silico biomarker discovery because users can query for e.g. "which proteins are strongly expressed in breast cancer AND negative in normal breast," resulting in a list of proteins up-regulated in breast cancer as compared with benign breast tissue. The advanced search functionality was developed to facilitate biomarker discovery-related queries and accepts queries based on any number of criteria combinations. All annotated normal and tumor cell types can be specified and combined with queries regarding level of expression (negative to strong immunoreactivity). To identify proteins overexpressed in a given type of tumor, the number of individual tumors showing a defined level of expression can also be specified. Several such searches can furthermore be combined using either "and" or "and not" to create a search string. In addition, the "tissue search" can be combined with search strings representing gene name, description, and words from text summaries of annotated antibodies ("free text search"). A query generates a list of proteins fitting the criteria for the search string, and the underlying images can subsequently be explored using the links to the corresponding expression profiles. In the following paragraphs, results from various examples of different search strategies aiming to identify proteins that are overexpressed in specific normal cells or defined tumor types are described. All search strings and corresponding results are displayed in Table III.
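The way individual searches are chained with "and" and "and not" can be pictured as set operations on hit lists. The following Python sketch works under that assumption; the protein names, the annotation values, and the functions `search` and `combine` are hypothetical and do not reproduce the portal's code or data.

```python
# Hypothetical annotations: protein -> cell population -> level of each sample.
ANNOTATIONS = {
    "PROT_A": {
        "breast cancer": ["strong", "strong", "moderate", "strong"],
        "normal breast": ["negative", "negative"],
    },
    "PROT_B": {
        "breast cancer": ["strong", "strong", "strong", "weak"],
        "normal breast": ["moderate", "strong"],
    },
    "PROT_C": {
        "breast cancer": ["weak", "negative", "negative", "weak"],
        "normal breast": ["negative", "negative"],
    },
}


def search(population, level, min_samples=1):
    """Proteins with at least `min_samples` samples of `population` at `level`."""
    hits = set()
    for protein, by_population in ANNOTATIONS.items():
        count = sum(1 for lv in by_population.get(population, []) if lv == level)
        if count >= min_samples:
            hits.add(protein)
    return hits


def combine(first, *rest):
    """Apply ("and" | "and not", hit_set) pairs to `first`, left to right."""
    result = set(first)
    for operator, hits in rest:
        if operator == "and":
            result &= hits
        elif operator == "and not":
            result -= hits
        else:
            raise ValueError(f"unknown operator: {operator!r}")
    return result


# "strong in at least 3 breast cancers AND negative in both normal breast samples"
up_in_cancer = combine(
    search("breast cancer", "strong", min_samples=3),
    ("and", search("normal breast", "negative", min_samples=2)),
)
print(sorted(up_in_cancer))  # ['PROT_A']
```

The `min_samples` argument mirrors the option of specifying how many individual tumors must show a defined level of expression.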

Tissue-specific Expression in Normal Cells
The search function can be used to identify proteins specifically expressed in a defined cell type in normal tissues. As truly cell or tissue type-specific proteins are rare, most such queries yield proteins predominantly but not exclusively expressed in the specified cell type.
Proteins Expressed in Glomeruli-In a first example, we searched for proteins highly expressed in the glomeruli of the kidney. The search yielded 11 hits (Fig. 1a) and included several proteins of interest to study further (Table III). A subset of these were proteins known to be expressed in blood vessels, e.g. CD31, CD34, and VEGFR-2. The complement receptor 1 (CD35, encoded by TDGF1) previously known to be expressed in glomeruli was also identified. CD35 showed a strong and specific expression pattern in glomeruli (Fig. 1b) consistent with findings in earlier studies showing tissue distribution in kidney glomerular podocytes (12). Distinct and strong immunoreactivity restricted to the glomeruli was evident already in the medium-sized images representing normal kidney tissue (Fig. 1c). The finer details of CD35 immunoreactivity can be studied in a higher magnification view, and the pattern of positive staining was consistent with a distribution of the complement receptor 1 protein in podocytes (Fig. 1d). In addition to CD35, the list of glomerular proteins also included proteins known to be involved in other organs and less well characterized for their role in glomerular function, e.g. CRIM1, a protein known for its role in the development of motor neurons and only recently described in kidney function (13).
Proteins Expressed in Hematopoietic Cells-A search for proteins with enhanced expression in the bone marrow resulted in 14 hits (Table III). Four examples, including three proteins previously known to be expressed in hematopoietic cells, showed distinct staining in different cell populations of the bone marrow (Fig. 2a). Aquaporin 1 (AQP1), a water-specific channel protein that provides the plasma membranes of red cells and kidney proximal tubules with high permeability to water (14); CCAAT/enhancer-binding protein (CEBPE), a DNA-binding protein critical for myeloid differentiation (15); and erythroid transcription factor (GATA1), a transcriptional activator involved in erythroid and megakaryocytic differentiation (16), are examples of proteins involved in the normal function of bone marrow cells. The fourth example represents an uncharacterized protein (KIAA2022) not previously described to be expressed in the bone marrow. Immunostaining showed that the KIAA2022 protein appears to be expressed in megakaryocytes and in other rare hematopoietic cells as well as in streaklike, extracellular deposits.
Proteins Expressed in Endocrine Cells-The search function was also used to search for proteins expressed in cells with endocrine differentiation. This search yielded 18 hits including several known biomarkers for neuroendocrine differentiation, e.g. chromogranin-A, synaptophysin, and transthyretin (Table III). Chromogranin-A is a well known marker for cells with endocrine differentiation and is routinely used in clinical diagnostics to identify tumors with endocrine differentiation. Immunohistochemical staining patterns using antibodies directed toward chromogranin-A (CHGA) and tetraspanin-7 (TSPAN7) showed strong, mainly cytoplasmic positivity in cells constituting the pancreatic islets of Langerhans. Interestingly the unknown KIAA0323 protein and the reticulon-1 (RTN1) protein showed strong positivity in only a subset of endocrine islet cells. The distribution pattern was consistent with protein expression in alpha cells (glucagon-producing cells) and/or delta cells (somatostatin-producing cells). All four of these proteins showed a distinct and mainly cytoplasmic positivity in carcinoid tumor cells (Fig. 2b). The reticulon family of proteins is localized primarily in the endoplasmic reticulum membrane, in particular tubular endoplasmic reticulum, and is mainly expressed in neuroendocrine cells. Reticulons have been implicated in cellular processes involving apoptosis and vesicle trafficking events, including regulated exocytosis. The restricted positivity, with a distribution pattern consistent with that of delta cells, of reticulon-1 in islet cells of Langerhans is intriguing as somatostatin is an inhibitor of growth hormone (somatotropin), and overexpression of a reticulon-1-C fragment has previously been shown to enhance the secretion of human growth hormone (17). Interestingly a member of the tetraspanin family (TSPAN7, CD231) showed a distinct and strong cytoplasmic immunoreactivity, with a hint of enhanced positivity along cell membranes, in islet cells as well as in carcinoids. Tetraspanins serve as molecular organizers of multiprotein microdomains in cell membranes and regulate cell migration, fusion, and signaling events (18). Although the tetraspanin family of proteins is abundant and expressed in most cell types, most tetraspanin proteins have not been studied in detail. The distinct expression of both tetraspanin-7 and a previously uncharacterized protein KIAA0323 in endocrine cells requires further studies to establish their function and potential role in the endocrine system.

Protein Expression in Cancer Cells
The search function can also be used to identify proteins overexpressed in cancer cells. As cancers are heterogeneous,

TABLE III A summary of all search criteria used in this study
The tested search string is given together with the retrieved hits for respective search.

In Silico Biomarker Discovery Using a Web-based Tool
Molecular & Cellular Proteomics 7.5 829

individually different tumors will show different patterns of protein expression. Certain proteins, e.g. proteins related to cell proliferation, are expected to be overexpressed in virtually all forms of cancer, whereas other protein profiles are expected to be common within a given tumor type. Different strategies can be used to identify various "cancer-specific" proteins: (i) proteins expressed in a specific tumor type as compared with other tumor types, (ii) proteins expressed in both a specified type of cancer and the corresponding normal cell type, (iii) proteins overexpressed in a tumor as compared with the normal cell counterpart, (iv) proteins with general overexpression in malignant cells as compared with normal cells, and (v) proteins differentially expressed in defined tumor types.
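Two of the strategies listed above, (i) and (v), can be expressed as simple tests on the per-tumor expression levels of one protein. The Python sketch below is only an illustration: the cut-offs (`high`, `low`, `min_each`), the example profile, and the function names are assumptions and are not taken from the search strings in Table III.

```python
LEVELS = {"negative": 0, "weak": 1, "moderate": 2, "strong": 3}


def fraction_at_least(levels, threshold):
    """Fraction of tumors whose level is at or above `threshold`."""
    if not levels:
        raise ValueError("no tumors annotated")
    cutoff = LEVELS[threshold]
    return sum(1 for lv in levels if LEVELS[lv] >= cutoff) / len(levels)


def tumor_type_specific(profile, target, others, high=0.75, low=0.25):
    """Strategy (i): mostly moderate/strong in `target`, rarely so in each other type."""
    return fraction_at_least(profile[target], "moderate") >= high and all(
        fraction_at_least(profile[other], "moderate") <= low for other in others
    )


def differentially_expressed(levels, min_each=2):
    """Strategy (v): some tumors of one type are strong while others are negative."""
    strong = sum(1 for lv in levels if lv == "strong")
    negative = sum(1 for lv in levels if lv == "negative")
    return strong >= min_each and negative >= min_each


# Hypothetical profile of a single protein across three tumor types.
profile = {
    "prostate cancer": ["strong", "strong", "moderate", "strong"],
    "breast cancer": ["negative", "weak", "negative", "negative"],
    "colorectal cancer": ["negative", "negative", "weak", "moderate"],
}

print(tumor_type_specific(profile, "prostate cancer", ["breast cancer", "colorectal cancer"]))
print(differentially_expressed(["strong", "negative", "strong", "negative", "weak"]))
```

Because any such cut-off is arbitrary, hits from either test are best treated as candidates to be checked against the underlying images.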
Prostate Cancer-specific Expression-In a first example, we queried proteins overexpressed in prostate cancer as compared with breast and colorectal cancer (Table III).Nine proteins were found (Fig. 3), including the well known biomarkers kalikrein-3 (KLK-3; also known as PSA), prostatic acid phos-  1.A search for proteins expressed in the glomeruli of the human kidney.a, the figure shows a search string with three criteria that yields a "hit list" with 11 proteins.Each identified protein is shown with gene name, a short description, chromosomal location, links to other databases, antibody identity, and a validation score.b, a schematic overview of one of the identified proteins, the complement receptor 1 phatase (ACPP), and folate hydrolase-1 (FOLH1; also known as prostate-specific membrane antigen).Interestingly glutamic acid decarboxylase 1 (GAD-1), identified as a major autoantigen in insulin-dependent diabetes, showed selective expression in a high fraction of the prostatic carcinomas.The role of GAD-1 in prostate cancer is unclear; however, a marked expression of GAD-1 in prostates of cancer patients with metastasis compared with patients without metastasis has been shown in a previous study (19).Other proteins with marked overexpression in prostate cancer and benign prostatic glands included FRAP-1, a target for the cell cycle arrest and immunosuppressive effects of rapamycin, and an essentially unknown transcription factor HOXB13.The results should provoke scientific efforts to establish the eventual role of proteins such as GAD-1, FRAP-1, and HOXB13 as clinically useful biomarkers for prostate cancer patients.
Glioma-specific Expression-With the aim to search for proteins overexpressed in both cancer and the corresponding normal cell counterpart we explored proteins expressed in normal brain tissues and malignant brain tumors (glioma).This search yielded 15 proteins, including several well known markers of neural differentiation, e.g.glial fibrillary acidic protein (GFAP), S100, and neural adhesion molecule (NCAM) proteins (Table III).The list also included several proteins known to be involved in normal development and brain function, e.g.synaptosomal protein SNAP-25, RTN4, and the ELAV-like protein 4. Reticulon-4 (NOGO-A protein), a member of the reticulon family of integral membrane proteins, is predominantly located in the endoplasmic reticulum, although expression of this protein on the cell surface also appears to play a critical role as an inhibitory molecule for axonal growth and regeneration in humans and rodents.RTN4 has also been shown to be involved in neurodegenerative disease and autoimmune-mediated demyelination (20).The ELAV-like protein 4 is involved in neuron-specific RNA processing, is expressed in brain tissues, and has recently also been implicated in the commitment and differentiation of neuronal precursors (21).Furthermore ELAV-like protein 4 expression has also been found in small cell lung cancer where it is believed to represent an independent marker or determinant of neuroendocrine differentiation (22).Examples of protein expression patterns using antibodies recognizing ELAV-like protein 4, GFAP, NCAM-1, and RTN4 showed a distinct high level of protein expression in tumor cells from malignant gliomas and in various normal cells in tissues representing frontal cerebral cortex (Fig. 4a).
Lymphoma-specific Expression-Another example of a search aiming to find proteins overexpressed in both benign and malignant lymphoid cells was tested and resulted in five hits (Table III).One of these proteins was the well known and clinically used leukocyte common antigen CD45 (PTPRC).Proteins involved in T-cell development, e.g. the lymphocyte cytosolic protein 2 (LCP2) and the linker for activation of T-cells (LAT) as well as the B-cell-activating factor receptor (CD268), were also identified.On the list was also the receptor-associated protein-tyrosine kinase LYN known to be expressed in primary neuroblastoma tumors.LYN is an oncogene that encodes an intracellular signaling molecule that has recently been described as a protein expressed in normal and neoplastic human leukocytes (23).Examples of different protein expression patterns in malignant lymphoma and normal lymph nodes using antibodies directed toward LAT, LCP2, LYN, and CD45 are displayed in Fig. 4b.
Colorectal Cancer-specific Expression-To search for proteins that are expressed at higher levels in malignant cells as compared with the benign counterparts, we tested a search for proteins overexpressed in tumor cells from colorectal cancer as compared with normal glandular cells of the gastrointestinal tract.This search resulted in 11 hits and included proteins expressed in proliferating cells, e.g.cyclin B1 (CCNB1) and a serine/threonine-protein kinase involved in chromosomal segregation during mitosis (AURKB) (Table III).The list also included other proteins previously shown to be involved in host defense as well as in tumorigenesis, e.g. the macrophage migration-inhibitory factor (MIF), a protein that plays an important role in the immune system but also has been shown to promote tumor invasion and metastasis (24), and myeloperoxidase, a heme protein synthesized during myeloid differentiation that constitutes a major component of neutrophil granules (25).On the list there was also a nucleolar protein, dyskerin (DKC1), that showed strong nucleolar positivity in a large fraction of colorectal carcinomas.Dyskerin functions as a pseudouridine synthase and has been implicated in both telomerase function and ribosomal RNA processing.The DKC1 gene has been implicated in a rare congenital skin disorder, X-linked dyskeratosis congenita (26), and patients with dyskeratosis congenita show an increased susceptibility to cancer.However, data regarding dyskerin changes in human tumors is scarce, although altered levels of dyskerin expression have been associated with tumor progression (27).The DNA-binding transcription factor SOX-9, a fundamental testis differentiation gene common to all verte-(TDGF1).The color codes represent different levels of expression, i.e. 
intensity and extent of immunoreactivity: red, strong staining; orange, moderate staining; yellow, weak staining; white, negative staining.The color code black is used to mark missing or non-representative tissues.c, the immunohistochemistry images for three different individuals are shown for TDGF1.Strong immunostaining (brown color) is observed for two individuals with positive staining present in glomeruli from the renal cortex, whereas the third, negative sample represents an area within the renal cortex lacking glomeruli.d, a higher resolution image showing that the TDGF1 expression is restricted to the glomerulus where positivity appears in a lacelike pattern consistent with the TDGF1 protein being expressed in podocytes.The proximal renal tubules surrounding the glomerulus are negative and only show the blue color of hematoxylin counterstaining.III for search criteria.a, four examples of immunohistochemistry images of proteins expressed in human bone marrow.The AQP1 protein shows cytoplasmic positivity in clusters of cells in the bone marrow (top) with strong positivity in immature hematopoietic cells.The distribution pattern of positive cells in bone marrow is consistent with immature hematopoietic cells, e.g. 
cells involved in erythropoiesis or myelopoiesis.The week positivity in red blood cells is consistent with the function of aquaporin-1 involved in permeability of water in red blood cells.The CEBPE protein (second from top) shows strong, nuclear positivity in clusters of hematopoietic cells with a distribution pattern similar to that of aquaporin-1, although CEBPE appears positive in more mature forms of myelopoietic cells and negative in red blood cells.The overall picture is consistent with CEBPE protein being expressed in both early and late stages of myelopoiesis.The erythroid transcription factor GATA-1 (second from bottom) displays a nuclear positivity in a clustered distribution pattern consistent with that of erythropoietic cells in a normal bone marrow sample.Mature, non-nucleated red blood cells appear negative as would be expected for a transcription factor.The previously uncharacterized protein, KIAA2022 (bottom), shows strong cytoplasmic positivity in mature megakaryocytes.In addition, there are thin streaks of extracellular positivity and positive rare dispersed mature and immature cells in the bone marrow.b, four examples of proteins expressed in both in pancreatic islets of Langerhans (left column) and in carcinoid tumors (right column).The CHGA protein (top) is an established marker for neuroendocrine differentiation and here shows strong cytoplasmic positivity in all cells within an islet of Langerhans as well as rare cells dispersed within the exocrine glands of the pancreas (top, left).Tumor cells from a carcinoid tumor also show strong cytoplasmic positivity consistent with that of a tumor showing endocrine differentiation (top, right).The uncharacterized protein, KIAA0323 (second from top), shows strong cytoplasmic positivity in a subset of the cells in pancreatic islets of Langerhans (second from top, left).The amount and distribution of the KIAA0323-positive cells resemble that of alpha cells (15-20%) or delta cells (3-10%).A much weaker and 
partly granular cytoplasmic immunoreactivity is also present in the other islet cells.In the exemplified carcinoid tumor, all tumor cells show strong cytoplasmic positivity (second from top, right).The RTN1 protein (second from bottom) shows a similar pattern of cytoplasmic positivity in a few cells within an islet of Langerhans.In addition, the whole islet appears to be very weakly positive.The amount and distribution of the strongly positive cells are consistent with that of delta cells.Carcinoid tumor cells show a distinct and strong cytoplasmic positivity (second from bottom, right).The TSPAN7 protein (bottom) shows a general and strong cytoplasmic and membranous immunoreactivity in the islets of Langerhans.Similar to that of chromogranin-A, all endocrine cells in the pancreas appear strongly stained (bottom, left).The carcinoid tumor cells also show strong cytoplasmic staining with a tendency to enhanced membranous positivity (bottom, right).

FIG. 3. Examples of proteins specifically expressed in prostate cancer as compared with other forms of cancer.
A schematic summary view of expression levels in all analyzed cancers (216 different tumors), with a black arrow highlighting the 12 prostate cancers, is shown for six proteins found to be highly expressed in prostate cancer as compared with breast and colorectal cancer.See Table III  brates, also showed strong nuclear positivity in colorectal cancer as compared with normal differentiated colonic epithelia.Similar to findings in normal prostatic glands and pros-tate carcinoma (28), basal cells in normal colonic crypts also showed a relatively high level of SOX-9 expression.SOX-9 was also identified in a more generalized search for proteins is similar to that of CD3 (a routinely used marker for T-cells)-positive cells in a lymph node.The malignant lymphoma shows strong positivity for LAT in a large majority of lymphoma cells, whereas LCP2 protein shows strong positivity in a subset of cells in a malignant lymphoma.The receptor-associated protein-tyrosine kinase LYN antibody (second from bottom) and the CD45 antibody (recognizing the leukocyte common antigen or PTPRC) (bottom) display strong cytoplasmic and membranous positivity within the lymph node with a vast majority of cells positive both within and outside germinal centers.All analyzed malignant lymphomas were also positive for both of these proteins.III for search criteria) were created to identify proteins differentially expressed in cancer.a, examples of immunohistochemistry images for negative (left) and positive (right) lung adenocarcinoma tumor cells for overexpressed in cancer cells as compared with normal cells (see below).Examples of protein expression patterns, where tumor cells have more abundant positivity compared with corresponding normal cells, using antibodies toward cyclin B1, dyskerin, and SOX-9 proteins are shown in Fig. 5a.

FIG. 6. Examples of proteins with a potential to be markers of prognostic or predictive significance for lung cancer (a), colorectal cancer (b), and breast cancer (c). The underlying search strings (see Table
Epithelial Cancer-specific Expression-In a more generalized search for markers of epithelial tumors, we found 14 proteins with enhanced expression in cancer (Table III).Among these proteins were well known markers of malignancy, e.g. the tumor suppressor p53 (TP53) and the proliferation marker Ki-67.The p53 gene is a well known cancerrelated gene, and a multitude of studies have shown p53 mutations and overexpression of the mutated protein in a large fraction and variety of different human cancers.Ki-67 (MKI67), a clinically used marker for cells active in the cell cycle, was not unexpectedly found to be overexpressed in cancer.On the list was also the ␥2 subunit of laminin-5 (LAMC2).Laminins belong to a family of extracellular matrix glycoproteins, a major non-collagenous constituent of basement membranes, which have been implicated in a wide variety of biological processes including cell adhesion, differentiation, migration, signaling, neurite outgrowth, and metastasis.Although basement membranes appear as barriers to invasive growth of a tumor, basement membrane molecules such as laminins may also act as important autocrine factors produced by cancer cells to promote tumorigenesis (29).Two transcript variants of LAMC2 have been described; however, the biological significance of the two forms is not known.In addition less well studied transcription factors, representing proteins that may be involved in the development and growth of cancer, appeared on the list, e.g.ZNF452, TAF7, and Q9HCY4.
A few examples of protein expression patterns in tumor cells from urothelial carcinomas and normal surface urothelium from urinary bladder using antibodies directed toward the LAMC2, MK167, and TP53 proteins are shown in Fig. 5b.
Potential Prognostic Markers of Cancer-Finally we used the search function to query for potential prognostic or predictive cancer markers.In this case, we searched for proteins highly expressed in a subset of tumors within a given cancer type but absent or with only weak expression in other tumors of the same cancer type.As the protein atlas database contains expression data and IHC images from 12 individually different tumors for each distinct tumor type, it is possible to search for proteins differentially expressed in a defined tumor type.However, because the number of analyzed tumors is relatively low, 12, identified proteins must be validated in larger patient cohorts to establish their potential role as prognostic or predictive markers.
First, lung cancer was probed with a search for differentially expressed proteins (Table III). This resulted in 20 hits, including proteins involved in the immune system and host defense, e.g. C-reactive protein, a C-type lectin (CLEC4A), and an HLA class II histocompatibility protein (HLA-DPB1). On this list were also secreted proteins related to normal lung function, e.g. pulmonary surfactant proteins (SFTPA1 and SFTPB), proteins related to cell proliferation (cyclin D1 and RAD17), and proteins predominantly expressed in the brain (synaptic vesicle protein 2 and a γ-aminobutyric acid receptor-associated protein). The latter proteins appeared to be expressed in lung tumors showing neuroendocrine features. Interestingly, there were also a few less well characterized proteins that showed differential expression in lung cancer. These included an RNA-binding signal transduction-associated protein (KHDRBS3) that showed strong nuclear positivity in a subset of adenocarcinomas of the lung and a tyrosine kinase activator protein (SLC9A3R2) that showed strong cytoplasmic positivity in certain lung adenocarcinomas (Fig. 6a).
FIG. 6. [...] an RNA-binding (KHDRBS3) protein (top), a pulmonary surfactant (SFTPB) protein (middle), and a tyrosine kinase activator (SLC9A3R2) protein (bottom). For KHDRBS3, one adenocarcinoma is negative for all tumor cell nuclei, whereas the other moderately differentiated lung adenocarcinoma shows strong nuclear positivity for a majority of the tumor cells. For SFTPB, one lung adenocarcinoma is negative, whereas the other shows strong cytoplasmic positivity of the tumor cells. For SLC9A3R2, one example of a negative squamous cell carcinoma is shown, whereas the other moderately differentiated lung adenocarcinoma shows a heterogeneous pattern of cytoplasmic positivity in tumor cells. b, examples of immunohistochemistry images for negative (left) and positive (right) colorectal carcinoma cells for tyrosine-protein kinase CSK (top), MIF (middle), and PIGR (bottom). For the CSK protein (top) many of the adenocarcinomas were negative, whereas other tumors were positive. In an example of a colon adenocarcinoma, strong cytoplasmic positivity is found in a coarse, granular pattern with immunoreactivity mainly localized in the apical, luminal areas of the atypical glandular-like tumor nests (top, right). The MIF protein showed a very weak and indistinct staining pattern in a few tumors exemplified by one moderately differentiated adenocarcinoma (middle, left). Other cases of adenocarcinomas of the colon showed stronger positivity with more widespread cytoplasmic immunoreactivity in tumor cells and in occasional inflammatory cells in the tumor stroma (middle, right). The PIGR protein showed clearly different staining patterns in analyzed colorectal carcinomas. Approximately 50% of the analyzed tumors were negative, exemplified by an adenocarcinoma showing totally negative tumor cells (bottom, left), whereas the remaining tumors showed strong positivity, exemplified by an adenocarcinoma in the colon showing strong cytoplasmic immunoreactivity in a vast majority of the tumor cells (bottom, right). c, two examples of immunohistochemistry images for negative (left) and positive (right) breast cancer cells for the BCL-2-binding protein BAG1 (top) and the matrix-associated galectin-1 protein (LGALS1) (bottom). The BAG1 protein is negative in a case of ductal adenocarcinoma of the breast (top, left). Other breast carcinomas were positive, exemplified by a ductal breast carcinoma showing strong nuclear positivity in a large majority of the tumor cells (top, right). Immunoreactivity using the galectin-1 antibody was mainly found in extracellular locations including tumor stroma, whereas tumor cells were negative (bottom, left). A subset of the breast cancers also showed a positive staining in tumor cells. In the given example, moderate to strong cytoplasmic immunoreactivity is evident in tumor cells from a ductal breast adenocarcinoma (bottom, right).

In Silico Biomarker Discovery Using a Web-based Tool
Molecular & Cellular Proteomics 7.5 841
In an attempt to find differentially expressed proteins in colorectal cancer, we performed a search resulting in four hits (Table III). Two of these were proteins found also in other searches for colon cancer markers, namely MIF expressed in the cytoplasm of tumor cells (Fig. 6b) and the cancer-related protein p53. The two additional proteins included the polymeric immunoglobulin receptor precursor (PIGR), known to be expressed in several glandular epithelia, and the Src family tyrosine kinase regulatory protein tyrosine-protein kinase CSK, implicated in human cancer. Both these proteins were expressed in the cytoplasm of tumor cells from a subset of the analyzed colorectal carcinomas (Fig. 6b).
A similar search was performed for differentially expressed proteins in breast cancer. The search resulted in nine hits including several proteins not previously implicated in breast cancer (Table III). The list did not, however, include well known markers that are used within the field of breast cancer diagnostics, e.g. the estrogen receptor and the HER-2 protein, because the respective expression profiles in the analyzed 12 breast cancers for these two markers fell outside the tested search criteria (≥5 of 12 tumors showing strong expression combined with ≥4 of 12 tumors showing weak or negative expression). The differentially expressed proteins included BAG1 and galectin-1 (LGALS1), and examples of protein expression patterns in different tumors representing ductal breast cancer are shown in Fig. 6c. The finding of BAG1, a co-chaperone for the heat-shock protein Hsp70 involved in regulation of cell proliferation, apoptosis, and stress response through interactions with C-Raf, B-Raf, Akt, Bcl-2, steroid receptors, and other proteins, is intriguing. BAG1 has been shown to play an important role in survival of neurons and hematopoietic cells and has also been implicated in breast cancer (30); however, the more precise function of this protein in breast cancer is unknown.
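The search criteria quoted above (at least 5 of 12 tumors with strong expression combined with at least 4 of 12 tumors with weak or negative expression) can be illustrated with a minimal sketch. This is not the portal's implementation: the function name, the label strings, and the example profiles are hypothetical and serve only to show how such a threshold-based filter behaves on 12 annotated tumors.

```python
# Minimal sketch of a threshold-based "differential expression" filter.
# Assumptions (not taken from the portal): annotation labels are plain
# strings, and each protein has one label per analyzed tumor.

STRONG = "strong"
LOW = {"weak", "negative"}


def is_differentially_expressed(scores, min_strong=5, min_low=4):
    """Return True if at least `min_strong` tumors are annotated as strong
    and at least `min_low` tumors are annotated as weak or negative."""
    n_strong = sum(1 for s in scores if s == STRONG)
    n_low = sum(1 for s in scores if s in LOW)
    return n_strong >= min_strong and n_low >= min_low


# Made-up profiles for 12 tumors of one cancer type (illustration only).
profiles = {
    "PROTEIN_A": ["strong"] * 6 + ["negative"] * 4 + ["moderate"] * 2,
    "PROTEIN_B": ["strong"] * 11 + ["weak"] * 1,
    "PROTEIN_C": ["moderate"] * 12,
}

hits = [name for name, scores in profiles.items()
        if is_differentially_expressed(scores)]
print(hits)
```

In this sketch a uniformly strong profile (PROTEIN_B) and a uniformly moderate profile (PROTEIN_C) both fall outside the criteria, which is the kind of behavior that can exclude a marker whose expression does not split the 12 tumors into clear strong and weak or negative groups.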

DISCUSSION
Here we describe a new Web-based tool for in silico biomarker discovery using the protein atlas portal. The advanced search function enables database queries where search strings can be combined to identify proteins with specific expression patterns. This allows flexible searches to discover proteins differentially expressed in various normal and/or cancer tissues. The possibility to assemble complex queries enables a focus on proteins of interest for a particular biological question at issue. The present study provides examples of search strings to discover potential protein biomarkers, such as tissue-specific markers, cell type-specific markers, tumor type-specific markers, markers of malignancy, and prognostic markers of cancers.
In conjunction with the launch of this search tool, more than 1500 new antibodies are published at the Human Protein Atlas portal, bringing the total number of protein targets to 2618. Although the content of the atlas has doubled from the previous version 2.0, it is worth pointing out that the new extended content only represents 11.5% of all the non-redundant human proteins, consisting at present of 22,680 genes (Ensembl version 46.36). It is of course straightforward to add new proteins to the analysis as more validated protein probes are being generated by various international efforts (31,32). Despite the limited content of the protein atlas, the examples of search strings described here yielded several interesting proteins that are possible targets for future studies of both normal cell functions and cancer development. It is reassuring that many of the search strings for cancer-related proteins resulted in lists including proteins with an established role in cancer, e.g. the tumor suppressor p53 and the proliferation marker Ki-67, in addition to proteins not previously described to be implicated in a particular type of tumor or to be expressed in a cancer-specific manner.
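As a quick consistency check of the coverage figure, and assuming that the 2618 protein targets are counted directly against the 22,680 genes quoted from Ensembl, the fraction works out as:

```latex
\frac{2618}{22\,680} \approx 0.115 = 11.5\%
```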
These examples demonstrate the usefulness of the in silico search approach of the protein atlas portal for screening. However, it is important to point out that the underlying data of the portal is based on staining patterns from immunohistochemistry and that such data is semiquantitative and depends on several parameters including tissue processing and antigen retrieval protocols as well as binding characteristics and concentration of the tested antibody. Therefore, as for all screening strategies, the identified proteins need to be further analyzed to scrutinize their potential role and function in respective normal and cancer cells. For the potential cancer biomarkers, the most obvious validation is to extend the expression analysis to larger patient cohorts with well characterized clinical data. To evaluate the potential prognostic markers it is necessary to assemble a cohort of patients stratified according to histopathological criteria known to be of significance for the prognosis of a given cancer, e.g. Elston scores for breast cancer and Gleason grades for prostate cancer (33,34). For certain tissue-specific markers, studies could be extended using more quantitative biochemical or immunological methods to initiate functional studies to extend the value of the discovery phase described here.
It is also interesting to compare the results from the protein expression data with the numerous studies based on up- and down-regulation of RNA in different tissues and cells. In this context, microarray-based transcriptional profiling has been used to create databases for global mRNA expression patterns, such as the Novartis human gene expression map (35), the Oncomine cancer database (36), and the Allen Brain Atlas (37), and such studies have led to the identification of a large number of genes implicated in various diseases (38-40). The RNA data can therefore be used to further validate the targets but also indicate gene products of interest for the generation of antibodies for inclusion of such proteins into the protein atlas. Several examples exist where differences in transcription profiles have suggested that specified types of cancer, as defined through conventional histology, actually include several different subtypes, e.g. breast cancer (41,42). However, the use of cDNA microarrays in a diagnostic setting has limitations including reproducibility, need for fresh tissue, and lack of simultaneous morphological control, all of which IHC has a potential to resolve.
The effort to create a Web-based tool for in silico biomarker discovery described in this study can be expanded in various directions. First, it is possible to also include the data from the human cell lines already present in the HPA portal. At present, each protein is analyzed in 48 human cell lines and 12 primary cells of hematopoietic origin (3,43). The expression profiles of these cells are of course also important to include in future versions of the search function for comparative studies including cells in vivo and in vitro. Second, it is possible to also include data on the stratification of each cancer into the analysis, and we therefore aim to release the underlying histopathological data for each patient sample, which would allow for extended queries based on data related to prognosis. It might also be interesting to include search criteria based on protein classes, e.g. kinases, membrane proteins, and transcription factors, to limit the queries to a given type of protein. Finally, it might be useful to also include chromosomal location as part of the query to search for proteins on a particular chromosome or in a particular chromosomal region. This could be relevant when analyzing the consequence of major chromosomal aberrations, e.g. a large deletion, or when genetic epidemiology studies have shown a gene of interest in a particular chromosomal region. We plan to extend the search possibilities in these directions in the near future.
In summary, we have developed a novel search tool for the analysis of tissue- and cell type-specific proteins based on images representing immunohistochemically stained tissues in the Human Protein Atlas portal. The value of these search possibilities will increase as the content of the atlas grows. It is thus reassuring that the underlying antibody-based proteomics approach can be scaled up to include tissue profiles representing tens of thousands of antibodies. We believe that the search queries presented in this study may constitute a valuable resource to better define the proteomic landscape in tissues, support the discovery of new diagnostic and therapeutic tools, and enhance opportunities for basic biological and medical research.
FIG. 2. A search for proteins expressed in bone marrow cells (a) and proteins expressed in normal and malignant endocrine cells (b), respectively. See Table III for search criteria. a, four examples of immunohistochemistry images of proteins expressed in human bone marrow. The AQP1 protein shows cytoplasmic positivity in clusters of cells in the bone marrow (top) with strong positivity in immature hematopoietic cells. The distribution pattern of positive cells in bone marrow is consistent with immature hematopoietic cells, e.g. cells involved in erythropoiesis or myelopoiesis. The weak positivity in red blood cells is consistent with the function of aquaporin-1 involved in permeability of water in red blood cells. The CEBPE protein (second from top) shows strong, nuclear positivity in clusters of hematopoietic cells with a distribution pattern similar to that of aquaporin-1, although CEBPE appears positive in more mature forms of myelopoietic cells and negative in red blood cells. The overall picture is consistent with CEBPE protein being expressed in both early and late stages of myelopoiesis. The erythroid transcription factor GATA-1 (second from bottom) displays a nuclear positivity in a clustered distribution pattern consistent with that of erythropoietic cells in a normal bone marrow sample. Mature, non-nucleated red blood cells appear negative as would be expected for a transcription factor. The previously uncharacterized protein, KIAA2022 (bottom), shows strong cytoplasmic positivity in mature megakaryocytes. In addition, there are thin streaks of extracellular positivity and rare, dispersed positive mature and immature cells in the bone marrow. b, four examples of proteins expressed both in pancreatic islets of Langerhans (left column) and in carcinoid tumors (right column). The CHGA protein (top) is an established marker for neuroendocrine differentiation and here shows strong cytoplasmic positivity in all cells within an islet of Langerhans as well as rare cells dispersed within the exocrine glands of the pancreas (top, left). Tumor cells from a carcinoid tumor also show strong cytoplasmic positivity consistent with that of a tumor showing endocrine differentiation (top, right). The uncharacterized protein, KIAA0323 (second from top), shows strong cytoplasmic positivity in a subset of the cells in pancreatic islets of Langerhans (second from top, left). The amount and distribution of the KIAA0323-positive cells resemble that of alpha cells (15-20%) or delta cells (3-10%). A much weaker and partly granular cytoplasmic immunoreactivity is also present in the other islet cells. In the exemplified carcinoid tumor, all tumor cells show strong cytoplasmic positivity (second from top, right). The RTN1 protein (second from bottom) shows a similar pattern of cytoplasmic positivity in a few cells within an islet of Langerhans. In addition, the whole islet appears to be very weakly positive. The amount and distribution of the strongly positive cells are consistent with that of delta cells. Carcinoid tumor cells show a distinct and strong cytoplasmic positivity (second from bottom, right). The TSPAN7 protein (bottom) shows a general and strong cytoplasmic and membranous immunoreactivity in the islets of Langerhans. Similar to that of chromogranin-A, all endocrine cells in the pancreas appear strongly stained (bottom, left). The carcinoid tumor cells also show strong cytoplasmic staining with a tendency to enhanced membranous positivity (bottom, right).

FIG. 3. Examples of proteins specifically expressed in prostate cancer as compared with other forms of cancer. A schematic summary view of expression levels in all analyzed cancers (216 different tumors), with a black arrow highlighting the 12 prostate cancers, is shown for six proteins found to be highly expressed in prostate cancer as compared with breast and colorectal cancer. See Table III for search criteria and legend to Fig. 1 for color codes. An example of each of the immunohistochemistry images for prostate cancer, breast cancer, and colorectal cancer is shown. All six examples show a uniform and strong positivity in prostate cancer tumor cells with essentially negative surrounding tumor stroma. Of the six examples, KLK-3 (also known as PSA) (bottom row), ACPP (top row), and FOLH1 (also known as prostate-specific membrane antigen) (second from top row) are all expected biomarkers of prostate cancer. The other three proteins (FRAP-1, GAD-1, and HOXB13) are less known in this context, and they all display highly specific expression in prostatic cancer as compared with other forms of cancer. In the given examples, FRAP-1 (third from top row) and GAD-1 (third from bottom row) show strong cytoplasmic staining in prostate cancer and negative immunostaining in breast and colorectal cancers. The HOXB13 (second from bottom row) immunostaining shows an example of strong nuclear positivity in tumor cells from a prostate cancer, whereas tumor cell nuclei are negative in the other cancer types.

FIG. 5. Examples of proteins expressed specifically in colorectal cancer and not in normal glandular cells in the colon (a) and proteins expressed more generally in many cancers (b). See Table III for search criteria. a, examples of three proteins not expressed in normal epithelia of the colon but strongly expressed in tumor cells from colorectal carcinomas. The CCNB1 protein (top) displays cytoplasmic and nuclear positivity in a few dispersed normal glandular cells located in the basal crypts of the normal colonic mucosa, whereas the adenocarcinomas show strong, mainly cytoplasmic positivity in a majority of the tumor cells. The nucleolar dyskerin (DKC1) protein (middle) shows a weak immunoreactivity in nucleoli in normal glandular cells from the colonic mucosa, whereas the adenocarcinomas from colon show distinct and strong dot-like positivity in tumor cell nuclei. The transcription factor SOX-9 (bottom) shows positive nuclei in cells constituting the basal part of colonic crypts, and the differentiated glandular cells are negative, whereas the colon cancer shows strong nuclear positivity in all tumor cells. b, three examples of proteins obtained by a search for proteins overexpressed in cancer cells as compared with normal cells, exemplified by immunohistochemistry images from the surface urothelium of normal urinary bladder and urinary bladder carcinoma. The laminin-5 (LAMC2) protein (top) shows a negative staining pattern in normal surface urothelium, whereas tumor cells from a high grade urothelial carcinoma show widespread, heterogeneous, and strong cytoplasmic positivity. The proliferation marker Ki-67 (MKI67) protein (middle) shows a few scattered positive nuclei in normal cells from the basal areas of the urothelium using MKI67 antibodies. The pattern of immunoreactivity is consistent with that of normal proliferating cells in the urinary bladder. Strong nuclear positivity in ~50% of the tumor cells is shown in a case of high grade urothelial carcinoma, consistent with a cancer showing a high fraction of proliferating tumor cells. The well known tumor suppressor protein p53 (bottom) shows negative surface urothelium in the normal human urinary bladder, whereas the urothelial carcinoma tumor cells show strong nuclear positivity, most likely the result of an underlying mutation of the p53 gene.

TABLE II
The four-grade scale used for annotation combining intensity of immunoreactivity and fraction of positive cells.
a Not applicable.
