A Human Protein Atlas for Normal and Cancer Tissues Based on Antibody Proteomics

Antibody-based proteomics provides a powerful approach for the functional study of the human proteome involving the systematic generation of protein-specific affinity reagents. We used this strategy to construct a comprehensive, antibody-based protein atlas for expression and localization profiles in 48 normal human tissues and 20 different cancers. Here we report a new publicly available database containing, in the first version, ∼400,000 high resolution images corresponding to more than 700 antibodies toward human proteins. Each image has been annotated by a certified pathologist to provide a knowledge base for functional studies and to allow queries about protein profiles in normal and disease tissues. Our results suggest it should be possible to extend this analysis to the majority of all human proteins thus providing a valuable tool for medical and biological research.

Antibodies directed to a particular target protein allow numerous functional assays to be performed, ranging from conventional ELISAs to detailed localization studies using fluorescent probes and protein capture ("pull-down") experiments for the purification of specific proteins and their associated complexes for structural and biochemical analyses (4,5).
The challenge for antibody-based proteomics is to move from a conventional protein-by-protein approach to a high throughput mode that allows chromosome-wide analysis (6,7). Technical challenges involve both antigen production and the subsequent generation and characterization of the antibodies. In addition, methods for systematic protein profiling at the whole-proteome level are lacking. Agaton et al. (8) showed that combining the cloning and expression of recombinant protein fragments with immunohistochemistry could be used for systematic analysis of protein expression and subcellular localization, describing the distribution and expression of putative gene products in normal tissues as well as in common cancers and other diseased tissues. Recently Nilsson et al. (9) showed that this strategy could be further improved by a streamlined approach for affinity purification of the antibodies to generate monospecific antibodies (msAbs) and by subsequent validation of the specificity of these antibodies on protein microarrays.
The use of tissue microarrays (TMAs) generated from multiple biopsies combined into single paraffin blocks enabled high throughput analysis of protein expression in various tissues and organs (2). Recently Kampf et al. (10) showed that high throughput analysis of protein expression can be performed with a standard set of tissue microarrays representing both normal and cancer tissues. The TMA technology provides an automated array-based high throughput technique where as many as 1000 paraffin-embedded tissue samples can be brought into one paraffin block in an array format.
We show that a comprehensive atlas of human protein expression patterns in normal and cancer tissues can be created by combining the methods mentioned above. A set of standardized TMAs was produced to allow for rapid screening of a multitude of different tissues and cell types using immunohistochemistry. Each antibody was used to screen a multitude of normal human tissues and cancer tissue from individually different tumors. Altogether 576 high resolution digital images corresponding to a total of 20 gigabytes of data were collected for each antibody.
Both "in-house" generated monospecific antibodies and antibodies from commercial sources were used for the profiling of a large number of protein targets. Altogether more than 700 proteins were analyzed, representing all major types of protein families, e.g. protein receptors, kinases, phosphatases, transcription factors, and nuclear receptors. The results are presented as a publicly available protein atlas database, and the data suggest that it should be possible to extend this analysis to most or all human proteins. This approach could also be used effectively to generate expression data for model animals such as mouse, rat, and chimpanzee. A valuable tool for medical and biological research can thus be envisioned as a complement to genome and transcript profiling data.

EXPERIMENTAL PROCEDURES
Generation of Antigens-Suitable protein epitope signature tags (PrESTs), representing unique regions for each target protein, were designed using bioinformatic tools (12) and with the human genome sequence as template (EnsEMBL database). In the design of the PrESTs, transmembrane regions and signal peptides were avoided, and an amino acid sequence with a size between 100 and 150 amino acids and low homology to other human proteins was selected to decrease the risk of cross-reactivity of antibodies to other human proteins. The cloning, protein expression, immunization, and affinity purification to yield monospecific antibodies were performed as described previously by Agaton et al. (8) and Nilsson et al. (9).
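The PrEST design rules above (avoid transmembrane regions and signal peptides, keep the fragment between 100 and 150 amino acids, and prefer low homology to other human proteins) can be read as a window-selection problem. The sketch below is a minimal illustration under stated assumptions, not the bioinformatic tools cited in (12): it assumes that a per-residue mask of predicted transmembrane/signal-peptide positions and a hypothetical per-residue homology score (lower meaning less similar to other human proteins) have already been computed, and it returns the allowed window with the lowest mean score.

```python
from typing import Optional, Sequence, Tuple


def pick_prest_window(
    excluded: Sequence[bool],
    homology: Sequence[float],
    min_len: int = 100,
    max_len: int = 150,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the allowed window with the lowest mean homology.

    excluded[i] is True if residue i lies in a predicted transmembrane region
    or signal peptide; homology[i] is a hypothetical per-residue similarity
    score against other human proteins (lower is better). `end` is exclusive.
    Returns None if no window of min_len..max_len residues avoids excluded ones.
    """
    n = len(homology)
    if len(excluded) != n:
        raise ValueError("excluded and homology must have the same length")

    # Prefix sums give O(1) sums over any window.
    hom_prefix = [0.0]
    exc_prefix = [0]
    for h, e in zip(homology, excluded):
        hom_prefix.append(hom_prefix[-1] + h)
        exc_prefix.append(exc_prefix[-1] + int(e))

    best: Optional[Tuple[int, int]] = None
    best_score = float("inf")
    for length in range(min_len, min(max_len, n) + 1):
        for start in range(n - length + 1):
            end = start + length
            if exc_prefix[end] - exc_prefix[start] > 0:
                continue  # overlaps a transmembrane region or signal peptide
            score = (hom_prefix[end] - hom_prefix[start]) / length
            if score < best_score:  # strict: keeps the first (shortest, earliest) tie
                best, best_score = (start, end), score
    return best


# Hypothetical 300-residue protein: signal peptide at residues 0-24 and a
# region highly similar to another human protein at residues 200-299.
excluded = [True] * 25 + [False] * 275
homology = [0.25] * 200 + [0.75] * 100
print(pick_prest_window(excluded, homology))  # (25, 125)
```

Ties are broken in favor of the shortest, most N-terminal window; an actual design pipeline would likely weigh further criteria.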
PrEST Arrays-The PrESTs were diluted to 40 µg/ml in 0.1 M urea and 1× PBS (pH 7.4), and 50 µl of each PrEST was transferred to a 96-well spotting plate. The PrESTs were subsequently spotted and immobilized onto epoxide slides (Corning Life Sciences) using a pin-and-ring arrayer (Affymetrix 427). The slides were washed in 1× PBS (5 min), and the surface was then blocked with SuperBlock (Pierce) for 30 min. An adhesive 16-well silicone mask (Schleicher & Schuell) was applied to the glass before addition of the msAb diluted 1:2000 in 1× PBST (1× PBS, 0.1% Tween 20, pH 7.4) to a final concentration of approximately 50 ng/ml. Hen-generated tag-specific antibodies recovered from the depletion step were co-incubated with the monospecific antibodies, and the glass slides were incubated on a shaker for 60 min. The slides were washed twice with 1× PBST and 1× PBS for 10 min each before the secondary antibodies goat anti-rabbit Alexa 647 and goat anti-chicken Alexa 555 (Molecular Probes), diluted 1:60,000 to 30 ng/ml in 1× PBST, were added and incubated for 60 min. After the same washing procedure as for the first incubation, the slides were spun dry and scanned using a G2565BA array scanner (Agilent), and images were quantified using the image analysis software GenePix 5.1 (Axon Instruments).
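The concentrations in this protocol can be cross-checked with simple dilution arithmetic: a 1:N dilution to a final concentration c implies a stock of roughly N × c. The sketch below back-calculates the stock concentrations implied by the stated numbers; these stocks are inferred for illustration only and are not reported in the text.

```python
def implied_stock(final_conc: float, dilution_factor: float) -> float:
    """Stock concentration implied by a 1:dilution_factor dilution to final_conc."""
    return final_conc * dilution_factor


def mass_in_well_ug(conc_ug_per_ml: float, volume_ul: float) -> float:
    """Protein mass in µg: concentration (µg/ml) x volume (µl converted to ml)."""
    return conc_ug_per_ml * volume_ul / 1000.0


# msAb diluted 1:2000 to ~50 ng/ml -> implied stock ~100,000 ng/ml (0.1 mg/ml).
msab_stock_ng_ml = implied_stock(50, 2000)
# Secondary antibodies diluted 1:60,000 to 30 ng/ml -> implied stock ~1.8 mg/ml.
secondary_stock_ng_ml = implied_stock(30, 60_000)
# 50 µl of each PrEST at 40 µg/ml -> 2 µg of antigen per spotting-plate well.
prest_ug = mass_in_well_ug(40, 50)

# Stocks converted from ng/ml to mg/ml, then the PrEST mass in µg.
print(msab_stock_ng_ml / 1e6, secondary_stock_ng_ml / 1e6, prest_ug)  # 0.1 1.8 2.0
```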
Western Blots-Western blot analysis of affinity-purified antibodies was performed by separating total protein extracts from selected human cell lines and tissues on precast 10-20% Criterion™ SDS-PAGE gradient gels (Bio-Rad) under reducing conditions, followed by electrotransfer to PVDF membranes (Bio-Rad) according to the manufacturer's recommendations. The membranes were blocked (5% dry milk, 0.5% Tween 20, 1× TBS, 0.1 M Tris-HCl, 0.5 M NaCl) for 1 h at room temperature, incubated with the primary antibody diluted 1:500 in blocking buffer, and washed in Tris-buffered saline supplemented with Tween 20. The secondary horseradish peroxidase-conjugated antibody (swine anti-rabbit immunoglobulin/horseradish peroxidase, Dakocytomation, Glostrup, Denmark) was diluted 1:3000 in blocking buffer, and chemiluminescence detection was carried out using a Chemidoc charge-coupled device camera (Bio-Rad) and SuperSignal® West Dura Extended Duration substrate (Pierce) according to the manufacturer's protocol.
Immunohistochemistry-Slides were baked for 45 min at 60°C, deparaffinized in xylene, hydrated in graded alcohols, and blocked for endogenous peroxidase in 0.3% hydrogen peroxide diluted in 80% ethanol. For antigen retrieval, a Decloaking chamber (Biocare Medical, Walnut Creek, CA) was used. Slides were immersed and boiled in Target Retrieval Solution, pH 6.0 (Dakocytomation) for 4 min at 125°C and then allowed to cool to 90°C. Automated immunohistochemistry was performed using an Autostainer Plus instrument (Dakocytomation). Primary antibodies and a dextran polymer visualization system (Envision, Dakocytomation) were incubated for 30 min each at room temperature, and slides were developed for 2 × 5 min using diaminobenzidine (Dakocytomation) as chromogen. Each incubation was followed by rinsing in wash buffer (Dakocytomation). After a short rinse in tap water, slides were counterstained in Harris hematoxylin (Sigma) and coverslipped using Pertex (Histolab, Gothenburg, Sweden) as mounting medium.
Digital Imaging of the Tissue Cores-All immunohistochemically stained sections from the eight different TMAs were scanned using an automated slide-scanning system, ScanScope T2 (Aperio Technologies, Vista, CA). For each antibody, 576 digital images were generated to represent the total content of the eight TMAs. Scanning was performed at 20× magnification. Digital images were separated and extracted as individual TIFF files for storage of original data. The size of each TIFF image is ~20-30 megabytes. The high resolution TIFF images are stored on digital tapes, and to facilitate handling of the images in a web-based annotation system, the individual images were compressed from TIFF format into JPEG format.
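A back-of-the-envelope check relates the per-image TIFF size to the per-antibody data volume. The sketch below assumes decimal units (1 GB = 1000 MB) and uses the ~1 megabyte JPEG size given in the Results; it gives roughly 11.5-17.3 GB of TIFF data per antibody, the same order of magnitude as the ~20 gigabytes quoted for each antibody, and a compression factor of about 20-30 for the web versions.

```python
IMAGES_PER_ANTIBODY = 576


def total_gb(n_images: int, mb_per_image: float) -> float:
    """Total data volume in GB, using decimal units (1 GB = 1000 MB)."""
    return n_images * mb_per_image / 1000.0


tiff_low = total_gb(IMAGES_PER_ANTIBODY, 20)   # ~11.5 GB at 20 MB per TIFF
tiff_high = total_gb(IMAGES_PER_ANTIBODY, 30)  # ~17.3 GB at 30 MB per TIFF
jpeg = total_gb(IMAGES_PER_ANTIBODY, 1)        # ~0.6 GB at ~1 MB per JPEG

print(f"TIFF: {tiff_low:.1f}-{tiff_high:.1f} GB, JPEG: {jpeg:.2f} GB, "
      f"compression ~{tiff_low / jpeg:.0f}-{tiff_high / jpeg:.0f}x")
```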
Scoring of Protein Expression-The annotation software was developed to allow for a basic and rapid evaluation of immunoreactivity in a broad spectrum of different tissues and cell types. The annotation tool software was developed to run on any standard desktop personal computer (Microsoft Windows and Macintosh operating systems) using an ordinary web browser interface. Parameters that were annotated included overall staining, congruity in staining between triplicate/duplicate samples, validation of immunohistochemical staining as well as staining intensity, fraction of immunoreactive cells, and pattern and localization of immunoreactivity (nuclear, cytoplasmic, or cell membranous). A text box was also included for comments.
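The annotated parameters map naturally onto a per-core record. The Python dataclass below is a hypothetical schema (class and field names are illustrative, not those of the actual annotation tool) showing how intensity, fraction of immunoreactive cells, localization, replicate congruity, staining validation, and the free-text comment could be stored together with basic range checking. It assumes the fraction is recorded as a continuous value; the real tool may well use categories instead.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Intensity(Enum):
    NEGATIVE = 0
    WEAK = 1
    MODERATE = 2
    STRONG = 3


class Localization(Enum):
    NUCLEAR = "nuclear"
    CYTOPLASMIC = "cytoplasmic"
    MEMBRANOUS = "membranous"


@dataclass
class CoreAnnotation:
    """One annotation of one cell type in one TMA core (hypothetical schema)."""
    antibody_id: str
    tissue: str
    cell_type: str
    intensity: Intensity
    fraction_positive: float  # fraction of immunoreactive cells, 0.0-1.0 (assumed continuous)
    localization: Set[Localization] = field(default_factory=set)
    replicates_concordant: bool = True  # congruity between triplicate/duplicate samples
    staining_validated: bool = True     # pathologist judged the staining credible
    comment: str = ""                   # free-text box

    def __post_init__(self) -> None:
        if not 0.0 <= self.fraction_positive <= 1.0:
            raise ValueError("fraction_positive must lie between 0 and 1")


# Illustrative values only; not an annotation from the atlas.
note = CoreAnnotation(
    antibody_id="EXAMPLE-001", tissue="testis", cell_type="germ cells",
    intensity=Intensity.STRONG, fraction_positive=0.5,
    localization={Localization.NUCLEAR},
)
print(note.intensity.name, sorted(loc.value for loc in note.localization))
```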
Information Technology-The protein atlas is a web-based service clustered on multiple web servers, each of which independently serves the web pages, the database, and all the images to meet "fail-safe" requirements. All software used within the protein atlas is open source and based on the LAMP setup (Linux, Apache, MySQL, and PHP). The protein atlas is loaded with data and image files from the HPR-LIMS (Laboratory Information Management System), the production system developed specifically for and used within the HPR project.
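Because every server in the cluster holds a complete copy of the pages, database, and images, a client or front end can fail over simply by trying the next server. The Python sketch below illustrates that idea only; it is not a description of how the protein atlas actually routes requests. The mirror host names are hypothetical, and the `fetch` callable is injected so that any HTTP client can be plugged in.

```python
from typing import Callable, List, Sequence


def fetch_with_failover(path: str, mirrors: Sequence[str],
                        fetch: Callable[[str], bytes]) -> bytes:
    """Return the response from the first mirror that answers successfully.

    Assumes every mirror holds a full copy of pages, database and images,
    so any one of them can serve any request on its own.
    """
    errors: List[str] = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:  # refused connections, timeouts, urllib.error.URLError, ...
            errors.append(f"{url}: {exc}")
    raise ConnectionError("all mirrors failed: " + "; ".join(errors))


# Offline demo with a fake fetcher; the host names are hypothetical.
def fake_fetch(url: str) -> bytes:
    if url.startswith("http://node1."):
        raise OSError("node1 is down")
    return b"OK from " + url.encode()


print(fetch_with_failover("/images/example.jpg",
                          ["http://node1.example.org", "http://node2.example.org/"],
                          fake_fetch))
```

For real requests, `fetch` could be `lambda url: urllib.request.urlopen(url, timeout=10).read()`; `urllib.error.URLError` is a subclass of `OSError`, so it is caught by the loop.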

RESULTS
The Monospecific Antibodies Used in the Protein Atlas-In this study, we generated monospecific antibodies by a strategy where PrESTs are used both as antigens for the development of polyclonal antibodies and as affinity ligands for the subsequent affinity purification of the antibodies as described before (8,9,11,12). In addition, we used commercially available antibodies selected primarily by their medical relevance and importance. Altogether 718 antibodies representing all human chromosomes were used to create the protein atlas, although the focus was on chromosomes 14, 22, and X for the monospecific antibodies (Fig. 1).
Validation of the Antibodies-The monospecific antibodies were validated using a broad range of quality assurance analyses. All cloned gene fragments were sequence-verified, and the recombinantly produced antigens (PrESTs) were analyzed by electrospray mass spectrometry to verify the expected molecular weight (data not shown). The affinity-purified antibodies were analyzed by a novel protein array method (9) in which a multitude of human protein fragments (PrESTs) were spotted on a single glass slide. This assay, using fluorescently labeled secondary antibodies for signal detection, provides information about the specificity and purity of the monospecific antibodies. In Fig. 2, four examples of this analysis are shown in which 1440 PrEST fragments have been spotted in triplicate on glass slides, and the signal of the antibody binding to each spot is shown. Furthermore, the figure shows two separate examples of the results from the protein array analysis for duplicate antibody fractions obtained by immunizing two separate animals with the same antigen (Fig. 2, a/b and c/d, respectively). The results show that the antibodies are in each case specific for the antigen and that reproducible antibodies, as judged by the protein array, can be obtained by immunizing two separate animals with the same antigen.
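One simple way to summarize such a protein array readout is to average the replicate spots for each PrEST and compare the signal on the target PrEST with the strongest off-target signal. The sketch below is illustrative only: the ratio criterion, any acceptance threshold built on it, and the PrEST identifiers are assumptions, not the scoring procedure of the cited method (9).

```python
from statistics import mean
from typing import Dict, List, Tuple


def specificity_ratio(spots: Dict[str, List[float]], target: str) -> Tuple[float, str]:
    """Compare the target PrEST signal with the strongest off-target signal.

    `spots` maps each PrEST id to its replicate spot intensities (e.g. triplicates).
    Returns (target mean / strongest off-target mean, id of that off-target PrEST).
    """
    means = {prest: mean(values) for prest, values in spots.items()}
    off_target = {p: m for p, m in means.items() if p != target}
    if not off_target:
        raise ValueError("need at least one off-target PrEST")
    worst = max(off_target, key=off_target.get)
    # Guard against division by zero when all off-target spots are blank.
    return means[target] / max(off_target[worst], 1e-9), worst


# Illustrative intensities only (arbitrary units), not data from Fig. 2.
readout = {
    "PrEST_target": [1000.0, 1100.0, 900.0],
    "PrEST_a": [50.0, 60.0, 40.0],
    "PrEST_b": [100.0, 100.0, 100.0],
}
ratio, strongest = specificity_ratio(readout, "PrEST_target")
print(ratio, strongest)  # 10.0 PrEST_b
```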
In most cases, the monospecific antibodies were further validated by Western blotting with total protein extracts from selected human cell lines and tissues. These analyses give important data about protein size and expression patterns. The standard design of the Western blots contains three human cell lines (RT-4, EFO-21, and A-431) and two human tissues (liver and tonsil). In many cases, the analysis showed specific staining of a single band corresponding to the size of the expected human protein, as exemplified by four antibodies toward four different proteins with expected molecular masses of 61, 22, 126, and 57 kDa, respectively (Fig. 3, a-d). For other antibodies, additional bands were detected (Fig. 3e), and in some cases no band of the correct size could be seen (Fig. 3f). It should be noted that the lack of the expected protein or the presence of additional bands could be explained by the expression pattern of the investigated protein or by protein modifications, proteolysis, or the presence of unknown splice variants.
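Apparent molecular mass on SDS-PAGE is commonly estimated by fitting log10(mass) against relative migration (Rf) for a marker ladder and reading off the band of interest. The sketch below shows that standard calculation; it is not described in the text, the marker values are hypothetical, and on gradient gels such as the 10-20% gels used here the relationship is only approximately linear, so the estimate should be treated as rough.

```python
import math
from typing import Sequence


def estimate_mass_kda(marker_kda: Sequence[float], marker_rf: Sequence[float],
                      band_rf: float) -> float:
    """Estimate apparent molecular mass (kDa) of a band from its relative migration.

    Fits log10(mass) = intercept + slope * Rf by ordinary least squares over
    the marker ladder, then evaluates the fit at band_rf.
    """
    if len(marker_kda) != len(marker_rf) or len(marker_kda) < 2:
        raise ValueError("need at least two matching marker points")
    ys = [math.log10(m) for m in marker_kda]
    n = len(ys)
    x_mean = sum(marker_rf) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in marker_rf)
    if sxx == 0:
        raise ValueError("marker Rf values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(marker_rf, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return 10 ** (intercept + slope * band_rf)


# Hypothetical ladder that follows log10(kDa) = 2.5 - 1.5 * Rf exactly.
rfs = [0.1, 0.3, 0.5, 0.7, 0.9]
ladder = [10 ** (2.5 - 1.5 * rf) for rf in rfs]
print(round(estimate_mass_kda(ladder, rfs, 0.4), 1))  # ~79.4
```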
All antibodies were further used to stain a tissue microarray with human tissues and organs. Where possible, the immunohistochemistry pattern was compared with bioinformatic data, including gene, transcript profiling, protein expression, and localization data. In addition, for selected antibodies, a competition-based adsorption assay was carried out as described previously (9). The assay was performed by incubating the antibody with its antigen before immunostaining. Disappearance of the specific staining was interpreted as evidence that the observed immunohistochemistry pattern was specific and that the msAb recognized the expected target.
Evaluation of the complete set of validation assays showed that approximately half of the antibodies exhibited possible cross-reactivity and were consequently excluded from the protein atlas database. A validation score was given for each of the remaining monospecific antibodies. A high validation score indicates that the quality assurance supports, with high confidence, the specificity of the msAb toward the expected human target protein. A low validation score indicates that the antibody is probably specific to the expected target, but the validation is less clear, and cross-reactivity cannot be excluded. Of the 275 msAbs analyzed in this study, 160 have a low validation score, whereas 68 and 47 have a medium or high validation score, respectively. For the 443 commercial antibodies, the only validation performed as part of the project was a comparison between our data and the data obtained from the antibody provider, and a bioinformatic comparison of our experimental data from the tissue microarrays with expected tissue profiles as judged from the literature.
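The counts above can be summarized directly. This small sketch checks that the three score classes add up to the 275 msAbs, that msAbs plus commercial antibodies give the 718 antibodies in the atlas, and computes the share of each validation class among the msAbs.

```python
validation_counts = {"low": 160, "medium": 68, "high": 47}
COMMERCIAL_ANTIBODIES = 443

total_msabs = sum(validation_counts.values())            # 275 msAbs
total_antibodies = total_msabs + COMMERCIAL_ANTIBODIES   # 718 in the atlas
# Percentage of msAbs in each validation class, rounded to one decimal.
shares = {level: round(100 * n / total_msabs, 1)
          for level, n in validation_counts.items()}

print(total_msabs, total_antibodies, shares)
# 275 718 {'low': 58.2, 'medium': 24.7, 'high': 17.1}
```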
Immunohistochemistry and Image Analysis-Once high throughput antibody-based tissue profiling was available, it became feasible to create an atlas of protein expression patterns in a multitude of normal human tissues and cancer tissues representing the 20 most prevalent cancer types. A set of standardized TMAs was produced as described by Kampf et al. (10) containing 48 different human tissues in triplicate and cancer tissues from 216 individually different tumors in duplicate. Digital images for annotation of expression profiles were generated using a semiautomated approach (10,13), and 576 images corresponding to a total of 20 gigabytes of data were collected for each antibody. The original TIFF images were stored on digital tapes for future analysis, and the images were converted to JPEG images suitable for web-based browsing. Each JPEG image corresponds to ~1 megabyte of data, with an image compression ratio adequate for analysis down to the subcellular level.
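The figure of 576 images per antibody follows directly from the TMA design: 48 normal tissues in triplicate plus 216 tumors in duplicate. Multiplying by the 718 antibodies reported in this study gives the total image count, consistent with the "more than 400,000" images in the first version of the database. The per-TMA figure uses the eight TMAs mentioned in the Experimental Procedures and is only an average; the actual distribution of cores across blocks is not stated.

```python
NORMAL_TISSUES, NORMAL_REPLICATES = 48, 3   # triplicate cores per normal tissue
TUMORS, TUMOR_REPLICATES = 216, 2           # duplicate cores per tumor
N_TMAS = 8                                  # TMAs scanned per antibody
N_ANTIBODIES = 718                          # antibodies in the first version

normal_cores = NORMAL_TISSUES * NORMAL_REPLICATES   # 144
tumor_cores = TUMORS * TUMOR_REPLICATES             # 432
images_per_antibody = normal_cores + tumor_cores    # 576
mean_cores_per_tma = images_per_antibody / N_TMAS   # 72.0 on average
total_images = images_per_antibody * N_ANTIBODIES   # 413,568

print(images_per_antibody, mean_cores_per_tma, total_images)  # 576 72.0 413568
```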
The images were annotated by certified pathologists using newly designed web-based annotation software. The manual annotation of each image provides a knowledge base for functional studies and allows queries about protein profiles in normal and disease tissues. Parameters that were annotated included overall staining, congruity in staining between triplicate/duplicate samples, and validation of immunohistochemistry staining. Staining intensity, fraction of immunoreactive cells, and patterns and localization of immunoreactivity, such as nuclear, cytoplasmic, or membranous, were also noted (10).
The initial annotation was performed during a 2-day "annotation jamboree" in which 26 pathologists from Sweden, Norway, and Finland gathered and annotated ~80,000 images using individual desktop terminals. Some antibodies were annotated independently by two or more pathologists, which allowed analysis of the consistency and comparability of annotations from different pathologists. The workshop was followed by an internet-based system in which each pathologist could book a particular antibody, download the corresponding 576 images, perform the annotation, and submit the annotation back to the central database. In this way, all of the ~400,000 images in the protein atlas were annotated.
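The text does not specify how consistency between pathologists was quantified. One common choice for two raters assigning categorical scores is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance; the sketch below implements it for nominal categories such as negative/weak/moderate/strong, using made-up scores. For ordinal scales a weighted kappa may be more appropriate.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected from each rater's category frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single, identical category throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical intensity scores from two pathologists for the same four cores.
path_1 = ["strong", "weak", "negative", "strong"]
path_2 = ["strong", "weak", "weak", "strong"]
print(cohens_kappa(path_1, path_2))  # 0.6
```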
Protein Atlas for Normal Tissues-The publicly available database (www.proteinatlas.org) contains, in the first version, more than 400,000 high resolution images corresponding to more than 700 antibodies toward human proteins. The normal tissue profile of a particular protein is displayed on a summary page covering all 48 tissues analyzed (Fig. 4, a and b). Intensity and abundance of immunoreactivity are given as a color code from white (no protein present) to red (high amounts of protein). Each colored circle represents an annotated tissue type, and the circles can be clicked to show the underlying original images.
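The color code on the summary pages is a fixed mapping from the annotated level to a color. Using the legend given in the figure captions (red = strong, orange = moderate, yellow = weak, white = no staining, black = missing tissue), the segments of one tissue circle could be computed as below. The sketch assumes intensity and abundance have already been combined into a single level per sample; the level labels and function name are hypothetical, and this is not the atlas's own rendering code.

```python
from typing import List, Optional, Sequence

# Legend as stated in the figure captions.
LEGEND = {"strong": "red", "moderate": "orange", "weak": "yellow", "negative": "white"}
MISSING = "black"  # missing tissue


def circle_segments(scores: Sequence[Optional[str]]) -> List[str]:
    """Colors for the segments of one circle, one segment per patient sample.

    `scores` holds the combined level for each sample (three for normal
    tissues, two for tumors), with None marking missing tissue.
    """
    return [MISSING if s is None else LEGEND[s] for s in scores]


print(circle_segments(["strong", None, "weak"]))  # ['red', 'black', 'yellow']
```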
The summary results from the antibody HPR000701 toward the cyclin-dependent kinase-activating kinase assembly factor MAT1 protein show a weak or moderate expression pattern in a majority of analyzed cell types, whereas strong expression is detected in testis and urinary bladder (Fig. 4a). This gene product contains a type 1 RING-type zinc finger, and the protein is reported to be involved in cell cycle control and RNA transcription by RNA polymerase II (14,15). The highest levels of expression have been reported in colon and testis (16), supporting the protein expression data. The underlying image for testis (Fig. 4c) with its "microscope" view (Fig. 4d) shows strong nuclear staining in germ cells in the seminiferous duct.
In Fig. 4b, the results from the antibody HPR001012 toward Rho-GTPase-activating protein 4 are shown. This gene product has been reported to be highly expressed in developing and adult brain (17). Moreover, the protein has an inhibitory effect on stress fiber organization and may also downregulate Rho-like GTPases. Expression has predominantly been found in hematopoietic cells (spleen, thymus, and leukocytes), with only low levels in placenta, lung, and various fetal tissues (source database: ARHGAP4). Interestingly, the protein atlas confirms that the protein is mainly found in hematopoietic cells in bone marrow and lymphoid tissues, with only weak staining in various other cell types (Fig. 4b). Strong cytoplasmic immunoreactivity is found in lymphoid cells in a TMA spot from tonsil (Fig. 4e). Higher magnification (Fig. 4f) shows distinct staining in small non-follicle cells in the paracortex, whereas larger follicle cells in the cortex region display only weak immunoreactivity.

Some Examples of Normal Tissue Profiles-In Fig. 5, some examples of normal tissue profiles selected from the protein atlas are shown. The first example shows the results from an antibody (HPR000556) generated toward the cysteine-rich motor neuron 1 protein precursor, whose gene (CRIM1) is located on chromosome 2. The predicted gene product contains a putative transmembrane region and signal peptide and is therefore suggested to be a type I membrane protein. The protein may play a role in central nervous system development by interacting with growth factors implicated in motor neuron differentiation and survival, and may also play a role in capillary formation and maintenance during angiogenesis (18,19). Transcript profiling of the CRIM1 gene suggests expression in pancreas, kidney, skeletal muscle, lung, placenta, brain, heart, spleen, liver, and small intestine (Uniprot accession number Q9NZV1). The protein atlas, however, shows very selective staining, with distinct positivity only in kidney glomeruli (Fig. 5a) and in trophoblastic cells in placenta (Fig. 5b).
Other tissues are negative. The highly selective staining in glomeruli is interesting and consistent with the finding of Crim1 expression in developing blood vessels by Glienke et al. (19). It would be interesting to analyze Crim1 expression during kidney development as well as in glomerulopathic diseases to elucidate its role in glomeruli. The significance of cytoplasmic Crim1 expression in trophoblastic cells is unclear and may indicate that this protein plays a different role in the placenta.
The second example is an antibody (HPR000611) generated from a gene located on the X chromosome. The corresponding protein belongs to the melanoma-associated antigen (MAGE) family and is referred to as melanoma antigen family B, 10 (EnsEMBL accession number ENSP00000328007). No functional description is available for this gene, whose product lacks a signal sequence and transmembrane regions. The protein atlas shows strong and distinct staining in blood vessels, including small intricate capillaries in the myocardium (Fig. 5c). Small arterioles in the kidney show staining, although the specialized vasculature in glomeruli is negative (Fig. 5d). A thin area of immunoreactivity is also evident along Bowman's capsule. Interestingly, the staining of vessels is in part similar to that found with the endothelial marker CD31. CD31 recognizes a 100-kDa glycoprotein in endothelial cells and has been used as a marker for benign and malignant human vascular disorders. Proteins that show differential expression in different forms of vasculature are of potential importance for understanding the formation and function of blood vessels and could thus provide clues to normal development and to pathological conditions involving angiogenesis.

FIG. 4. Intensity and abundance of immunoreactivity are given as a color code (red = strong, orange = moderate, yellow = weak, white = no staining, and black = missing tissue). Each colored circle represents one tissue type. The circles are divided in three because each tissue is represented by samples from three different patients. Two examples of normal tissue profiles obtained with antibodies are shown: a, the antibody HPR000701 directed toward the MAT1 gene product from chromosome 14; and b, the antibody HPR001012 directed toward the ARHGAP4 gene product from the X chromosome. One example of a TMA spot from normal testis stained with an antibody generated from the MAT1 gene is shown at low (c) and at high magnification (d). Similarly, a TMA spot showing the immunohistochemical outcome for the ARHGAP4 gene is shown at low (e) and high magnification (f).
Another uncharacterized gene product is NP_443138.1 (RefSeq peptide ID), encoded on chromosome 22. No information could be found in the literature about this gene or its gene product. Immunostaining with an antibody toward this protein (HPR000781) shows selective and distinct staining in a subset of neuronal cells, including Purkinje cells in the cerebellum (Fig. 5e). In addition, this antibody shows distinct and apparently fiber-specific staining in striated muscle fibers (Fig. 5f). The finding of strong staining in certain cell types of the central nervous system is intriguing, and specific antibodies directed to such proteins provide important tools both for analyzing features of normal brain development and for understanding neurological diseases. It is noteworthy that this protein is highly expressed in only one of the fiber types present in normal striated muscle; this finding needs further investigation to elucidate the protein's role in normal muscle function.

FIG. 5. Examples of normal tissue profiles. The figure shows staining patterns for four proteins with two images for each protein. The antibody HPR000556 generated toward the CRIM-1 protein precursor, whose gene is located on chromosome 2, shows very selective staining, with distinct positivity only in kidney glomeruli (a) and in trophoblastic cells in placenta (b). The antibody HPR000611 generated toward melanoma antigen family B, 10, located on the X chromosome, shows strong and distinct staining in blood vessels including small intricate capillaries in the myocardium (c). The small arterioles in the kidney show positive staining, although specialized vasculature in the glomerulus is negative (d). The antibody HPR000781 directed toward an unknown gene product (NP_443138.1) from chromosome 22 shows selective and distinct staining in a subset of neuronal cells including the Purkinje cells in cerebellum (e) as well as distinct and apparently fiber-specific staining in striated muscle fibers (f). The antibody HPR001066 generated from the KCND1 gene located on the X chromosome shows moderate cytoplasmic and membranous staining in several tissues including liver (g). A peculiar pattern of immunoreactivity was evident in heart tissue, where several cardiomyocytes showed a condensed irregular sarcoplasmic positivity (h).
The antibody HPR001066 was generated toward KCND1, whose gene is located on the X chromosome. This gene encodes a member of the potassium channel, voltage-gated, shal-related subfamily. Voltage-gated potassium channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. It has been shown that voltage-gated potassium channels are expressed at moderate levels in all tissues. KCND1-specific transcripts have been detected in human brain, heart, liver, kidney, thyroid gland, and pancreas by Northern blot and RT-PCR experiments (20). The protein atlas shows moderate cytoplasmic and membranous staining in several tissues including liver (Fig. 5g). A peculiar immunoreactivity was found in heart, where several cardiomyocytes showed a condensed irregular sarcoplasmic staining (Fig. 5h). The membranous expression pattern found in hepatocytes in normal liver is consistent with the functions of potassium channel proteins, although the precise role of this specific protein in liver remains ambiguous. Moreover, the special pattern shown in cardiomyocytes was also evident in the two other normal myocardium samples (not shown) and thus probably reflects an altered pattern of expression in these highly specialized myocytes. The staining appears as irregular deposits in a subset of the cells. It is unclear why the protein is visible only in certain cells and what role, if any, it has in these cells. It could be speculated that these peculiar patterns represent external perturbations, e.g. ischemia caused by the surrounding microenvironment.

FIG. 6. Intensity and abundance of immunoreactivity are given as a color code (red = strong, orange = moderate, yellow = weak, white = no staining, and black = missing tissue). Each colored circle to the right represents one individual tumor. The circles are divided because they represent duplicate samples from each tumor. The commercial antibody directed toward the KLK3 gene encoding the PSA precursor protein shows expression of this antigen in all prostate cancer samples, although the abundance varies between tumors (a). Immunohistochemical staining shows fairly strong positivity in cancer cells (c) that at higher magnification can be seen as cytoplasmic immunoreactivity (d). The HPR000937 antibody was generated from the TYRP1 gene located on chromosome 9 that encodes the type I membrane protein 5,6-dihydroxyindole-2-carboxylic acid oxidase precursor protein. The overview (b) shows that a majority of analyzed tumors are negative and that positive tumors show only weak staining. Strong immunoreactivity is found in malignant melanoma, and one example from a cutaneous superficial spreading melanoma with strong immunoreactivity is shown (e). Higher magnification shows cytoplasmic staining in melanoma cells spread in the superficial epidermis as well as in the invasive parts of the tumor (f).
Protein Atlas for Cancer Tissues-The protein atlas also contains a summary page for the protein profiles found in the 20 different cancer types analyzed for each antibody. Because tumors show greater individual heterogeneity than normal tissues, we analyzed tumors from 12 different patients for most cancer types. Two examples of the expression patterns in different cancer patients are shown in Fig. 6, a and b. Intensity and abundance of immunoreactivity are given with the same color codes as for normal tissues. Each colored circle represents one individual tumor.
In the first example (Fig. 6a), cancer tissue profiles are shown using a commercially available polyclonal antibody, CAB000070, toward the prostate-specific antigen (PSA) precursor protein encoded by the KLK3 gene. A specific expression of this antigen is observed in all prostate cancer patients, although the abundance in different individual tumors varies. It is reassuring that immunoreactivity is only found, as expected, in prostate cancer (Fig. 6a). The immunohistochemistry image (Fig. 6c) shows a fairly strong staining in cancer cells, and higher magnification of this tumor shows cytoplasmic expression pattern of PSA in cancer cells (Fig. 6d).
The second example (Fig. 6b) shows the cancer profiles of the type I membrane protein 5,6-dihydroxyindole-2-carboxylic acid oxidase precursor encoded by the TYRP1 gene located on chromosome 9. This potentially melanosomal protein is suggested to be involved in melanin biosynthesis and may regulate or influence the type of melanin synthesized (21,22). The summary page for this protein and the corresponding HPR000937 antibody shows the distribution of immunoreactivity across the analyzed cancers (Fig. 6b). A majority of the analyzed tumors are negative, and strong immunoreactivity is found only in malignant melanoma. Five of 10 tumors show strong immunoreactivity, whereas three cases display a moderate degree of staining. One spot from a cutaneous superficial spreading melanoma with strong immunoreactivity is shown (Fig. 6e). Higher magnification shows cytoplasmic staining in melanoma cells spread in the superficial epidermis as well as in the invasive parts of the tumor (Fig. 6f).

Some Examples of Cancer Tissue Profiles-In Fig. 7, some examples of cancer tissue profiles selected from the protein atlas are shown. The first example shows the results from an antibody (HPR000704) generated toward the MTHFD1 protein. This protein possesses three distinct enzymatic activities: 5,10-methylenetetrahydrofolate dehydrogenase, 5,10-methenyltetrahydrofolate cyclohydrolase, and 10-formyltetrahydrofolate synthetase. Each of these activities catalyzes one of three sequential reactions in the interconversion of one-carbon derivatives of tetrahydrofolate, which are substrates for methionine, thymidylate, and de novo purine syntheses (source database: MTHFD1). Immunostaining with antibody HPR000704 shows cytoplasmic staining in a majority of tissues. In squamous epithelium, staining is restricted to the less differentiated basal cells, whereas the more differentiated cells are negative; one example from oral mucosa is shown (Fig. 7a). Most cancer tissues were positive, with moderate cytoplasmic immunoreactivity. In squamous cell carcinoma from the head and neck region, staining was most abundant in the less well differentiated cancer cells (Fig. 7b). The expression pattern in surface epithelium, where cytoplasmic staining is restricted to relatively undifferentiated cells in basal and suprabasal layers, thus appears to be mirrored in cancer, where high expression is evident in the less differentiated cells of squamous cell carcinomas. This protein has previously been implicated in certain hematological malignancies (23).
In the second example, the antibody (HPR000637) generated toward the PABP2 gene product from chromosome 14 is shown. The ubiquitously expressed polyadenylate-binding protein 2 shuttles between the nucleus and the cytoplasm but is predominantly found in the nucleus (24). This protein is also involved in the 3′-end formation of mRNA precursors (pre-mRNA) through the addition of a poly(A) tail of 200–250 nucleotides to the upstream cleavage product. It stimulates poly(A) polymerase α, conferring processivity on the poly(A) tail elongation reaction, and also controls the poly(A) tail length. The protein is reported to be present at various stages of mRNA metabolism, including nucleocytoplasmic trafficking and nonsense-mediated decay of mRNA (UniProt accession number Q86U42). Immunostaining with antibody HPR000637 shows, as expected, general nuclear staining in virtually all cells. Distinct positive nuclei are found in epithelial cells as well as in inflammatory and stromal cells in normal rectal mucosa (Fig. 7c). In colorectal cancer, strong nuclear staining is apparent in large, atypical cancer cells as well as in mesenchymal cells of the cancer stroma (Fig. 7d). Although expression may appear stronger in cancer cells, this probably only reflects that cancer cell nuclei are larger, contain more nucleic acids, and have a higher rate of transcription and metabolism in general.
The antibody HPR000837 is generated toward the transcobalamin II precursor protein, a member of the vitamin B12-binding protein family (source database: TCN2). According to the literature (25), this family of proteins, alternatively referred to as R binders, is expressed in various tissues and is most likely secreted. The protein binds cobalamin and mediates its transport into cells. Like other mammalian cobalamin-binding proteins, such as transcobalamin I and gastric intrinsic factor, this protein may have evolved by duplication of a common ancestral gene. Immunostaining with antibody HPR000837 shows that most normal tissues and cell types are negative. Distinct staining is found in basal myoepithelial cells of prostatic glands and in the basal layer of normal urothelium from the urinary bladder (Fig. 7e). Several cancer tissues show weak to moderate cytoplasmic staining, which is most frequent and abundant in urothelial carcinoma (Fig. 7f). The localization of staining in basal cells is consistent with this protein being expressed during defined stages of normal differentiation. The strong staining of virtually all tumor cells in those urothelial cancers that express this protein suggests that it may play a role in a certain subset of cancers. Overexpression of a protein in urothelial cancer also raises the possibility of using it as a specific marker for early, non-invasive detection through analysis of urine samples.
The last example is an antibody (HPR000527) generated toward a completely uncharacterized protein, NP_071381.1, whose gene is located on chromosome 22. No information about its expression or function can be found in the literature. Immunostaining with this antibody shows widespread, mainly cytoplasmic staining in both benign and malignant cells. Normal breast tissue shows very weak staining in the epithelial cells of the lobules, whereas terminal duct cells show slightly stronger staining (Fig. 7g). Breast carcinomas were in general positive, with a stronger staining intensity (Fig. 7h). This unknown protein is clearly expressed in many types of cancer, although the level of expression varies from weak to strong. Breast cancers show either moderate or strong staining, and it is not known whether the level of expression is associated with grade of malignancy and clinical outcome. It is noteworthy that expression is higher in terminal duct cells, the potential progenitor cells of ductal carcinomas, than in the normal cells of the breast lobules.

DISCUSSION
Here we describe a new protein atlas database that displays expression and localization patterns of proteins in a large portion of human tissues and organs. The objective of the database is to provide a publicly available protein atlas that could function as a knowledge base for the structural and temporal expression of human proteins in various cells and tissues, with a focus on normal and cancer tissues. Initially, the database contains ∼400,000 high resolution images corresponding to more than 20 terabytes of original data.
Each image has been annotated by a certified pathologist. Because the images are annotated, more refined queries are possible, such as "show all proteins that are only expressed in the nucleus of pancreas but not in the liver and kidney," although this has not been implemented yet. A simplified version of the tissue ontology protocol described by Warford et al. (26) was used, including a semiquantitative estimate of protein expression and subcellular localization. A basic scheme was followed so that annotation takes less than a minute per image. However, it is important to point out that more refined annotation can be performed at a later stage in a decentralized manner, because all images are available through the web-based protein atlas.
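The kind of refined query mentioned above can be sketched over structured annotation records. This is a minimal sketch under stated assumptions: the record layout (protein, tissue, subcellular compartment, semiquantitative level), the rule that any call other than "negative" counts as expression, and the protein identifiers P1 to P3 are all hypothetical and do not reflect the actual schema of the protein atlas.

```python
from dataclasses import dataclass


# Hypothetical annotation record: one pathologist call per protein, tissue
# and subcellular compartment on a semiquantitative intensity scale.
@dataclass(frozen=True)
class Annotation:
    protein: str
    tissue: str
    compartment: str  # e.g. "nucleus", "cytoplasm", "membrane"
    level: str        # "negative", "weak", "moderate" or "strong"


def expressed(ann: Annotation) -> bool:
    # Assumption: any call other than "negative" counts as expression.
    return ann.level != "negative"


def only_in_compartment(annotations, compartment, tissue, excluded_tissues):
    """Proteins expressed in `compartment` of `tissue`, in no other compartment
    of that tissue, and not expressed anywhere in `excluded_tissues`."""
    hits = {a.protein for a in annotations
            if a.tissue == tissue and a.compartment == compartment and expressed(a)}
    rejected = {a.protein for a in annotations
                if expressed(a) and (a.tissue in excluded_tissues
                                     or (a.tissue == tissue and a.compartment != compartment))}
    return hits - rejected


# Hypothetical example data for three proteins.
records = [
    Annotation("P1", "pancreas", "nucleus", "strong"),
    Annotation("P1", "liver", "nucleus", "negative"),
    Annotation("P1", "kidney", "cytoplasm", "negative"),
    Annotation("P2", "pancreas", "nucleus", "moderate"),
    Annotation("P2", "liver", "cytoplasm", "weak"),
    Annotation("P3", "pancreas", "nucleus", "weak"),
    Annotation("P3", "pancreas", "cytoplasm", "moderate"),
]
print(sorted(only_in_compartment(records, "nucleus", "pancreas", {"liver", "kidney"})))
# prints ['P1']
```

Here P2 is rejected because of weak staining in liver and P3 because of cytoplasmic staining in pancreas. Note that a tissue with no record at all is treated as not expressing the protein; a stricter query could require an explicit negative call for every excluded tissue.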
An important issue for all tissue profiling with antibodies is the specificity and selectivity of the antibody. As outlined in Table I, there are at least six principal methods to validate the specificity of an antibody. The most frequently used method is to set up an assay with the immunogen as reagent (antigen-based validation). This approach has the advantage that it is often relatively easy to perform a comprehensive analysis, as exemplified by the 1440 different protein fragments used here (Fig. 2). In addition, binding parameters, such as dissociation constants, can be obtained. However, antigen-based assays do not provide information about binding to the target from natural sources, which might carry extensive post-translational modifications such as glycosylation or have a completely different fold compared with the antigen used in the assay. In addition, cross-reactivity to other human proteins cannot be ruled out, because only a rather limited protein structural space is covered by the antigen-based assay.
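Antigen-based validation on a protein array can be sketched as a comparison between the signal an antibody gives on its own immunogen and its strongest signal on any other fragment on the array. This is a minimal sketch under stated assumptions: the fragment identifiers, the background-corrected signal values, and the cutoff `min_ratio=5.0` are all hypothetical and are not the criteria used for the atlas. As the paragraph above notes, passing such a test says nothing about binding to the native, possibly modified, target.

```python
def array_specificity(signals, target, min_ratio=5.0):
    """Score one antibody on an antigen array.

    signals: dict mapping fragment id -> background-corrected signal.
    target: id of the fragment used as immunogen for this antibody.
    min_ratio: hypothetical cutoff for own-antigen vs. best off-target signal.
    Returns (passes, ratio).
    """
    if target not in signals:
        raise KeyError(f"target fragment {target!r} is not on the array")
    own = signals[target]
    best_off = max((s for fid, s in signals.items() if fid != target), default=0.0)
    if best_off <= 0:
        # No measurable off-target binding: pass only if the immunogen is detected.
        return own > 0, (float("inf") if own > 0 else 0.0)
    ratio = own / best_off
    return ratio >= min_ratio, ratio


# Hypothetical signals for one antibody on a three-fragment array.
signals = {"frag_0001": 5200.0, "frag_0002": 310.0, "frag_0003": 120.0}
passes, ratio = array_specificity(signals, "frag_0001")
print(passes, round(ratio, 1))  # prints True 16.8
```

Using the single strongest off-target signal, rather than an average over all fragments, keeps one cross-reactive fragment from being diluted by hundreds of blank spots.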
On the other hand, the target-based assays have the advantage that the selectivity of the antibody is assayed in a natural background of many other human proteins. The assay, such as the Western blot analysis used here (Fig. 3), often yields confirmation of the size of the target and might give additional information about post-translational modifications or the presence of splice variants or proteolytic fragments. However, target-based assays are somewhat cumbersome, and many proteins cannot easily be extracted even with the use of denaturants, or the target protein might not be present in the cell extracts used for analysis. Therefore it is not possible to analyze all human proteins in a straightforward manner with this approach.
The RNA-based methods have the advantage that the specificity of the antibody is validated using an independent biomolecule, the corresponding mRNA. In this case, cross-reactivity is not an issue, but the validation method depends on a correlation between RNA and protein levels, and how often such a correlation exists remains to be established. The DNA-based methods, in which the genome information is used to predict the expression and/or localization of the protein in cells or cellular compartments based on the regulatory and coding parts of the gene, might be useful in particular for proteins with a subcellular localization signal, such as signals for transport into mitochondria, to the cytoplasmic membrane, or out of the cell by secretion. This is an especially useful validation method when analyzing novel proteins with only a theoretical mass and completely unknown function.
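RNA-based validation rests on a correlation between mRNA and protein levels across tissues. Because immunohistochemistry gives ordinal calls rather than concentrations, a rank correlation such as Spearman's rho is one natural choice; it is swapped in here for illustration and is not a method described in the text. The sketch assumes a hypothetical mapping of negative/weak/moderate/strong to 0 to 3 and uses made-up mRNA values for five tissues.

```python
import math


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


# Hypothetical data for five tissues.
mrna = [120.0, 5.0, 40.0, 0.5, 80.0]  # arbitrary expression units
ihc = [3, 0, 2, 0, 2]                 # 0 = negative ... 3 = strong
print(round(spearman(mrna, ihc), 3))  # prints 0.949
```

Averaging tied ranks matters here because a four-level staining scale produces many ties across tissues.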
The genetic methods for validation of antibodies are interesting, although validation of an antibody using the corresponding transgenic animal requires that the antibody function across species. The use of RNA interference (27) to inhibit the expression of a particular gene product is also promising and much more attractive to scale up for whole proteome studies. An interesting validation alternative is to use expression tags to analyze the localization of green fluorescent protein fusions (28) in human cells and to validate the corresponding antibody specificity by comparisons with the subcellular localization observed by immunohistochemistry.
However, the best validation is probably to have at least two independent antibodies toward the same gene product and compare their binding patterns (epitope-based validation). A fair number of such antibody "pairs" are present in the database. Although the presence of splice variants and/or protein modifications might complicate the staining patterns for such comparisons, an identical staining pattern from antibodies binding to different regions of the same target is of high value as a validation criterion. For such comparisons, it might be preferable to use antibodies generated in different laboratories, emphasizing the need for international efforts and collaborations between academia and commercial vendors. It might also be an advantage to use different types of affinity reagents, such as monoclonal antibodies (29), monospecific antibodies (1,9), recombinant antibodies (30,31), or other affinity reagents such as aptamers (32) or affibodies (33).
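One way to quantify how congruent the staining patterns of an antibody pair are is a chance-corrected agreement statistic over the same set of tissues. The sketch uses Cohen's kappa, a standard measure swapped in here for illustration; the text does not specify any statistic. It assumes each antibody has been reduced to a hypothetical per-tissue categorical call, a simplification that ignores subcellular localization and graded intensity.

```python
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty, equal-length lists")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    # Agreement expected by chance from each antibody's marginal frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both antibodies give the same single call everywhere
    return (observed - expected) / (1 - expected)


# Hypothetical per-tissue calls for two antibodies against one protein.
antibody_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
antibody_2 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg"]
print(cohens_kappa(antibody_1, antibody_2))  # prints 0.75
```

Raw agreement here is 7 of 8 tissues, but half of that agreement would be expected by chance from the two antibodies' positive rates alone, which is why kappa is lower than 0.875.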
Here we validated the monospecific antibodies with a battery of quality assurance steps including a protein microarray assay, Western blot analysis, an adsorption assay, and bioinformatic/immunohistochemistry comparison including information both from literature and gene sequence predictions such as the presence of signal peptide or transmembrane regions. A validation score is given for each monospecific antibody based on the combined quality assurance assays. "High" indicates that the validation significantly supports the specificity of the antibody toward the expected human target protein, whereas "low" shows that the antibody is probably specific to the expected target, but in this case, the validation is less clear, and cross-reactivity cannot be excluded. Congruent staining of two or more antibodies directed toward the same protein is included in the score. It is important to point out that this validation score is a subjective estimate based on relatively non-quantitative data such as bands on Western blots, and an important objective for future international efforts could be to agree on rules for validation of antibodies to be used in the field of antibody-based proteomics.
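The score described above is explicitly a subjective estimate, and agreed rules are proposed as a future goal. Purely to illustrate what an explicit, agreed-upon rule could look like, the sketch encodes one hypothetical rule over the quality assurance steps listed above. The thresholds, and the choice to score an antibody with any failed step as "low" rather than exclude it, are assumptions; this is not the scoring actually applied to the atlas antibodies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAResult:
    # Outcome of each quality assurance step; None means "not performed".
    protein_array: Optional[bool] = None       # binds its own antigen selectively
    western_blot: Optional[bool] = None        # band of the expected size
    adsorption: Optional[bool] = None          # adsorption assay consistent with specificity
    bioinformatics_ihc: Optional[bool] = None  # staining fits literature and predictions
    congruent_pair: Optional[bool] = None      # a second antibody gives the same pattern


def validation_score(qa: QAResult) -> str:
    """Hypothetical rule: "high" requires a passed protein array, no failed step,
    and at least two further passed steps; anything else is "low"."""
    steps = [qa.protein_array, qa.western_blot, qa.adsorption,
             qa.bioinformatics_ihc, qa.congruent_pair]
    if any(step is False for step in steps):
        return "low"
    further = sum(step is True for step in steps[1:])
    return "high" if qa.protein_array is True and further >= 2 else "low"


print(validation_score(QAResult(protein_array=True, western_blot=True,
                                bioinformatics_ihc=True)))  # prints high
print(validation_score(QAResult(protein_array=True, bioinformatics_ihc=True)))  # prints low
```

Writing the rule down as code makes it reproducible and open to debate, which is the property the text asks of future validation criteria.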
For the commercial antibodies, we relied on the quality assurance of the antibody provider, but it might be relevant to exclude antibodies that have not been validated using such agreed-upon criteria. In this context, it could be argued that the primary data for the quality assurance must be published in conjunction with the tissue profiles on the web-based protein atlas to allow individual researchers to estimate the probability of specificity and selectivity of a given antibody. As pointed out above, it cannot be excluded that some of the immunoreactivity in the present atlas is due to cross-reactivity with proteins other than the expected target protein. It is therefore essential to encourage a continuous dialogue with the scientific community through the protein atlas effort, both to identify antibodies with questionable staining patterns and to enable the exclusion of such cross-reactive antibodies from the database.
It is interesting to note that very few proteins show a tissue-specific pattern confined to a single, unique tissue or organ. Of the proteins analyzed, only a handful, including PSA and insulin, show such single tissue specificity. This is somewhat surprising given that many of the proteins analyzed have been defined as tissue-specific. More work is needed, including thorough comparisons between RNA and protein expression levels in cells and tissues, to understand tissue specificity in more detail.
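Whether a protein counts as tissue-specific depends on how detection is defined. A minimal sketch, assuming hypothetical per-tissue semiquantitative calls and hypothetical protein names, lists the proteins detected in exactly one tissue; the detection threshold `min_level` is an assumption, and changing it changes which proteins qualify.

```python
# Ordinal scale for the hypothetical semiquantitative calls.
LEVELS = {"negative": 0, "weak": 1, "moderate": 2, "strong": 3}


def single_tissue_proteins(profiles, min_level="weak"):
    """Map each protein detected in exactly one tissue to that tissue.

    profiles: dict protein -> dict tissue -> call ("negative" ... "strong").
    min_level: weakest call that counts as detection (an assumption).
    """
    threshold = LEVELS[min_level]
    result = {}
    for protein, by_tissue in profiles.items():
        detected = [t for t, call in by_tissue.items() if LEVELS[call] >= threshold]
        if len(detected) == 1:
            result[protein] = detected[0]
    return result


# Hypothetical profiles for two proteins across three tissues.
profiles = {
    "protein_A": {"prostate": "strong", "liver": "negative", "kidney": "negative"},
    "protein_B": {"prostate": "moderate", "liver": "weak", "kidney": "negative"},
}
print(single_tissue_proteins(profiles))
# prints {'protein_A': 'prostate'}
print(single_tissue_proteins(profiles, min_level="moderate"))
# prints {'protein_A': 'prostate', 'protein_B': 'prostate'}
```

The second call shows how ignoring weak staining can make a broadly expressed protein look tissue-specific, one reason comparisons of tissue specificity need an explicit detection threshold.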
Using the proteomic approach described here, it appears possible to scale up to tissue profiles representing tens of thousands of antibodies. If specific antibodies can be generated within the framework of various international efforts, a comprehensive protein atlas for a large part of the human proteome is within reach. The estimated size of the non-redundant human proteome is on the order of 20,000–25,000 proteins (1), and the long term objective of an international antibody-based proteomic effort could thus be to generate ∼50,000 affinity reagents, two for each non-redundant protein. These reagents would subsequently be used to produce a protein atlas covering all, or nearly all, human proteins. Furthermore, the antibodies generated from such an effort would constitute an invaluable resource for continued in-depth biological and clinical research. The collection of affinity reagents could subsequently also include antibodies specific for splice variants and modifications. Here we focused on the analysis of normal and cancer tissues. It is not inconceivable that the antibodies generated within the proteomic effort described here could in the near future also be used for tissue-based profiling of other major disease areas as well as for blood biomarker analysis of patients with various disease profiles. A comprehensive and sensitive biomarker analysis of a majority of all human proteins in most human diseases can thus be envisioned.
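The reagent target stated above follows from simple arithmetic on the proteome estimate. As a worked check, taking two independent affinity reagents per non-redundant protein:

$$
N_{\text{reagents}} = 2 \times N_{\text{proteins}} \approx 2 \times (20{,}000\ \text{to}\ 25{,}000) = 40{,}000\ \text{to}\ 50{,}000
$$

so the figure of ∼50,000 corresponds to the upper end of the proteome size estimate.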