Antibodypedia, a Portal for Sharing Antibody and Antigen Validation Data

Antibodies are useful tools to characterize the components of the human proteome and to validate potential protein biomarkers discovered through various clinical proteomics efforts. The lack of validation results across various applications for most antibodies often makes it necessary to perform cumbersome investigations to ensure specificity of a particular antibody in a certain application. A need therefore exists for a standardized system that shares validation data about publicly available antibodies and allows antibody providers as well as users to contribute and edit experimental evidence, including data on the antigen. Here we describe a new publicly available portal called Antibodypedia, which has been developed to allow sharing of information regarding validation of antibodies and in which providers can submit their own validation results and reliability scores. We report standardized validation criteria and submission rules for applications such as Western blots, protein arrays, immunohistochemistry, and immunofluorescence. The contributor is expected to provide experimental evidence and a validation score for each antibody, and the users can subsequently provide feedback and comments on the use of the antibody. The database thus provides a virtual resource of publicly available antibodies toward human proteins with accompanying experimental evidence supporting an individual validation score for each antibody in an application-specific manner.

One of the great challenges in basic and clinical proteomics and in bioscience in general is the lack of well characterized affinity reagents for many human protein targets. Such protein-specific probes could be used to explore the corresponding proteins both in vivo and in vitro, including the necessary validation of potential biomarkers discovered through proteomics-based methods such as two-dimensional gels (1) or mass spectrometry (2). An important factor for the use of protein probes is the quality assurance of the affinity reagents regarding specificity and cross-reactivity (3,4). A recent analysis of more than 5000 antibodies from over 50 commercial providers demonstrated that ~50% of the antibodies were non-functional in the immunohistochemistry application tested. This demonstrates the need for standardized ways to validate antibodies. The need is further accentuated by the huge dynamic range of proteins in biological systems, as exemplified by the 10^10-fold difference in concentration between abundant and low abundance serum proteins in human blood (5), which makes potential cross-reactivity difficult to predict and points to the need to validate affinity reagents in an application-specific manner.
Antibodies can be used in many different ways to probe protein function and expression (Table I). Note that for many applications, the protein targets are denatured by various agents, such as detergents or formalin, and this means that conformational epitopes formed by different parts of the protein sequence brought together by the folding process often are no longer present in the assay format. In these cases, it might be preferable to use antibodies recognizing linear epitopes, which are defined as a recognition region consisting of a stretch of consecutive amino acids, normally 8-12 residues (6). For other applications, such as flow sorting of live cells or therapeutic antibodies, the protein target is usually in a native fold, and thus conformational epitopes are usually preferable (7), although linear epitopes might also in some cases be present at the surface of the target protein.
The functionality of a particular antibody in a specific application, however, is hard or even impossible to predict, and this emphasizes the importance of testing an antibody across different application platforms to investigate its performance under a fixed set of conditions and to establish binding parameters, such as specificity and sensitivity.
An important factor for the generation of affinity reagents is the choice of antigen (8). There are at least five different ways to generate the antigen as outlined in Table II, ranging from the purification of the native protein from natural sources to synthesis of short peptides by chemical means. The most common approach is to use recombinant technologies and to express the full-length protein or fragments of the protein in heterologous hosts. It is also possible to generate the antigen directly during the immunization by injecting cDNA-expressing clones into the animal in a process called genetic immunization (9). Note that the likelihood of a linear or conformational epitope of the resulting antibody is influenced by the type of antigen used to generate the affinity reagent, as exemplified by the generation of predominantly linear epitopes by synthetic peptides and protein epitope signature tags (10), whereas the immunization with folded domains or purified native proteins often yields conformational epitopes (7). It is therefore not surprising that most efforts to generate therapeutic antibodies have involved antigens in the form of full-length proteins or folded protein domains, whereas research reagents for Western blots or immunofluorescence often are obtained using synthetic peptides or recombinant protein fragments (11). The fact that the antigen influences the performance of the generated affinity reagents across different assay platforms demonstrates the importance of publicly disclosing the antigen sequence or, even better, the exact target binding site (epitope) as part of the characteristics for a particular antibody.
It has been suggested that an objective for antibody-based proteomics (4) should be to generate paired antibodies (affinity reagents) recognizing distinct and non-overlapping epitopes of the target protein to facilitate the quality assessment of the target specificity but also to allow various formats of "sandwich-based" assays (12). This calls for a database shared by the scientific community that gives access to antigen information coupled to a specific antibody as an important input in the selection of affinity reagents for a particular assay. Recently Mathivanan et al. (13) described the Human Proteinpedia portal, which is an attempt to share, integrate, and present proteomics data in a "Wikipedia"-style manner, although the contributor is expected to provide experimental evidence for the data annotated and with the restriction that only the original contributor can edit the data. Given the complexity of proteomics data, Proteinpedia allows all of the annotated data to be visualized, and the aim of the portal is to facilitate comparison and interpretation by meta-annotations pertaining to samples, method of isolation, and experimental platform-specific information (13). The need to be able to share information about antigens and application-specific validation of affinity reagents reinforces the need for a conceptually similar database portal for antibodies based on community-based contributions.
Here we describe an information database that has been developed within the framework of the European Union research infrastructure ProteomeBinders (14) aimed to create a virtual resource of validated antibodies available to the scientific community. The portal allows users to share validation results on antibodies and antigens, and we report the implementation of four applications and the rules for submission of data.

[...] concept (Linux, Apache, MySQL, and PHP). The database of the Antibodypedia was designed in a template-based manner where certain binder parameters are selected from the entire set of binder parameters for a specific binder type. The selected database model greatly facilitates the addition or modification of sets of binder parameters for each type after future community feedback. The database model for validation data was designed in the same fashion enabling addition of future validation types and parameters.
The Web Portal-The Web portal was developed with three slightly different appearances, one for each user group: binder providers, internal reviewers, and the public part for end users. The first two user groups need to log in to the portal using their account to add or modify binders (binder providers) or to review binders (reviewers). Having a user account as a binder supplier will grant access to the part of the Antibodypedia portal where new binders can be submitted and previously submitted binders can be modified or enriched with more validation data. Having a user account as a reviewer will grant access to the part of the Web portal where newly submitted or modified binders need to be reviewed for compliance with the validation scoring system proposed here for each assay. After the supplier has submitted the binders, the binders and accompanying validation data will be reviewed to verify concordance with this proposed evaluation standard of the binder experiments. Thereafter, if the submission appears correct, the binders will be published on the Antibodypedia.
Validation of Antibodies-The validation using protein arrays was done as described earlier by Nilsson et al. (15). The Western blots were done according to Uhlén et al. (11). The immunohistochemistry validation has been described by Kampf et al. (16). The confocal analysis (immunofluorescence) was performed as described by Barbe et al. (17).

RESULTS
The Rules for Submission of Data-A new database for sharing antibody and antigen information has been developed. The portal is gene-centric, based on human genes as defined by Ensembl (18,19). The principle of the information flow is shown in Fig. 1. For each application, a standard set of categories has been established (supplemental Table S1), and these are grouped into three main validation scores: (i) supportive, (ii) uncertain, or (iii) non-supportive. To submit data about an antibody, the provider needs to submit a validation based on experimental evidence that must be disclosed. Only antibodies scored as supportive or uncertain need to be accompanied by experimental data. A standard form for antibody and antigen information needs to be filled in where some of the information is mandatory (such as type of antibody and type of antigen), whereas other information is optional (such as antigen sequence and concentration of antibody). Only antibodies available to the scientific community by an antibody provider can be submitted, and the Web link to obtain the antibody must be provided. Every type of binder has a template with relevant parameters for each binder type that enables future updates to the set of parameters after feedback from the binder community. The standard Web form allows submission of one binder at a time. To submit larger sets of binders, a new eXtensible Markup Language (XML)-based data standard has been developed that allows binder producers to submit a large set of binders together with results from validation experiments in a single file. The new XML-based data standard, which is an extension to the Proteomics Standard Initiative-Molecular Interaction (PSI-MI) data standard (20), has also been developed within the European Union ProteomeBinders community (footnote 2) and is called Proteomics Standard Initiative-Protein Affinity Reagent (PSI-PAR).
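The submission rules above can be read as a small set of checks on the standard form. The following is a minimal sketch under stated assumptions: the class name `BinderSubmission`, its field names, and the function `check_submission` are hypothetical and are not part of the portal or of PSI-PAR; only the rules themselves (mandatory antibody and antigen type, mandatory Web link, experimental data for supportive or uncertain scores) are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BinderSubmission:
    """Hypothetical record for one binder and one validation experiment."""
    antibody_type: Optional[str] = None           # mandatory per the text
    antigen_type: Optional[str] = None            # mandatory per the text
    provider_url: Optional[str] = None            # Web link to obtain the antibody
    validation_category: str = "uncertain"        # supportive / uncertain / non-supportive
    evidence: List[str] = field(default_factory=list)  # e.g. image file names
    antigen_sequence: Optional[str] = None        # optional per the text
    antibody_concentration: Optional[str] = None  # optional per the text


def check_submission(s: BinderSubmission) -> List[str]:
    """Return a list of rule violations; an empty list means the form passes."""
    problems = []
    if not s.antibody_type:
        problems.append("type of antibody is mandatory")
    if not s.antigen_type:
        problems.append("type of antigen is mandatory")
    if not s.provider_url:
        problems.append("Web link to obtain the antibody must be provided")
    if s.validation_category not in ("supportive", "uncertain", "non-supportive"):
        problems.append("unknown validation category")
    elif s.validation_category in ("supportive", "uncertain") and not s.evidence:
        problems.append("supportive or uncertain scores need experimental data")
    return problems
```

A reviewer-side tool could run such a check before the manual comparison against the validation criteria; the optional fields are deliberately left unchecked.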
Antigen-based Protein Assays-A comprehensive way to validate antibodies is to use a protein array to allow for multiplex analysis in which the binding to the protein target antigen is compared with the binding to other independent protein targets. It is important to point out that the antigen is used as target in the assay, and the result therefore does not indicate whether the antibody recognizes the target protein itself. This analysis can be done using many different formats, including a microtiter well format such as ELISA (21), planar microarrays (15), and suspension arrays (22). Here we propose four validation scores for validation criteria using a planar protein array (supplemental Table S1). Validation score 1 (supportive) indicates highly specific binding of the antibody to the target antigen, with no other antigen on the protein array giving a signal above 15% of the specific signal. Validation score 4 (non-supportive), on the other hand, indicates that at least one other antigen gives more than 40% of the signal obtained for the target antigen or that three antigens give more than 15% of that signal. In Fig. 2, examples of results from the supportive (score 1), uncertain (score 2), and non-supportive (score 4) categories are shown.

2 D. Gloriam, H. Hermjakob, and D. Sherman, unpublished data.
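The two protein array thresholds stated above can be written as a short decision rule. This is only a sketch: the function name and return strings are hypothetical, "three antigens" is read as "three or more", and scores 2 and 3 are left unresolved because their criteria are given in supplemental Table S1 rather than in the text.

```python
from typing import Sequence


def protein_array_score(target_signal: float, other_signals: Sequence[float]) -> str:
    """Classify a protein array result using only the thresholds stated in the text."""
    if target_signal <= 0:
        raise ValueError("target signal must be positive")
    # Express every off-target signal as a fraction of the target signal.
    relative = [s / target_signal for s in other_signals]
    above_15 = sum(1 for r in relative if r > 0.15)
    above_40 = sum(1 for r in relative if r > 0.40)
    if above_15 == 0:
        # No other antigen exceeds 15% of the specific signal.
        return "score 1 (supportive)"
    if above_40 >= 1 or above_15 >= 3:
        # One antigen above 40%, or three (assumed: or more) above 15%.
        return "score 4 (non-supportive)"
    # Intermediate cases: criteria are in supplemental Table S1, not in the text.
    return "score 2 or 3 (see supplemental Table S1)"
```

For example, a target signal of 100 with off-target signals of 5, 10, and 14 falls under score 1, whereas a single off-target signal of 50 falls under score 4.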
Western Blot Analysis-A frequently used assay for validation of antibodies is the Western blot, in which a tissue or cell extract is denatured with detergent (SDS) and all proteins are separated according to size using electrophoresis, blotted to a membrane, and analyzed using a specific antibody (23). In this manner, specificity of an antibody can be analyzed together with a comparison between the experimental size and the predicted size of the target protein as determined from the genome sequence. This analysis also allows probing for size differences of the protein target caused by splicing, proteolysis, or modification, such as glycosylation. Seven categories for the validation are proposed (supplemental Table S1) in which three are "supportive," two are "uncertain," and two are "non-supportive." Examples from validation scores 1 (supportive), 5 (uncertain), and 7 (non-supportive) are shown in Fig. 2. Note that because only a limited number of tissues or cells can be analyzed on a Western blot, a blank gel (validation score 4) only means that the protein is not present in the tissue or cell extracts analyzed and that another selection of tissue or cell extracts might yield a band of correct size.
Immunohistochemistry Analysis-The analysis using immunohistochemistry (IHC; footnote 3) allows in situ profiling of protein targets from various organs and tissues usually prepared by formalin fixation and embedded in paraffin (16). The use of tissue microarrays (24) allows many hundreds of samples to be collected on a single slide and analyzed in parallel. The evaluation of IHC is relatively subjective, with the result of the staining compared with expected expression profiles as determined by literature or bioinformatics, such as the presence of transmembrane regions or InterPro domains (25). Here we propose that the validation is grouped into five validation scores (supplemental Table S1) in which score 1 (supportive) can only be obtained if two paired antibodies with separate and non-overlapping epitopes show the same or near identical staining. Validation score 2 is also supportive, and this category is used when the staining pattern is consistent with experimental and/or bioinformatics data. Fig. 2 shows examples of IHC images scored within score 2 (supportive), 3 (uncertain), or 5 (non-supportive).
Immunofluorescence Analysis-Confocal microscopy is a powerful method for analysis of subcellular localization of protein targets using fluorescently labeled antibodies (17). A complication for validating confocal data is that many proteins are distributed in several compartments and the analysis, similar to IHC, is subjective and depends on data from literature and/or bioinformatics. The subcellular localization prediction methods are still not precise (26) and therefore cannot be used extensively to support or discard an antibody pattern using confocal microscopy. Seven categories are proposed for the validation of antibodies by immunofluorescence (supplemental Table S1), spanning from category 1 with supportive data to category 7 with weak or granular staining not supported by literature. It is important to point out that the results from validation are heavily influenced by the choice of retrieval method and the presence of the protein target in the analyzed cells. Some retrieval methods are suitable for intracellular proteins, whereas other methods are more suitable for surface-bound proteins but not for proteins in intracellular organelles (17). It is therefore important that the validation score is accompanied by a detailed description of the conditions for the analysis. Fig. 2 shows examples of immunofluorescence (IF) images scored within score 1 (supportive), 4 (uncertain), or 7 (non-supportive).
The Antibodypedia Portal-The Web portal main page is the search/navigation page where users can search for a protein or gene name or navigate by chromosomes. The results from the search are presented in a list displaying the proteins and respective binders found. In Fig. 3, an example of a search and a resulting list of binders are shown. A summary of the validation scores for submitted validation experiments is shown using a color code. Green indicates supportive, yellow indicates uncertain, and red indicates a non-supportive validation in a particular assay. Selecting one binder brings up all the details about the protein target, antigen information, and all the underlying validation data with images as the common raw data (Fig. 3B).

3 The abbreviations used are: IHC, immunohistochemistry; IF, immunofluorescence.
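The color code of the result list can be sketched as two lookups, from an application-specific score to a main category and from that category to a color. The names below are hypothetical, and the score table contains only the score and category pairs that are stated explicitly in the text; the full mapping is defined in supplemental Table S1, so unlisted scores return no color rather than a guess.

```python
from typing import Optional

# Color code described for the portal's result list.
COLOR_BY_CATEGORY = {
    "supportive": "green",
    "uncertain": "yellow",
    "non-supportive": "red",
}

# Only the (application, score) pairs whose category is stated in the text.
KNOWN_CATEGORIES = {
    ("protein array", 1): "supportive",
    ("protein array", 2): "uncertain",
    ("protein array", 4): "non-supportive",
    ("western blot", 1): "supportive",
    ("western blot", 5): "uncertain",
    ("western blot", 7): "non-supportive",
    ("immunohistochemistry", 1): "supportive",
    ("immunohistochemistry", 2): "supportive",
    ("immunohistochemistry", 3): "uncertain",
    ("immunohistochemistry", 5): "non-supportive",
    ("immunofluorescence", 1): "supportive",
    ("immunofluorescence", 4): "uncertain",
    ("immunofluorescence", 7): "non-supportive",
}


def summary_color(application: str, score: int) -> Optional[str]:
    """Return the display color for a score, or None if the text does not define it."""
    category = KNOWN_CATEGORIES.get((application.lower(), score))
    return COLOR_BY_CATEGORY[category] if category else None
```

Keeping the score table separate from the color table mirrors the idea that new applications can be added as long as their categories are stratified into the same three main groups.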
examples of immunofluorescence analysis using confocal microscopy. The blue channel is used for the nucleus marker (4′,6-diamidino-2-phenylindole), the red channel is used for the cytoskeleton marker (anti-tubulin antibody), the yellow channel is used for the endoplasmic reticulum marker (anti-calreticulin antibody), and the green channel is used for the antibody in validation. J, supportive for an antibody (HPA000992) toward human Golgin subfamily member 5 with a suggested subcellular localization to Golgi in human cell line U-251; K, uncertain for antibody (HPA002942) toward human transmembrane protein 179 with a subcellular localization suggested to be mitochondria but with no literature or bioinformatics data supportive for this localization (cell line U-2 OS); and L, not supportive for antibody (HPA000649) toward human α-N-acetylgalactosaminidase precursor with suggested subcellular localization to cytoskeleton, which is in conflict with the lysosome localization suggested by the literature (cell line U-251).
FIG. 3. The Antibodypedia portal. A, the result list from a search for "RBM" (RNA-binding protein) is shown with 11 hits (proteins) that are present in the database with their validation scores in the different assays to the right. The validation scores are shown for the four applications protein array, immunohistochemistry, Western blot, and immunofluorescence with a color code indicating supportive (green), uncertain (yellow), and non-supportive (red). The chromosomal location of the corresponding gene is shown as well as links to UniProt (25) and Ensembl (33) and a description of the type of antibody and antigen and the origin of the antibody (species). B, the submitted validation data for an antibody (HPA001634) toward human protein RBM22 with binder data, gene data, and validation data from the four assays with the raw data (images). PrEST, protein epitope signature tag; EU, European Union.

Validation of Monospecific Antibodies Generated with the Human Protein Atlas Program-The antibodies generated during a period of 17 months within the framework of the Human Protein Atlas effort (27) were subjected to the validation as outlined in Fig. 2 and supplemental Table S1. Altogether 3900 antibodies were analyzed on at least one of the four application platforms, and all the results were submitted to the Antibodypedia portal. Fig. 4 shows the results stratified according to the three main validation categories. The results show that most antibodies have supportive validation results using the antigen (protein array), whereas the frequency of supportive antibodies is much lower for the Western blots and the immunohistochemistry. This demonstrates that assays using the antigen to validate an antibody, as shown here with a protein array assay, are not satisfactory to indicate the functionality of the antibody in recognizing the target protein in other applications, such as Western blot or immunohistochemistry. The high frequency of supportive immunofluorescence can be explained by the fact that only antibodies with a supportive or uncertain immunohistochemistry were analyzed by confocal microscopy.
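The order in which antibodies moved through the four assays, as described here and in the legend to Fig. 4, can be summarized as a simple gating rule: protein array first, then Western blot and immunohistochemistry for antibodies scored supportive or uncertain, then immunofluorescence for antibodies scored supportive or uncertain in immunohistochemistry. The sketch below only restates that rule; the function name and category strings are hypothetical.

```python
from typing import List, Optional

PASSING = {"supportive", "uncertain"}


def next_assays(pa_category: str, ihc_category: Optional[str] = None) -> List[str]:
    """List the follow-up assays an antibody qualifies for under the gating rule."""
    assays: List[str] = []
    if pa_category in PASSING:
        # Protein array gate: Western blot and IHC are both run.
        assays += ["Western blot", "immunohistochemistry"]
        if ihc_category in PASSING:
            # IHC gate: immunofluorescence is run only after a passing IHC result.
            assays.append("immunofluorescence")
    return assays
```

This gating is one reason the categories in Fig. 4 are not directly comparable between assays: each later assay is run on a preselected subset.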
The Human Protein Atlas Portal-The validation presented in the Antibodypedia portal can also be used by other databases to provide evidence-based verification of results obtained with antibodies. The Human Protein Atlas portal shows expression and localization of proteins in a large set of normal human tissues, cancer cells, and cell lines with the aid of IHC images and IF confocal microscopy images. A new feature of the protein atlas will be added in version 4.0 of the atlas to show the validation scores, as determined using the Antibodypedia portal, for all "in-house" generated antibodies with links to the underlying validation data.

DISCUSSION
Here we show for the first time a portal to share validation data for publicly available antibodies with accompanying data about the antigen. This new portal thus provides a Web-based submission format to allow any antibody provider to submit data about their antibodies with validation scores for various applications. It is only possible to submit antibodies available to the scientific community, and the portal provides direct links to allow users to obtain the corresponding antibody after reviewing the validation information submitted to the database. The database relies on validation scores, submitted by the antibody provider, based on a standard set of validation criteria, but it is important to point out that the validation is subjective in nature. It is thus mandatory to submit the primary data, usually in the form of an image with text annotation, to allow all users to review the data behind the validation score. Users are also allowed to send in comments to the portal about the use of a particular antibody, and in this manner both positive and negative results from a particular antibody can be shared among the scientific community.
The curation of the data in the portal is important, but we suggest that this curation should only involve checking that the standardized formats and the submission rules have been followed and that the validation score is supported by evidence data.
The validation score is application-specific with the underlying understanding that antibodies can behave very differently in different applications. The results can also vary depending on the sample preparation used and even the origin of the sample. A description of the sample and the sample preparation must therefore be submitted as part of the validation score. More applications, as exemplified in Table I, can be added to the portal, but for each application a standard set of validation categories must be defined, and we suggest that these should subsequently be stratified into the main three validation categories: (i) supportive, (ii) uncertain, and (iii) non-supportive. The standards for sharing antibody data should adhere to international initiatives in a similar manner to what has already been done for protein interaction data (28).
FIG. 4. [...] (27) were subjected to the validation as outlined in Fig. 2. All antibodies were given a score from the categories in supplemental Table S1 (supportive, uncertain, or non-supportive). The percentage of antibodies within each category is shown for the four validation applications. The total numbers of antibodies analyzed in the various applications were 5232 for the Western blots, 6339 for the protein array (PA), 6171 for the immunohistochemistry, and 1091 for the immunofluorescence. Note that only antibodies that were scored supportive or uncertain in the protein array were tested in the Western blot (WB) and immunohistochemistry applications, and only antibodies with a supportive or uncertain score in immunohistochemistry were tested in the immunofluorescence application.

It is important to point out that the validation of antibodies regarding specificity is subjective and depends on the concentration and dynamic range of the target protein as well as the concentration of other proteins. A "specific" antibody with subnanomolar binding might show high cross-reactivity to another human protein if the abundance and thus the concentration of the latter protein is much higher as compared with the target protein. The specificity of an assay is thus dependent on the relative amounts of the target protein as compared with other human proteins. It is also noteworthy that most of the assays here are not quantitative and merely give semiquantitative measurements or relative amounts across several samples. On the other hand, the advantage with antibody-based assays is the possibility to adjust antibody concentrations, retrieval methods, and assay times to compensate for huge differences in dynamic range between different protein targets. It is also possible to validate a particular antibody with a second "paired" antibody directed to a non-overlapping binding region of the same protein target.
Because it is important that the two antibodies do not share any epitope, which potentially could give rise to the same cross-reactivity pattern, an important mission of the Antibodypedia is to encourage the submission of data about the antigen to ensure that two paired antibodies do not recognize the same binding region on the protein target. Eventually it is preferable to perform more detailed epitope mapping of all antibodies and to submit such information to the portal to gather even more relevant information in the determination whether a particular staining pattern is specific or not.
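The earlier point that apparent specificity depends on relative protein concentrations can be illustrated with a small calculation. The sketch below is not taken from the article: it assumes a textbook equilibrium model in which two proteins compete for a single antibody binding site, the antibody is limiting so that free protein concentrations equal total concentrations, and all concentrations and dissociation constants are hypothetical values chosen for illustration.

```python
from typing import Tuple


def bound_fractions(target_nM: float, kd_target_nM: float,
                    other_nM: float, kd_other_nM: float) -> Tuple[float, float]:
    """Fraction of antibody sites occupied by the target and by a competing protein.

    Assumes single-site competitive binding at equilibrium with antibody limiting.
    """
    t = target_nM / kd_target_nM   # concentration of target in units of its Kd
    x = other_nM / kd_other_nM     # concentration of competitor in units of its Kd
    denom = 1.0 + t + x
    return t / denom, x / denom


# Hypothetical numbers: a subnanomolar binder (Kd 0.5 nM) against a scarce
# target (0.01 nM), and a 10,000-fold weaker interaction (Kd 5000 nM) with a
# protein that is 100,000-fold more abundant (1000 nM).
on_target, off_target = bound_fractions(0.01, 0.5, 1000.0, 5000.0)
print(f"off-target / on-target occupancy ratio: {off_target / on_target:.1f}")
```

Under these assumed numbers the weakly bound but abundant protein occupies about ten times more antibody than the intended target, which is the situation in which a paired antibody against a non-overlapping epitope becomes a useful control.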
There are many different types of affinity reagents that can be used as protein probes to explore the human proteome (29). For research applications, antibodies are by far the most used affinity reagents, including monoclonal antibodies generated by hybridoma technology (30) or monospecific antibodies in which the polyclonal antibody mixture is affinity-purified using synthetic peptides (31) or protein fragments (15). However, the rapid development of new selection methods (29) might make it possible in the future to create renewable affinity reagents in a high throughput manner using in vitro selection principles. Future versions of the Antibodypedia portal should therefore also allow the inclusion of recombinant affinity reagents, including new protein scaffolds (32), such as affibodies and ankyrins, and nucleic acid-based affinity reagents, such as aptamers. It might also be relevant to include information about small molecular binders to protein targets, such as peptides or low molecular weight organic synthesis products.
The first version of this portal contains information about 3900 antibodies generated within the framework of the Human Protein Atlas program (11), but in the future all antibody providers are invited to submit their own antibodies to the virtual resource. The Antibodypedia portal will thus contain all antibodies for which a Web submission has been made by the antibody provider. This is in contrast to the Human Protein Atlas portal, which only contains antibodies approved by a comprehensive analysis of immunohistochemistry using a standardized set of 48 tissues, 216 cancer patients, 47 human cell lines, and 12 primary cells (27). The protein atlas thus provides expression profiles of human proteins in tissues and cells based on selected antibodies, whereas the Antibodypedia portal contains any antibody validated by at least one of the applications included in the portal, independent of its presence or absence in the immunohistochemistry-based protein atlas portal.
In summary, we present a new community-based virtual resource for antibodies that have been validated in an application-specific manner. Proposed validation criteria are presented for four common research applications, but more applications can be added in due course. The objective of the portal is to allow antibody providers to submit validation results for their antibodies and to aid users of antibodies to select antibodies functional in a particular assay. The ultimate aim is to create a resource of validated antibodies to all human proteins to facilitate the experimental "annotation" of the human proteome and to facilitate the analysis of potential biomarkers discovered through various clinical proteomics efforts.