Use of a Single-Chain Antibody Library for Ovarian Cancer Biomarker Discovery*

The discovery of novel early detection biomarkers of disease could offer one of the best approaches to decrease the morbidity and mortality of ovarian and other cancers. We report on the use of a single-chain variable fragment antibody library for screening ovarian serum to find novel biomarkers for the detection of cancer. We alternately panned the library with ovarian cancer and disease-free control sera to make a sublibrary of antibodies that bind proteins differentially expressed in cancer. This sublibrary was printed on antibody microarrays that were incubated with labeled serum from multiple sets of cancer patients and controls. The antibodies that performed best at discriminating disease status were selected, and their cognate antigens were identified using a functional protein microarray. Overexpression of some of these antigens was observed in cancer serum, tumor proximal fluid, and cancer tissue via dot blot and immunohistochemical staining. Thus, our use of recombinant antibody microarrays for unbiased discovery found targets for ovarian cancer detection in multiple sample sets, supporting their further study for disease diagnosis.

Despite many advances in the treatment of cancer, early detection and tumor removal remain the best prospect for overcoming disease. Ovarian cancer is an excellent example of the potential prognostic value of early detection because diagnosis at a localized stage has a 5-year survival rate of 93%. However, only 19% of cases are diagnosed at this stage, and by the time the disease has evolved to an advanced stage, the 5-year survival rate drops to 31% (1).
Much effort has been expended to find early detection markers of ovarian cancer, and some success has been achieved. Most notable is CA125, the only approved marker for the detection of recurrence of ovarian cancer (2). Other leading targets are mesothelin and HE4, which have been examined by several groups for their efficacy as early detection markers (3)(4)(5)(6)(7)(8). Nevertheless, several conditions necessitate the discovery of more specific and sensitive ovarian cancer markers: the heterogeneity of this disease, the ambiguity of its symptoms, its low incidence in the general population, and the low sensitivity and specificity of currently available markers.
One of the difficulties in finding markers in blood is the complexity of the plasma/serum proteome, estimated at tens to hundreds of thousands of proteins, as well as the large range of its constituent protein concentrations, which can span 12 orders of magnitude (9). However, along with its easy accessibility, the fact that blood is in contact with virtually every tissue and contains tissue- and tumor-derived proteins makes it a preferred source for disease biomarker discovery.
Our previous results (10,11) and those of others (12)(13)(14) using high density, full-length IgG antibody microarrays to validate and discover cancer serum biomarkers demonstrated that this platform is valuable for simultaneously comparing the levels of hundreds of proteins on dozens of serum samples from cancer patients and healthy controls. We confirmed overexpression of CA125, mesothelin, and HE4 in ovarian cancer samples using this high density microarray platform, validating our array methodology for measurement of cancer serum biomarkers and yielding new putative biomarkers for this disease (10,11).
Previously reported approaches are typically limited to a few hundred antibodies. The methodology reported here allows us to exploit the specific advantages of antibodies as high affinity capture reagents to detect differential expression of thousands of tumor biomarkers using a diverse (2 × 10⁸ binding agents) single-chain variable fragment antibody (scFv) library for detection of ovarian cancer markers in serum, tumor cyst fluid, and ascites fluid. Our results build on previous reports of phage display library microarrays to discover autoantibody (15)(16)(17)(18) and other protein (12,19,20) cancer biomarkers. Our scFv are high affinity capture reagents consisting of the variable regions of human antibody heavy and light chains joined by a flexible linker peptide. These recombinant antibodies are able to recognize a wide variety of antigens, including many previously thought to be difficult targets, such as self-antigens and proteins that are not normally immunogenic in animals (21)(22)(23)(24). A highly diverse recombinant antibody library therefore offers a way to address the complexity of the serum proteome. It has been calculated that for an immune repertoire to be complete (at least one antibody in the repertoire has reasonable affinity for every epitope possible in nature) it requires a diversity of at least 10⁶ antibodies (25). The reported diversity of our scFv library exceeds this value by 100-fold (21).
To enrich for antibodies that differentiate disease status, we performed a selection or panning of the naïve library for proteins that are differentially expressed in cyst fluid, ascites fluid, or serum of cancer patients with respect to healthy serum. We printed this sublibrary on activated hydrogel slides that were queried with three different sets of labeled case and control sera to further select those that discriminate cancer status in a statistically significant manner. Next, we identified some of the targets that bind to the individual scFv using high density nucleic acid programmable protein arrays (NAPPAs) expressing a total of over 7000 proteins. Finally, we validated the effectiveness of the selection process by confirming overexpression of these targets in cancer serum, cyst fluid, and ascites fluid as well as in tumor sections.

EXPERIMENTAL PROCEDURES
Selection and Description of Sera and Proximal Fluid Samples-Proximal fluid, cancer, and control sera were selected from a repository collected as part of several human subject approved research projects funded by the National Cancer Institute but collected under the same protocols (26). Proximal fluid in this report encompasses cyst fluid and ascites fluid obtained from women with ovarian tumors. Both types of fluid appear commonly in benign and malignant ovarian tumors as well as in other benign abdominal conditions. Proximal fluid may be enriched for proteins that originate from cells associated with the ovary and therefore could be an abundant source of cancer biomarkers. Proximal fluid samples used for this study contained 30-80 mg/ml total protein, and the proteomic profile observed by SDS-PAGE and Coomassie Blue staining was indistinguishable from that of serum (data not shown). An ovarian cyst is a collection of fluid surrounded by a thin membranous wall. Cyst fluid was obtained at the time of surgery to remove ovarian tumors and was collected by inserting a needle into the tumor and aspirating 1-3 ml of liquid into a syringe. Fluid was centrifuged at 1200 × g for 10 min to pellet cells. Ascites fluid is an accumulation of fluid in the peritoneal cavity. Ascites fluid was collected at the time of surgery to remove ovarian tumors, treated with heparin to avoid coagulation, and centrifuged at 2000 rpm for 5 min to pellet cells. Volumes obtained ranged from a few milliliters to several liters. All specimens used in this study were collected between July 1, 2004 and May 10, 2006 and include pretreatment serum samples from ovarian cancer cases diagnosed at the time of surgery with primary serous ovarian cancer located in the ovary, fallopian tube, or peritoneum at various stages of disease. Controls were a heterogeneous collection of three types.
Healthy controls were samples collected from healthy women attending regular mammography screening exams and who remained free of ovarian cancer for at least 2 years after serum collection. Surgical controls were women who underwent surgery for non-ovarian gynecologic conditions and who had histologically normal ovaries. Benign controls were women with surgically confirmed benign ovarian cysts or tumors. Risk status of patients was determined based on family history of ovarian or breast cancer, menopausal status, and the presence of mutations in BRCA1 and BRCA2 (for more information on the epidemiological and histological data of the samples used, see Tables I, II, and III). Sample processing protocols for all specimens were identical. All surgical samples (case, surgical control, and benign) were collected prior to surgery and chemotherapy under the same collection protocols. Controls were selected using propensity score matching so that case and control groups were balanced with respect to age, risk status, and collection date within 3 years.
Recombinant Antibody Libraries-We utilized two naïve phage-displayed scFv libraries derived from human B-cells, the Tomlinson I and J libraries (Medical Research Council, Cambridge, UK) (27,28), that express over 100 million unique scFv as a fusion with the gIII protein on the tip of the M13 bacteriophage in a monoclonal manner. The DNA that encodes the scFv was cloned into a phagemid, which allows protein display on the phage surface or expression as a soluble antibody by bacteria. The scFv contain His₆ and Myc tags to assist in purification and detection (28,29).
Selection of scFv Sublibrary-We used the complete library to generate a sublibrary of antibodies that bind proteins differentially expressed in either tumor proximal fluid or serum of cancer patients relative to serum from controls. The complexity of the library was reduced to facilitate microarray printing through a selection process referred to as panning (28,30). Positive selection with proximal fluid or serum from cancer patients was used to enrich for scFv that bind cancer targets followed by negative selection against non-diseased serum to eliminate clones that bind non-cancer targets (for a diagram of the panning procedure, see Fig. 1). For the first round of panning (positive selection), we pooled albumin-depleted proximal fluid consisting of seven cyst and nine ascites fluid cancer samples (for epidemiological data, see Table I). One of the samples, a benign cyst fluid, was included in this pool in error. Samples were albumin-depleted using Cibacron Blue beads (Sigma-Aldrich) and concentrated to 2 mg/ml according to previously reported methods (31). We used 50 µl of this pool to coat a 5-ml MaxiSorp tube (Thermo Fisher Scientific, Waltham, MA). After blocking the tubes with 5% nonfat dry milk (NFDM), we added 500 µl of the input Tomlinson I (1.47 × 10⁸ clones) and Tomlinson J (1.37 × 10⁸ clones) libraries to the blocked tubes containing cancer proximal fluid proteins. The phage from the first round were eluted from the tubes and used to infect bacteria. The output of the first round was calculated at 2.4 × 10⁶ clones. Phage from the first round were rescued with M13K07 helper phage (phage/bacteria ratio of 20:1). The rescued phage (4 × 10¹⁰ colony-forming units) were negatively selected by sequential incubation in three MaxiSorp tubes that were coated with 100 µl of depleted pooled healthy human serum (Table II). Only the phage that did not bind to these tubes were used for the second positive selection.
The second positive selection was performed in a MaxiSorp tube coated with 25 µl of albumin-depleted, pooled cyst and ascites fluids. Eluate from this selection was used to infect TG1 Escherichia coli. Output was ~4 × 10⁴ clones. Bacterial colonies were picked and inoculated into 46 96-well plates. A similar approach was followed for the panning of an scFv sublibrary selected positively with a cancer serum pool consisting of 12 cases (Table II) and negatively against the same healthy serum pool as before. This approach yielded 79 96-well plates. We then tested the levels of antibody expression of all serum-selected clones with a dot blot of cell lysates using a c-Myc antibody (clone 9E10, Santa Cruz Biotechnology, Santa Cruz, CA), a secondary anti-mouse antibody conjugated to IRDye 800 (Rockland Immunochemicals, Gilbertsville, PA), and an Odyssey Imaging Device (LI-COR Biosciences, Lincoln, NE). We found that ~43% of the colonies expressed high amounts of scFv, and only these resulting 34 plates were used for subsequent array screening with ovarian cancer serum.
scFv Production and Purification-For production of scFv from the bacterial clones, we used previously reported methods (32). Briefly, cells were grown at 37°C to stationary phase in medium containing 2% glucose. Protein expression was induced by exchanging growth medium for medium containing 2 mM isopropyl 1-thio-β-D-galactopyranoside and an overnight induction at 30°C. Lysis of cells was achieved with osmotic shock and lysozyme treatment, and scFv was purified from the lysate by metal affinity chromatography using nickel-nitrilotriacetic acid-agarose beads (Qiagen, Germantown, MD) according to the manufacturer's specifications. The final scFv concentration was about 0.2 mg/ml.

TABLE III Epidemiological data of patients whose serum samples were used for hybridization on three generations of antibody microarrays
Healthy controls were samples collected from healthy women attending regular mammography screening exams and who remained free of ovarian cancer for at least 2 years after serum collection. Surgical controls were women who underwent surgery for non-ovarian gynecologic conditions and who had histologically normal ovaries. Benign controls were women with surgically confirmed benign ovarian cysts or tumors. Risk status of patients was determined considering family history of ovarian or breast cancer, menopausal status, and mutations in BRCA1 and BRCA2.

Antibody Microarray Printing, Blocking, Serum Labeling, and Analysis-Buffers, protocols, and slide surfaces were modified from DNA array-based literature. Methods of manufacturing, processing, and analyzing arrays were reported previously (10). Detection of protein binding to the array requires direct incorporation of a fluorescent label into the serum proteins. Proteins (500 µg per channel per array) were labeled with the amine-reactive dyes Cy3 and Cy5 (Amersham Biosciences) according to the manufacturer's specifications. Unincorporated dye was removed by centrifugation with 10,000 molecular weight cutoff spin filters (Millipore, Billerica, MA). Triplicate features of each antibody were printed on Nexterion Slide H hydrogel-coated glass slides (SCHOTT, Elmsford, NY). Printed array slides were washed and placed in ethanolamine block solution (0.3% ethanolamine, 0.05 M sodium borate, pH 8.0). After 2 h, slides were washed, dried by centrifugation, and incubated for 2 h with CyDye-labeled serum that had been treated with a ProteoPrep Immunoaffinity Albumin and IgG Depletion kit (Sigma-Aldrich). After incubation with serum, arrays were washed and dried.
Scanning was performed on a GenePix 4000B microarray scanner (Molecular Devices, Sunnyvale, CA). For this study, case and control sera were always labeled with Cy5 and incubated with Cy3-labeled reference sera (a common pool of healthy sera used as a reference for all samples) on the array. For the first array generation, we used the BioWhittaker human serum pool (Lonza, Basel, Switzerland) as a reference. For the second and third array generations, we made our own pool of sera from 12 healthy women. Array data contain a format identical to two-channel gene expression arrays, and analysis proceeds analogously. Technical sources of variation were normalized using Lowess procedures developed for microarrays (33). Following normalization, triplicate spots were summarized using their median. Classification was performed using logistic regression predicting case status using M value adjusted for operator, batch, and their interaction effects. The p value corresponding to the coefficient of the M value was used for ranking the antibodies. The q value, the minimum false discovery rate for multiple hypothesis testing, was calculated via the methods reported by Storey (34) to indicate the error rate.
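As a rough illustration of the summarization step described above, the following Python sketch computes a per-spot log ratio (M value) and takes the median over triplicate features. The function names, the example intensities, and the choice of a base-2 logarithm are assumptions made for illustration only; Lowess normalization and the logistic regression adjustment for operator and batch are not shown.

```python
import math
from statistics import median

def m_value(cy5, cy3):
    """Log ratio of sample (Cy5) to reference (Cy3) intensity for one spot.

    Base 2 is assumed here; it is a common convention for two-channel arrays.
    """
    return math.log2(cy5 / cy3)

def summarize_triplicate(spots):
    """Median M value over replicate spots; `spots` is a list of (cy5, cy3) pairs."""
    return median(m_value(cy5, cy3) for cy5, cy3 in spots)

# Hypothetical intensities for one scFv printed in triplicate.
# The third spot is an outlier, which the median suppresses.
example_spots = [(1200.0, 600.0), (1000.0, 500.0), (4000.0, 500.0)]
summary = summarize_triplicate(example_spots)
print(summary)  # 1.0
```

Using the median rather than the mean keeps a single aberrant feature (for example, a dust speck or a misprinted spot) from dominating the summary for that antibody.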
Target Identification Using NAPPA-To identify the targets that bind to the top performing 19 scFv, we incubated each on high density protein microarrays (35). Briefly, NAPPAs were activated by incubating with the transcription-translation-coupled rabbit reticulocyte lysate expression system (Promega Corp., Madison, WI) for 90 min at 30°C followed by a 30-min incubation at 15°C to allow binding of the GST-tagged proteins to their anti-GST capture antibodies. Slides were blocked with 5% blotto (5% milk in 1× PBS, 0.2% Tween 20) for 1 h, and each was incubated in a 600-fold dilution of an individual scFv overnight at 4°C with constant shaking. Each slide was washed three times in 5% blotto and incubated in a 100-fold dilution of an anti-Myc antibody (Sigma) for 1 h to bind the Myc tag on each scFv. The slides were washed three times with 5% blotto and incubated with a horseradish peroxidase-conjugated anti-rabbit secondary antibody for 1 h to bind the anti-Myc protein (Santa Cruz Biotechnology). Following three washes with 1× PBS, the slides were rinsed with water and treated with Tyramide Signal Amplification to visualize the reactive antigens. Each slide was scanned on a ScanArray fluorescence microarray scanner (PerkinElmer Life Sciences), and each image was quantified using Microvigene (VigeneTech, Carlisle, MA). After screening each scFv clone on a NAPPA, background subtraction was performed. Following a visual inspection of remaining spots, a Z-score was calculated. The Z-score measures how many standard deviations a signal deviates from the mean and is calculated as follows: Z-score = (protein signal − mean of all protein signals on the array)/(standard deviation of all protein signals on the array). All positive and negative controls on the microarray were removed from the analysis prior to Z-score calculation to prevent skewing of the result.
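The Z-score calculation can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the feature names and signal values are hypothetical, control features are assumed to have been removed already, and the use of the population standard deviation is an assumption because the text does not specify which estimator was applied. The cutoff of 2 follows the hit criterion given under "Results."

```python
from statistics import mean, pstdev

def z_scores(signals):
    """Z-score per feature: (signal - array mean) / array standard deviation.

    `signals` maps feature name -> background-subtracted signal, with
    positive and negative controls already excluded.
    """
    values = list(signals.values())
    mu = mean(values)
    sd = pstdev(values)  # population SD; an assumption, see lead-in
    return {name: (value - mu) / sd for name, value in signals.items()}

def candidate_hits(signals, threshold=2.0):
    """Names of features whose Z-score is at least `threshold`."""
    return [name for name, z in z_scores(signals).items() if z >= threshold]

# Hypothetical array: nine background features and one strong feature.
signals = {f"feature_{i}": 100 for i in range(9)}
signals["target"] = 1000

print(z_scores(signals)["target"])  # 3.0
print(candidate_hits(signals))      # ['target']
```

In practice the Z-score was only one of the criteria; spot shape and dimensions were also checked by visual inspection, which a numerical cutoff alone does not capture.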
Dot Blot Validation of scFv Targets-For protein standards of the putative scFv targets, we expressed recombinant full-length upstream stimulatory factor 1 (USF1) protein in BL21-Codon Plus(DE3)-RIL cells (Stratagene, La Jolla, CA) and affinity-purified it via its C-terminal His₆ tag. For other standards, we purchased a 293T cell lysate overexpressing human zinc finger and BTB domain-containing protein 22 (ZNF297) (Santa Cruz Biotechnology, catalog number sc-113351) and purified BSA (Jackson ImmunoResearch Laboratories, West Grove, PA). In the case of the 293T lysate, the ZNF297 absolute concentration was not known, so a 2-fold serial dilution of the original lysate was loaded in 1-µl spots. We spotted the protein on nitrocellulose, and after spots had dried, the membrane was blocked with 5% NFDM in PBS for 30 min. Incubation with primary antibody was performed overnight at 4°C followed by washes in PBS, incubation with a 1:4000 dilution of secondary antibody, and a final wash in PBS. Blots were all scanned on an Odyssey Imaging Device. We also performed dot blots with human proximal fluid and human serum. The types of proximal fluid were as follows: nine cyst fluid samples from women with benign ovarian tumors, eight cyst fluid samples from women with malignant ovarian tumors, and seven ascites fluid samples from women with malignant ovarian tumors. In the case of the serum samples, there were 16 samples from malignant serous ovarian tumors, 12 samples from healthy women, and 12 samples from women with benign disease of the ovary. We loaded 8 µg of total protein for each spot in triplicate on nitrocellulose, let all spots dry, and then blocked the membrane in 5% NFDM in PBS for 30 min. Incubation with primary antibody was performed overnight at 4°C followed by washes in PBS, incubation with a 1:4000 dilution of secondary antibody, and a final wash in PBS.
Primary antibodies used were ZNF297 (GenWay, San Diego, CA, catalog number 18-003-43323, 1:1000; Abnova, Walnut, CA, catalog number H00009278-B01, 1:500; Santa Cruz Biotechnology, catalog number sc-102162, 1:100), USF1 (GenWay, catalog number 18-003-42527, 1:800; Abnova, catalog number H00007391-M02, 1:200; Abnova, catalog number H00007391-A01, 1:2000; Santa Cruz Biotechnology, catalog number sc-8983, 1:100), HLA-B-associated transcript 4 (BAT4) (Abnova, catalog number H00007918-B01, 1:1000; Abcam, Cambridge, MA, catalog number ab72716, 1:500), and Toll-like receptor 2 (TLR2) (Cell Signaling Technology, Danvers, MA, catalog number 2229, 1:1000; Abcam, catalog number ab13855, 1:500). scFv blots were probed with purified scFv (USF1 at 0.64 mg/ml; ZNF297 at 1 mg/ml) and detected using either an anti-Myc antibody for proximal fluid or serum blots (Santa Cruz Biotechnology, catalog number sc-40, 1:1000) or a directly labeled IRDye700DX anti-protein A antibody for the protein standard blots (Rockland Immunochemicals, 1:2000). Blots were washed and incubated with either anti-goat anti-rabbit Alexa Fluor 680 antibody (Invitrogen, 1:4000) or anti-mouse IRDye700 (Rockland Immunochemicals, 1:4000) and scanned using an Odyssey Imaging Device. To determine whether the signal intensity of the features could discriminate disease status, we aggregated the patient sample replicates by median, then ranked them, and performed a Wilcoxon test between cases and controls. We report a column scatter plot of the median signal intensities for each individual along with the p value from the Wilcoxon test.
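The dot blot comparison described above (median of replicate spots per patient, then a Wilcoxon test between groups) can be sketched as follows. This is an illustrative Python implementation under stated assumptions: the intensities are invented, the function names are hypothetical, and the two-sided p value uses a normal approximation without tie or continuity correction, whereas an exact test may be preferable for groups this small.

```python
import math
from statistics import median

def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def wilcoxon_rank_sum(cases, controls):
    """Return (U statistic for cases, two-sided p value by normal approximation)."""
    n1, n2 = len(cases), len(controls)
    ranks = average_ranks(list(cases) + list(controls))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical triplicate spot intensities, one inner list per patient.
case_spots = [[5, 5, 9], [6, 7, 6], [7, 2, 7], [8, 8, 8]]
control_spots = [[1, 1, 3], [2, 2, 0], [3, 3, 3], [4, 5, 4]]

case_medians = [median(spots) for spots in case_spots]
control_medians = [median(spots) for spots in control_spots]
u_stat, p_value = wilcoxon_rank_sum(case_medians, control_medians)
print(u_stat, p_value)
```

Aggregating replicates before ranking keeps each patient as a single observation, so the test compares individuals rather than treating technical replicates as independent samples.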
Immunohistochemical Staining of Tissue Sections-Tissues from serous ovarian cancers and healthy ovaries were formalin-fixed and paraffin-embedded. Immunohistochemical staining was performed as reported previously (36) using the USF1 antibody from GenWay at a 1:200 dilution.

RESULTS
Phage Library Panning-To find biomarkers for the detection of ovarian cancer, we used a naïve phage-displayed, human scFv library to select a sublibrary of those antibodies that bound proteins that were differentially expressed in tumor proximal fluid and serum of cancer patients with respect to healthy controls. The Tomlinson I and J libraries we used contain 10⁸ unique scFv expressed on the gIII protein of an M13 bacteriophage. For selection of the sublibraries, we performed panning of the phage library with either proximal fluid or serum of ovarian cancer patients, obtaining two sublibraries, one for each biological fluid type. Fig. 1 shows a diagram of the panning process.
scFv Microarray Analysis of Ovarian Cancer-The scFv from both sublibraries were purified and printed on activated glass slides. After binding, arrays were blocked and probed with serum samples. For this initial array analysis, samples from 12 cases and 12 controls were used. The matched case and control serum samples chosen for incubation on the microarrays are shown in Table III, Array 1. We chose to label all cases and controls with Cy5 dye and the reference pool with Cy3; having one reference pool and comparing all samples separately with this reference allowed us to compare all cases with all controls rather than one case with one control on each array. This also eliminated dye bias because both cases and controls were read on the same channel. Furthermore, it facilitated multivariate regression analysis by providing meaningful signal data for both cases and controls as opposed to a case/control ratio in which control values become noninformative (i.e. always set to 1.0). Fig. 2 shows an image of an scFv array incubated with labeled serum.
We performed dot blots using an antibody to the Myc tag of the purified scFv that were printed on the arrays to quantify total protein and observed that about 40% of clones do not express high amounts of scFv, leading to features on the arrays with low signal intensities. To avoid purifying and printing these low abundance scFv, dot blots were used to screen all clones used hereafter to eliminate low expressing clones from our studies.
A Wilcoxon test was applied to identify the scFv that bound a protein that was differentially expressed in ovarian cancer serum relative to healthy serum. We prioritized the scFv based on p value by Wilcoxon test. From a total of 4416 cyst-selected scFv and 3264 serum-selected scFv, we chose the top 1000 of each sublibrary to create a second generation of microarrays. An additional 1344 scFv were printed for the first time on this second array. This group was selected with ovarian proximal fluid but had not been included in the first array because of space limitations. The second generation of arrays was incubated with samples from 28 ovarian cancer cases and 37 controls (Table III, Array 2) in the Cy5 channel and a reference pool consisting of 12 healthy women in the Cy3 channel. Because this array generation used a larger sample size for analysis, in addition to the within-array and antibody type normalization, we were also able to perform between-array normalization for print day and incubation day using a logistic regression model.
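The clone counts quoted here follow from the 96-well plate numbers given under "Experimental Procedures," as the short Python check below illustrates. The variable names are illustrative, and two items are inferences rather than statements from the text: that the additional 1344 scFv correspond to 14 full plates, and that the second-generation array carried no scFv beyond the three groups listed.

```python
WELLS_PER_PLATE = 96

cyst_plates = 46          # plates picked after proximal fluid panning
serum_plates_picked = 79  # plates picked after serum panning
serum_plates_kept = 34    # plates retained after the expression dot blot

cyst_clones = cyst_plates * WELLS_PER_PLATE
serum_clones = serum_plates_kept * WELLS_PER_PLATE
fraction_kept = serum_plates_kept / serum_plates_picked

additional_scfv = 1344
additional_plates = additional_scfv // WELLS_PER_PLATE  # inferred, assumes full plates

# Top 1000 from each sublibrary plus the additional proximal fluid group
second_generation_total = 1000 + 1000 + additional_scfv  # inferred total

print(cyst_clones)                # 4416
print(serum_clones)               # 3264
print(round(fraction_kept, 2))    # 0.43
print(additional_plates)          # 14
print(second_generation_total)    # 3344
```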
A third generation array was printed to serve as further validation of biomarkers from the first two. The top performing 384 ovarian scFv from the second array were chosen based on p value and magnitude difference in expression of the target protein in cancer serum with respect to healthy serum. For the third generation of arrays, serum from 36 cases and 66 controls (Table III, Array 3) was incubated with a reference pool consisting of sera labeled with Cy3 dye from 12 healthy women.
FIG. 1. ... Phage rescue was performed to obtain virus particles that were used to negatively select (three times) the library against control serum from healthy individuals. Finally, a second round of positive selection with proximal fluid proteins was done. Eluted phage were used to infect E. coli colonies, which now produce individual scFv clones that bind differentially expressed proteins between cases and controls. Shown are the input and output colony-forming units obtained after each step of the panning process.
FIG. 2. Image of scFv printed on hydrogel slide. Purified scFv were printed in triplicate on a glass slide and incubated with sera from a cancer patient labeled with Cy5 dye (red channel) and a control pool labeled with Cy3 dye (green channel). The array was scanned with a GenePix 4000B microarray scanner.
To determine which scFv are the best at discriminating cases from controls, the results from Arrays 1, 2, and 3 were evaluated. We ranked the scFv based on p value for their ability to discriminate disease status on the arrays and magnitude difference in overexpression of the protein in the cancer samples (we were less interested in scFv that bound a protein that was higher in normal serum). We selected the top 20 scFv with low p values in at least two of the three arrays and a high magnitude difference between cases and controls. The DNA of the phagemids that express these 20 scFv was sequenced, and all but two of the clones were unique at the four hypervariable regions of the antibody (data not shown). One of these repeat clones was eliminated from the list, leaving us with a final list of 19 unique scFv.
scFv Target Identification-The 19 ovarian cancer-specific scFv that exhibited the greatest ability to distinguish cases from controls were incubated on a NAPPA (35,37) to identify their protein targets. These arrays encode over 7000 human proteins distributed over three arrays. Each array was activated through in situ protein production at each spot using a coupled mammalian transcription-translation cell-free expression system to convert the cDNA into a GST-tagged protein that was bound to a capture anti-GST antibody. Each array was incubated with one scFv. After screening each scFv clone on a NAPPA, a background subtraction that eliminates about 90% of the low intensity features on the array was performed. Following a visual inspection of remaining spots, a Z-score was calculated. A protein was considered a potential hit if its Z-score was at least 2 and the spot had dimensions and shape consistent with known positive signals. Fig. 3A shows a three-dimensional representation of the spot signal as represented by a volume rendering for USF1 protein detected with scFv 4, ZNF297 detected with scFv 3, BAT4 detected with scFv 10, and TLR2 detected with scFv 17 on the NAPPA platform. The large peak at the center of each image corresponds to the feature of the protein shown in the inset (i.e. USF1, ZNF297, BAT4, or TLR2). The signal intensities of these spots were higher than those of any other proteins on the array (except for positive controls), confirming that these scFv bound to one protein preferentially. Furthermore, these interactions appeared to lack cross-reactivity, as comparison of signal intensities for a particular protein feature with all 19 scFv tested (Fig. 3B) showed interaction only between a particular scFv and its single target. In the case of scFv clone 4, in addition to binding to USF1, this antibody showed minimal cross-reactivity with TLR2 and ZNF297, yet the signal for USF1 was significantly stronger, indicating preferential binding to this protein. With this method, we identified putative targets for 15 of the 19 scFv in our top list (Table IV). In bold are targets that were high confidence identifications (i.e. with high signals and good spot characteristics). Although all 19 scFv were unique, in some cases, two different scFv bound to the same protein on the NAPPA, suggesting that two independently selected scFv have high affinity for the same target, increasing the probability that this is an important marker for ovarian cancer. Also, four of the 19 scFv did not bind to any proteins on the array. Possibly their targets were not among the 7000 proteins on the array, or the protein target was not properly post-translationally modified to generate the epitope during NAPPA production.
FIG. 3. Functional protein arrays. A, three-dimensional representation of the spot volume for USF1 protein detected with scFv 4, ZNF297 detected with scFv 3, BAT4 detected with scFv 10, and TLR2 detected with scFv 17 on the NAPPA. The large peak at the center of each image corresponds to the feature of the protein shown in the inset (i.e. USF1, ZNF297, BAT4, and TLR2). B, plots of signal intensities of all 19 scFv to a particular protein feature, indicating the specificity of the interaction between one antibody and one antigen on the array. Signal intensity was calculated as the spot signal intensity divided by the median signal for all spots on the slide.
The top 19 scFv were selected based on their log ratio (the natural logarithm of the ratio of protein expression levels between cases and controls) and p value on the different array generations (Table IV). Many of the scFv have p values less than 0.05, and all have positive log ratios (indicating their targets are elevated in cancer serum with respect to healthy serum). The numerical values of the log ratio for a specific scFv on one generation of arrays cannot be directly compared with the log ratio values on other generations of arrays because the normalization methods used in each case were slightly different because of differences in sample size and other variables. Some scFv in Table IV show no values for the first array generation because they were not printed on these arrays because of space limitations.
In Table IV, the q value represents the false discovery rate, the expected proportion of false positives. This calculation accounts for multiple comparisons in multiple hypothesis testing and is an estimate of what fraction of the antibodies are false positives due to random chance. For example, for a q value of 0.5, one would expect half the candidates to be false and half to be true positives. Although the q values reported for a single array are large, our candidates were carried forward as significant from previous arrays, so we would expect more of them to be true positives than the q value would indicate. Furthermore, we subsequently attempted to validate each target using complementary techniques and different sample sets, thereby discriminating the true from the false positives.
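To make the interpretation of the q value concrete, the Python sketch below computes q values from a list of p values and the expected number of false positives among a set of selected candidates. It is a simplified stand-in, not the procedure used in this study: it fixes the proportion of true null hypotheses at 1, which reduces Storey's estimate to the Benjamini-Hochberg adjusted p value, whereas the method of Storey (34) estimates that proportion from the data. The p values and function names are hypothetical.

```python
def q_values_pi0_one(p_values):
    """Minimum false discovery rate at which each test would be called significant.

    Assumes the proportion of true nulls (pi0) is 1, i.e. the
    Benjamini-Hochberg adjustment; Storey's method estimates pi0 instead.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def expected_false_positives(q_cutoff, n_selected):
    """Expected number of false positives among candidates selected at q_cutoff."""
    return q_cutoff * n_selected

p_values = [0.01, 0.02, 0.03, 0.5]   # hypothetical per-antibody p values
print(q_values_pi0_one(p_values))
print(expected_false_positives(0.5, 20))  # 10.0
```

With a q value cutoff of 0.5 and 20 selected antibodies, about 10 would be expected to be false positives, which is the situation described in the text and the reason independent validation of each target matters.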
To validate that the NAPPA scFv target identifications were accurate and that the scFv are sensitive and specific for their antigen, we performed dot blots, spotting either purified recombinant protein in the case of USF1 or a lysate from cells that overexpress the target protein in the case of ZNF297. The results of the blots show that both full-length antibodies and scFv against USF1 (Fig. 4A) and ZNF297 (Fig. 4B) can detect their cognate antigen but do not bind BSA protein. Furthermore, this interaction is not due to nonspecific binding of the scFv to the protein because an scFv selected against BSA did not bind to either USF1 or ZNF297 even when it was spotted at a much higher concentration.
These 19 best performing scFv were selected through two to three rounds of arrays. In 15 of 19 scFv, we identified a target using the functional protein arrays. Listed are the target identifications, the fluid type that was used to select the scFv sublibrary (cyst represents cyst and ascites fluid), log ratios (natural logarithm of the ratio of protein expression between cases and controls), p values, and q values (minimum false discovery rates) from a Wilcoxon test. In bold are targets that were identified on the NAPPAs with high confidence. NA indicates that the scFv was not printed in Array 1.

Validation of Top Candidates Using Commercial Full-length Antibodies
Having found scFv that can discriminate the disease status of serum samples using the antibody microarray platform and having identified the targets that bind to the scFv using a functional protein array, we purchased full-length antibodies to these targets for their further validation as markers of disease. Based on commercial availability, we obtained antibodies for TLR2, USF1, ZNF297, BAT4, tyrosine-protein kinase 6 (BRK), and cathepsin G (CTSG).
Using several antibodies for each target from different suppliers, we probed dot blots spotted with individual serum and proximal fluid samples from cancer patients and healthy controls. Comparing benign proximal fluid with cancer proximal fluid, we found the levels of USF1, ZNF297, TLR2, and BAT4 to be significantly higher in the cancer samples (in both cyst and ascites cancer fluid samples). Fig. 5A shows the column scatter graphs of the median signal intensity for each individual within a group. In the case of ZNF297 and BAT4, these results were observed with two different antibodies, increasing our confidence in these markers. Interestingly, for all antibodies tested, the signal intensities in cancer ascites were higher than in cancer cyst fluid. This suggests that ascites fluid may be a better type of sample to use when searching for ovarian cancer markers.
Comparing normal serum with cancer serum or benign serum with cancer serum, we were also able to find a statistically significant difference in the feature signal intensities of ZNF297 and BAT4. Fig. 5B shows the column scatter graphs of the median signal intensity of each individual sample within a group. Again in the case of BAT4, the statistically significant differences in signal intensities between the cases and controls were observed with two different antibodies. The overexpression of these markers in individual cancer samples was also observed when blots were detected using the scFv for USF1, but in this case our results did not achieve statistical significance (p = 0.11 for cancer cyst versus benign cyst and p = 0.14 for cancer ascites fluid versus benign cyst).
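The figure and table legends state that the p values come from a Wilcoxon test on ranked signal intensities. The sketch below shows one way such a comparison between cases and controls could be computed, using a two-sided rank-sum test with the normal approximation. The function names and the toy intensities are hypothetical, and the normal approximation without tie correction is an assumption; for small groups an exact test from a statistics package would be preferable.

```python
import math

def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks

def rank_sum_test(cases, controls):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(cases), len(controls)
    ranks = average_ranks(list(cases) + list(controls))
    w = sum(ranks[:n1])                 # rank sum of the case group
    u = w - n1 * (n1 + 1) / 2           # Mann-Whitney U statistic for cases
    mu = n1 * n2 / 2                    # expected U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return u, z, p

# Hypothetical dot blot intensities (arbitrary units).
cancer = [5.0, 6.0, 7.0, 8.0]
benign = [1.0, 2.0, 3.0, 4.0]
u, z, p_value = rank_sum_test(cancer, benign)
print(f"U = {u}, z = {z:.2f}, p = {p_value:.3f}")
```

Because the test uses only ranks, it does not depend on the arbitrary units of the signal intensities.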
These results demonstrate that the selected scFv retain the ability to recognize disease-specific antigens after immobilization on the microarray and have high enough affinity to compensate for the complexity of the serum proteome. Furthermore, by demonstrating overexpression of these proteins in proximal fluid and serum from cancer cases, we validated our panning procedure, antibody microarray platform, and NAPPA as a novel technology pipeline to select antibodies that bind proteins that are elevated in serum from ovarian cancer patients and could be used as biomarkers of ovarian cancer.
To validate that these markers originate in the tumors themselves and are not just components of a nonspecific or inflammatory response, we used the antibodies to stain tissue sections of ovarian tumors and healthy ovaries. Staining with the USF1 antibody was specific to the neoplastic epithelial cells of the tumors and localized to a perinuclear compartment, possibly Golgi. Staining of healthy ovarian tissue was very low and was restricted to endothelial cells of blood vessels and areas on the edges of the tissue that stain nonspecifically. These areas were not used to score the staining. Staining with this antibody was measured in a blind manner on a 0-2 scale with 2 being the highest level of staining. With this scheme, the six ovarian cancer tissues were scored as 1, 2, 1, 2, 2, and 1 (average rating, 1.5; five cases were stage IIIC, and one was stage IVA; all were serous ovarian cancer), and the six healthy ovarian tissues were scored as 0, 0, 0, 0, 1, and 1 (average rating, 0.33). Representative images of cancer tissue and healthy tissue with their USF1 staining are shown (Fig. 6). The overexpression of USF1 in cancer tissue samples and proximal fluid suggests that the primary ovarian tumors could be the source of the USF1 protein found in ovarian cancer patient serum and legitimizes its further study as a biomarker for early disease detection. Staining with the ZNF297 antibody appeared to be nonspecific and did not appear to be effective at discriminating cases from controls. Staining with several different TLR2, CTSG, and BRK antibodies did not produce measurable signal. This lack of immunohistochemical staining does not preclude them from further study as markers of disease and is most likely caused by the available antibodies not functioning well in immunohistochemistry.
(1) is the most concentrated sample, and each spot to the right of it is a 2-fold dilution (1/2, 1/4, 1/8, and 1/16). In the USF1 sample, the highest concentration contained 640 ng of USF1 protein; in the case of BSA, it contained 8 µg. ZNF297 protein supplied as a cell lysate was not quantified. The first row of each blot contains the control BSA samples, and the second row has the putative biomarker (USF1 in A and ZNF297 in B). The uppermost blot of each panel uses a full-length antibody to detect the protein spotted; the second blot uses the specific scFv for that protein, and the third uses an scFv specific for BSA. The USF1 antibody used was from Abnova (Abn), clone M02.

DISCUSSION
The search for blood-based markers of disease has been an intense topic of research for many years. In the case of cancer generally and ovarian cancer in particular, such markers would provide patients with valuable lead time to treat the disease at a stage when the tumor is localized, which offers the best chance of avoiding disease recurrence. Some promising markers for ovarian cancer have been reported, but the heterogeneity of cancer has made it difficult to find a single marker that could be used to screen the general population. Because this disease has a relatively low prevalence in the general population (40 cases for every 100,000 postmenopausal women per year), if a test or panel is intended for widespread screening, it is critical that it classifies disease status with few false positives. One study reported that an ovarian cancer marker must have a positive predictive value of 10% and a specificity of 99.65% for the test to be of diagnostic value (38,39). It is widely believed that only a panel of markers, acting independently, would be able to meet such rigorous requirements.
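The relationship between prevalence, specificity, and positive predictive value quoted above can be checked with a short calculation based on Bayes' rule. The sketch below is illustrative only: the function name is hypothetical, the quoted rate of 40 per 100,000 is treated as the prevalence as in the text, and the sensitivity of 100% is a best-case assumption that the text does not state.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / all positives, from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

prevalence = 40 / 100_000   # figure quoted in the text
specificity = 0.9965        # figure quoted in the text
sensitivity = 1.0           # assumption: best case, not stated in the text

ppv = positive_predictive_value(sensitivity, specificity, prevalence)
print(f"PPV = {ppv:.1%}")
```

Under these assumptions the positive predictive value comes out at roughly 10%. With the same inputs but a specificity of 99%, it falls below 4%, which illustrates why so few false positives can be tolerated when the disease is rare.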
The discovery method reported here shows promise in the effort toward defining an ovarian cancer biomarker detection panel. A major advantage is its unbiased approach to biomarker discovery potentially spanning the entire plasma proteome. Our discovery platform allowed us to exploit the specific advantages of antibodies as high affinity capture reagents, capitalized on the power of a diverse antibody library, and allowed us to screen antibodies in high throughput using low volumes of biological fluids.
Horizontal lines represent the mean signal intensity for that group. Signal intensities are in arbitrary units. p values indicated are from a Wilcoxon test ranking the signal intensities and measuring how well each antibody can discriminate disease status. Abc means the antibody was purchased from Abcam, Abn means the antibody was purchased from Abnova, and GW means the antibody was purchased from GenWay. arb, arbitrary.
We used a human naïve phage-displayed scFv library, panned with proximal fluid of ovarian cancer patients and healthy controls, to select a sublibrary of scFv clones that bind differentially expressed proteins between these two fluids. We further selected this sublibrary by printing purified scFv in a microarray format and using it to interrogate labeled serum to determine which scFv could discriminate disease status. Several rounds of array printing and filtering the best performing scFv were performed until we obtained a list of 19 scFv that could predict disease in a statistically significant manner. These top candidates were incubated on functional protein arrays to determine which proteins they bound. To the best of our knowledge, such a combination of techniques has never been used to discover novel cancer biomarkers.
It is noteworthy that for two of our top four markers, we identified the target with two independent scFv on the functional protein array (Table IV), increasing the confidence that these may be true markers for ovarian cancer. It is not surprising that most of the top 19 candidates were obtained with the proximal fluid selection because the proximal fluid essentially bathes the tumor and therefore might contain a higher concentration of proteins secreted by the tumor.
Validation of the targets of four scFv was accomplished using full-length commercially available antibodies to detect overexpression of these proteins in individual cancer serum and proximal fluid samples. The proximal fluid was the same as used to select the original scFv sublibrary, suggesting that the panning procedure, antibody microarrays, and NAPPA are effective at obtaining a sublibrary of scFv enriched for those that bind cancer-specific proteins. The differences in signal intensities between cases and controls seemed to be more prominent in the proximal fluid samples than in the serum samples, both because more antibodies were statistically significant in the proximal fluid experiments and because the associated p values were generally lower in this group. This is not surprising because it is believed that the proximity to the tumor and the smaller volumes of cysts and ascites (compared with 3 liters of serum in an adult body) would account for the higher tumor marker concentration in these types of fluid. Alternatively, because all four scFv for our validated targets were selected with proximal fluid instead of serum, it could be an indication that the panning process is effective, and therefore the scFv are identifying good targets in proximal fluid that also happen to be present in serum. For the top 19 scFv that were selected with serum, antibodies against their targets were either not available or did not perform well, so these targets could not be validated.
Lastly, we confirmed expression of USF1 protein in tumor sections compared with normal ovarian epithelium, indicating that this proximal fluid and serum marker could originate in the tumor tissue and have an important biological role in cancer progression. Because USF1 is a transcription factor, one would expect to find it expressed mainly in the nucleus (although at very low concentrations because most transcription factors are not expressed at high levels). However, our staining shows that it is mostly found in a perinuclear compartment that appears Golgi-related. This mislocalization could be a clue to its deregulation in cancer, and because the Golgi apparatus is part of the secretory pathway, it could explain why the protein is found in cancer serum. Further studies are underway to confirm this hypothesis.
All targets validated in this work have been reported to be transcribed in ovarian tissue (40-42). USF1 and ZNF297 transcripts have been found overexpressed in ovarian tumors with respect to normal human ovarian surface epithelial cells using a digital Northern technique (43). USF1 and BAT4 show moderate to high expression in ovarian cancer tissue by immunohistochemistry according to the Human Protein Atlas. USF1, TLR2, and ZNF297 have been reported previously to have functions that are concordant with their possible role in tumor formation. Not much is known about BAT4 and its function. USF1 is a ubiquitously expressed transcription factor that plays major roles in regulation of gene expression in response to a wide range of stress inducers, including DNA damage and oxidative stress, leading to activation of cell cycle regulation genes such as CYCLIN B1, CDK1, and hTERT and the DNA damage response genes TP53 and BRCA2 (44). Although the main function of TLR2 is in the innate immune response to bacterial lipoproteins, it has been reported to be important in such diverse diseases as sporadic colorectal cancer (45), prostate cancer (through its binding partners TLR1, -6, and -10 (46)), anticancer immune response (47), asthma, atherosclerosis, arthritis, and diabetes (48). ZNF297 is a ubiquitously expressed zinc finger protein involved in transcriptional regulation, but very little is known about its precise function. BAT4 is an intracellular protein that has nucleic acid binding properties. Its gene has been localized in the vicinity of the genes for tumor necrosis factor α and tumor necrosis factor β. These genes are all within the human major histocompatibility complex class III region. The protein encoded by this gene is thought to be involved in some aspects of immunity.
FIG. 6. Immunohistochemical staining using full-length antibodies against scFv targets. Formalin-fixed ovarian tumors and normal ovarian tissue were stained with a USF1 antibody and a hematoxylin counterstain. A shows the normal tissue (mostly stroma) stained with hematoxylin and the endothelial cells from the blood vessels stained with the antibody. B shows a higher magnification of the normal stroma and endothelial cells found in the box in A. C shows the cancerous tissue (stage IIIC) with USF1 staining and unstained tissue surrounding the tumor. D shows a higher magnification of the tumor tissue in the box of C indicating that the USF1 staining is mainly perinuclear, possibly Golgi. Bar at low magnification, 100 µm; bar at high magnification, 5 µm.
Although our results are encouraging, further studies are required to validate these proteins as clinically useful early detection biomarkers for ovarian cancer. Larger sample sets would have to be screened, and an ELISA for high throughput, quantitative, and specific measurement of these proteins in serum would have to be developed for these markers to continue advancing through the biomarker discovery pipeline. Furthermore, measuring the expression of these antigens in serum from different cancers needs to be performed to demonstrate their specificity for ovarian cancer. Our current efforts are focusing on these endeavors.
Continuing the search for early detection markers of ovarian cancer is important for the advancement of our knowledge of this disease and increasing the survival rates of patients. The technologies we report here are currently being applied to find markers for breast, colon, and pancreatic cancers, and some promising candidates are being followed. Our results constitute an innovative method for high throughput screening of biological fluids for the discovery of novel cancer biomarkers.