Integrating Proteomic and Functional Genomic Technologies in Discovery-driven Translational Breast Cancer Research*

The application of state-of-the-art proteomics and functional genomics technologies to the study of cancer is rapidly shifting toward the analysis of clinically relevant samples derived from patients, as the ultimate aim of translational research is to bring basic discoveries closer to the bedside. Here we describe the essence of a long-term initiative undertaken by The Danish Centre for Translational Breast Cancer Research and currently underway for cancer biomarker discovery using fresh tissue biopsies and bio-fluids. The Centre is a virtual hub that brings together scientists working in various areas of basic cancer research such as cell cycle control, invasion and micro-environmental alterations, apoptosis, cell signaling, and immunology, with clinicians (oncologists, surgeons), pathologists, and epidemiologists, with the aim of understanding the molecular mechanisms underlying breast cancer progression and ultimately of improving patient survival and quality of life. The unifying concept behind our approach is the use of various experimental paradigms for the prospective analysis of clinically relevant samples obtained from the same patient, along with the systematic integration of the biological and clinical data.

Molecular & Cellular Proteomics 2:369-377, 2003.

BREAST CANCER
Breast cancer is the most common malignancy among women in the Western world and constitutes 18% of all cancers in women (1). In Denmark, ~3800 women develop breast cancer per annum, and an estimated 1200 die from the disease (2).
At present, routine mammography is the most widely used tool for the early detection of breast cancer, and service screening programs reduce breast cancer-related mortality (3,4). To be detected, however, a tumor must be at least a few millimeters in size, a situation that potentially influences the odds of survival and cure. Parameters such as axillary lymph node status, tumor size, histological grade, and age, in combination with predictive factors such as estrogen and progesterone receptors, are currently used for selecting the appropriate systemic therapy (5).
Patients with primary breast cancer are offered a combination of treatment options such as surgery, often followed by adjuvant irradiation, chemotherapy, and/or endocrine therapy. These treatments have proven effective; however, despite adjuvant systemic therapy, ~60% of patients with lymph node-positive disease will experience a recurrence, and most of them will die from disseminated breast cancer (6). For patients with lymph node-negative disease, the 5-year recurrence rate is ~25%, suggesting that the risk of relapse and subsequent death is closely related to the stage of the disease at the time of primary surgery. A reasonable assumption would therefore be that the survival rate of breast cancer could be improved if the number of patients being diagnosed with early-stage disease, i.e. node-negative disease, was increased. In this context it would be important to develop new diagnostic tools to detect breast cancer at a very early stage, as this will provide one way to minimize disease-related mortality.
Today, adjuvant systemic therapy (chemotherapy and/or endocrine therapy) is offered to patients with different risks of recurrence and death, i.e. to a prognostically heterogeneous group with risks ranging from 10 to 80%. This group is characterized according to classical prognostic factors (nodal status (positive), size of the primary tumor (≥20 mm), malignancy grade (II-III), steroid receptor status (negative), age (<35 years)) and constitutes about 70% of all new breast cancer patients (6). It is a well-established fact that 30-40% of the expected deaths can be avoided if adjuvant systemic therapy is offered to this patient group. However, in absolute terms the mortality reduction amounts to only a few percent (i.e. from 5 to 3%) in the low-risk group and to ~25% in the high-risk group (i.e. from 80 to 50%). Thus, although adjuvant systemic therapy has led to a significant improvement of the prognosis of the breast cancer population, it also carries the significant adverse effect of overtreatment (7,8).
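The difference between relative and absolute benefit discussed above can be made concrete with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the inputs are rounded figures taken from the text (baseline risks of 5% and 80%, relative reductions within the quoted 30-40% range), not trial data.

```python
def absolute_risk_reduction(baseline_risk: float, relative_reduction: float) -> float:
    """Return the absolute reduction in risk (as a fraction).

    Both arguments are fractions between 0 and 1, e.g. 0.80 for an 80%
    baseline risk and 0.30 for a 30% relative reduction.
    """
    if not 0.0 <= baseline_risk <= 1.0 or not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("inputs must be fractions between 0 and 1")
    return baseline_risk * relative_reduction


# Illustrative inputs (assumed, rounded from the figures quoted in the text).
low = absolute_risk_reduction(0.05, 0.40)   # low-risk group
high = absolute_risk_reduction(0.80, 0.30)  # high-risk group

print(f"low-risk group:  {low * 100:.0f} percentage points")
print(f"high-risk group: {high * 100:.0f} percentage points")
```

Under these assumptions a similar relative benefit corresponds to about 2 percentage points in the low-risk group and about 24 in the high-risk group, which is the arithmetic behind the concern about overtreatment.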
It is well known from the treatment of advanced breast cancer that patients nonresponsive to one specific type of chemotherapy, or endocrine therapy, may react positively to another type of each of the two modalities, indicating that response to a specific treatment may relate to specific characteristics (predictive factors) of the tumor. Thus, there is a pressing need to develop new independent prognostic and predictive indicators or signatures in primary breast cancer to improve the selection of patients for specific, ideally tailored treatments (9-14 and references therein).

IMPACT OF PROTEOMIC AND FUNCTIONAL GENOMIC TECHNOLOGIES IN TRANSLATIONAL CANCER RESEARCH
The sequencing of the human genome together with that of model organisms has paved the way to the revolution in biology and medicine that we are experiencing today. In particular, the explosive growth in the number of new and powerful technologies within proteomics and functional genomics (9,11,15-30, and references therein), in combination with bioinformatics, promises to accelerate the application of basic discoveries into daily clinical practice (Fig. 1).
Cancer, being a complex multifactorial disease group that affects a significant proportion of the population worldwide, is a prime target for focused multidisciplinary efforts using these novel and powerful technologies (19,31-34). Indeed, tools for the rapid and efficient analysis of genes and their products are expected to hasten the translation of basic research findings into clinical applications by improving drug development and clinical trial methodologies, as well as by providing biomarkers for diagnosis, prognosis, early detection, and novel therapies. In particular, array and proteomic technologies are expected to play a key role in the study and treatment of human cancers as they provide invaluable resources to define and characterize regulatory and functional networks of genes and proteins within cells. In addition, proteomics provides tools to investigate the precise molecular defect(s) in cancer tissues and may help develop specific reagents to better understand different stages of the pathology. For drug discovery, proteomics provides tools for identifying new clinically relevant drug targets, as well as functional insight for drug development (35-37).
Presently, the application of state-of-the-art technologies from proteomics and functional genomics to the study of cancer is rapidly shifting to the analysis of clinically relevant samples such as biopsy specimens (11, 33, 38-52 and references therein) and bio-fluids (53-56), as the ultimate aim of translational research is to bring basic discoveries closer to the bedside (31,33). The complexity of the biological samples, however, is daunting and represents one of the most important hurdles we face today for implementing the new technologies. In addition, issues related to sample collection, handling, storage, and preparation are crucial and must be carefully addressed.
[...] the enabling technologies, resources, and expertise available in order to make a substantial impact on the disease. With these concerns in mind, the Danish Cancer Society recently catalyzed the creation of the Danish Centre for Translational Research in Breast Cancer (DCTB),¹ a virtual hub that brings together scientists working in various areas of basic cancer research (cell cycle control, invasion and micro-environmental alterations, apoptosis, cell signaling, and immunology), with oncologists, surgeons, pathologists, and epidemiologists, in an integrated, mission-oriented environment. The ultimate aim is to understand the molecular mechanisms underlying the disease and to improve breast cancer patient survival and quality of life. The main long-term objectives of DCTB are:
• elucidation of signaling pathways involved in cancer progression;
• identification of markers and/or signatures for classifying histopathological types, and for patient stratification for tailored treatment;
• identification of markers and/or signatures for early detection, prognosis, and response to treatment; and
• identification of novel targets for drug discovery and therapeutic intervention.
To achieve these goals, we focus on the prospective analysis of fresh tissue biopsies obtained from the same patient (Fig. 2, A and B) using a plethora of technologies from genomics, proteomics, functional genomics, cell biology, and bioinformatics. Briefly, biopsies dissected from the same tumor, axillary nodal metastasis, or nonmalignant breast epithelium are distributed to members of DCTB who apply various experimental paradigms (Fig. 2C) in a well-defined clinical and pathological framework. In due course, these studies will be complemented by proteomic analysis of plasma obtained from the same patient. Data will be integrated and shared through a large data base supported by the Image Informatics Platform-SIMS (Scimagix, San Mateo, California), a Web-based system that allows management, mining, and integration of image data generated across experiments and research sites. For retrospective studies, the Centre has the possibility of accessing samples stored at the Danish Breast Cancer Cooperative Group tumor repository bank, which contains frozen tissue from ~10,000 breast cancer patients, for which a full clinical follow-up is available. What follows below is a brief discussion of issues and strategies related to sample collection, handling, storage, and standardization of sample preparation, in particular for gel-based proteomics, which were recently addressed in a pilot study that involved 26 high-risk² breast cancer patients.

STRATEGIC ISSUES ADDRESSED IN THE PILOT PROJECT
Patient Selection-Access to large tumors and abundance of patient-derived breast nonmalignant epithelial tissue was deemed essential, as the chosen technological approach required the application of several enabling technologies to the same tissue biopsy. Consequently, women with primary operable high-risk invasive breast cancer were selected for the study. All 26 patients had no previous surgery to the breast and did not receive preoperative treatment. They presented a unifocal tumor of an estimated size of more than 20 mm, and all patients, except one, had mastectomy including axillary dissection (Fig. 2). Tumors studied were mostly of ductal type (22 patients), but also lobular (2 patients), medullary (1 patient), and mucinous lesions (1 patient) were included in this preliminary study.
¹ The abbreviations used are: DCTB, Danish Centre for Translational Research in Breast Cancer; IHC, immunohistochemistry; 2D, two-dimensional; IEF, isoelectrofocussing.
² The criteria for high-risk cancer applied by DBCG are age below 35 years, and/or tumor diameter of more than 20 mm, and/or histological malignancy grade 2 or 3, and/or negative estrogen and progesterone receptor status, and/or positive axillary status.
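The high-risk criteria listed in footnote 2 amount to a simple "any of" rule, which can be sketched as follows. This is a minimal illustration and not a clinical tool: the `Patient` record, its field names, and the `is_high_risk` function are hypothetical, and the strict "more than 20 mm" threshold follows the wording of the footnote.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record holding only the fields the footnote refers to."""
    age_years: int
    tumor_diameter_mm: float
    histological_grade: int   # 1, 2, or 3
    er_positive: bool         # estrogen receptor status
    pr_positive: bool         # progesterone receptor status
    node_positive: bool       # axillary lymph node status


def is_high_risk(p: Patient) -> bool:
    """True if at least one of the footnote's criteria is met."""
    return (
        p.age_years < 35
        or p.tumor_diameter_mm > 20
        or p.histological_grade in (2, 3)
        or (not p.er_positive and not p.pr_positive)
        or p.node_positive
    )


# Made-up example: only the tumor diameter criterion is met.
example = Patient(age_years=52, tumor_diameter_mm=25.0, histological_grade=1,
                  er_positive=True, pr_positive=True, node_positive=False)
print(is_high_risk(example))
```

Because the criteria are joined by "and/or", a single positive criterion is sufficient, which is consistent with this group making up a large share of new patients.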
Sample Collection and Handling-Tissue biopsies (nonmalignant epithelial tissue, tumor, and axillary nodal metastasis) were collected from the Pathology Department at Rigshospitalet 20-30 min after surgery. Samples for RNA preparation were cleaned with the aid of a scalpel directly at the Pathology Department and were snap frozen in liquid nitrogen prior to transport to the Institute of Cancer Biology. Other biopsy specimens were transported on ice, the whole process taking about 15 min. Samples were dissected at the Institute of Cancer Biology, distributed to members of the DCTB, or processed directly for gel-based proteomics and immunohistochemistry (IHC). A fraction of each sample was stored as archival material to be used for future needs. In general, the collection procedure worked extremely well thanks to the efficient and rapid handling of tissue specimens by the Pathology Department.
Core Enabling Technologies-Because biomarker discovery is a centerpiece within the program, an important issue in the planning process was to identify technologies, particularly for revealing differential gene expression, that could be applied to the analysis of complex tissue biopsies in a prospective manner. In a biomarker discovery environment, high priority lies in the analysis of proteins, as ultimately these are responsible for orchestrating most cellular functions and are the most likely to reflect changes in gene expression. Within the proteomic technologies, two-dimensional (2D) PAGE (17,57-61), often referred to as gel-based proteomics, multidimensional chromatography, and protein biochips, in combination with mass spectrometry (62, 63 and references therein), are among the tools that can be used for biomarker and drug target discovery. 2D PAGE reveals global patterns of protein expression in which every single protein can be quantitated in relation to the others and is perhaps the only technology readily usable for the analysis of complex tissue biopsies in a prospective manner. Even though it suffers from several limitations (17), we chose this technique in combination with mass spectrometry, given our experience in generating 2D PAGE protein data bases of various cell types (proteomics.cancer.dk) (15,25,64),³ and because protein markers for different cell types are readily available. The latter facilitates the interpretation of protein profiles where more than one cell type contributes to the overall pattern and ameliorates in part the problem of tissue and cancer heterogeneity. In addition, the use of specific antibodies against a differentially expressed marker in immunofluorescence or IHC greatly facilitates validation of the results (33). 2D PAGE data bases can be readily annotated and provide a comprehensive framework in which to store information gathered using different technologies.
It should be stressed that all cell types may share as many as 80-90% of the proteins detected (25,64), a fact that will facilitate exchange of information.
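The idea that each protein on a 2D gel can be quantitated in relation to the others can be illustrated with a small normalization sketch. This is not the authors' analysis pipeline; it assumes that each spot has already been assigned an integrated volume by some image analysis step, and the function names, spot identifiers, and numbers are all made up for illustration.

```python
def relative_abundance(spot_volumes: dict[str, float]) -> dict[str, float]:
    """Express each spot volume as a fraction of the total volume on the gel."""
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: volume / total for spot, volume in spot_volumes.items()}


def fold_changes(tumor: dict[str, float], normal: dict[str, float]) -> dict[str, float]:
    """Tumor/normal ratio of relative abundance for spots present on both gels."""
    t = relative_abundance(tumor)
    n = relative_abundance(normal)
    shared = t.keys() & n.keys()
    return {spot: t[spot] / n[spot] for spot in sorted(shared) if n[spot] > 0}


# Toy spot volumes (arbitrary units, invented for this example).
tumor_gel = {"spot_a": 30.0, "spot_b": 10.0, "spot_c": 60.0}
normal_gel = {"spot_a": 10.0, "spot_b": 10.0, "spot_c": 80.0}

for spot, ratio in fold_changes(tumor_gel, normal_gel).items():
    print(f"{spot}: {ratio:.2f}")
```

Normalizing to the total volume makes gels with different protein loads comparable, and restricting the comparison to shared spots reflects the observation that most detected proteins are common to different cell types.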
Among the other core technologies, high-throughput DNA microarray-based gene expression profiling (9,18,19,65) is complementary to proteomic tools and was deemed an essential component of the arsenal of technologies that was required, as gel-based proteomics often misses the low-abundance proteins. In addition, we considered technologies for mutation analysis (PCR in combination with denaturing gradient gel electrophoresis) (66), DNA methylation analysis (melting analysis of bisulfite-treated DNA) (67), cell enrichment (laser-capture microdissection) (68), three-dimensional cell culturing (69,70), IHC (71), as well as for the isolation of tumor-infiltrating lymphocytes and peripheral blood lymphocytes with the potential to recognize specific immune markers as well as antigens specifically expressed by breast malignant cells that could constitute putative vaccination targets (72).

CHALLENGES POSED BY BIOPSY SPECIMENS: SAMPLE PREPARATION
A major challenge that must be faced when applying enabling technologies to the analysis of complex tissue biopsies is the highly heterogeneous nature of tissues in terms of cell types and pathology. This is particularly burdensome for protein and RNA expression profile analysis as often the cell composition of the samples as well as the underlying pathology cannot be well defined. These shortcomings do not only affect sample preparation, but also the interpretation of the results.
Sample Preparation for Gel-based Proteomics-Technically, tissues are much more difficult to handle than cultured cell lines and therefore standardization of sample preparation procedures is mandatory before scaling-up in a long-term translational program involving hundreds of patients. Sample preparation for gel-based proteomics proved to be demanding, as it required the analysis of hundreds of 2D gels. For tumors and lymph node metastasis, care was taken to clean the biopsies from clots and other contaminant tissue, and only small pieces of tissue (a few mm³ in size) were homogenized in lysis solution with the aid of a hand glass homogenizer to maximize the amount of dissolved material. Carrier ampholytes were carefully titrated to provide the best possible resolution, and only small amounts of protein sample, determined after trial runs, were applied to the first-dimension gel in order to avoid overloading and streaking. Fig. 3 shows representative isoelectrofocussing (IEF, Fig. 3A) and nonequilibrium pH gradient electrophoresis (Fig. 3B) 2D gels of whole extracts of tumor proteins stained with silver nitrate. The quality of the separation as well as the number of proteins resolved is very similar to that achieved with cultured cell lines, although several serum proteins, in particular albumin, are still present. Similar results have been obtained in the case of lymph node metastasis. An even larger number of proteins could be detected when tumors or axillary nodal metastasis were labeled overnight with [35S]methionine as depicted in the autoradiogram shown in Fig. 3C. These radioactive gels could be subjected to phosphorimager analysis in order to derive quantitative data. Metabolic labeling, however, proved to be rather variable from tumor to tumor, and additional work will be required to standardize this procedure.
Proteins could be readily identified from silver-stained gels using mass spectrometry (73) and immunoblotting (74), and the availability of both 2D PAGE data bases of various cell types (proteomics.cancer.dk) (15,25,64)⁴ and specific antibodies is instrumental in interpreting the protein profiles (33).
As far as nonmalignant breast epithelium was concerned, sample preparation proved more difficult as the ratio of glands to connective tissue varied between patients, as well as between different locations within the breast of the same patient. Reasonably good protein profiles (Fig. 4B) were obtained in those cases where the ratio was favorable (Fig. 4A), although they were contaminated to some extent with connective tissue and serum proteins. The latter, however, could be readily subtracted from the profiles by comparison with serum and connective tissue patterns generated during the pilot study. In addition, to address the problem of tissue heterogeneity we took a two-sided alternative approach for the enrichment of breast epithelial cells, namely laser-capture microdissection (67) and cell culturing (68,69). For laser-capture microdissection, the number of cells required to obtain a protein profile similar to that depicted in Fig. 4A is in the order of 50,000 cells or more, a fact that hindered the use of this technology on a routine basis. Even if a smaller number of cells were required thanks to more sensitive protein detection procedures (60), one would still need to address the problem of cellular heterogeneity, as IHC with a single antibody marker can often detect heterogeneity even in ducts that are composed of a small number of cells (Fig. 4D). We also pursued measures to establish cultures of pure cell populations using reconstituted basal membrane material for three-dimensional (Fig. 4E) and monolayer cultures (Fig. 4F) (69,70). Application of gene expression profiling technologies and IHC to these samples should reveal how closely they mimic the tissue microenvironment (75).
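The removal of serum and connective tissue contributions from an epithelial profile, as described in the preceding paragraph, can be thought of as a set difference against reference patterns. The sketch below illustrates that idea under strong simplifying assumptions: each profile is reduced to a set of already-matched protein or spot identifiers, the function name is hypothetical, and apart from albumin (mentioned in the text as a serum contaminant) the identifiers are placeholders.

```python
def subtract_contaminants(profile: set[str], *reference_patterns: set[str]) -> set[str]:
    """Return the identifiers in `profile` absent from every reference pattern."""
    contaminants: set[str] = set().union(*reference_patterns)
    return profile - contaminants


# Placeholder identifiers, invented for this example.
serum_pattern = {"albumin", "spot_s1", "spot_s2"}
connective_tissue_pattern = {"spot_c1", "spot_c2"}
epithelial_profile = {"albumin", "spot_s1", "spot_c2", "spot_e1", "spot_e2"}

cleaned = subtract_contaminants(epithelial_profile, serum_pattern, connective_tissue_pattern)
print(sorted(cleaned))
```

In practice a protein present in both the epithelium and a reference pattern would be lost by such a strict subtraction, so a quantitative comparison of spot intensities would be a more cautious variant.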
Sample Preparation for Other Technologies-The isolation of high-quality DNA as well as RNA from tumors and nonmalignant epithelial breast tissue proved to be the most problematic aspect of the nucleic acid-based technologies. We found, however, that published protocols could be adopted with minor adjustments (50,66,67).
Integration of histopathological techniques into the project posed few problems with standard laboratory protocols being mostly used (71). Two key technical aspects of sample acquisition for IHC should, however, be emphasized here. First, in order to avoid structural and/or functional alterations, samples for IHC must be placed in formalin fixative or frozen in liquid nitrogen immediately in the operating theater. Second, the size of the tissue blocks for formalin fixation should be as large as possible in two dimensions (contain the largest possible amount of representative tissue), but only 3 mm in the third dimension to allow proper penetration of the fixative. After asserting routine proficiency in the procedures involving paraffin-embedded as well as cryo-preserved tissue, only optimization of conditions for specific antibodies can present difficulties meriting some additional considerations.
Cell-based technologies (cell culturing, lymphocyte responses, etc.) require that slightly modified sampling conditions and a set of prerequisites (sterility, cell viability) are in place in order for these techniques to be successfully employed. Efforts to establish standard procedures for sample acquisition and handling are currently underway with encouraging preliminary results.

EXTRACELLULAR PROTEINS
The search for markers for early breast cancer detection will be greatly facilitated by the systematic identification of overexpressed proteins that are secreted by tumors. Some of these will be specifically expressed by mammary tissue and may thus represent potential candidates for a rapid blood-based screening test for early detection of breast cancer. Such a systematic search will require a simple procedure for collecting reasonable levels of these proteins for profile comparisons, followed by protein identification and sequencing using mass spectrometry.
As part of the pilot project, we explored several possibilities for obtaining extracellular proteins and the simplest procedure involved placing small pieces of tumors directly in serum-free buffers overnight at 37°C followed by centrifugation. Fig. 3D shows silver-stained IEF 2D gels of proteins recovered from one such preparation. As expected, there is some contamination with serum proteins (marked with red), but most of the proteins identified so far by mass spectrometry (indicated with blue crosses in Fig. 3D, insert a) and/or immunoblotting (Fig. 3D, insert b) are either derived from direct secretion or are present in vesicles that are shed to the extracellular fluid. Protein profiles are very similar from tumor to tumor, although interesting differences have already been observed. The procedure can also be applied to lymph node metastasis and nonmalignant epithelial tissue. Studies intended to determine the levels of some of these proteins in the serum of the same patients are currently underway. In addition, we are testing whether these sera contain antibodies against any of the extracellular proteins.

LESSONS LEARNED FROM THE PILOT PROJECT AND FUTURE PERSPECTIVES
One of the major outcomes of the pilot study was the realization that we had grossly underestimated the amount of information such a small study could generate. During the initial phase we produced a very large number of images (proteomics, genomics, histology, immunohistochemistry, etc.), a fact that raised the need for providing all those that contributed to the project with integrated access to the collected information, as well as with tools to store and mine image, numerical, and textual information.
To address this problem, we are in the process of implementing the Image Informatics Platform-SIMS, in partnership with Scimagix. This image informatics infrastructure has the capability to store and integrate experimental images with annotations into a common large data base that can accommodate thousands of images a day. Using this single image data repository, which will be centrally managed, researchers at DCTB will be able to maximize data sharing and mine image data generated across experiments and research sites distributed across the virtual Centre. The systematic integration of the data will enhance our understanding of the molecular mechanisms underlying breast cancer and will lead us to systems biology. Ultimately, the information will be integrated with clinical data to generate a prognostic index and molecular profile for each patient as well as to derive markers for early detection, prognosis, and response to treatment. Targets for drug discovery and therapeutic intervention are also expected to arise from these studies. Through a strategic partnership with Scimagix it will be possible to develop the image informatics system in a way that it can be adapted to future developments.
Even though the pilot study highlighted the limitations of the current technologies when applied to complex and heterogeneous tissue samples, the overall outcome was positive as it proved feasible to orchestrate and manage a multidisciplinary research environment devoted to the study of breast cancer. The DCTB has achieved sufficient critical mass of resources and expertise to attract international networking and is strategically well poised for industrial partnership. Currently, a 5-year project involving 500 high-risk patients is underway in which both prospective and retrospective studies are planned.
The latter takes advantage of the Danish Breast Cancer Cooperative Group tumor repository bank, which contains frozen tissue from about 10,000 breast cancer patients, for which a full clinical follow-up is available. The program will be extended to include the systematic analysis of plasma and will embrace additional technologies, in particular for subproteome analysis as well as for the study of protein-protein interactions. These studies will be enriched by the wealth of breast cancer research data currently available in the literature.