From the Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario M5G 1X5, Canada, the Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada, and the Department of Clinical Biochemistry, University Health Network and Toronto Medical Laboratories, Toronto, Ontario M5G 1X5, Canada
| ABSTRACT |
|---|
23%. Gene expression patterns have been used to classify breast tumors into clinically relevant subgroups (luminal A, luminal B, basal, ERBB2-overexpressing, and normal-like) (3, 4). In general, the luminal subtypes are estrogen receptor (ER)1-positive and grow slowly, whereas basal types lack ER and are usually high grade cancers that grow rapidly. Recently the molecular taxonomy has been confirmed by protein expression profiling (5, 6).
Aberrant secretion or shedding of proteins is commonly associated with disease, including cancer. The pathogenic signaling pathways involved during the process of cancer initiation and progression are not confined to the cancer cell itself but are rather extended to the tumor-host interface (7). It is a dynamic environment in which fluctuating information flows between the tumor cells and the normal host tissue. Therefore, it is conceivable that either the tumor itself or its microenvironment could be sources for biomarkers that would ultimately be shed into the serum proteome, allowing for early disease detection (8), monitoring therapeutic efficacy, or understanding the biology of the disease (9). Given that 20–25% of all cellular proteins are secreted, it is reasonable to hypothesize that proteins or their fragments originating from cancer cells or their microenvironment may eventually enter the circulation (10).
Accordingly one of the best ways to diagnose cancer early or to predict therapeutic response is to use serum or tissue biomarkers. Carcinoembryonic antigen and carbohydrate antigen 15-3 (CA 15-3) are the most commonly used tumor markers for breast cancer. Their levels in serum are related to tumor size and nodal involvement and are recommended for monitoring therapy of advanced breast cancer or recurrence but are not suitable for population screening (due to low diagnostic sensitivity and specificity) (11–13). Currently mammography remains the cornerstone of breast cancer screening despite its disadvantages such as high false positive and negative rates, hazardous exposure, and patient discomfort (14, 15). In addition, for women under the age of 40, mammographic screening yields a poor sensitivity of 33% (16, 17). Recent technological advances in proteomics have opened up new and exciting avenues for the discovery of biomarkers or for characterization of molecules involved in cancer initiation and progression.
A number of different proteomics-based approaches have been utilized to discover and characterize disease-specific molecules. Fluid found within the ductal and lobular system of the breast can be extracted through the nipple using an aspiration device to obtain nipple aspirate fluid (NAF) (18). Non-pregnant and non-lactating women continuously secrete and reabsorb this fluid (19). Because of the complex nature of biological fluids relevant to breast cancer, only a handful of high abundance proteins have been identified in NAF, illustrating the need to find another source to mine for the initial biomarker discovery (20–22).
A number of studies have used a cell culture model system in which the cells were grown in serum-free media to perform proteomics analysis (23–28). The clinical relevance of using a cell culture model to understand biological processes and functions has been examined. Using DNA microarrays, the molecular subtypes of 31 breast cell lines yielded two discriminating clusters corresponding to luminal cell lines and basal/mesenchymal cell lines (29). The basal subtype was further subdivided into Basal A and Basal B; this subdivision was not observed in primary tumors. Also recently, it was found that cell lines display the same heterogeneity in copy number and expression abnormalities as the primary tumors (30).
In this study, we report a shotgun proteomics approach to sample the conditioned media of three human breast cell lines (MCF-10A, BT474, and MDA-MB-468). MCF-10A, a Basal B subtype with intact p53, was derived by spontaneous immortalization of breast epithelial cells from a patient with fibrocystic disease, and it has been used extensively as a normal control in breast cancer studies (31). These cells do not survive when implanted subcutaneously into immunodeficient mice (31). BT474, a luminal subtype obtained from a stage II localized solid tumor, is positive for ER and progesterone receptor (50–60% of all breast cancer cases) (32). This cell line also displays amplification of Her-2/neu or ERBB2 (30% of all breast cancer cases) (33). Her-2/neu is a cell membrane surface-bound tyrosine kinase involved in signal transduction, leading to cell growth and differentiation. Its overexpression is associated with a high risk of relapse and death (33) and is the target of the therapeutic monoclonal antibody Herceptin (34). Finally MDA-MB-468, a Basal A-like subtype obtained from a pleural effusion of a stage IV patient (35), is ER- and progesterone receptor-negative (15–25% of breast cancer) and phosphatase and tensin homolog-negative (30% of breast cancer) (36, 37).
These cell lines were cultured in serum-free media (SFM) to ensure that the collected conditioned media (CM) contain no extraneous proteins other than those secreted or shed by the cells. By collecting and concentrating large volumes of CM produced from cell lines representing seminormal (MCF-10A), non-invasive (BT474), and metastatic origins (MDA-MB-468), the secreted and shed proteins would accumulate in the CM, thereby facilitating their identification through MS. Our comparative proteomics analysis of the CM of MCF-10A, BT474, and MDA-MB-468 identified over 600, 500, and 700 proteins, respectively. A large portion of the proteins was present in all three cell lines; however, a significant portion was unique to each of the lines. Among these were our internal control proteins, human kallikreins 5, 6, and 10, that were identified by MS and ELISA in MDA-MB-468 cells at a concentration ranging from 2 to 50 µg/liter. Members of the human kallikrein family (KLKs) have been implicated in the process of carcinogenesis, and the application of kallikreins as biomarkers for diagnosis and prognosis is currently being investigated. Kallikrein genes encode secreted trypsin-like or chymotrypsin-like serine proteases (38). Prostate-specific antigen (KLK3), belonging to the family of human tissue kallikreins, and human kallikrein 2 (KLK2) currently have important clinical applications as prostate cancer biomarkers (39). In addition to the control proteins, various proteases, receptors, protease inhibitors, cytokines, and growth factors were identified. Cellular localization, biological function, and Unigene analyses were performed for the shortened list of candidates consisting of extracellular, membrane, and unclassified proteins.
A significant degree of overlap was observed among the proteins identified in this study using a cell culture model and other studies using relevant biological fluids such as NAF and tumor interstitial fluid (TIF). The expression of four candidate molecules was examined in biological fluids, tissues, serum, and breast cytosols. Finally spectral counting analysis revealed promising molecules to investigate further for both understanding the disease and as potential biomarkers for breast cancer.
| MATERIALS AND METHODS |
|---|
Cell Culture—
Approximately 30 × 10⁶ cells were seeded individually into six 175-cm² tissue culture flasks per cell line. After 2 days, the RPMI 1640 or DMEM/F-12 media were discarded, and the cells were rinsed twice with 1× PBS. Following this, 30 ml of chemically defined Chinese hamster ovary serum-free medium (Invitrogen) supplemented with glutamine (8 mM) (Invitrogen) were added, and the flasks were incubated for an additional 24 h. The CM were collected and spun down to remove cellular debris. CM were then frozen at –80 °C until further use. A 1-ml aliquot was taken at the time of harvest to measure total protein (Bradford assay), lactate dehydrogenase (LDH), and KLK5, KLK6, and KLK10 (ELISA). The adhered cells were trypsinized and counted using a hemocytometer. This procedure was repeated several times for reproducibility. In addition, 30 ml of the culture media (RPMI 1640 and DMEM/F-12) were subjected to the same conditions as above with no cells added and used for comparison. For the MDA-MB-468 cell lysate experiment, at the end of 24 h in SFM, the adhered cells were lysed using a French press (Thermo Electron) in which the cells are sheared by forcing them through a narrow space. Total protein was measured, and 400 µg of protein from the lysate were added to 60 ml of chemically defined Chinese hamster ovary medium and processed in the same manner as the CM. The cell lysate experiment was performed in duplicate.
Sample Preparation—
Two 30-ml CM aliquots were combined (60 ml) for each cell line, creating three biological replicates per cell line, and dialyzed using a 3.5-kDa molecular mass cutoff membrane. The CM were dialyzed in 5 liters of 1 mM ammonium bicarbonate solution overnight at 4 °C with two buffer changes. The dialyzed CM were poured equally into two 50-ml conical tubes. The CM were frozen and lyophilized to dryness. The lyophilized sample was denatured using 8 M urea and reduced with DTT (final concentration, 13 mM; Sigma). Following reduction, the sample was alkylated with 500 mM iodoacetamide (Sigma) and desalted using a NAP5 column (GE Healthcare). The sample was lyophilized and trypsin (Promega)-digested (1:50, trypsin:protein concentration) overnight in a 37 °C waterbath. Following this, the peptides were lyophilized to dryness.
Strong Cation Exchange Liquid Chromatography—
The trypsin-digested dry sample was resuspended in 120 µl of mobile phase A (0.26 M formic acid in 10% acetonitrile). The sample was directly loaded onto a PolySULFOETHYL A™ column (The Nest Group, Inc.) containing a hydrophilic, anionic polymer (poly-2-sulfoethyl aspartamide). A 200-Å pore size column with a diameter of 5 µm was used. A 1-h fractionation procedure was performed using an HPLC system (Agilent 1100). A linear gradient of 0.26 M formic acid in 10% acetonitrile as the running buffer and 1 M ammonium formate added as the elution buffer was used. The eluent was monitored at a wavelength of 280 nm. Forty fractions, 200 µl each, were collected every minute after the start of the elution gradient. These 40 fractions were pooled into eight combined fractions (each pool consisting of five fractions) and lyophilized to 200 µl.
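The pooling step can be illustrated with a short sketch. The text states only that 40 fractions were combined into eight pools of five; that each pool holds consecutive fractions is an assumption, and the function name `pool_fractions` is hypothetical.

```python
def pool_fractions(fractions, pool_size=5):
    """Group fractions into pools of `pool_size` (assumes consecutive fractions are pooled)."""
    return [fractions[i:i + pool_size] for i in range(0, len(fractions), pool_size)]

# Fraction numbers 1..40, one collected per minute of the elution gradient
fractions = list(range(1, 41))
pools = pool_fractions(fractions)

for number, pool in enumerate(pools, start=1):
    print(f"pool {number}: fractions {pool[0]}-{pool[-1]}")
```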
Mass Spectrometry (LC-MS/MS)—
The eight pooled fractions per replicate per cell line were loaded into a ZipTipC18 pipette tip (Millipore; catalogue number ZTC18S096) and eluted in 4 µl of 68% ACN made up of Buffer A (95% water, 0.1% formic acid, 5% ACN, 0.02% TFA) and Buffer B (90% ACN, 0.1% formic acid, 10% water, 0.02% TFA). 80 µl of Buffer A were added, and 40 µl were injected onto a 2-cm C18 trap column (inner diameter, 200 µm). The peptides were eluted from the trap column onto a resolving 5-cm analytical C18 column (inner diameter, 75 µm) with an 8-µm tip (New Objective). The LC setup was coupled on line to a 2-D linear ion trap (LTQ, Thermo Inc.) mass spectrometer using a nano-ESI source in data-dependent mode. Each pooled fraction was run on a 120-min gradient. The eluted peptides were subjected to MS/MS. DTAs were created using the Mascot Daemon (version 2.16) and extract_msn. The parameters for DTA creation were: minimum mass, 300 Da; maximum mass, 4000 Da; automatic precursor charge selection; minimum peaks, 10 per MS/MS scan for acquisition; and minimum scans per group, 1.
Data Analysis—
The resulting raw mass spectra from each pooled fraction were analyzed using Mascot (Matrix Science, London, UK; version 2.1.03) and X!Tandem (Global Proteome Machine Manager, version 2.0.0.4) search engines on the non-redundant International Protein Index (IPI) human database version 3.16 (>62,000 entries). Up to one missed cleavage was allowed, and searches were performed with fixed carbamidomethylation of cysteines and variable oxidation of methionine residues. A fragment tolerance of 0.4 Da and a parent tolerance of 3.0 Da were used for both search engines with trypsin as the digestion enzyme. This operation resulted in eight DAT files (Mascot) and eight XML files (X!Tandem) for each replicate sample per cell line. Scaffold (version Scaffold-01_05_19, Proteome Software Inc., Portland, OR) was used to validate MS/MS-based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 95.0% probability as specified by the PeptideProphet algorithm (40). Protein identifications were accepted if they could be established at greater than 80.0% probability and contained at least one identified peptide. Protein probabilities were assigned by the ProteinProphet algorithm (41). Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. The DAT and XML files for each cell line plus their respective negative control files (RPMI 1640 or DMEM culture media only) were inputted into Scaffold to cross-validate Mascot and X!Tandem data files. Each replicate sample was designated as one biological sample containing both DAT and XML files in Scaffold and searched with MudPIT (multidimensional protein identification technology) option clicked. The results obtained from Scaffold were processed using an in-house-developed program that generated the protein overlaps between samples. 
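The acceptance criteria above can be expressed as a simple filter. This is only a sketch of the stated thresholds (peptide probability above 95%, protein probability above 80%, at least one accepted peptide); it is not Scaffold's implementation, and the `ProteinHit` structure and accession strings are hypothetical.

```python
from dataclasses import dataclass

PEPTIDE_MIN_PROB = 0.95  # PeptideProphet threshold stated in the text
PROTEIN_MIN_PROB = 0.80  # ProteinProphet threshold stated in the text

@dataclass
class ProteinHit:
    accession: str        # hypothetical identifier
    protein_prob: float   # protein-level probability (0..1)
    peptide_probs: list   # peptide-level probabilities (0..1)

def accepted(hit):
    """Return True if the hit passes both thresholds with at least one accepted peptide."""
    good_peptides = [p for p in hit.peptide_probs if p > PEPTIDE_MIN_PROB]
    return hit.protein_prob > PROTEIN_MIN_PROB and len(good_peptides) >= 1

hits = [
    ProteinHit("IPI_EXAMPLE_1", 0.99, [0.99, 0.97]),
    ProteinHit("IPI_EXAMPLE_2", 0.75, [0.99]),   # fails the protein threshold
    ProteinHit("IPI_EXAMPLE_3", 0.90, [0.80]),   # no peptide above 95%
]
kept = [h.accession for h in hits if accepted(h)]
```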
Each protein identification was assigned a cellular localization based on information available from Swiss-Prot, Gene Ontology (GO), Human Protein Reference Database, and other publicly available databases. To calculate the false positive error rate, the individual fractions were analyzed using the "sequence-reversed" decoy IPI human version 3.16 database by Mascot and X!Tandem, and data analysis was performed as mentioned above.
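A sequence-reversed decoy database is typically built by reversing each protein sequence. The sketch below shows the idea only; the tool used to generate the decoy database is not named in the text, and the dictionary representation, the `REV_` prefix, and the example sequence are assumptions.

```python
def reverse_decoys(proteins, prefix="REV_"):
    """Build decoy entries by reversing each sequence (prefix is a hypothetical convention)."""
    return {prefix + accession: sequence[::-1] for accession, sequence in proteins.items()}

# Hypothetical accession and sequence, for illustration only
targets = {"IPI_EXAMPLE_1": "MKTAYIAK"}
decoys = reverse_decoys(targets)
```

Because reversal preserves length and amino acid composition, matches against the decoy entries give an estimate of how often random matches pass the same filters.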
Ingenuity Pathways Analysis (IPA)—
Extracellular, membrane-bound, and unclassified proteins were evaluated by Ingenuity Pathways Analysis software to identify global functions of the proteins. This software uses a knowledge base derived from the literature to relate gene products to each other based on their interaction and function. The list of proteins and their corresponding IPI human identification numbers were uploaded as an Excel spreadsheet file onto the Ingenuity software (Ingenuity Systems). Ingenuity then used these proteins and their identifiers to navigate the curated literature database. The biological functions assigned to each network were ranked according to the significance of that biological function to the network. A Fisher's exact test was used to calculate a p value. A detailed description of IPA can be found on the Ingenuity Systems website.
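For readers unfamiliar with the test, a one-sided (right-tailed) Fisher's exact test for over-representation can be sketched with the hypergeometric distribution. The text states only that a Fisher's exact test was used; the right-tailed form, the function name, and all numbers below are assumptions for illustration.

```python
from math import comb

def fisher_right_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn from N, of which K carry the annotation.

    k: annotated proteins in the uploaded list
    n: size of the uploaded list
    K: annotated proteins in the reference set
    N: size of the reference set
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical example: 5 of 5 listed proteins annotated, 5 of 10 annotated in the reference
p_value = fisher_right_tail(k=5, n=5, K=5, N=10)
```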
Spectral Counting—
Using the number of total spectra output from Scaffold, we identified the differentially expressed proteins using spectral counting. Common peptides among proteins were grouped, proteins containing more than 10% of their total spectra from negative control samples were removed, and one Excel file containing total proteins identified and their presence (defined by spectral counts) in the three cell lines was generated. A normalization criterion was applied to normalize the spectral counts so that the values of the total spectral counts per sample were similar. An average of the spectral counts was generated for each cell line (based on the triplicate samples). The sum of the three variances for the cell lines, an indicator of the variance within each cell line, was calculated. The variance of the average spectral counts for each cell line revealed the variability between the cell lines. Analysis of variance (Fisher test) was performed to obtain the ratio of the "between sample variance" to the "within sample variance." Apparent -fold changes were calculated when possible.
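The calculation described above can be sketched as follows. The normalization rule (scaling each sample to the mean total count) is an assumption, since the text does not state the criterion, and the ratio follows the wording of the text (variance of the cell line means divided by the sum of the within-line variances) rather than the textbook F statistic, which would also weight by replicate number and degrees of freedom. All counts are hypothetical.

```python
from statistics import mean, variance

def normalize(samples):
    """Scale each sample so that its total spectral count equals the mean total (assumed rule)."""
    totals = [sum(sample.values()) for sample in samples]
    target = mean(totals)
    return [
        {protein: count * target / total for protein, count in sample.items()}
        for sample, total in zip(samples, totals)
    ]

def variance_ratio(counts_by_line):
    """Return (within, between, ratio) for one protein.

    counts_by_line maps a cell line to its replicate spectral counts.
    """
    within = sum(variance(values) for values in counts_by_line.values())
    between = variance([mean(values) for values in counts_by_line.values()])
    ratio = between / within if within > 0 else None  # undefined when replicates are identical
    return within, between, ratio

# Hypothetical normalized counts for a single protein, three replicates per cell line
example = {"MCF-10A": [1, 2, 3], "BT474": [10, 11, 12], "MDA-MB-468": [4, 5, 6]}
within, between, ratio = variance_ratio(example)
```

A large ratio indicates that the differences between cell lines dominate the replicate-to-replicate variation.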
Total Protein Assay and LDH Measurements—
Total protein was quantitated in the CM using a Coomassie (Bradford) protein assay reagent (Pierce). All samples were loaded in triplicate on a microtiter plate, and protein concentrations were estimated by reference to absorbances obtained for a series of BSA standard protein dilutions. LDH, an intracellular enzyme whose presence in the CM indicates cell death, was measured using an enzymatic assay based on lactate to pyruvate conversion and parallel production of NADH from NAD. The production of NADH was measured by spectrophotometry at 340 nm using an automated method (Roche Applied Science Modular system).
Quantification of Elafin and KLK5, -6, and -10 by ELISA—
An elafin sandwich ELISA kit (Hycult Biotechnology) was used to measure levels of human elafin in serum, pooled biological samples, and pooled tissue lysates. The assay was performed according to the manufacturer's instructions. The concentrations of KLK5, -6, and -10 were quantified with KLK5-, KLK6-, or KLK10-specific non-competitive immunoassays developed in our laboratory (42–44). For more details, see the cited literature.
Biological Samples Examined—
The following 10 biological fluids were examined: amniotic fluid, ascites, breast cyst fluid, cerebrospinal fluid, follicular fluid, milk, NAF, urine, saliva, and seminal plasma. Ten samples were combined for each fluid to generate a pooled sample. The following 27 human tissues were examined (the number indicates the number of samples pooled): adrenal, 5; aorta, 3; bladder, 3; bone, 4; colon, 5; esophagus, 4; heart, 5; kidney, 5; liver, 5; lung, 4; lymph node, 3; muscles, 3; pancreas, 5; skin, 4; small intestine, 4; spinal cord, 3; spleen, 5; stomach, 4; thyroid, 3; trachea, 4; ureter, 4; prostate, 4; testis, 4; breast, 4; fallopian tube, 4; uterus, 4; and ovary, 2.
| RESULTS |
|---|
6–7% cell death was occurring in the CM of the cells. Finally, to demonstrate the accumulation of extracellular proteins in our optimized cell culture model system, our internal control proteins KLK5, KLK6, and KLK10 were quantified in MDA-MB-468 using ELISAs (Supplemental Fig. 1E). The remaining two cell lines do not secrete any kallikreins unless they are hormonally stimulated. Kallikrein 5 was the most abundant kallikrein expressed in this cell line (50 µg/liter) followed by KLK10 (3.5 µg/liter) and KLK6 (2 µg/liter).
Identification of Proteins by Mass Spectrometry—
The workflow and experimental design performed in this study are shown in Fig. 1. Approximately 35–40 proteins were identified in the negative control samples, which were made up of the culture media only (Supplemental Table 1A). Many of the proteins in this list originated from fetal bovine serum used to initially culture the cells. These proteins were deleted from the list of total proteins identified in the CM and were not considered further. In MCF-10A, we identified 632 proteins (Fig. 2A). Of these, 459 were identified in all three replicates, yielding a protein identification reproducibility of 73%. Furthermore a total of 505 proteins were identified in BT474 (Fig. 2B). For this cell line, 380 proteins were common to all three replicates (75% reproducibility). Finally 723 proteins were identified in MDA-MB-468 (Fig. 2C) of which 553 were identified in all three replicates (76% reproducibility). In general, using the workflow presented here, we typically achieved technical reproducibility (same sample injected twice into the LTQ) of 90% (data not shown). The total number of proteins identified per number of unique peptides per cell line is shown in Table I. Many of the proteins identified contained two or more unique peptide hits. Supplemental Table 2 contains detailed information on all of the proteins identified for each of the cell lines, including number of unique peptides identified per protein, peptide sequences, precursor ion mass, and charge states. In addition, eight, three, and five proteins were identified in MCF-10A, BT474, and MDA-MB-468 cells, respectively, using a non-sense database, yielding a false positive rate of 1% (see Supplemental Table 1B) (45).
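The reproducibility percentages and the approximate false positive rate can be recomputed from the counts given above. In this sketch the false positive rate is taken as decoy hits divided by proteins identified, which is an assumption because the exact formula is not stated; the names `COUNTS` and `summarize` are hypothetical.

```python
# Counts from the text: (proteins identified, identified in all three replicates, decoy hits)
COUNTS = {
    "MCF-10A": (632, 459, 8),
    "BT474": (505, 380, 3),
    "MDA-MB-468": (723, 553, 5),
}

def percent(part, whole):
    return 100.0 * part / whole

def summarize(counts):
    """Compute reproducibility and an assumed decoy-based false positive rate per cell line."""
    summary = {}
    for cell_line, (total, in_all_replicates, decoy_hits) in counts.items():
        summary[cell_line] = {
            "reproducibility_pct": percent(in_all_replicates, total),
            # Assumption: false positive rate = decoy hits / proteins identified
            "false_positive_pct": percent(decoy_hits, total),
        }
    return summary

summary = summarize(COUNTS)
for cell_line, stats in summary.items():
    print(cell_line, round(stats["reproducibility_pct"]), round(stats["false_positive_pct"], 1))
```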
Cellular Localization of Identified Proteins—
Fig. 2, D–F, display the cellular distribution of proteins identified in the conditioned media of MCF-10A (D), BT474 (E), and MDA-MB-468 (F). 22% of the proteins identified in MCF-10A CM were classified as being extracellular and membrane-bound. For BT474 and MDA-MB-468, the percentages were 25 and 28%, respectively. A large portion of proteins identified were classified as intracellular (>50%), whereas 4–5% remained unclassified.
Overlap of Proteins between the Three Cell Lines—
The proteins identified among the three cell lines were analyzed for overlapping members (Fig. 3). A significant portion (234 or 20%) of the 1,139 proteins was identified in all three cell lines (Fig. 3A). Fig. 3, B and C, show the overlap among the 175 extracellular proteins and the 211 membrane-bound proteins, respectively. Combined together, extracellular and membrane proteins accounted for 34% of all proteins identified. MDA-MB-468 displayed the greatest number of extracellular and membrane proteins, presumably illustrating that cancer cells secrete and/or express an increased amount of these proteins. In accordance with this postulation, cellular localization analysis of the overlap between BT474 and MDA-MB-468 yielded the greatest percentage (40%) of secreted and membrane proteins (Fig. 3D).
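The overlap analysis amounts to set operations on the protein lists of the three cell lines. The sketch below is not the in-house program mentioned in "Materials and Methods"; it assumes each list is a set of accessions, and the accessions shown are hypothetical.

```python
def venn_regions(a, b, c):
    """Split three sets into the seven regions of a three-way Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "all_three": a & b & c,
    }

# Hypothetical accessions, for illustration only
mcf10a = {"P1", "P2", "P3", "P4"}
bt474 = {"P2", "P3", "P5"}
mda468 = {"P3", "P4", "P5", "P6"}

regions = venn_regions(mcf10a, bt474, mda468)
total_unique = len(mcf10a | bt474 | mda468)
shared_pct = 100.0 * len(regions["all_three"]) / total_unique
```

With the figures reported above, the same arithmetic gives 234 / 1,139 for the proteins shared by all three cell lines and (175 + 211) / 1,139 for the combined extracellular and membrane fraction.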
Tissue-specific Expression and Biological Function Analysis—
Unigene analysis of 422 extracellular, membrane-bound, and unclassified proteins ("shortened list of candidates"), using the Expressed Sequence Tag ProfileViewer, was utilized to identify genes that were relatively breast-specific. Of these genes, five were found to be relatively specific to normal and/or cancerous breast (SCGB1D2, SBEM, TFF1, DCD, and CALML5). Literature mining on these proteins showed that one of the proteins had previously been evaluated in serum of breast cancer patients (trefoil factor 1). As well, using the IPA software, we classified the shortened list of proteins by biological function. The top 15 functions are displayed in Fig. 4A. The top functions were cellular movement followed by cell-to-cell signaling and interaction. Finally these proteins were searched with the Human Plasma Proteome database to decipher whether they have been identified in plasma. 104 of 422 proteins were identified in human plasma.
260 proteins using 2-D gel electrophoresis, immunoblotting, and mass spectrometry of which 112 were also identified in our study (43%). Fig. 4B summarizes the overlaps observed among the other publications and the data presented here. Supplemental Table 4 contains all of the proteins identified by us and the four previous studies. Finally the lung, along with the bone, is one of the most frequent sites of breast cancer metastasis. A set of 54 genes that mediate breast cancer metastasis to the lungs have been identified (50). Given that MDA-MB-468 cells were collected from pleural effusion, we compared the list of proteins identified from MDA-MB-468 conditioned media with the 54 candidate lung metastasis genes. Seven genes were found in the CM of our cell line (KYNU, TNC, ROBO1, FSCN1, MAN1A1, LTBP1, and GSN). Interestingly none of the genes that overlapped between the lung and bone metastasis signatures were identified in MDA-MB-468.
Spectral Counting and Differential Expression of Proteins Identified—
An alternative way to decipher protein abundance is to perform multidimensional scaling for all nine experiments (each cell line in triplicate) using spectral counts. Refer to "Materials and Methods" for details on how the analysis was conducted. The Venn diagram in Supplemental Fig. 4 displays the overlaps of proteins among the cell lines based on spectral count analysis and their cellular localization. The top 100 extracellular and membrane-bound proteins obtained from spectral counting analysis are shown in Table II. The variability within the replicates (within variance) and between the three cell lines (between variance) are highlighted along with the F ratio. Apparent -fold changes were calculated where possible. A numerical value is indicated in places where both cell lines/conditions being examined contain a normalized spectral count greater than zero. In the event of a comparison where one of the conditions/cell lines had a spectral count of zero, an expression is given (e.g. BT >> MCF; indicating that the spectral count for BT474 was greater than MCF10A). Cells that are gray indicate a negative -fold change, whereas cells that are white indicate a positive (numerical value indicated) or no -fold change (cells are blank). In addition, in the first column displaying the -fold change between BT474/MCF-10A, cells in white highlight the proteins that have a higher spectral count in BT474 compared with MCF-10A, whereas cells in black highlight proteins that have a lower spectral count in BT474 compared with MCF-10A. A similar color coding scheme applies to the other two columns comparing the different cell lines/conditions within each column. Known breast cancer biomarkers such as Her-2/neu are among the top five proteins identified by this unbiased method of analysis. Furthermore 23 proteins previously associated with cancer (as determined by Ingenuity Biomarkers Comparison Analysis software) were found among the top 100 extracellular and membrane proteins including epidermal growth factor receptor and various insulin-like growth factor-binding proteins (IGFBP-2, -3, -5, and -6). Supplemental Table 5 contains all of the 1,062 proteins on which this analysis was performed. In addition, Supplemental Table 6 contains the overlaps of the proteins among the cell lines.
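The rule for reporting apparent -fold changes can be sketched as a small function: a numerical ratio when both normalized counts are above zero, a qualitative expression when one count is zero, and a blank otherwise. The function name and the short labels are hypothetical, and the sign convention used for negative -fold changes in Table II is not reproduced here.

```python
def apparent_fold_change(numerator, denominator, numerator_label, denominator_label):
    """Return a ratio, a '>>' expression, or None (blank cell) for one comparison."""
    if numerator > 0 and denominator > 0:
        return numerator / denominator
    if numerator > 0:
        return f"{numerator_label} >> {denominator_label}"
    if denominator > 0:
        return f"{denominator_label} >> {numerator_label}"
    return None  # both counts are zero: no -fold change reported

# Hypothetical normalized spectral counts for the BT474/MCF-10A comparison
both_detected = apparent_fold_change(12.0, 4.0, "BT", "MCF")
only_bt474 = apparent_fold_change(5.0, 0.0, "BT", "MCF")
only_mcf10a = apparent_fold_change(0.0, 5.0, "BT", "MCF")
neither = apparent_fold_change(0.0, 0.0, "BT", "MCF")
```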
| DISCUSSION |
|---|
We specifically examined MCF-10A, BT474, and MDA-MB-468 because they represent progression phases to breast cancer. We observed a higher cell death in MCF-10A, which was expected because MCF-10A is considered to be a normal breast epithelial cell line that does not have the advantage of growing uncontrollably as do the cancer cell lines in SFM (Supplemental Fig. 1). It was our aim that examining the proteins that are unique to each of the cell lines might shed light into both the pathways leading to breast cancer development and to the discovery of biomarkers for breast cancer. Toward this aim, using the Ingenuity Pathways Analysis, the shortened list of candidates was filtered by extracting only those genes that were identified in literature to be present in human and associated with cancer. Approximately 100 of 422 extracellular and membrane genes were identified to meet this criterion. The top 14 canonical pathways that demonstrated a relationship to the 100 genes filtered as well as known genes linked to breast cancer are highlighted in Fig. 6. Dysregulation of signaling pathways plays an important role in cancer initiation and progression. The proteins identified in this study may be useful for further investigation.
Particular emphasis was placed on extracellular, membrane-bound, and unclassified proteins because these proteins have the highest chance of being found in the circulation and thus serving as cancer biomarkers or as important molecules involved in cancer progression. More than 34% of these proteins were classified as being extracellular and membrane-bound. Among the known and novel proteins released by breast cancer cells, we identified various proteases, receptors, protease inhibitors, and cytokines. All experiments were performed in triplicate with excellent reproducibility between runs. Due to the inherent nature of mass spectrometers, not all peptides were ionized in each run, and consequently, different peptides were selected for ionization and detected. This selective ionization can account for the 75% reproducibility in our biological triplicates. As well, the various steps during sample preparation, including C18 extraction of the fractions, cannot be dismissed as an important contributing factor to the variations observed. In addition, we examined the ability of elafin to discriminate between normal and tumor breast cytosols. Currently no conclusion can be drawn without examining more cytosols. Although we did not observe circulating biomarker potential in elafin to discriminate normal from diseased individuals, to our knowledge, we are the first to report on the expression of this protein in breast cytosols, biological fluids, and tissue lysates using an immunoassay. Elafin was found to be expressed in normal breast tissue.
Although the objective of this study was to identify secreted and membrane-bound proteins that have the potential to be cleaved and thus found in circulation, the identified proteins included many intracellular proteins, including ones classified by GO as nuclear and cytoplasmic. During the cell culture process, a portion of the cell population will die, resulting in the release of intracellular proteins into the media. Despite optimizing cell culture conditions to minimize cell death, the identification of intracellular proteins in the CM is inevitable because of the high sensitivity of MS-based techniques utilized in this study. Martin et al. (23) examined the CM of a prostate cancer cell line and found a very similar GO distribution to the one we present here: that more than 50% of the proteins identified were intracellular. Therefore, one of the major challenges in the analysis of secreted proteins is distinguishing between proteins that are targeted to the extracellular space versus those that arise as low level contaminants due to normal cell death in routine cell culture. To address this, we performed a cell lysate proteome experiment to demonstrate conclusively that through our approach we are significantly enriching for secreted proteins. Furthermore a recent study examining the cell lysate proteome of a human mammary epithelial cell line found that 2% of the entire proteome was classified as extracellular, which was consistent with our findings (63).
Recently the analysis of thousands of genes in breast and colorectal cancers has shown that individual tumors can accumulate an average of 90 mutant genes (64). The authors identified 189 previously unknown genes that were mutated at a high frequency. From these genes, six were found in this study (filamin-B, spectrin chain, gelsolin, extracellular sulfatase Sulf-2, neuronal cell adhesion molecule, and polypeptide N-acetylgalactosaminyltransferase 5). Furthermore it is particularly important that many of the proteins identified by other groups using relevant biological fluids such as NAF and TIF were also present in our analysis (see Supplemental Table 8). Many of the proteins identified by Pawlik et al. (47) were highly abundant serum proteins such as albumin, transferrin, and various immunoglobulins, all of which were not identified in our serum-free media. As well, the top biological functions of the candidate molecules appear well suited to cancer initiation and progression such as those involved in cellular movement and cell-to-cell signaling. Finally a large portion of the shortened list of proteins identified was also found in the Human Plasma Proteome database. This finding was not unexpected as it served to highlight the fact that many of the proteins identified in our study had previously been found in plasma. But it also demonstrated that many more proteins have yet to be identified in plasma. There can be a number of reasons why they have not been identified, one of which is the fact that their concentration in plasma is too low to measure by current technologies, and thus other means of initially identifying them and then developing a specific and sensitive immunoassay are critical.
Our group has previously published data on the conditioned media of a prostate cancer cell line, PC3(AR)6, using a roller bottle cell culture method (65). Through this approach, the authors identified 262 proteins in the CM. The workflow presented in the current study has significant differences and improvements compared with our previous work. At the tissue culture step, we cultured three cell lines in triplicate versus only one cell line studied. We also optimized our cell culture conditions (seeding density, incubation time, and volume of media used) to minimize cell death and maximize secreted protein content. Different methods of fractionation of the peptides versus protein fractionation and a more robust and sensitive mass spectrometer were utilized in this study compared with our previous work. Finally the bioinformatics analysis has also been significantly improved upon from before as we used two different search engines and incorporated protein and peptide probability calculations into our final list of proteins. As a result, in the current study, we identified over 1,000 proteins using the conditioned media approach.
Effective therapies for advanced cancers remain elusive. Novel breast cancer biomarkers that can be effective early in the course of the disease have the potential to reduce morbidity and mortality as well as to achieve a higher rate of patient compliance with screening. However, there is a growing consensus that panels of markers may be able to supply the specificity and sensitivity that individual markers lack. A number of studies have demonstrated that this is indeed true. Although a biomarker should be detected in serum using an immunoassay (ELISA), developing such an assay for multiple potential novel biomarkers is very labor-intensive (66). The vast majority of the proteins identified in this study (extracellular, membrane, and unclassified) do not have commercially available ELISA kits. Alternatively, to determine whether the candidates are present in serum, multiple reaction monitoring mass spectrometry technology can be performed. Using the latter technology, it is possible, in a single experiment, to detect and quantify specific peptides (representing specific proteins) in biological fluids of patients with breast cancer to determine whether the protein has biomarker potential. A number of studies have shown the feasibility of such an approach (67–69).
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, July 25, 2007, DOI 10.1074/mcp.M600465-MCP200
1 The abbreviations used are: ER, estrogen receptor; CA, carbohydrate antigen; CM, conditioned media; 2-D, two-dimensional; ECD, extracellular ligand binding domain; GO, Gene Ontology; KLK, kallikrein; LDH, lactate dehydrogenase; NAF, nipple aspirate fluid; SFM, serum-free media; TIF, tumor interstitial fluid; DMEM, Dulbecco's modified Eagle's medium; IPI, International Protein Index; IPA, Ingenuity Pathways Analysis.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
|| To whom correspondence should be addressed: Dept. of Pathology and Laboratory Medicine, Mount Sinai Hospital, 600 University Ave., Toronto, Ontario M5G 1X5, Canada. Tel.: 416-586-8443; Fax: 416-58-8628; E-mail: ediamandis{at}mtsinai.on.ca
| REFERENCES |
|---|