Molecular & Cellular Proteomics 5:43-52, 2006.
© 2006 by The American Society for Biochemistry and Molecular Biology, Inc.



From the
Department of Chemistry, The University of Michigan, Ann Arbor, Michigan 48109-1055, 
Department of Surgery, The University of Michigan Medical Center, Ann Arbor, Michigan 48109-0654,
Department of Pathology, The University of Michigan Medical Center, Ann Arbor, Michigan 48109-0602, || Department of Statistics, The University of Michigan, Ann Arbor, Michigan 48109-1092, ** Eprogen, Inc., Darien, Illinois 60561, and the ¶ Comprehensive Cancer Center, The University of Michigan Medical Center, Ann Arbor, Michigan 48109-0942
| ABSTRACT |
|---|
Ovarian cancer is particularly problematic in that it comprises a heterogeneous group of tumors, many with poorly characterized precursor lesions. Worldwide ovarian cancer is the sixth most common cancer in women with the highest incidence rates appearing in developed countries (1). Epithelial ovarian cancer, or ovarian carcinoma, constitutes about 90% of all ovarian cancers and is divided into four major distinct subtypes (serous, mucinous, endometrioid, and clear cell) based on their morphological features. Interestingly each ovarian carcinoma subtype resembles normal epithelial cells found elsewhere in the female reproductive tract derived from a common embryological precursor known as the coelomic mesothelium (5). For example, serous, mucinous, and endometrioid ovarian carcinomas display morphological features similar to normal epithelial cells lining the fallopian tube, endocervix, and endometrium, respectively. Presently ovarian carcinoma is managed clinically without consideration of morphology, yet there is growing clinical-pathologic and molecular evidence that the different subtypes may represent clinically, biologically, and genetically distinct disease entities.
There are several strategies that can be used to classify cancers based upon either gene or protein expression. In particular, DNA microarrays have been used to characterize global gene expression patterns of cancer samples (1, 4). This technology has enabled the study of comprehensive gene expression profiles of large numbers of tumor samples that can be used to classify cancers based upon characteristic gene expression patterns. In recent work, for example, Schaner et al. (4) used DNA microarrays to identify groups of genes that could distinguish ovarian from breast carcinomas, clear cell subtype from other ovarian carcinomas, and grades I and II from grade III serous papillary carcinomas. In related work, Perou et al. (8) were able to use gene expression signatures to define subclasses of breast cancer, and Sorlie et al. (9) were able to correlate differences in expression patterns of breast cancers with clinical outcome and identify subclasses having poor prognosis. In other work, Giordano et al. (3) were able to use gene expression profiles of adenocarcinomas of the lung, colon, and ovary to demonstrate the ability to classify tumors in an organ-specific manner. Schwartz et al. (5) used gene expression patterns to classify the different subtypes of ovarian cancer and showed that these patterns in ovarian adenocarcinomas reflect both the morphological features as well as the biological behavior. Numerous other studies have also used gene expression to classify various cancers, their subtypes, and their relationship to one another (10–12).
An alternative means of classifying different types of cancers involves profiling the protein expression of cells or serum (7, 13). The use of protein expression may be most informative for classification of cancers because mRNA and protein expression from a given gene may be discordant, and it is ultimately the protein expression that determines the function and structure of the cells. In addition, protein expression can be profiled from either tissues or serum. The traditional method for profiling large numbers of proteins from cells is 2-D¹ gel electrophoresis (14). A number of studies using large numbers of quantitative 2-D gels for tumor classification have been performed for bladder, breast, lung, prostate, and ovarian cancers (6, 7) where, in general, benign and malignant tumors were identified by proteins that were differentially expressed and tumor stage classified by marker proteins that were up- or down-regulated. Alaiya et al. (6) did extensive work on the classification of ovarian tumors where protein expression of 40 tumor samples was evaluated using 2-D gels with hierarchical cluster analysis to distinguish borderline ovarian tumors from malignant and benign tumors. Other work by this group using quantitative 2-D gel electrophoresis identified protein markers that could classify benign, borderline, and malignant tumors (15). In recent work Jones et al. (7) examined the use of laser capture microdissection of human ovarian epithelial cells in tissue specimens followed by 2-D gel electrophoresis to identify proteins that change between invasive and noninvasive ovarian cancers. These differentially expressed proteins could potentially be used to generate markers of early detection or therapeutic targets unique to the invasive cancer.
Although 2-D gel electrophoresis has been the most widely used technique for separating large numbers of proteins, there are still drawbacks that limit its utility as a general tool for profiling large numbers of samples. The 2-D gel method is generally a slow, manually intensive technique that can require several days to run and stain. Moreover the reproducibility of interlysate comparisons may be limited because of varying run conditions between gels where spots may become difficult to compare and quantitation may be limited. In addition, proteins are embedded in the gel requiring manually intensive procedures to excise the spots for further analysis by mass spectrometry.
An alternative strategy to rapidly classify large numbers of different types of cancers using protein expression profiling is 2-D liquid mapping of proteins (16, 17). This technique uses chromatofocusing as the first separation parameter followed by nonporous reversed-phase HPLC as the second dimension to orthogonally map large numbers of proteins based on their pI and hydrophobicity, respectively. Both dimensions of analysis use standard chromatography (HPLC) equipment designed to reproducibly handle large numbers of samples in the liquid phase. It also uses UV absorption for detection so that quantitative comparisons of protein expression can be performed between samples. The method has distinct advantages in automation in that all fractions are in the liquid phase, large numbers of samples can be run, and protein maps can be obtained for easy differential comparisons. Moreover because proteins elute in the liquid phase, direct interface with other methods such as mass spectrometry is readily achieved.
In this work, we demonstrated the use of an automated 2-D liquid fractionation system for the liquid phase separation and mapping of the protein expression for eight serous ovarian cancer and three ovarian surface epithelium (OSE) cell lines. Hierarchical clustering analysis was used to classify the different samples according to their protein expression profiles showing that specific types of serous carcinoma cell lines tend to cluster together. Several other cell lines, e.g. ovarian clear cell and endometrioid carcinoma and breast epithelial cell lines, were also fractionated and mapped by the 2-D liquid method, and cluster analysis was performed on a total of 18 samples. We compared our cluster analysis with results using oligonucleotide microarrays, identifying some similarities and some differences in the observed clustering patterns. We could classify different types of cancers as well as identify potential marker bands to classify different subtypes of individual cancer using this methodology. The use of this method also limits the number of potential marker bands that may need to be identified by mass spectrometry. We also demonstrated that these samples could be run reproducibly and in an automated fashion using this method.
| EXPERIMENTAL METHODS |
|---|
Chromatofocusing (CF) and Nonporous (NPS) RP-HPLC Separation of Ovarian Serous Carcinoma Cell Line Lysates
CF and NPS RP-HPLC were performed continuously using an integrated protein fractionation system, ProteomeLab™ PF 2D (Beckman Coulter, Inc., Fullerton, CA). An HPCF-1D column (250 × 2.1 mm) was used to perform chromatofocusing. Two buffers, a start buffer (SB) (Beckman Coulter, Inc.) and an elution buffer (EB) (Beckman Coulter, Inc.), were used to generate the pH gradient on the column. Both buffers were prepared in 6 M urea and 0.2% octyl glucoside. Before running the CF, the pH of SB was adjusted to 8.5 ± 0.1 and the pH of EB was adjusted to 4.0 ± 0.1 using either a saturated solution (50 mg/ml) of iminodiacetic acid (Sigma, catalog number I5629) if the buffer was too basic or 1 M NH4OH if the buffer was too acidic. A PD-10 G-25 column (Amersham Biosciences) was used to exchange the protein sample from the lysis buffer to the equilibration buffer used in the CF experiment.
The HPCF-1D column was first flushed with 100% distilled water (filtered through a 0.45-µm filter) for 10 column volumes at 0.2 ml/min and then equilibrated with 100% SB for 30 column volumes. After equilibration with SB, the HPCF column was ready to start the ProteomeLab PF 2D default method where injection of the sample began the method. After the method had been started, the column was washed with 100% SB to remove material that did not bind to the column at pH 8.5. When the wash was complete, the UV absorbance returned to base line. Once a stable base line was achieved, the method was initiated at 100% EB. UV detection was performed at 280 nm, and the pH was monitored on line by a flow-through pH probe (Beckman Coulter, Inc.). As the pH decreased, fractions were collected at 0.15 pH unit intervals, giving 30 fractions in total over the range pH 8.5–4.0. After the pH of the eluent reached 4.0, the HPCF column was washed with 10 column volumes of 1 M NaCl, and the fractions were collected by time. After the salt wash, the HPCF column was washed with 10 column volumes of distilled or deionized water. The CF portion of the method for the ProteomeLab PF 2D required around 185 min.
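The fraction count quoted above follows directly from the pH range and the collection interval. The short Python sketch below reproduces that arithmetic; the function name `ph_fraction_edges` is hypothetical and is not part of the ProteomeLab software, and the code only illustrates the bookkeeping, not the instrument's fraction collection logic.

```python
def ph_fraction_edges(start_ph=8.5, end_ph=4.0, step=0.15):
    """Return (upper, lower) pH bounds for each chromatofocusing fraction."""
    # Work in integer hundredths of a pH unit to avoid floating-point drift.
    start = round(start_ph * 100)
    end = round(end_ph * 100)
    width = round(step * 100)
    edges = []
    upper = start
    while upper - width >= end:
        edges.append((upper / 100, (upper - width) / 100))
        upper -= width
    return edges


fractions = ph_fraction_edges()
print(len(fractions))  # 30
print(fractions[0], fractions[-1])
```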
When the first dimension separation was completed, the pI fractions collected from the first dimension were then automatically run on the second dimension based on the specified ProteomeLab PF 2D sequence. Proteins were resolved by reversed-phase chromatography using an HPCF-2D (4.6 × 33 mm) NPS column (Beckman Coulter, Inc.) and detected by absorbance at 214 nm using a Beckman model 166 UV absorption detector. Solvent A was 0.1% TFA in water with 0.05% n-octyl β-D-galactopyranoside, and solvent B was 0.08% TFA in acetonitrile with 0.05% n-octyl β-D-galactopyranoside. The gradient was run from 15 to 25% B in 1 min, 25–35% in 6 min, 35–38% in 4 min, 38–45% in 6 min, 45–65% in 2 min, 65–67% in 6 min, finally up to 100% in 1 min, and then back to 5% in 1 min. After the gradient, the column was washed by two fast gradients from 5% B to 100% B in 5 min and 100% B back to 5% B in 1 min. At the end of each second dimension run, the method equilibrated the column with an initial mobile phase (A) for 10 column volumes. The flow rate used was 0.75 ml/min, and the column temperature was 65 °C. Proteins were collected for further analysis using an automated fraction collector. The method automatically saved the raw UV absorbance data for each second dimension analysis of the chromatofocusing fractions for protein mapping and data analysis using ProteoVue™ in the PF 2D Software Suite.
Software
The data from the 2-D liquid separations are displayed using ProteoVue and DeltaVue software available in the PF 2D Software Suite (Beckman Coulter, Inc.). The chromatographic UV intensities resulting from the NPS HPLC second dimension separation of each pI fraction were converted and displayed in a 2-D "lane and band" format by the ProteoVue software, resulting in a highly detailed pI versus hydrophobicity protein expression map. ProteoVue allows comparison of multiple or all second dimension runs for one sample in a 2-D map using either gray scale or a color-coded format where color hue or its intensity is proportional to the relative quantitative UV intensity of each peak. Relationships or patterns within a complex chromatographic data set can be easily viewed in this format. The DeltaVue software allows side-by-side viewing of the second dimension runs for two samples or two groups of samples so that differences in protein expression between them can be compared. This software quantitatively displays one protein map in shades of red and the other map in shades of green. The difference between the two maps is obtained by point-by-point subtraction or by area difference and displayed as a third map in the middle. The color (red or green) at a particular location in the difference map indicates the sample in which the protein is more abundant, and the color brightness indicates the quantitative difference. The program also provides a means to obtain a quantitative value for the difference between the expression levels of a protein in the two samples.
Data Analysis and Clustering
Data Standardization
The raw UV data for each sample were standardized to remove differences in the level and slope of the base line. To do this, for each point, the 10th percentile within a window of ±50 measurements was calculated and subtracted from the point. Then negative values were replaced with zero, 0.0001 was added to all values, and the data were log-transformed.
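The standardization steps can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `standardize`, the nearest-rank percentile rule, and the truncation of the window at the ends of the trace are assumptions, because the text does not specify them. The natural logarithm is used, consistent with the description in the Results.

```python
import math


def standardize(trace, half_window=50, pct=10, offset=1e-4):
    """Baseline-correct and log-transform one UV trace (a list of floats)."""
    n = len(trace)
    out = []
    for i, value in enumerate(trace):
        # Window of +/- half_window points, truncated at the ends of the trace
        # (edge handling is an assumption).
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = sorted(trace[lo:hi])
        # Nearest-rank percentile (an assumption; the source names no method).
        rank = max(0, math.ceil(pct / 100 * len(window)) - 1)
        baseline = window[rank]
        corrected = max(value - baseline, 0.0)    # negative values become zero
        out.append(math.log(corrected + offset))  # natural log after the offset
    return out
```

With a flat trace every output value equals log(0.0001), and a peak that rises 10 units above a flat base line maps to log(10.0001).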
Alignment
Standardized UV data for each pair of samples were aligned to maximize the local correlation coefficients between the aligned samples. Specifically suppose samples A and B are to be aligned. An alignment is defined by a sequence of index pairs $(t_a(1), t_b(1)), (t_a(2), t_b(2)), \ldots$ such that $(t_a(k+1) - t_a(k),\ t_b(k+1) - t_b(k))$ is equal to $(1,0)$, $(1,1)$, or $(0,1)$. That is, at each step either the A sequence advances by one index, the B sequence advances by one index, or both sequences advance by one index. At the initial point either $t_a(1)$ or $t_b(1)$ is equal to 1, and at the final point either $t_a$ or $t_b$ is equal to the length of the data sequence (corresponding to the greatest measured hydrophobicity value).
To evaluate alignment quality, for each pair of indices $t_a$ in sample A and $t_b$ in sample B such that $|t_a - t_b| < 150$, the correlation coefficient between the data values $A(t_a - 75), \ldots, A(t_a + 75)$ and $B(t_b - 75), \ldots, B(t_b + 75)$ was calculated. The goal is to maximize the sum of local correlation coefficients over all possible $(t_a, t_b)$ sequences that always remain within 150 units of each other. This problem can be efficiently solved using dynamic programming techniques.
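One way to set up such a dynamic program is sketched below in Python. It is an illustration under stated assumptions and not the authors' implementation: the names `local_corr` and `align` are hypothetical, indices are 0-based, windows are truncated at the ends of the traces, and a window with zero variance is given a correlation of 0. The window half-width and band width are parameters so that the sketch can be tried on short sequences; the defaults match the values of 75 and 150 given in the text.

```python
import math


def local_corr(a, b, ta, tb, half):
    """Pearson correlation of a[ta-half..ta+half] with b[tb-half..tb+half]."""
    xs, ys = [], []
    for k in range(-half, half + 1):
        if 0 <= ta + k < len(a) and 0 <= tb + k < len(b):
            xs.append(a[ta + k])
            ys.append(b[tb + k])
    if len(xs) < 2:
        return 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # flat window: correlation undefined, treated as 0 here
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def align(a, b, half=75, band=150):
    """Return the index path [(ta, tb), ...] with maximal summed local correlation."""
    n, m = len(a), len(b)
    score = {}  # (i, j) -> best total score of a path ending at (i, j)
    back = {}   # (i, j) -> predecessor cell, or None where a path starts
    for i in range(n):
        # Only cells with |i - j| < band are considered.
        for j in range(max(0, i - band + 1), min(m, i + band)):
            best, prev = float("-inf"), None
            if i == 0 or j == 0:
                best = 0.0  # a path may start anywhere on the first row or column
            for di, dj in ((1, 0), (0, 1), (1, 1)):
                p = (i - di, j - dj)
                if p in score and score[p] > best:
                    best, prev = score[p], p
            score[(i, j)] = best + local_corr(a, b, i, j, half)
            back[(i, j)] = prev
    # A path may end once either sequence reaches its last index.
    ends = [c for c in score if c[0] == n - 1 or c[1] == m - 1]
    cell = max(ends, key=score.get)
    path = []
    while cell is not None:
        path.append(cell)
        cell = back[cell]
    return path[::-1]
```

For two traces of equal length `align(a, b)` returns a monotone path that starts on the first index of one trace and ends on the last index of one trace. Because the objective is a sum over visited index pairs, the sketch computes each local correlation on demand; a practical version would cache or vectorize that step.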
Comparisons
To compare the overall pattern of protein expression in the samples, each pair was aligned separately as described above, and then a correlation matrix was formed by calculating the Pearson correlation coefficient between each aligned pair of samples. These correlation matrices were then visualized using a hierarchical clustering technique. The hierarchical clustering technique produces a dendrogram in which pairs of points are joined sooner (i.e. closer to the ends of the dendrogram) if they have greater correlation. Complete linkage clustering was used to define the dendrograms.
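The clustering step can be illustrated with a small pure-Python sketch of complete linkage agglomeration on a correlation matrix. The function name `complete_linkage` and the use of 1 minus the correlation as the distance are assumptions made for illustration; the text states only that more highly correlated samples are joined sooner and that complete linkage was used.

```python
def complete_linkage(corr):
    """Agglomerative clustering of samples from a square correlation matrix.

    Returns a list of merges (members_1, members_2, distance) in the order
    in which they occur; earlier merges join more highly correlated samples.
    """
    clusters = {i: [i] for i in range(len(corr))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                p, q = keys[x], keys[y]
                # Complete linkage: the distance between two clusters is the
                # largest pairwise distance, with distance = 1 - correlation.
                d = max(1 - corr[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merges.append((sorted(clusters[p]), sorted(clusters[q]), d))
        clusters[p] = clusters[p] + clusters.pop(q)
    return merges
```

The merge list contains the same information as a dendrogram: the first entries correspond to joins closest to the leaves, and the recorded distance gives the height of each join.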
Biomarker Identification
To identify potential biomarkers, all samples were first aligned to a single sample selected as the standard. Then comparisons were made separately at each hydrophobicity level between two groups of samples. Values were selected if the ratio between mean levels within the two groups exceeded 4 and if the t test p value between the two groups was less than 0.05. In addition, at least 25 consecutive hydrophobicity levels were required to meet these conditions for the band to be considered as a biomarker.
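The selection rule can be sketched as follows. This is an illustration, not the authors' code: the function name `marker_runs` is hypothetical, intensities are assumed to be non-negative values on the original (not log) scale, the ratio is tested in either direction, and the t test p values are assumed to have been computed elsewhere and passed in as a list with one entry per hydrophobicity level.

```python
def marker_runs(group1, group2, p_values, min_ratio=4.0, alpha=0.05, min_run=25):
    """Find candidate marker bands between two groups of aligned samples.

    group1, group2: lists of samples, each a list of intensities per level.
    p_values: one t test p value per hydrophobicity level (computed elsewhere).
    Returns (start, end) index ranges, end exclusive, of qualifying runs.
    """
    passes = []
    for t, p in enumerate(p_values):
        m1 = sum(s[t] for s in group1) / len(group1)
        m2 = sum(s[t] for s in group2) / len(group2)
        hi, lo = max(m1, m2), min(m1, m2)
        # Ratio of group means above min_ratio, written without a division
        # so that a zero mean in one group does not raise an error.
        passes.append(hi > min_ratio * lo and p < alpha)
    runs, start = [], None
    for t, ok in enumerate(passes + [False]):  # the sentinel closes a trailing run
        if ok and start is None:
            start = t
        elif not ok and start is not None:
            if t - start >= min_run:
                runs.append((start, t))
            start = None
    return runs
```

Requiring a run of consecutive levels means that an isolated point exceeding the ratio threshold is not reported; only bands with some width along the hydrophobicity axis are kept.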
| RESULTS AND DISCUSSION |
|---|
Three MCF10 breast epithelial cell lines (MCF10AT1, MCF10CA1a.cl1 (CA1a), and MCF10CA1d.cl1 (CA1d)) were also mapped and compared with the ovarian carcinoma cell lines. The MCF10A cell line originated from a patient with fibrocystic disease (24). MCF10AT1 cells are MCF10A cells transformed with a mutant c-Ha-Ras protein (T24). This line forms preneoplastic lesions in nude mice that represent a premalignant stage with potential for neoplastic progression. CA1a and CA1d were derived from xenografts of MCF10AT1. When subcultured and rexenografted, these lines rapidly form invasive carcinomas with metastatic potential and display histologic variations ranging from undifferentiated carcinomas to well differentiated adenocarcinomas (25).
Analysis of Ovarian Cancer Proteomes
The primary objective of this study was to classify large numbers of samples by mapping and comparing 2-D liquid maps generated with UV detection so that we can obtain basic information on the similarities and differences among ovarian cancer cell lines. OSE and serous carcinoma-derived cell lines were fractionated using chromatofocusing at 0.15 pI intervals. Each of these fractions was automatically collected and continuously injected into the second dimension column using NPS RP-HPLC. The result is a virtual 2-D UV map profile that displays the pI versus hydrophobicity of the protein expression for the whole cell lysate. Protein detection is performed using UV absorption at 280 nm in the first dimension and at 214 nm in the second dimension. As an example, the profile of serous carcinoma HOC-1 is shown in Fig. 1. In Fig. 1, 4.0 mg or 2.0 × 10⁷ cells were loaded onto the first dimension chromatofocusing column. In this figure, each lane corresponds to a different pI value, and the bands correspond to the hydrophobicity as generated by the percentage of acetonitrile on the HPLC gradient at that pI. For each map, a total of 28 pI fractions have been mapped that correspond to a pH range of 4.11–8.32. The more acidic and basic fractions are not shown here because few proteins were detected. Many of the RP-HPLC fractions obtained after chromatofocusing contained as many as 60–100 proteins. As a representative example the different proteins separated by NPS RP-HPLC from the pI fraction 4.85–5.00 of sample DOV13 are shown in Fig. 2. It is estimated that each sample was fractionated into over 1500 protein bands using the 2-D liquid mapping method.
The peak patterns for each pI fraction of each cell line were aligned using a dynamic programming technique. The samples were first transformed to make them suitable for statistical analysis. The transform subtracts the base line of the curve so that a flat base line at zero is obtained; the natural log of each point is then taken. A difference of 1 unit on the y axis corresponds to a roughly 2.7-fold difference in the UV readout. Two original data sets of samples HEY and PEO1, fraction pH 4.55–4.70, gradient range from 41.5 to 49.5%, are shown in Fig. 4A. The index of data pairs aligned together is used as x axis units, and the converted intensity of each peak as described above is used as y axis units. The data sets that were aligned using this method are shown in Fig. 4B. It is seen that the two traces are not well aligned originally, but after stretching of the profile, the alignment is much improved. The chromatographic profiles are properly aligned to compensate for minor drifts in retention times. These small retention time shifts may be due to changes in the columns during use, minor changes in mobile phase composition, drift in the instrument, interaction between analytes, etc. This alignment technique compensates for these drifts and allows comparisons of protein bands in different cell lines for large numbers of samples.
To quantify differences between the immortalized OSE lines (IOSE-144 and HOSE-A), ovarian carcinoma-derived lines (ES-2, MDAH-2774, and TOV-112D), and breast epithelial (AT1, CA1a, and CA1d) cell line groups, the pairwise correlation coefficients were calculated using normalized data between 24 pI ranges that were aligned separately for each pair of samples, and the Pearson correlation coefficients were calculated between the resulting aligned values. These were averaged for all distinct sample pairs within each of these three groups to produce a single within-group correlation coefficient for each group. Similarly for every pair among these groups, correlation coefficients for every pair of samples spanning the two groups were averaged to produce a single between-group average correlation coefficient. These within-group and between-group correlation coefficients were prepared separately for each fraction and also averaged across the fractions to provide an overall summary. It is interesting that the actual correlation numbers are such that correlation between the breast and ovarian clusters is greater than that of the ovarian carcinoma and OSE clusters, which in turn is greater than that of the breast and OSE clusters as represented in the dendrograms. It would be expected that the breast cancer and OSE clusters would have the lowest correlation.
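The within-group and between-group averaging can be written compactly. The Python sketch below is illustrative only: the function name `average_correlations`, the nested-dictionary layout of the correlation table, and the sample names and values in the usage lines are all hypothetical and are not data from this study.

```python
from itertools import combinations


def average_correlations(corr, groups):
    """Average pairwise correlations within and between groups of samples.

    corr: nested dict with corr[a][b] the correlation between samples a and b.
    groups: dict mapping a group name to its list of sample names
            (each group needs at least two samples).
    """
    within = {}
    for name, members in groups.items():
        pairs = list(combinations(members, 2))  # all distinct pairs in the group
        within[name] = sum(corr[a][b] for a, b in pairs) / len(pairs)
    between = {}
    for g1, g2 in combinations(groups, 2):
        # Every pair of samples spanning the two groups.
        vals = [corr[a][b] for a in groups[g1] for b in groups[g2]]
        between[(g1, g2)] = sum(vals) / len(vals)
    return within, between


# Invented values for four hypothetical samples in two groups.
corr = {
    "a1": {"a2": 0.8, "b1": 0.1, "b2": 0.2},
    "a2": {"a1": 0.8, "b1": 0.3, "b2": 0.4},
    "b1": {"a1": 0.1, "a2": 0.3, "b2": 0.6},
    "b2": {"a1": 0.2, "a2": 0.4, "b1": 0.6},
}
groups = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
within, between = average_correlations(corr, groups)
```

Applying the same function to the correlation table of each pI fraction and then averaging the results across fractions gives the overall summary described in the text.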
There is some information on the molecular relationship of these cell lines based upon gene expression profiles in prior work (4).⁴ There are some similarities in terms of the cell lines that clustered together compared with the protein mapping but also very distinct differences. IOSE-144, IOSE-80, and HOSE-A clustered together by gene expression, but IOSE-80 distinctly did not cluster with these lines in the protein maps. Based on the mRNA expression data, ES-2 clear cell line clustered with the OSE samples, whereas MDAH-2774 did not cluster with ES-2 or TOV-112D but rather with the serous carcinoma cell lines. TOV-112D appeared to cluster by itself in the gene expression arrays, whereas in the protein maps it clearly clustered with MDAH-2774 and ES-2. HEY and ES-2 were found to cluster in the gene expression data (4),⁴ but HEY clustered with PEO1 in the protein expression data; this makes sense because these are both serous carcinomas. OVCA429 and OVCA433 were closely clustered by gene expression as in the case of the protein mapping, and OVCA432 was reasonably closely linked in both cases. OVCA420 was not closely linked to the OSE cell lines as in the protein expression maps. Nevertheless it is not surprising that the gene expression and protein expression clusters provided different information because in prior work (26) it was found that the gene expression and protein expression for several cell lines had a poor correlation. Such poor correlation has been observed in other studies (27) and is due to the fact that many of the mRNA messages do not translate into proteins (28) or the proteins produced are short lived or misfolded and rapidly degraded (29). Ultimately, however, it is the protein expression that determines the function of the cell so that the relationships as determined by protein mapping will be essential for searching for distinctive markers of cancer.
An important capability of the 2-D mapping technique is the use of proteomic patterns for classification using common marker bands in the comparison of different clusters of cell lines. The peak retention time and intensity for each band can be obtained using the Beckman software. As an example the average number of peaks for 11 samples for fraction pI 7.57–7.72 is 67. Based on the protein expression map, proteins can be classified into three groups. One group of proteins is likely to be common to most cell types. For this fraction, we found that 12 proteins have the same retention time in all cell lines and are likely to be the same proteins. A second set of proteins appears to be linked to one group of cell lines only. This set may provide the basis for detection and classification of serous carcinoma and has the potential to provide identifying biomarkers. A third group of proteins appears to be expressed uniquely in each individual cell line. It is possible to hypothesize that this third group of proteins is responsible for unique aspects of cell behavior.
To identify markers of groups of serous carcinoma cell lines, standardization and alignment of bands were performed, and then comparisons were made separately at each hydrophobicity level between two groups of samples. A differentially expressed band is selected on the basis of having at least a 4-fold different mean level within the two groups of samples. In addition, at least 25 consecutive hydrophobicity levels were required to meet these conditions for the band to be considered as a marker. Fig. 7 shows peak patterns for two cluster samples: one group as OVCA429, OVCA433, and DOV13 and the other group as IOSE-80, PEO1, and HEY. The image is displayed in a format with each different sample on the x axis and hydrophobicity on the y axis. The relative intensities of the band are quantitatively proportional to the amount of corresponding protein detected by UV absorption. The three groups of bands are only observed in the group of IOSE-80, PEO1, and HEY but not in the group of OVCA433, OVCA429, and DOV13. The use of differential analysis allows us to identify proteins that may be common bands for classification and limits the number of proteins that might require further analysis by mass spectrometric techniques.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
* This work was supported in part by NCI, National Institutes of Health Grants R01CA10010 (to D. M. L. and K. R. C.) and R01CA90503 (to D. M. L.) and National Institutes of Health Grant R01GM49500 (to D. M. L.). Support was also generously provided by Eprogen, Inc.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Published, MCP Papers in Press, September 2, 2005, DOI 10.1074/mcp.T500023-MCP200
1 The abbreviations used are: 2-D, two-dimensional; OSE, ovarian surface epithelial; RP, reversed-phase; CF, chromatofocusing; NPS, nonporous; SB, start buffer; EB, elute buffer; HPCF, high performance chromatofocusing.
3 D. M. Lubman and H. Kim, unpublished results.
4 K. R. Cho, K. A. Shedden, D. R. Schwartz, and R. Wu, unpublished results.

To whom correspondence should be addressed: Dept. of Chemistry, The University of Michigan, Ann Arbor, MI 48109-1055. Tel.: 734-764-1669; Fax: 734-615-8108; E-mail: dmlubman{at}umich.edu
| REFERENCES |
|---|