Discordant Protein and mRNA Expression in Lung Adenocarcinomas

The relationship between gene expression measured at the mRNA level and the corresponding protein level is not well characterized in human cancer. In this study, we compared mRNA and protein expression for a cohort of genes in the same lung adenocarcinomas. The abundance of 165 protein spots representing 98 individual genes was analyzed in 76 lung adenocarcinomas and nine non-neoplastic lung tissues using two-dimensional polyacrylamide gel electrophoresis. Specific polypeptides were identified using matrix-assisted laser desorption/ionization mass spectrometry. For the same 85 samples, mRNA levels were determined using oligonucleotide microarrays, allowing a comparative analysis of mRNA and protein expression among the 165 protein spots. Twenty-eight of the 165 protein spots (17%) or 21 of 98 genes (21.4%) had a statistically significant correlation between protein and mRNA expression (r > 0.2445; p < 0.05); however, among all 165 proteins the correlation coefficient values (r) ranged from −0.467 to 0.442. Correlation coefficient values were not related to protein abundance. Further, no significant correlation between mRNA and protein expression was found (r = −0.025) if the average levels of mRNA or protein among all samples were applied across the 165 protein spots (98 genes). The mRNA/protein correlation coefficient also varied among proteins with multiple isoforms, indicating potentially separate isoform-specific mechanisms for the regulation of protein abundance. Among the 21 genes with a significant correlation between mRNA and protein, five genes differed significantly between stage I and stage III lung adenocarcinomas. Using a quantitative analysis of mRNA and protein expression within the same lung adenocarcinomas, we showed that only a subset of the proteins exhibited a significant correlation with mRNA abundance.

Lung cancer is the leading cause of cancer death for both men and women in the United States. Adenocarcinomas of the lung comprise ~40% of all new cases of non-small cell lung cancer and are now the most common histologic type. Functional genomics, broadly defined as the comprehensive analysis of genes and their products, has become a recent focus of the life sciences (1). Application of these approaches to lung adenocarcinomas has the potential to aid in the identification of high risk patients with resectable early stage lung cancer who may benefit from adjuvant therapy, as well as to identify new therapeutic targets. In human lung cancer, however, little is currently understood regarding the relationship between gene expression as determined by measuring mRNA levels and the corresponding abundance of the protein products.
A number of powerful techniques for analysis of gene expression have been used, including differential display (2), serial analysis of gene expression (3), DNA microarrays (4), and proteomics via two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and mass spectrometry (5). Bioinformatics tools have also been developed to help determine quantitative mRNA/protein expression profiles of all types of cells and tissues (6) and now can be applied to benign and malignant tumors. DNA microarrays (cDNA and oligonucleotide) permit the parallel assessment of thousands of genes and have been utilized in gene expression monitoring (7), polymorphism analysis (8), and DNA sequencing (9). Recent studies have focused on classification or identification of subgroups of lung tumors using DNA microarrays (10,11). The use of mRNA expression patterns by themselves, however, is insufficient for understanding the expression of protein products, as additional post-transcriptional mechanisms, including protein translation, post-translational modification, and degradation, may influence the level of a protein present in a given cell or tissue. Proteomic analysis, a complementary technology to DNA microarrays for monitoring gene expression, involves protein separation and quantitative assessment of protein spots using 2D-PAGE and protein identification using mass spectrometry. By combining proteomic and transcriptional analyses of the same samples, it may be possible to understand the complex mechanisms influencing protein expression in human cancer.
In this study, we determined mRNA and protein levels for 165 proteins (98 genes) in 76 lung adenocarcinomas and nine non-neoplastic lung tissues. Protein levels were determined using quantitative 2D-PAGE analysis, and the separated protein polypeptides were identified using matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). The corresponding mRNA levels for the identified proteins within the same samples were determined using oligonucleotide microarrays. Correlation analyses showed that protein abundance is likely a reflection of the transcription for a subset of proteins, but translation and post-translational modifications also appear to influence the expression levels of many individual proteins in lung adenocarcinomas.

EXPERIMENTAL PROCEDURES
Tissues-Fifty-seven stage I and 19 stage III lung adenocarcinomas, as well as nine non-neoplastic lung tissue samples, were used for protein and mRNA analyses. Patient consent was obtained, and the project was approved by the Institutional Review Board. All tissues were obtained after resection at the University of Michigan Health System between May 1991 and July 1998. Tissues were all snap-frozen in liquid nitrogen and then stored at −80°C. The patients included 46 females and 30 males ranging in age from 40.9 to 84.6 (average 63.8) years. Most patients (66/76) had a positive smoking history. Sixty-one tumor samples were classified as bronchial-derived, 14 were classified as bronchoalveolar, and one had both features. Eighteen tumor samples were classified as well differentiated, 38 were classified as moderately differentiated, and 19 were classified as poorly differentiated adenocarcinomas. Hematoxylin-stained cryostat sections (5 μm), prepared from the same tumor pieces to be utilized for protein and mRNA isolation, were evaluated by a pathologist and compared with hematoxylin- and eosin-stained sections made from paraffin blocks of the same tumors. Specimens were excluded from analysis if they showed unclear or mixed histology (e.g. adenosquamous), tumor cellularity less than 70%, potential metastatic origin as indicated by previous tumor history, extensive lymphocytic infiltration, or fibrosis or if the patient had received prior chemotherapy or radiotherapy.
Oligonucleotide Array Hybridization-The HuGeneFL oligonucleotide arrays (Affymetrix, Santa Clara, CA) containing 6800 genes were used in this study. Total RNA was isolated from all samples using Trizol reagent (Invitrogen). The resulting RNA was then subjected to further purification using RNeasy spin columns (Qiagen). Preparation of cRNA, hybridization, and scanning of the HuGeneFL arrays were performed according to the manufacturer's protocol (Affymetrix, Santa Clara, CA). Data analysis was performed using GeneChip 4.0 software. The gene expression profile of each tumor was normalized to the median gene expression profile for the entire sample. Details of data trimming and normalization are described elsewhere (11).
2D-PAGE and Quantitative Protein Analysis-Tissue for both protein and mRNA isolation came from contiguous areas of each sample. Protein separation using 2D-PAGE, silver staining, and digitization were performed as described previously (12,13). Our 2D-PAGE system allows us to run 20 gels at one time (one batch). Spot detection and quantification were accomplished utilizing Bio Image Visage System software (Bioimage Corp., Ann Arbor, MI). The integrated intensity of each spot was calculated as the measured optical density units × mm². Of the total possible 2000 spots detectable on each gel, 820 spots on the gel of each sample were matched using a Gel-ed match program with the same spots on a chosen "master" gel. In each sample, 250 ubiquitously expressed reference spots were used to adjust for variations between gels, such as those created by subtle differences in protein loading or gel staining. Slight batch-related differences were corrected after spot-size quantification.
Mass Spectrometry and 2D Western Blotting-Preparative 2D gels were run using extracts from A549 lung adenocarcinoma cells (obtained from ATCC) under the same experimental conditions as the analytical 2D gels, except that 30% more protein was loaded. The resolved protein gels were silver-stained using successive incubations in 0.02% sodium thiosulfate for 2 min, 0.1% silver nitrate for 40 min, and 0.014% formaldehyde plus 2% sodium carbonate for 10 min. For protein identification, protein polypeptides underwent trypsin digestion followed by MALDI-MS using a MALDI-TOF Voyager-DE mass spectrometer (Perseptive Biosystems, Framingham, MA). The masses were compared with known trypsin digest databases using the MS-FIT database (University of California, San Francisco; prospector.ucsf.edu/ucsfhtml3.2/msfit.htm). Some of the polypeptides included in the analysis had been identified prior to this study on the basis of sequencing (14). The identified protein spots used in this paper are shown in Fig. 1A. The method for 2D-PAGE Western blot verification was as described previously (15). The 2D Western blots of GRP58 and Op18 are shown in Fig. 1, C and E; the others, such as GRP78, GRP75, HSP70, HSC70, KRT8, KRT18, KRT19, Vimentin, ApoJ, 14-3-3, Annexin I, Annexin II, PGP9.5, DJ-1, GST-pi, and PGAM, are described elsewhere (Chen et al., submitted for publication).
Statistical Analysis-Missing values were replaced with the mean value of the protein spot. The transform $x \to \log(1 + x)$ was applied to normalize all protein expression values. The relationship between protein and mRNA expression levels within the same samples was examined using the Spearman correlation coefficient analysis (16). To identify potentially significant correlations between gene and protein expression, we used an analytical strategy similar to SAM (significance analysis of microarrays) (17), which uses a permutation technique to determine the significance of changes in gene expression between different biological states.
To obtain permuted correlation coefficients between gene and protein expression, genes were first exchanged so that the permuted correlation coefficients were calculated from pseudo pairs of genes and proteins. The distribution of permuted correlation coefficients became stable after 60 permutations. This procedure was then repeated 60 times to obtain 60 sets of permuted correlation coefficients. For each of the 60 permutations, the correlations of genes and proteins were ranked such that $r_p(i)$ denotes the $i$th largest correlation coefficient for the $p$th permutation. Hence, the expected correlation coefficient, $r_E(i)$, was the average over the 60 permutations, $r_E(i) = \sum_{p=1}^{60} r_p(i)/60$. A scatter plot of the observed correlations, $r(i)$, versus the expected correlations is shown in Fig. 2D. For this study, we chose the threshold $\Delta = 0.115$, so that a correlation was considered significant if the absolute value of the difference between $r(i)$ and $r_E(i)$ was greater than the threshold. Twenty-nine (including one with an observed correlation coefficient of −0.4672) of 165 pairs of gene and protein expression were called significant under this criterion, and the permuted data generated an average of 5.1 falsely significant pairs of gene and protein expression. This provided an estimated false discovery rate (the percentage of pairs of gene and protein expression identified by chance) for our data set.
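The permutation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`preprocess`, `expected_correlations`, `call_significant`), the use of `None` for missing values, and the way genes are shuffled to form pseudo pairs are all hypothetical choices made for the example, and the demonstration data are synthetic.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def preprocess(spot):
    """Replace missing values (None) with the spot mean, then apply log(1 + x)."""
    present = [v for v in spot if v is not None]
    mean = sum(present) / len(present)
    return [math.log1p(mean if v is None else v) for v in spot]


def expected_correlations(proteins, mrnas, n_perm=60, seed=0):
    """Average the ranked coefficients of pseudo pairs over n_perm permutations."""
    rng = random.Random(seed)
    n = len(proteins)
    totals = [0.0] * n
    for _ in range(n_perm):
        shuffled = list(range(n))
        rng.shuffle(shuffled)  # assumption: genes are exchanged by a random shuffle
        perm = sorted(
            (spearman(proteins[i], mrnas[shuffled[i]]) for i in range(n)),
            reverse=True,
        )
        totals = [t + r for t, r in zip(totals, perm)]
    return [t / n_perm for t in totals]


def call_significant(proteins, mrnas, delta=0.115, n_perm=60, seed=0):
    """Return (pair_index, r) where |observed - expected| exceeds delta."""
    observed = sorted(
        ((spearman(p, m), i) for i, (p, m) in enumerate(zip(proteins, mrnas))),
        reverse=True,
    )
    expected = expected_correlations(proteins, mrnas, n_perm, seed)
    return [(i, r) for (r, i), e in zip(observed, expected) if abs(r - e) > delta]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: 20 pairs x 30 samples; only the first three pairs are coupled.
    proteins = [[rng.random() for _ in range(30)] for _ in range(20)]
    mrnas = [row[:] if i < 3 else [rng.random() for _ in range(30)]
             for i, row in enumerate(proteins)]
    proteins = [preprocess(row) for row in proteins]
    print(sorted(i for i, _ in call_significant(proteins, mrnas)))
    # Share of calls attributable to chance, from the figures quoted in the text.
    print(round(5.1 / 29, 3))
```

Because the Spearman coefficient depends only on ranks, the log transform in this sketch does not change the coefficients themselves; it is kept only to mirror the preprocessing order given in the text.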

RESULTS
Correlation of Individual Proteins and mRNA Expression within Each Tumor-We have examined quantitatively 165 protein spots on 2D gels representing 98 genes and compared protein levels with mRNA levels for a cohort of 85 lung adenocarcinomas and normal lung samples. Of the 165 protein spots, 69 proteins were represented by only one known spot on 2D gels for an individual gene, whereas 96 protein spots showed multiple protein products from 29 different genes. 2D Western blotting verified the proteins identified by mass spectrometry when specific antibodies were available. Spearman correlation coefficients of the proteins and their associated mRNA for each protein spot were generated using all 76 lung adenocarcinomas and nine non-neoplastic lung tissues (see Tables I and II, and see Figs. 1 and 2). The correlation coefficients (r) ranged from −0.467 to 0.442 (Fig. 2D). A total of 28 protein spots (21 genes) were found to have a statistically significant correlation between expression of the protein and its mRNA. Among the 69 genes represented by a single known protein spot (Table I), nine genes (9/69, 13%) were observed to show a statistically significant relationship between protein and mRNA abundance (r > 0.2445; p < 0.05). The proteins whose expression levels were correlated with their mRNA abundance included those involved in signal transduction, carbohydrate metabolism, apoptosis, protein post-translational modification, structural proteins, and heat shock proteins (Table III). Three of the four Op18 isoforms (spots 1492, 1493, and 1494) showed a statistically significant correlation between their protein and mRNA abundance (r = 0.3234, 0.3154, and 0.4003, respectively). The fourth isoform (spot 1488) showed no correlation between protein and mRNA expression (r = 0.0495). Similarly, just one of five quantified isoforms of cytokeratin 8 (spot 439) demonstrated a statistically significant correlation between protein and mRNA abundance (r = 0.3049; p < 0.05) (Table II).
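The isoform-level comparison amounts to correlating each protein spot separately against the single mRNA measurement of the gene it was assigned to. The following Python sketch illustrates that bookkeeping under assumptions: the dictionary layout, the function name `isoform_correlations`, and the labels `GENE_A`, `spot_1`, and `spot_2` are hypothetical, and the five-sample values are synthetic rather than taken from the study.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def isoform_correlations(spots_by_gene, protein, mrna):
    """Correlate every spot with the mRNA profile of its assigned gene."""
    return {
        gene: {spot: spearman(protein[spot], mrna[gene]) for spot in spots}
        for gene, spots in spots_by_gene.items()
    }


# Synthetic example: one gene, two isoform spots, five samples.
spots_by_gene = {"GENE_A": ["spot_1", "spot_2"]}
mrna = {"GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0]}
protein = {
    "spot_1": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with the transcript
    "spot_2": [2.0, 5.0, 3.0, 1.0, 4.0],   # no monotone relation to the transcript
}
result = isoform_correlations(spots_by_gene, protein, mrna)
```

In this toy case the two spots share one transcript profile yet yield different coefficients, which is the pattern reported for the isoforms discussed above.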
In addition to differences in the relationship between mRNA levels and protein expression among separate isoforms, some genes with very comparable mRNA levels showed a 24-fold difference in their protein expression. Genes with comparable protein expression levels also showed up to a 28-fold variance in their mRNA levels.

Lack of Correlation for mRNA and Protein Expression when Using Average Tumor Values across All 165 Protein Spots (98 Genes)-
The relationship between mRNA and protein expression was also examined by using the average expression values for all samples. For this analysis, the average value for each protein or mRNA was generated using all 85 lung tissue samples. The normalized average protein values ranged from −0.0646 to 0.0979 (raw values 0.0036 to 4.1947), and the mRNA values ranged from 0 to 15260.5 for all 165 individual protein spots. The Spearman correlation coefficient for the whole data set (165 protein spots/98 genes) was −0.025 (Fig. 3A). Even for the 28 protein spots (Fig. 2D) that were found to have a statistically significant correlation between their mRNA and protein, use of the average value resulted in a correlation coefficient value of −0.035, which was not significant (Fig. 3B).
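The difference between the within-spot analysis and this average-level analysis can be made concrete with a small Python sketch. It is an illustration on synthetic numbers, assuming a spots-by-samples list-of-lists layout; the function names `mean_profile` and `average_level_correlation` are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def mean_profile(matrix):
    """Average each row (one spot or gene) over all samples."""
    return [sum(row) / len(row) for row in matrix]


def average_level_correlation(protein, mrna):
    """Correlate per-spot average protein with per-spot average mRNA."""
    return spearman(mean_profile(protein), mean_profile(mrna))


# Synthetic data: rows are spots, columns are samples.
protein = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
mrna = [[300, 600, 900], [30, 60, 90], [3, 6, 9]]

within_spot = [spearman(p, m) for p, m in zip(protein, mrna)]
across_spots = average_level_correlation(protein, mrna)
```

In this constructed example every spot tracks its own transcript across samples, yet the averages are inversely ordered across spots, so the two analyses answer different questions and need not agree.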
Lack of a Relationship between Protein/mRNA Correlation Coefficients and Average Protein Abundance-To determine whether an absolute protein level might influence the correlation with mRNA, the mean value of each protein (relative abundance) and the Spearman protein/mRNA correlation coefficients among all 85 samples were examined. No relationship between the protein abundance and the correlation coefficients was observed (r = 0.039; p > 0.05). A detailed analysis of separate subsets of proteins with differing levels of abundance (less than −0.0014, larger than −0.0014, or larger than 0.0077) also showed a lack of correlation between mRNA and protein expression among the 83 (50%), 82 (50%), and 41 (25%) of 165 total protein spots, respectively (r = 0.016, 0.08, and 0.172, respectively).
Stage-related Changes in the Protein/mRNA Correlation Coefficients-To determine whether the 21 genes (28 protein spots) showing a significant correlation between the protein and mRNA expression among all samples demonstrate changes in this relationship during tumor progression, the correlations were examined separately for stage I (n = 57) and stage III (n = 19) lung adenocarcinomas (Table III). The number of non-neoplastic lung samples (n = 9) was insufficient for a separate correlation analysis of this group. Many of the protein spots represent one of several known protein isoforms for a given gene. The majority of genes (16/21) did not differ in the protein/mRNA correlation between stage I and stage III tumors, indicating a similar regulatory relationship between the mRNA and protein spot. GRP-58, PSMC, SOD1, TPI1, and VIM, however, were found to demonstrate significant differences in the correlation coefficients between stage I and stage III lung adenocarcinomas. For GRP-58, PSMC, and VIM the change in the correlation coefficient was due to a relative increase in protein expression in stage III tumors. For SOD1 and TPI1 the change resulted from a relative decrease in expression of the specific protein in stage III tumors.

DISCUSSION
[...] one or several protein products (18). Celis et al. (19) found a good correlation between transcript and protein levels among 40 well resolved, abundant proteins using a proteomic and microarray study of bladder cancer. By comparing the mRNA and protein expression levels within the same tumor samples, we found that 17% (28/165) of the protein spots (21/98 genes) show a statistically significant correlation between mRNA and protein. These proteins appear to represent a diverse group of gene products and include those involved in signal transduction, carbohydrate metabolism, protein modification, cell structure, heat shock, and apoptosis.
These results suggest that expression of this subset of the 165 proteins is likely to be regulated at the transcriptional level in these tissues. The majority of the protein isoforms, however, did not correlate with mRNA levels, and thus their expression is likely regulated by other mechanisms. We also observed a subset of proteins that demonstrated a negative correlation with the mRNA expression values; for example, α-haptoglobin demonstrated a strong negative correlation with its mRNA expression values. This may reflect negative feedback on the mRNA or the protein or the presence of other regulatory influences that are not understood currently.
Post-translational modification or processing will result in individual protein products of the same gene migrating to different locations on 2D-PAGE gels (20). Because the identity of all possible isoforms for each protein examined has not been characterized completely, this may influence the correlation analyses performed in this study. This is partly due to limitations of the 2D-PAGE and mass spectrometry technologies (21,22). Potential inconsistencies between mRNA and protein correlations that have been reported may also be due to differences, even in the same gene, in the mechanisms of protein translation among different cells or as measured in different laboratories (23).
In this study, we examined 165 protein spots identified in lung adenocarcinomas. Ninety-six protein spots, representing the products of 29 genes, contained at least two protein isoforms. Nineteen of 96 protein spots, representing 12 genes, were shown to have a statistically significant correlation between their protein and mRNA expression, suggesting that the levels of these proteins reflect the transcription of the corresponding genes. Differences in protein/mRNA correlations were found among the individual isoforms of a given protein. For example, of the four OP18 isoforms, three showed a statistically significant correlation between the protein and mRNA expression levels. The lack of relationship for the one isoform, however, indicates that individual protein isoforms of the same gene product can be regulated differentially. This is not unexpected and likely reflects other post-translational mechanisms that can influence isoform abundance in tissues and cancer.
In addition to the analyses of the correlation of mRNA/protein within the same tumor samples, we also tested the global relationship between mRNA and the corresponding protein abundance across all 165 protein spots in the lung samples. A protein and mRNA average value for each gene was generated using all 85 lung tissue samples. We observed a very wide range of normalized average protein and mRNA values. The correlation coefficient generated using this average value data set was −0.025, and even for the 28 protein spots that showed a statistically significant correlation between individual mRNA and proteins, the correlation value was only −0.035. This suggests that it is not possible to predict overall protein expression levels based on average mRNA levels. [...] (25), who examined 106 genes in yeast. Both studies found a lack of correlation between mRNA and protein expression when average or overall levels were used. A good correlation was reported when the 11 most abundant proteins were examined in yeast (25), suggesting that the level of protein abundance may be a factor that influences the correlation between mRNA and protein. In the present study, a fairly wide range of mean protein values among 165 protein spots in lung adenocarcinomas was observed, and the correlation coefficients also varied from −0.467 to 0.442.
A comparison between the mean value of each protein and the correlation coefficient generated using all 85 tissue samples did not reveal a strong relationship between the overall protein abundance and the correlation coefficients (r = 0.039; p > 0.05). Detailed analysis of different subsets of protein abundance also failed to show a correlation between mRNA and protein expression. Thus, in contrast to yeast, a relationship between mRNA/protein correlation coefficient and protein abundance in human lung adenocarcinomas was not observed.
The results of this study indicate that the level of protein abundance in lung adenocarcinomas is associated with the corresponding levels of mRNA in 17% (28 proteins) of the total 165 protein spots examined. This was substantially higher than the amount predicted to result by chance alone (which was 5.1) and suggests that a transcriptional mechanism likely underlies the abundance of these proteins in lung adenocarcinomas. We also demonstrate that the expression of individual isoforms of the same protein may or may not correlate with the mRNA, indicating that separate and likely post-translational mechanisms account for the regulation of isoform abundance. These mechanisms may also account for the differences in the correlation coefficients observed between stage I and stage III tumors, indicating that specific protein isoforms show regulatory changes during tumor progression. Further studies in lung adenocarcinomas will examine the relationship between the expression of individual protein isoforms and specific clinical-pathological features of these tumors, such as the presence of angiolymphatic invasion, and nodal or pleural surface involvement. The potential to identify specific protein isoforms associated with biological behavior in lung adenocarcinomas would be of considerable interest and will add to our understanding of the regulation of gene products by transcriptional, translational, and post-translational mechanisms.