Gene Expression Analyzed by High-resolution State Array Analysis and Quantitative Proteomics

The transcriptome provides the database from which a cell assembles its collection of proteins. Translation of individual mRNA species into their encoded proteins is regulated, producing discrepancies between mRNA and protein levels. Using a new modeling approach to data analysis, a striking diversity is revealed in association of the transcriptome with the translational machinery. Each mRNA has its own pattern of ribosome loading, a circumstance that provides an extraordinary dynamic range of regulation, above and beyond actual transcript levels. Using this approach together with quantitative proteomics, we explored the immediate changes in gene expression in response to activation of a mitogen-activated protein kinase pathway in yeast by mating pheromone. Interestingly, in 26% of those transcripts where the predicted protein synthesis rate changed by at least 3-fold, more than half of these changes resulted from altered translational efficiencies. These observations underscore that analysis of transcript level, albeit extremely important, is insufficient by itself to describe completely the phenotypes of cells under different conditions.

Genome-wide analysis of gene expression generally involves quantitative surveys of the transcriptome by such powerful technologies as interrogation of microarrays or serial analysis of gene expression (SAGE), which provide predictors of physiological state or cellular phenotype. Indeed, genome-scale transcript analysis has produced clear successes, e.g. discovery of co-regulated networks of transcripts in yeast (1-5), intracellular locations of mRNAs (6,7), and clinically valuable phenotypes of cancer cells (8).
Despite the powerful information obtained, documentation of a transcriptome provides only the inventory that is available for a cell to draw upon for translation under certain physiological conditions. Proteins are the effectors of cell phenotype, and their levels and activities do not necessarily correlate with mRNA levels (9-12). One source of this lack of correlation is discrepancies in protein half-lives. A second arises from the fact that the synthesis of individual protein species is regulated, not only by transcript level, but by cis elements that confer unique translational properties on individual mRNA molecules (13). In light of this latter point, it would be valuable to supplement mRNA expression patterns with estimates of translation efficiencies of individual transcripts. Translation occurs on ribosomes, and variable numbers of ribosomes are loaded onto actively translated mRNAs, forming polysomes of various sizes. The density of ribosome packing on transcripts is proportional to the rate of synthesis of the protein products (14,15). Several groups have carried out polysome fractionation prior to transcript analysis (16-20), but in these approaches fractions from sucrose-gradient centrifugation were combined into pools before transcript analysis, thereby losing the rich information associated with the mRNA distributions across these fractions. In the present study, we undertook a "high resolution" analysis of transcript distributions across these polysome profiles, defining ribosome-loading parameters for each detectable mRNA species. Selective, quantitatively significant changes in translation of the yeast transcriptome were found in response to the mating pheromone α-factor. Groups of functionally related genes were found to be co-regulated by a combination of altered transcript levels and translational efficiencies; these observations were supplemented by proteome analysis.
Polysome Fractionation and RNA Isolation-Polysomes were prepared and fractionated by modification of a previously described method (21,22) that employed detergent extraction of cell lysates and centrifugation through sucrose gradients in high salt to disrupt inactive 80S monomers (see supplemental materials).
After centrifugation (see Fig. 2a), fractions from identical gradients were pooled on ice and adjusted to 0.5% SDS after addition of three artificial mRNAs to monitor recovery and integrity after purification (Universal ScoreCard control poly(A)+ RNAs; Amersham Biosciences, Sunnyvale, CA). These "utility control" RNAs were added at 0.075, 0.75, or 15 pg per pooled fraction before precipitation of RNA at −70°C with 2.5 volumes of 100% ethanol. RNA samples from each of the 25 pooled fractions were purified using Qiagen RNeasy midikits, and the eluted RNA was precipitated with 1/10 volume of 10 M LiCl, resuspended in 25 µl of RNase-free H2O, quantitated by absorbance at 260 nm, and stored at −70°C.
For total unfractionated RNA, 600 ml of cells from the same culture as the polysome preparation were pelleted, washed with ice-cold RNase-free water, quick frozen, then subsequently resuspended in lysis buffer, processed to the clarified lysate stage, and ethanol-precipitated, as above. The resulting pellet was resuspended in water, and total RNA was isolated using Qiagen midi-columns followed by LiCl precipitation. The RNA samples were pooled in a final volume of 1 ml of RNase-free H2O and the A260 was measured.
Specific transcripts were quantitated with quantitative real-time PCR (Q-PCR) using an iCycler (Bio-Rad, Hercules, CA) and SYBR Green detection of products (according to Bio-Rad's specifications). Primers for the reactions were designed to amplify regions of 75-200 nucleotides near the 3′ end of the coding region. Q-PCR data were normalized with one of the artificial utility control RNAs added before purification of RNA in the gradient fractions (see above).
Microarray Hybridization and Quantitation-For target generation, 80 µg of total RNA from the peak fraction in the polysome region of the gradient and the equivalent volume from all other fractions was converted to fluorescently labeled cDNA using Cy3-labeled dCTP (Amersham Pharmacia Biotech, Uppsala, Sweden) and primed with oligo-dT25 with a G/C/A 3′ anchor as described previously (20). For each reaction, 31 µl of the "Test spike mix" of Universal ScoreCard control RNAs was added to monitor labeling and hybridization efficiencies, as well as to generate a standard curve determining the linear range of the signal intensity. Similarly, 2 mg of unfractionated total RNA was converted to Cy5-labeled cDNA target after addition of the "Reference spike mix" of Universal ScoreCard controls. The yield of the Cy3-labeled target for the peak fraction was 52 pmol of product; for each fraction, the entire Cy3-labeled target was mixed with 31 pmol of Cy5-labeled target from unfractionated RNA, then taken to dryness and resuspended in 80 µl of hybridization buffer (50% formamide, 5× SSC, 5× Denhardt's solution, 0.1% SDS).
Custom yeast high-density microarrays spotted with PCR-amplified genomic DNAs corresponding to open reading frames (ORFs), with negative control artificial cDNAs, and with artificial cDNAs for the Universal ScoreCard control RNAs were produced by the Center for Expression Array Analysis at the University of Washington (ra.microslu.washington.edu). Each slide represents two arrays of ~6000 ORFs comprising nearly the entire yeast genome.
The hybridization mixes of Cy3- and Cy5-labeled targets were heated at 96°C for 3 min, cooled on ice for 30 s, and spun briefly. For each fraction, 40 µl was applied to each of two washed slides, followed by hybridization and washing as described previously (20).
Normalization and Analysis of Microarray Data-Initial processing of the raw data included correction based on negative control spots, calibration with internal standards, RNA recovery correction by utility controls, as well as filtering those genes of low abundance or with inconsistent Cy3:Cy5 ratios, as detailed below:
1. Correction by negative controls. All images with intensity values below those for the negative control cDNA spots were transformed to be equivalent to the negative controls.
2. Calibration. To correct for possible array-specific effects, standard curves were generated for each array from 12 replicates of each of 10 standards in the calibration controls added at different concentrations, as provided by Amersham Biosciences. By linear interpolation of signal intensity, we calculated each image's expression level relative to the absolute amounts of the artificial calibration controls on the arrays. In the plus α-factor dataset, two arrays from fraction 4 of the sucrose gradient were deleted because of poor quality.
3. Recovery correction. After calibration, the expression levels of the utility controls (added in known amounts to each gradient fraction before RNA purification) were used to determine mRNA recovery and to adjust and standardize the Cy3-labeled target population used on each array.
4. Filtering of genes with low expression values. The same amount of Cy5-labeled target made from unfractionated RNA was used for each hybridization, and the image intensity reflects the relative abundance of individual transcripts. Using a paired t test with a significance level of 5%, we filtered out those mRNAs when a majority of the 100 Cy5 intensity measurements were lower than the low boundary of the linear range of the Cy5 calibration curve.
5. Filtering of inconsistent ratios. For every gradient fraction, there are four replicate arrays and hence four repeated measurements of the Cy3:Cy5 ratios.
When Cy5 values are low, ratios for specific mRNAs could become unstable and vary radically, particularly for the first few fractions where both RNA concentrations and recoveries are lower than for later fractions. Using the assumptions that repeatedly measured ratios are largely consistent and do not vary substantially between two adjacent fractions, we adopted two filtering strategies: 1) when one of the four measurements is a significant outlier, it is filtered out in computing the averaged signal, and 2) when two out of four ratios are extremely high for fraction 1 compared with the other two ratios, the average of the two low ratio values is used. Ratios in the first fraction are expected to be generally low, as it is unlikely that authentic transcripts would sediment in this fraction under these centrifugation conditions. Also, the Cy5 expression values for fractions 2 and 5 were low compared with the other Cy5 signals across the gradient; we therefore used the averaged Cy5 values from the two neighboring fractions for these calculations.
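The first steps of this normalization can be illustrated with a minimal Python sketch. The function names are hypothetical, the clamping of values outside the standard curve is an assumption, and the outlier rule (drop the single replicate farthest from the median when it differs from the median by more than `max_fold`) is also an assumption, because the text does not state how a "significant outlier" was defined.

```python
from statistics import mean, median

def floor_to_negative_control(intensity, negative_control):
    """Step 1: raise any intensity below the negative-control level to that level."""
    return max(intensity, negative_control)

def calibrate(intensity, standard_signals, standard_amounts):
    """Step 2: linear interpolation of a signal on a standard curve.

    standard_signals must be strictly ascending. Values outside the curve
    are clamped to its end points (an assumption of this sketch).
    """
    if intensity <= standard_signals[0]:
        return standard_amounts[0]
    if intensity >= standard_signals[-1]:
        return standard_amounts[-1]
    for i in range(1, len(standard_signals)):
        if intensity <= standard_signals[i]:
            x0, x1 = standard_signals[i - 1], standard_signals[i]
            y0, y1 = standard_amounts[i - 1], standard_amounts[i]
            return y0 + (y1 - y0) * (intensity - x0) / (x1 - x0)

def average_ratio(ratios, max_fold=3.0):
    """Step 5: average replicate Cy3:Cy5 ratios, dropping at most one outlier.

    Hypothetical rule: the replicate farthest from the median is removed
    if it is more than max_fold above or below the median.
    """
    med = median(ratios)
    worst = max(ratios, key=lambda r: abs(r - med))
    if med > 0 and (worst > med * max_fold or worst < med / max_fold):
        kept = list(ratios)
        kept.remove(worst)
        return mean(kept)
    return mean(ratios)
```

For example, `average_ratio([1.0, 1.1, 0.9, 10.0])` discards the fourth replicate under this rule and averages the remaining three.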
Assumptions and Derivation of the Multiple Peak Model (MPM)-Transcripts with varying numbers of ribosomes, ranging from 0 up to an estimated 25 ribosomes, have different quantized weights, due to the large size of a ribosome relative to an mRNA. Using sucrose-gradient centrifugation and fractionation, we can separate this disparate pool of ribosome-bound mRNAs by weight, then identify and quantify each mRNA species by microarray technology. The individual mRNA profiles across the gradient provide the weight distribution for that molecule and hence insight on its translational properties.
The mathematical objective here is to model the weight distributions of all molecules. Let $Y_{jk}$ denote the abundance measurement for the $j$th transcript in the $k$th fraction. Let $f_k = k$ denote the fraction number, from 1 to 25. The fraction number is indicative of the weight of the mRNA-ribosome complex, which is denoted by $w_k$. Conceptually, experimental data of relevance may be represented as in supplemental Table 1S. The abundance measurement $Y_{jk}$ is the averaged Cy3:Cy5 ratio for the four replicate arrays. Data have been normalized following the protocol described above. Because the modeling process is the same for all individual transcripts, we drop the subscript $j$ for simplicity hereafter, unless noted otherwise.
To facilitate modeling, we make the following assumptions. First, for each mRNA, the mixture of mRNA complexes has at most four different particles: mRNP particles, monosomes, polysomes, and an uncharacterized, rapidly sedimenting component. The second assumption is that the weight $D$ of a specific particle follows a Gaussian distribution centered on the actual weight of the particle population. This distribution results, at least in part, from diffusion theory. Let $R$ take the values 1, 2, 3, and 4 to denote the four populations of particles. Under this assumption, the distribution of abundance given the particle population $R$ may be written as

$$g(D \mid R) = \frac{1}{\sqrt{2\pi\sigma_R^2}} \exp\left(-\frac{(D-\mu_R)^2}{2\sigma_R^2}\right),$$

where $\mu_R$ is the mean and $\sigma_R^2$ measures the spread of the peak. Under these two assumptions, one can derive models for the observed expression levels in the $k$th fraction interval via the following steps. First, the abundance in the $k$th fraction is proportional to the total number of particles in the interval,

$$Y_k \propto \sum_{i=1}^{N} I(w_k \le D_i < w_{k+1}) \approx N\,E[I(w_k \le D_i < w_{k+1})],$$

where the summation is over all particles, $I(w_k \le D_i < w_{k+1})$ is the indicator for the $i$th particle falling in the $k$th fraction, $N$ represents the total number of particles, and $E[I(w_k \le D_i < w_{k+1})]$ represents the expectation. Now the expectation may be computed as

$$E[I(w_k \le D_i < w_{k+1})] = \sum_{R=1}^{4} \Pr(R) \int_{w_k}^{w_{k+1}} g(D \mid R)\,dD,$$

where $\Pr(R)$ is the proportion of particles in population $R$ and $g(D \mid R)$ represents a Gaussian distribution with mean $\mu_R$ and variance $\sigma_R^2$. Resulting from the above derivation is the model for multiple peaks:

$$Y_k = \sum_{R=1}^{4} \beta_R \left[\Phi\left(\frac{w_{k+1}-\mu_R}{\sigma_R}\right) - \Phi\left(\frac{w_k-\mu_R}{\sigma_R}\right)\right],$$

where $\beta_R$ is the peak height and quantifies the abundance of each particle, and $\Phi(\cdot)$ is the cumulative distribution function of the standard Gaussian distribution. The above representation, despite being motivated statistically, is actually connected with the phenomenological theory of sedimentation processes in the ultracentrifuge (23). In fact, the mixture of Gaussian distributions, with appropriate re-parameterization, could approximate the solutions to the set of partial differential equations used to describe the consequences of sedimentation and diffusion in the ultracentrifuge.
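Applying the above model to gene expression analysis, we need to take into account mRNA baseline abundance and random fluctuation due to uncontrolled random variations. The revised model, with the subscript $j$ restored on the gene-specific parameters, may now be written as

$$Y_{jk} = \alpha_j + \sum_{R=1}^{4} \beta_{jR} \left[\Phi\left(\frac{w_{k+1}-\mu_{jR}}{\sigma_{jR}}\right) - \Phi\left(\frac{w_k-\mu_{jR}}{\sigma_{jR}}\right)\right] + \epsilon_{jk},$$

where $\alpha_j$ is the baseline abundance measurement for the $j$th transcript, and $\epsilon_{jk}$ is a random variable characterizing variations from all other sources. With the least-squares technique, we can estimate all of the gene-specific parameters. Using the estimating equation theory, we can estimate standard errors for all estimated parameters.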
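The multiple peak model can be evaluated numerically with a short Python sketch. It assumes that the fraction number stands in for the weight scale, with fraction $k$ spanning $k - 0.5$ to $k + 0.5$; the function names and all parameter values are hypothetical and chosen only for illustration. The sketch computes the model prediction and the least-squares objective, but does not perform the minimization itself.

```python
import math

def phi(z):
    """Cumulative distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mpm_predict(edges, alpha, peaks):
    """Predicted abundance per fraction under the multiple peak model.

    edges: fraction boundaries (one more entry than the number of fractions)
    alpha: baseline abundance
    peaks: list of (beta, mu, sigma) tuples, one per particle population
    """
    predicted = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        value = alpha
        for beta, mu, sigma in peaks:
            value += beta * (phi((hi - mu) / sigma) - phi((lo - mu) / sigma))
        predicted.append(value)
    return predicted

def sum_of_squares(observed, edges, alpha, peaks):
    """Least-squares objective that a fitting routine would minimize."""
    fitted = mpm_predict(edges, alpha, peaks)
    return sum((y - f) ** 2 for y, f in zip(observed, fitted))

# Hypothetical parameters: mRNP, monosome, polysome, and rapidly sedimenting peaks.
edges = [k + 0.5 for k in range(26)]  # 25 fractions, boundaries 0.5 to 25.5
peaks = [(1.0, 3.0, 0.8), (1.0, 7.0, 0.6), (8.0, 14.0, 2.0), (0.5, 23.0, 1.0)]
profile = mpm_predict(edges, 0.0, peaks)
```

In practice the parameters would be estimated by passing `sum_of_squares` to a numerical optimizer, with the peak positions constrained to the fraction ranges appropriate for each population.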
Comparison of Polysome Profiles Between Microarray Experiments-From the MPM modeling, values were calculated for the percent of each transcript in polysomes and its ribosome density, as well as the relative transcript level (determined from the Cy5 signals). These values in each of the two experiments were then used to calculate the +/− ratios (plus α-factor to minus α-factor) for transcript level, translation efficiency, and predicted protein synthetic rate for each mRNA (see Fig. 4).
ICAT Proteome Analysis-Simultaneously with harvesting cells for lysis and polysome analysis (see above), four 25-ml culture samples were harvested and the pellets immediately quick-frozen. Samples were stored at −80°C until use. Each cell pellet was briefly washed in prechilled phosphate-buffered saline containing 1 mM phenylmethylsulfonylfluoride (Sigma, St. Louis, MO) and then transferred to a 1.7-ml Eppendorf tube. For cell lysis and protein precipitation, trichloroacetic acid (Sigma) was added to a final concentration of 10%, and the sample was incubated on ice for at least 60 min prior to centrifugation at 14,000 × g for 15 min. Resulting trichloroacetic acid precipitates were washed twice with 90% acetone prior to mild desiccation in a −20°C freezer for 20 min. Dried protein samples were resuspended in isotope-coded affinity tag (ICAT) label buffer (6 M urea, 200 mM Tris, pH 8, 5 mM EDTA, 0.05% SDS). Protein concentration was measured by bicinchoninic acid assay (Pierce, Rockford, IL), and experimental and control protein samples were each adjusted to a concentration of 2 mg/ml in a volume of 0.5 ml. Cysteinyl disulfide linkages were reduced by the addition of 5 mM Tris(2-carboxyethyl)-phosphine hydrochloride (Pierce) for 60 min at 37°C. Following addition of 500 µg of either isotopically light or heavy cleavable 12C/13C ICAT reagent (Applied Biosystems, Foster City, CA), the protein solutions were incubated at 37°C for 3 h. ICAT labeling was quenched by addition of 12 mM dithiothreitol. Heavy (experimental) and light (control) ICAT-labeled peptide solutions paired for comparison were combined and diluted 10-fold with 20 mM Tris, pH 8.3, 5 mM EDTA, and 20 ng/µl sequencing-grade modified trypsin (Fisher Scientific, Pittsburgh, PA) and placed in a 37°C water bath overnight.
ICAT-labeled peptide mixtures were fractionated by strong cation exchange via high-performance liquid chromatography (HPLC) on a polysulfoethyl A column (200 × 4.6 mm, 5 µm, 300 Å; PolyLC) using an Integral HPLC instrument (PerSeptive Biosystems, Foster City, CA). Fractions (~30) with highest peptide content as indicated by A214 measurements were dried without heat under vacuum for 30 min to remove acetonitrile. Avidin chromatography was performed using manual syringe columns (Applied Biosystems) on each selected fraction according to the manufacturer's recommendations. One-dimensional reversed-phase chromatography with on-line mass spectrometry was performed generally as described (24), but employing a 2-h binary gradient from 5 to 80% acetonitrile during which each mass spectrometry (MS) scan was followed by three MS/MS scans.
Tandem MS data were analyzed by Sequest software to determine protein identity and relative quantitation (25). Statistical robustness of peptide identifications was determined using Peptide Prophet software (26,27), and relative quantitation of peptide pairs was further refined using ASAP software (28). Proteomic data were uploaded into the Institute for Systems Biology gene expression platform SBEAMS (Systems Biology Expression Analysis, which can be accessed at db.systemsbiology.net/sbeams/).

Protein Synthesis as a Function of Ribosome Density-It is generally accepted that the synthetic rate of a specific protein is related to the number of ribosomes actively translating its mRNA. To confirm this relationship, we employed two transcripts of different translational efficiencies dictated by the presence or absence of an inhibitory (20,29) upstream ORF (uORF) (Fig. 1a). Presence of the uORF strongly inhibited synthesis of His3p (Fig. 1b). After correction for mRNA levels, a 12-fold difference in translational efficiency (Protein:RNA, Fig. 1c) was found between the two transcripts. Cytosol preparations from the same cultures were fractionated by velocity sedimentation through sucrose gradients. The presence of the uORF effectively inhibited ribosome loading so that a majority of the transcript contained only one ribosome or less (Fig. 1d). In the absence of the uORF, most of the mRNA was loaded with five to seven ribosomes. These results are consistent with the accepted relationship between ribosome profiles and protein synthetic rates (see "Discussion").

FIG. 1. Correlation between protein synthetic rate from a transcript and ribosome loading. a, diagram of the HIS3-HA reporter system. Plasmids pVW05 (uORF-containing) and pVW06 (no uORF) were derived from pMHY1 and pMHY3 (20,29) by replacing the GCN4-lacZ ORF with the HIS3-HA coding sequence (the S. cerevisiae HIS3 ORF with a 3′ DNA sequence encoding a C-terminal hemagglutinin epitope). Expression is from the GCN4 promoter and both constructs have the GCN4 3′ UTR and a modified GCN4 5′ leader in which 232 nucleotides of the leader are deleted and replaced with a 33-nucleotide sequence that contains the AdoMetDC uORF or the mutated form (A to G mutation of the initiating ATG codon) (29). Deletion of UPF3 in VM1601 was necessary to prevent nonsense-mediated degradation of the mRNA, particularly with the uORF in the leader. b, scan of the pulse-labeled His3-HA protein after immunoprecipitation and electrophoretic separation on a polyacrylamide gel. Lane 1, Vector control; lanes 2 and 3, two independent isolates transformed with pVW05; lanes 4-6, three independent isolates transformed with pVW06. For labeling, the transformed cells were grown at 30°C in methionine-free minimal-glucose medium with necessary supplements. Culture samples were labeled in 0.3 ml of the same medium containing 0.5-1.0 mCi 35S-EasyTag EXPRE35S35S Protein Labeling Mix (Perkin-Elmer, Wellesley, MA) for 5 min at 30°C and processed as described (49). HA-tagged protein was immunoprecipitated with mouse monoclonal antibody HA.11 (Covance Research Products, Berkeley, CA) and adsorbed to Protein G Plus-Agarose (Oncogene Research Products, San Diego, CA). Proteins in the redissolved pellet were separated by electrophoresis on a 15% polyacrylamide gel. c, relative His3-HA protein synthesis and mRNA levels. Labeled His3-HA protein (b) was quantitated with a Storm 840 phosphorimager (Amersham Biosciences). HIS3-HA transcript levels were determined by Q-PCR and normalized to PRP8 and DED1 mRNA levels (n = 20). d, transcript profiles from sucrose gradient sedimentation of yeast transformed with the two reporter transcripts. HIS3-HA mRNA was determined by Q-PCR (in quadruplicate, see legend to Fig. 2), normalized to one of the "utility control" RNAs added to the fractions before RNA purification (see "Experimental Procedures"), and the level in each fraction expressed as percent of total HIS3-HA signal across the gradient. ●, pVW05 (with the uORF); ○, pVW06 (ATG to GTG mutation of the uORF). Assignment of polysome size (number of ribosomes per mRNA) as a function of sedimentation rate in the gradient was based on the A254 profile (see Fig. 2).

FIG. 2. Polysome profiles for individual mRNA species in gradient fractions. a, experimental strategy for high-resolution TSAA. Clarified lysates from steady-state growing strain BY2125 were sedimented through high-salt 7-47% sucrose gradients (21,22) and collected into 25 fractions (see supplemental material for lysis and centrifugation conditions). Polysome profiles were found to vary by less than one-half fraction
Transcriptome-wide Analysis of Ribosome Loading-Lysates from cultures of steady-state growing yeast were fractionated by sucrose-gradient centrifugation. The transcripts in the 25 fractions were converted to Cy3-labeled cDNA target and mixed with a constant amount of Cy5-labeled cDNA target from unfractionated RNA for hybridization to microarrays (Fig. 2a). Thus, the Cy3:Cy5 ratios generate profiles across the sucrose gradient for each specific transcript, expressed relative to its abundance in total RNA. The Cy3 and Cy5 signals were normalized to a standard curve and corrected for RNA recovery through the use of exogenously added standards; of the 6022 elements on the microarrays, 4928 had Cy5 signals within the linear range of the standard curve. The Cy3:Cy5 ratios, plotted against fraction number, were analyzed assuming the MPM (see "Experimental Procedures"). This model assumes a transcript to be distributed among four compartments: 1) mRNP particles, 2) monosomes, 3) polysomes, and 4) rapidly sedimenting material at the bottom of the gradient. These assumptions were validated by the fact that 82% (4020) of the individual profiles generated from a steady-state growing culture fit this model with $R^2 > 0.7$. Specific examples of the data points and fitted curves are presented in Fig. 2b. It can be seen that gene-specific variations occur in all parameters that describe the profiles, including the relative proportions in the four compartments and the rate of sedimentation of the polysome peak.
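The goodness-of-fit criterion can be made concrete with a brief Python sketch. It assumes the usual definition of the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, which the text does not spell out, and the function name is hypothetical. The final lines check that 4020 profiles out of the 4928 transcripts within the linear range corresponds to the reported 82%.

```python
def r_squared(observed, fitted):
    """Coefficient of determination for a fitted profile (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Counts taken from the text: profiles passing the fit threshold, and
# transcripts with Cy5 signals within the linear range.
well_fit, in_linear_range = 4020, 4928
percent_well_fit = round(100 * well_fit / in_linear_range)
```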
For subsequent analyses, 379 transcripts with ORFs fewer than 400 nucleotides in length were omitted, because 60% represented either dubious genes or uncharacterized small ORFs. Peak 4 was also omitted from detailed analysis of the data, because it was not clear whether this material was actually associated with ribosomes. The proportion of transcripts in peak 4 averaged less than 10% across the entire transcriptome and showed no significant trend as a function of either transcript abundance or ORF length. Omitting these transcripts did not significantly affect the conclusions drawn here.
For validation, independent sucrose-gradient experiments were performed and mRNA profiles were examined gene-bygene. Of 24 genes examined by Q-PCR or Northern blots, only one failed to validate the array results. The results from four of these genes are shown in Fig. 2b. Given that these results were from independent experiments and analyzed by quite different technologies, the strong agreement demonstrates the robustness of this approach to fractionation and analysis.
The transcriptome of steady-state growing yeast is, on average, well translated with the transcripts averaging nearly 80% association with polysomes (Fig. 3a). However, at the level of individual mRNA species, the proportion of a transcript located in peak 3 (polysomes) varies widely from gene to gene. The association of individual transcripts with polysomes ranges from 0 to 100% (Fig. 3b), with a slight tendency to decrease with the less-abundant transcripts.
The average number of ribosomes associated with a particular transcript (polysome size) is proportional to the rate of synthesis of the encoded protein. Polysome size is dictated by: 1) the length of the ORF and 2) the rate of loading of ribosomes onto an mRNA (translation initiation), relative to their rate of linear progression along the message. ORF length is available from the sequence of the yeast genome, and polysome size was derived from the dataset generated in this study. Ribosome density (number of ribosomes per 1000 nucleotides of ORF length) is quite disperse across the transcriptome (Fig. 3c), with no strong trend as a function of transcript abundance. One curious property of ribosome density, which has also been observed by others (30), is a quite significant decrease with increasing ORF length (more than 4-fold across the distribution, Fig. 3d).
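The definition of ribosome density given above amounts to a one-line calculation, sketched here in Python. The function name and the example values are hypothetical and are not taken from the dataset.

```python
def ribosome_density(ribosomes_per_mrna, orf_length_nt):
    """Ribosomes per 1000 nucleotides of ORF length."""
    if orf_length_nt <= 0:
        raise ValueError("ORF length must be positive")
    return 1000.0 * ribosomes_per_mrna / orf_length_nt

# Two hypothetical transcripts carrying the same polysome size:
short_orf = ribosome_density(6, 1500)  # 6 ribosomes on a 1500-nt ORF
long_orf = ribosome_density(6, 3000)   # 6 ribosomes on a 3000-nt ORF
```

With equal polysome size, the longer ORF has half the density, which shows why density rather than polysome size is the quantity compared across transcripts of different lengths.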
We have also examined the relationship between codon adaptation index (CAI) (31) and ribosome density (supplemental Fig. 1S). Transcripts with more favorable CAI values (greater than 0.2) show a tendency toward higher ribosome densities, consistent with higher translational efficiency. Applying an index value (the AUG CAI or AUG Context Adaptation Index (32)) to the 12 nucleotides surrounding the AUG for all the ORFs in this database revealed no significant trend over the total compilation of data (not shown).

among six gradients. The absorbance trace established the positions of mRNAs loaded with 1 to 8 ribosomes, and the positions of higher oligomers were estimated by extrapolation of a curve fit to these points. For genome-wide microarray hybridization, the poly(A)-containing RNA in each fraction was converted to Cy3-labeled cDNA target with reverse transcriptase and mixed with Cy5-labeled target made from unfractionated RNA. Hybridization to the microarrays was carried out with four replicates per fraction. b, top row, Data from microarray analysis and curves fit to the data points by the MPM (see "Experimental Procedures"). Solid line, Cy3:Cy5 ratios calculated from signal intensities. Dotted line, Curve fit to the data points. The $R^2$ value and relative distribution ("Dist.") (derived from curve fitting) among peaks 1 through 4, respectively, are shown above the graph for each mRNA. For modeling, peak 1 (mRNP particles) was allowed to vary within fractions 1-5, peak 2 (monosomes) was fixed at fraction 7, peak 3 (polysomes) could vary in position within fractions 9-20, and peak 4 was fixed within fractions 21-25. The areas calculated beneath Gaussian curves fit to the peaks yielded the proportional distributions. Results are shown for transcripts from the following genes: SAG1 (YJR004C), PRP39 (YML046W), SST2 (YLR452C), and HSP104 (YLL026W). Bottom row, Determination of relative mRNA levels across a polysome sucrose gradient. Total RNA (~2 µg) from the peak fraction in the polysome region of the gradient and the equivalent volume from all other fractions was converted to cDNA using 0.5 µg oligo-dT25 with a G/C/A 3′ anchor and SuperScript II (Amersham Biosciences) and used in Q-PCR ("Experimental Procedures").
Transcripts in Monosomes and mRNP Particles-Because many of the well-established mechanisms of translational control involve sequestration of mRNAs into mRNP particles (peak 1) or monosomes (peak 2) (13,15,33,34), transcripts enriched in these compartments are likely candidates for translational control. For example, the transcripts of the GCN4 and HAC1 genes (supplemental Fig. 2S) both have well-established mechanisms of translational regulation (35,36), and in steady-state growing cells they are found 85 and 80%, respectively, in the combined mRNP and monosome compartments, far above the average value of 10% (Fig. 3a). The transcripts over-represented in the mRNP and monosome compartments tend to be low abundance (supplemental Fig. 3S), suggesting that translational control occurs more frequently with the less-abundant proteins. No RNA structural feature was found to correlate with appearance in this fraction, including CAI, AUG CAI, and estimated lengths of 5′ and 3′ untranslated regions (data not shown).
The RPL41A transcript has an ORF of only 78 nucleotides and occurs primarily in monosomes, with a small fraction in disomes (supplemental Fig. 2S). This represents maximum loading of this short mRNA (37). In this case, the monosome is actively translating the encoded protein and, contrasting RPL41A with GCN4, it is apparent that caution must be exercised in interpreting monosome peaks.
Changes in Gene Expression During the Response of Yeast to Pheromone-Because translation state array analysis (TSAA) provides simultaneous measurements of the level and translational efficiency of any detectable transcript, it is possible to globally compare gene expression between cells with different phenotypes. As a test, we examined the changes in gene expression in yeast after acute exposure to mating pheromone. Ratios of treated to untreated cells were calculated for transcript levels, translational efficiencies, and estimated protein synthesis rates, providing a global view of changes in gene expression across the entire transcriptome (Fig. 4). Out of 3874 transcript profiles from TSAA that could be modeled with an $R^2$ of at least 0.5 under both conditions, 3058 showed less than a 2-fold change in estimated rate of protein production (gray data points, shown without drop lines). Of 816 genes that were altered at least 2-fold in estimated protein expression, the change was driven solely by transcript level in 76% (617 transcripts, red data points on the diagonal at translation efficiency = 0.5-2.0). The remaining 24% of these transcripts showed at least a 2-fold alteration in translational efficiency; 163 were translationally up-regulated (blue data points) and 36 were down-regulated (green data points). As can be seen from Fig. 4, many of the translationally controlled genes also changed in transcript level. Customary transcript array analysis would have ignored those that were regulated solely at the translational level and would have erred quantitatively with those transcripts that showed mixed regulation (see "Discussion").

The x-axis has been truncated at 600 to improve the display for the majority of the data points. c, ribosome densities on mRNAs in peak 3 (polysomes). Data shown are for the 4066 transcripts with at least 50% mRNA in peak 3 and ORF lengths greater than 400 nucleotides (see text). Ribosome density was calculated for each mRNA as ribosomes per 1000 nucleotides of ORF length, using the average number of ribosomes per mRNA in peak 3 (obtained from modeling). The x-axis has been truncated at 600 to improve the display for the majority of the data points. d, ribosome density plotted against ORF length. ORF lengths were obtained from the Saccharomyces Genome Database (genome-www.stanford.edu/Saccharomyces). The x-axis has been truncated at 4000 to improve the display for the majority of the data points.
It is also of note in Fig. 4 that very few transcripts show opposing changes in transcript level and ribosome loading in response to α-factor. These homodirectional changes in transcript level and translation are consistent with what was found recently in response to two other external stimuli (38).
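The 2-fold criteria used to color the data points in Fig. 4 can be illustrated with a short sketch. It assumes that the ratio of estimated protein synthesis rates is the product of the transcript-level ratio and the translational-efficiency ratio, in line with the relationship given in the Discussion; the function name, the class labels, and the example values are hypothetical and are not taken from the dataset.

```python
def classify_transcript(transcript_ratio, efficiency_ratio, fold=2.0):
    """Assign a Fig. 4-style class from treated/untreated ratios.

    Assumption: synthesis-rate ratio = transcript ratio * efficiency ratio.
    """
    low, high = 1.0 / fold, fold
    synthesis_ratio = transcript_ratio * efficiency_ratio

    # Gray points: synthesis-rate ratio inside the 0.5-2.0 window.
    if low <= synthesis_ratio <= high:
        return "unchanged"
    # Blue points: translational efficiency up more than 2-fold.
    if efficiency_ratio > high:
        return "translation_up"
    # Green points: translational efficiency down more than 2-fold.
    if efficiency_ratio < low:
        return "translation_down"
    # Red points: the change is carried by transcript level alone.
    return "transcript_driven"


# Hypothetical (transcript ratio, efficiency ratio) pairs.
examples = [(3.0, 1.0), (1.0, 3.0), (1.0, 0.3), (4.0, 0.4)]
labels = [classify_transcript(t, e) for t, e in examples]
```

The last example pair shows the rare case of opposing changes: a 4-fold rise in transcript level offset by a 2.5-fold drop in efficiency leaves the estimated synthesis rate within the "insignificant" window.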

Functional Classification of Regulated Genes Revealed from Comparing Quantitative Proteomic Analysis with TSAA-A quantitative comparison of the proteomes of steady-state growing yeast before and after treatment with α-factor was carried out using the ICAT methodology (39). From a total of 607 tagged proteins that provided reliable data for identification and quantitation, 47 were found to be up-regulated and 79 down-regulated by the criteria used in Table I. After independently clustering the results of TSAA and ICAT analysis by function, those functional groups that were significantly overrepresented in both datasets are listed in Table I along with the corresponding p values. Twenty-five genes in the "carboxylic acid transport" category were identified as decreasing in expression significantly in response to α-factor and clustered with highly favorable p values (Table I). Two of these genes were identified by both TSAA and ICAT. This category was dominated by genes involved specifically in amino acid transport (19 out of the 25). The entire gene list from TSAA was searched for additional genes in amino acid transport, and six more were identified and added. Two of the genes in the general carboxylic acid transport category identified from the ICAT analysis were eliminated because they did not specifically function in amino acid transport. The TSAA and ICAT results for the complete set of amino acid transport genes are plotted in Fig. 5a. The entire category shows a strong bias toward down-regulation and, of the six additional transcripts added to the dataset based solely on functional category, only one, AGP3, showed a predicted increase in expression rate, while the other five followed the overall trend of this category.
A group of 23 genes, which were found to be elevated in expression by either TSAA or ICAT (four of these genes were identified in both analyses), clustered into the category of "protein catabolism" (Table I). Inspection of this group of genes showed a strong bias toward the PRE, PUP, RPN, and RPT families of genes that encode components of the proteasome. Of the 19 up-regulated protein catabolism genes discovered through TSAA, 12 were from one of these four gene families, as were six of the eight identified by ICAT (three were found by both TSAA and ICAT). The TSAA dataset was examined for additional genes from these four families, and 18 more were added, resulting in a total of 33 genes. TSAA and ICAT results for these four families of proteasome genes are plotted in Fig. 5b. Of these 33 genes, only one was significantly down-regulated (i.e. below a ratio of 0.75), and in this case, RPN2, both TSAA and ICAT showed decreased expression in response to α-factor.
Not surprisingly, one of the most common groups of genes detected by TSAA comprised those in the pheromone response category (Table I). This group was not highly represented in the results of the ICAT analysis (compare with the protein catabolism category). Because many pheromone response genes have regulatory functions, it is likely that the levels of their encoded proteins are generally low, making reliable detection difficult.
FIG. 4. The response of the yeast transcriptome to mating pheromone as measured by TSAA. TSAA was used to compare cells before and after 30-min treatment with α-factor, as described under "Experimental Procedures." The data are presented as the ratio of plus α-factor to minus α-factor for those transcripts that could be modeled with R² values >0.5 and that contained ORFs >400 nucleotides in length. The ratios of transcript levels (normalized Cy5 signals), translational efficiencies, and calculated protein synthesis rates are plotted (see "Experimental Procedures" for the bases of these calculations). The calculated changes in protein expression were considered "biologically significant" only if the ratio of synthesis rates was less than 0.5 or greater than 2.0; data points in the "insignificant" range are colored gray. The red data points are those values where the changes in translational efficiencies are in the insignificant range. Those genes for which the ratio of translation efficiencies is <0.5 or >2.0 are colored green and blue, respectively.

DISCUSSION
The composition of a proteome is the end product of regulated gene expression; the activities of its component proteins are largely responsible for defining cellular phenotype under a particular physiological state. We (16,20) and others (17)(18)(19)30) have developed experimental approaches to gain insight into the expression of a proteome by evaluating the association of a transcriptome with the translational apparatus. Prior studies used a somewhat arbitrary division of the resulting mRNA profiles into two pools, "poorly translated" and "well translated" transcripts, thereby losing much of the rich information contained in the complete polysome distribution. Using recent approaches (this article and Refs. 30 and 38), which assay multiple fractions from sucrose gradients, one can define with precision, across an entire transcriptome, the proportion of transcripts actively engaged with polysomes and, within the polysome compartment itself, the number of ribosomes associated with each transcript. Furthermore, as demonstrated here, translation of the transcriptomes of two different cell populations can be compared.
a Those genes whose estimated rates of protein expression from TSAA (Fig. 4) were down-regulated to <0.75 or up-regulated to >1.5 by α-factor, relative to untreated cells, were analyzed (869 genes up and 1081 genes down) using the gene ontology (GO) tools available on the Saccharomyces Genome Database (www.yeastgenome.org). An identical analysis was carried out with the regulated proteins revealed through ICAT analysis (49 up- and 79 down-regulated). Those genes that were significantly over-represented in both the TSAA and ICAT datasets are given above, along with the probabilities that the observed frequencies could have occurred by chance.
FIG. 5. Changes in transcript level, translation, and protein level induced by α-factor for genes encoding amino acid transport and proteasome components. The ratios, plus α-factor to minus α-factor, obtained from TSAA (red symbols) for transcript levels, translational efficiencies, and calculated protein synthesis rates (natural logarithms) are plotted as in Fig. 4. The blue symbols are the natural logarithms of the ICAT ratios for the proteins detected by this methodology, plotted against translational efficiency and transcript level. The grid lines for synthesis ratio = 1, or e^0, are shown in bold. Genes that run counter to the general trend of each functional category in either TSAA or ICAT are labeled. a, all 25 genes in the "amino acid transport" category detected in TSAA. Two genes were detected in both TSAA and the ICAT experiment. The outlier, AGP3, was detected only in TSAA. b, all members of the PRE1-10, PUP1-3, RPN1-13, and RPT1-6 gene families that were detected in either the TSAA or ICAT experiment (33 genes total: 16 by TSAA only, 17 by both ICAT and TSAA). RPN2 was an outlier in both the ICAT and the TSAA data.
The rate of synthesis of a particular protein can be expressed as the number of translationally active transcripts times the translational efficiency (number of completed protein molecules produced per mRNA per unit time) (15). Because, with a few exceptions, the macroscopic rate of nascent peptide elongation seems to be constant across the transcriptome of a cell (15), the linear ribosome density (ribosomes per 1000 nucleotides) should provide a good comparison of rates of peptide completion between transcripts (discussed further below). To test the validity of this approach, we employed two transcripts that produced the same protein (His3p) with different translational efficiencies. As expected, the better translated transcript was loaded on average with five to seven ribosomes, while the poorly translated mRNA was located on monosomes and small polysomes (Fig. 1d). The ribosome density was calculated in each fraction, multiplied by the quantity of the transcript in the corresponding fraction, and summed. From these calculations, the rate of His3p synthesis from the mRNA lacking the uORF was estimated to be ~5-fold higher than that from the poorly translated transcript. Given the necessary assumptions, this is in reasonable agreement with the experimental observation of a 12-fold difference in translational efficiency (Fig. 1c). These assumptions were: 1) one ribosome on each transcript containing the uORF is stalled at termination of the uORF (40) and not actively engaged in translation of His3p, and 2) with an ORF length for HIS3-HA of 759 nucleotides, most of these transcripts are bound by fewer than 10 ribosomes (13 ribosomes per 1000 nucleotides; see Fig. 3c). The average signal in fractions 20-25 for the uORF-containing construct (Fig. 1d) therefore provided a baseline value that was subtracted from all fractions of both gradients.
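The density-weighted sum described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the modeling procedure used for the datasets: the function names are hypothetical, the fraction profiles are invented values, and neither the correction for a ribosome stalled at the uORF nor the baseline subtraction is applied.

```python
def ribosome_density(ribosomes_per_mrna, orf_length_nt):
    """Linear ribosome density, in ribosomes per 1000 nucleotides of ORF."""
    return 1000.0 * ribosomes_per_mrna / orf_length_nt


def relative_synthesis_rate(ribosomes_by_fraction, transcript_by_fraction,
                            orf_length_nt):
    """Sum over fractions of (ribosome density x transcript quantity)."""
    if len(ribosomes_by_fraction) != len(transcript_by_fraction):
        raise ValueError("one ribosome count is needed per fraction")
    return sum(
        ribosome_density(n, orf_length_nt) * amount
        for n, amount in zip(ribosomes_by_fraction, transcript_by_fraction)
    )


# Hypothetical profiles: ribosomes per mRNA in four gradient fractions and
# the share of each transcript found in those fractions (invented values).
ribosomes = [1, 2, 4, 6]
well_translated = [0.05, 0.15, 0.30, 0.50]
poorly_translated = [0.60, 0.30, 0.10, 0.00]
orf_length = 759  # ORF length of HIS3-HA in nucleotides, from the text

fold_difference = (
    relative_synthesis_rate(ribosomes, well_translated, orf_length)
    / relative_synthesis_rate(ribosomes, poorly_translated, orf_length)
)
```

For two transcripts with the same ORF, as with the two HIS3 constructs, the ORF length cancels in the ratio, so the fold difference depends only on how each transcript is distributed across fractions. As a check on the units, `ribosome_density(10, 759)` gives about 13 ribosomes per 1000 nucleotides, the figure quoted in the second assumption.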
One of the striking features of the two datasets reported here is the extraordinary diversity in translation state across a transcriptome. Transcripts can vary from being localized nearly 100% in untranslated mRNP particles to essentially complete association with polysomes. In addition, the linear density of ribosomes along an mRNA was found to range from a maximum of approximately one ribosome per 30 nucleotides, which is the length of mRNA protected by a single eukaryotic ribosome from nuclease digestion (41), to less than one per 1000 nucleotides. Ribosome density is determined by the relative rates of initiation (ribosome loading) and peptide chain elongation (ribosome movement). Although translation is generally controlled at initiation (15), fluctuations in observed ribosome density for some transcripts could arise in principle from variations in elongation rate. Also, it should be underscored that these values are average densities across a transcript and that, for some mRNAs, changes either in elongation rate along the length of the message or in termination rate could result in localized regions of altered ribosome density.
This study of yeast cultures responding to α-factor demonstrated several features of the TSAA methodology. First and foremost, analysis of translation state appears to be robust and reproducible from sample to sample. This is clearly illustrated by the fact that only a fraction of the transcripts evaluated (325 out of 3874) showed significant differences in translation state between the extracts from treated and untreated cells. Second, because independent analysis of these cells by TSAA and ICAT yielded overlapping functional groups of co-regulated genes, it seems that the assumptions that went into estimating protein synthesis rate from ribosome loading onto transcripts were warranted, at least for those proteins and transcripts that were detected by both methods. In comparing TSAA estimates of changes in rates of synthesis with proteome measurements, the ICAT ratios were usually smaller. This was expected because estimates of ribosome loading reflect the instantaneous rate of protein synthesis, while proteome measurements integrate the balance between synthesis and degradation over the course of the experiment. Therefore, except for proteins with very short half-lives, one would anticipate only qualitative agreement between the two measurements. Examples of wide discrepancy between the two approaches would suggest instances of regulated protein stability.
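Why the ICAT ratios are expected to be smaller can be illustrated with a simple kinetic sketch. It assumes a model that is not part of this study: zero-order synthesis, first-order degradation, a protein at steady state before treatment, a step change in synthesis rate at the time of pheromone addition, and no dilution by cell growth. The function name and the half-life values are hypothetical; the 30-min interval is the treatment time used for TSAA.

```python
import math


def protein_level_ratio(synthesis_fold, half_life_min, elapsed_min):
    """Treated/untreated protein level after a step change in synthesis rate.

    Assumed model: dP/dt = k_syn - k_deg * P, starting from steady state,
    which gives P(t)/P(0) = f - (f - 1) * exp(-k_deg * t) for a fold change f.
    """
    k_deg = math.log(2) / half_life_min
    return synthesis_fold - (synthesis_fold - 1.0) * math.exp(-k_deg * elapsed_min)


# A hypothetical 3-fold rise in synthesis rate, observed after 30 min,
# for proteins with short, intermediate, and long half-lives.
short_lived = protein_level_ratio(3.0, half_life_min=3.0, elapsed_min=30.0)
intermediate = protein_level_ratio(3.0, half_life_min=30.0, elapsed_min=30.0)
long_lived = protein_level_ratio(3.0, half_life_min=300.0, elapsed_min=30.0)
```

Under these assumptions only the short-lived protein approaches the full 3-fold change within 30 min; the long-lived protein changes by little more than 10%, even though its synthesis rate has tripled.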
The combined TSAA and ICAT analysis revealed a strongly coordinated up-regulation of proteins of the proteasome, including components of both the 20S catalytic complex (the PRE and PUP genes) and the 19S regulatory complex (the RPN and RPT genes). There are several known examples of specific ubiquitination and degradation by proteasomes in S. cerevisiae that are related to pheromone response (42)(43)(44)(45)(46). Perhaps the increased synthesis of proteasome subunits detected by ICAT and TSAA represents preparation for recovery from pheromone response, through degradation of key regulatory proteins such as Far1p (44,45) and Ste7p (46), and re-entry into the mitotic cell cycle. Progression through the cell cycle is regulated in part by ubiquitin-proteasome proteolysis (47).
An extensive study of the response of transcript levels to α-factor has been published (48). With our current experimental paradigm, a robust dataset of transcript levels (Cy5 values) was created containing 100 replicas each in the presence and absence of α-factor. Using 3-fold increases or decreases in transcript level as a measure of biological significance, changes were seen in 376 genes of the 5227 where the Cy5 measurement was greater than background in at least one of the two conditions. Of these 376 transcripts, 101 were identified as changing in the previous study (48). In addition, 79 of these with altered transcript levels also changed at least 2-fold in translation efficiency. Of the 3874 transcripts for which ribosome loading was analyzed in this study, 325 showed at least a 2-fold change in translation efficiency but only 44 of these also had significant (3-fold) transcript changes.
The outcome of this study is clear: each species of mRNA is unique, not only with respect to the protein it encodes, but also in its interaction with the translational machinery. Translation of the transcriptome is highly diverse both qualitatively and quantitatively, and it is impossible to assume a simple, linear relationship between the level of an mRNA and the rate of synthesis of its encoded protein. Furthermore, although translational control seems to be quite selective, ribosome loading can change with physiological state and, together with altered protein stability, can produce dramatic discrepancies between transcript levels and apparent rates of protein synthesis.