Proteomics of Saccharomyces cerevisiae Organelles

Knowledge of the subcellular localization of proteins is indispensable to understand their physiological roles. In the past decade, 18 studies have been performed to analyze the protein content of isolated organelles from Saccharomyces cerevisiae. Here, we integrate the data sets and compare them with other large scale studies on protein localization and abundance. We evaluate the completeness and reliability of the organelle proteomics studies. Reliability depends on the purity of the organelle preparations, which unavoidably contain (small) amounts of contaminants from different locations. Quantitative proteomics methods can be used to distinguish between true organellar constituents and contaminants. Completeness is compromised when loosely or dynamically associated proteins are lost during organelle preparation and also depends on the sensitivity of the analytical methods for protein detection. There is a clear trend in the data from the 18 organelle proteomics studies showing that proteins of low abundance frequently escape detection. Proteins with unknown function or cellular abundance are also infrequently detected, indicating that these proteins may not be expressed under the conditions used. We discuss that the yeast organelle proteomics studies provide powerful lead data for further detailed studies and that methodological advances in organelle preparation and in protein detection may help to improve the completeness and reliability of the data. 2010.

The yeast Saccharomyces cerevisiae is a widely used monocellular eukaryotic model organism. S. cerevisiae was the first eukaryotic organism whose complete genome was sequenced (1). The genome contains 6,608 open reading frames of which 5,797 encode polypeptides. 4,666 proteins have annotated functions in the yeast genome database SGD. S. cerevisiae has intracellular compartments typical for eukaryotic cells (Table I). The subcellular localization of proteins in yeast has been investigated using genome-wide chromosomal GFP tagging approaches (2) in addition to immuno-EM and other low throughput methods. The genome-wide studies have provided an unprecedented wealth of information. Yet, for ~17% of the 5,797 gene products (939 proteins), many of which have unknown biological function, the subcellular localization has not been assigned. In addition, tagging approaches can result in mislocalizations. Consequently, complementary proteome-wide studies, which avoid tagging, are required. Mass spectrometry-based proteomics, combined with subcellular fractionation, provides such alternative methodology. Yeast is uniquely suitable for this approach because of its relatively low complexity, infrequent occurrence of gene isoforms, and limited extent of posttranslational modifications. In the past decade, 18 studies of the protein content of isolated organelles by means of MS-based approaches (organelle proteomics) have been performed, covering all the subcellular compartments with the exceptions of the ER, the endosome, and the cytosol (Table II and Refs. 3-10). The protein composition of supramolecular assemblies from yeast, such as the nuclear pore complex, and the translation machineries (11-14) will not be discussed here, but the experimental design and the challenges are very similar to organelle proteomics.
In this review, we will give an overview of results of the organelle proteome studies and try to answer the following questions. How complete is our current view of organelle proteomes? And how reliable are the proteomics data? Completeness and reliability of the organelle proteome data are dictated to a large extent by methodology. Although the different studies have used a plethora of methods (Table II), two steps are in common for all published organelle proteome analyses. First, pure organelles are isolated by biochemical fractionation methods. Second, the proteins in the purified organelles are identified. The reliability of organelle proteome analyses ultimately depends on the purity of the isolated organelles. In the ideal case, no proteins from different cellular locations (contaminants) should be present in the final preparation. Defining whether or not an identified protein is a contaminant is challenging as proteins may have multiple destinations in a cell, and the distribution over the different localizations may vary throughout the cell cycle and as a function of metabolic and environmental conditions. It is also important to realize that within a single culture cells of different age and cell cycle stages are present. Thus, organelle proteomics studies report "averaged" proteomes. The completeness of the organelle proteomes also depends on the quality of the organelle preparation because loosely attached proteins may be lost during the isolation. In addition, completeness depends on the sensitivity of the analytical methods that are used to identify the proteins. These methods include extraction of the proteins from the organelle preparation, generation and separation of peptides, and (tandem) mass spectrometry analysis to identify the proteins. Because organelle proteomics is so intimately dependent on biochemical methods for subcellular fractionation, we will give a brief overview of the methodology to isolate subcellular compartments. We will not discuss the methods for protein extraction, peptide generation, and identification using tandem MS-based techniques. The analytical methods continue to evolve at a great pace because of ongoing technical advances. Consequently, the methodological details vary considerably between the different studies that were performed over a period of a decade. Recently, state of the art methods in proteomics were summarized in a comprehensive review by Speers and Wu (15). Mass spectrometry-based proteomics methods for organellar proteomics have been reviewed by Andersen and Mann (16) and Yates et al. (17). Methods dedicated to membrane proteomics have been reviewed by Wu et al. (18,19), and recent instrumental developments in mass spectrometry for proteomics have been summarized in Han et al. (20).

TABLE I. Properties of yeast organelles
Physical and biological properties of organelles are given. OM, outer membrane; IM, inner membrane; IMS, intermembrane space; GL, glycerol; OA, oleic acids; +/-Carb., with and without carbohydrates; SER, smooth ER; RER, rough ER; t-SNARE, target soluble N-ethylmaleimide-sensitive factor attachment protein receptor; TGN, trans-Golgi network.
a The lipid-to-protein ratio describes the sum of phospholipids and sterols (33,72). In the case of the heavy microsomal fraction, the phospholipid-to-protein ratio was taken from Kuchler et al. (73).
b The apparent density of organelles depends on the separation medium. Unless stated otherwise, the reported densities are for organelles in sucrose solution.
c From Ref. 35.

ISOLATING ORGANELLES
All methods used to prepare different organelles have several steps in common: cells are cultivated and disrupted, and the cellular content including organelles is liberated. Then, the released organelles are separated into different populations according to their physical or chemical properties. In Table II, details for the organelle isolation in the 18 proteomics studies are given. Before integrating the results of the 18 organellar proteomics studies into a global view of protein localization in yeast, we will discuss a few experimental details of the isolation procedures that may affect the ensemble interpretation.
Considerations about Growth Conditions-The content and/or the abundance of organelle proteomes in yeast depends on the growth conditions. Care must be taken when integrating the results from different studies to obtain a global view of protein localization because different growth conditions have been used. It also must be noted that different strains of S. cerevisiae have been used in the various organelle proteomics studies (Table II).
For most organelle proteome analyses, yeast cultures were grown aerobically at 30°C on a rich medium (YPD, 1% yeast extract, 2% Bacto peptone, 2% glucose) containing an excess of glucose as carbon source, and the cells were harvested in the mid-logarithmic phase. These are "standard" growth conditions for S. cerevisiae, but some organelles require specific growth conditions to be obtained in sufficient amounts (Table II and discussed below).
For the proteomics studies of peroxisomes, cells were grown in conditions that stimulate fatty acid β-oxidation (21-23), which leads to an increase in the number of peroxisomes from one to four per cell (under standard conditions) to as many as 14 per cell, occupying 8-10% of the cytoplasmic volume (24). Also, mitochondrial morphology and abundance depend on growth conditions (25). For the mitochondrial proteome analyses, yeast was grown on non-fermentable carbon sources to activate respiration and induce many small mitochondria (26-30).
When grown on glucose-containing medium, the abundance of lipid particles increases toward the stationary phase. For the proteome analysis, the cells were grown on glucose and harvested in the late exponential phase (31).
Considerations about Organelle Isolation-When aiming at an inventory of proteins from organelles, a sensitive technique such as mass spectrometry will identify not only proteins that are truly located in the organelle of interest but also contaminating proteins. It is important to realize that, for physical and biological reasons, it is not possible to obtain 100% pure organelle preparations. One of the challenges in organelle proteomics is to obtain preparations of subcellular compartments as pure as possible. On the other hand, genuine organellar residents may have multiple locations and may be hard to discriminate from contaminants. Another challenge is to obtain complete/intact organelles. Because of generally lengthy isolation procedures, a fraction of the true resident proteins may be lost because they have a more dynamic or low affinity association with the organelle or because they are prone to proteolytic degradation.
In all of the 18 organelle proteomics studies, the crude cell lysates were subjected to differential centrifugation to obtain enriched crude organelles (Fig. 1). Differential centrifugation can involve multiple subsequent centrifugation steps that are performed at increasing g forces.
Organelles can be further separated from each other on the basis of their physical properties by means of density gradient centrifugation or free flow electrophoresis or based on their antigenic properties by immunotechniques (6,32). Density gradient centrifugation uses differences in buoyant density of the particles for separation (Table I) and was utilized in all yeast organelle proteomics studies performed thus far (Fig. 1 and Table II) except for the proteomes of the Golgi (where immunoprecipitation was used) and the cell wall (where differential centrifugation only was used). In the case of peroxisomes, density gradient centrifugation was used in combination with immunologically based pullouts (21).
Suborganelle Preparations-Membrane-enclosed organelles such as mitochondria, peroxisomes, vacuoles, ER, Golgi, and nuclei can be fractionated further into subcompartments. The internal (luminal) content can be separated from the membrane upon osmotic and/or mechanical lysis of the organelles and subsequent differential centrifugation. Because membrane proteins are usually of a lower abundance than luminal enzymes, enrichment of the membrane fraction of an organelle is important for a comprehensive identification of these proteins. In only two cases (vacuoles and mitochondria) have suborganellar proteomes been studied separately (7,8,29,33).
Two questions arise in the light of these studies: how complete are these organelle proteomes, and how reliable are the results of these studies? In other words, what level of sensitivity has been achieved, and how selective were these studies? Sensitivity is determined by the analytical methods used as well as by the isolation procedure in which proteins may be lost. Selectivity is determined by the quality of the organelle preparation: how many contaminating proteins from different cellular locations were present in the purified organelles? To be able to answer these questions accurately, we would need to know the subcellular localization of all proteins in yeast. But, of course, we do not have this information because the knowledge of the subcellular localizations was the purpose of organelle proteomics. Fortunately, for yeast, we can use a good approximation because of the wealth of information organized in SGD, a manually curated database that integrates information on all genes and gene products of S. cerevisiae.
Defining the Reference Proteomes-For the purpose of this review and based on the annotation of subcellular locations from the yeast genome database SGD, we define that proteins can be located in 11 different subcellular locations: cell wall, plasma membrane, cytosol, nucleus, mitochondrion, peroxisomes, lipid particle, endosome, ER, Golgi, and vacuole (supplemental Table 2). We define the content of each reference proteome as "all proteins that according to the annotations in the SGD are present in the subcellular location." The annotations for localization in the SGD are derived from a variety of genome-wide high throughput studies (2,35), low throughput studies on individual proteins (e.g. immuno-EM), and global computational predictions. Localization data obtained in the 18 mass spectrometry-based organelle proteomics studies had not been included in the SGD at the time of our analysis, apart from the mitochondrial proteomics data, the implications of which will be discussed below.
It is possible that a protein has SGD annotations for localization in more than one organelle. Multiple annotated locations of a protein could be biologically relevant as some proteins reside in more than one subcellular location. But dual or multiple localizations may also be artifacts of the methods used, e.g. a tagged protein may be targeted to the wrong location. So, we emphasize that the reference proteomes are not complete and contain mistakes. Nonetheless, by and large they give us a good approximation of the numbers of proteins to expect. Because we have to take into account dual/multiple localizations, proteins with more than one annotated localization in the SGD were counted in each separate reference organelle proteome.
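The counting rule for proteins with multiple annotations can be illustrated with a minimal Python sketch. The annotation table is invented for illustration: the protein names P1 to P4 and their locations are hypothetical placeholders, not SGD records, and the function name is our own. The sketch only shows that a protein with several annotated locations is counted once in each reference proteome, so that the proteome sizes can add up to more than the number of unique proteins.

```python
from collections import defaultdict

# Hypothetical annotations: protein -> set of annotated locations.
# These entries are placeholders, not real SGD records.
annotations = {
    "P1": {"cytosol", "nucleus"},
    "P2": {"mitochondrion"},
    "P3": {"nucleus"},
    "P4": set(),  # no localization assigned
}

def build_reference_proteomes(annotations):
    """Return location -> set of proteins.

    A protein with several annotated locations is added to every
    one of the corresponding reference proteomes.
    """
    proteomes = defaultdict(set)
    for protein, locations in annotations.items():
        for location in locations:
            proteomes[location].add(protein)
    return dict(proteomes)

reference = build_reference_proteomes(annotations)
sizes = {loc: len(members) for loc, members in reference.items()}
shared = reference["cytosol"] & reference["nucleus"]
unassigned = [p for p, locs in annotations.items() if not locs]
```

In this toy table three proteins carry a localization, yet the proteome sizes sum to four because P1 is counted in both the cytosolic and the nuclear set.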
The reference organelle proteomes defined in this way have very different sizes (Fig. 2, large pie diagram, bottom right). The two smallest annotated proteomes harbor only 35 (lipid particles) and 58 proteins (peroxisomes). In contrast, the two largest reference proteomes contain 1,820 and 1,870 proteins (cytosol and nucleus, respectively) of which 576 proteins have annotations for both organelles. The mitochondrial proteome with 1,056 proteins is the third largest, containing 594 proteins with unique mitochondrial annotation. Other organelle proteomes are of intermediate size, containing between 100 and 300 annotated proteins.

FIG. 1. Centrifugation for subcellular fractionation. Differential centrifugation is usually the first step after cell lysis in subcellular fractionation protocols and is often followed by density gradient centrifugation. The fragments of the cell wall, unlysed cells, and the nuclei can be sedimented at low speed (3,000-5,000 × g). The pellet can be subjected to density gradient centrifugation for further purification of the nuclei. The postnuclear supernatant, containing all other organelles and the plasma membrane vesicles, is usually subjected to density centrifugation for further fractionation. The postnuclear supernatant can also be subjected to a second round of differential centrifugation at higher g force (20,000-30,000 × g). The mitochondria and peroxisomes are then further purified from the pellet using density gradient centrifugation, and the vacuoles, smooth ER (SER), and plasma membrane (PM) vesicles can be purified from the supernatant. Alternatively, the supernatant can be subjected to ultracentrifugation to sediment light organelles and organellar vesicles and separate these from the cytosol and lipid particles that remain in the supernatant. We emphasize that the figure is an oversimplification because of limitations to the designation of organelle names to crude differential fractions. RER, rough ER.
FIG. 2. Abundance, annotated localization, and experimental coverage of yeast proteins. The yeast proteins were split into six groups according to their cellular copy numbers. The top six bars represent the abundance groups and show the percentage of proteins identified by experimental proteomics studies in each group. The pie diagrams to the right of the bars show the annotated localizations of the proteins (identified plus unidentified) in each abundance group. The total number of unique proteins per group is indicated as the "SUM," and the number of identified proteins is indicated in parentheses. Because proteins with multiple annotations were counted more than once, the sum of the proteins from all the segments of each pie is not the same as the SUM of unique proteins. The bottom bar shows the percentage of identified proteins for the entire yeast proteome, and the large pie at the bottom right shows the annotated localizations of all 5,797 yeast proteins. The localization of the 3,516 yeast proteins identified in at least one proteome study and the localization of the 2,281 unidentified proteins are shown in the bottom pies (left and middle, respectively). Each color indicates a subcellular location: orange, cytosol; pink, plasma membrane (PM); beige, cell wall; red, mitochondria; green, vacuole; light violet, endosome; violet, Golgi apparatus; purple, endoplasmic reticulum; gray, nucleus; blue, peroxisome; magenta, lipid particles; yellow, no localization assigned; black, others (e.g. extracellular).

For many of the proteins in the reference proteomes, not only their subcellular localizations but also their copy numbers in the cell have been determined. The copy numbers are largely based on a genome-wide study using TAP fusions in conjunction with quantitative Western blot analysis, which yielded copy number estimates for 3,837 of the 5,797 yeast proteins (35). Again, just like the annotation of subcellular localizations, the annotated copy numbers for some of the proteins may be incorrect (for instance, when the tag interferes with targeting, it is also likely to affect the copy number). However, for the purpose of a global analysis of the organelle proteomes, the data provide us with extremely useful information. We distributed all quantified proteins over five abundance bins (copy numbers: <500, 500-2,000, 2,000-5,000, 5,000-20,000, and 20,000-2,000,000). The bin widths were chosen to have approximately equal numbers of proteins per bin. The small pie charts in Fig. 2 show the distribution of cellular localizations of the proteins in each abundance bin. The distribution varies among the bins: the reference peroxisomal and nuclear proteomes are slightly biased toward lower copy numbers (only 75 nuclear proteins and no peroxisomal proteins are present in the cell at a copy number >20,000). The bias for low abundance proteins in the nuclear proteome may be caused by the presence of many low copy number transcription factors and regulatory proteins. The low abundance of peroxisomal proteins is caused in part by the growth conditions used in the studies to determine abundance: peroxisomal proliferation was not induced (with oleic acid). The reference proteomes for the cytosol, the cell wall, and lipid particles have relatively many high copy number proteins, i.e. highly abundant glycolytic and other metabolic enzymes and free ribosomes of the cytosol.
The protein abundances range from 10^1 to 10^6 molecules per cell in four of the reference proteomes (cell wall, plasma membrane, mitochondria, and cytosol), from 10^1 to 10^5 in three organelles (vacuole, nucleus, and Golgi), from 10^1 to 10^4 molecules per cell in lipid particles and ER, and from 10^2 to 10^5 for endosomal proteins. The reference peroxisomal proteome has the narrowest range, covering only 2 orders of magnitude from 10^2 to 10^4 molecules per cell (Table I).
Overall Experimental Coverage-The 18 proteomics studies of yeast subcellular compartments have collectively led to a coverage of 61% of the proteins from the entire predicted yeast proteome (3-10, 21-23, 26-31, 34). In other words, 3,516 different proteins were found in at least one of 18 individual proteome analyses. It is noteworthy that ~80% (4,517 proteins) of the yeast proteome has been detected under the standard growth conditions by chromosomal TAP and GFP fusions (35). The lower fraction of identified proteins by the ensemble of proteomics studies may have several causes. 1) Some subcellular compartments have not been analyzed in dedicated proteomics studies (cytosol, ER, and endosome). 2) Loosely attached proteins may be lost in the organellar isolation procedures. 3) Some proteins escape identification in the proteomics studies because of their sequence properties. For example, small and hydrophobic proteins may not yield suitable peptides for identification by mass spectrometry when the common proteolytic enzyme trypsin is used. Trypsin cleaves C-terminally of arginine and lysine residues, and small/hydrophobic proteins may lack these residues at suitable positions.
As indicated in Fig. 2, the fraction of identified proteins in the collective proteome studies increases with protein abundances. Whereas only 54% of yeast proteins with a copy number <500 are covered by the proteome analyses, nearly 95% of the proteins with copy number >20,000 have been identified. These numbers indicate that abundance is the dominant determinant for protein identification.
The coverage of the proteomes of the individual organelles, expressed as the percentage of identified proteins of each reference proteome, varies considerably (Fig. 3). The reference proteomes with the highest coverage are the mitochondrion (88%) and peroxisomes (84%). The proteomes of the nucleus, Golgi, ER, and lipid particles are covered to 70-75%. The vacuolar and cytosolic proteomes with 60 and 65% coverage, respectively, are close to the average of 61%. The proteomes of the cell wall (46% coverage), the endosome (50% coverage), and the plasma membrane (51% coverage) are covered below average. The low coverage of the cell wall proteome is caused by the experimental design, which was aimed at identification of covalently bound (mainly glycosylphosphatidylinositol-anchored) proteins. Soluble cell wall proteins were discarded by applying harsh extraction methods (3). The proteins without known localization were covered to only 16%. The low coverage may indicate that these proteins are not usually expressed under the conditions tested. Just like for the entire yeast proteome, the coverage of proteins in the individual organelles tends to be better for more abundant proteins (Fig. 3).
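The abundance binning and the coverage per bin can be expressed compactly in code. The following Python sketch is illustrative only: the copy numbers and the set of identified proteins are hypothetical toy data, the function names are our own, and we assume that a copy number equal to a bin edge falls into the higher bin (the text does not specify this). The top bin is left open-ended here, whereas the text gives it an upper limit of 2,000,000.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges of the first four bins; the fifth bin is open-ended in this sketch.
BIN_EDGES = [500, 2_000, 5_000, 20_000]
BIN_LABELS = ["<500", "500-2,000", "2,000-5,000", "5,000-20,000", ">=20,000"]
NO_DATA = "no abundance data"

def abundance_bin(copy_number):
    """Map a copy number per cell to its bin label (None means no data).

    Assumption: a value equal to a bin edge is placed in the higher bin.
    """
    if copy_number is None:
        return NO_DATA
    return BIN_LABELS[bisect_right(BIN_EDGES, copy_number)]

def coverage_by_bin(copy_numbers, identified):
    """Return bin label -> fraction of proteins in that bin that were identified."""
    total, found = Counter(), Counter()
    for protein, copies in copy_numbers.items():
        label = abundance_bin(copies)
        total[label] += 1
        if protein in identified:
            found[label] += 1
    return {label: found[label] / total[label] for label in total}

# Hypothetical toy data, not values from the cited studies.
copy_numbers = {"P1": 120, "P2": 450, "P3": 3_000, "P4": 50_000, "P5": None}
identified = {"P2", "P3", "P4"}
coverage = coverage_by_bin(copy_numbers, identified)
```

Applied to real annotation tables, the same two functions would yield the per-bin percentages of the kind reported in the text; keeping proteins without abundance data as a separate group mirrors the sixth group used in the figures.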
2,281 proteins (~1/3 of the predicted proteome) still remain to be identified by mass spectrometry. The largest group of unidentified proteins (785) consists of proteins without known localization of which the majority does not have assigned biological and/or molecular function. The abundance of unidentified proteins is biased toward low copy numbers: more than 1/3 (700 proteins) of the unidentified proteins have copy numbers of less than 2,000 per cell, and for more than a half (1,233 proteins), the copy number could not be determined. It seems that all high throughput methods (both mass spectrometry-based proteomics and tagging techniques such as GFP and TAP fusions) have so far failed to record these proteins. The reason could be that the copy numbers are below the detection limit (~40 copies per cell) or that these proteins are not expressed under the conditions used. Among the unidentified proteins, there are many regulatory proteins that are involved in signal transduction, such as GTPases, and transcription and translation factors, which might be expressed only under specific conditions.
Sensitivity of Experimental Proteomes-The coverage of the reference proteomes described above is the sum of all the proteins identified in the different experimental studies. It does not imply that all of the identified proteins were found in experimental studies aimed at the organelle of interest; some proteins may have been found only as contaminants in experimental studies aimed at studying different organelles.
If the reference proteomes did not contain mistakes and the sensitivity of the analytical techniques was infinite, each experimental proteome study would find all proteins of the corresponding reference proteome. In reality, the overlap is never complete because the reference proteomes are not perfect, and the sensitivity of the analytical techniques is limited. For the eight cellular locations that have been targeted experimentally by proteomics studies, the overlap (apparent sensitivity) is indicated in Fig. 3 as striped areas of each organelle reference proteome. Clearly, there is a great variety in the apparent sensitivities of the different proteomics studies. The dedicated investigations of the plasma membrane, the cell wall, and Golgi found less than 1/4 of their respective reference proteomes. In general, fewer proteins of a reference proteome were found by dedicated studies (striped parts of the bars) than by the ensemble of all 18 organelle proteome studies (total bars). Strikingly, in some organelles, the majority of covered proteins was found in different studies than the one(s) targeted at the specific organelle (i.e. endosome, cytosol, and ER).
The extra proteins found in the ensemble could be contaminants from proteome studies dedicated to other organelles or could indicate incomplete or wrong database annotations of the reference proteomes. Here we discuss some striking examples, which highlight the difficulty of distinguishing between the two possibilities.

FIG. 3. As in Fig. 2, the coverage of each organellar proteome is depicted as the total coverage of the reference proteome (the last bar in each small diagram) or for each abundance group separately (first six bars from left to right represent copy numbers: <500, 500-2,000, 2,000-5,000, 5,000-20,000, 20,000-2,000,000, and no abundance data available). The coverage is presented as the fraction of proteins in each reference proteome that was identified by any of the 18 experimental proteome studies. The fraction of each reference proteome that is covered by the dedicated organelle proteome study is indicated as the striped area of the total organelle coverage and represents the apparent sensitivity of the experimental proteomics experiments.
One way to obtain an indication about specificity is to calculate the fraction of proteins identified in each of the eight experimental proteomes that was indeed annotated in the SGD as localized in the organelle of interest. In Fig. 4, this apparent specificity is indicated. The apparent specificity ranges from 100% (all proteins identified in the lipid particle and cell wall proteomes have corresponding annotations) to less than 30% (more than 70% of the identified proteins in the plasma membrane proteome have annotations for different cellular locations). The high apparent specificities of the lipid particle and cell wall proteomes are a direct consequence of the biochemical fractionation methods, which resulted in very clean preparations because of the characteristic physical properties of these subcellular isolates. But a low apparent specificity does not necessarily indicate low quality of the biochemical fractionation. For instance, when highly sensitive analytical methods are used, the apparent specificity is likely to become lower, because low abundance contaminants are more efficiently detected (e.g. in the case of the nuclear proteome). Also, SGD annotations play a crucial role in the apparent specificity. For example, the outcome of the mitochondrial proteome analyses has been included in annotations for localization in the SGD, which obviously will increase the apparent specificity. In contrast, data from another proteome analysis have not (yet) been included in the SGD, which lowers the apparent specificity for those proteomics studies that identify a large number of new organelle proteins (for example, the vacuolar proteome; see below).
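Both measures reduce to simple set arithmetic on the reference proteome and the list of proteins identified by a dedicated study. The Python sketch below is a minimal illustration under that reading; the protein identifiers are hypothetical, and the function names are our own rather than part of any published analysis pipeline.

```python
def apparent_sensitivity(reference, identified):
    """Fraction of the reference proteome found by the dedicated study."""
    return len(reference & identified) / len(reference)

def apparent_specificity(reference, identified):
    """Fraction of the identified proteins annotated for the target organelle."""
    return len(reference & identified) / len(identified)

# Hypothetical toy sets: four annotated residents, three identified proteins,
# one of which ("X") has no annotation for the organelle of interest.
reference = {"A", "B", "C", "D"}
identified = {"A", "B", "X"}

sens = apparent_sensitivity(reference, identified)  # 2 of 4 reference proteins
spec = apparent_specificity(reference, identified)  # 2 of 3 identified proteins
```

Because both values share the same numerator, adding newly discovered residents to the reference annotation raises the apparent specificity without any change in the experiment, which is the effect described above for the mitochondrial proteome.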
A rough indication about the overall specificity can also be drawn from the number of uniquely identified proteins in each of the eight experimental proteome analyses (Fig. 5). Nearly 74% of all identified proteins were found in one proteomics study only, and 22% were found in two experimental organelle proteomes. The remaining 4% of the identified proteins were found in three to five proteomes with a clear trend (~2/3) toward a high abundance (copy number >20,000) for those proteins. Some of these proteins are likely nonspecific contaminations, e.g. P-type ATPase Pma1 of the plasma membrane, ribosomal proteins, and many glycolytic enzymes. But among the proteins present in multiple proteomes are also proteins that are known to have multiple localizations, e.g. proteins involved in signal transduction like GTPases Rho1, YPT1, and CDC42 or the protein kinase TOR2. No proteins, not even very high abundance proteins, were found in more than five organelle proteomes.
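The distribution of proteins over the number of experimental proteomes in which they occur can be tallied as sketched below in Python. The three proteomes and their members are hypothetical toy data, and the function names are our own; the sketch only illustrates the bookkeeping behind percentages of the kind quoted above.

```python
from collections import Counter

def proteome_multiplicity(proteomes):
    """Return protein -> number of experimental proteomes that contain it."""
    counts = Counter()
    for members in proteomes.values():
        counts.update(set(members))  # count each protein once per proteome
    return counts

def multiplicity_distribution(proteomes):
    """Return k -> fraction of identified proteins found in exactly k proteomes."""
    per_protein = proteome_multiplicity(proteomes)
    histogram = Counter(per_protein.values())
    n_proteins = len(per_protein)
    return {k: histogram[k] / n_proteins for k in sorted(histogram)}

# Hypothetical toy data, not results from the cited studies.
proteomes = {
    "nucleus": {"A", "B", "C"},
    "mitochondrion": {"B", "D"},
    "vacuole": {"B", "C"},
}
distribution = multiplicity_distribution(proteomes)
```

Proteins with a high multiplicity, such as "B" in this toy example, are the candidates that need a case-by-case decision between nonspecific contamination and genuine multiple localization.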
The coverage of the reference proteomes for ER, endosome, and cytosol is above 50%, although no dedicated proteomics studies were performed for these organelles. This could be indicative of large amounts of contaminants in the organellar preparations. However, there are also alternative explanations for the detection of these proteins. The largest experimental organelle proteome, the nuclear proteome, contributed the most to their coverage. Almost all proteins (236 of 253 identified proteins with ER annotations, 42 of 52 endosomal proteins, and 1,079 of 1,174 cytosolic proteins) were identified in the nuclear proteomics studies. There is a straightforward biological explanation for the appearance of ER proteins in the nuclear proteome as the nuclear envelope is continuous with the ER. The perinuclear ER has general ER function and covers ~20% of the ER surface (36,37).
We took a more detailed look into the coverage of the reference ER proteome and found that of the 253 identified proteins 160 ER-annotated proteins were uniquely detected in the nuclear proteomes. As indicated above, finding these ER-annotated proteins in the experimental nuclear proteomes is biologically relevant. Of the remaining proteins, 62 were detected in two experimental proteomes, nine proteins were detected in three proteomes, and five proteins were detected in four proteomes, mainly from the experimental nuclear, mitochondrial, vacuolar, and peroxisomal proteomes. The annotated functions of the ER proteins found in multiple experimental proteomes are biased toward sterol and phospholipid biosynthesis and protein trafficking across the secretory system. The proteins found in multiple organelle proteomes are relatively low abundance proteins and, in the light of the mentioned functions, might be of biological relevance; i.e. these proteins have multiple localizations in the cell.

FIG. 4. Apparent sensitivity and specificity of experimental organellar proteomics studies. The apparent sensitivity was calculated as the percentage of proteins from the reference proteome that was identified by the dedicated experimental organellar proteome study. The apparent specificity was calculated as the fraction of proteins identified in the dedicated experimental proteomics studies that was annotated as localized in the same subcellular location.
The proteins with endosomal annotation make up 14% of the experimental Golgi proteome and 5% of the vacuolar proteome. In the other experimental proteomes, endosomal proteins account for a very minor fraction (<1%, even though the absolute number in the nuclear proteome is large). The endosome is an integral part of the secretory machinery and shares many components of the vesicle fusion and protein trafficking complexes with vacuoles and the ER-Golgi network. It is not surprising, therefore, that many of the proteins that are annotated as "endosomal" in the yeast database and that are found in the Golgi and vacuolar proteomes have functions in ER-to-Golgi transport. The identified endosomal proteins in the Golgi and vacuolar proteomes are therefore unlikely to be contaminants.
The experimental proteome of the plasma membrane has the lowest apparent sensitivity and specificity of all eight proteomes (Fig. 4). In two dedicated proteomics studies, 121 proteins could be identified, of which 73 are annotated as cytosolic in the SGD. Those 73 proteins are mostly high abundance contaminants such as ribosomal proteins and glycolytic enzymes.
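The two measures defined in the legend of Fig. 4 can be written down as simple set operations. The following is a minimal sketch, not the procedure used for Fig. 4: the function names and the placeholder protein identifiers are hypothetical, and only the 73-of-121 ratio is taken from the text above.

```python
def apparent_sensitivity(reference: set, identified: set) -> float:
    """Percentage of the reference proteome recovered by the dedicated study."""
    if not reference:
        raise ValueError("reference proteome is empty")
    return 100.0 * len(reference & identified) / len(reference)


def apparent_specificity(reference: set, identified: set) -> float:
    """Fraction of identified proteins annotated to the same location."""
    if not identified:
        raise ValueError("no proteins were identified")
    return len(reference & identified) / len(identified)


# Hypothetical toy sets (placeholder identifiers, not real data).
reference_pm = {"P1", "P2", "P3", "P4", "P5"}   # annotated to the organelle
identified_pm = {"P1", "P2", "C1", "C2"}        # found by mass spectrometry

sensitivity = apparent_sensitivity(reference_pm, identified_pm)
specificity = apparent_specificity(reference_pm, identified_pm)

# From the text: 73 of the 121 plasma membrane identifications are
# annotated as cytosolic, i.e. about 60% of the experimental proteome.
cytosolic_fraction = 73 / 121
```

In the toy sets, two of the five reference proteins are recovered and two of the four identifications carry the matching annotation, so both measures are driven by the same overlap but normalized differently.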
How to Deal with Contaminants?-The above examples show the need to deal explicitly with contaminants because no biochemical separation technique will yield 100% pure organelles. In all organellar proteomics studies, the obtained list of identified proteins was evaluated very carefully by comparison with existing database and literature information. Proteins that had not previously been assigned to the studied organelle were either regarded as contaminants or considered as novel proteins. An example is the peroxisomal proteome (23) where the authors could identify ~240 proteins. After careful analysis, only 46 of these proteins were assigned as peroxisomal based on prior knowledge about the peroxisomal content. However, when aiming at identification of new proteins not previously annotated to be associated with an organelle, a database- and literature-based evaluation may not be sufficient.
One way to deal with contaminants would be to obtain purer organelle preparations. The mitochondrial proteome, which is of high apparent sensitivity and selectivity, indicates that very good purity can be achieved. However, for physical and biological reasons (38-41), it is unlikely that organelles can be isolated to absolute purity and homogeneity; rather, the final preparations are enriched to a variable extent depending on the fractionation methods. In many cases, marker proteins for the organelle of interest were used to assess the purity (either by Western blotting or by activity assays; Table II). Marker enzymes for different organelles are indicated in Table I. Marker enzymes are useful for qualitative assessment of organelle enrichment, but the assays are not sufficiently sensitive to judge the purity of a preparation for MS purposes. Even when organelle preparations appear completely pure based upon immunological detection of the markers, they may still be contaminated, and these contaminants will be revealed by mass spectrometry because of the higher sensitivity of the method. The issue of contaminating proteins is particularly problematic because proteins in cells and organelles are present in abundances that can vary by 6 orders of magnitude. Consequently, even if a very small fraction of a contaminating organelle is present in the final preparation, highly abundant proteins originating from the contaminating organelle may be as easily detectable by MS as the low abundance proteins from the organelle of interest.
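The dynamic range argument can be made concrete with a short calculation. This is an illustrative sketch only: the carry-over fraction and the relative abundances are hypothetical values chosen to span the 6 orders of magnitude mentioned above, and the function name is an assumption.

```python
def amount_in_preparation(relative_abundance: float, recovered_fraction: float) -> float:
    """Relative amount of a protein that ends up in the final preparation."""
    return relative_abundance * recovered_fraction


# Hypothetical values: abundances span 6 orders of magnitude (1 to 1e6).
abundant_contaminant = amount_in_preparation(1e6, 1e-3)  # 0.1% carry-over (assumed)
rare_resident = amount_in_preparation(1.0, 1.0)          # fully recovered (assumed)

# Ratio of contaminant to genuine resident in the purified sample.
contaminant_excess = abundant_contaminant / rare_resident
```

Under these assumptions the contaminant still exceeds the low abundance resident by about three orders of magnitude, although 99.9% of the contaminating organelle was removed.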

FIG. 5. Proteins identified in multiple experimental proteomes.
Each of the 3,516 identified proteins was found in one to five experimental proteomes. The bars indicate the total number of proteins found in one, two, three, four, or five proteomes. The inset shows an enlargement of the bars for the proteins identified in three, four, or five proteomes. The colors indicate in which experimental proteomics studies the proteins were identified: pink, plasma membranes; beige, cell walls; red, mitochondria; green, vacuoles; violet, Golgi apparatus; gray, nuclei; blue, peroxisomes; magenta, lipid particles.
Instead of trying to avoid contaminants by using (apparently) very pure preparations, the contaminations can also be accounted for. One way of doing so is to use additional quality assessments, for instance GFP-tagged fusions. However, as mentioned above, GFP-tagged proteins are not always targeted to their native destinations, which results in mislocalizations.
A robust approach to account for contaminants is to use relative quantification-based techniques (for reviews, see Refs. 16, 17, and 42). The original idea of this approach (for a review, see Ref. 43) is that proteins that are physically associated with an organelle become enriched upon purification of this organelle, for instance by density gradient centrifugation. The contaminants, in contrast, although still present and detectable by mass spectrometry, will become depleted upon purification. The specific enrichment or depletion can be quantified by comparing protein abundances in crude and pure preparations of the organelle of interest (7,21,44) or by comparing abundances in different fractions of a density gradient after centrifugation (Fig. 6). When the protein abundances in all of the fractions are compared, proteins with multiple localizations can be recognized because they will be enriched in more than one part of the gradient. Enrichment in multiple fractions of a density gradient may also occur when the preparation of an organelle is heterogeneous. Such heterogeneity may arise because organelles of slightly different composition are present within a single cell or because the final preparation contains organelles derived from cells of different age and cell cycle stage.
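The comparison of crude and pure preparations can be sketched as a log ratio per protein. This is a minimal illustration under stated assumptions, not the analysis used in the cited studies: the function names, the symmetric threshold of one log2 unit, and the example abundances are all hypothetical.

```python
import math


def log2_enrichment(pure: float, crude: float) -> float:
    """log2 ratio of a protein's abundance in the pure vs. crude preparation."""
    if pure <= 0 or crude <= 0:
        raise ValueError("abundances must be positive")
    return math.log2(pure / crude)


def classify(pure: float, crude: float, threshold: float = 1.0) -> str:
    """Label a protein by its enrichment; the threshold is an assumed cutoff."""
    score = log2_enrichment(pure, crude)
    if score >= threshold:
        return "enriched"      # behaves like a genuine organellar resident
    if score <= -threshold:
        return "depleted"      # behaves like a contaminant
    return "ambiguous"         # needs further evidence


# Hypothetical relative abundances (pure, crude) for three proteins.
labels = {
    "candidate_resident": classify(8.0, 2.0),
    "candidate_contaminant": classify(1.0, 4.0),
    "unclear_case": classify(3.0, 2.0),
}
```

A fixed cutoff is the simplest possible choice; in practice the distribution of ratios for known marker proteins would be a more defensible basis for setting it.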
For quantification, isotope tags such as iTRAQ or ICAT can be used, as developed by Lilley and co-workers (44-46) to assess the localization of organelle proteins by isotope tagging (LOPIT). Alternatively, label-free quantification can be performed in three ways: by protein correlation profiling, which is based on the intensities of the peptide peaks in different MS analyses (47-49); by using the protein abundance index, which represents the number of peptides identified divided by the number of theoretically observable peptides (50,51); or by spectral counting, in which the number of mass spectra identified for a protein is used as a measure of its abundance (52). Use of spectral count-based quantification for absolute and relative protein abundance determination in complex mixtures is described in Vogel and Marcotte (53). These different methods to account for contaminants are globally outlined in Fig. 6.
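The three label-free measures can be illustrated with a few lines of code. This is a sketch under assumptions rather than any published implementation: the function names are hypothetical, normalizing spectral counts to the run total is one possible choice, and scoring a protein by the Pearson correlation between its fraction profile and that of an organelle marker is an assumed way of comparing profiles.

```python
from math import sqrt


def protein_abundance_index(observed_peptides: int, observable_peptides: int) -> float:
    """Identified peptides divided by theoretically observable peptides."""
    if observable_peptides <= 0:
        raise ValueError("observable_peptides must be positive")
    return observed_peptides / observable_peptides


def spectral_count_shares(counts: dict) -> dict:
    """Each protein's share of all identified spectra (assumed normalization)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no spectra identified")
    return {protein: count / total for protein, count in counts.items()}


def profile_correlation(profile: list, marker_profile: list) -> float:
    """Pearson correlation of two abundance profiles across gradient fractions."""
    n = len(profile)
    if n != len(marker_profile) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_p = sum(profile) / n
    mean_m = sum(marker_profile) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(profile, marker_profile))
    var_p = sum((p - mean_p) ** 2 for p in profile)
    var_m = sum((m - mean_m) ** 2 for m in marker_profile)
    if var_p == 0 or var_m == 0:
        raise ValueError("profiles must not be constant")
    return cov / sqrt(var_p * var_m)


# Hypothetical inputs (placeholder names and numbers, not real data).
pai = protein_abundance_index(5, 20)
shares = spectral_count_shares({"protein_A": 30, "protein_B": 10})
co_fractionating = profile_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
anti_correlated = profile_correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A protein whose profile closely follows that of the marker is a candidate resident, whereas a profile that runs opposite to the marker is more consistent with a contaminant.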
Label-based quantification in the yeast proteomics studies was successfully utilized for the vacuolar membrane and peroxisomal proteomes (7,21). The most valuable result of these studies is that many proteins with annotated localizations other than vacuolar or peroxisomal were found to be specifically co-enriched with vacuoles or peroxisomes, which strongly indicates that they are part of the true proteomes of these organelles. Thus, the low apparent specificities of the peroxisomal and vacuolar proteome studies (Fig. 4) appear not to be caused by contaminants but rather by incomplete database annotations for localization. Proteomics studies in which quantification was used to assess subcellular localization allow reassignment of the subcellular localizations of ambiguously annotated proteins.

FIG. 6. Principle of LOPIT and label-free quantification. The LOPIT technology is based on the notion that biochemical purification of an organelle never results in an absolutely homogeneous preparation. Instead, proteins that are specifically associated with the organelle of interest (green dots) become enriched upon purification (top layer; a), and contaminants (red dots), although still present in the enriched fraction a, become depleted. The LOPIT technology uses labeling of peptides with stable isotopes such as ICAT or iTRAQ (right-hand side of the figure). For this, peptides from the enriched (a) and depleted (b and c) samples are labeled with reagents of different masses, mixed together, and processed by liquid chromatography and mass spectrometry. The isotope labeling allows relative quantitation of the protein abundance in the enriched and depleted fractions simultaneously (a, b, and c). For the label-free quantitative approach (left-hand side of the figure), peptides from the enriched (a) and depleted (b and c) fractions are processed by liquid chromatography and mass spectrometry separately without labeling or mixing. In this case, the protein abundance in each fraction is quantified based on the number of observed peptides divided by the number of observable peptides (protein abundance index), the peptide peak intensity (protein correlation profiling), or the spectral count (number of identified spectra for each protein).

CONCLUSIONS
Mass spectrometry-based proteomics is a powerful technique to study the subcellular localization of native, nontagged proteins. The currently available yeast organellar proteomes have yielded a tremendous amount of valuable information. The proteomics studies complement high throughput localization studies based on GFP tagging and imaging. Both techniques have their advantages: proteomics studies do not require tagging, which may cause targeting failure, and GFP tagging studies do not require subcellular fractionation. Subcellular fractionation of the cell lysate is an essential part of organellar proteomics. Robust methods to isolate subcellular proteomes are available and produce highly purified preparations. However, the isolation procedures are lengthy and may lead to loss of proteins that are loosely attached or susceptible to degradation. Therefore, more rapid isolation procedures, e.g. based on affinity pullouts as used for Golgi isolation (6), are desirable (54).
No organellar preparation will be completely free of contaminants from different subcellular locations. Quantitative proteomics techniques that assay enrichment profiles allow for the discrimination between contaminants and genuine organellar residents and facilitate the assignment of cellular localizations to the identified proteins (7,21,44,47,55,56). With the ongoing developments in quantification and identification methods, it will become feasible to perform a global analysis of protein localization in yeast from a single culture. One could envision that such an experiment would use a multidimensional fractionation scheme and extensive analysis of co-fractionation profiles that follow from quantitative mass spectrometry using, e.g., the multiple reaction monitoring method (57,58). We expect that combining these data with those from co-purification strategies, where protein complexes are isolated using tagged proteins, will provide a more complete and accurate description of protein localization. Ultimately, dynamics will also have to be included in the localization experiments because proteins may be associated with an organelle only temporarily and in a manner that depends on the environmental conditions (59).