Charting the Protein Complexome in Yeast by Mass Spectrometry*

It has become evident over the past few years that many complex cellular processes, including control of the cell cycle and ubiquitin-dependent proteolysis, are carried out by sophisticated multisubunit protein machines that are dynamic in abundance, post-translational modification state, and composition. To understand better the nature of the macromolecular assemblages that carry out the cell cycle and ubiquitin-dependent proteolysis, we have used mass spectrometry extensively over the past few years to characterize both the composition of various protein complexes and the modification states of their subunits. In this article we review some of our recent efforts, and describe a promising new approach for using mass spectrometry to dissect protein interaction networks.

SCF (or Skp1, Cul1, F-box) ubiquitin ligases are heterotetrameric complexes (the fourth subunit being the recently identified RING-H2 protein Hrt1/Roc1/Rbx1) that mediate the attachment of ubiquitin to substrate proteins (1). The attachment of ubiquitin, in turn, leads to proteolysis of the ubiquitinated substrate by the 26 S proteasome (2). SCF(Cdc4), the progenitor of the SCF family of ubiquitin ligases, was discovered in budding yeast as a regulator of entry into the S phase of the cell cycle (3,4). Interestingly, Cdc4, which directly binds and recruits substrates to the SCF(Cdc4) complex, is a member of a large family of proteins that contain a conserved domain of ~40 amino acids referred to as an F-box (5). The F-box mediates the association of Cdc4 with Skp1, which anchors the substrate-binding Cdc4 subunit to the Skp1-Cul1-Hrt1 catalytic core of SCF. Other proteins that contain F-box domains can substitute for Cdc4 to form a diverse array of SCF ubiquitin ligases, each of which is capable of attaching ubiquitin to a distinct set of proteins whose identity is determined by the substrate-binding specificity of the F-box-containing subunit (1).
In addition to forming multiple complexes whose specificities vary depending upon the identity of the F-box subunit, a second mechanism for achieving diversity in the SCF pathway derives from the requirement (in all cases that have been examined in detail so far) that substrates be phosphorylated to bind their respective F-box proteins. Multiple protein kinases can target substrates to a single SCF complex. For example, the cell cycle inhibitor Sic1 is targeted to SCF(Cdc4) via its phosphorylation by G1 cyclin-Cdk complexes (6), whereas the transcriptional regulator Gcn4 is targeted to SCF(Cdc4) via its phosphorylation by the Srb10 and Pho85 protein kinases, which have been implicated in transcriptional regulation (7,8). In this manner, one SCF complex, SCF(Cdc4), can independently regulate transcription and the cell cycle.
The diversity-generating mechanisms described above enable SCF complexes to participate in multiple aspects of cell physiology, including the cell cycle, innate immune response, transcription, and signal transduction during development (1). This appreciation that SCF has a broad impact on cell biology motivated our interest in identifying novel components, regulators, and targets of SCF complexes in yeast and animal cells.
Discovery of a Function for COP9 Signalosome Based on Its Identification as an SCF-interacting Complex
To gain further insight into the targets and regulation of SCF ubiquitin ligases in mammalian cells, we sought to identify proteins that interact with the SCF subunit Cul1 (9). Wild-type and C-terminally truncated Cul1 containing nine repeats of the Myc epitope appended to their C termini were individually expressed in murine cells from recombinant retroviruses. Cultures of control (no tagged Cul1) and Cul1ΔCMyc9-expressing cells were lysed, and the soluble fractions of the lysates were subjected to affinity purification on an anti-Myc monoclonal antibody resin. Bound proteins were eluted and displayed by SDS-PAGE followed by staining with silver. By comparing purifications from the control and experimental samples, proteins that associated specifically with Cul1ΔCMyc9 were visualized readily. These bands were excised from the polyacrylamide gel and subjected to sequence analysis by matrix-assisted laser desorption/ionization-time of flight and nanoelectrospray mass spectrometry. This effort yielded 16 Cul1ΔC-interacting proteins. Among the Cul1ΔC-interacting proteins, all known subunits of SCF (Hrt1, Skp1, Cul1ΔC) were detected, as were several different F-box proteins. Intriguingly, eight polypeptides that belong to a single protein complex known as CSN (COP9 signalosome) were also found. CSN is a conserved protein complex that is found in plants, animals, and the fission yeast Schizosaccharomyces pombe (10). CSN contains eight subunits, each of which has a homolog in the eight-subunit lid subcomplex of the 26 S proteasome. The relationship between CSN and the lid has led several researchers to speculate that CSN is involved in the ubiquitin/proteasome pathway, but the specific biochemical function of CSN remained unknown.
The best understanding of the physiological role of CSN comes from studies by Deng and colleagues (11) on the mustard weed Arabidopsis thaliana. This group has implicated CSN in photomorphogenesis, the far-ranging changes in development that occur upon exposure of seedlings to light. Exposure to light causes plants to express many new genes, including those that are involved in the assembly of chloroplasts. These light-induced genes are switched on by a battery of transcription factors, one of which is Hy5 (see Fig. 1 for model). In the dark, Hy5 is kept inactive by its association with Cop1, which, based on its RING finger and role in Hy5 turnover, may function as a ubiquitin ligase (12). The action of Cop1 prevents Hy5 from accumulating to detectable levels in the nucleus of cells from dark-grown seedlings. In mutants lacking subunits of CSN, or in light-grown seedlings, Cop1 redistributes to the cytoplasm, enabling the accumulation of a nuclear pool of Hy5, which then activates transcription of a regulon of photomorphogenetic genes. These observations suggest that CSN promotes the accumulation of active Cop1 in the nucleus and that the action of CSN upon Cop1 is somehow short-circuited by sunlight. However, the biochemical function of CSN and the molecular mechanisms underlying its regulation by light remained unknown.
Our observation that CSN interacts with SCF provided a promising route to unravel the function of CSN, because much is known about the composition and function of SCF (1). To test the hypothesis that CSN in some way regulates the SCF ubiquitin ligase (as it is proposed to regulate the putative Cop1 ubiquitin ligase), we evaluated the abundance, modification state, and localization of SCF subunits in a fission yeast mutant (csn1Δ) that lacks the Csn1 subunit of CSN (13). This analysis revealed that in wild-type cells most of the Cul1 exists in an unmodified state with a small fraction conjugated to the ubiquitin-like protein Nedd8, whereas in csn1Δ cells all Cul1 molecules accumulate as Nedd8-modified species (9).
Like ubiquitin, Nedd8 becomes attached to other proteins by a pathway that contains E1- and E2-like enzymatic activities (14). The only Nedd8-modified proteins discovered so far are members of the cullin family, and all cullins characterized to date (Cul1-Cul5) are modified with Nedd8. In the case of Cul1, this modification has been studied in detail, and is known to be essential for the genetic activity of cul1+ in fission yeast (15). Moreover, attachment of Nedd8 to human Cul1 increases the ubiquitin ligase activity of SCF by approximately 4-fold (16). In agreement with this, we demonstrated that the catalytic core of SCF is hyperactive in csn1Δ cells.
A simple explanation for our data is that CSN is required to cleave Nedd8 from Cul1, such that in cells lacking CSN activity the entire complement of Cul1 accumulates as Nedd8-modified species. We confirmed this hypothesis by showing that CSN purified from pig spleen efficiently removes Nedd8 from fission yeast Cul1 (9). Based on these observations, we conclude that CSN either possesses intrinsic Nedd8-Cul1 conjugate cleavage activity or binds tightly to and activates the enzyme that carries out this reaction.
The Jab1 subunit of CSN has been implicated in a variety of biological processes, including control of the subcellular localization of the Cdk inhibitor p27 (17), signaling by the integrin LFA1 (18), and gene activation by the c-Jun transcriptional regulatory protein (19). Our findings imply that these diverse physiological activities may be rooted in a simple biochemical function, cleavage of Nedd8 from substrate proteins. The discovery of a specific biochemical activity for CSN should help unravel the physiological role of cycles of cullin neddylation/deneddylation. In addition, our findings will facilitate identification of novel substrates for CSN and provide insight into how physiological signals (e.g. sunlight) regulate the action of CSN upon its substrates.
Identification of an Unexpected Function for Skp1 by Sequential Epitope Tagging, Affinity Purification, and Mass Spectrometry (SEAM)¹
To gain further insight into the physiological role and regulation of SCF ubiquitin ligases in budding yeast, we sought to identify proteins that interact with the SCF subunits Skp1 and Cul1 (20,21). The SKP1 and CDC53 (yeast Cul1 is known as Cdc53) chromosomal loci in yeast were modified to encode proteins that contained a C-terminal tag with nine repeats of the Myc epitope. Cultures of SKP1myc9 and CDC53myc9 cells were lysed, and the soluble fractions of the lysates were purified and analyzed as described for Cul1ΔCMyc9. Mass spectrometry-based peptide sequencing revealed 16 total Skp1- and Cdc53-interacting proteins. Many of the proteins identified in our analysis are subunits of SCF, including nine F-box proteins that may form distinct substrate-specific SCF ubiquitin ligase complexes. However, our analysis also uncovered some proteins that were not previously known or predicted to associate with Skp1, including Hrt1, Yjr033 (renamed Rav1), and Ydr202 (renamed Rav2).
To explore further the power of mass spectrometry for deciphering the composition and function of multisubunit protein complexes, we sought to investigate the unique proteins identified in the first analysis by implementation of SEAM.
Accordingly, the RAV1 and RAV2 loci were modified to encode proteins appended with the Myc9 epitope tag, and the tagged proteins were purified and analyzed as described above for Skp1 and Cdc53 (21). Surprisingly, this analysis revealed that Rav1 and Rav2 associate with each other and with Skp1 to form a novel heterotrimeric protein complex that appears to lack Cdc53, Hrt1, and ubiquitin ligase activity. Given the lack of homology of Rav1 and Rav2 to any known proteins, however, the function of the Rav1-Rav2-Skp1 complex remained completely unknown.
To gain insight into the functions of the Rav1-Rav2-Skp1 complex, we repeated the SEAM protocol once more, this time under reduced stringency. When Rav1Myc9 immunoprecipitates were washed with buffer containing a reduced concentration of non-ionic detergent (0.1% instead of 0.5%), we found that a larger number of polypeptides specifically co-precipitated with Rav1Myc9. Peptide sequencing by mass spectrometry revealed these proteins to be subunits of the V1 subcomplex of the vacuolar membrane ATPase (V-ATPase). The V-ATPase is analogous to the mitochondrial F-ATPase, in that it is composed of related multisubunit complexes, the membrane-embedded V0 and cytoplasmic V1. V-ATPase is conserved in all eukaryotes and is implicated in a broad range of processes, including segregation of ligands from receptors (e.g. epidermal growth factor and epidermal growth factor receptor) within the endocytic pathway, acidification of extracellular microenvironments (e.g. in tumors), and solute transport across the lysosomal membrane (22).
Given our finding that Rav1-Rav2-Skp1 complex binds the V1 component of V-ATPase, we sought to test whether the complex regulates V-ATPase function, because it is known that the activity of this enzyme is highly regulated by a glucose signaling pathway(s) that remains unidentified (23). A comprehensive series of genetic, biochemical, and cell biological studies confirmed that the Rav1-Rav2-Skp1 complex (dubbed RAVE for regulator of ATPase of vacuolar and endosomal membranes) regulates the assembly of V-ATPase holoenzyme from its component V1 and V0 domains. We propose that RAVE governs the reversible assembly/disassembly of V-ATPase from its component V1 and V0 domains. This dynamic function we propose for RAVE is likely to be critical for rapid regulation of compartment acidification by V-ATPase. Impressively, mass spectrometry not only identified the novel RAVE complex but also identified a substrate and provided a critical clue to the function of RAVE.

¹ The abbreviations used are: SEAM, sequential epitope tagging, affinity purification, and mass spectrometry; V-ATPase, vacuolar membrane ATPase; TEV, tobacco etch virus; ORF, open reading frame.

FIG. 2. Summary of protein interactions revealed by mass spectrometry-based peptide sequencing of Cdc53- and Skp1-associated proteins. Solid lines represent protein interactions uncovered by Seol et al. (21). Dotted lines represent protein interactions that were described elsewhere. Boxed areas that overlap the Skp1 oval represent distinct protein complexes that contain Skp1 (e.g. RAVE comprises Skp1, Rav2, and Rav1). Our analysis uncovered multiple F-box proteins (FBPs). For the F-box proteins followed by a question mark, it remains unclear whether they assemble to form SCF complexes or whether they form alternative complexes as do the F-box proteins Ctf13 and Rcy1.
A summary of all Skp1-based complexes that were identified entirely or in part in our analysis of Skp1- and Cdc53-associated proteins is presented in Fig. 2. The success of this experiment suggests that a genome-wide analysis of protein complexes by mass spectrometry may shed considerable light on the structure and regulation of the proteome.

Multidimensional Protein Identification Technology: A Tool to Identify the Components, Regulators, and Substrates of Protein Megacomplexes
In parallel with the effort described above, we sought to identify proteins in yeast that interact with the 26 S proteasome. The 26 S proteasome is an example of what can be considered a protein megacomplex. We propose to employ the term megacomplex to define assemblages that differ from common protein complexes like SCF in both the sheer number of subunits they contain and the number of other proteins with which they do business. Examples of megacomplexes include the proteasome, nuclear pore assembly, and spindle pole body. Because of their sheer size and complexity, and their extensive interdigitation within the regulatory fabric of the cell, it is difficult to imagine studying the proteomics of protein megacomplexes by conventional methods. For example, the 26 S proteasome contains 31 different proteins and interacts with potentially hundreds of ubiquitin ligases (24-26) that in turn deliver thousands of different substrates to be degraded.
We sought to apply a method that would enable us to take detailed snapshots of the composition of the 26 S proteasome in different physiological states, to gain insight into the impact of the proteasome on the proteome, and vice versa. A promising technique for achieving our objectives was recently developed in one of our laboratories (27,28) and is referred to as MudPIT (for multidimensional protein identification technology). The basic idea behind MudPIT is to analyze the composition of immunoprecipitates directly without the interposition of an SDS-PAGE separation step (Fig. 3). A total eluate from an immunoprecipitation is digested with a mixture of proteases, and the resulting peptides are fractionated in two dimensions on a strong cation exchange resin followed by a reversed-phase resin. The eluate from the reversed-phase column is introduced directly into an LCQ mass spectrometer via an electrospray interface. The mass spectrometer then identifies the most abundant ion and fragments it to collect peptide sequence data. To avoid sampling the same abundant ions over and over again, the masses of these ions are automatically added to a rolling exclusion list once data has been collected for them. This software maneuver causes the mass spectrometer to ignore a peptide once it has been sequenced and to go about looking for the next most abundant peptide. Using this approach, even very complex biological samples can be compositionally typed by mass spectrometry. For example, Washburn et al. (27) demonstrated that they could identify directly by sequence analysis the presence of ~500 proteins in each of three distinct fractions of yeast cell lysate. The application of MudPIT to affinity-purified 26 S proteasomes enabled the identification of every single subunit of this megacomplex, as well as at least one heretofore undetected subunit (2,24).
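The rolling exclusion list described above can be illustrated with a minimal Python sketch. It is not the instrument's actual control software: the function name `select_precursors`, the list length, and the m/z matching tolerance are all assumptions chosen for illustration, and each survey scan is simplified to a list of (m/z, intensity) pairs.

```python
from collections import deque


def select_precursors(scans, exclusion_size=3, tolerance=0.5):
    """Pick one ion per survey scan for fragmentation.

    scans: list of survey scans, each a list of (mz, intensity) tuples.
    exclusion_size and tolerance are illustrative values, not instrument settings.
    Returns the chosen m/z for each scan, or None if every ion was excluded.
    """
    # Rolling list: once it is full, the oldest excluded mass drops off.
    excluded = deque(maxlen=exclusion_size)
    selected = []
    for scan in scans:
        # Keep only ions whose mass is not within tolerance of an excluded mass.
        candidates = [
            (mz, intensity)
            for mz, intensity in scan
            if all(abs(mz - ex) > tolerance for ex in excluded)
        ]
        if not candidates:
            selected.append(None)
            continue
        # Fragment the most abundant remaining ion, then exclude its mass.
        mz, _ = max(candidates, key=lambda pair: pair[1])
        selected.append(mz)
        excluded.append(mz)
    return selected


# Three identical survey scans: the same abundant ion is not sampled twice.
scan = [(500.0, 100.0), (600.0, 50.0), (700.0, 10.0)]
print(select_precursors([scan, scan, scan]))
```

With the exclusion list in place, successive scans of the same mixture step down to progressively less abundant ions. Without it, the 500.0 ion would be chosen every time.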
Most interestingly, the MudPIT analysis identified not only authentic subunits of the 26 S proteasome but also a substoichiometric set of proteins that potentially interact with the proteasome (referred to as PIPs, for proteasome-interacting proteins) (24). Rigorously controlled immunoprecipitation/immunoblotting experiments revealed that six of seven PIPs that were analyzed indeed specifically associate with the proteasome. The PIPs identified by MudPIT participate in many different aspects of cellular function, including protein assembly, transcription, signaling, cell cycle control, RNA biogenesis, and intermediary metabolism.

FIG. 3. Comparison of MudPIT with conventional techniques for determination of protein identity by mass spectrometry. Top half, proteins to be analyzed are fractionated by SDS-PAGE, excised, and digested with protease. Resulting peptides are fractionated by liquid chromatography and introduced into a mass spectrometer via an electrospray interface (or fractionated peptides are collected and spotted onto grids for analysis by matrix-assisted laser desorption/ionization-mass spectrometry). Bottom half, proteins to be analyzed by MudPIT are digested with proteases as an unfractionated mixture, and the resulting peptides are subjected to two-dimensional fractionation on a column packed with strong cation exchange resin (SCX) overlying a reversed-phase resin (RP). Successive automated cycles of step elution from SCX (e.g. 0.1 M salt, 0.2 M salt, etc.) followed by gradient elution from RP reduce complexity of the sample by spreading the peptide mixture over two-dimensional space. Ab, antibody; 1D, one-dimensional; HPLC, high pressure liquid chromatography; mass spec, mass spectrometry; 2D, two-dimensional.
We have not yet confirmed the physiological significance of the interaction of PIPs with the 26 S proteasome, but given that a majority of the interactions appear to be specific, detailed biological analyses most likely will eventually confirm the relevance of these interactions.
We believe that the results of our MudPIT analysis of the 26 S proteasome have significant implications for understanding how dynamic protein megacomplexes interface with the proteome. The non-proteasomal proteins identified in association with the 26 S proteasome were all present in substoichiometric amounts. Such proteins can be very easy to overlook in traditional approaches where polypeptides are excised from SDS-polyacrylamide gels prior to sequence analysis. Moreover, associated proteins can also be overlooked, because they co-migrate with subunits of the complex under investigation or with contaminants, they are of low molecular weight, they stain poorly with silver, or they are heterogeneous in molecular weight because of phosphorylation, glycosylation, or fragmentation by proteases during purification. Importantly, MudPIT is not susceptible to any of these problems that can bedevil a conventional analysis.

Application of MudPIT to Small Scale Samples of Medium to Low Abundance Proteins
Our analysis of affinity-purified 26 S proteasomes by MudPIT suggested that this might be a promising approach for the systematic evaluation of how protein complexes interact with each other to form protein networks in the yeast proteome. Thus, we sought to develop affinity purification methods that would enable us to perform MudPIT analysis on multiple protein complexes prepared in parallel from small scale cultures. However, there are several caveats to our MudPIT analysis of the 26 S proteasome. Most importantly, the 26 S proteasome is a protein complex of fairly high abundance (~0.5% of total cell protein), and thus results obtained with the proteasome might not be representative of results that we could expect to obtain with moderate to low abundance regulatory proteins. As a test-bed for methods development, we employed the SCF subunit Skp1. We selected Skp1 for three reasons. First, we have already analyzed the cohort of Skp1-associated proteins by mass spectrometry-based peptide sequencing of polypeptides excised from SDS-polyacrylamide gel slices (21). Thus, we already have a good idea of what proteins we should find. Second, unlike the proteasome, Skp1 is only a moderately abundant protein that we estimate to be present at roughly one part in 3000 of crude cell extract.² Third, Skp1 forms multiple different protein complexes, which vary considerably in abundance. For example, the cellular pools of both Cdc53 and Cdc4 associate quantitatively in complexes with Skp1. However, whereas Cdc53 is present at ~1/8000 of crude cell extract, Cdc4 comprises only ~1/50,000 of crude cell extract.² Thus, the spectrum of Skp1-associated proteins identified by MudPIT should provide a semiquantitative estimate of the relative sensitivity of this technique.
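To make the abundance range concrete, the following sketch converts the estimates quoted above into fold differences. It assumes, for illustration only, that "fraction of total cell protein" and "fraction of crude cell extract" can be compared directly; the dictionary and the helper name `fold_difference` are hypothetical.

```python
from fractions import Fraction

# Approximate abundances quoted in the text, as fractions of total protein.
# Treating the proteasome and extract-based estimates as comparable is an assumption.
abundance = {
    "26 S proteasome": Fraction(1, 200),  # ~0.5% of total cell protein
    "Skp1": Fraction(1, 3000),
    "Cdc53": Fraction(1, 8000),
    "Cdc4": Fraction(1, 50000),
}


def fold_difference(more_abundant, less_abundant):
    """Return how many times more abundant the first protein is than the second."""
    return abundance[more_abundant] / abundance[less_abundant]


for name in ("Skp1", "Cdc53", "Cdc4"):
    ratio = fold_difference("26 S proteasome", name)
    print(f"26 S proteasome vs {name}: ~{float(ratio):.0f}-fold")
```

Under these assumptions, detecting Cdc4 in a Skp1 purification means reaching a protein roughly 250-fold less abundant than the proteasome, which is why Skp1 serves as a useful sensitivity benchmark.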
To perform routine MudPIT analysis of purified protein complexes, we developed a tag that contains two distinct epitopes (His8 and Myc9) that flank two copies of the cleavage site for the tobacco etch virus (TEV) protease. A PCR fragment that encodes the His8-TEV2-Myc9 tag is targeted to the 3′ end of any desired open reading frame in the yeast genome by homologous recombination. Extracts are prepared from the resulting transformant, and tagged protein is adsorbed to 9E10 resin, eluted with TEV protease, and adsorbed onto nickel-nitrilotriacetic acid resin (Fig. 4). Material bound to nickel-nitrilotriacetic acid is eluted and lyophilized and then digested with protease prior to being analyzed by MudPIT. A detailed purification protocol will be published elsewhere.
The results of our analyses on Skp1 so far have been very encouraging. Starting with 300-1000-ml cultures of yeast cells that express Skp1-His8-TEV2-Myc9 (from the endogenous SKP1 locus) we have successfully identified between nine and 19 Skp1-associated proteins in each of four different experiments. Among these, we identified new Skp1-associated proteins that were not identified in our prior efforts (21). Some of these proteins (e.g. Cdc4) are known to be low in abundance, and in two cases the identified proteins, like Rav1 and Rav2, were not anticipated but were subsequently confirmed to be authentic Skp1-associated proteins.

² R. J. D., unpublished data.
To evaluate whether the results obtained with Skp1 can be generalized to other proteins, we analyzed immunoprecipitates of Net1, Cdc15, Kic1, and Bck1. Net1 (29) and Cdc15³ immunoprecipitates were analyzed previously by cutting specific bands from SDS-polyacrylamide gels and sequencing eluted peptides by mass spectrometry. Our prior analysis of Net1-associated proteins revealed its functional roles in cell cycle control, nucleolar transcription by Pol I, and silencing of nucleolar transcription by Pol II (by virtue of the association of Net1 with Cdc14, RNA Pol I, and Sir2, respectively) (29-31). By contrast, our prior analysis of Cdc15 failed to identify any specifically associated proteins. MudPIT analyses recapitulated our prior results for these two targets; no credible Cdc15 partners were found, whereas Cdc14, Sir2, and subunits of RNA polymerase I were identified in Net1 immunoprecipitates. In addition, we recovered a strong candidate interacting protein for the Kic1 protein kinase. Taken together, these results suggest that routine application of MudPIT to a large number of tagged proteins is likely to reveal many unanticipated protein interaction partners. A secondary benefit of this experiment is that it revealed that several background proteins contaminated multiple immunoprecipitates, whereas the authentic interactors were uniquely found in a single immunoprecipitate. Thus, it should be possible to use software to distinguish true interactors from contaminants in a global analysis of protein complexes by MudPIT.
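One simple way such software could work is sketched below: a protein recovered with more than one bait is flagged as a likely contaminant, and what remains for each bait is kept as a candidate interactor. This is a hypothetical illustration, not a published filtering method. The function name `classify_hits`, the `max_baits` threshold, and the placeholder hit lists (including the "background" entries) are invented for the example.

```python
from collections import Counter


def classify_hits(immunoprecipitates, max_baits=1):
    """Split identified proteins into bait-specific candidates and contaminants.

    immunoprecipitates: dict mapping bait name -> list of identified proteins.
    A protein seen with more than max_baits baits is treated as background.
    """
    # Count in how many different immunoprecipitates each protein appears.
    counts = Counter(
        protein
        for hits in immunoprecipitates.values()
        for protein in set(hits)
    )
    contaminants = {protein for protein, n in counts.items() if n > max_baits}
    candidates = {
        bait: sorted(set(hits) - contaminants)
        for bait, hits in immunoprecipitates.items()
    }
    return candidates, sorted(contaminants)


# Placeholder data for illustration only; not results from the experiments above.
example = {
    "Net1": ["Cdc14", "Sir2", "RNA Pol I subunit", "background_1", "background_2"],
    "Cdc15": ["background_1", "background_2"],
    "Kic1": ["Kic1 candidate partner", "background_1"],
    "Bck1": ["background_2"],
}
candidates, contaminants = classify_hits(example)
print(candidates["Net1"])
print(contaminants)
```

A filter this strict would also discard genuine partners shared by two baits from the same complex (for example, SCF subunits recovered with both Skp1 and Cdc53), so in practice the threshold would need to account for baits that are expected to co-purify.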
At this juncture, it is reasonable to ask whether the effort required for a global analysis of the yeast protein complexome interactome (i.e. the network of interacting protein complexes) by mass spectrometry is justified, given that protein interactions can be evaluated more quickly and easily by the two-hybrid method. Interestingly, a global two-hybrid analysis of protein-protein interactions in yeast evaluated all clonable yeast ORFs for their ability to interact with Skp1 and Cdc53 (32). This analysis yielded 18 potential partners. By contrast, our conventional mass spectrometry analysis summarized above yielded 16 partners, 15 of which were validated. Remarkably, only four overlapping proteins were recovered in both analyses (Table I). Although the sample size is very small, this observation suggests that global two-hybrid analyses may access less than 20% of the proteome. This interpretation is supported by the surprisingly low degree of overlap between two different global two-hybrid screens (32,33). Thus, to gain a full appreciation of the protein networks that comprise a cell, it will unquestionably be necessary to bring more than one technique to bear on the systematic identification of interacting proteins.

³ R. Azzam and R. J. D., unpublished data.

a C1 (modified with Rub1) and C2 are Cdc53; S10 is Skp1.
b Ylr352 was identified as a Cdc53 interactor by affinity purification/protein sequencing and as a Skp1 interactor by genome-wide two-hybrid analysis.
c Yjl149, Yjl204, and Yol133 baits were employed in genome-wide two-hybrid screens but failed to consistently identify any specific interacting proteins.
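The overlap between the two data sets can be tabulated from the counts given in the text (18 two-hybrid partners, 16 mass spectrometry partners, 4 shared). The sketch below reports the shared fraction under three common conventions. The text does not state which denominator underlies its own estimate, so the function name `overlap_summary` and the choice of measures are assumptions made for illustration.

```python
from fractions import Fraction


def overlap_summary(n_two_hybrid, n_mass_spec, n_shared):
    """Summarize the overlap of two interactor lists from their sizes alone."""
    # Size of the union of the two lists.
    n_union = n_two_hybrid + n_mass_spec - n_shared
    return {
        "fraction_of_two_hybrid": Fraction(n_shared, n_two_hybrid),
        "fraction_of_mass_spec": Fraction(n_shared, n_mass_spec),
        # Jaccard index: shared proteins divided by all distinct proteins.
        "jaccard": Fraction(n_shared, n_union),
    }


summary = overlap_summary(n_two_hybrid=18, n_mass_spec=16, n_shared=4)
for label, value in summary.items():
    print(f"{label}: {float(value):.1%}")
```

Whichever convention is used, most partners found by one method were missed by the other, which is the point the comparison is meant to make.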

Concluding Remarks
In summary, we have used mass spectrometry-based peptide sequencing to identify four novel protein complexes (Rcy1-Skp1, RAVE, SCF-Hrt, and RENT) (20,21,29) and to uncover two unexpected interactions between previously identified protein complexes (SCF and CSN, RAVE and V1-ATPase) (9,21). Most importantly, our collaborative efforts have spurred the discovery of six new biological functions (V-ATPase regulation by RAVE, mechanism of action of SCF and other RING-based ubiquitin ligases via discovery of Hrt1, Nedd8 conjugate cleavage activity of CSN, and the roles of Net1 in cell cycle control, rDNA transcription by Pol I, and silencing of Pol II transcription in the nucleolus). Based on this rich harvest, we believe that direct analysis of protein-protein interactions by mass spectrometry-based peptide sequencing of affinity-purified protein complexes is perhaps the single most productive means available to identify new protein complexes and to assign functions to them. Secondary mass spectrometry-based approaches like SEAM (21) may allow for the systematic characterization of protein networks. Finally, a new mass spectrometry-based technology, MudPIT (27), opens up exciting possibilities for the systematic characterization of the composition of dynamic protein megacomplexes and how they are woven into the regulatory fabric of the cell (24). Even restricted to existing tools, mass spectrometry has a bright future. However, considering the innovative new techniques and powerful new machines that mass spectrometrists will undoubtedly develop over the next few years, we confidently predict that the future will be truly spectacular.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.