Applicability of Tandem Affinity Purification MudPIT to Pathway Proteomics in Yeast

A combined multidimensional chromatography-mass spectrometry approach known as “MudPIT” enables rapid identification of proteins that interact with a tagged bait while bypassing some of the problems associated with analysis of polypeptides excised from SDS-polyacrylamide gels. However, the reproducibility, success rate, and applicability of MudPIT to the rapid characterization of dozens of proteins have not been reported. We show here that MudPIT reproducibly identified bona fide partners for budding yeast Gcn5p. Additionally, we successfully applied MudPIT to rapidly screen through a collection of tagged polypeptides to identify new protein interactions. Twenty-five proteins involved in transcription and progression through mitosis were modified with a new tandem affinity purification (TAP) tag. TAP-MudPIT analysis of 22 yeast strains that expressed these tagged proteins uncovered known or likely interacting partners for 21 of the baits, a figure that compares favorably with traditional approaches. The proteins identified here comprised 102 previously known and 279 potential physical interactions. Even for the intensively studied Swi2p/Snf2p, the catalytic subunit of the Swi/Snf chromatin remodeling complex, our analysis uncovered a new interacting protein, Rtt102p. Reciprocal tagging and TAP-MudPIT analysis of Rtt102p revealed subunits of both the Swi/Snf and RSC complexes, identifying Rtt102p as a common interactor with, and possible integral component of, these chromatin remodeling machines. Our experience indicates it is feasible for an investigator working with a single ion trap instrument in a conventional molecular/cellular biology laboratory to carry out proteomic characterization of a pathway, organelle, or process (i.e. “pathway proteomics”) by systematic application of TAP-MudPIT.


Molecular & Cellular Proteomics 3:226-237, 2004.
To understand the function of a protein, it is crucial to characterize its physical environment: what other proteins is it interacting with under various conditions? Traditionally, this question has been addressed by biochemical fractionation of cell extracts under mild conditions and subsequent identification of the members of a purified protein complex by immunoblotting or peptide sequencing.
Primed by the dawning of the post-genomic era, genomewide yeast two-hybrid interaction screens (1,2) and protein chip-based methods (3) have supplemented traditional purification and identification techniques, allowing broader insight into the interaction networks that constitute a functional cell. Both of these approaches require the creation and maintenance of libraries of tagged proteins and in the case of protein chips the daunting task of purifying and spotting them under conditions that preserve their activity. The potential for detecting nonphysiological protein-protein interactions and the necessity to piece together interaction networks from a catalog of resulting binary interactions further complicate these approaches.
Developed in parallel with two-hybrid and protein chip technologies, mass spectrometry of protein complexes purified through single or tandem affinity steps eliminates the need for complex-specific immunochemicals and enables analysis of very small amounts of sample on a proteome-wide scale (4,5). This approach can be performed under more physiological conditions and substitutes whole-complex analysis for the reconstruction of interaction networks from binary interaction data. However, the Gavin et al. (4) and Ho et al. (5) studies employed SDS-PAGE to separate affinity-purified protein mixtures prior to mass spectrometric analysis, thereby encountering the problems linked to this technique including: limitations of dynamic range of detection, considerable sample parallelization, variable elution efficiency of peptides from the polyacrylamide matrix, and potential selection against proteins with properties that impede analysis by SDS-PAGE (e.g. unusually high or low molecular mass, diffuse migration, comigration with contaminants, and poor binding to stain).
To circumvent these problems, McCormack et al. (6) demonstrated the possibility of analyzing digested protein complexes directly using single-dimensional liquid chromatography. An improvement of this method, multidimensional protein identification technology (MudPIT) (7), extended its applicability to large protein complexes and is a bona fide alternative to gel-based protein separation. MudPIT relies on digestion in solution of the protein mixture to be analyzed and separation of the resulting complex peptide mixture by multidimensional capillary chromatography connected in-line to an ion trap mass spectrometer. Owing to its unique advantages, MudPIT is an attractive alternative to traditional methods for the rapid identification of protein-protein interactions for stoichiometric and substoichiometric partners. MudPIT can also be applied to deconvolve complex sets of proteins related by a common property. For example, Peng et al. (8) applied a multidimensional approach similar to MudPIT to identify hundreds of candidate ubiquitinated proteins in budding yeast cells.
Despite its considerable power, some potential limitations to MudPIT remain to be addressed. For example, it is unclear how reproducible such analyses are. This is of particular concern for analysis of samples that contain many proteins, like that reported by Peng et al. (8). Second, because only individual analyses have been reported to date, it remains unclear what the likelihood of success is for any given MudPIT experiment. The success rate of individual experiments, in turn, is important for the question of whether it will be profitable to scale the MudPIT approach to the rapid analysis of multiple baits. Third, because the issues of reproducibility and scalability have not been addressed, it is not known if the parallel application of MudPIT to multiple proteins will enable filtering approaches to separate bona fide interactors from nonspecific contaminants. Finally, it remains unclear how feasible it will be to transfer cutting-edge proteomic technologies like MudPIT from specialized environments to a conventional cell biology laboratory.
In this study, we address these various issues. We show that the combination of a bipartite affinity tag with MudPIT allows for the rapid analysis of protein complexes. Pilot experiments with Gcn5p confirmed the reproducibility of the technique. Application of MudPIT to a set of 22 expressed baits revealed a success rate comparable to conventional approaches and confirmed the scalability of the approach. Comparison of proteins identified across all MudPIT analyses, comprising diverse baits from different subcellular compartments and pathways, also enabled a filtering strategy to cull nonspecific contaminants. Our experience indicates that multidimensional chromatography in combination with mass spectrometry technology can be readily transferred from a specialized analytical chemistry environment to a traditional molecular cell biology laboratory. Routine application of MudPIT may thus enable cell biologists to dissect dynamic changes in protein interactions in response to specific chemical or biological ligands, environmental perturbations, or mutations.

Construction of a Bipartite Affinity Purification Tag
To construct pJS-HPM53H, a 940-bp fragment was PCR-amplified from pJS-TM53H (RDB1344) (9) with the primers HTM A and B (see supplemental Table I). This was used as a template to PCR-amplify a HPM tag containing a 670-bp fragment with the primers HPM C and D (see supplemental Table I), which replaced the XhoI-EcoRI restriction fragment of pJS-TM53H.

Strain Construction
The bipartite affinity purification tags were amplified by PCR from pJS-HPM53H (HPM tag) or pKW804 (modified tandem affinity purification (TAP) tag) (Table II) with a modified lithium acetate method (11). Integration and expression of the tagged gene product were checked by anti-myc Western blotting of whole-cell lysate using 9E10 monoclonal antibodies (12). Strain RJD2067, carrying a TAP-tagged (13) GCN5 allele, was a gift from Erin O'Shea (University of California, San Francisco, CA).
To knock out SNF2, ARP9, and RTT102, a HIS3 carrying cassette was PCR-amplified from pFA6a-His3MX6 (14) and transformed into RJD415. The primers used (see supplemental Table I) allowed for complete replacement of the respective open reading frames by homologous recombination.

Preparation of Protein Complexes by Dual-Step Affinity Purification
HPM Tag-Yeast cells carrying a HPM-tagged gene were grown in 2.5 liters of YPD (1% yeast extract, 2% bacto-peptone, 2% glucose) to OD600 ≈ 1.5. Cell extract was prepared by glass beading in TNET (20 mM Tris·HCl, pH 7.5, 150 mM NaCl, 0.1 mM EDTA, 0.2% Triton X-100) supplemented with 10 μg/ml aprotinin, 10 μg/ml leupeptin, 10 μg/ml chymostatin, and 2 μg/ml pepstatin A. The extract was cleared by centrifugation at 100,000 × g and 4°C for 30 min. Crude extract (300 mg of total protein in a 14-ml volume) was incubated with 200 μl of 9E10 α-myc (12)-coupled protein A Sepharose beads (Sigma, St. Louis, MO) for 1.5 h at 4°C. The beads were washed three times in 50 bead volumes of cold TNET, resuspended in 300 μl of TNET, and adjusted to 1 mM dithiothreitol. Protein complexes were eluted for 25 min at room temperature by addition of 10 U of glutathione S-transferase-tagged PreScission Protease (Amersham, Piscataway, NJ), and protease carryover was reduced by 10 min of further incubation with 1/10 9E10 bead volumes of glutathione Sepharose 4B beads (Amersham).
For the second affinity purification step, 20 μl of nickel nitrilotriacetic acid (Ni-NTA) agarose beads (Qiagen, Valencia, CA) were added to 200 μl of supernatant from the first step, and the sample was rotated for 1 h at 4°C. The beads were washed three times with 25 bead volumes of cold TNET and twice with 25 bead volumes of cold TNE (20 mM Tris·HCl, pH 7.5, 150 mM NaCl, 0.1 mM EDTA). Proteins were eluted by addition of 50 μl of 100 mM EDTA, and the resulting supernatant was lyophilized.
After protein extraction, 200 μl of immunoglobulin G (IgG) Sepharose (Amersham) was added to 300 mg of total protein in a volume of 14 ml. This slurry was incubated at 4°C, rotating, for 2 h. After incubation, the resin was washed three times with 50 bead volumes of IPP150 and once with 50 bead volumes of tobacco etch virus (TEV) protease cleavage buffer (10 mM Tris·HCl, pH 8.0, 150 mM NaCl, 0.1% Nonidet P-40, 0.5 mM EDTA, 1 mM dithiothreitol). The IgG Sepharose was resuspended in 300 μl of TEV protease cleavage buffer containing 100 U of TEV protease (Invitrogen, Carlsbad, CA) and incubated at room temperature, rotating, for 45 min. The bead supernatant (280 μl) was then retrieved and mixed with 840 μl of calmodulin binding buffer (10 mM β-mercaptoethanol, 10 mM Tris·HCl, pH 8.0, 150 mM NaCl, 1 mM magnesium acetate, 1 mM imidazole, 2 mM CaCl2, 0.1% Nonidet P-40), 0.84 μl of 1 M CaCl2, and 200 μl of calmodulin beads (Stratagene, La Jolla, CA). This mixture was incubated for 1 h at 4°C with rotation. After incubation, the beads were washed three times with 5 bead volumes of calmodulin binding buffer and eluted two times with 250 μl of calmodulin elution buffer (10 mM β-mercaptoethanol, 10 mM Tris·HCl, pH 8.0, 150 mM NaCl, 1 mM magnesium acetate, 1 mM imidazole, 2 mM EGTA, 0.1% Nonidet P-40). The eluate was trichloroacetic acid-precipitated, and the pellet was washed two times with ice-cold acetone.
Modified TAP Tag-The protocol for affinity purification of Gcn5p tagged with the modified TAP tag was adapted from Cheeseman et al. (10) and was identical to the TAP protocol up through the TEV protease treatment. After TEV protease digestion, 50 μl of protein S agarose (Novagen, Madison, WI) was added to 280 μl of the supernatant, and the slurry was incubated, rotating, at 4°C for 1.5 h. The beads were washed three times with 10 volumes of IPP150, once with IPP150 without Nonidet P-40, and then with 50 mM Tris·HCl, pH 8.5, 5 mM EGTA, 1 mM EDTA, 75 mM KCl. The protein was eluted in 50 μl of 100 mM Tris·HCl, pH 8.5, 8 M urea for 30 min at room temperature.
Proteolytic Digest-Protein samples were proteolytically digested as follows: lyophilized protein mixtures were resolubilized in 40 μl of 8 M urea, 100 mM Tris·HCl, pH 8.5, and reduced by incubation at a final concentration of 3 mM tris(2-carboxyethyl)phosphine (Pierce, Rockford, IL) for 20 min at room temperature. Reduced cysteines were subsequently alkylated by addition of iodoacetamide (10 mM final concentration) and incubation for 15 min at room temperature. Proteolysis was initiated with 0.1 μg of endoproteinase Lys-C (Roche) and allowed to proceed for 4 h at 37°C. The sample was then diluted 4-fold by addition of 100 mM Tris·HCl, pH 8.5, and adjusted to 1 mM CaCl2. Next, 0.5 μg of sequencing grade trypsin (Roche) was added and the mixture incubated overnight at 37°C. The digest was quenched with the addition of formic acid to 5% and stored at −20°C.
MudPIT-The peptide mixtures were separated utilizing a triphasic microcapillary column as described in McDonald et al. (15). A fused silica capillary with an inner diameter of 100 μm (PolyMicro Technology, Phoenix, AZ) and a 5-μm diameter tip pulled with a P-2000 capillary puller (Sutter Instrument Company, Novato, CA) was packed with 6.5 cm of 5-μm Aqua C18 reverse phase material (Phenomenex, Ventura, CA), 3.5 cm of 5-μm Partisphere strong cation exchanger (Whatman, Clifton, NJ), and another 2.5 cm of 5-μm Aqua C18 (in this order from the tip). The sample was pressure-loaded onto the column. In the event of irreversible column clogging, the 6.5-cm 5-μm Aqua C18 separation phase was replaced by an inline microfilter assembly (UpChurch Scientific, Oak Harbor, WA) and a 250-μm ID fused silica collection capillary to reduce the overall back pressure. A 6.5-cm 5-μm Aqua C18 separation phase was spliced onto the setup after completion of loading. We noted that the presence of EDTA in the sample may increase the risk of clogging events.
Step 1 consisted of an 80-min gradient to 40% buffer B followed by a 10-min gradient to 100% buffer B and 10 min of 100% buffer B. Chromatography steps 2-5 followed the same pattern: 3 min of 100% buffer A followed by a 2-min buffer C pulse, a 10-min gradient to 15% buffer B, and a 100-min gradient to 45% buffer B. The buffer C percentages used were 5, 12.5, 25, and 40%, respectively, for the steps. The terminal step consisted of 3 min of 100% buffer A, 20 min of 100% buffer C, a 10-min gradient to 15% buffer B, and a 100-min gradient to 55% buffer B. The flow rate through the column was ~150 nl/min.
Eluting peptides were electrosprayed into the mass spectrometer with a distally applied spray voltage of 2.4 kV. The column eluate was continuously analyzed during the whole six-step chromatography program. One full-range mass scan (400-1400 m/z) was followed by three data-dependent tandem mass spectrometry (MS/MS) spectra at 35% collision energy in a continuous loop.
Both the HPLC pump and the mass spectrometer were controlled by the Xcalibur software (ThermoElectron).
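The six-step chromatography program described above can be written out as data to check how long one sample occupies the instrument. The following Python sketch is only an illustration: the names `Segment`, `salt_step`, and `build_program` are hypothetical and not part of the published method, and the segment durations and the flow rate of roughly 150 nl/min are taken directly from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One segment of a chromatography step (hypothetical helper type)."""
    label: str
    minutes: float


def salt_step(buffer_c_percent: float) -> list[Segment]:
    """Steps 2-5: wash, salt pulse at the given buffer C percentage, two gradients."""
    return [
        Segment("100% buffer A", 3),
        Segment(f"{buffer_c_percent}% buffer C pulse", 2),
        Segment("gradient to 15% buffer B", 10),
        Segment("gradient to 45% buffer B", 100),
    ]


def build_program() -> list[list[Segment]]:
    """Assemble the six steps in the order given in the text."""
    step_1 = [
        Segment("gradient to 40% buffer B", 80),
        Segment("gradient to 100% buffer B", 10),
        Segment("100% buffer B", 10),
    ]
    steps_2_to_5 = [salt_step(percent) for percent in (5, 12.5, 25, 40)]
    terminal_step = [
        Segment("100% buffer A", 3),
        Segment("100% buffer C", 20),
        Segment("gradient to 15% buffer B", 10),
        Segment("gradient to 55% buffer B", 100),
    ]
    return [step_1, *steps_2_to_5, terminal_step]


FLOW_RATE_NL_PER_MIN = 150  # approximate value stated in the text

program = build_program()
durations = [sum(segment.minutes for segment in step) for step in program]
total_minutes = sum(durations)
total_volume_ul = total_minutes * FLOW_RATE_NL_PER_MIN / 1000

print(durations)        # minutes per step
print(total_minutes)    # minutes per complete run
print(total_volume_ul)  # approximate eluate volume in microliters
```

With the stated durations, the steps take 100, 115, 115, 115, 115, and 133 min, so a complete run lasts 693 min (about 11.5 h) and passes roughly 104 μl through the column, not counting sample loading or column re-equilibration.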

Data Analysis
In a first step, MS/MS spectra recorded by Xcalibur were analyzed for their charge state and controlled for data quality by 2to3 (16). The data were then searched by SEQUEST (17) against the translated Saccharomyces Genome Database (SGD; release time stamped 05/23/03) (18) supplemented with common contaminants (e.g. keratins) on a Linux cluster comprised of 20 1.8-GHz Athlon CPUs (Racksaver, San Diego, CA). DTASelect (19) filtered the SEQUEST results according to the following parameters: minimum cross correlation coefficients of 1.8, 2.5, and 3.5 for singly, doubly, and triply charged precursor ions, respectively, minimum ΔCn of 0.08, and a minimum requirement of two peptides per protein.
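The filtering criteria listed above can be illustrated with a short Python sketch. It is not DTASelect and does not reproduce its implementation or file formats; the `PeptideHit` record, the function names, and the example values are hypothetical. Two points are assumptions rather than statements from the text: precursor charge states other than 1, 2, and 3 are discarded, and the two-peptide requirement is counted on distinct peptide sequences.

```python
from collections import defaultdict
from typing import NamedTuple


class PeptideHit(NamedTuple):
    """Hypothetical record for one peptide-spectrum match."""
    protein: str
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float


# Thresholds as stated in the text
MIN_XCORR = {1: 1.8, 2: 2.5, 3: 3.5}
MIN_DELTA_CN = 0.08
MIN_PEPTIDES_PER_PROTEIN = 2


def passes(hit: PeptideHit) -> bool:
    """Apply the charge-dependent cross correlation and delta Cn cutoffs."""
    threshold = MIN_XCORR.get(hit.charge)
    if threshold is None:
        return False  # assumption: other charge states are not considered
    return hit.xcorr >= threshold and hit.delta_cn >= MIN_DELTA_CN


def filter_hits(hits: list[PeptideHit]) -> dict[str, set[str]]:
    """Keep proteins supported by at least two distinct passing peptides."""
    peptides_by_protein: dict[str, set[str]] = defaultdict(set)
    for hit in hits:
        if passes(hit):
            peptides_by_protein[hit.protein].add(hit.peptide)
    return {
        protein: peptides
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= MIN_PEPTIDES_PER_PROTEIN
    }


# Made-up example values, for illustration only
example_hits = [
    PeptideHit("protein_A", "PEPTIDEA", 2, 3.1, 0.20),  # passes
    PeptideHit("protein_A", "PEPTIDEB", 3, 3.6, 0.10),  # passes
    PeptideHit("protein_B", "PEPTIDEC", 2, 2.4, 0.30),  # cross correlation too low
    PeptideHit("protein_B", "PEPTIDED", 2, 2.9, 0.15),  # passes, but only one peptide
    PeptideHit("protein_C", "PEPTIDEE", 1, 1.9, 0.05),  # delta Cn too low
]
accepted = filter_hits(example_hits)
print(sorted(accepted))
```

In this made-up example only protein_A is retained, because it is the only protein with two peptides that clear both score cutoffs.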
The resulting data were annotated and sorted with the Python script RAYzer. Annotation was added from SGD annotation tables (table release time stamped 06/07/03) (18) and interaction data curated by the Munich Information Center for Protein Sequences (MIPS) Comprehensive Yeast Genome Database (CYGD; release time stamped 04/29/03) (20,21), the General Repository for Interaction Datasets (GRID; release 1.0) (22), and the Yeast Protein Database (YPD; as of 06/09/03) (23). Based on known interaction annotation and the frequency of appearance in a reference dataset containing one representative experiment for every tagged open reading frame in this study (n = 22), the data were then sorted into three tables: previously reported interactors retrieved in the experiment, potential new interacting proteins detected, and likely contaminants (see supplemental material). Proteins recovered in greater than 20% of the experiments in the reference dataset were automatically considered contaminants (see "Discussion").
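The three-way sorting described above can be sketched in a few lines of Python. This is not the RAYzer script, whose code is not given in the text; the function name, arguments, and example counts are hypothetical. The sketch assumes that the 20% frequency rule takes precedence over known-interactor annotation, which the word "automatically" suggests but the text does not state explicitly.

```python
def classify(identified, known_interactors, reference_counts,
             n_experiments=22, max_frequency=0.20):
    """Sort identified proteins into known interactors, candidates, and contaminants.

    reference_counts maps a protein to the number of reference experiments
    (one per tagged bait) in which it was recovered.
    """
    known, candidates, contaminants = [], [], []
    for protein in sorted(identified):
        frequency = reference_counts.get(protein, 0) / n_experiments
        if frequency > max_frequency:
            # assumption: frequent proteins are contaminants regardless of annotation
            contaminants.append(protein)
        elif protein in known_interactors:
            known.append(protein)
        else:
            candidates.append(protein)
    return known, candidates, contaminants


# Made-up counts, for illustration only
reference_counts = {"protein_A": 1, "protein_B": 5, "protein_C": 4, "protein_D": 12}
known, candidates, contaminants = classify(
    identified={"protein_A", "protein_B", "protein_C", "protein_D"},
    known_interactors={"protein_A"},
    reference_counts=reference_counts,
)
print(known, candidates, contaminants)
```

With 22 reference experiments, a protein seen with a single bait has a frequency of 1/22 (4.55%), a protein seen in four experiments stays below the cutoff (18.2%), and one seen in five experiments exceeds it (22.7%).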

HPM Tag
We constructed a bipartite affinity tag composed of nine histidines and nine myc-epitopes separated by two PreScission protease (24,25) cleavage sites (HPM tag, Fig. 1; see "Experimental Procedures"). Homologous recombination enables chromosomal integration of the PCR-amplified cassette in S. cerevisiae his3 strains at the 3′ end of open reading frames targeted for affinity purification.
Using this cassette, we tagged a test set of 25 gene products involved in transcription and progression through mitosis (see supplemental Table II) and established a variant of the TAP protocol (13) that employs affinity chromatography on a 9E10 monoclonal antibody resin followed by elution with PreScission Protease and adsorption to Ni-NTA resin (see "Experimental Procedures"). For simplicity's sake we refer to our protocol as "TAP" even though our tandem tag design requires a different purification protocol. Preliminary mass spectrometric analyses showed that the eluates from the 9E10 resin still retained a high level of contaminating protein background (data not shown), and thus subsequent analyses were performed only on samples that were subjected to the complete TAP protocol. A representative SDS-PAGE analysis of the purification of four gene products is shown in Fig. 2.
The effectiveness and reproducibility of our overall approach were evaluated by analyzing the intensively studied histone acetyltransferase (HAT) Gcn5p (see Fig. 3). Of the 23 previously reported interactors that were identified here, our experiments captured 15 (65%) in all three replicates and an additional 5 (22%) in two out of three attempts, including 18 known members of the Spt-Ada-Gcn5-acetyltransferase complex (SAGA)/SAGA-like acetyltransferase complex (SLIK) and ADA-HAT complexes (26-29). The majority of these validated partners ranked at the top of the list when the recovered proteins were sorted based on the size-normalized number of unique peptides sequenced per protein. These data indicate that TAP-MudPIT shows a high degree of reproducibility and robustness independent of fluctuations in the sample quality of the individual experiment (see e.g. varying peptide recovery for the bait in Fig. 3).
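The ranking by size-normalized unique peptide count and the tally of replicate recovery can be illustrated as follows. The text does not give the exact normalization, so this Python sketch assumes it is the number of unique peptides divided by protein length in amino acids, averaged over the replicates; the function names and example records are hypothetical.

```python
def average_normalized_peptides(length_aa, unique_counts):
    """Mean over replicates of unique peptides per amino acid (assumed normalization)."""
    return sum(count / length_aa for count in unique_counts) / len(unique_counts)


def rank_proteins(records):
    """Return protein names, highest average length-normalized peptide count first.

    records: iterable of (name, length_aa, unique peptide counts per replicate)
    """
    scored = [
        (name, average_normalized_peptides(length_aa, counts))
        for name, length_aa, counts in records
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored]


def replicate_recovery(records):
    """Count the replicates in which each protein had at least one unique peptide."""
    return {
        name: sum(1 for count in counts if count > 0)
        for name, _, counts in records
    }


# Made-up records, for illustration only
records = [
    ("protein_B", 1200, [12, 12, 12]),
    ("protein_C", 200, [0, 1, 0]),
    ("protein_A", 400, [8, 6, 10]),
]
ranking = rank_proteins(records)
recovery = replicate_recovery(records)
print(ranking, recovery)
```

Normalizing by length keeps large proteins, which yield more peptides simply because they are longer, from outranking smaller partners that are recovered just as efficiently: here protein_A ranks above protein_B even though protein_B has more unique peptides in every replicate.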
Previous reports employed the original bipartite TAP tag and a modified TAP tag for TAP (4,10,13). A direct comparison of Gcn5p-TAP, Gcn5p-modified TAP, and Gcn5p-HPM revealed that the set of previously known interactors identified with the different tags are well within the margins of variability between independent experiments performed with the HPM tag (Table I).
Remarkably, our comparative analysis of Gcn5p purifications yielded strong candidates for six new Gcn5p interactors. YCR082W, a nonessential gene product (30,31) with unknown function, was found in all five Gcn5p purifications but was not recovered with any of the other baits that we analyzed. YCR082W exhibits a two-hybrid interaction with Ahc1p (1, 2), which together with Gcn5p is a member of the ADA-HAT complex (27). Another candidate is Msn4p, a nonessential (31, 32) major transcriptional regulator of stress responses (33). Msn4p was recovered in four of the five Gcn5p pull-down experiments but was not recovered with any of the other baits. This finding is interesting in the light of evidence that promoters activated by Msn4p and its partner Msn2p show increased histone H4 acetylation (34). Other potential interaction partners include YPL047W (present in two of the HPM purifications and the TAP purification) and histones Hta1p/Hta2p and Imd4p (in TAP, modified TAP and one HPM pull-down). Other gene products recovered in more than two of the experiments are mostly ribosomal proteins that are likely contaminants. Finally, the interaction observed between Gcn5p and Swi1p in the TAP tag experiment was previously proposed only on the basis of their synthetically lethal genetic interaction (35).

Screening for Interactions
Having established the relative reproducibility of TAP-MudPIT and the comparability of the HPM tag to other available bipartite affinity tags, we set out to address three issues. First, we wished to determine what fraction of TAP-MudPIT experiments yield usable results. Second, we hoped to determine whether the parallel application of MudPIT to numerous baits would enable us to cull nonspecific contaminants by comparing protein identifications across multiple experiments. Third, we wanted to test whether it will be feasible for an investigator in a cell biology laboratory to work at the scale needed to dissect a biological pathway or process by systematic application of MudPIT to a few dozen gene products. To address these questions, we screened for new protein-protein interactions in a test set of 25 gene products involved in transcription and progression through mitosis. Table II summarizes the results and gives an overview of potential new interactors. The complete dataset may be found in the supporting online material.
Of the original set of 25 gene products that we set out to tag and purify, 21 yielded utilizable results. We were unable to amplify the HPM cassette with primers to tag CDC5 and ESS1, while TAP-MudPIT experiments for Bir1p-HPM and Nbp1p-HPM resulted in little or no recovery of the tagged baits themselves. Of the 21 "successful" purifications that yielded sequence coverage for the tagged bait, 20 of the experiments (95%) yielded interacting proteins that are either true binding partners validated by other direct approaches, probable binding partners that display genetic interaction with the bait, or candidate binding partners that were found in association with only one bait. The Pho2p-HPM experiment yielded "hits" only from proteins that were found associated with other, unrelated baits or were otherwise deemed to be likely contaminants.
The set of bait proteins evaluated in this study overlaps considerably with the Ho et al. effort (5). Fig. 4 compares the retrieval of physical interactors for 13 gene products used as baits in both studies. Notably, in each case our approach identified at least as many or more of the previously known

Samples were prepared and analyzed as described under "Experimental Procedures." The column "Known Interactor?" indicates whether the gene product is a previously known Gcn5p interactor according to MIPS, GRID, and YPD. The column "Gene Product" represents the name of the protein according to SGD. Red, yellow, and plain background indicate recovery of the protein in three, two, or one experiment out of three, respectively. The column "Frequency in Reference Set" lists the frequency with which the gene product was retrieved in the complete dataset (n = 22). The column "Length (AA)" represents the length of the open reading frame (ORF) in amino acids according to SGD. Columns "Exp. 1-3" list the number of unique and total peptide hits assigned to the ORF for each of the three experiments. Gene products are listed in descending order starting with the highest average-length normalized number of unique peptide identifications. Data for highly homologous ORFs with identical length, identical peptide representation across experiments, and identical frequency in the reference set have been merged. Ty-element-related ORFs have been excluded from the analysis.

TABLE I Comparison of TAP-MudPIT analyses using different bipartite affinity tags to Gcn5p
Samples were prepared and analyzed as described in "Experimental Procedures." The column "Gene product" represents the name of the gene product recovered and known to interact with Gcn5p according to GRID, MIPS, and YPD. "Exp. 1-3" represent three independent affinity purifications of Gcn5p-HPM. "TAP tag" and "Modified TAP tag" represent TAP-MudPIT experiments performed with strains in which the GCN5 locus was tagged with either the TAP (13) or modified TAP tag from (10). The numbers of unique peptides from each open reading frame that were sequenced are shown (with the total number of sequenced peptides in parentheses). The last column lists the frequency with which the gene product is found in the entire dataset (n = 22). For example, a gene product found in association with a single bait has a frequency of 4.55% (1/22). The GRID, MIPS, and YPD interaction databases contain 83 additional gene products classified as interacting with Gcn5p, but not recovered in our analyses.

TABLE II Potential new interactors for a test set of HPM-tagged proteins
Samples were prepared and analyzed as described in "Experimental Procedures." The column "Known interactors-Total" lists the number of physical/genetic interactions reported for the bait in the combined GRID/MIPS/YPD databases. "Known interactors-Recovered" represents the number of known physical/genetic interactors experimentally retrieved in this study. Partners marked "*" are reported to interact physically as well as genetically. The column "Potential new interactors" contains all gene products identified by TAP-MudPIT, which are not listed as known interactors and are recovered in association with less than 20% of the baits analyzed (n = 22).

only in that study should be considered as tentative, pending verification by independent methods. The second issue that we addressed was the feasibility of using a filtering approach to cull nonspecific contaminants from the list of proteins identified in each TAP-MudPIT experiment. The idea is that nonspecific proteins should show up in a high fraction of experiments, whereas specific interactors should only show up in one or a small number of experiments (depending upon the degree of functional relatedness of the tagged genes in the query set). We found that proteins that were identified in five or more TAP-MudPIT experiments tended to have a high codon adaptation index (36; data not shown), which is a rough measure of abundance (37). Based on this correlation, we automatically considered proteins found in more than five experiments to be probable contaminants. A similar filtering approach was employed by Gavin et al. (4) and Ho et al. (5), but because their datasets were much larger they were able to employ lower thresholds.

To showcase the possibility of identifying new potential interacting partners in any given TAP-MudPIT experiment, we analyzed in more detail our results for Snf2p-HPM. Snf2p is a subunit of the Swi/Snf complex and founding member of the ATP-dependent family of chromatin remodeling factors (38). TAP-MudPIT analysis of Snf2p-HPM yielded eight of the nine known members of this complex (Arp7p, Arp9p, Snf5p, Snf6p, Swi1p, Swi3p, Snf12p, Taf14p; missing: Snf11p) (39-41) as well as YFL049W, a protein of unknown function reported to copurify with Snf2p via its interaction with Snf5p (4). A prominent Snf2p-HPM copurifying protein that was not commonly retrieved by other baits was Rtt102p, a protein of unknown function, whose inactivation results in a slight increase in Ty1 retrotransposon mobility (42). To check whether the interaction of Snf2p with Rtt102p was reciprocal, we tagged the Rtt102p locus with sequences encoding the HPM epitope and performed TAP-MudPIT analysis for Rtt102p-HPM. This experiment yielded all of the subunits of the Swi/Snf chromatin remodeling complex that copurified with Snf2p-HPM (see above), as well as all subunits of the RSC chromatin remodeling complex (Npl6p, Rsc1p, Rsc2p, Rsc3p/Rsc30p, Rsc4p, Rsc58p, Rsc6p, Rsc8p, Rsc9p, Sfh1p, Sth1p) (26,41,43). YFL049W copurified with Rtt102p-HPM as well as with Snf2p-HPM, further strengthening the case that it is a bona fide Swi/Snf component. These results suggest that Rtt102p, like Arp7p and Arp9p (41,44), is specifically associated with the Swi/Snf and RSC chromatin remodeling complexes and may be an integral component of both.
Knockouts of Swi/Snf complex members show reduced growth on sucrose/antimycin, galactose/antimycin, and glycerol (44). When tested for growth on these carbon sources, a rtt102Δ strain grew similarly to wild type on glucose, sucrose/antimycin, and galactose/antimycin but exhibited a severe growth phenotype on glycerol (see Fig. 5), further supporting a functional Rtt102p-Swi/Snf connection.

DISCUSSION
A key goal of proteomics research is to identify and characterize protein interaction networks. Several approaches have been taken to achieve this goal, including genome-wide two-hybrid analyses and protein chip-based approaches (1-3). A limitation of both of these methods is that they primarily reveal binary interactions. Large-scale mass spectrometric analyses of affinity-purified protein complexes have been reported by two different groups (4,5). Whereas this approach bypasses some of the key limitations of two-hybrid and protein chip assays, the efforts reported so far were based on gel separation of purified proteins, which both greatly increased the number of mass spectrometry runs required to analyze each bait and limited the dynamic range to proteins that could be stained and visualized on the same gel. Indeed, both efforts were carried out in an industrial context that cannot be readily adapted to a conventional molecular/cellular biology laboratory. We believe this is an important issue, because unlike the genomic sequence, the protein interactions that exist in a cell or organism are not a finite and bounded set that can be determined as a complete "reference" knowledge set. Rather, their most important feature is that they change as a function of intracellular and extracellular signals, and learning how they change is essential for probing the cellular processes of interest.
Thus, to characterize fully the protein interaction networks in a cell and their dynamic changes over time, it will be necessary to perform multiple analyses under different conditions and in different genotypes. In this sense, mass spectrometry-based proteomics resembles microarray-based transcriptomics. This fact underscores the need for simple, reproducible, rapid, portable (i.e. can be performed outside of a specialized mass spectrometry environment), yet powerful methods for exploring protein interaction networks.
We show here that a combination of double affinity purification and multidimensional capillary chromatography in line to mass spectrometry (TAP-MudPIT) fulfills these criteria. TAP-MudPIT can be applied to rapidly identify interacting proteins for any given bait in a single mass spectrometry analysis. Using this approach, a single investigator working with a single mass spectrometer and performing the complete protocol from affinity purification to data analysis can readily screen 20 samples per month (i.e. 20 different baits or one bait evaluated under 20 different conditions). Thus, it is feasible for a single investigator to perform, in a reasonable time frame, a thorough analysis of a focused collection of baits that define a particular organelle, pathway, or process.
It should also be noted that in addition to protein identification, the TAP-MudPIT approach enables the parallel analysis of posttranslational modifications (45).
Although an exhaustive analysis of every one of the 22 TAP-MudPIT experiments that we performed (21 from the original collection of baits plus Rtt102p) is beyond the scope of this paper, we wish to highlight several interesting points. First, our analysis of Swi2p/Snf2p identified a new interacting partner, Rtt102p, which is remarkable given the large body of work that has already been performed on this extensively characterized protein and its interacting partners. Second, we uncovered Trf4p as a candidate partner of the cohesin Mcd1p/Scc1p. Trf4p was originally reported to function as an alternative DNA polymerase that mediates sister chromatid cohesion (46), but this proposal has been the subject of controversy following the report that Trf4p can catalyze polymerization of poly(A) tails on mRNA transcripts (47). Third, Bub3p was found as a Cdc20p-associated protein and Mcd1p/Scc1p was found as a Pds5p-associated protein. Although these pairs of proteins were already known to function together in mitotic checkpoint signaling and sister chromatid cohesion, respectively, a physical association of the yeast proteins has not been reported. Finally, in addition to Trf4p, Mcd1p/Scc1p retrieved the Csm1p subunit of monopolin and the Nuf2p subunit of the Tid3p/Nuf2p/Spc24p/Spc25p centromere-binding complex (48). Both interactions are excellent candidates to subserve a role in chromosome segregation given the known functions of the proteins involved.
Analysis of Rtt102p, identified here as a Swi2p/Snf2p interactor, illustrated the power of this system for making fast and simple first-order interaction validation. This was accomplished by a reciprocity test, in which Rtt102p was shown to specifically retrieve Swi2p and other known components of the Swi/Snf complex. Because this is an independent determination, it provides a more convincing confirmation for an interaction than a mere repetition of the initial measurement. The experiment also illustrates how TAP-MudPIT can be used for directed interaction "walks" (9), in this case showing that Rtt102p also interacts with, or is a component of, the RSC chromatin remodeling complex.
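The reciprocity test described above can be expressed as a simple check over a bait-to-prey table. The following is a minimal sketch in Python, not the procedure used in this study; the function name `reciprocal_pairs` and the dictionary layout are assumptions made for illustration, and the example table is a deliberately truncated subset of the Snf2p-HPM and Rtt102p-HPM results reported here.

```python
def reciprocal_pairs(hits):
    """Return sorted (protein_a, protein_b) pairs in which each protein,
    when used as bait, retrieved the other.

    hits: dict mapping a bait name to the set of proteins it retrieved
    (hypothetical layout chosen for this sketch).
    """
    pairs = set()
    for bait, preys in hits.items():
        for prey in preys:
            # A pair is reciprocal only if the prey was itself used as a
            # bait and retrieved the original bait.
            if prey != bait and bait in hits.get(prey, set()):
                pairs.add(tuple(sorted((bait, prey))))
    return sorted(pairs)


# Truncated illustration: only a few of the reported copurifying proteins.
example_hits = {
    "Snf2p": {"Rtt102p", "Snf5p"},
    "Rtt102p": {"Snf2p", "Sth1p"},
}
print(reciprocal_pairs(example_hits))
```

In this sketch, Snf5p and Sth1p are not reported as reciprocal simply because they were not used as baits in the example table, which mirrors the point that reciprocity can only be assessed for proteins that have themselves been tagged.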
Whereas TAP-MudPIT is sufficiently robust to be applied in a nonspecialized environment, two substantial problems remain to be addressed. First, the interpretation of the data that is generated would benefit from improvement. The combination of 2to3 (16), SEQUEST (17), and DTASelect (19) enables analysis and display of raw mass spectrometric data. What is missing, however, are tools that simplify interpretation of the massive amount of data generated by the analysis of a protein interaction network of even modest size. In particular, separating good candidates for novel interaction partners from the contaminating chaff is a major challenge. We followed the approach used by Gavin et al. (4) and Ho et al. (5) by excluding from consideration any protein that was found associated with more than 20% of the baits analyzed (the comparable thresholds were 3% in Ho et al., 3.5% in Gavin et al.). When applied to the proteins found in all three independent Gcn5p-HPM TAP-MudPIT analyses shown in Fig. 3, our filter threshold retains only the previously known interactors and the potential new Gcn5p-interacting protein YCR082Wp.
A problem with excluding candidates by this criterion is that we were not using an unbiased reference dataset. Because the proteins that we analyzed are all involved in either transcription or mitosis, it is possible that some true interacting proteins were improperly excluded.
The complete dataset contains a total of 464 potential interactions that pass the requirement of being associated with less than 20% of the baits analyzed. However, this subset includes ribosomal, cytoskeletal, and other proteins that, due to their abundance, have a high probability of being contaminants. Discarding Ty element-related proteins and applying a filter that allows a maximum codon adaptation index of 0.6 eliminates these problematic candidates and reduces the number of potential new interaction partners identified to 279.
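The filtering criteria described above (bait frequency, Ty element-related proteins, and codon adaptation index) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code used in this study: the function name `filter_candidates`, the input layout, and the handling of proteins without a codon adaptation index value (treated here as passing) are all assumptions. The thresholds of 20% and 0.6 are those given in the text.

```python
from collections import Counter


def filter_candidates(hits, cai, ty_related=frozenset(),
                      max_bait_fraction=0.20, max_cai=0.6):
    """Remove likely contaminants from a bait-to-prey table.

    hits:       dict mapping bait name -> set of copurifying proteins
    cai:        dict mapping protein name -> codon adaptation index
    ty_related: set of Ty element-related protein names to discard
    A protein is kept only if it was found with at most max_bait_fraction
    of the baits, is not Ty element-related, and has a codon adaptation
    index no greater than max_cai. Proteins missing from `cai` are kept
    (an assumption of this sketch).
    """
    n_baits = len(hits)
    # Count the number of baits with which each protein copurified.
    bait_count = Counter(p for preys in hits.values() for p in preys)
    filtered = {}
    for bait, preys in hits.items():
        filtered[bait] = {
            p for p in preys
            if bait_count[p] / n_baits <= max_bait_fraction
            and p not in ty_related
            and cai.get(p, 0.0) <= max_cai
        }
    return filtered
```

For example, with five hypothetical baits, a protein retrieved by all five would be discarded as a common contaminant, whereas a protein retrieved by a single bait (20% of baits) with a codon adaptation index of 0.2 would be retained. As noted above, a frequency filter of this kind depends on the composition of the bait collection, so the retained set should be read as candidates rather than confirmed interactors.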
In addition to "post hoc" approaches, honing the purification protocol and making it more stringent may lessen the problem posed by contaminating proteins. However, this comes at the possible expense of disrupting specific interactions. When analyzing a single bait under varying conditions, optimizing the purification may greatly improve the specificity of the purification, but optimization becomes a daunting task when dealing with multiple baits.
The second major problem arises from the databases used to biologically annotate the gene products identified by MudPIT. Given the amount of data produced by a MudPIT experiment, machine readability of databases is of great value. Unfortunately, of the databases used in this study, only the regularly updated data in SGD and MIPS CYGD are readily accessible in an automated manner (ftp). GRID data can be manually downloaded in a tab-delimited file, but YPD does not allow any such access and thus requires manual merging of its annotation data into a computationally annotated dataset.
As more and more large-scale analyses are performed, an issue that looms large for the future is how to evaluate the quality of the datasets. Even relatively small-scale analyses like the one reported here are prone to produce false positives (e.g. the large number of ribosomal proteins classified as potential interactors for Pho4p in Table II). As a specific example of this problem, consider Adh1p (alcohol dehydrogenase). Adh1p is annotated in YPD as a protein in complex with Gcn5p and Snf2p, because Adh1p was reported to copurify with these proteins in TAP experiments using Spt15p and Med2p (Gcn5p) or Enp1p (Snf2p) as bait proteins (4). However, given that we found Adh1p associated with 86% of our baits, it is most likely a common contaminant that nevertheless cleared the filter imposed by Gavin et al. (4). An important challenge is to generate databases that express the likelihood that a protein-protein interaction is relevant based on the number of independent analyses (and methods) upon which the conclusion is based.
In conclusion, we report the application of TAP-MudPIT, tandem affinity purification coupled with multidimensional capillary chromatography in line to mass spectrometry, to identify binding partners for a set of 22 budding yeast proteins involved in gene regulation or progression through mitosis. Our analysis uncovered 102 previously known and 279 potential physical interactions. TAP-MudPIT is simple, rapid, reproducible, and can be carried out in a traditional cell biology laboratory. The simplicity and power of this method enables a depth of analysis that will facilitate thorough characterization of protein interaction networks.