Localization of Organelle Proteins by Isotope Tagging (LOPIT)

We describe a proteomics method for determining the subcellular localization of membrane proteins. Organelles are partially separated using centrifugation through self-generating density gradients. Proteins from each organelle co-fractionate and therefore exhibit similar distributions in the gradient. Protein distributions can be determined through a series of pair-wise comparisons of gradient fractions, using cleavable ICAT to enable relative quantitation of protein levels by MS. The localization of novel proteins is determined using multivariate data analysis techniques to match their distributions to those of proteins that are known to reside in specific organelles. Using this approach, we have simultaneously demonstrated the localization of membrane proteins in both the endoplasmic reticulum and the Golgi apparatus in Arabidopsis. Localization of organelle proteins by isotope tagging is a new tool for high-throughput protein localization, which is applicable to a wide range of research areas such as the study of organelle function and protein trafficking.

Determining the subcellular localization of novel proteins is an important step toward elucidating their role in the cell, because proteins are spatially organized according to their function (1). The enrichment of subcellular compartments followed by the identification of their protein contents by proteomics is a powerful method for rapid protein localization. However, the precise localization of proteins has been hindered by difficulties in preparing pure organelles. Purity is essential, because confident localization of a protein to a specific organelle requires that the preparation be free of contamination from other organelle types (2). Contamination is particularly problematic in the case of the endomembrane system, owing to the similar densities of its component organelles and the continuous cycling of membranes and proteins between these compartments. One solution to this problem is the use of analytical rather than preparative centrifugation. Analytical centrifugation is an established method for assigning individual proteins to subcellular compartments that have eluded purification. It refers to the analysis of protein distributions in density gradients, as opposed to preparative centrifugation, which is the analysis of single organelle-enriched fractions (3,4). A recent proteomic study of the human centrosome used this approach to distinguish contaminants from genuine components of this complex (5).
We have developed a proteomics technique for the localization of integral membrane proteins to compartments of the endomembrane system in Arabidopsis thaliana that is based upon analytical centrifugation and hence is not dependent on the production of pure organelles. The technique involves partial separation of organelles by density gradient centrifugation followed by the analysis of protein distributions in the gradient by ICAT and MS. Multivariate data analysis techniques are then used to group proteins according to their distributions and hence localizations. In this study, we demonstrate that the localization of organelle proteins by isotope tagging (LOPIT) can be used to discriminate Golgi, endoplasmic reticulum (ER), plasma membrane (PM), and mitochondrial/plastid proteins. We validate this technique by confirming the localization of a set of predicted ER and Golgi membrane proteins.

EXPERIMENTAL PROCEDURES
Membrane Fractionation-A. thaliana liquid callus cultures expressing a myc-tagged version of gtl-6 were established and maintained as described (6,7). Approximately 60 g of callus were homogenized in homogenization buffer (0.25 M sucrose, 10 mM HEPES-NaOH, pH 7.4, 1 mM EDTA, 1 mM DTT) at 4°C, using a polytron with a 94 PTA 10EC head attachment (Kinematica, Littau, Switzerland) for two pulses of 7 s. The homogenate was then centrifuged twice for 5 min at 2,200 × g. The supernatant was loaded onto a 6-ml 18% iodixanol cushion and centrifuged at 100,000 × g for 2 h in a SW28 rotor (Beckman Coulter, Fullerton, CA). Crude membranes were then collected from the interface and adjusted to 16% iodixanol. These membranes were then fractionated according to their density by centrifuging at 350,000 × g for 3 h in a VTi65.1 rotor (Beckman Coulter), during which time an iodixanol gradient is generated. Fractions of 0.5 ml were harvested from the top of the gradient using an Auto Densi-flow collection device (Labconco Corporation, Kansas City, MO). Fractions were diluted with 0.8 ml of cold deionized water, pelleted by centrifugation at 100,000 × g for 25 min in a TLA 100.3 rotor (Beckman Coulter), and resuspended in 600 µl of 4% SDS sample buffer. Then 7 µl of each fraction was heated at 60°C for 3 min, resolved on a 12% SDS-PAGE gel, processed for Western analysis according to Wee et al. (8), and probed with rabbit anti-myc (A14, Santa Cruz Biotechnology, Santa Cruz, CA) and anti-Sec12 antibodies (a gift from S. Mogelsvang).
Membrane Protein Preparation and Labeling-To remove soluble and membrane-associated proteins, 0.8 ml of cold 160 mM Na2CO3 was added to each density gradient fraction. After 30 min on ice, the washed membranes were pelleted as above. After removing the supernatant, the pellet was washed with 1 ml of cold deionized water and then re-pelleted by centrifuging at 100,000 × g for 10 min. The second supernatant was removed and the membrane pellets solubilized in 100 µl of labeling buffer (50 mM Tris-HCl, pH 8.5, 8 M urea, 2% Triton X-100, 0.1% SDS). The protein concentration of each fraction was then determined using the DC Protein Assay kit (Bio-Rad, Hercules, CA). Using cleavable ICAT reagents (Applied Biosystems, Framingham, MA), six pair-wise comparisons were performed on the first 12 fractions collected, where 1 is the least dense fraction and 12 the most dense: 1 versus 4, 2 versus 5, 3 versus 6, 4 versus 7, 5 versus 8, and 7 versus 11 and 12 (which were pooled). For each comparison, the lighter fraction was labeled with the light ICAT reagent and the denser fraction with the heavy ICAT reagent. Labeling was performed according to the standard Applied Biosystems ICAT protocol (www.appliedbiosystems.com) with the following exceptions: labeling buffer (50 mM Tris-HCl, pH 8.5, 8 M urea, 2% Triton X-100, 0.1% SDS) was used in place of denaturing buffer (50 mM Tris-HCl, pH 8.5, 0.1% SDS); reduction was performed for 30 min at 20°C; labeling was performed at 20°C; excess ICAT reagents were quenched by the addition of 10 mM cysteine after labeling and then incubated at 20°C for 30 min before pooling the samples; prior to digestion, the pooled samples were diluted to 2 ml with 25 mM Tris-HCl, pH 8.5, and 10 µg of modified trypsin (Promega, Madison, WI) were then added; prior to the cation exchange step, samples were diluted to 25 ml with 10 mM KH2PO4, pH 3, 25% ACN. After cleavage, samples were resuspended in 25 µl of 2% ACN, 0.1% TFA.
Analysis of Peptides by MS-Peptide separation by LC-MS/MS was performed using an Ultimate Nano-LC system (Dionex, Sunnyvale, CA) and a QSTAR XL Q-TOF hybrid mass spectrometer (Applied Biosystems). Samples (6.4 µl) were delivered to the system using a FAMOS autosampler (Dionex) at 30 µl/min, and the peptides were trapped onto a PepMap C18 pre-column (5 mm × 300 µm i.d.; Dionex). Peptides were then eluted onto the PepMap C18 analytical column (15 cm × 75 µm i.d.; Dionex) at ~100 nl/min and separated using a 4-h gradient of 5-32% ACN. The QSTAR XL was operated in information-dependent acquisition mode, in which a 1-s TOF MS scan from 350-1,600 m/z was performed, followed by 3-s product ion scans from 65-1,600 m/z on the two most intense doubly or triply charged ions. Each sample was analyzed three times to improve the quantitation accuracy and identification confidence.
Peptide Identification and Quantification-MS data files were analyzed using ProICAT software version 1.0 service pack 2.0 (Applied Biosystems) and searched against the MIPS Arabidopsis protein database (ftpmips.gsf.de/cress/arabiprot/, version 110903, containing 26,643 entries). Peptides identified with less than 90% confidence were excluded. Multiple peptide matches resulting from the same MS/MS data were removed using Microsoft Access with the queries included in the ProICAT software. To further increase identification stringency, the proteins identified using ProICAT were verified by analyzing the MS data using MASCOT Daemon version 2.0.01 (Matrix Science, London, United Kingdom). Proteins with scores above 33 were identified with 95% confidence. Proteins positively identified by ProICAT but with MASCOT scores below this threshold were discarded. In addition, the 14 proteins with MASCOT scores below 40 were verified by manual inspection of the corresponding MS/MS spectra. ProICAT automatically measures the ratio of the heavy-labeled peptide to the light-labeled peptide. The heavy-to-light ratio for each protein is the mean ratio of its component peptides from the three MS runs. Only proteins with quantification information from at least four of the six ICAT comparisons were included in the final results table. Prediction of trans-membrane helices was performed using the web-based TMHMM software (www.cbs.dtu.dk/services/TMHMM-2.0/) (9).
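The protein-level quantification rule described above can be illustrated with a minimal Python sketch. It is not the ProICAT implementation; the function name, the comparison labels, and the example ratios are hypothetical and serve only to show the averaging of peptide ratios and the four-of-six inclusion rule.

```python
from statistics import mean


def protein_ratios(peptide_ratios, min_comparisons=4):
    """Average peptide heavy/light ratios per comparison and filter proteins.

    peptide_ratios maps protein -> comparison label -> list of heavy/light
    ratios gathered from all peptides and all MS runs (hypothetical layout).
    """
    kept = {}
    for protein, by_comparison in peptide_ratios.items():
        # Mean ratio per comparison; comparisons without any ratio are skipped.
        means = {c: mean(r) for c, r in by_comparison.items() if r}
        # Keep only proteins quantified in enough comparisons (four of six here).
        if len(means) >= min_comparisons:
            kept[protein] = means
    return kept


# Hypothetical example values, not data from this study.
example = {
    "protein_A": {"1v4": [0.5, 0.7], "2v5": [1.0], "3v6": [2.0, 2.2, 1.8], "4v7": [3.0]},
    "protein_B": {"1v4": [1.1], "2v5": [], "3v6": [0.9]},
}
kept = protein_ratios(example)
```

Here protein_A is retained because it has ratios for four comparisons, whereas protein_B is dropped because it is quantified in only two.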
Multivariate Data Analysis-The log-transformed ICAT ratios obtained for each protein in the initial experiment and the repeat were imported into the SIMCA 10.0 package (Umetrics, Umeå, Sweden) and preprocessed with mean-centered scaling. The data were analyzed by principal component analysis (PCA) for the two experiments, and the proteins were represented in two-dimensional scores plots. This approach summarizes the correlated variation in the dataset in an unsupervised manner, allowing the visualization of clusters within the dataset.
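As an illustration of this preprocessing and projection, the following Python sketch log-transforms a small matrix of heavy-to-light ratios, mean-centers each comparison, and extracts scores on the first two principal components by power iteration. It is a simplified stand-in for the SIMCA analysis, not a reproduction of it: the base-2 logarithm, the power-iteration solver, and the example ratios are all assumptions made for the sketch.

```python
import math


def pca_scores(ratios, n_components=2, iterations=500):
    """Return per-protein scores on the leading principal components."""
    # Log-transform (base 2 is an assumption) and mean-center each comparison.
    logged = [[math.log2(r) for r in row] for row in ratios]
    n, p = len(logged), len(logged[0])
    means = [sum(col) / n for col in zip(*logged)]
    centered = [[x - m for x, m in zip(row, means)] for row in logged]
    # Covariance matrix of the centered variables.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    scores = [[] for _ in range(n)]
    for _component in range(n_components):
        # Power iteration: converges to the direction of greatest variance.
        start = [float(k + 1) for k in range(p)]
        length = math.sqrt(sum(x * x for x in start))
        v = [x / length for x in start]
        for _step in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        for k, row in enumerate(centered):
            scores[k].append(sum(a * b for a, b in zip(row, v)))
        # Deflate so that the next pass finds the next orthogonal component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return scores


# Hypothetical ratios for four proteins over three comparisons.
ratios = [
    [0.5, 0.5, 0.25],   # ER-like profile
    [0.55, 0.45, 0.3],  # ER-like profile
    [2.0, 4.0, 4.0],    # Golgi-like profile
    [1.8, 3.6, 4.4],    # Golgi-like profile
]
scores = pca_scores(ratios)
```

In this toy dataset the two ER-like rows receive PC1 scores of one sign and the two Golgi-like rows scores of the opposite sign. The sign of a principal component is arbitrary, so only the grouping, not the direction, is meaningful.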
To measure the significance of these clusters in terms of the proteins of known sub-compartmentalization, the 12 known ER, Golgi, and mitochondrial/plastid proteins identified in the first experiment were then assigned to three classes and used to build a partial least squares-discriminant analysis (PLS-DA) model. This is a supervised pattern recognition approach that uses the class membership information (i.e. where a protein is localized) to maximize the separation between these proteins. The success of this approach was measured using a goodness of fit algorithm that produced a score (Q²) where 0 is no better than chance and 1 is the theoretical maximum for a component that explains all the variation in the data and correctly classifies the dataset. This model was then used to predict the localization of the remaining 158 proteins. The PLS-DA results consist of three values for each protein that indicate how well that protein fits the three localizations, where values closest to one indicate a good fit with the pattern associated with that localization. The lowest-scoring known organelle resident in each class was used as the threshold for assigning confident organelle localizations to proteins. The thresholds were therefore 0.93 for the Golgi, 0.88 for the ER, and 0.83 for the mitochondria/plastid class.

RESULTS
Basis of the LOPIT Technique-The basis of the LOPIT technique is outlined in Fig. 1. Organelles were prepared from Arabidopsis callus and partially separated according to their density using self-generating iodixanol gradients. Iodixanol is an alternative medium to sucrose, which has the advantage of being able to form density gradients from homogeneous solutions within a few hours of high-speed centrifugation. These self-generating gradients are extremely reproducible and simple to manipulate because their shape depends only on the starting iodixanol concentration and the centrifugation time (10,11).
To determine the distribution of the ER and Golgi apparatus, gradient fractions were analyzed by Western blot, using antibodies against the ER marker AtSec12 and the Golgi marker gtl-6. The density gradient centrifugation only partially separated these organelles. Furthermore, proteomic analysis of the peak Golgi fraction, by SDS-PAGE followed by LC-MS/MS, resulted in the identification of ER and Golgi proteins, as well as numerous proteins from other compartments including the PM, mitochondria, and plastids, making it impossible to assign localizations to novel proteins (data not shown). In contrast, LOPIT involves the analysis of protein distribution across the gradients rather than the cataloguing of single fractions. This enables proteins from different organelles to be distinguished even if their distributions overlap. A comparison of fractions three and six, for example, could be used to distinguish ER and Golgi residents because the Golgi is more abundant in fraction six than in fraction three, while the ER exhibits the opposite distribution. By comparing the distributions of novel proteins to those of known organelle residents using the multivariate data analysis techniques PCA and PLS-DA, the subcellular localization of novel proteins can be determined.
Determination of Protein Identities and Distributions-To determine the distributions of the proteins in the density gradient fractions, we utilized the cleavable ICAT reagents from Applied Biosystems. ICAT enables the relative quantitation of protein levels in two samples by MS (12). In total, six pair-wise comparisons were performed across the density gradient. The integral membrane proteins in each fraction were enriched by subjecting the membranes to a high pH wash, which removes the majority of soluble and peripheral membrane proteins (13). The proteins were then labeled with cleavable ICAT reagents, which specifically react with cysteine sulfhydryls. In each comparison, the less dense fraction was labeled with the light ICAT reagent and the denser fraction with the heavy reagent. The two fractions in each comparison were then pooled, and protein digestion followed by avidin affinity purification of the labeled peptides was performed. Each sample was then analyzed by LC-MS/MS, resulting in the confident identification and quantification of 170 proteins (see supplemental information). Of these proteins, 70% were predicted to be integral membrane proteins, with 28% having two or more trans-membrane spans, reconfirming the applicability of ICAT technology to the analysis of hydrophobic integral membrane proteins (14). Thirteen of the proteins identified have been previously localized to specific organelles in Arabidopsis, while a further 15 had predicted subcellular locations based on homology to proteins in other plants, animals, or yeast. These proteins are listed in Table I. For the purpose of validating the LOPIT technique, the remainder of this study focuses on this set of 28 proteins. A detailed discussion of the remaining proteins is beyond the scope of this article and will be reported elsewhere.
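To make the pair-wise comparison scheme concrete, the following Python sketch converts a protein's abundance profile across the gradient into the six heavy-to-light log ratios. The fraction pairings are those listed in the procedures; the abundance profiles, the base-2 logarithm, and the treatment of the pooled fractions 11 and 12 as their mean are assumptions made purely for illustration.

```python
import math

# (light-labeled fractions, heavy-labeled fractions) for the six comparisons;
# the final comparison pools fractions 11 and 12.
COMPARISONS = [
    ((1,), (4,)), ((2,), (5,)), ((3,), (6,)),
    ((4,), (7,)), ((5,), (8,)), ((7,), (11, 12)),
]


def icat_log_ratios(profile):
    """Return log2(heavy/light) for each comparison.

    profile maps fraction number -> relative abundance (hypothetical units).
    """
    def level(fractions):
        # Assumption: a pooled sample is represented by the mean of its fractions.
        return sum(profile[f] for f in fractions) / len(fractions)

    return [math.log2(level(heavy) / level(light)) for light, heavy in COMPARISONS]


# Hypothetical profiles: one more abundant in fraction 6 than in fraction 3,
# the other showing the opposite pattern.
golgi_like = {1: 1, 2: 2, 3: 4, 4: 8, 5: 12, 6: 16, 7: 8, 8: 4, 11: 1, 12: 1}
er_like = {1: 8, 2: 12, 3: 16, 4: 8, 5: 4, 6: 2, 7: 2, 8: 1, 11: 1, 12: 1}

golgi_vector = icat_log_ratios(golgi_like)
er_vector = icat_log_ratios(er_like)
```

For the fraction 3 versus fraction 6 comparison, the Golgi-like profile gives a positive log ratio and the ER-like profile a negative one, which is the kind of contrast that the multivariate analysis exploits across all six comparisons.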
FIG. 1. LOPIT workflow. Organelles from Arabidopsis callus were partially separated by centrifugation through self-generating iodixanol density gradients. Golgi and ER membrane distributions were determined using Western blots with anti-myc antibodies to detect the Golgi marker gtl-6-myc and anti-Sec12 antibodies to detect the ER marker AtSec12. Protein distributions were determined by performing six pair-wise comparisons across the gradient in which the lighter fraction was labeled with ICAT light and the denser fraction with ICAT heavy. Samples were then pooled, digested, and the labeled peptides avidin-affinity purified. The ICAT-labeled peptides were then analyzed by LC-MS/MS. Proteins were assigned to organelles by matching their distributions to those of proteins that have known subcellular localizations. This was achieved using the multivariate data analysis techniques PCA and PLS-DA.
Clustering of Proteins According to Their Localization Using PCA-To validate the LOPIT technique, the localization of the predicted ER and Golgi proteins, listed in Table I, was attempted. Manual analysis of the large datasets generated in these experiments is time consuming and error prone. Therefore, PCA was used to condense the dataset and identify patterns within it (15). In this case, PCA was used to reduce the six-variable dataset (the six pair-wise comparisons) to a two-variable scores plot (Fig. 2). Principal component (PC) 1 (the x-axis in Fig. 2) is a best-fit vector that represents the greatest correlated variation in the dataset. PC2 (the y-axis in Fig. 2) represents the greatest variation at right angles to PC1 and therefore captures the next largest amount of correlated variation in the dataset. The scores for the two PCs for an individual protein describe the position of that protein in relation to these two new axes and the other proteins contributing to the model. Proteins that have similar distributions across the density gradient will cluster with one another in the PC scores plot. Although the two PCs represent variation across the total group analyzed, PC1 can be loosely defined as a measure of the likelihood that a protein is either Golgi, mitochondria, or plastid localized (positive score in PC1) or ER localized (negative score in PC1). In a similar manner, PC2 distinguishes mitochondrial and plastid proteins (positive scores) from Golgi proteins (negative scores). Fig. 2 clearly shows that the known ER and Golgi proteins form clusters that are distinct from one another and from the mitochondrial, plastid, and PM proteins. In this analysis, the predicted ER and Golgi proteins also cluster tightly with the known residents of their respective organelles, thus supporting the predicted localizations of these proteins. These results demonstrate that using this method it is possible to distinguish proteins that are localized to different subcellular compartments, without the need to purify individual organelles. We can also begin to assign the unknown proteins to subcellular compartments, because the majority of them cluster with the known organelle residents. The remaining proteins, which do not cluster with the marker proteins, may be residents of organelles for which no known resident proteins have been identified, such as the endosome. Alternatively, these proteins may be localized to multiple compartments.
In order to assess whether protein clustering was reproducible, the experiment was repeated using a second membrane preparation. A total of 114 of the proteins that were identified in the original experiment were also found in the repeat. The distributions of these proteins, in the two experiments, were again analyzed using PCA. Fig. 3 shows the PC1 scores obtained for each protein in the two repeats of the experiment. In each case, PC1 represents the maximum amount of correlated variation in the two PCA models. In both models, PC1 distinguished the combined group of the Golgi, mitochondrial, and plastid proteins from the ER proteins. The strong correlation observed between the original scores and those obtained in the repeat indicates that the clustering, and therefore the localization, of proteins by LOPIT is reproducible.
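One simple way to quantify such agreement between two experiments is a Pearson correlation over the proteins found in both. The Python sketch below assumes this measure; the text does not state which statistic was used, and the protein names and PC1 scores are hypothetical.

```python
import math


def shared_scores(first, second):
    """Return paired score lists for proteins present in both experiments."""
    shared = sorted(set(first) & set(second))
    return [first[p] for p in shared], [second[p] for p in shared]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical PC1 scores from an original experiment and a repeat.
original = {"protein_A": -2.1, "protein_B": -1.8, "protein_C": 1.5, "protein_D": 2.4}
repeat = {"protein_A": -1.9, "protein_C": 1.7, "protein_D": 2.0, "protein_E": 0.3}

x, y = shared_scores(original, repeat)
r = pearson(x, y)
```

Because the sign of a principal component is arbitrary, scores from two independently fitted PCA models may need one axis flipped before comparison; a strongly negative coefficient would indicate the same agreement as a strongly positive one.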

TABLE I. Localization of proteins to the ER and Golgi by LOPIT
The 28 proteins identified with known or predicted localizations are shown. A PLS-DA model was built based on the known ER, Golgi, and mitochondrial/plastid proteins. This model was then used to rank the remaining proteins according to how well they fit these three localizations. Values closest to 1 indicate a good fit. Proteins with Golgi values above 0.93 were assigned to the Golgi, and proteins with ER values above 0.88 were assigned to the ER. The highest PLS-DA result for each protein is shown in bold. Asterisks indicate proteins that fall just below the threshold for confident assignment to the Golgi.
Column headings: Name, Accession no.

Confirmation of the Localization of the Predicted ER and Golgi Proteins Using PLS-DA-Visual clustering of proteins based on their positions in the PC scores plot is a useful method for initial data analysis. However, the assignment of novel proteins to organelles requires a discriminant approach that can predict class membership. As a result, PLS-DA, which is a regression extension of PCA, was used. PLS-DA is a technique that can be used to predict the class membership of observations in a large multidimensional dataset based on a training set of observations with known class membership (16). In this case, the 12 known ER, Golgi, and mitochondrial/plastid proteins identified in the first experiment were assigned to three classes and used to build a PLS-DA model (Q² = 0.909), which was used to determine the localization of the remaining 158 proteins (see supplemental information). The PLS-DA scores for the proteins in the training (known localization) and test (predicted localization) sets are listed in Table I. Using the stringent cut-off described above, LOPIT confirmed the Golgi localization of five of the seven predicted Golgi proteins and the ER localization of five of the six predicted ER proteins. The remaining two predicted Golgi proteins are also very likely to be Golgi localized, given their high scores for Golgi localization, which fall just below the threshold, and their low scores for the other compartments. Ribophorin I, the sole predicted ER protein whose localization was not confirmed, has a high ER score of 0.63 but also a reasonable Golgi score of 0.43. This protein may be present in a different subdomain of the ER than the proteins in the training and test sets. ER membrane proteins continuously escape from the ER to the Golgi and are retrieved in retrograde vesicles (17). The magnitude of the ribophorin I Golgi score could therefore reflect the steady state distribution of this protein, with the majority resident in the ER and a minority in the Golgi awaiting retrieval.
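The cut-off rule applied to the PLS-DA values can be sketched in Python as follows. The sketch covers only the thresholding step, not the PLS-DA model itself. The training values are hypothetical and were chosen so that the class minima reproduce the stated thresholds (0.93, 0.88, 0.83); the mitochondria/plastid value of the ribophorin-like query is also hypothetical, and treating a value equal to the threshold as passing is an assumption.

```python
CLASSES = ("Golgi", "ER", "mito_plastid")


def class_thresholds(training):
    """Threshold per class = lowest own-class value among known residents.

    training maps protein -> (known class, {class: PLS-DA value}).
    """
    thresholds = {}
    for known, values in training.values():
        own = values[known]
        thresholds[known] = min(own, thresholds.get(known, own))
    return thresholds


def assign(values, thresholds):
    """Return the single class whose threshold is met, otherwise None."""
    hits = [c for c in CLASSES if c in thresholds and values[c] >= thresholds[c]]
    return hits[0] if len(hits) == 1 else None


# Hypothetical training set of known residents.
training = {
    "golgi_1": ("Golgi", {"Golgi": 0.93, "ER": 0.05, "mito_plastid": 0.02}),
    "golgi_2": ("Golgi", {"Golgi": 1.02, "ER": -0.03, "mito_plastid": 0.01}),
    "er_1": ("ER", {"Golgi": 0.08, "ER": 0.88, "mito_plastid": 0.04}),
    "er_2": ("ER", {"Golgi": -0.02, "ER": 1.01, "mito_plastid": 0.01}),
    "mp_1": ("mito_plastid", {"Golgi": 0.10, "ER": 0.07, "mito_plastid": 0.83}),
}
thresholds = class_thresholds(training)

# ER and Golgi values as reported for ribophorin I; the third value is hypothetical.
ribophorin_like = {"Golgi": 0.43, "ER": 0.63, "mito_plastid": -0.06}
golgi_candidate = {"Golgi": 0.97, "ER": 0.02, "mito_plastid": 0.01}

unassigned = assign(ribophorin_like, thresholds)
assigned = assign(golgi_candidate, thresholds)
```

With these values the Golgi candidate is assigned to the Golgi, whereas the ribophorin-like protein, whose ER value of 0.63 falls below the 0.88 cut-off, is left unassigned.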

DISCUSSION
In this study, 170 proteins were identified and their density gradient distributions determined. Of these, a subset of 28 proteins, with known or predicted localizations, was used to validate the LOPIT technique. Fig. 2 shows the identified proteins clustered according to their distributions and hence localizations. The clear separation of the known ER and Golgi proteins, in this plot, demonstrates that LOPIT can be used to discriminate between the residents of these two organelles. The ER and the Golgi apparatus have very similar densities. In addition, proteins from the ER continuously escape to the Golgi, so that even if an ER-free Golgi preparation could be obtained, the identification of ER resident proteins is inevitable (18). Previous Golgi proteome studies in rat liver have identified the ER as the major contaminant organelle (19,20). Therefore, the ability to distinguish Golgi resident proteins from ER contaminants using LOPIT represents a significant advance in the localization of Golgi proteins by organelle proteomics.
To demonstrate the ability of LOPIT to determine the subcellular localization of previously uncharacterized proteins, a PLS-DA model was built using the proteins with known localizations. Using this model, the test set of 16 proteins with predicted localizations was then assigned to the ER, Golgi, or mitochondria/plastid categories. This resulted in the confident assignment of five proteins to the ER and five proteins to the Golgi, none of which had previously been localized in Arabidopsis. The subcellular localization of these proteins individually by microscopy would require substantial work because either antibodies would need to be raised or green fluorescent protein (GFP) fusions constructed. In addition, these methods have serious limitations in that the specificity of antibodies is often difficult to determine. Moreover, both the addition of GFP and its ectopic expression can affect the localization of a protein (21). LOPIT avoids these problems because the precise nature of the protein identifications by MS can be determined by examining the MASCOT and ProICAT search results, which indicate when peptides are identified that correspond to multiple protein isoforms. LOPIT also involves the analysis of proteins that are expressed at their endogenous levels and in their native forms, thus avoiding the possibility of protein mislocalization.
ICAT was chosen as the quantification technique used in this study because the reagents are commercially available and because the ICAT protocol was readily adapted to enable the labeling of hydrophobic membrane proteins. An alternative approach to protein quantification is the analysis of sample digests in consecutive LC-MS/MS runs and the subsequent comparison of the peak intensities obtained in each experiment. ICAT has several advantages over this approach: samples are analyzed simultaneously, so changes in LC conditions or electrospray efficiency that occur over time do not impair the quantification quality; samples are also prepared simultaneously, so sample losses during digestion, and any pre-fractionation, will affect both samples equally; and the isolation of cysteine-containing peptides in the ICAT protocol reduces sample complexity, thereby increasing the likelihood of identifying low-abundance proteins and reducing the length of the LC runs required to achieve sufficient peptide separation. In addition, the cost of isotopic labeling is partially offset by the fact that two samples are analyzed simultaneously, thus halving the MS time required for analysis.
Further refinements in the LOPIT methodology will include the use of two-dimensional LC to improve peptide separation prior to MS. This will result in the identification of an increased number of low-abundance proteins, which should include markers for organelles that were not included in this study, such as the endosome and the vacuole. In addition, the gradients used for membrane separation could be manipulated with the aim of resolving endomembrane subcomponents or any poorly resolved organelles. For this reason, the use of self-generating iodixanol density gradients is advantageous because their shape and range can be manipulated by adjusting the initial iodixanol concentration and modifying the centrifugation time or speed (10,11).
In conclusion, we have developed and validated a proteomics technique for the localization of organelle proteins. This method has been used to localize proteins to the ER and the Golgi apparatus. To our knowledge, this is the first example of a proteomic method that can discriminate between proteins resident in these organelles. LOPIT provides a new tool that will enable preliminary functional assignment to uncharacterized proteins, based on their determined localization. LOPIT can also be used for the accurate determination of an organelle's protein complement, which will facilitate the investigation of organelle function and protein trafficking.