Michael Smith Laboratories and Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada; Department of Chemistry, Simon Fraser University, Burnaby, BC, Canada
* This work is supported by funding to L.J.F. from NSERC, Genome Canada and Genome British Columbia (project 214PRO). The authors declare that they have no conflicts of interest with the contents of this article.
Understanding how proteins interact is crucial to understanding cellular processes. Among the available interactome mapping methods, co-elution stands out because it is simultaneous in nature and can identify interactions between all the proteins detected in a sample. The general workflow in co-elution methods involves the mild extraction of protein complexes and their separation into several fractions, across which proteins bound together in the same complex show similar co-elution profiles when analyzed appropriately. In this review we discuss the different separation, quantification and bioinformatic strategies used in co-elution studies, and the important considerations in designing these studies. The benefits of co-elution over other methods make it a valuable starting point when asking questions that involve perturbation of the interactome.
1 The abbreviations used are:
PPIs
protein–protein interactions
Y2H
yeast two-hybrid
MS
mass spectrometry
AP
affinity purification
BioID
proximity-dependent biotin identification
BN-PAGE
blue-native polyacrylamide gel electrophoresis
PrInCE
prediction of interactomes from co-elution
EPIC
elution profile-based inference of complexes
SEC
size-exclusion chromatography
IEX
ion-exchange chromatography
HIC
hydrophobic interaction chromatography
SAX
strong anion exchange
WAX
weak anion exchange
WCX
weak cation exchange
iTRAQ
isobaric tagging for relative and absolute quantitation
TOF
time of flight
MS/MS
tandem mass spectrometry
SILAC
stable isotope labeling by/with amino acids in cell culture
SWATH-MS
sequential windowed acquisition of all theoretical fragment ion mass spectra
SILAM
stable isotope labeling of mammals
CORUM
comprehensive resource of mammalian protein complexes
TPP
thermal proteome profiling
TMT
tandem mass tag.
, often involving higher-order complexes. Understanding the architecture of this interactome from a dynamic, topological and quantitative perspective is key to discerning biological processes and their involvement in disease (
). These have evolved from the classic yeast two-hybrid (Y2H) method to mass spectrometry (MS) approaches based on the co-purification of interacting proteins. Currently, the most widely used technique is affinity purification (AP-MS), thanks to its simplicity and improvements made in quantification and data analysis (
). For a systems-level analysis, the ideal interactome mapping method should be high-throughput, quantitative, simple and physiologically relevant, and should give information about stoichiometry, topology and dynamics. However, current techniques show limitations in at least a few of these characteristics, so the key is to use complementary methods to corroborate results. Co-elution methods are particularly useful for exploratory studies.
Co-elution or co-fractionation methods are collectively a global approach used to simultaneously study the whole interactome (as opposed to piece-by-piece, as in AP-MS) and will be the focus of this review. Co-elution methods all rely on separation of protein complexes under native conditions, with the fundamental idea being that proteins belonging to the same complex co-elute or migrate together during separation, showing the same migration profile (Fig. 1A). Co-elution strategies were originally introduced to assign proteins to the same subcellular localization if these displayed similar profiles across a density gradient (
) (BN-PAGE) to generate high-resolution elution profiles for thousands of proteins. Analysis of co-elution data involves plotting the MS1 intensities of proteins across many fractions, then matching and scoring those profiles to detect binary protein interactions and assemble them into an interactome map (Fig. 1B). Current advances in the analysis of co-elution data include the development of a bioinformatics pipeline (PrInCE) (
), both freely available. In this review we discuss the comparative advantages of co-elution, the different separation strategies used, and design considerations, with an emphasis on the separation method, quantification and data analysis.
Fig. 1. Schematic of the steps commonly involved in co-elution methods. A, General workflow of a co-elution experiment. The lysed sample containing protein complexes under native conditions is separated into a set number of fractions. Proteins from the same complex show the same co-elution profile, and bioinformatic analysis of these profiles yields an interactome map. B, Representative co-elution data from an experiment in which lysed HeLa cells were fractionated by size-exclusion chromatography (SEC), quantified by MS/MS, and finally used to construct an interactome network through data analysis.
Existing Co-elution Strategies Are Well-suited for Global Exploratory PPI Studies When Compared with Other Methods
The main comparative benefit of co-elution strategies (Table I) is that hundreds to thousands of protein complexes can be simultaneously and rapidly analyzed in a single experiment (
). Because the primary measurement in these experiments is the abundances of thousands of proteins across the elution gradient, rather than a focus on bait proteins, co-elution studies scale much more easily than Y2H or AP-MS (
). Thus, co-elution can identify all the interactors for many proteins simultaneously, as well as detect when a single protein participates in multiple complexes (
), which is more difficult to determine by AP-MS. A similar, recently developed approach that complements co-elution is thermal proteome profiling (TPP), which can detect protein complexes and their rearrangements proteome-wide.
An added attraction of co-elution studies is that the resulting interactome should be more physiologically relevant than those from studies involving protein tagging, because the presence of a tag or overexpression of the bait can perturb endogenous interactions (
). In this sense, co-elution is similar to immunoprecipitation-type AP-MS studies, where proteins are purified with an antibody against the bait itself, rather than an added epitope, but co-elution is not dependent on the existence of a specific and high affinity antibody (
). In AP-MS, the bait is fused to an affinity tag allowing the purification of this bait and its interacting partners without the need for a specific antibody, but it still relies on the fusion step.
Co-elution studies take considerably less time and fewer resources than an equivalently scaled AP-MS study, so replicates can be conducted more easily and biological perturbations of the network can be measured. So far, this has only been done globally using SILAC (
), but in principle could also be accomplished using label-free quantitation. Improved quantitation in AP-MS has allowed the comparison of proteins that co-purify with a bait protein under normal and perturbed conditions in a quantitative manner (
), but not nearly at the scale enabled by co-elution.
The various interactome methods provide fundamentally different types of information. Co-elution, at its heart, identifies binary interactions, but these interactions do not necessarily represent direct physical connections, and can include proteins that co-elute because they are members of the same complex but not in direct physical contact (Table I, "Indirect" interactions). AP-MS targets only the complexes co-purified with a specific protein. The BioID method is similar in that it focuses on the potential interactors of a specific protein. However, the candidates identified can be direct or indirect interactors, and/or vicinal proteins that do not physically interact with the fusion protein.
False positive interactions are a problem for all PPI technologies, to a greater or lesser degree. In co-elution strategies, functionally unrelated complexes can co-elute, leading the user to conclude that all the component proteins interact, which manifests as false positives. Co-elution results should therefore be regarded with caution, and potential novel complexes are best treated as seeds for follow-up analyses that provide more detailed, high-confidence biochemical information. These false positives can be mitigated by using the highest-resolution separation conditions possible, and the use of multiple orthogonal separation strategies can further reduce chance co-elution. Targeted complex quantification should also be helpful for follow-up experiments. In addition, rigorous bioinformatic analyses lower the chances of predicting false positive interactions.
The chromatograms or electropherograms generated in co-elution studies can also be used to quantify the relative distribution of a protein among multiple protein complexes. That is, if one protein participates in more than one complex, the relative amounts of those complexes can be derived. This can yield information about the dynamics of PPIs because, for example, substoichiometric interactors are more likely to be dynamic partners in the complex.
When compared with other methods, co-elution stands out as a global approach capable of producing a large amount of information about the interactome. It is therefore particularly suited for exploratory studies whose results can later be validated with complementary approaches.
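The apportioning of one protein among several complexes can be sketched as follows. This is a minimal illustration that assumes each complex's peak has already been assigned a window of fraction indices (the window boundaries and complex labels are hypothetical). Summing intensities within each window is a simplification; strongly overlapping peaks would need deconvolution, for example by Gaussian fitting.

```python
def complex_distribution(profile, windows):
    """Apportion one protein's elution signal among the complexes it belongs to.

    profile: intensities of the protein in each fraction (index = fraction).
    windows: maps a (hypothetical) complex label to an inclusive
             (start, end) range of fraction indices covering its peak.
    Returns each complex's share of the summed peak areas.
    """
    areas = {label: sum(profile[start:end + 1])
             for label, (start, end) in windows.items()}
    total = sum(areas.values())
    if total == 0:
        raise ValueError("no signal inside the given windows")
    return {label: area / total for label, area in areas.items()}


# Toy profile: an early peak (larger complex) and a later peak (smaller complex).
profile = [0, 1, 4, 9, 4, 1, 0, 0, 2, 6, 2, 0]
shares = complex_distribution(profile, {"complex_A": (1, 5), "complex_B": (8, 10)})
print(shares)  # complex_A: 19/29, complex_B: 10/29
```

A protein with most of its signal in one window and a minor share in another would be a candidate substoichiometric, possibly dynamic, member of the second complex.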
Separation Strategies Used for Co-elution
In co-elution studies, tissues or cells are lysed to extract protein complexes that are subsequently fractionated under conditions that are designed to preserve the PPIs within the complexes. Different separation techniques have been used for fractionation, including size-exclusion chromatography (SEC), ion-exchange (IEX) and hydrophobic interaction chromatography (HIC), and BN-PAGE (Table II) (
A “tagless” strategy for identification of stable protein complexes genome-wide by multidimensional orthogonal chromatographic separation and iTRAQ reagent tracking.
). However, considering that sucrose gradients have low resolution and IEF is mostly coupled to native MS and can be challenging for whole lysates, we recommend reserving these methods for orthogonal separations or complementary experiments.
Table II. Comparison between co-elution separation strategies: SEC, IEX, HIC, BN-PAGE

| Method | Stationary phase | Separation driven by | Mobile phase^a | Benefits/Drawbacks | Applications |
|---|---|---|---|---|---|
| SEC | Material with different pore sizes (silica, polymeric or cross-linked agarose) | Hydrodynamic size | Millimolar salt buffers at neutral pH | Lower buffer requirements, isocratic/Modest resolution | Soluble complexes |
| IEX | Material with ionic groups (silica-based or polymeric) with SAX, WAX, WCX or mixed-bed (WAX & WCX) properties | Salt gradient | Increasing molar salt gradient (NaCl) at neutral pH | Higher resolution, more chemistries available/Higher salt concentrations required | Soluble complexes |
| HIC | Hydrophobic material | Salt gradient | Decreasing molar salt gradient (e.g. (NH4)2SO4) at neutral pH | Higher salt concentrations required | Multiple orthogonal separations |
| BN-PAGE | Polyacrylamide gel | Electrophoretic mobility | Coomassie blue G, salt buffers | Not limited to soluble complexes/Less reproducible | Membrane protein complexes |

a Salts commonly used for buffers: NaCl, HEPES, KCl, MgCl2, Tris, PBS, NaCH3COO, NaN3, (NH4)2SO4. May contain additives such as protease inhibitors, dithiothreitol or glycerol.
A “tagless” strategy for identification of stable protein complexes genome-wide by multidimensional orthogonal chromatographic separation and iTRAQ reagent tracking.
) showed that E. coli polypeptides belonging to the same protein complex had the same elution profile through multiple orthogonal chromatographic steps (including IEX, HIC and SEC) performed successively. A simpler approach using only SEC (
Megadalton complexes in the chloroplast stroma of Arabidopsis thaliana characterized by size exclusion chromatography, mass spectrometry, and hierarchical clustering.
) provided a biologically relevant map of soluble chloroplast-localized complexes of Arabidopsis thaliana, showing the potential of the approach for interactome study. The use of SEC in global monitoring of protein complexes was limited until the introduction of the first co-elution study using SILAC and SEC (
) used multiple orthogonal separations including weak-anion exchange and mixed-bed ion exchange, sucrose gradient centrifugation and IEF. This strategy was later used to examine complexes among diverse metazoan models, studying eight different organisms in total (
). Recently, SEC and IEX were used in parallel to separate the same samples and obtain overlapping data sets, with the aim of reducing the confounding effect of chance co-elution (
). A recent study was based on the IEX approach but using SILAC instead of label-free quantification to monitor interactome changes following perturbation assays (
Global proteomic analyses define an environmentally contingent Hsp90 interactome and reveal chaperone-dependent regulation of stress granule proteins and the R2TP complex in a fungal pathogen.
), as lysis is done under mild, complex-preserving conditions. To allow the study of soluble and membrane-bound complexes of entire mitochondria, Heide et al. (
) used BN-PAGE and large-pore BN-PAGE after digitonin solubilization. With this approach, they also resolved large complexes (up to a molecular mass of 30 MDa) that cannot be resolved by SEC. More recently, Scott et al. (
). Detergent-free solubilization strategies have been recently introduced to improve the study of membrane proteins, where amphipathic scaffold proteins (
) wrap around the hydrophobic parts of the target membrane protein and shield them from the aqueous solution.
Design Considerations
Choice of Separation Method
Perhaps the first question to ask when designing a co-elution protocol is which type of protein is the focus of study: soluble or membrane proteins. The mild, detergent-free lysis conditions at neutral pH and physiological salt concentration used to preserve protein complexes are not suitable for solubilizing membrane complexes because of their hydrophobicity (
). Besides soluble cytosolic protein complexes, these conditions extract soluble intra-organellar protein complexes, such as nuclear, mitochondrial and lysosomal ones. Thus, for initial and exploratory investigations, the soluble interactome provides a large map of the biological processes of an organism (
). However, membrane proteins are involved in important cellular processes and may themselves be the focus of study. Some studies have used mild, non-denaturing detergents to solubilize membrane complexes, which are then separated by SEC or BN-PAGE (
) or interfere with solvent access to charged proteins. Instead, BN-PAGE has the advantage of being an established method for membrane protein separations and has proved to be well suited for co-elution interactome studies (
). A recent co-elution method used in vivo formaldehyde crosslinking of proteins followed by denaturing SEC separation and, unlike approaches restricted to soluble complexes, identified membrane and membrane-associated protein complexes (
). No current method allows the simultaneous study of native soluble and membrane proteins as mild detergents can disrupt soluble PPIs. However, new detergent-free technologies to solubilize membrane proteins might lead to a global method (
). Potentially, crosslinking could also help overcome a limitation that co-elution shares with other lysis-based methods: the possible loss of important transient and weak interactions. However, crosslinking adds a layer of complexity to the bioinformatic analysis, which must then identify crosslinked peptides.
In theory, soluble proteins can be effectively separated by any type of chromatography that works under aqueous conditions and has column dimensions suitable for protein complexes. Traditional reversed-phase and hydrophilic interaction LC require organic solvents that denature proteins and disrupt PPIs. The biggest advantage of SEC is precisely that separations can be performed under aqueous and isocratic conditions, because separation depends only on the hydrodynamic volume of the complexes (SEC columns are packed with porous particles whose pore volume is more accessible to complexes with small hydrodynamic volumes than to large ones, so smaller complexes elute later (
)). The mobile phase can be the same buffer used for lysis, at neutral pH and physiological salt concentration. One downside of SEC is its modest resolution, which makes it prone to chance co-elution. One way to increase resolution in SEC separations (applicable to any LC) is to connect two long columns (300 mm) in series.
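Because SEC separates by hydrodynamic volume, elution position is often related to apparent size through a calibration of log molecular mass against the partition coefficient K_av = (Ve - V0)/(Vt - V0), fitted with standards of known mass. The sketch below assumes globular standards and uses hypothetical column volumes and elution volumes; elongated complexes or those interacting with the resin deviate from such a fit, so apparent masses are only approximate.

```python
import math


def k_av(ve, v0, vt):
    """Partition coefficient: 0 at the void volume V0, 1 at the total volume Vt."""
    return (ve - v0) / (vt - v0)


def fit_calibration(ve_standards, mw_standards, v0, vt):
    """Least-squares fit of log10(MW) = slope * K_av + intercept."""
    xs = [k_av(ve, v0, vt) for ve in ve_standards]
    ys = [math.log10(mw) for mw in mw_standards]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def apparent_mw(ve, slope, intercept, v0, vt):
    """Apparent molecular mass (Da) of a species eluting at volume ve."""
    return 10 ** (slope * k_av(ve, v0, vt) + intercept)


# Hypothetical column: void volume 8.0 mL, total volume 24.0 mL.
V0, VT = 8.0, 24.0
# Hypothetical elution volumes (mL) of standards with known masses (Da).
slope, intercept = fit_calibration([11.0, 12.0, 14.5, 17.5],
                                   [669e3, 440e3, 158e3, 44e3], V0, VT)
print(f"{apparent_mw(13.0, slope, intercept, V0, VT) / 1e3:.0f} kDa")
```

The slope is negative: species that penetrate more of the pore volume (higher K_av) elute later and have lower apparent mass.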
IEX separation is based on charge attraction between the column and the protein, which carries surface charges that depend on its isoelectric point and the buffer pH. The salt concentration is increased to drive the separation: mobile-phase ions displace the bound proteins from the stationary phase. Compared with SEC, IEX can show stronger retention and therefore more characteristic elution profiles, and more columns with different chemistries are available. However, the increased salt concentration required for elution might disrupt some PPIs. To minimize this, shallow salt gradients are used so that nonionic protein associations are not perturbed and conditions remain non-denaturing. In HIC, separation is also driven by salt concentration: high concentrations reduce the solvation of proteins, promoting interaction of their hydrophobic regions with the hydrophobic stationary phase. Because HIC requires higher salt concentrations to promote retention, it is used less often than other chromatographies.
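To make the dependence on isoelectric point and buffer pH concrete, the sketch below estimates the net charge of a polypeptide from its sequence with the Henderson-Hasselbalch equation and finds the pH of zero net charge by bisection. The pKa values are approximate and chosen for illustration (published scales differ), the peptide is hypothetical, and a sequence-based estimate ignores folding and complex formation, which change the charges actually exposed on a native complex. At a buffer pH above its pI a protein carries a net negative charge and is retained on an anion exchanger; below its pI it is retained on a cation exchanger.

```python
# Approximate pKa values, for illustration only; published scales differ.
PKA_BASIC = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge of an unfolded polypeptide at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    # Basic groups: fraction protonated (+1) = 1 / (1 + 10^(pH - pKa))
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    # Acidic groups: fraction deprotonated (-1) = 1 / (1 + 10^(pKa - pH))
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """pH of zero net charge; net charge falls monotonically with pH, so bisect."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


peptide = "MKDEEHLKR"  # hypothetical sequence
print(round(isoelectric_point(peptide), 2), round(net_charge(peptide, 7.4), 2))
```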
As mentioned before, several studies have combined the above techniques in sequence or in parallel to obtain multiple orthogonal fractionations (
A “tagless” strategy for identification of stable protein complexes genome-wide by multidimensional orthogonal chromatographic separation and iTRAQ reagent tracking.
). The main advantage of these approaches is that complexes lost by one strategy can be rescued by another (e.g. salt in IEX may disrupt some complexes that can be rescued by SEC). Multiple separations also further resolve protein complexes that are poorly resolved by a single separation. However, these methods are time-consuming, and their results still require validation by complementary approaches.
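One simple way to exploit orthogonal separations, shown here as a hypothetical scoring rule rather than the method of any cited study, is to require a pair to co-elute in every separation by taking its lowest correlation across them. In the toy data below (all profiles hypothetical), P3 co-elutes with P1 by chance in SEC but not in IEX, so its consensus score drops. Profiles with no variation (e.g. all zeros) would need to be filtered out first, because their correlation is undefined.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation of two equal-length elution profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def consensus_r(separations, p, q):
    """Lowest Pearson R for proteins p and q across the separations in which
    both were quantified; a pair must co-elute everywhere to score high."""
    scores = [pearson_r(prof[p], prof[q])
              for prof in separations.values() if p in prof and q in prof]
    if not scores:
        raise KeyError(f"{p}-{q} not quantified in any separation")
    return min(scores)


separations = {
    "SEC": {"P1": [0, 1, 5, 1, 0], "P2": [0, 2, 10, 2, 0], "P3": [0, 1, 5, 1, 0]},
    "IEX": {"P1": [0, 4, 1, 0, 0], "P2": [0, 8, 2, 0, 0], "P3": [0, 0, 1, 4, 0]},
}
print(consensus_r(separations, "P1", "P2"))  # about 1.0: co-elute in both
print(consensus_r(separations, "P1", "P3"))  # about -0.33: chance co-elution in SEC only
```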
Separating protein complexes with high efficiency requires a suitable LC column (typically high-resolution, analytical-grade) with appropriate particle and pore dimensions. Large biomolecules need large pores to diffuse unrestricted inside the particles, and longer columns packed with smaller particles (e.g. 500 Å pores, 300 mm length, ≤5 μm particles) give narrower peaks, within limits imposed by separation time, column backpressure and material synthesis (
Superficially porous particles with 1000 Å pores for large biomolecule high performance liquid chromatography and polymer size exclusion chromatography.
). Chromatographic materials are constantly advancing, and developments applied to biomolecules, such as mixed-mode materials and superficially porous particles, could help co-elution methods achieve faster and more efficient separations (
Synthesis and optimization of wide pore superficially porous particles by a one-step coating process for separation of proteins and monoclonal antibodies.
). Conventional LC separations are often run at room temperature or above to make them faster. For protein complexes, however, keeping the temperature low during separation and sample handling (e.g. working on ice, LC separations at <10 °C) is critical for complex stability. Low temperatures also prevent protein aggregation when the sample is concentrated to a volume suitable for LC injection. In SEC, the absence of large species eluting at the void volume indicates that proteins have not aggregated. Column dimensions and separation conditions determine the overall separation resolution, which in turn determines the optimum number of fractions to collect for adequate co-elution data. Narrow peaks are desirable because they give characteristic elution profiles that can be compared more effectively for co-migration. However, narrow peaks can also go undetected if they are spread across only one or two fractions. Collecting more fractions addresses this, but at the cost of more sample preparation and longer MS analysis time.
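Whether peaks are adequately sampled can be checked by measuring their full width at half maximum (FWHM) in units of fractions. The sketch below interpolates linearly on either side of the apex and assumes a single dominant peak per profile. A FWHM of only one or two fractions suggests the peak is undersampled, whereas a very wide peak points to poor resolution.

```python
def fwhm_in_fractions(profile):
    """Full width at half maximum of the dominant peak, in fractions.

    Linear interpolation is used between the last fraction at or above half
    maximum and the first fraction below it on each side of the apex. If the
    peak runs off either end of the gradient, that end is used as the edge.
    """
    n = len(profile)
    apex = max(range(n), key=lambda i: profile[i])
    half = profile[apex] / 2

    left = apex
    while left > 0 and profile[left - 1] >= half:
        left -= 1
    if left > 0:  # interpolate between left-1 (< half) and left (>= half)
        y0, y1 = profile[left - 1], profile[left]
        left_edge = (left - 1) + (half - y0) / (y1 - y0)
    else:
        left_edge = 0.0

    right = apex
    while right < n - 1 and profile[right + 1] >= half:
        right += 1
    if right < n - 1:  # interpolate between right (>= half) and right+1 (< half)
        y0, y1 = profile[right], profile[right + 1]
        right_edge = right + (y0 - half) / (y0 - y1)
    else:
        right_edge = float(n - 1)

    return right_edge - left_edge


print(fwhm_in_fractions([0, 1, 4, 9, 4, 1, 0]))              # about 1.8: barely sampled
print(fwhm_in_fractions([0, 2, 4, 6, 8, 9, 8, 6, 4, 2, 0]))  # 5.5
```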
Once protein complexes are fractionated, their stability as complexes no longer matters, and the goal is to digest the proteins adequately for peptide LC-MS/MS analysis. The sample handling considerations for this step are the same as for any MS-based proteomics procedure. Nevertheless, it is worth noting that digestion procedures free of detergents, salts and contaminants produce clean samples, which are key to maximizing protein identification.
Quantitative Approaches
Some form of quantitation is required to generate chromatograms or electropherograms from co-elution data and, thus, the choice of the quantitation method is important. The main approaches used to quantify co-elution data are SILAC and label-free methods, both frequently used in normal MS-based proteomic workflows. Much has been written about the comparative advantages of both quantification strategies (
) and those apply to co-elution workflows. SILAC provides accuracy and consistency across different samples as the metabolic labels are introduced during cell culture, allowing normalization to be done at an early stage in the sample handling. SILAC also saves a significant amount of sample preparation time as different conditions can be pooled into one sample for simultaneous analysis (
). For co-elution, these benefits are key: quantification is highly accurate across fractions, and a third channel allows the study of interactome rearrangements upon perturbation.
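As an illustration of how SILAC ratios become elution profiles, the sketch below assumes one possible three-channel design (an assumption, not necessarily the design of the cited studies): a reference channel mixed in equal amount into every fraction, with control and perturbed samples in the other two channels. Because each fraction carries its own internal reference, per-fraction ratios are comparable across fractions, and a shift in a protein's apex between conditions flags a possible rearrangement. All intensities are hypothetical.

```python
def ratio_profile(sample, reference):
    """Per-fraction sample/reference ratio; None where the reference is missing or zero."""
    return [s / r if r else None for s, r in zip(sample, reference)]


def apex(profile):
    """Index of the highest defined ratio in a profile."""
    return max((i for i, v in enumerate(profile) if v is not None),
               key=lambda i: profile[i])


# Hypothetical intensities for one protein across six fractions.
reference = [10, 10, 10, 10, 0, 10]  # reference channel
control = [0, 5, 10, 5, 0, 0]        # control channel
perturbed = [0, 0, 5, 10, 0, 0]      # perturbed channel

ctrl_profile = ratio_profile(control, reference)
pert_profile = ratio_profile(perturbed, reference)
print(ctrl_profile)
print("apex shift:", apex(pert_profile) - apex(ctrl_profile))  # 1 fraction
```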
A common misconception about SILAC is that it is expensive. Although SILAC reagents do add cost to an experiment, the increased accuracy in quantitation means that fewer fractions or samples are required to obtain equivalent data, so much less instrument time (which also has a cost) is needed. One caveat to using SILAC is that some biological systems, such as primary cells, clinical samples or most whole organisms, cannot easily be labeled metabolically. Nevertheless, SILAC has vast applications: it is compatible with numerous cell lines and, though at considerable cost, with whole organisms (stable isotope labeling of mammals, SILAM).
Global interactome studies have been conducted involving heterologous expression in genetically manipulated cell lines, which raises the question of how physiologically relevant the results are. Skinnider et al. recently produced a SILAM mouse for tissue interactome study (
) of seven mouse tissues to map tissue-specific mammalian interactomes. Despite being experimentally challenging, these types of studies yield interactome maps that are more physiologically relevant.
SILAC limitations have also been addressed by producing a SILAC-labeled spike-in standard (
), in which a SILAC-labeled sample is prepared separately from compatible material and added as a reference to each experimental sample. This method allows SILAC-like quantification of samples that cannot be labeled and is an alternative to whole-organism labeling. Spike-in SILAC could be applied to co-elution but, as with label-free approaches, samples from different physiological conditions cannot be pooled for simultaneous MS analysis.
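In a spike-in design, each unlabeled sample is quantified relative to the same heavy-labeled standard, so two samples that were never mixed can still be compared through a ratio of ratios, (L_a/H_a)/(L_b/H_b), provided the same amount of standard is added to each. A minimal sketch with hypothetical per-fraction intensities:

```python
def spike_in_ratio(light_a, heavy_a, light_b, heavy_b):
    """Per-fraction ratio of ratios (L_a / H_a) / (L_b / H_b).

    light_*: intensities from the unlabeled experimental samples a and b.
    heavy_*: intensities of the shared heavy spike-in measured in each sample.
    Fractions with a zero in any denominator are reported as None.
    """
    out = []
    for la, ha, lb, hb in zip(light_a, heavy_a, light_b, heavy_b):
        if ha and hb and lb:
            out.append((la / ha) / (lb / hb))
        else:
            out.append(None)
    return out


print(spike_in_ratio([20, 40, 0], [10, 20, 10], [5, 30, 8], [5, 10, 8]))
```

Each light sample must still be run separately, which is why, as with label-free methods, conditions cannot be pooled.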
In theory, other labeling approaches such as isobaric labeling (e.g. iTRAQ or tandem mass tag, TMT) could be used in co-elution approaches to minimize MS analysis time. This could be particularly useful for multiple separation approaches (
A “tagless” strategy for identification of stable protein complexes genome-wide by multidimensional orthogonal chromatographic separation and iTRAQ reagent tracking.
). However, the ability to pool samples is the only advantage, and it comes with several disadvantages: normalization is done at a late stage (after protein digestion), sample handling is increased and data analysis becomes more challenging.
As previously mentioned, label-free approaches have also been successfully used to quantify co-elution data sets (
ComplexQuant: high-throughput computational pipeline for the global quantitative analysis of endogenous soluble protein complexes using high resolution protein HPLC and precision label-free LC/MS/MS.
). Both common label-free approaches, spectral counting from MS/MS scans and MS1 precursor ion intensities, have been used for this purpose, with appropriate software (e.g. PepQuant (
ComplexQuant: high-throughput computational pipeline for the global quantitative analysis of endogenous soluble protein complexes using high resolution protein HPLC and precision label-free LC/MS/MS.
)). Label-free methods are arguably simpler and impose no restrictions on sample type. Whereas SILAC can compare at most two conditions in perturbation studies, label-free quantification has virtually no limit on the number of conditions. This strategy is therefore quite useful for quantification across larger comparison sets (more than two and up to tens of biological conditions). In these cases, data-independent acquisition (DIA or SWATH-MS) is another alternative that has already been applied to co-elution studies (
). However, this comes with a significant increase in sample preparation and MS analysis time and, in the case of SWATH, additional computational challenges.
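Whichever label-free readout is used, the quantitative table must be reshaped into one elution profile per protein. The sketch below assumes a hypothetical input of (protein, fraction, intensity) records, for example MS1 intensities summed from peptides, and scales each profile to a maximum of 1. Treating fractions without a record as zero is a choice made here for simplicity; some pipelines impute or flag missing values instead.

```python
from collections import defaultdict


def build_profiles(records, n_fractions):
    """Turn (protein, fraction, intensity) records into per-protein elution
    profiles scaled to a maximum of 1.

    Fractions are numbered 1..n_fractions. Repeated records for the same
    protein and fraction (e.g. several peptides) are summed, and fractions
    with no record are treated as 0.
    """
    raw = defaultdict(lambda: [0.0] * n_fractions)
    for protein, fraction, intensity in records:
        raw[protein][fraction - 1] += intensity
    profiles = {}
    for protein, values in raw.items():
        peak = max(values)
        if peak > 0:
            profiles[protein] = [v / peak for v in values]
    return profiles


records = [("P1", 1, 100.0), ("P1", 2, 300.0), ("P1", 2, 100.0), ("P2", 2, 50.0)]
print(build_profiles(records, n_fractions=3))
# {'P1': [0.25, 1.0, 0.0], 'P2': [0.0, 1.0, 0.0]}
```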
Data Analysis for Co-elution Profiling Studies
A distinct advantage of co-elution studies over other high-throughput methods is that they can detect PPIs between all proteins identified in a sample (“all-to-all”, also known as the matrix model (
)). Other high-throughput methods are limited to detecting interactions between two tagged or labeled proteins (“bait-to-bait”) such as Y2H, or between a tagged protein and any other (“bait-to-all”, also known as the spoke model) such as BioID (Fig. 2A). This increased number of potential interactions can result in a combinatorial explosion, however. For example, a co-elution data set can contain millions of potential interactions, only thousands of which are likely to be real. Analyzing co-elution data sets, therefore, often involves separating true interactors from a background of spurious false positives through bioinformatic analysis.
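The scale of the problem follows from simple counting. With n quantified proteins, the all-to-all (matrix) model has n(n - 1)/2 candidate pairs, whereas a bait-to-all (spoke) design with b baits only considers pairs involving at least one bait, that is b(n - 1) - b(b - 1)/2. The protein and bait numbers below are hypothetical.

```python
from math import comb


def n_candidate_pairs(n_proteins, n_baits=None):
    """Number of protein pairs that must be scored.

    n_baits=None: all-to-all (matrix model), every pair of proteins.
    otherwise:    bait-to-all (spoke model), every pair involving at least
                  one bait; bait-bait pairs are counted once.
    """
    if n_baits is None:
        return comb(n_proteins, 2)
    return n_baits * (n_proteins - 1) - comb(n_baits, 2)


print(n_candidate_pairs(5000))               # 12497500 (all-to-all)
print(n_candidate_pairs(5000, n_baits=100))  # 494950 (100 baits)
```

Even a modest co-elution data set therefore yields millions of candidate pairs, of which only thousands are expected to be true interactions.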
Fig. 2. Bioinformatic analysis of co-elution data. A, Bioinformatic analysis of co-elution data is complicated by the number of potential interactions. In contrast to techniques such as Y2H that find interactions between tagged proteins ("Bait-to-bait") or BioID (and sometimes AP-MS) that find interactions involving at least one bait protein ("Bait-to-all"), co-elution experiments have the potential to find interactions between all identified proteins in a sample ("All-to-all"). B, Schematic of classifier-based analysis of co-elution data. The strength of co-elution is quantified for every pair of proteins using multiple metrics ("features"). Features derived from external data, such as co-citation or co-expression, can be included. Using a gold standard set of known complexes, a subset of the protein pairs is labeled as interacting or not interacting. Finally, a classifier uses the features and labels to assign every pair of profiles a classifier score, to which a threshold is applied. C, Performance of single co-elution features. Interactomes were predicted from four data sets using a single co-elution metric. Each dot represents an interactome from one replicate, and the y axis gives the precision of the 500 best-scoring interactions. Interactomes were predicted using PrInCE with default parameters (CORUM gold standard). weighted_xcorr: Weighted cross-correlation, measured with R function wccsom. pearson_R_cleaned: Pearson correlation (cleaned profiles). mutual_info: Mutual information. co_apex: Minimum number of fractions between fitted Gaussian centers. pearson_P: Pearson correlation (raw profiles) p value. pearson_R_raw: Pearson correlation (raw profiles). euclidean_distance: Euclidean distance (cleaned profiles). co_peak: Number of fractions between maximum values. pearson_plus_poisson: Pearson R (raw profiles) plus Poisson noise. co_fraction: 1 if maximum values are in the same fraction, 0 otherwise. Jaccard: Overlapping fractions in which both proteins are quantified, measured with Jaccard. Data sets: 1 Kristensen et al. 2012 (
Although there are many workflows for analyzing co-elution data, it is common to use co-elution data to generate a list of pairwise PPIs (i.e. an interactome), typically done via a machine learning classifier (
). In this analysis, the strength of co-elution is measured for every pair of proteins using a variety of metrics (Fig. 2B). Across published studies, we count eleven metrics used to evaluate the co-elution strength of pairs of proteins (Fig. 2C). These fall into five general categories: correlational metrics, such as weighted cross-correlation and Pearson correlation strength between raw and cleaned elution profiles (
). Fig. 2C shows how these metrics perform when predicting interactomes using a single metric (PrInCE, default parameters). In general, we find that correlational metrics that use quantified protein amounts, such as Pearson R and weighted cross-correlation, are more informative than measures that only detect whether proteins are quantified in the same fractions (Jaccard and co_fraction), although the performance of each metric varies between data sets. In practice, multiple metrics are used to better differentiate true interactors from spurious pairs, because truly interacting protein pairs should score highly in most measures.
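Simplified versions of several of the metrics named in Fig. 2C are sketched below. These are illustrations under stated assumptions, not the implementations used by PrInCE or the cited studies: co_apex in Fig. 2C uses fitted Gaussian centers, whereas co_peak here compares raw maxima, and the Euclidean distance is computed on profiles scaled to a maximum of 1 as a stand-in for "cleaned" profiles. The toy profiles are hypothetical.

```python
from math import sqrt


def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def co_peak(a, b):
    """Number of fractions between the two profiles' maxima."""
    return abs(a.index(max(a)) - b.index(max(b)))


def co_fraction(a, b):
    """1 if both maxima fall in the same fraction, 0 otherwise."""
    return int(co_peak(a, b) == 0)


def jaccard(a, b):
    """Overlap of the fractions in which each protein was quantified (> 0)."""
    qa = {i for i, x in enumerate(a) if x > 0}
    qb = {i for i, y in enumerate(b) if y > 0}
    union = qa | qb
    return len(qa & qb) / len(union) if union else 0.0


def euclidean_scaled(a, b):
    """Euclidean distance after scaling each profile to a maximum of 1."""
    ma, mb = max(a), max(b)
    return sqrt(sum((x / ma - y / mb) ** 2 for x, y in zip(a, b)))


a = [0, 1, 4, 9, 4, 1, 0]
b = [0, 2, 8, 18, 8, 2, 0]  # same shape as a, twice as abundant
c = [0, 0, 0, 1, 4, 9, 4]   # similar shape, shifted by two fractions
for name, other in (("b", b), ("c", c)):
    print(name, round(pearson_r(a, other), 3), co_peak(a, other),
          co_fraction(a, other), jaccard(a, other), round(euclidean_scaled(a, other), 3))
```

Note how c overlaps a in half of the quantified fractions (Jaccard 0.5) yet peaks two fractions away; combining metrics helps separate such near misses from true co-elution.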
Using a gold standard reference of known protein complexes (e.g. CORUM (
)) to label a subset of pairs in a data set as known PPIs or known non-interactors, it is possible to estimate the probability that any given protein pair is interacting. That is, combined with a gold standard reference, classifiers assign an interaction score to all protein pairs, with high-scoring pairs more closely resembling known PPIs. Finally, to arrive at an interactome, it is typical to take all protein pairs whose score is greater than a threshold as predicted PPIs. This threshold is typically chosen so that the false discovery rate, estimated as the fraction of gold-standard-labeled pairs above the threshold that are known non-interactors, does not exceed a chosen value. In other words, the task of finding pairwise PPIs in a co-elution data set can be framed as separating truly interacting protein pairs from a large background of non-interacting pairs. As an optional step, the resulting interactome can be clustered into protein complexes using a network-based clustering algorithm (
). Although it can be difficult to assess the quality of clusters, at least in part because metrics for measuring the similarity between clusters have biases and display non-intuitive behavior (
). These tools can be used both as standalone executable programs, into which data are loaded and from which output files and figures are generated, and as R packages. Parameters to note when using these tools are the number of quantified proteins in a data set (ideally more than 500), the number of missing values, and, most importantly, the width of elution peaks. Elution profiles with poor resolution ("wide" peaks) are poorly distinguishable from one another and yield more spuriously correlated pairs. For example, we find that in data sets with 50 fractions, elution peaks should have a full width at half maximum (FWHM) of no more than 10 fractions.
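The gold-standard labeling and FDR-based thresholding described above can be sketched in Python as follows. The sketch assumes one common labeling convention:

- a pair is a known PPI if both proteins share a reference complex;
- it is a known non-interactor if both proteins appear in the reference but never in the same complex;
- it is unlabeled otherwise.

It takes as input interaction scores already produced by some classifier. The function names, and the choice of the most permissive cutoff that still meets the target FDR, are illustrative assumptions rather than the procedure of any specific tool.

```python
from itertools import combinations


def label_pairs(pairs, complexes):
    """Label each pair True (known PPI), False (known non-interactor) or None (unknown).

    complexes: list of sets of protein IDs taken from a reference such as CORUM.
    """
    in_reference = set().union(*complexes)
    together = set()
    for members in complexes:
        together.update(combinations(sorted(members), 2))
    labels = {}
    for p, q in pairs:
        if tuple(sorted((p, q))) in together:
            labels[(p, q)] = True
        elif p in in_reference and q in in_reference:
            labels[(p, q)] = False
        else:
            labels[(p, q)] = None  # at least one protein absent from the reference
    return labels


def threshold_at_fdr(scores, labels, max_fdr=0.1):
    """Lowest score cutoff at which the estimated FDR, FP / (TP + FP), is <= max_fdr.

    Only pairs with a gold-standard label contribute to TP and FP.
    Returns None if no cutoff satisfies the target.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    tp = fp = 0
    best = None
    for i, pair in enumerate(ranked):
        label = labels.get(pair)
        if label is True:
            tp += 1
        elif label is False:
            fp += 1
        # Evaluate a cutoff only after the last member of a run of tied scores.
        if i + 1 < len(ranked) and scores[ranked[i + 1]] == scores[pair]:
            continue
        if tp + fp > 0 and fp / (tp + fp) <= max_fdr:
            best = scores[pair]
    return best
```

With a cutoff in hand, the predicted interactome is `[pair for pair, s in scores.items() if s >= cutoff]`. Unlabeled pairs are still included in this list, which is what allows co-elution to propose novel interactions.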
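Peak width is singled out as a key parameter, so one simple check is to estimate the FWHM of each protein's elution profile. The sketch below is a hypothetical helper, not part of any published tool. It measures only the tallest peak and linearly interpolates the half-maximum crossings between fractions. The default cutoff of 10 fractions follows the guideline above for 50-fraction data sets and would presumably need adjusting for other designs.

```python
def fwhm(profile):
    """Full width at half maximum, in fractions, of the tallest peak in a profile.

    Walks outward from the apex until the signal drops below half the apex
    height, linearly interpolating each crossing between adjacent fractions.
    If the signal stays above half maximum up to the edge of the profile,
    that edge is used as the boundary.
    """
    n = len(profile)
    apex = max(range(n), key=profile.__getitem__)
    height = profile[apex]
    if height <= 0:
        return float("nan")  # no detectable peak
    half = height / 2

    # Left boundary: walk left while still at or above half maximum.
    i = apex
    while i > 0 and profile[i - 1] >= half:
        i -= 1
    if i > 0:
        below, above = profile[i - 1], profile[i]
        left = (i - 1) + (half - below) / (above - below)
    else:
        left = 0.0

    # Right boundary, mirrored.
    j = apex
    while j < n - 1 and profile[j + 1] >= half:
        j += 1
    if j < n - 1:
        above, below = profile[j], profile[j + 1]
        right = j + (above - half) / (above - below)
    else:
        right = float(n - 1)

    return right - left


def passes_width(profile, max_width=10):
    """Hypothetical filter: keep profiles whose FWHM is at most max_width fractions."""
    return fwhm(profile) <= max_width
```

For example, the profile [0, 1, 2, 3, 4, 3, 2, 1, 0] crosses half maximum (2) at fractions 2 and 6, giving an FWHM of 4 fractions.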
Although classifier-based data analysis is common, there are many ways to treat co-elution data. For example, it is also common to cluster co-elution data into groups of similar profiles, as these groups can represent protein complexes (
). Data analysis methods discussed so far identify novel and known interactions, often focusing on PPIs with complex prediction as a downstream analysis. In contrast, “complex-centric” approaches (
) start with known protein complexes (e.g. CORUM) and assess whether members of a known complex co-elute. Although this approach does not detect novel PPIs, it does detect novel subunits of complexes and assembly intermediates. CCProfiler is a freely available software tool for complex-centric data analysis (
An important consideration for both classifier-based and complex-centric methods is the choice of reference complexes (“gold standards”). Gold standards do not exist for all organisms, and although proteins from non-model organisms can be mapped to model organism proteins, this can introduce errors because orthologs between organisms do not necessarily interact with the same partners. Therefore, co-elution analysis often works best on human data sets, or data sets from other well-studied organisms. A further issue regarding gold standards is that many protein interactions only occur under certain conditions (
). Although this can help filter out spuriously co-eluting proteins, it can also bias results toward highly studied proteins and away from less well-studied and/or harder-to-identify proteins (
Co-elution can investigate all existing interactions among all the proteins quantified in a given sample, whereas other methods focus on the interactions of one protein at a time. In addition, it does not require protein tagging, gives quantitative information (including the relative amounts of different complexes sharing a common protein), and, when combined with SILAC, can rapidly reveal interactome rearrangements upon perturbation. Depending on whether soluble or membrane complexes are the focus of study, the separation strategy changes from SEC or IEX to BN-PAGE or mild detergent-based separations, although recently introduced membrane protein solubilization strategies might enable more global approaches. To a large extent, the system under study defines the quantification strategy: SILAC, label-free and other methods are available depending on the cell line or tissue and on whether the goal is to find new interactions or to study the interactome under different physiological conditions. One important consideration is that co-elution experiments typically require sophisticated bioinformatic analyses, because they often compare all pairs of proteins quantified in a sample, and this number is large (millions) for modern data sets. Further, classifier-based analyses of co-elution data require gold standard databases of known protein complexes, a requirement that is not met for all organisms. Co-elution is a powerful tool for uncovering interactomes, and it provides many advantages over existing high-throughput interactome mapping technologies. In the future, we believe co-elution studies should move toward maximizing quantitation accuracy, lowering quantification limits and increasing separation resolution. This would allow the study of the interactome beyond the protein level (e.g. post-translational modifications) and the use of smaller sample amounts, translating into lower costs and more sustainable methods.
Automation of sample digestion would also greatly improve the technique by alleviating the time-consuming analysis of multiple (>2) conditions.
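The number of comparisons grows quickly because n quantified proteins give n(n - 1)/2 unordered pairs. Taking n = 5,000 purely as an illustrative, assumed data set size:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{5000}{2} = \frac{5000 \times 4999}{2} = 12\,497\,500
```

Because the count scales roughly with n squared, doubling the number of quantified proteins roughly quadruples the number of pairs to score.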
REFERENCES
Huttlin E.L., Bruckner R.J., Paulo J.A., Cannon J.R., Ting L., Baltier K., Colby G., Gebreab F., Gygi M.P., Parzen H., Szpyt J., Tam S., Zarraga G., Pontano-Vaites L., Swarup S., White A.E., Schweppe D.K., Rad R., Erickson B.K., Obar R.A., Guruharsha K.G., Li K., Artavanis-Tsakonas S., Gygi S.P., Harper J.W. Architecture of the human interactome defines protein communities and disease networks.
A “tagless” strategy for identification of stable protein complexes genome-wide by multidimensional orthogonal chromatographic separation and iTRAQ reagent tracking.
Megadalton complexes in the chloroplast stroma of Arabidopsis thaliana characterized by size exclusion chromatography, mass spectrometry, and hierarchical clustering.
Global proteomic analyses define an environmentally contingent Hsp90 interactome and reveal chaperone-dependent regulation of stress granule proteins and the R2TP complex in a fungal pathogen.
Superficially porous particles with 1000 Å pores for large biomolecule high performance liquid chromatography and polymer size exclusion chromatography.
Synthesis and optimization of wide pore superficially porous particles by a one-step coating process for separation of proteins and monoclonal antibodies.
ComplexQuant: high-throughput computational pipeline for the global quantitative analysis of endogenous soluble protein complexes using high resolution protein HPLC and precision label-free LC/MS/MS.