A Label-free Mass Spectrometry Method to Predict Endogenous Protein Complex Composition

At least one third of soluble proteins are predicted to exist in a stable oligomeric state. However, the compositions of the vast majority are unknown. This paper describes a biochemical method to predict protein complex composition based on orthogonal chromatographic separations and label-free protein correlation profiling. The validated method predicts hundreds of novel homo- and heterooligomeric complexes, and provides a new way to analyze protein complexes in any organism with a well-annotated proteome.

(A) The SEC and IEX fitted elution profiles for biological replicate one were plotted for the 20S proteasome and coatomer subunits. The 20S proteasome subunits are plotted in magenta and the coatomer subunits are plotted in green. Coatomer subunit peaks are clearly separated from those of the 20S proteasome in the IEX separation. Locus IDs for all proteins are provided in Supplemental Table 2.
(B) The SEC and IEX fitted elution profiles for biological replicate one were plotted for the EIF3 and CCT subunits. EIF3 subunits are plotted in red and the CCT subunits are plotted in blue. The subunits of the two complexes are clearly separated in the IEX separation. Locus IDs for all proteins are provided in Supplemental Table 2.
(C) The intactness of the known complexes was plotted for SEC only (blue lines), IEX only (red lines), and concatenated SEC and IEX profiles (black lines). Intactness is the number of subunits of a complex found in a single cluster divided by the total number of subunits identified for that complex.
(D) A box plot of the within-cluster distance as a function of cluster number for the chloroplast dendrogram obtained using the combined SEC and IEX profile data. The box spans the first to third quartiles of the data, with whiskers at 1.5 times the IQR.
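The intactness measure in panel (C) can be illustrated with a minimal sketch. It assumes that the "single cluster" is the cluster holding the most subunits of the complex; the legend does not state this explicitly. The function name `intactness` and the example cluster IDs are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def intactness(cluster_ids):
    """Fraction of a complex's identified subunits that share one cluster.

    cluster_ids holds one cluster ID per identified subunit of the complex.
    Assumption: the "single cluster" is the most populated one.
    """
    if not cluster_ids:
        raise ValueError("at least one identified subunit is required")
    # Size of the cluster that contains the most subunits of this complex.
    largest_group = Counter(cluster_ids).most_common(1)[0][1]
    return largest_group / len(cluster_ids)


# Hypothetical complex with four identified subunits, three of which
# fall into cluster 3 and one into cluster 7.
example_score = intactness([3, 3, 3, 7])
```

Under this assumption, a complex whose subunits all fall into one cluster scores 1, and the score falls as the dendrogram is cut into more clusters and subunits are split apart.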
Supplemental Figure 4: A subset of GRF/14-3-3 proteins have distinct profiles on the IEX that drive their inclusion into a distinct sub-group.
Supplemental Figure 5: Evaluation of orthogonal gene co-expression and predicted protein-protein interaction datasets as a potential datatype to refine co-elution-based protein complex predictions.
(A) Gene coexpression was tested to determine whether it is a useful predictor of protein complex composition. The fraction of coexpressed subunits was plotted for known human complexes and conserved Arabidopsis complexes. Gene coexpression data were taken from ATTED-II for Arabidopsis and COEXPRESSdb for human proteins.
(B) The degree of coelution of predicted protein interactors taken from Biogrid was tested. For each unique interacting pair in Biogrid, the difference in mean peak location was calculated for the SEC and the IEX column separations. For example, ~15% of the pairs co-eluted in the same column fraction in the SEC separation and ~25% of the pairs co-eluted in the same column fraction in the IEX separation.
(C) Biogrid predicted interactors are enriched in our clustering prediction compared to randomly selected cluster numbers. Among the 19 protein pairs, 9 had very similar cluster IDs (cluster IDs that differed by 2 or less, Supplemental Table 5). This level of similarity is not due to chance: when cluster IDs were randomly drawn for the predicted interactors, zero pairs were matched in over 70 percent of the simulations (n = 10,000 simulations), and no simulation produced more than 4 matched pairs.
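The randomization test in panel (C) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the figure: the total number of clusters (`n_clusters`) and the sampling scheme (independent uniform draws of cluster IDs for both members of each pair) are not given in the legend and are hypothetical, as are all function names.

```python
import random


def count_matched_pairs(pairs, max_diff=2):
    """Count pairs whose cluster IDs differ by max_diff or less."""
    return sum(1 for a, b in pairs if abs(a - b) <= max_diff)


def simulate_null(n_pairs, n_clusters, n_sim=10_000, max_diff=2, seed=0):
    """Matched-pair counts obtained when cluster IDs are drawn at random.

    Assumption: each protein receives a cluster ID drawn uniformly
    from 1..n_clusters, independently of its partner.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sim):
        pairs = [
            (rng.randint(1, n_clusters), rng.randint(1, n_clusters))
            for _ in range(n_pairs)
        ]
        counts.append(count_matched_pairs(pairs, max_diff))
    return counts


def empirical_p_value(null_counts, observed):
    """Share of simulations at least as extreme as the observed count,
    with an add-one correction so the estimate is never exactly zero."""
    extreme = sum(1 for c in null_counts if c >= observed)
    return (1 + extreme) / (1 + len(null_counts))


# 19 pairs and 9 observed matches come from the legend;
# n_clusters=500 is a placeholder value, not a reported one.
null_counts = simulate_null(n_pairs=19, n_clusters=500)
p_value = empirical_p_value(null_counts, observed=9)
```

The fraction of simulations with zero matched pairs and the largest simulated count can be read directly from `null_counts`; both depend on the assumed number of clusters.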