Molecular & Cellular Proteomics 4:1569-1590, 2005.
| ABSTRACT |
|---|
10,000 signal and response measurements acquired from HT-29 cells treated with tumor necrosis factor-α, a proapoptotic cytokine, in combination with epidermal growth factor or insulin, two prosurvival growth factors. Nineteen protein signals were measured over a 24-h period using kinase activity assays, quantitative immunoblotting, and antibody microarrays. Four different measurements of apoptotic response were also collected by flow cytometry for each time course. Partial least squares regression models that relate signaling data to apoptotic response data reveal which aspects of compendium construction and analysis were important for the reproducibility, internal consistency, and accuracy of the fused set of signaling measurements. We conclude that it is possible to build self-consistent compendia of cell-signaling data that can be mined computationally to yield important insights into the control of mammalian cell responses.
A second complication in systematizing cell-signaling measurements is the lack of inherent structure in the data. Structured data, such as DNA sequence, can be referenced to a single data model and systematized via relational databases. Manipulating unstructured data, such as microscopy and immunoblot images, is a significant informatic challenge to which current databases are not well suited (11). Third, unlike gene sequences, the meaning of data on kinase activation, receptor-ligand interaction, signalosome assembly, and phospholipid modification depends heavily on context (6, 11). Critical aspects of biological context include cell type, nature and concentration of a cytokine stimulus, and cell cycle or growth phase. Experimental methods also contribute to context because quantitative measurements vary with reagent source and protocol. Finally signaling data are necessarily incomplete: whereas a genomic sequence has a well defined start and finish, the characterization of a complex signaling network is an open-ended process.
This study examined signaling networks that control the apoptosis-survival decision in HT-29 human colon adenocarcinoma cells treated with combinations of the prodeath cytokine tumor necrosis factor-α (TNF)1 and the prosurvival growth factors epidermal growth factor (EGF) and insulin. TNF activates intracellular signals by binding to trimeric death receptors and promoting assembly of intracellular death-inducing signaling complexes (12, 13). Death-inducing signaling complexes then activate several parallel signaling pathways, including apoptotic caspases (14), stress-activated c-jun N-terminal kinase 1 (JNK1) and p38 pathways (15), and the proinflammatory IκB kinase (IKK)-nuclear factor-κB pathway (16).
Activation of the EGF receptor (EGFR) tyrosine kinase occurs through receptor dimerization, conformational change, and autophosphorylation (17, 18). Phosphorylated receptors recruit adaptor proteins, and these then activate multiple signaling proteins including extracellular signal-regulated kinase (ERK) via Ras and the Akt kinase via phosphatidylinositol 3-kinase (19). The binding of insulin to the insulin receptor also activates ERK and Akt, but in contrast to EGFR, the insulin receptor is constitutively dimerized, and most insulin-induced signaling involves modification of insulin receptor substrate 1 (IRS1), a multidomain adaptor protein (20). Antagonism between TNF and EGF or TNF and insulin is well documented and relevant to many diseases, including cancer, inflammatory bowel disease, and diabetes.2 However, there exists only limited molecular level understanding of the strategies cells use to process conflicting cytokine stimuli and regulate responses such as apoptosis. The limitation arises in part because current practice in signal transduction research, as in other areas of cell biology, is to interpret data only with respect to contemporaneous controls. Categorical conclusions are drawn from direct comparison of experimental and control samples (e.g. ERK is more, less, or equally active between samples), and the next experiment is designed. In studies of complex networks, however, many variables must be measured, and the piecemeal accumulation and analysis of data become both cumbersome and limiting.
In this study we asked whether a validated and self-consistent dataset could be constructed from diverse biochemical measurements of cell-signaling proteins in human cells exposed to TNF, EGF, and insulin. Using techniques available in the average research laboratory, we normalized and fused into a single dataset 10,000 data points collected over an 18-month period. We have not yet tackled the informatic challenge of building a database to hold the information, relying instead on a series of customized spreadsheets. We therefore refer to the assembled dataset as a compendium. The heterogeneity, lack of intrinsic structure, context dependence, and open-ended nature of the signaling data make the construction of this cytokine-signal-response (CSR) compendium nontrivial (22). However, we show that the compendium makes it possible to compare, with high reliability, data that were not collected contemporaneously. We also show that data can be mined for insights that would not be obvious from more conventional approaches. Finally because the value of a data compendium can only be unlocked computationally, we analyzed the full dataset by using a predictive statistical model based on partial least squares regression (PLSR).3 We will describe elsewhere the use of modeling to elucidate the mechanisms by which cells process conflicting cytokine stimuli3; here we apply modeling to the more general problem of assessing the quality and information content of a CSR compendium. In contrast to many other proteomic efforts, we incorporated modeling early in the data collection process by linking protein signaling to relevant phenotypic responses. The inclusion of responses is necessary to constrain statistical models sufficiently to provide useful quantitative assessment of compendium design. Our key findings are that heterogeneous, time-resolved information is indeed critical for predictive power as is sampling broadly across the network. Nonetheless it appears sufficient to monitor a subset of network reactions, significantly reducing the data collection burden on future proteomic compendia.
| EXPERIMENTAL PROCEDURES |
|---|
at a ratio of 0.13 ml of medium/cm2. The cultures were stimulated 24 h later by adding the stimulus diluted in
of the culture volume of serum-free medium for final concentrations of 0, 0.2, 5, or 100 ng/ml TNF (Peprotech); 0, 1, or 100 ng/ml EGF (Peprotech); and 0, 1, 5, or 500 ng/ml insulin (Sigma). The C225 anti-EGFR monoclonal antibody (H. S. Wiley laboratory) and IL-1ra (Amgen) inhibitors were used at 10 µg/ml. Triplicate plates were lysed at 0, 5, 15, 30, 60, and 90 min and 2, 4, 8, 12, 16, 20, and 24 h or prepared for flow cytometry at 12, 24, and 48 h.
Cell Lysis
Cell lysis had to be compatible with multiple assays and combine dead (floating) and viable cells. Floating cells were pelleted from culture supernatant and rinsed with ice-cold PBS. Adherent cells were lysed in ice-cold lysis buffer (as described in 7) and combined with the dead cells pelleted from the culture medium. An aliquot of the whole-cell homogenate was immediately frozen for use in immunoblot analysis. The lysates were clarified by centrifugation (10 min at 15,000 rpm) and aliquoted for antibody array and kinase assay. Total protein concentration was determined using the bicinchoninic acid assay (Pierce) for both the clarified lysates and the whole-cell homogenates (following addition of SDS-based lysis buffer to final concentrations of 112.5 mM Tris-Cl, pH 7.5, 4% SDS, 10 mM β-glycerophosphate, 10 mM NaF, 10 mM sodium pyrophosphate, 1 mM Na3VO4, 1 µg/ml leupeptin, 1 µg/ml pepstatin, and 1 µg/ml chymostatin).
Kinase Assays
Kinase assays were performed as described previously (7). All signaling measurements were normalized to total protein concentration in each lysate. For kinase assays, we also corrected for variations in ATP specific activity; all measurements are expressed in relative units of kinase activity. Relative units were calculated by comparing the day-specific ratio of t = 0 min (untreated) kinase activity/background activity to a compendium-wide average. The data were normalized so that one relative unit corresponded to the compendium-averaged t = 0 min activity for the kinase.
Antibody Microarray
Antibody microarrays were processed as described previously (24). For EGFR, the same antibodies were used (24); for Akt array measurements, anti-Akt (Upstate Biotechnologies) was used as a capture antibody, and anti-phospho-Akt and anti-pan-specific Akt (catalog numbers 4051 and 9272, Cell Signaling Technologies) were used as detection antibodies. We obtained three measurements for EGFR and Akt (phosphoprotein level, total protein level, and phosphorylated to total protein ratio) from triplicate spots for each lysate. The ratiometric measurement was used to account for spot-to-spot variation in printing. All three measures were normalized to total protein concentration and to the t = 0 min average for each time course and are therefore expressed as -fold activation over the base-line signals.
Immunoblots
For immunoblots, 75 µg of total cell homogenates were separated by SDS-PAGE and transferred to nitrocellulose. Reference samples of HT-29 cells treated with 200 units/ml interferon-γ for 24 h and then 500 ng/ml insulin for 30 min or 100 ng/ml TNF for 40 h were run on each gel for normalization purposes. Membranes were cut at specific molecular weights to allow probing with several antibodies for each gel. Immunoprobing was performed as described previously (25). Primary antibodies used were anti-phospho-Akt-Ser473 (catalog number 9271), anti-caspase-8 1C12 (catalog number 9746), anti-phospho-FKHR-Ser256 (catalog number 9461), anti-phospho-IRS-Ser636 (catalog number 2388), and anti-phospho-MEK1/2-Ser217/221 (catalog number 9121) from Cell Signaling Technologies; anti-phospho-IRS1-Tyr896 (catalog number GF1003) from Calbiochem; and anti-caspase-3 (1:250, catalog number SC-7272) from Santa Cruz Biotechnologies. The fluorescence was assessed using a FluorImager 495 scanner (Amersham Biosciences) and quantitated using ImageQuant (Amersham Biosciences). Signal intensities were normalized to intensities in the reference samples to correct for differences in transfer and probing efficiencies. Signals where the t = 0 min average was consistently above background level were normalized to this basal level for each time course and expressed in -fold activation units.
Flow Cytometry
Cells were grown in 24-well plates and treated with the indicated cytokines as described above. For each experiment, triplicate reference samples (untreated, treated with 100 ng/ml TNF, and treated with 100 ng/ml TNF + the prosurvival factor used in that experiment) were collected for normalization purposes and to verify the bioactivity of the cytokines. The supernatant and rinse were saved and combined with the trypsinized cells to assay both floating and adherent cells. Live cells were stained with Alexa 488-labeled annexin V and 1 µg/ml propidium iodide (Molecular Probes). Fixed cells (4% formaldehyde for 10 min followed by 100% methanol) were stained with 50 µg/ml propidium iodide (for sub-G1 DNA content) or with M30 monoclonal antibody against cleaved cytokeratin (Roche Applied Science) and anti-cleaved caspase-3 (Cell Signaling Technologies, BD Biosciences) followed by Alexa 488-conjugated donkey anti-mouse IgG and Alexa 647-conjugated donkey anti-rabbit IgG (Molecular Probes). Samples were analyzed on a BD Biosciences FACSCalibur or FACScan. Apoptosis measurements were normalized to average values found in contemporaneously treated and processed reference samples to account for variations in staining efficiency: 100 ng/ml TNF-treated reference samples were used for propidium iodide permeability and cleaved caspase-3 and cytokeratin staining assays, and untreated reference samples were used for annexin V binding and sub-G1 DNA content assays.
Partial Least Squares Model
The descriptors of signaling dynamics and partial least squares models were generated using MATLAB® and SIMCA-P, respectively, as described elsewhere.3 The fitness of values predicted by the model to the measured values for cell death was evaluated according to the equation,
![]() |
where n is the number of response measurements. The non-parametric 90% confidence intervals on the fitness to measured values were evaluated, based on the Fisher inversion, on the average R2 value for seven models from cross-validation,
![]() |
The variable importance in the projection (VIP) was calculated using
![]() |
where k is the number of variables, wak is the weight of the kth variable for principal component a, A is the total number of principal components, and SSa is the sum of squares explained by principal component a.
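For orientation, a commonly used form of the VIP score in the PLS literature can be written as follows. This is a sketch under stated assumptions, not necessarily the exact expression used here: the number of variables is written as K to keep it distinct from the variable index k, the weight vectors are assumed to be normalized to unit length for each principal component, and SSa is taken to be the sum of squares of the response explained by principal component a.

```latex
\mathrm{VIP}_k \;=\; \sqrt{\,K \cdot
  \frac{\sum_{a=1}^{A} w_{ak}^{2}\, SS_a}{\sum_{a=1}^{A} SS_a}\,}
\qquad \text{with} \qquad \sum_{k=1}^{K} w_{ak}^{2} = 1 \ \text{for each } a .
```

Under this definition the squared VIP values average to 1 across the K variables, so a value above 1 marks a variable with above-average influence on the projection.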
To mimic measurement error, random noise was added to the signaling measurements in the 5 ng/ml TNF + 1 ng/ml EGF treatment from a normal distribution centered at zero with a standard deviation of 0.1, 0.2, 0.3, 0.4, or 0.5 times the range of values for the time course in that signal. Time course descriptors were then extracted as described elsewhere.3
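The noise procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `add_measurement_noise`, the example values, and the fixed random seed are hypothetical and are not taken from the original analysis, which was carried out in MATLAB.

```python
import random


def add_measurement_noise(time_course, noise_fraction, rng=None):
    """Return a copy of time_course with zero-mean Gaussian noise added.

    The standard deviation equals noise_fraction times the range
    (max - min) of the time course, as described in the text.
    """
    if rng is None:
        rng = random.Random()
    sigma = noise_fraction * (max(time_course) - min(time_course))
    return [value + rng.gauss(0.0, sigma) for value in time_course]


# Hypothetical time course (illustrative values, not compendium data).
example_signal = [1.0, 4.2, 6.5, 3.1, 2.0, 1.5]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One noisy copy per noise level named in the text.
noisy_versions = {
    fraction: add_measurement_noise(example_signal, fraction, rng)
    for fraction in (0.1, 0.2, 0.3, 0.4, 0.5)
}
```

Scaling the noise to the range of each time course means that signals with a small dynamic range receive proportionally small perturbations.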
| RESULTS |
|---|
Creation of a Cytokine-Signal-Response Data Compendium by Data Normalization, Fusion, and Validation
Our overall aim was to merge time course data collected at different times into a single self-consistent compendium. This involved a variety of experimental and data analysis techniques, some of which were obvious, whereas others were more subtle. One obvious step was to limit day-to-day and sample-to-sample variation by standardizing assay protocols, limiting cells to a narrow range of passage number, and ensuring uniformity in cell culture and treatment conditions through the use of lot-controlled sera, cytokines, and antibodies (supplemental material, Table S1). Raw data were corrected for other sources of variation by signal-specific normalization. As described in greater detail under "Experimental Procedures," normalization included the following: 1) correcting each signaling measurement for the total amount of cellular protein in the sample, 2) adjusting kinase activity measurements for changes in the specific activity of [γ-32P]ATP, and 3) comparing immunoblot signals to contemporaneous positive controls to correct for variations in transfer efficiency and antibody binding. Signals with t = 0 min normalized averages that were consistently above background were divided by the t = 0 min average for that treatment to yield a measure of "-fold activation." Other signals were expressed in activity units referenced to a base line of zero. Apoptosis measurements were compared with contemporaneous positive and negative control cell populations to yield a normalized cell death fraction. We found that careful normalization was essential for data fusion: by minimizing the contributions of experimental variability on signal measurements, it was possible to reduce the median coefficient of variation among biological replicates to 11%.
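Two of the quantities in this paragraph, -fold activation and the coefficient of variation among replicates, can be illustrated with a short Python sketch. The function names and numbers are hypothetical, and the use of the sample standard deviation is an assumption; the spreadsheets used for the compendium are not reproduced here.

```python
from statistics import mean, median, stdev


def fold_activation(time_course, baseline_replicates):
    """Divide each value by the average of the t = 0 min replicates."""
    baseline = mean(baseline_replicates)
    return [value / baseline for value in time_course]


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean of the replicates."""
    return stdev(replicates) / mean(replicates)


# Hypothetical protein-normalized signal: triplicates at t = 0 min and
# one value per later time point (illustrative numbers only).
t0_replicates = [1.0, 2.0, 3.0]
later_values = [2.0, 4.0, 6.0]
folds = fold_activation(later_values, t0_replicates)

# Median coefficient of variation across several replicate groups.
replicate_groups = [[9.0, 10.0, 11.0], [4.8, 5.0, 5.2], [19.0, 20.0, 24.0]]
median_cv = median(coefficient_of_variation(g) for g in replicate_groups)
```

Taking the median rather than the mean keeps a few unusually noisy replicate groups from dominating the summary.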
As one means to judge how reliably data from different assays had been fused in the CSR compendium, we examined correlations among three measures of Akt protein kinase activation. Immunoblots and antibody microarrays were used to monitor the levels of phospho-Ser473, a modification that is associated with Akt activation (31, 32) (Table I and Fig. 2A). An in vitro immunocomplex kinase assay was used to monitor Akt-catalyzed phosphorylation of a sequence-optimized peptide substrate (Table I and Fig. 2A). After normalization, we found that all three measures of Akt were correlated across the full panel of cytokine combinations with pairwise Pearson coefficients of 0.80–0.87 (Fig. 2, B–D). Although experimental error may be partly responsible for the discrepancies among these measures, the biochemistry of Akt was also an important factor. Akt activation is known to involve two phosphorylation events: Ser473 modification by rictor-mTOR (mammalian target of rapamycin) (33) and modification of Thr308 by 3-phosphoinositide-dependent protein kinase 1 (34). Although fully activated only when doubly phosphorylated, Akt is at least partially active with only one modification (35, 36). We might therefore expect to see situations in which Akt activity was relatively high but Ser473 phosphorylation was low; this is exactly what was observed in cells treated with saturating TNF + insulin at t = 2–12 h (Fig. 4A). Other discrepancies in measures of Akt activation could similarly be explained by current understanding of its regulation. In summary, the strong correlation among three different measures of Akt kinase activity argues that data from different biochemical assays can be fused into one compendium. It is important to note, however, that correlations among different measures of protein function are critically dependent on understanding the mechanisms of signal activation and inhibition.
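The pairwise comparison of the three Akt measures can be sketched as follows. The code computes Pearson coefficients for every pair of assays; the measurement values and dictionary keys are hypothetical placeholders, not compendium data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


# Hypothetical normalized Akt measurements from three assays, taken at
# the same treatments and time points (illustrative values only).
akt_measures = {
    "immunoblot": [1.0, 2.1, 3.9, 3.2, 1.8],
    "antibody_array": [1.0, 1.8, 3.5, 3.4, 2.1],
    "kinase_assay": [1.0, 2.5, 4.4, 2.9, 1.6],
}

# One coefficient for each of the three assay pairs.
pairwise_r = {
    (a, b): pearson(akt_measures[a], akt_measures[b])
    for a, b in combinations(akt_measures, 2)
}
```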
14% coefficient of variation between measurements. This is only slightly larger than the 11% median coefficient of variation observed among biological replicates assayed at the same time, suggesting that day-to-day variation in compendium data was similar in magnitude to biological variation and measurement noise. We therefore conclude that our experimental methods and data fusion techniques generated a self-consistent compendium in which data collected at different times could be compared with confidence.
30% between observed signals or responses were likely to be biologically significant. Most variation in the compendium was much larger than this, ranging from 3-fold changes in ptEGFR to over 50-fold changes in JNK and MK2 activity. Many of the patterns that could be discerned from the heat map were expected (e.g. EGF and insulin significantly inhibited TNF-induced caspase-8 cleavage at t = 8–24 h; p < 10⁻⁴; Fig. 4A), but several were unanticipated (see supplemental material, Figs. S1–S4). In these cases the primary value of the CSR compendium was its ability to put each measured signal within the broader biological context of other signals, treatments, and time points. In addition to visual inspection, the quality of the CSR data enabled us to examine quantitative correlations between cytokines and signals. We applied a previously described statistical mapping technique based on eigenvalue decomposition and rotation (25).2 Briefly each protein signal in the compendium was integrated over its 24-h time course and then projected onto a set of classifiers describing the TNF and EGF or insulin treatment (see 25 for details). This projection generated a map in which the positions of 19 protein signals were plotted relative to the TNF, EGF, and insulin stimuli. The closer a protein signal is to a cytokine, the more strongly and specifically the signal is activated by that cytokine. For example, TNF strongly induces caspase-8 cleavage and JNK1 and MK2 activation (14, 15), and these signals were closest to TNF on the cytokine-signal map (Table II). Of particular interest were unanticipated map positions. We illustrate this point with two examples: one in which an unexpectedly close correlation was observed between EGF and pIRS1-Tyr896 and a second in which a lack of correlation was observed between the Akt kinase and its substrate, pFKHR.
In agreement with results from the cytokine-signal map, strong IRS1-Tyr896 phosphorylation was observed in time courses of EGF-treated cells, but the modification was undetectable in the presence of insulin or TNF alone (Fig. 4, A and C). EGF-induced IRS1 modification was not unique to HT-29 cells: EGFR-dependent IRS1-Tyr896 phosphorylation was also observed in A431 epidermoid carcinoma cells.4 EGF has previously been reported to promote tyrosine phosphorylation of IRS1, but specific sites of phosphorylation have not been identified (43–45). To search for candidate IRS1-Tyr896 kinases, we used the Scansite motif-based searching algorithm (46), which compares Swiss-Prot protein sequences to experimentally determined consensus sites for dozens of protein kinases. Strikingly the amino acid sequence surrounding IRS1-Tyr896 (EPKSPGEYVNIEFGS) conformed well to the EY(F/V/I) consensus site for the EGFR tyrosine kinase (47, 48) and ranked in the top 0.113% of potential EGFR substrates overall (Table III). This score is considerably better than the 0.2% cutoff recommended to avoid false positives with Scansite (49). Moreover only one known EGFR autophosphorylation site scored better than IRS1-Tyr896 (EGFR-Tyr1197 at 0.041%); conversely other well characterized EGFR sites had poorer scores than IRS1-Tyr896 (Table III). Several sequences within IRS1 were correctly identified as substrates for the insulin receptor (IR), but IRS1-Tyr896 scored only within the top 2.82% of sequences for IR tyrosine kinase (Table III and 39). IRS1-Tyr896 lacks a methionine in the +1 position that is important for IR-dependent modification (48) and is therefore less likely to be an in vivo IR kinase substrate. Intriguingly, IRS1-Tyr896 also scored highly as a potential substrate for the fibroblast growth factor receptor and platelet-derived growth factor receptor tyrosine kinases (data not shown). This raises the possibility that IRS1 functions downstream of EGFR and other growth factor receptors.
Directed experiments in vivo and in vitro will be required to test this hypothesis and unravel the role played by IRS1 in cross-talk between EGFR, fibroblast growth factor receptor, platelet-derived growth factor receptor, and insulin receptors.
autocrine loop,2 and a transient pFKHR-Ser256 peak (Fig. 4, A, D, and E). In contrast, Akt was strongly induced in a sustained fashion by insulin, but no significant increase in pFKHR-Ser256 was observed (Fig. 4, A, D, and E). Thus, within the CSR compendium, Akt activation and FKHR-Ser256 phosphorylation are not always correlated (0.01 < R < 0.02 overall with each of the three measures of Akt activation). The absence of a correlation between signals thought to be biochemically coupled is potentially as valuable as a strong correlation because it implies that mechanistic models are incomplete. One simple explanation for the discordance between Akt activation and pFKHR-Ser256 modification is that, in TNF- and EGF-treated cells, FKHR is phosphorylated by a basophilic kinase other than Akt. Indeed two other FKHR phosphorylation sites (Thr24 and Thr316), originally identified as Akt phosphoacceptor sites (52, 54), have recently been reported to be Akt-independent in hepatocytes (55). Furthermore the equivalent sites on a closely related transcription factor, FOXO3 (Thr32 and Ser315), are not phosphorylated by Akt but rather by serum- and glucocorticoid-inducible kinase (SGK) (56). We therefore hypothesize that TNF- and EGF-induced FKHR-Ser256 phosphorylation is mediated by SGK or a similar kinase. It is noteworthy that FOXO3-Thr32, targeted by SGK, ranks even higher than FKHR-Ser256 as a substrate motif for Akt (top 0.001% for FOXO3-Thr32 versus 0.049% for FKHR-Ser256 of all Akt substrate motif matches by Scansite) (46), illustrating the overlap between Akt and SGK substrate specificities. Distinguishing the roles of Akt and SGK has been difficult because both lie downstream of phosphatidylinositol 3-kinase and 3-phosphoinositide-dependent protein kinase 1 (34), but the compendium data may have determined conditions under which Akt is uncoupled from FKHR-Ser256 modification. 
Follow-up experiments can now be designed to test the idea that the critical FKHR kinase in TNF- and EGF-treated cells is SGK. Moreover novel mechanistic hypotheses can be derived by examining even well known signaling events within a broad biological context of compendium data.
A Linear Model of TNF-induced Cell Death
To evaluate rigorously our design strategy for the CSR compendium (e.g. selection of which signals to measure and time points to sample, assembly of multiple data types, choice of cytokine treatments, and approaches to data processing), we used a data-driven PLSR model. This model, designed to generate predictions of apoptotic responses based on measured intracellular signaling profiles, will be described in detail elsewhere.3 Briefly, signal measurements were cast as independent (predictor) variables, and apoptotic responses were cast as dependent (predicted) variables. To maximize the information extracted from compendium data, the set of independent variables was expanded to include not only the actual measurements of each of the 19 signals at 13 time points (247 independent variables) but also 22–25 metrics describing the time-dependent dynamics of that signal. These derived metrics included "local descriptors" such as the instantaneous derivative of the signal at each time point, the area under the curve, and the activation and down-regulation rates for each peak in the time course. "Global descriptors" included the total area under the curve over a 24-h period and the global maximum, mean, and steady-state values of the signal. Together the time point measurements and the derived metrics for each signal constituted a set of 660 independent variables.3 Changes in the independent variables were related to changes in the dependent variables (apoptosis measures) through regression coefficients that the PLSR algorithm calculates for the full range of cytokine treatments in the compendium. Cross-validation showed that the PLSR model could predict the apoptotic responses for any single treatment withheld from the training set with a squared Pearson correlation (R2) of 0.94. Furthermore this model could predict data not in the training set with R2 = 0.91.3
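A few of the derived metrics named above can be sketched in Python. This is an illustrative approximation under stated assumptions: the derivative is estimated by finite differences between adjacent time points, the steady-state value is taken to be the final measurement, and all function names and signal values are hypothetical. The actual descriptor definitions are described elsewhere by the authors.

```python
def area_under_curve(times, values):
    """Total area under the time course by the trapezoidal rule."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def finite_difference_derivative(times, values):
    """Slope between each pair of adjacent time points."""
    return [
        (v1 - v0) / (t1 - t0)
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    ]


def global_descriptors(times, values):
    """Global summary metrics for one signal time course."""
    return {
        "auc": area_under_curve(times, values),
        "maximum": max(values),
        "mean": sum(values) / len(values),
        "steady_state": values[-1],  # assumption: last time point
    }


# The 13 sampling times from the text, in hours, with a hypothetical signal.
times_h = [0, 5 / 60, 15 / 60, 30 / 60, 1, 1.5, 2, 4, 8, 12, 16, 20, 24]
signal = [1.0, 3.0, 6.0, 5.0, 3.5, 2.5, 2.0, 2.2, 3.0, 3.4, 3.1, 2.8, 2.6]

descriptors = global_descriptors(times_h, signal)
slopes = finite_difference_derivative(times_h, signal)
```

Because the sampling times are unevenly spaced, both the integral and the slopes use the actual time intervals rather than assuming a constant step.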
Maximizing Information Content in Models by Unit-Variance Scaling
PLSR models weigh independent and dependent variables with large covariance most heavily (57); this biases the models toward variables with the largest dynamic range. In the compendium, JNK1 activity at t = 15 min varied more than 200-fold across all treatments, whereas Akt activity never varied more than 5-fold at any time point. A priori it is not obvious that large changes in JNK1 activity are more important than more modest but sustained changes in Akt activity, yet regression-based models emphasize the former. To give all variables an equal likelihood of contributing to the PLSR model, we applied unit-variance scaling, a common preprocessing technique in regression analysis (57). Each variable (JNK1 activity at t = 15 min, for example) was divided by the square root of its variance calculated across all cytokine treatments. This maintained the relative variation in the CSR compendium (e.g. JNK1 activity at t = 15 min was still higher for TNF treatment when compared with insulin treatment), but the dataset was scaled so that all variables had the same "spread" of values. Some time courses were altered quite dramatically by unit-variance scaling. For example, high variance in JNK1 activity at t = 5, 15, and 30 min and lower (but potentially informative) variance across treatments at later time points led to a scaled TNF-induced time course in which the second wave of activation was emphasized (Fig. 5A). For caspase-8 activation, easily overlooked differences in the extent of cleavage at early times were amplified relative to cleavage at later times (Fig. 5B). As a general rule, we found that scaling enhanced the importance of signals at late time points relative to those at early times (Fig. 5D). However, a few signals with relatively uniform variance were minimally affected by scaling; raw and scaled phospho-Akt time courses were very similar for example (Fig. 5C).
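Unit-variance scaling as described above amounts to dividing each variable by its standard deviation across treatments. The sketch below assumes a data matrix with one row per cytokine treatment and one column per variable, uses the sample standard deviation, and leaves zero-variance columns unchanged; these choices, the function name, and the numbers are assumptions made for illustration.

```python
from statistics import stdev


def unit_variance_scale(matrix):
    """Divide each column by its standard deviation across rows.

    Rows are treatments and columns are variables. Columns with zero
    variance are left unscaled to avoid division by zero.
    """
    columns = list(zip(*matrix))
    scales = [stdev(column) or 1.0 for column in columns]
    return [[value / s for value, s in zip(row, scales)] for row in matrix]


# Hypothetical data: a high-range variable (JNK1-like) next to a
# low-range variable (Akt-like) across three treatments.
raw = [
    [1.0, 1.0],
    [100.0, 2.0],
    [200.0, 3.0],
]
scaled = unit_variance_scale(raw)
```

After scaling, both columns have a standard deviation of 1 while the ordering of treatments within each column is preserved, which matches the point that relative variation is maintained.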
Both scaled and unscaled models predicted cell death responses with good accuracy for single treatments withheld from the training set (squared Pearson correlation of 0.94 with measured values). As a more stringent test of performance, we examined how well the two models predicted death responses under conditions that were different from those used to generate training data. A first set of test data was obtained from cells treated with TNF in the presence of the EGFR-blocking antibody C225 (59), and the second was obtained from cells treated with TNF in the presence of interleukin-1 receptor (IL-1R) antagonist (IL-1ra) (60). We show elsewhere that, at early times, the signaling and cell death response of cells to TNF is mediated in part by a TGF-α-EGFR autocrine loop and, at later times, by an IL-1α-IL-1R autocrine loop2; C225 and IL-1ra disrupt the TGF-α and IL-1α autocrine loops, respectively. Their importance for the current discussion is that C225 caused fairly dramatic changes in signaling as early as t = 15 min but only small changes in responses. Conversely IL-1ra elicited more modest changes in signaling (mostly at t > 4 h) but caused substantial changes in TNF-induced cell death responses (Fig. 6, A–C). When test data from cells treated with TNF + C225 were used as independent variables in the scaled PLSR model, cell death response predictions had a least squares fitness of 0.81 to experimental values (a perfect prediction would yield a fitness of 1; see "Experimental Procedures"). In contrast, the unscaled model performed very poorly, exhibiting no significant correlation between prediction and experiment (Fig. 6D). Although the impact of scaling was less dramatic for TNF + IL-1ra test data, we again found that the scaled model performed significantly better than the unscaled model (0.94 versus 0.81 least squares fit; Fig. 6D). We conclude that important information for predicting cell fate responses is contained in the small, consistent variations of low dynamic range signals and intermediate-to-late time points. Unit-variance scaling extracts this information from the CSR compendium and enables accurate prediction of cell death in treatments that are different from those within the training set data.
The Value of Heterogeneous Signals Measured by Distinct Assays
To test our assumption that it was valuable to sample multiple features of a signaling network (Fig. 1C), we built PLSR models from subsets of the data. As expected, models built on measurements of a single protein performed poorly (Fig. 7A). More surprisingly, models built from multiple signals gathered using a single type of assay (e.g. immunoblots) were also inferior to the model based on the full compendium (Fig. 7B). For example, kinase activity data were as good as the full model at predicting apoptosis following TNF + IL-1ra treatment (0.94 least squares fit) but very poor at predicting the outcome of TNF + C225 treatment (no significant fit; Fig. 7B). Overall, immunoblot data yielded the best single assay model, but this model was still significantly less predictive than the full model (0.84 versus 0.94 for TNF + IL-1ra and 0.65 versus 0.81 for TNF + C225; Fig. 7B). We do not yet know whether the limitations of single-assay data can be overcome simply by measuring many more signals using the same assay or whether something more fundamental is at work. However, given current technology, cell-signaling measurements are most powerful when different techniques are used in combination.
Contributions from Derived Metrics
The full PLSR model was constructed using both signal measurements and derived metrics as independent variables. Was it important to include these derived metrics? Remarkably a model that included derived metrics but excluded actual signal measurements performed as well as the full model on both sets of test data (Fig. 7E, "No time points"). In contrast, a model built from signal measurements alone was unable to predict cell death in the C225 test dataset (Fig. 7E, "Time points"). Moreover activation rates and global descriptors of signaling dynamics (area under the curve, mean, maximal, and steady-state values) were among the most heavily weighted variables in the full model.4 Why are derived metrics so important for model construction? In the PLSR approach described here, each time point is treated as a separate independent variable, and thus, derived metrics are the only variables that contain information on how measurements were ordered in time. To assess the relative value of each derived metric, we constructed models from single metric types. Several of the models accurately predicted apoptotic responses in the IL-1ra test data, but only one (based on activation rate, or on-slope) yielded predictions with a significant fitness to the measured apoptosis values in the C225 test data (Fig. 7E). Combining several derived metrics therefore seems to yield the most predictive power. Because derived metrics are more predictive than measurements themselves, we conclude that information encoded in time-dependent signaling is a critical aspect of the CSR compendium.
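As an illustration of how derived metrics summarize the temporal order of measurements, the sketch below computes area under the curve, mean, maximum, steady-state value, and activation rate for one time course. The function name and the specific definitions (trapezoidal integration, steady state taken as the last time point, on-slope taken from the first point to the peak) are assumptions made for this example and may differ from the formulas used to build the compendium. The time course itself is hypothetical.

```python
def derived_metrics(times, values):
    """Compute summary descriptors of one signaling time course.

    The definitions are illustrative assumptions, not the published formulas.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    # Area under the curve by the trapezoidal rule (handles uneven spacing).
    auc = sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )
    peak_index = max(range(len(values)), key=lambda i: values[i])
    # Activation rate ("on-slope"): rise from the first point to the peak.
    if peak_index == 0:
        on_slope = 0.0
    else:
        on_slope = (values[peak_index] - values[0]) / (times[peak_index] - times[0])
    return {
        "auc": auc,
        "mean": sum(values) / len(values),
        "max": values[peak_index],
        "steady_state": values[-1],  # assumption: last measured time point
        "on_slope": on_slope,
    }


# Hypothetical kinase activity time course (hours, arbitrary units).
times = [0.0, 0.25, 1.0, 4.0, 24.0]
values = [1.0, 5.0, 3.0, 2.0, 2.0]
metrics = derived_metrics(times, values)
print(metrics)
```

Each metric depends on the order or spacing of the time points, which is information that a model treating every time point as an independent variable cannot otherwise see.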
Predictive Information in Early and Late Signals
Can early signals predict late responses? Surprisingly, early time points between 5 and 90 min were predictive of cell death in the combined test datasets 12–48 h later (Fig. 7F), although they were significantly worse at predicting the C225 than IL-1ra test data (data not shown). This suggests that protein activities at early times encode much of the information needed to specify an apoptosis-survival cell fate decision. Late time points (t > 12 h) also had good overall predictive power (Fig. 7F), but this is to be expected because caspase activation at these late time points is mechanistically linked to cell death. Strikingly, a clear drop was observed in the predictive ability of the protein signals that were measured at t = 2, 4, and 8 h (Fig. 7F). It seems unlikely that information is really lost between 2 and 8 h; rather it may be encoded in transcriptional responses absent from our dataset (61). We hypothesize that the inclusion of transcriptional data in the compendium will reestablish the predictive power of models based on these time points. Taken together, compendium-based modeling results suggest that early signals immediately downstream of receptors contain much of the information needed to specify cell fate decisions up to 24 h later.
| DISCUSSION |
|---|
A key step in the construction of self-consistent datasets is data fusion. Heterogeneous measurements acquired at different times must be combined in such a way that they can be compared quantitatively across the full dataset. Experimental databases are common in genomics largely because sequence data are homogeneous and structured with a clear beginning and end, and data fusion is therefore straightforward. In contrast, cell-signaling data are heterogeneous and unstructured, lack an obvious completion point, and depend heavily on biological context. These features make fusion of signaling data challenging. Our approach to building a data compendium started with the definition of an experimental template that was applied repeatedly to the analysis of different cytokine combinations. The template specified standardized experimental protocols and optimized normalization procedures for determining the time-dependent activities of 19 protein signals and four cell death responses over a 48-h period. The template was applied to data collected from cells treated with 10 cytokine combinations and two combinations of a cytokine and a receptor inhibitor. Overall the raw cytokine-signal-response dataset contained 10,000 quantitative measurements assembled from 12 independent experiments.
Accuracy and Self-consistency of the CSR Compendium
To support reliable hypothesis generation, data fusion must maintain measurement accuracy and self-consistency. A number of procedures were used to ensure the accuracy of the data, and these procedures were validated experimentally. First, the linearity of each assay was verified, and measurements were shown to be directly proportional to signal strength. Experimental protocols, cell treatments, and reagents were standardized, and measurements were adjusted for assay- and sample-specific fluctuations in factors such as reagent activities and lysate concentrations. For each treatment and time point, at least three replicate biological samples were measured, allowing formal statistical testing and assessment of the magnitude of biological variation and measurement noise.
Second, to rule out fluctuations in measurements during the 18-month data collection period, measurements in the primary compendium were compared with those in a validation dataset. The validation dataset replicated a subset of the compendium measurements but with an orthogonal experimental design. Whereas the primary compendium consisted of single treatments analyzed at different time points, validation data were collected at a single time from cells treated with different cytokine combinations. For most signals and responses, validation data matched primary data very well. A few signals could not be validated quantitatively because they lacked sufficient dynamic range to distinguish real variation from noise. Nonetheless we conclude that most signals and responses were consistent throughout the data collection process; validation experiments must be designed to maximize the number of signals showing significant variation across treatments.
As a third self-consistency check we asked whether different measures of the same biochemical process yielded similar data. In DNA sequencing this type of quality control is straightforward and involves checking the noncoding and coding sequences for complementarity. However, verifying the congruence of protein signals is more complex in part because a mechanistic model relating the different biochemical properties measured by each assay is necessary. For example, we assayed the activation of Akt both by kinase activity assay and by tracking its phosphorylation state at Ser473, an activating site, using immunoblots and antibody arrays. Perfect correlations between these assays would require that phospho-Ser473 be essential for kinase activity and that all forms of phospho-Akt-Ser473 be fully active. Although kinase activity and phospho-Ser473 assays of Akt were generally well correlated, subtle discrepancies were observed. These discrepancies are consistent with the known mechanisms of Akt regulation (bisphosphorylated Akt is more active than Akt singly phosphorylated at Ser473 or Thr308 (35, 36)), but they also highlighted the challenges involved in validating the merger of data from heterogeneous assays into a single coherent representation.
Context-dependent Signaling Dynamics
The CSR compendium allowed us to compare signaling dynamics within a range of cytokine contexts. We used two approaches to mine compendium data for interesting signaling patterns: heat maps (see Supplementary Material) and cytokine-signal maps.2 One surprising finding was that the treatment of cells with EGF, but not with insulin, was strongly associated with phosphorylation of insulin receptor substrate 1 on Tyr896. The sequence flanking IRS1-Tyr896 is a remarkably good match to an EGF receptor consensus phosphoacceptor site, leading us to hypothesize that EGF may be an important pIRS1-Tyr896 inducer and that EGFR may modify IRS1 directly. This hypothesis is now being tested experimentally. A second unexpected discovery was a lack of correlation between Akt activation and Ser256 phosphorylation of the Forkhead transcription factor, an Akt substrate (51, 52). The lack of correlation leads us to hypothesize that FKHR-Ser256 may be targeted by a kinase other than Akt in EGF- and TNF-treated cells. Consistent with this idea, the basophilic kinase SGK has recently been shown to modify the FKHR homologue FOXO3 on two phosphoacceptor sites previously thought to be Akt substrates (56). These two findings illustrate the value of comparing signal strengths quantitatively across a set of different biological conditions. Unexpected statistical correlations observed among signals are suggestive of new mechanistic links, whereas the absence of an expected correlation suggests that existing links may require reinvestigation.
Experimental Requirements for Signaling Compendia
The construction and validation of computational models of cell signaling was our primary goal in generating the CSR compendium. As a first step, we built a regression (PLSR) model that relates protein signals to cell death responses.3 PLSR models built from data scaled to unit variance extracted the most information from the compendium and performed much better than models built from unscaled variables. The best PLSR data-driven model was surprisingly powerful: given signaling information from test data not included in the training set, the model predicted apoptotic responses to within the range of error for experimental measurements.3 PLSR models were used to analyze the design and assembly of CSR data with the goal of developing optimized strategies for the construction of future compendia.
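For readers unfamiliar with PLSR, the sketch below fits a single-response PLS model with the standard NIPALS procedure (weights, scores, loadings, then deflation) and optionally scales the independent variables to unit variance first. It is a generic textbook formulation written for illustration, not the software used to build the models described here; the function names `fit_pls1` and `predict_pls1` and the toy data are hypothetical.

```python
from statistics import mean, stdev


def fit_pls1(X, y, n_components, scale=True):
    """Fit single-response PLS regression by NIPALS with deflation."""
    n_rows, n_cols = len(X), len(X[0])
    columns = list(zip(*X))
    x_means = [mean(col) for col in columns]
    x_sds = [stdev(col) if scale else 1.0 for col in columns]
    y_mean = mean(y)
    # Work on centered (and optionally unit-variance scaled) copies.
    E = [[(row[j] - x_means[j]) / x_sds[j] for j in range(n_cols)] for row in X]
    f = [value - y_mean for value in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(row[j] * w[j] for j in range(n_cols)) for row in E]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n_rows)) / tt for j in range(n_cols)]
        q = sum(f[i] * t[i] for i in range(n_rows)) / tt
        # Deflate: remove the part of X and y explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(n_cols)] for i in range(n_rows)]
        f = [f[i] - q * t[i] for i in range(n_rows)]
        components.append((w, p, q))
    return {"x_means": x_means, "x_sds": x_sds, "y_mean": y_mean,
            "components": components}


def predict_pls1(model, x):
    """Predict the response for one new sample of independent variables."""
    e = [(x[j] - model["x_means"][j]) / model["x_sds"][j] for j in range(len(x))]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


# Toy data (hypothetical): the response is exactly 3*x1 - x2 + 5.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [6.0, 10.0, 10.0, 14.0]
model = fit_pls1(X, y, n_components=2)
print(predict_pls1(model, [5.0, 5.0]))
```

With as many components as independent variables the fit coincides with ordinary least squares on this toy problem. The practical value of PLSR lies in using fewer components than variables, which is what makes it usable when variables greatly outnumber observations, as in a compendium of this kind.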
Would a simpler experimental scheme with fewer signals, treatments, or time points have yielded a CSR compendium with a similar ability to support PLSR modeling? Although the answer appears to be "yes" because some data scored low in the VIP (Fig. 5F), statistical modeling clearly benefited from heterogeneous measurements, information on signal dynamics, and the analysis of cells treated with combinations of cytokines. Models built from single signals were ineffective at predicting apoptosis, consistent with the idea that cells weigh the contributions from multiple signals when committing to survival or death (62). More interestingly, models built from any single type of assay were significantly less predictive than models based on the full compendium. Among the single-assay models, immunoblot data performed best probably because immunoblots were used to assay two distinct biochemical processes: protein phosphorylation and protein cleavage. Most proteomic efforts to date (63) have centered on the use of a single technology such as protein arrays (64) or mass spectrometry (10). However, our results suggest that it is highly advantageous to combine several different assay methods to capture the diversity of protein signals that mediate cell fate decisions.
If the analysis of single proteins is insufficient to predict cell fate, then how much coverage of a signaling network is required? The success of our CSR compendium shows that it is not necessary to strive for complete coverage: apoptosis was predicted within 94% accuracy given signaling information from only 11 of the roughly 75 protein nodes shown in Fig. 1C (15% coverage). However, we suspect that a sparse sampling of a network will only yield predictive models when nodes are repeatedly sampled under a variety of conditions. In a typical experiment one usually decides to vary one parameter at a time, but we have found that data from cells treated with a combination of cytokines were more informative than data from single cytokine treatments particularly when measurements were noisy. Combinations of prodeath and prosurvival cytokines evoke more complex signaling patterns than individual stimuli, and it seems likely that this provides a broader context within which to assess the role of each signal in cell death responses.
Which signals should be measured in the protein network? Although the answer clearly depends on the biological system, we found that upstream signals (e.g. receptors and adaptors) were easiest to relate to cytokine stimuli in statistical maps. For instance, IRS1-Tyr896 phosphorylation was prominent as an EGF-dependent signal on the cytokine-signal map.2 With the PLSR model, however, downstream signals like ERK, JNK1, and MK2 were most predictive of apoptotic responses.3 We also found that measuring connected nodes in the network, initially thought to represent somewhat redundant assays, provided useful insight into the propagation of signals through particular pathways. Akt-FKHR is one example of a densely sampled region of the network; another is the dual measurements of ERK activity and phospho-MEK level (Fig. 8B). The correlation between these two signals was reasonable (Pearson coefficient of 0.68 overall), but more in-depth analysis revealed that ERK and phospho-MEK were best correlated at the earliest time point after cytokine addition, implying that their activation was coincident (Fig. 8A, 5 min). At later time points, MEK was inactivated more rapidly than ERK, and the correlation was lost probably as a result of differential dephosphorylation (Fig. 8, A and B) (65). The ERK-MEK disparity was exacerbated under conditions of saturating TNF treatment (Fig. 8A, circles). Thus, TNF may activate the Ser-Thr phosphatases that act on MEK or alternatively inactivate the dual specificity and Tyr phosphatases that act on ERK (Fig. 8B). Directed phosphomapping experiments and phosphatase assays should be able to test this prediction. More generally, the variability in MEK-ERK induction highlights how proteins that interact directly and function in the same pathway can exhibit time-dependent changes in relative activities.
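The comparison of an overall correlation with correlations at individual time points can be sketched as follows. The Pearson coefficient is computed from its standard definition; the ERK and phospho-MEK values are invented solely to illustrate the calculation and do not reproduce the data in Fig. 8.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical measurements across four treatments at two time points.
# Each entry maps a time point to (ERK activity, phospho-MEK level).
by_time = {
    "5 min": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "2 h": ([3.0, 3.5, 4.0, 5.0], [2.0, 1.0, 2.5, 1.5]),
}

# Correlation within each time point.
per_time = {label: pearson(erk, mek) for label, (erk, mek) in by_time.items()}

# Correlation after pooling all time points together.
pooled_erk = [v for erk, _ in by_time.values() for v in erk]
pooled_mek = [v for _, mek in by_time.values() for v in mek]
overall = pearson(pooled_erk, pooled_mek)

print(per_time, overall)
```

A moderate pooled coefficient can therefore hide a tight relationship at one time point and essentially none at another, which is why the time-resolved analysis is more informative than the single overall value.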
PLSR models based on single time point snapshots between 5 and 90 min or 12 and 24 h were able to predict apoptotic responses with reasonable accuracy although not as well as the full model. A clear loss of information was observed with protein signals at t = 2–8 h. In HT-29 cells, TNF-induced apoptosis is not apparent until 8 h after stimulus addition,4 implying that cells must still be processing apoptosis-survival cues at those time points. One explanation for the loss of predictive power from 2 to 8 h is that information processing involves changes in gene transcription, which were not measured in our experiments. Microarray experiments are currently underway to test this idea. Nonetheless it is clear that protein signals alone, when sampled over a time course, are sufficient for predicting HT-29 apoptotic responses induced by cytokines. It will be interesting to determine whether PLSR models based on the current CSR compendium will also be effective in predicting apoptosis induced by other stimuli or in different cell lines.
Conclusion
The data in this article show that cytokine-signal-response compendia should be constructed using measurements that are well distributed across a signaling network, although sparse coverage of the network is acceptable. Nonetheless resampling a subset of nodes using multiple assays helps to verify the consistency of heterogeneous data. Experimental validation of measurements is best carried out under conditions in which all signals have a sufficiently large dynamic range for correlation coefficients to be meaningful. In some cases this will involve two validation experiments, for example one at an early time point and one later. Measurement of signaling dynamics is clearly critical for CSR compendia, and uneven spacing of time points makes it possible to sample adequately both transient and sustained signals. Within the context of the regression-based models used here, it is important not only to rely on the data themselves but also to compute metrics that describe signal dynamics. Moreover scaling data to unit variance is necessary to optimize the extraction of information from compendium data.
In summary, we have described methods for compiling validated, self-consistent data compendia describing time-varying signals induced by cytokines and have demonstrated the value of compendium data in the study of mammalian signal transduction. We believe that similar CSR compendia coupled with both regression-based and mechanistic models will be valuable in the systematic analysis of other complex biological networks.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, July 18, 2005, DOI 10.1074/mcp.M500158-MCP200
1 The abbreviations used are: TNF, tumor necrosis factor-α; CSR, cytokine-signal-response; EGF, epidermal growth factor; EGFR, epidermal growth factor receptor; ERK, extracellular signal-regulated kinase; FKHR, Forkhead transcription factor; pFKHR, phospho-FKHR; FOXO3, Forkhead box O3a; IKK, IκB kinase; IL-1α, interleukin-1α; IL-1R, IL-1 receptor; IL-1ra, IL-1 receptor antagonist; IR, insulin receptor; IRS1, insulin receptor substrate 1; pIRS1, phospho-IRS1; JNK, c-jun N-terminal kinase; MEK, mitogen-activated protein kinase and extracellular signal-regulated kinase kinase; MK2, mitogen-activated protein kinase-activated protein kinase 2; PLSR, partial least squares regression; ptAkt, phospho-to-total Akt; ptEGFR, phospho-to-total EGFR; SGK, serum- and glucocorticoid-inducible kinase; TGF-α, transforming growth factor-α; VIP, variable importance in the projection.
2 K. A. Janes, S. Gaudet, J. G. Albeck, U. B. Nielsen, D. A. Lauffenburger, and P. K. Sorger, submitted for publication.
3 K. A. Janes, J. G. Albeck, S. Gaudet, P. K. Sorger, D. A. Lauffenburger, and M. B. Yaffe, submitted for publication.
4 S. Gaudet and P. K. Sorger, unpublished observations.
5 K. A. Janes and P. K. Sorger, unpublished observations.
* This work was supported by National Institutes of Health Grant P50-GM68762 and by the Whitaker Foundation (to K. A. J.).
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
|| To whom correspondence should be addressed: Dept. of Biology, Rm. 68-371, Massachusetts Inst. of Technology, 77 Massachusetts Ave., Cambridge, MA 02139. Tel.: 617-252-1648; E-mail: psorger{at}mit.edu
| REFERENCES |
|---|
κB activity. Annu. Rev. Immunol. 18, 621–663
-induced mitogen-activated protein kinase and nuclear factor κB signaling pathways. Cancer Res. 60, 2007–2017
mimic the effects of insulin in human fat cells and augment downstream signaling in insulin resistance. J. Biol. Chem. 277, 36045–36051