Detecting temporal change in proteins depends on fold change and variability.
Replicate time courses improve reliability of detecting temporal dynamics.
Temporal experiments require a dense sampling of cells to track gradual transitions.
Time-course trajectory experiments require more samples than two-state comparisons.
Single-cell measurements are uniquely capable of characterizing cell-to-cell heterogeneity and have been used to explore the large diversity of cell types and physiological functions present in tissues and other complex cell assemblies. An intriguing application of single-cell proteomics is the characterization of proteome dynamics during biological transitions, such as cellular differentiation or disease progression. Time-course experiments, which repeatedly take measurements during state transitions, rely on the ability to detect dynamic trajectories in a data series. However, in a single-cell proteomics experiment, cell-to-cell heterogeneity complicates the confident identification of proteome dynamics, as measurement variability may be higher than expected. Therefore, a critical question for these experiments is how many data points need to be acquired during the time course to enable robust statistical analysis. We present here an analysis of the most important variables that affect statistical confidence in the detection of proteome dynamics: fold change, measurement variability, and the number of cells measured during the time course. Importantly, we show that datasets with fewer than 16 measurements across the time domain suffer from low accuracy and a high false-positive rate. We also demonstrate how to balance competing demands in experimental design to achieve a desired result.
Individual cells express a unique proteome; this is true for cells in a complex environment as well as cells in a laboratory-controlled cell culture experiment. These differences arise from both intrinsic and extrinsic factors, such as access to nutrients, spatial relationships to other cells, or cell cycle status (
). Tissues in multicellular organisms often contain a variety of discrete cell types, each of which expresses a unique proteome, and the combination of these functional cell states gives rise to the overall tissue function. Single-cell measurements facilitate the study of such differences by quantitatively measuring transcript or protein abundances for individual cells (
), and transcript measurements remain a widely used data source. However, intracellular and extracellular functions are most often carried out by proteins, e.g., cytoskeletal structures, metabolic enzymes, signal transducers, and others. Work over the past decade has revealed that mRNA abundance measurements are a poor proxy for protein abundance measurements (for review, see (
)). As single-cell proteomics gains momentum, it is important to note the practical limitations encountered by its practitioners, particularly as they relate to the number of cells that can be analyzed within an experiment. Single-cell mRNA-sequencing experiments benefit from ligation-based barcoding (
). Therefore, the primary limitation in the proteomic profiling of a very large number of individual cells (>1000) is still instrument acquisition time. For this reason, many researchers face practical limitations in the type of experiment that can be designed to investigate proteomic phenotypes at the single-cell level.
Characterizing proteome dynamics over time is critical for understanding cellular differentiation, disease progression, and treatment response. In contrast to a two-state comparison, a time-course experiment collects measurements several times during a biological process. One fundamental question for this experimental design is how many time points need to be sampled in order to detect protein dynamics. Because of the practical limitations of single-cell proteomics, it is possible that the number of time points analyzed may not be sufficient to achieve statistical confidence in the dynamic trends. Whereas the t test commonly used in a two-state comparison is well behaved with a small number of samples in each state (e.g., 5–10), it is unknown how such a small sampling will impact the ability to detect trends in time-course data. A variety of tools have been created to help map expression dynamics or trajectories (
). However, these are often created within the assumptions of an experiment where the number of sampled cells across the time domain is potentially thousands. Herein, we present a method to facilitate estimating the number of cells needed in a temporal dynamics experiment by systematically exploring the impact of proteome variability and effect size through a large simulation similar to a power analysis.
All calculations and data used in this article can be found in our publicly available GitHub repository, https://github.com/PayneLab/SingleCellSampleSize. Below are listed specific scripts used to create figures and metrics cited in the article.
Calculating Accuracy and False Discovery
For the data presented in Figures 1 and 2, we simulated a large population of “cells,” each with a single protein abundance and time measurement, using the linear formula y = mx + b, instantiated as protein_abundance = slope ∗ time + 1 + ε. The error term, ε, is a random error that mimics the biological and/or technical variability in a measurement. This error is drawn from a zero-centered normal distribution with a specified standard deviation. For a single simulation, slope and standard deviation are chosen from slope s ∈ [0.5, 1, 2, 4] and standard deviation v ∈ [0, 0.25, 0.5, 0.75, 1]. We then created a population of 10,000 cells with random values for the time variable between 0 and 1. The full software for making simulated populations can be found in the GitHub repository in a file called simulate_data.R.
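The population simulation is simple enough to sketch in a few lines. The following Python version is illustrative only (the repository's actual implementation is the R script simulate_data.R, and the function name here is hypothetical); it draws time values uniformly on [0, 1] and applies the linear model above:

```python
import numpy as np

def simulate_population(slope, sd, n_cells=10_000, rng=None):
    """Simulate a population of cells, each a (time, abundance) pair.

    Implements the linear model from the text:
        protein_abundance = slope * time + 1 + eps,
    where eps ~ Normal(0, sd) mimics biological/technical variability.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    time = rng.uniform(0.0, 1.0, n_cells)   # time spans the unit interval
    eps = rng.normal(0.0, sd, n_cells)      # zero-centered measurement error
    abundance = slope * time + 1 + eps
    return time, abundance

time, abundance = simulate_population(slope=2, sd=0.5)
```

Each (slope, standard deviation) pair in the grids above would be passed through such a function to produce one simulated population.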
To determine the true-positive (TP) and false-positive (FP) rates, we sampled a specific number of cells from the population, n_sample ∈ [7, 16, 20, 30, 100]. With the protein abundance and time values for these subsampled cells, we used cellAlign (
) to interpolate a trajectory (supplemental Fig. S1). Next, we calculated the area between the interpolated trajectory and the population's true temporal trajectory, ABC_true. The null hypothesis (no change) was tested by calculating the area between the interpolated trajectory and a horizontal line at the mean protein abundance of the subsampled cells; this metric is called ABC_null. If ABC_true < ABC_null, then we assert that the subsampled data represent a changing protein; if ABC_true > ABC_null, then we assert that the subsampling represents a nonchanging protein. The accuracy reported in Figure 1 is the result of 1000 subsamples per population. We also repeated the entire simulation 10 times, shown as error bars in Figure 1. The full software for calculating accuracy and making the panels in Figure 1 can be found in our GitHub repository in the file called makeFigure1.R.
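The classification step can be sketched as follows. Note two simplifications in this hedged Python version: a least-squares line stands in for the cellAlign interpolation, and each area-between-curves metric is approximated as the mean absolute difference over a dense time grid; function and variable names are illustrative, not from the repository:

```python
import numpy as np

def classify_subsample(time, abundance, true_slope, n_sample, rng=None):
    """Classify one subsampling event: True = "changing", False = "not changing".

    A least-squares line through the subsampled cells stands in for the
    cellAlign-interpolated trajectory. abc_true is the distance to the
    population's true trajectory; abc_null is the distance to a flat line
    at the subsample's mean abundance.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(time), n_sample, replace=False)
    t_s, a_s = time[idx], abundance[idx]

    grid = np.linspace(0.0, 1.0, 200)
    m, b = np.polyfit(t_s, a_s, 1)                 # stand-in trajectory
    traj = m * grid + b
    abc_true = np.mean(np.abs(traj - (true_slope * grid + 1)))
    abc_null = np.mean(np.abs(traj - a_s.mean()))
    return abc_true < abc_null                     # closer to truth -> "changing"
```

Accuracy for a parameter set is then the fraction of many such subsampling events (1000 in the text) classified correctly.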
The process for calculating the false discovery was very similar to calculating accuracy, except that the correct answer was that there was no change, and the incorrect answer was that the interpolated trajectory matched a sloped line. We simulated a large population of cells using the same formula as before, with the caveat that this population always had a true slope of zero. To calculate false discovery, we again subsampled cells from the population and interpolated a trajectory using cellAlign. We then calculated an area-between-curves metric, ABC_true, which reflected the distance between the interpolated trajectory and the population's true trajectory (i.e., no change). We also calculated an ABC_falsepositive metric, which reflected the area between the interpolated trajectory and a sloped line. For the plots in Figure 2, the slope was either 0.5, 1, 2, or 4. The FP rate reported in Figure 2 is the result of 1000 subsamples per population. We repeated the entire simulation 10 times, shown as error bars in Figure 2. The full software for simulation, calculations, and plotting the panels in Figure 2 can be found in our GitHub repository in the file makeFigure2.R.
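A self-contained sketch of the false-positive calculation, with the same caveats as before (a fitted line stands in for cellAlign, and the helper name is illustrative):

```python
import numpy as np

def false_positive_rate(sd, test_slope, n_sample, n_trials=500, rng=None):
    """Estimate the FP rate for a truly unchanging protein (true slope = 0).

    Each trial subsamples a zero-slope population, fits a line as the
    stand-in trajectory, and counts a false positive when that trajectory
    is closer to a sloped line than to the flat, true trajectory.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    grid = np.linspace(0.0, 1.0, 200)
    fp = 0
    for _ in range(n_trials):
        t = rng.uniform(0.0, 1.0, n_sample)
        a = 1 + rng.normal(0.0, sd, n_sample)       # no temporal change
        m, b = np.polyfit(t, a, 1)
        traj = m * grid + b
        abc_true = np.mean(np.abs(traj - 1.0))                    # vs. flat truth
        abc_fp = np.mean(np.abs(traj - (test_slope * grid + 1)))  # vs. sloped line
        fp += abc_fp < abc_true
    return fp / n_trials
```

Even in this simplified sketch, shrinking the subsample inflates the estimated rate, mirroring the trend in Figure 2.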
Generalizing Slope and Variation
To generalize the TP and FP rates, we ran a separate simulation to recalculate these metrics using the slope/variation (S/V) ratio and the number of subsampled cells. The data in Figure 3 were generated using the same types of simulations as described previously for Figures 1 and 2, except that different slopes and variations were used to produce S/V values of 0.5, 1, 1.5, 2, 3, 4, and 6. The plotted data for each S/V ratio come from several simulations with different underlying values of S and V. For example, data for S/V = 0.5 were generated using the pairs 0.5/1.0, 0.75/1.5, 1.0/2.0, 1.5/3.0, 2.0/4.0, and 3.0/6.0. As illustrated in supplemental Figure S3, different S/V combinations for the same ratio have equivalent accuracy. This demonstrates that the S/V metric is robust to scale and will still be useful for methods that influence the ratio, such as tandem mass tag experiments, which result in signal compression. The other S/V values were calculated in the same manner. The full code used to simulate, calculate, and plot data in Figure 3 can be found in our GitHub repository in the files makeFigure3A.R and makeFigure3B.R.
) and uses two cell types, the murine epithelial cell line C10 and the murine endothelial cell line SVEC, to plot fold change between cell types against the within-group variation. The fold change for each protein is found by taking the absolute difference between the mean abundance for C10 cells and SVEC cells. The plotted variation is the standard deviation of C10 cells. The full code used to analyze, calculate, and plot data for supplemental Fig. S2 can be found in our GitHub repository in the file makeFigure2.R.
To characterize how well a subsample of measurements can approximate the true slope and variation of a larger population, we ran a simulation as described previously with true S/V = 1. From the large population of 10,000 cells, we subsampled a given number of cells, n_sample ∈ [7, 16, 20, 30, 100], and calculated S/Vest. The S/Vest is calculated by using linear regression to fit a line to the subsampled data. The slope of the fitted line is then used as our estimated slope. The estimated variation is the standard deviation of the residuals from the fitted line. We calculated S/Vest for 1000 independent subsamples of the population and computed the distribution of the difference S/Vtrue − S/Vest, which is plotted in Figure 4A. Figure 4B shows the number of proteins that would remain after using S/Vest = 1 as a cutoff. We simulated five populations with 1000 cells each, at S/Vtrue ∈ [0, 0.5, 1, 1.5, 2]. We subsampled 30 cells from a population, calculated S/Vest, and discarded the subsampling event if S/Vest < 1. The figure shows the percentage of sampling events that remained after filtering. The full code used to analyze, calculate, and plot data for Figure 4, A and B can be found in our GitHub repository in the files makeFigure4.R and simulate_data_figure4.R.
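The S/Vest computation reduces to a linear fit plus the standard deviation of the residuals. A minimal Python sketch (the repository's implementation is in R, and this function name is hypothetical):

```python
import numpy as np

def estimate_sv(t_sub, a_sub):
    """Estimate the slope/variation (S/V) ratio from a subsample of cells.

    The slope comes from a least-squares fit; the variation is the
    standard deviation of the residuals around the fitted line.
    """
    slope, intercept = np.polyfit(t_sub, a_sub, 1)
    residuals = a_sub - (slope * t_sub + intercept)
    variation = residuals.std(ddof=2)   # ddof=2 accounts for the two fitted parameters
    return slope / variation
```

Repeating this over many independent subsamples yields the error distribution S/Vtrue − S/Vest shown in Figure 4A.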
One of the challenges associated with mass spectrometry measurements is that they are inherently destructive. In order to measure the proteins in a cell, the cell itself is destroyed (e.g., lysed). Thus, for an experiment that measures change over time, the cells measured at time0 will be different from the cells measured at time1. This means that the change in protein abundance observed between time0 and time1 has at least two potential sources. Some of the change can be attributed to temporal dynamics. The second source of change between cells measured at time0 and time1 is attributed to cell-to-cell variability. We specifically emphasize that even for synchronized cells, or experimental designs where great effort has been taken to homogenize cellular state, there will be a real and observable variability in the proteome between individual single cells. Furthermore, this variability is often much greater than one might initially expect.
A second important element of a time-course experiment is how time is measured. There are numerous relevant applications of time in biology, including cellular differentiation, the transition from health to disease, or response to an external stimulus. In these various experimental systems, time can be absolute (2 PM) or relative (5 min after stimulus). Time can be an observable fact (date) or a value inferred from the data (relative time during the cell cycle as measured by the abundance of various markers). In studying disease progression, the time metric is more accurately a “pseudotime” that measures the approximate progression from a healthy to a diseased state, perhaps measured by visible morphological features. Depending on the specific scientific question, an experimental sampling of the time domain might be rigorously defined at very specific intervals or a random sampling. In this article and the statistical simulations described later, time is abstract. The variable representing time varies within the bounds from zero to one; zero represents the temporal beginning of an experiment and one represents the end.
While many experiments are able to collect samples at multiple, discrete, and specific time points, there are also many experiments where this is not possible. For example, consider cells sampled from the bone marrow with an experimental goal of understanding blood cell development. These collected cells exist in a wide variety of states, including hematopoietic stem cells, lymphoblasts, T cells, and all stages in between (along with all other developmental end points). An important goal of single-cell characterization is to understand the transitions between these states as cells mature. A similar experiment could be designed around disease progression, with a goal of understanding the transition between a healthy cell and a dysfunctional cell.
The experiments described previously view time as a continuum and are interested in the gradual transition of cells over time. This experimental design is fundamentally different from one that has control over time, such as a pharmacological dose/response test; if one has control over time, a t test is appropriate. This article focuses on experiments that cannot collect samples at discrete, prespecified time points. In this situation, grouping samples into early and late time points creates a structure suitable for a t test, but that grouping is inappropriate when one wants to understand the gradual transition across continuous time points. For these experiments, we explore alternative statistical metrics.
To understand how measurement variability, effect size, and the sample size affect our ability to detect temporal changes in proteome dynamics, we performed a large simulation across the relevant parameter space. Each simulated cell has one protein measurement and an associated time variable, calculated from two input parameters. First, the simulation uses a simple linear change as the shape of temporal dynamics, and in our model, this rate of change is called the slope. Second, we specify a measurement variability modeled as a normally distributed error term. Using slope, variability, and a time value between 0 and 1, we can calculate protein abundance (see Experimental procedures section). The simulation starts by creating a large number of cells that cover the entire time range; each cell is represented as an (abundance, time) data point. From the simulated population, we subsampled a small number of cells with the primary goal of determining whether this small sample accurately represents the larger population. The subsampled measurements are used to interpolate a temporal expression trajectory using cellAlign (
); this interpolated trajectory was then compared with the true population trajectory and a null model (supplemental Fig. S1). If the subsampled trajectory was closer to the true population trajectory, then we classified this subsampling event as correct; if the subsampled trajectory was a better fit to the null model, then this subsampling event was classified as incorrect.
We ran multiple simulations with different parameter values for slope and variability. In each simulation, we calculated the average accuracy (see Experimental Procedures section). Figure 1 shows the accuracy of detecting protein dynamics for various combinations of slope, measurement variation, and the number of sampled cells. Several expected trends emerge from the simulation. First, accuracy improves as more cells are sampled across the time domain. Regardless of the slope or measurement variability, increasing the number of cells improves the temporal expression trajectory interpolated from the subsampling. The smallest number that we sampled, seven cells, has poor accuracy for almost any (slope, variation) parameter set. A significant improvement is seen in increasing from 7 to 16 cells, followed by steady improvement as the number of cells increases to 20, 30, or beyond. Second, accuracy decreases with greater measurement variability. For example, in the simulations where slope = 1, a 16 cell sample has an average accuracy of 95% if the measurement variability is 0.25 but only 68% if the variability is 1.0.
To characterize the potential for incorrectly labeling a protein as changing, when it actually was unchanged in the time course, we simulated data where the slope parameter was zero. From this large population, we again subsampled, interpolated an expression trajectory using cellAlign, and compared the trajectory to the population's true trajectory (i.e., no change) or a simple sloped line. If the interpolated trajectory more closely matched a sloped line, despite being composed of data points in a population whose slope was zero, we counted this as an FP. As we discovered in simulations previously, the number of sampled cells and the measurement variability have a significant impact on the FP rate (Fig. 2). In several simulations with a small subsampling (e.g., seven cells), the FP rate approached 40%.
To help generalize the results of our simulation, and make them more immediately applicable to proteomics datasets, we grouped parameter sets by their S/V ratio. The simulations in Figures 1 and 2 report results that used a convenient numeric scale. However, proteomic datasets reported in the literature have a wide range of possible values, with some datasets reporting raw quantitative values in the millions and others using log-transformed and zero-centered data. By transforming our results into an S/V ratio, we directly test whether the observed TP and FP trends are scale free. Thus, regardless of how quantitative protein data are obtained or processed, the S/V ratio can provide applicable guidance. In addition, it does not matter whether the source of variability is technical or biological; we account for both in this combined metric.
For a wide range of S/V ratios, we ran simulations to generate the TP and FP rates (see Experimental procedures section). This explicit exploration of the relationship between expression change and variability revealed distinct trends in the ability to correctly identify protein dynamics. For example, in an experiment with 16 cells, an S/V = 2 would have an 80% TP rate (Fig. 3A), meaning that if 100 proteins had this trajectory slope and measurement variability, we expect to detect 80 of them (the other 20 go undetected). Under the same conditions, we also see a 15% FP rate. This means that of all the nonchanging proteins, 15% of them would be incorrectly identified as changing. As with the results presented in Figures 1 and 2, the TP and FP rates improve when a larger number of cells are sampled across the time course. If an experiment sampled 30 cells instead of 16 cells, the TP rate would improve from 80% to 90% for S/V = 2; coordinately, the FP rate would fall from 15% to 5%. These charts are essential in understanding how the accuracy of temporal trajectory detection depends on the number of cells analyzed across the time domain as well as on the trajectory's slope and the protein's inherent measurement variability.
To help frame these results, we sought to understand S/V values for proteins in real data. We examined a single-cell proteomics experiment that compared two different cell types (
), with a sufficient number of replicates to obtain a reliable estimate of within-group variability (n > 20). We note that this dataset does not demonstrate temporal dynamics but rather the magnitude of differences in protein abundance between biological states. This can still be used to approximate slope if time is scaled to a unit value. Fold change and within-group variability were calculated for each protein in this dataset (supplemental Fig. S2). Most proteins have a small fold change, and often the magnitude of fold change is similar to the magnitude of variability. Thus, relatively few proteins have an S/V ratio above one. As shown in Figure 3, proteins with an S/V below one will have a low TP and high FP rate unless a very large number of cells are used. Proteins with a more attractive TP and FP rate, such as S/V = 2, are only ~3% of proteins in the dataset. Proteins with S/V > 4 are exceedingly rare (<0.2%).
Without an Oracle
The analysis of real data does not benefit from knowing the true protein expression dynamics, as one does in a simulated dataset. Thus, when trying to apply the lessons learned in a simulation (e.g., Fig. 3) to a real-world time-course dataset, researchers must estimate the slope and variation of a protein in their data. We investigated how well we can estimate the slope and variation (S/Vest), using the same simulation methodology as described above. We calculated the precision of S/Vest for sample sizes of 7, 16, 20, 30, and 100 cells (Fig. 4A). As expected, the precision of this estimate improves when more cells are sampled. When we sample 100 cells, the S/Vest is typically very close to the real value; the standard deviation of the error is 0.35. If we sample 30 cells, the standard deviation of the error is 0.70. To demonstrate the effect of using this estimate, we simulated how many proteins of various S/V values would be removed from a dataset if S/Vest were used as the filtering criterion (Fig. 4B). If an experiment sampled 30 cells across a time course and used S/Vest = 1 as a cutoff, approximately 93% of high-quality proteins (S/V = 2) would be retained.
How Many Cells?
A proposed set of dynamic proteins discovered in a time-course experiment represents a mix of TP and FP identifications. Although the aforementioned statistical simulations may help estimate the relative rates of FPs, it is not possible to point out which specific identifications are potentially suspect. The best way to clarify this list and winnow out FPs is via replicate analyses. If the same experiment is independently replicated, the expected TP rate can be expressed as p^x, where p represents the per-replicate probability and x represents the number of independent replicates. For example, in a 30 cell time-course experiment with two replicates, we would expect the TP identification rate for proteins with S/V = 2 to be 0.9^2 = 0.81; the expected FP rate would be 0.05^2 = 0.0025. Predicting the TP or FP rates for a more complex experimental design, such as requiring n observations among k replicates, can be determined using standard statistical sampling methods. Using these expected rates and the relative number of proteins for various S/V values suggested from supplemental Fig. S2, scientists can appropriately plan for various experimental design scenarios.
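The replicate arithmetic is simple enough to check directly. The function below (an illustrative helper, not repository code) just raises a per-replicate rate to the number of replicates; the 0.9 and 0.05 inputs are the per-replicate rates quoted in the text for S/V = 2 with 30 cells:

```python
def replicate_rates(per_rep_rate, n_replicates):
    """Expected rate when a protein must be called in every replicate:
    for independent replicates, probabilities multiply, giving p ** x."""
    return per_rep_rate ** n_replicates

tp = replicate_rates(0.9, 2)    # expected TP rate across two replicates (~0.81)
fp = replicate_rates(0.05, 2)   # expected FP rate across two replicates (~0.0025)
```

The same one-liner applied with different rates and replicate counts supports the budget comparisons discussed below.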
A challenging part of experimental design is balancing competing priorities and appropriately budgeting a limited resource. As previously discussed, the primary limiting factor in a single-cell proteomics study is the total number of cells that can be analyzed. This number is currently much less than desired, which forces researchers to choose whether more cells should be devoted to a single time course or to additional replicate time courses. In this discussion, we use the term “replicate” to mean a repeated analysis of the time-course sampling.
As a simple illustrative example, imagine allocating a budget of 50 cells in a time-course experiment (Fig. 5). Option A involves two replicates with 25 cells each; option B allocates 16 cells to each of three replicates. With option A, researchers achieve a denser sampling of the time course and therefore better TP and FP rates per replicate. Option B has an additional replicate. Although the TP and FP rates are not as good in each individual time course in option B, leveraging three replicates drives the overall FP rate lower than in option A. We intentionally refrain from advocating option A or option B, because both financial and practical constraints will partially determine what is even possible. Rather, the purpose of this section is to demonstrate how to apply the FP and accuracy rates presented in Figure 3 to design an experiment. We note that the software used to calculate these metrics is completely open source (https://github.com/PayneLab/SingleCellSampleSize) and can be adapted to ascertain accuracy and FP rates for any experimental design.
Single-cell proteomics is an emerging technology that promises to help clarify the diversity of cellular phenotypes as well as reveal essential trends in proteome dynamics. The current throughput of single-cell proteomics is significantly lower than single-cell sequencing technologies. Therefore, until there is a dramatic change in proteomics instrumentation and associated technologies, the proteomics community will struggle with the necessity of analyzing fewer cells than they would like. Within this context, it is essential to properly plan experiments on a limited number of cells to maximize the likelihood of success. Statistical power calculations exist for two-state comparison experiments that rely on the t test. However, temporal trajectory experiments lack an experimental planning tool to help estimate the accuracy of different designs. Here, we have simulated the accuracy of detecting a temporal change in protein abundance against the null hypothesis of no change. Simulations explored a variety of metrics, such as the magnitude of temporal change (slope), cell-to-cell heterogeneity (variability), and the number of cells analyzed across the time domain. The simulations highlight the need to analyze a sufficient number of cells across a time course; a higher number of cells across a time course always leads to more favorable TP and FP rates.
All projects work within the bounds of a budget, and herein, we discuss a budget as the number of cells that can be analyzed. Choosing a specific experimental design is a balancing act and may require compromises between the desire to have enough cells to accurately approximate the temporal expression trajectory and the practical limitations of the budget. In the broad array of experiments conducted in biomedical and environmental sciences, investigators will have to make this difficult choice. In a clinical experiment monitoring drug response, patient demographics may be particularly compelling and demand more patients from diverse backgrounds. With a fixed budget of cells, the choice to analyze more patients reduces the number of cells analyzed for each patient. Alternatively, the biological sample itself may be a limiting factor. For rare cell types or highly degraded/diseased samples, there may be a finite number of cells that are available to be analyzed, regardless of budget. The ultimate purpose of this article is to guide readers on how best to set up experiments. Because of the aforementioned experimental constraints, we refrain from advocating a specific design and instead strongly recommend that researchers utilize the data presented herein as a guide or simulate expected TP/FP rates for their own proposed experimental designs.
Our statistical simulations were motivated by their utility in the emerging field of single-cell proteomics. However, the results are applicable to any analysis attempting to identify dynamic trajectories, such as longitudinal studies of an individual person over time (
). For these and similar studies, it is important to understand the variability of a measurement. We show that most proteins have a within-group variability equivalent to or greater than the observed fold change between conditions. If variability to this degree is present in the data, it will be very challenging to confidently detect temporal dynamics with a limited number of measurements across time. Even for studies that utilize rigorous clinical assays with defined technical variability, biological variability must be anticipated and characterized.
Finally, we note that the simulations herein model only simple linear increases and not more complex expression patterns. A common biological experiment measures the response to an external stimulus, often reporting a temporary change followed by a return to the original state (
This work was funded through a sponsored research agreement from Biogen, Inc.
H. B. methodology, software, validation, formal analysis, writing - original draft, writing - review and editing, and visualization; A. J. G. conceptualization, validation, and writing - review and editing; E. D. P. writing - review and editing, supervision, and funding acquisition; R. T. K. writing - review and editing, supervision, and funding acquisition; and S. H. P. conceptualization, methodology, validation, formal analysis, writing - original draft, writing - review and editing, visualization, and project administration.