# Statistical Power Analysis for Designing Bulk, Single-Cell, and Spatial Transcriptomics Experiments: Review, Tutorial, and Perspectives

## Abstract


## 1. Introduction

## 2. Power Analysis for Bulk RNA-Seq Experiments

#### 2.1. Bulk RNA-Seq Experiment

#### 2.2. Bulk RNA-Seq Power Analysis Tools

#### 2.3. Bulk RNA-Seq Power Analysis Tool Recommendation

## 3. Power Analysis for Single-Cell RNA-Seq (scRNA-Seq) Experiments

#### 3.1. Power Analysis for Cell Subpopulation Detection

#### 3.1.1. Ascertaining Cell Subpopulation Proportions in a Single Tissue

#### 3.1.2. Ascertaining Differential Cell Subpopulation Proportions between Distinct Experimental Conditions

#### 3.2. Power Analysis for DEG Detection

#### 3.2.1. DEGs across Different Conditions for a Cell Type

#### 3.2.2. DEGs across Different Cell Types

#### 3.3. scRNA-Seq Power Analysis Tool Recommendations

## 4. Power Analysis for Spatial Transcriptomics Experiments

#### 4.1. Introduction to High-Throughput Spatial Transcriptomics (HST) Technology

#### 4.2. Literature Review of Power Analysis for HST Data

## 5. Conclusions

## Supplementary Materials

**Figure 1.** Comparison of bulk RNA-seq, single-cell RNA-seq, and high-throughput spatial transcriptomics technologies in terms of profiling resolution (level), data structure, and target discoveries.

**Figure 2.** Three representative research questions in the analysis of HST data. SVG detection identifies spatially variable genes (SVGs), i.e., genes whose expression shows a spatial pattern. Tissue architecture analysis identifies a tissue's structure by clustering similar gene expression patterns. Cell-cell communication analysis detects interactions between cells using their spatial locations and gene expression data.

**Figure 3.** Depending on the technology, HST data can be treated as either marked point process data or areal data. Imaging-based HST data can be regarded as marked point process data: cell locations are analogous to the spatial coordinates of bird habitats in the US, and spatial information is modeled through the distances between locations. Sequencing-based HST data, by contrast, can be regarded as areal data on a regular grid: each spot, which aggregates a group of cells, is analogous to a state in which bird habitats are aggregated, and spatial information is modeled through the adjacency (neighborhood) structure.
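To make the two encodings of spatial information in Figure 3 concrete, the Python sketch below builds a distance-based weight matrix for point-pattern (imaging-based) data and an adjacency-based weight matrix for grid (sequencing-based) data, then plugs each into Moran's I, a standard spatial autocorrelation statistic often used to screen for spatially variable genes. Moran's I, the Gaussian kernel and its `bandwidth`, rook (edge-sharing) adjacency, and all function names are illustrative assumptions of this sketch, not methods prescribed by this review.

```python
import math
import random


def distance_weights(coords, bandwidth):
    """Point-pattern (imaging-based) weights from pairwise distances.

    Assumed Gaussian kernel: w_ij = exp(-d_ij^2 / (2 * bandwidth^2)), with w_ii = 0.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = (coords[i][0] - coords[j][0]) ** 2 + (coords[i][1] - coords[j][1]) ** 2
                w[i][j] = math.exp(-d2 / (2 * bandwidth ** 2))
    return w


def grid_adjacency_weights(n_rows, n_cols):
    """Areal (sequencing-based) weights on a regular grid of spots.

    Assumed rook adjacency: w_ij = 1 if spots i and j share an edge, else 0.
    Spots are indexed row by row: index = row * n_cols + col.
    """
    n = n_rows * n_cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[r * n_cols + c][rr * n_cols + cc] = 1.0
    return w


def morans_i(x, w):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)


# Sequencing-based: 10 x 10 grid whose expression increases from left to right.
grid = [float(c) for r in range(10) for c in range(10)]
print(morans_i(grid, grid_adjacency_weights(10, 10)))  # ≈ 0.889: strong positive autocorrelation

# Imaging-based: 100 random cell centroids whose expression tracks the x coordinate.
rng = random.Random(0)
cells = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(100)]
print(morans_i([x for x, _ in cells], distance_weights(cells, bandwidth=10.0)))  # positive
```

The distance-based weights depend on a bandwidth that has to be chosen, whereas grid adjacency is fixed by the spot layout; this choice can change which spatial patterns a test is sensitive to.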

**Figure 4.** Key experimental factors in designing HST experiments include (1) the choice of tissue area, (2) the number and sizes of fields of view (FoVs), and (3) the number of cells and spots. These factors affect the statistical power available to address the research goals, e.g., those shown in Figure 2. For example, the choice of tissue area, together with the number and sizes of FoVs, can determine the degree to which the biological features of interest (e.g., specific cell subpopulations or cell-cell communications) are captured in the generated HST data. Likewise, the number of cells and spots can affect the signal-to-noise ratios (effect sizes) of the generated HST data.
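As an idealized illustration of how the number of cells governs whether a subpopulation of interest is captured at all, the sketch below computes the probability that at least k cells of a subpopulation with frequency p are observed among N profiled cells, and searches for the smallest N reaching a target probability. It assumes that cells (pooled over all FoVs) are independent draws with a fixed frequency p, which ignores the spatial clustering that makes tissue area and FoV placement matter; the function names are hypothetical.

```python
from math import comb


def prob_at_least_k(n_cells, p, k):
    """P(X >= k) for X ~ Binomial(n_cells, p).

    Assumes each of n_cells profiled cells independently belongs to the
    subpopulation with probability p (spatial clustering is ignored).
    """
    return 1.0 - sum(comb(n_cells, i) * p**i * (1 - p) ** (n_cells - i) for i in range(k))


def min_cells(p, k, target=0.95, max_cells=1_000_000):
    """Smallest number of cells with P(at least k subpopulation cells) >= target.

    Linear search is valid because P(X >= k) is nondecreasing in n_cells.
    """
    for n in range(k, max_cells + 1):
        if prob_at_least_k(n, p, k) >= target:
            return n
    raise ValueError("target probability not reached within max_cells")


# A subpopulation making up 1% of cells:
print(min_cells(p=0.01, k=1))   # → 299 cells to see at least one with 95% probability
print(min_cells(p=0.01, k=10))  # many more cells to see at least ten
```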

**Table 1.** Six software tools for statistical power analysis of bulk RNA-seq experiments, listed with their citations and implementation environments.

Entries give Tool Name [Citation] (Implementation).

Error Control | Model | Pilot Data | Pilot Data with Stored Data
---|---|---|---
Type I Error | Poisson Log-normal | - | ‘Scotty’ [33] (Web Interface)
Type I Error | Negative Binomial | ‘RNASeqPower’ [19] (R package) | -
FDR | Negative Binomial | ‘ssizeRNA’ [31] (R package) | ‘RnaSeqSampleSize’ [34] (R package)
FDR | Negative Binomial | ‘RNASeqPowerCalculator’ [35] (R package) | ‘PROPER’ [32] (R package)
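For intuition on the analytical negative binomial entries in Table 1, the sketch below computes the number of samples per group for a two-group comparison using a normal approximation to the test of a log fold change, with the variance of a log count approximated as 1/μ + CV² (μ: expected read count of the gene; CV: biological coefficient of variation). To mimic FDR control, it converts a target FDR into a per-gene significance level via the approximation FDR ≈ m0·α / (m0·α + m1·power), where m0 and m1 are the numbers of non-DE and DE genes. Both formulas are textbook approximations chosen for illustration; they are not claimed to be the exact algorithms of ‘RNASeqPower’, ‘ssizeRNA’, or the other listed tools, and the parameter names are our own.

```python
from math import ceil, log
from statistics import NormalDist


def per_gene_alpha(fdr, n_genes, n_de, power):
    """Per-gene significance level alpha whose expected FDR is about `fdr`.

    Solves fdr = m0 * alpha / (m0 * alpha + n_de * power) for alpha,
    where m0 = n_genes - n_de is the number of non-DE genes.
    """
    m0 = n_genes - n_de
    return fdr * n_de * power / (m0 * (1 - fdr))


def nb_sample_size(fold_change, depth, cv, alpha=0.05, power=0.8):
    """Samples per group to detect `fold_change` with a two-sided test at `alpha`.

    Normal approximation: Var(log count) ~= 1/depth + cv^2, i.e. Poisson
    sequencing noise plus negative binomial biological variation.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * z**2 * (1 / depth + cv**2) / log(fold_change) ** 2)


# Type I error control for one gene: 2-fold change, mean count 20, biological CV 0.4.
print(nb_sample_size(fold_change=2, depth=20, cv=0.4))  # → 7 per group

# FDR control: 10,000 genes, 100 truly DE, target FDR 5% at 80% power.
a = per_gene_alpha(fdr=0.05, n_genes=10_000, n_de=100, power=0.8)
print(a, nb_sample_size(fold_change=2, depth=20, cv=0.4, alpha=a))
```

Larger biological variation (`cv`) or the more stringent per-gene level implied by FDR control raises the required sample size, while extra depth only shrinks the 1/μ term.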

**Table 2.** Software tools for scRNA-seq power analysis, grouped by two detection targets (cell subpopulations and DEGs). Experimental factors: (1) number of cells, (2) number of individuals, (3) sequencing depth.

Detection Target | # of Samples | Tool Name | Experimental Factor | Software | Model | Power Assessment
---|---|---|---|---|---|---
Cell subpopulation | Single sample | ‘SCOPIT’ [37] | (1) | R package & Web application | Multinomial | Analytical
Cell subpopulation | Single sample | ‘howmanycells’ | (1) | Web application | Negative Binomial | Analytical
Cell subpopulation | Multiple samples | ‘Sensei’ [38] | (1), (2) | Web application | Beta Binomial | Analytical
Cell subpopulation | Multiple samples | ‘scPOST’ [39] | (1), (2) | R package | Linear mixed model | Simulation-based
DEG | Multiple samples | ‘scPower’ [40] | (1), (2), (3) | R package & Web server | Negative Binomial | Pseudobulk
DEG | Multiple samples | ‘hierarchicell’ [41] | (1), (2), (3) | R package | Negative Binomial | Simulation-based
DEG | Single sample | ‘powsimR’ [42] | (1) | R package | Negative Binomial | Simulation-based
DEG | Single sample | ‘POWSC’ [43] | (1), (3) | R package | A mixture of zero-inflated Poisson and log-normal Poisson distributions | Simulation-based
DEG | Single sample | ‘scDesign’ [44] | (1), (3) | R package | Gamma-Normal mixture model | Simulation-based
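The multi-sample cell subpopulation rows of Table 2 rest on modeling variation between individuals, for example with a beta-binomial model. The following simulation-based sketch is in that spirit but is not a reimplementation of ‘Sensei’ or ‘scPOST’: each individual's subpopulation proportion is drawn from a beta distribution, cell counts are drawn binomially, and each simulated experiment is tested with an exact permutation test on mean proportions. The `concentration` parameter, the permutation test, and all function names are assumptions of this sketch.

```python
import random
from itertools import combinations


def simulate_proportions(rng, n_individuals, n_cells, mean_prop, concentration):
    """Observed subpopulation proportions under an assumed beta-binomial model.

    Each individual's true proportion ~ Beta(mean_prop * c, (1 - mean_prop) * c) with
    c = concentration (larger c means less between-individual variation); the number of
    subpopulation cells among n_cells profiled cells is then Binomial(n_cells, proportion).
    """
    observed = []
    for _ in range(n_individuals):
        p = rng.betavariate(mean_prop * concentration, (1 - mean_prop) * concentration)
        count = sum(rng.random() < p for _ in range(n_cells))  # binomial draw
        observed.append(count / n_cells)
    return observed


def permutation_pvalue(a, b):
    """Exact two-sided permutation p-value for the difference in mean proportions."""
    pooled = a + b
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    extreme = n_perm = 0
    for idx in combinations(range(len(pooled)), n_a):  # every relabeling of individuals
        s_a = sum(pooled[i] for i in idx)
        diff = abs(s_a / n_a - (total - s_a) / n_b)
        extreme += diff >= observed - 1e-12  # tolerance guards against float round-off
        n_perm += 1
    return extreme / n_perm


def simulated_power(n_per_group, n_cells, p1, p2, concentration=100.0,
                    alpha=0.05, reps=200, seed=1):
    """Fraction of simulated two-condition experiments in which the test rejects at alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = simulate_proportions(rng, n_per_group, n_cells, p1, concentration)
        b = simulate_proportions(rng, n_per_group, n_cells, p2, concentration)
        rejections += permutation_pvalue(a, b) <= alpha
    return rejections / reps


# Hypothetical design: 5 individuals per condition, 500 cells each.
print(simulated_power(n_per_group=5, n_cells=500, p1=0.05, p2=0.05))  # empirical Type I error
print(simulated_power(n_per_group=5, n_cells=500, p1=0.05, p2=0.15))  # power for 5% vs 15%
```

Under this model, once `n_cells` is large the binomial sampling noise is small relative to the between-individual (beta) variation, so adding individuals (factor 2) typically raises power more than adding cells per individual (factor 1).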

