Sample Size Calculation and Power Analysis for the General Mediation Analysis Method

Rizvi, Nubaira; Bam, Amjila; Cao, Wentao; Yu, Qingzhao

doi:10.3390/stats9010019


¹ Department of Biostatistics & Data Science, School of Public Health, LSU-Health New Orleans, New Orleans, LA 70112, USA

² Pennington Biomedical Research Center, Louisiana State University, Baton Rouge, LA 70808, USA

* Author to whom correspondence should be addressed.

Stats 2026, 9(1), 19; https://doi.org/10.3390/stats9010019

Submission received: 23 December 2025 / Revised: 5 February 2026 / Accepted: 12 February 2026 / Published: 14 February 2026


Abstract

Mediation analysis is a widely used statistical technique for identifying the mechanisms underlying the relationship between an exposure and an outcome. However, accurate power analysis and sample size determination are challenging for mediation models that involve non-normal distributions or mixtures of continuous and binary variables. We propose a computationally efficient simulation-based approach for general mediation analysis. By applying monotone smoothing splines to empirical critical values derived from extensive simulations, our method enables accurate power calculations without the need for real-time simulation. We validated the method across varying scenarios, including continuous and binary variables and time-to-event outcomes, with strict Type I error control. Large effects (0.35) yielded >80% power at minimal sample sizes (n = 25–50) across all settings, while small effects (0.02) required larger samples. Continuous models achieved 80% power for small effects at n = 410, whereas fully binary models required n > 500. For medium effects (0.15), power exceeded 0.80 at n = 75 with binary mediators. This study presents a robust framework that combines the flexibility of simulation-based inference with the speed of analytical approximations. We provide an accompanying R package to facilitate efficient sample size planning for mediation models.

Keywords:

mediation analysis; power analysis; sample size calculation

1. Introduction

Mediation analysis (MA) is a popular technique for explaining the relationship between an explanatory (exposure) variable and a response (outcome) variable. In mediation analysis, we assume that one or more intermediate variables (mediators) transmit the effect of the exposure on the response. MA is used to decompose the total effect of the exposure on the outcome and to estimate the indirect effect through each mediator. Identifying these indirect effects is important because it helps explain how an exposure produces a change in a response, which in turn supports the design of targeted and effective interventions to improve the outcome. Accurate identification of significant mediators depends critically on an adequate sample size, which provides sufficient statistical power to detect and estimate substantial indirect effects. Inadequate sample sizes increase the risk of Type II errors, so that important mediators may go undetected. By conducting sample size calculations and power analyses in advance, researchers can allocate limited resources efficiently while still identifying mediators and estimating their effects accurately.

In the most widely used approaches to mediation analysis, such as the causal steps method of Baron and Kenny (1986), mediation effects are tested by fitting generalized linear models to the exposure, the mediators, and the outcome [1]. Several reviews of the mediation analysis process are also available [2,3,4,5,6]. These traditional approaches assume linear associations between the expected outcomes and the predictors through link functions. However, they struggle when the data are hierarchical (e.g., patients within clinics) or when mediators and outcomes include a mix of continuous, categorical, and survival variables. They are also not easily extended to nonparametric predictive models. The general multiple mediation analysis method (mma) proposed by Yu et al. (2017) addresses these gaps by allowing for multiple mediators of different types, complex multilevel or longitudinal data structures, and nonlinear or nonparametric relationships [7]. However, because of the generality of the mma method, the required sample size can vary widely depending on the design and analysis assumptions. In this paper, we propose a novel simulation-based method for power analysis and sample size calculation based on the mma method of Yu et al.

Several approaches have been considered for determining optimal sample sizes for mediation models. The causal steps method was historically a common choice for sample size calculations. However, research indicates that it lacks the statistical power to reliably detect mediation and recommends testing the indirect effect directly instead [8,9]. Fritz and MacKinnon (2007) systematically evaluated various methods for testing mediation and compared their power under different conditions, highlighting the need for sufficiently large samples to detect indirect effects reliably [10]. Most current practices for assessing power and sample size in mediation models rely on Monte Carlo simulation [11,12,13,14]. Zhang (2014), for example, used such simulations to estimate the power to detect indirect effects with the bootstrap method [15]. However, these methods are often computationally expensive, resulting in lengthy processing times. Other researchers, such as Miočević and MacKinnon (2017), used Bayesian frameworks for mediation analysis, offering a probabilistic interpretation of mediation effects and sample size estimation [16]. In addition, Pan (2018) used simulations to estimate the sample sizes needed to achieve 80% power under varying mediation effect sizes, within-subject correlations (ICC), and numbers of repeated measures [17]. Sim (2022) calculated sample sizes for confirmatory factor analysis (CFA) models in mediation analysis [18]. Despite these advances, existing methods are often tailored to specific model structures and do not easily extend to more flexible mediation frameworks, whereas the mma method accommodates all of these complexities. Furthermore, the Sobel method (1982) for testing indirect effects assumes that the indirect effect is normally distributed [19], but studies have shown that its sampling distribution can deviate from normality, especially in small samples [20]. Therefore, a more general and computationally efficient strategy is needed to provide accurate power and sample size estimates.

In this paper, we propose methods for statistical power analysis under three distinct null-hypothesis scenarios in mediation analysis. In the first, the exposure does not affect the mediator. In the second, the mediator does not affect the outcome after controlling for the exposure. In the third, there is neither an exposure–mediator association nor a mediator–outcome association, so the indirect effect of the exposure on the outcome is absent. These scenarios represent the core pathways in mediation analysis, and assessing power under each provides a comprehensive evaluation of the conditions under which the mediation effect can be detected. To estimate statistical power, our method simulates data under user-specified parameters, derives the sampling distribution of the indirect effect, and compares it with quantile-based critical values from precomputed null distributions. By repeating this procedure across increasing sample sizes, the approach identifies the minimum sample size required to achieve a desired level of power. This framework provides an efficient and flexible strategy for planning mediation studies. Furthermore, by integrating simulation with machine learning-based quantile estimation, our approach reduces the computational burden and improves accessibility. An R package, mmaPower, developed under R 4.4.3, implements the sample size and power calculations for mediation analysis.

2. Methodology

2.1. Overview of the General Mediation Analysis Method

In a mediation model, three key effects are specified. Figure 1 illustrates a simple mediation model, where c represents the direct association between the exposure X and the outcome Y that is not transmitted through the mediator M, a denotes the effect of X on M, and b represents the effect of M on Y while controlling for X. The indirect effect of X on Y through M is a function of a and b. In linear model settings, a, b, and c are the regression coefficients of the predictors, the direct effect is c, and the indirect effect through M is the product a × b.
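For the linear case just described, the product-of-coefficients indirect effect can be illustrated with a short simulation. The sketch below is not part of mma or mmaPower; it assumes a fully continuous model with standard normal exposure and unit-variance normal errors, estimates a by regressing M on X and b (with c) by regressing Y on X and M, and reports a × b. The helper names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def ols_two_predictors(x1, x2, y):
    """Coefficients of y on x1 and x2 (with intercept) from the centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, cy))
    s2y = sum(v * w for v, w in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2 x 2 system
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def product_of_coefficients(a, b, c, n, seed=1):
    """Simulate X -> M -> Y with N(0, 1) errors; return (a_hat, b_hat, c_hat, a_hat * b_hat)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = [a * xi + rng.gauss(0.0, 1.0) for xi in x]
    y = [c * xi + b * mi + rng.gauss(0.0, 1.0) for xi, mi in zip(x, m)]
    a_hat = ols_slope(x, m)                      # path a: M regressed on X
    c_hat, b_hat = ols_two_predictors(x, m, y)   # paths c and b: Y regressed on X and M
    return a_hat, b_hat, c_hat, a_hat * b_hat
```

With a = b = 0.5 and a large n (say 20,000), the estimated product should lie close to 0.25; the general framework described next replaces this product with a definition that does not depend on linearity.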

The general mediation analysis involves fitting predictive models and simulating the outcome by resampling the mediator, either marginally or conditionally on the exposure. Unlike linear approaches, which define the indirect effect as a product of coefficients, the general framework defines the indirect effect (IE) as the expected change in the outcome attributable solely to the shift in the mediator's distribution induced by the exposure, holding the direct exposure pathway constant. We used the method detailed by Yu et al. to decompose the total effect (TE) into the indirect effect (IE) and the direct effect (DE) [21]. This decomposition relies on the four assumptions listed in Table 1.

Based on these assumptions, the average total effect (ATE) is defined as the average change rate of the outcome, Y, given other explanatory variables, Z, with respect to the treatment or exposure variable, X = $x^{*}$. That is,

{TE}_{|Z}(x^{*}) = \lim_{u \to u^{*}} \frac{E(Y \mid Z, X = x^{*} + u) - E(Y \mid Z, X = x^{*})}{u}

and

{ATE}_{|Z} = E_{x^{*}}\left[{TE}_{|Z}(x^{*})\right],

where $u^{*}$ is the smallest positive unit that exists within the domain of the exposure variable, X [7].

The average direct effect (ADE), not from $M_{i}$, is calculated as the ATE, but with $M_{i}$ fixed at its marginal distribution (i.e., not varying with the exposure variable). Then,

{DE}_{\backslash M_{i} | Z}(x^{*}) = E_{M_{i}}\left[\lim_{u \to u^{*}} \frac{E\left(Y(x^{*} + u, M_{i}, M_{-i}(x^{*} + u))\right) - E\left(Y(x^{*}, M_{i}, M_{-i}(x^{*}))\right)}{u} \,\middle|\, Z\right]

and

{ADE}_{\backslash M_{i} | Z} = E_{x^{*}}\left[{DE}_{\backslash M_{i} | Z}(x^{*})\right],

where $M_{-i}$ represents the vector of remaining intermediate variables in M without $M_{i}$. Finally, the average indirect effect (AIE) through $M_{i}$ is calculated as the difference between the average TE and the average TE not from $M_{i}$. That is,

{AIE}_{M_{i} | Z} = {ATE}_{| Z} - {ADE}_{\backslash M_{i} | Z}.
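To make the ATE, ADE, and AIE definitions concrete, the sketch below evaluates them by Monte Carlo for an assumed, already-fitted linear model, E(Y | X, M) = cX + bM with M = aX + e and no covariates Z. It is a simplified illustration rather than the mma implementation: the exposure is standard normal, u is fixed at 1 (exact here because the model is linear), the TE draws let M follow the exposure shift, and the DE draws take M from its marginal distribution. In this linear setting the AIE should reduce to a × b.

```python
import random


def outcome_mean(x, m, b, c):
    """Hypothetical fitted outcome model: E(Y | X = x, M = m) = c*x + b*m."""
    return c * x + b * m


def ate_ade_aie(a, b, c, sigma_m=1.0, u=1.0, n_draws=5000, seed=7):
    """Monte Carlo ATE, ADE (not through M), and AIE = ATE - ADE for M = a*X + e, X ~ N(0, 1)."""
    rng = random.Random(seed)
    te_sum = 0.0
    de_sum = 0.0
    for _ in range(n_draws):
        x_star = rng.gauss(0.0, 1.0)      # x* drawn from the exposure distribution
        e = rng.gauss(0.0, sigma_m)       # mediator noise shared by x* and x* + u
        # Total effect: the mediator moves with the exposure shift.
        m_low = a * x_star + e
        m_high = a * (x_star + u) + e
        te_sum += (outcome_mean(x_star + u, m_high, b, c)
                   - outcome_mean(x_star, m_low, b, c)) / u
        # Direct effect not from M: M is drawn from its marginal distribution
        # and held fixed while the exposure is shifted.
        m_marginal = a * rng.gauss(0.0, 1.0) + rng.gauss(0.0, sigma_m)
        de_sum += (outcome_mean(x_star + u, m_marginal, b, c)
                   - outcome_mean(x_star, m_marginal, b, c)) / u
    ate = te_sum / n_draws
    ade = de_sum / n_draws
    return ate, ade, ate - ade
```

For a = 0.5, b = 0.4, and c = 0.3, the function returns an ATE of 0.5, an ADE of 0.3, and an AIE of 0.2 up to floating-point error. With a nonlinear outcome model the same recipe applies, but the AIE no longer equals a simple product.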

A key advantage of this method is that it applies to all types of exposures, outcomes, and mediators, regardless of their measurement scales. The relationships among these variables can be flexibly modeled using a range of predictive approaches, including generalized linear models, nonparametric methods, and survival models. The method can also further decompose indirect effects if necessary, distinguishing the effect carried by an individual mediator from the effect contributed jointly by multiple mediators.

For example, in clinical research, one might investigate whether a new immunotherapy (binary exposure) prolongs patient survival (time-to-event outcome) by reducing tumor burden (continuous mediator). Similarly, in public health, researchers might examine whether a housing improvement program (binary exposure) reduces the risk of an asthma diagnosis (binary outcome) by improving indoor air quality (continuous mediator). The general mediation method can be used to test the indirect effect and estimate the corresponding effect size for each of these combinations of variable types. Such studies must also determine the sample size, and the corresponding power, needed to detect the indirect effect of the mediator. This study proposes an algorithm to calculate these quantities under the general mediation analysis method.

2.2. Power Analysis in General Mediation Analysis

2.2.1. Critical Values

In simulation-based mediation power analysis, the decision whether to reject the null hypothesis is based on comparing standardized indirect effect estimates with critical values derived under the null hypothesis. The null hypothesis is that the indirect effect, θ, is zero; formally, H₀: θ = 0. In the proposed power analysis, we considered three ways in which the indirect effect can be zero through its different components. The scenarios under the null hypothesis are:

No association between the exposure and the mediator; i.e., a = 0.
No association between the mediator and the outcome; i.e., b = 0.
No association between the exposure and the mediator, nor between the mediator and the outcome; i.e., a = 0 and b = 0.

Critical values define the rejection region for statistical significance and depend on the specific alternative hypothesis being tested (e.g., one-sided or two-sided). From the distribution of the indirect effect under the null hypothesis, we identify the critical values that ensure the type I error remains at the pre-set level.
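Extracting critical values from a sample of null indirect-effect estimates amounts to reading off empirical quantiles. The following sketch assumes the null estimates are already available as a list; the quantile rule (linear interpolation between order statistics) and the labels "less", "greater", and "two.sided" are illustrative choices, not the mmaPower interface.

```python
def empirical_quantile(sample, p):
    """Quantile at level p by linear interpolation between order statistics."""
    xs = sorted(sample)
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def critical_values(null_sample, alpha=0.05, alternative="two.sided"):
    """Return (lower, upper) critical values; None marks the side that is not used."""
    if alternative == "less":          # H1: theta < 0
        return empirical_quantile(null_sample, alpha), None
    if alternative == "greater":       # H1: theta > 0
        return None, empirical_quantile(null_sample, 1 - alpha)
    if alternative == "two.sided":     # H1: theta != 0
        return (empirical_quantile(null_sample, alpha / 2),
                empirical_quantile(null_sample, 1 - alpha / 2))
    raise ValueError("alternative must be 'less', 'greater', or 'two.sided'")
```

For a null sample of 0, 1, ..., 100 and α = 0.05, the two-sided critical values are 2.5 and 97.5.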

2.2.2. The Algorithm to Find Critical Values

For a prespecified set of parameter values, $\Psi_{k}$, $k = 1, \dots, K$ (see Remark 1):

(1) For the ith scenario under the null hypothesis listed above, $i = 1, 2, 3$, adjust $\Psi_{k}$ to $\Psi_{ki}$.
(a) Using the parameter setting $\Psi_{ki}$, generate $S$ datasets $D_{i}^{(s,k)}$, where $s = 1, \dots, S$ (see Remark 2).
(b) For each simulated dataset $D_{i}^{(s,k)}$, use the mma function in the mma package [7] to obtain $B$ bootstrap estimates of the indirect effect of the mediator, denoted $\hat{\theta}_{0i}^{(s,b,k)}$, where $b = 1, \dots, B$ (see Remark 3).
(c) Step (b) produces $S \times B$ estimates for the kth parameter configuration, $\Psi_{k}$. Aggregate the $\{\hat{\theta}_{0i}^{(s,b,k)}\}$ to form a sample from the distribution of the indirect effect estimates under the ith null scenario, $\{\hat{\theta}_{0i}^{(s,b,k)}\} \sim F_{0}(\theta \mid H_{0})$. From this sample, extract the quantiles at levels $\alpha$, $1 - \alpha$, $\frac{\alpha}{2}$, and $1 - \frac{\alpha}{2}$, denoted $Q_{ik\alpha}$, $Q_{ik(1-\alpha)}$, $Q_{ik\frac{\alpha}{2}}$, and $Q_{ik(1-\frac{\alpha}{2})}$; the first is used for the alternative hypothesis $\theta < 0$, the second for $\theta > 0$, and the last two for $\theta \neq 0$.
(2) To control the Type I error, take $Q_{k\alpha} = \min(Q_{1k\alpha}, Q_{2k\alpha}, Q_{3k\alpha})$ and $Q_{k(1-\alpha)} = \max(Q_{1k(1-\alpha)}, Q_{2k(1-\alpha)}, Q_{3k(1-\alpha)})$ as the lower and upper critical values for one-sided tests with alternative hypotheses $\theta < 0$ and $\theta > 0$, respectively, under the parameter configuration $\Psi_{k}$. Similarly, for two-sided tests, $Q_{k\frac{\alpha}{2}} = \min(Q_{1k\frac{\alpha}{2}}, Q_{2k\frac{\alpha}{2}}, Q_{3k\frac{\alpha}{2}})$ and $Q_{k(1-\frac{\alpha}{2})} = \max(Q_{1k(1-\frac{\alpha}{2})}, Q_{2k(1-\frac{\alpha}{2})}, Q_{3k(1-\frac{\alpha}{2})})$ are the lower and upper critical values for the alternative hypothesis $\theta \neq 0$.
(3) Fit four separate smoothing spline functions with shape restrictions (see Remark 4), denoted $SS_{\alpha}(\Psi_{k})$, $SS_{(1-\alpha)}(\Psi_{k})$, $SS_{\frac{\alpha}{2}}(\Psi_{k})$, and $SS_{(1-\frac{\alpha}{2})}(\Psi_{k})$. The responses for these models are the lower and upper critical values, $Q_{\alpha}$ and $Q_{(1-\alpha)}$ for one-sided tests and $Q_{\frac{\alpha}{2}}$ and $Q_{(1-\frac{\alpha}{2})}$ for two-sided tests, while $\Psi_{k}$ is the vector of predictors.

The fitted functions $SS_{\alpha}(\Psi_{k})$, $SS_{(1-\alpha)}(\Psi_{k})$, $SS_{\frac{\alpha}{2}}(\Psi_{k})$, and $SS_{(1-\frac{\alpha}{2})}(\Psi_{k})$ generate the critical values once the parameter set $\Psi_{k}$ is specified.
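The per-scenario quantile extraction and the conservative min/max combination across the three null scenarios can be mimicked in a few lines for a fully continuous linear model. This is a simplified stand-in, not the authors' pipeline: each simulated dataset contributes a single product-of-coefficients estimate (with b obtained by the Frisch–Waugh residual regression) instead of B bootstrap estimates from mma, errors are assumed to be standard normal, and only the two-sided quantiles are computed. The function names are hypothetical.

```python
import random


def quantile(xs, p):
    """Quantile at level p by linear interpolation between order statistics."""
    xs = sorted(xs)
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def ab_estimate(x, m, y):
    """Product-of-coefficients estimate; b via Frisch-Waugh (residualize M and Y on X)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    cx = [v - mx for v in x]
    sxx = sum(v * v for v in cx)
    a_hat = sum(u * (v - mm) for u, v in zip(cx, m)) / sxx
    g_hat = sum(u * (v - my) for u, v in zip(cx, y)) / sxx
    rm = [(v - mm) - a_hat * u for u, v in zip(cx, m)]   # M residualized on X
    ry = [(v - my) - g_hat * u for u, v in zip(cx, y)]   # Y residualized on X
    b_hat = sum(p * q for p, q in zip(rm, ry)) / sum(p * p for p in rm)
    return a_hat * b_hat


def null_estimates(a, b, c, n, S, rng):
    """S single-fit indirect-effect estimates from a continuous linear model."""
    estimates = []
    for _ in range(S):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = [a * xi + rng.gauss(0.0, 1.0) for xi in x]
        y = [c * xi + b * mi + rng.gauss(0.0, 1.0) for xi, mi in zip(x, m)]
        estimates.append(ab_estimate(x, m, y))
    return estimates


def conservative_two_sided(a, b, c, n, S=400, alpha=0.05, seed=11):
    """Per-scenario quantiles, then min of lower and max of upper across null scenarios."""
    rng = random.Random(seed)
    scenarios = [(0.0, b), (a, 0.0), (0.0, 0.0)]   # 1: a = 0; 2: b = 0; 3: a = b = 0
    lowers, uppers = [], []
    for a_i, b_i in scenarios:
        est = null_estimates(a_i, b_i, c, n, S, rng)
        lowers.append(quantile(est, alpha / 2))
        uppers.append(quantile(est, 1 - alpha / 2))
    return min(lowers), max(uppers), lowers, uppers
```

Taking the most extreme quantile across scenarios means the rejection region is no larger than the one implied by any single null scenario, which is how the conservative choice keeps the Type I error at or below α.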

Remark 1.

$\Psi_{k}$ is the kth configuration of parameters,

\Psi_{k} = (a_{k}, b_{k}, c_{k}, \sigma_{Xk}, \sigma_{Mk}, \sigma_{Yk}, n_{k}),

where $a_{k}$, $b_{k}$, and $c_{k}$ denote the associations among the variables, $\sigma_{Xk}$, $\sigma_{Mk}$, and $\sigma_{Yk}$ are the standard deviations of the corresponding variables, and $n_{k}$ is the sample size; K is the number of configurations. We set $a_{k} = 0$ under scenario 1, $b_{k} = 0$ under scenario 2, and both to 0 under scenario 3. The significance level assumed in this study is 0.05. For the standard deviations of the exposure, mediator, and outcome variables, $\sigma_{X}$, $\sigma_{M}$, and $\sigma_{Y}$, we consider four values: 0.5, 1, 2, and 3. To cover small, medium, and large effect sizes, a and b each range from 0 to 0.5 in increments of 0.05, and c ranges from −3 to 3 in increments of 0.5. Lastly, data are generated for sample sizes of 25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, and 500. The mma R package provides several bootstrap inference methods, including the normal, quantile, and bias-corrected bootstrap interval (BCBI) methods, which can be used for both one-sided and two-sided tests [7]. We focus on the widely used quantile method for both one-sided and two-sided tests.
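The parameter grid in Remark 1 can be enumerated programmatically. The sketch below assumes a full factorial crossing of all listed values (the text does not state whether every combination was simulated) and uses a hypothetical helper, adjust_for_scenario, to zero out a, b, or both for null scenarios 1 to 3.

```python
from itertools import product

# Grid values from Remark 1; integer steps avoid floating-point drift.
A_VALUES = [round(0.05 * k, 2) for k in range(11)]       # 0.00, 0.05, ..., 0.50
B_VALUES = [round(0.05 * k, 2) for k in range(11)]       # 0.00, 0.05, ..., 0.50
C_VALUES = [round(-3 + 0.5 * k, 1) for k in range(13)]   # -3.0, -2.5, ..., 3.0
SD_VALUES = [0.5, 1, 2, 3]                               # shared by sigma_X, sigma_M, sigma_Y
N_VALUES = [25, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500]


def parameter_grid():
    """Iterate over Psi_k = (a, b, c, sigma_X, sigma_M, sigma_Y, n), assuming a full factorial."""
    return product(A_VALUES, B_VALUES, C_VALUES, SD_VALUES, SD_VALUES, SD_VALUES, N_VALUES)


def adjust_for_scenario(psi, scenario):
    """Hypothetical helper: map Psi_k to Psi_ki for null scenario i = 1, 2, 3."""
    a, b, c, sd_x, sd_m, sd_y, n = psi
    if scenario == 1:
        a = 0.0
    elif scenario == 2:
        b = 0.0
    elif scenario == 3:
        a, b = 0.0, 0.0
    else:
        raise ValueError("scenario must be 1, 2, or 3")
    return (a, b, c, sd_x, sd_m, sd_y, n)


# Number of configurations K under the full factorial assumption.
K = len(A_VALUES) * len(B_VALUES) * len(C_VALUES) * len(SD_VALUES) ** 3 * len(N_VALUES)
```

If every combination were simulated, K would be 1,208,064, which makes clear why critical values are computed once on a grid and then smoothed rather than simulated on demand.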

Remark 2.

Simulated datasets were generated based on the following parameters: sample size, effect sizes and their corresponding standard deviations, number of bootstrap replications, significance level, and the number of replications per simulation setting. The data generation process varied depending on whether variables were treated as continuous or binary.

For the exposure (x), if it is continuous, values were drawn from a normal distribution with mean 0 and the specified standard deviation, $\sigma_{X}$. If x is binary, values were drawn from a binomial distribution with success probability equal to the specified exposure variability value, $\sigma_{X}$.

For the mediator (m), if continuous, a baseline mediator error term, $m_{0}$, was drawn from a normal distribution with mean 0 and standard deviation $\sigma_{M}$. The mediator, $m_{1}$, was then calculated by adding the product of the effect size a and the exposure value to the baseline error, $m_{1} = m_{0} + a x$. For binary mediators, the baseline values were generated from a binomial distribution with probability $\sigma_{M}$. The probability of success for the mediator was adjusted using a logistic transformation that incorporated the effect size and the baseline distribution parameter. That is,

p = \frac{\text{odds}}{1 + \text{odds}} = \frac{\exp(\text{logit})}{1 + \exp(\text{logit})},

where

\text{logit} = \log\left(\frac{\sigma_{M}}{1 - \sigma_{M}}\right) + a x.
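The exposure and mediator generators described above can be sketched as follows. This is an illustrative Python version, not the mmaPower code; it assumes that for binary variables the "standard deviation" argument is used directly as a success probability in (0, 1), and that the binary baseline m0 and the exposure-adjusted m1 are independent draws, which the text does not specify. The function names are hypothetical.

```python
import math
import random


def generate_exposure(n, x_type, sigma_x, rng):
    """Continuous: N(0, sigma_x). Binary: Bernoulli with success probability sigma_x."""
    if x_type == "continuous":
        return [rng.gauss(0.0, sigma_x) for _ in range(n)]
    if x_type == "binary":
        if not 0.0 < sigma_x < 1.0:
            raise ValueError("binary exposure needs sigma_x in (0, 1) as a probability")
        return [1 if rng.random() < sigma_x else 0 for _ in range(n)]
    raise ValueError("x_type must be 'continuous' or 'binary'")


def generate_mediator(x, a, m_type, sigma_m, rng):
    """Return (m0, m1): baseline mediator and exposure-dependent mediator."""
    if m_type == "continuous":
        m0 = [rng.gauss(0.0, sigma_m) for _ in x]          # baseline error term
        m1 = [m0_i + a * x_i for m0_i, x_i in zip(m0, x)]  # m1 = m0 + a * x
        return m0, m1
    if m_type == "binary":
        if not 0.0 < sigma_m < 1.0:
            raise ValueError("binary mediator needs sigma_m in (0, 1) as a probability")
        base_logit = math.log(sigma_m / (1.0 - sigma_m))
        m0 = [1 if rng.random() < sigma_m else 0 for _ in x]
        # p = exp(logit) / (1 + exp(logit)), logit = log(sigma_m / (1 - sigma_m)) + a * x
        p1 = [1.0 / (1.0 + math.exp(-(base_logit + a * x_i))) for x_i in x]
        m1 = [1 if rng.random() < p else 0 for p in p1]
        return m0, m1
    raise ValueError("m_type must be 'continuous' or 'binary'")
```

Writing the inverse logit as 1 / (1 + exp(-logit)) is algebraically identical to exp(logit) / (1 + exp(logit)) and avoids overflow for large positive logits.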

For the outcome (y), if continuous, an initial outcome value was calculated using a linear model with a direct effect c and a normally distributed error term with mean 0 and standard deviation $\sigma_{Y}$. The final outcome value was then obtained by adding the product of the mediator effect b and the mediator (either the baseline $m_{0}$ or the final $m_{1}$). For binary outcomes, values were drawn from a binomial distribution, with the probability of success computed via a logistic transformation. The model incorporated the effects of the exposure (c) and/or the mediator (b), depending on the specific component of the null hypothesis. Three different Y datasets were generated according to the different components of the null hypothesis, and a fourth under the alternative:

Under a = 0, the effect of the exposure on the mediator was removed (so the baseline mediator $m_{0}$ is used), and the probability used was

p = \frac{\text{odds}}{1 + \text{odds}} = \frac{\exp(\text{logit})}{1 + \exp(\text{logit})},

where

\text{logit} = \log\left(\frac{\sigma_{Y}}{1 - \sigma_{Y}}\right) + c x + b m_{0}.

Under b = 0, the mediator's effect on the outcome was excluded, and the logit used to calculate the success probability was

\text{logit} = \log\left(\frac{\sigma_{Y}}{1 - \sigma_{Y}}\right) + c x.

Under ab = 0, the indirect effect was constrained to zero, maintaining the direct pathway while testing mediation.

Under ab ≠ 0, an indirect effect exists, and the logit is

\text{logit} = \log\left(\frac{\sigma_{Y}}{1 - \sigma_{Y}}\right) + b m_{1} + c x.
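For binary outcomes, the scenario-specific linear predictors can be written compactly. In this sketch the scenario labels and function names are hypothetical, and the ab = 0 case is interpreted as keeping only the direct term c·x (the text states that the direct pathway is maintained but does not give its logit).

```python
import math
import random


def outcome_logit(x, m0, m1, b, c, sigma_y, scenario):
    """Linear predictor for a binary outcome; sigma_y is used as the baseline success probability."""
    base = math.log(sigma_y / (1.0 - sigma_y))
    if scenario == "a=0":              # mediator unaffected by exposure: use baseline m0
        return base + c * x + b * m0
    if scenario in ("b=0", "ab=0"):    # mediator term dropped; direct path c*x kept
        return base + c * x
    if scenario == "alternative":      # ab != 0: exposure-dependent mediator m1
        return base + b * m1 + c * x
    raise ValueError("unknown scenario")


def generate_binary_outcome(x, m0, m1, b, c, sigma_y, scenario, rng):
    """Draw y ~ Bernoulli(p) with p = exp(logit) / (1 + exp(logit))."""
    y = []
    for x_i, m0_i, m1_i in zip(x, m0, m1):
        lp = outcome_logit(x_i, m0_i, m1_i, b, c, sigma_y, scenario)
        p = 1.0 / (1.0 + math.exp(-lp))
        y.append(1 if rng.random() < p else 0)
    return y
```

With σ_Y = 0.5 the baseline term log(0.5 / 0.5) is 0, so for x = 1, m1 = 0.7, b = 0.5, and c = 2 the alternative-hypothesis logit is 2.35.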

For time-to-event outcomes (y), event times were generated from a Weibull distribution. The hazard function was defined as $h(t) = \lambda \nu t^{\nu - 1} e^{cX + b m_{0}}$ under a = 0; $h(t) = \lambda \nu t^{\nu - 1} e^{cX}$ under b = 0; and $h(t) = \lambda \nu t^{\nu - 1} e^{cX + b m_{1}}$ under ab ≠ 0. Here, λ and ν are the scale and shape parameters, respectively. The true event time T was simulated using the inverse transform method:

T = \left(\frac{-\ln(U)}{\lambda e^{LP}}\right)^{1/\nu},

where U ~ Uniform(0, 1) and LP is the linear predictor under each scenario. To reflect realistic conditions, right censoring was applied. A fixed recruitment period, R, and total study duration, D, were defined, with subjects entering the study uniformly between time 0 and R. The censoring time for each subject was calculated as C = D − E, where E is the entry time. The final observed outcome was defined as Y = min(T, C), with the corresponding event indicator $\delta = I(T \leq C)$.
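The Weibull inverse-transform step and the censoring scheme can be sketched as below. The function name and arguments are hypothetical; λ is treated as the scale and ν as the shape parameter, matching the hazard h(t) = λνt^(ν−1)e^(LP), and each subject's linear predictor LP is supplied by the caller (for example, c·x + b·m1 under ab ≠ 0).

```python
import math
import random


def simulate_survival(linear_predictors, lam, nu, recruit, duration, rng):
    """Weibull event times by inverse transform, with uniform entry and administrative censoring.

    Returns a list of (observed_time, event_indicator) pairs, one per linear predictor.
    """
    out = []
    for lp in linear_predictors:
        u = 1.0 - rng.random()                                   # U in (0, 1]; avoids log(0)
        t = (-math.log(u) / (lam * math.exp(lp))) ** (1.0 / nu)  # T = (-ln U / (lam e^LP))^(1/nu)
        entry = rng.uniform(0.0, recruit)                        # E ~ Uniform(0, R)
        censor = duration - entry                                # C = D - E
        out.append((min(t, censor), 1 if t <= censor else 0))    # Y = min(T, C), delta = I(T <= C)
    return out
```

With R = 24 and D = 60 months, as in the time-to-event example later in the paper, every censored subject has at least 36 months of follow-up. Setting λ = ν = 1 and LP = 0 reduces the generator to a unit-rate exponential, a convenient sanity check.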

Remark 3.

The replication size S was chosen through a convergence analysis over $S \in [4000, 10000]$. Increasing S, we repeated the simulations until the estimated 2.5th percentile ($Q_{2.5}$) and 97.5th percentile ($Q_{97.5}$) of the estimates converged to their “true” values, taken to be the percentiles obtained from 10,000 simulations. The maximum deviation between the sampled and true quantiles was evaluated across 100 repetitions, and convergence was declared when the maximum absolute error remained below 10% of the true value. The smallest S meeting this condition was taken as sufficient to produce stable critical values. The number of bootstrap replications, B, was set to 1000 because this value yields quantile estimates nearly identical to the full model-based results while substantially reducing computation time.
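The convergence rule in Remark 3 can be expressed as a small search over candidate replication sizes. In this sketch the draws come from an arbitrary user-supplied generator (a standard normal is used in the usage example as a stand-in for null indirect-effect estimates), and the relative-error criterion mirrors the 10% rule; the function name and defaults are assumptions, and the relative error is undefined if a reference quantile is exactly zero.

```python
import random


def quantile(xs, p):
    """Quantile at level p by linear interpolation between order statistics."""
    xs = sorted(xs)
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def converged_replication_size(draw, candidates, n_true=10000, reps=100, tol=0.10, seed=5):
    """Smallest S whose 2.5th/97.5th percentiles stay within tol (relative) of the reference.

    `draw(rng)` returns one simulated estimate; the reference ("true") percentiles come
    from n_true draws. Returns None if no candidate satisfies the rule.
    """
    rng = random.Random(seed)
    reference = [draw(rng) for _ in range(n_true)]
    q_lo, q_hi = quantile(reference, 0.025), quantile(reference, 0.975)
    for s in sorted(candidates):
        max_rel_err = 0.0
        for _ in range(reps):
            sample = [draw(rng) for _ in range(s)]
            max_rel_err = max(max_rel_err,
                              abs(quantile(sample, 0.025) - q_lo) / abs(q_lo),
                              abs(quantile(sample, 0.975) - q_hi) / abs(q_hi))
        if max_rel_err < tol:
            return s
    return None
```

A call such as converged_replication_size(lambda g: g.gauss(0.0, 1.0), [4000, 6000, 8000, 10000]) returns the first candidate that passes, or None if none does.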

Remark 4.

To ensure that the estimated critical values are consistent, we impose shape constraints during smoothing using Shape Constrained Additive Models (SCAM) with monotone P-splines [22], implemented with the scam package in R. Mathematically, each critical value function, SS, is modeled as a linear combination of B-spline basis functions $B_{j}(\Psi_{k})$ with coefficients $\gamma_{j}$. That is,

SS(\Psi_{k}) = \sum_{j = 1}^{J} \gamma_{j} B_{j}(\Psi_{k}),

where J is the number of basis functions, subject to the monotonicity constraint $\gamma_{j} \geq \gamma_{j - 1}$ for a non-decreasing function (the inequality is reversed for a non-increasing one). The lower critical value functions $SS_{\alpha}(\Psi_{k})$ and $SS_{\frac{\alpha}{2}}(\Psi_{k})$ are constrained to be non-decreasing with respect to sample size and non-increasing with respect to the standard deviations. Similarly, the upper critical value functions $SS_{(1-\alpha)}(\Psi_{k})$ and $SS_{(1-\frac{\alpha}{2})}(\Psi_{k})$ are constrained to be non-increasing with respect to sample size and non-decreasing with respect to the standard deviations. In the R implementation, we specified these constraints using bs = "mpi" (monotone increasing P-splines) or bs = "mpd" (monotone decreasing P-splines) within the scam() function, ensuring that the critical value surface remains smooth and logically consistent.
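Fitting scam models requires R, so as a dependency-free illustration of the monotonicity idea the sketch below uses a different, simpler technique: isotonic regression via the pool-adjacent-violators algorithm (PAVA). Unlike the monotone P-splines used in the paper, PAVA produces a step function and handles only one predictor at a time (here, sample size); it is shown only to illustrate how a non-increasing constraint repairs noisy upper critical values. The input values are hypothetical.

```python
def pava_non_decreasing(y, w=None):
    """Weighted least-squares non-decreasing fit to y via pool-adjacent-violators."""
    w = [1.0] * len(y) if w is None else list(w)
    blocks = []  # each block: [weighted mean, total weight, number of points]
    for y_i, w_i in zip(y, w):
        blocks.append([float(y_i), float(w_i), 1])
        # Merge backwards while the monotone order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def pava_non_increasing(y, w=None):
    """Non-increasing fit, obtained by fitting -y and flipping the sign back."""
    return [-v for v in pava_non_decreasing([-v for v in y], w)]


# Hypothetical noisy upper critical values at n = 25, 50, 75, 100.
raw_upper = [0.40, 0.31, 0.33, 0.25]
smoothed = pava_non_increasing(raw_upper)   # approximately [0.40, 0.32, 0.32, 0.25]
```

The violating pair (0.31, 0.33) is pooled to its mean, so the repaired sequence never increases with sample size, which is the same logical requirement the mpd constraint enforces smoothly in scam.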

2.2.3. Power Estimation

Statistical power in mediation analysis is the probability of correctly rejecting the null hypothesis $H_{0}$ when the mediation effect is nonzero, that is, of detecting a mediator whose indirect effect, θ, truly differs from zero.

Let $\hat{\theta}_{1}$ denote the estimated indirect effect from data simulated under the alternative hypothesis $H_{1}$, and assume the parameter setting under $H_{1}$ is defined by $\Psi$. We first generate the distribution of $\hat{\theta}_{1}$. For one-sided tests, when the alternative hypothesis is $H_{1}: \theta < 0$, power is defined as

P(\hat{\theta}_{1} < Q_{\alpha} \mid H_{1}),

and for $H_{1}: \theta > 0$, it is

P(\hat{\theta}_{1} > Q_{(1-\alpha)} \mid H_{1}),

where $Q_{\alpha} = SS_{\alpha}(\Psi)$ and $Q_{(1-\alpha)} = SS_{(1-\alpha)}(\Psi)$ are the corresponding lower and upper critical values, respectively.

For two-sided tests, when the alternative hypothesis is $H_{1}: \theta \neq 0$, power is defined as

P\left(\hat{\theta}_{1} \notin \left[Q_{\frac{\alpha}{2}}, Q_{(1-\frac{\alpha}{2})}\right] \mid H_{1}\right),

where $Q_{\frac{\alpha}{2}} = SS_{\frac{\alpha}{2}}(\Psi)$ and $Q_{(1-\frac{\alpha}{2})} = SS_{(1-\frac{\alpha}{2})}(\Psi)$ are the corresponding lower and upper critical values, respectively. That is, the power is the probability that the estimated indirect effect of the mediator falls in the rejection region. To calculate the power, we use the following algorithm.
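Given simulated estimates under H1 and the critical values, the power formulas reduce to counting the share of estimates that fall in the rejection region. A minimal sketch, with hypothetical argument names and alternative labels:

```python
def empirical_power(estimates, lower=None, upper=None, alternative="two.sided"):
    """Share of H1 estimates falling in the rejection region defined by the critical values."""
    if alternative == "less":            # reject when theta_hat < Q_alpha
        hits = sum(1 for t in estimates if t < lower)
    elif alternative == "greater":       # reject when theta_hat > Q_(1-alpha)
        hits = sum(1 for t in estimates if t > upper)
    elif alternative == "two.sided":     # reject when theta_hat is outside [Q_(alpha/2), Q_(1-alpha/2)]
        hits = sum(1 for t in estimates if t < lower or t > upper)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two.sided'")
    return hits / len(estimates)
```

For the estimates −0.3, −0.1, 0.0, 0.1, 0.2, and 0.4 with critical values −0.2 and 0.25, two of the six estimates lie outside the interval, so the two-sided power estimate is 1/3.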

2.2.4. The Algorithm for Power Calculation

For a user-specified set of parameter values, $\Psi$, under the alternative hypothesis:

(1) Generate datasets $D_{a}^{(s)}$ for $s = 1, \dots, S$ using the user-specified parameters, $\Psi$.
(a) For each dataset $D_{a}^{(s)}$, fit the mediation model using the mma function and obtain B bootstrap estimates of the indirect effect of M, denoted $\hat{\theta}_{1}^{(s,b)}$ for $b = 1, \dots, B$.
(b) Combine all $S \times B$ simulated estimates $\{\hat{\theta}_{1}^{(s,b)}\}$ to form the distribution of the estimated indirect effects under the alternative hypothesis, $\{\hat{\theta}_{1}^{(s,b)}\} \sim F_{1}(\theta \mid H_{1})$.
(2) Generate the lower and upper critical values from the fitted smoothing spline functions for the chosen alternative hypothesis.
(3) Calculate statistical power as the proportion of the simulated distribution of the indirect effect of M under $H_{1}$ that falls in the rejection region.

The predicted power values were constrained not to exceed 1, and the plotted curves were truncated when the predicted power reached or exceeded 0.99 to focus on the region of primary interest. When a specific power threshold was of interest (e.g., 0.80), the function identified the smallest sample size at which the predicted power from each model reached or surpassed the threshold. Vertical reference lines were added to the resulting plots at these points to facilitate interpretation.
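The reporting steps just described (capping predicted power at 1, truncating the curve once power reaches 0.99, and locating the smallest sample size that attains a target such as 0.80) can be sketched as follows. The input dictionary of predicted power by sample size is hypothetical, and power is assumed to be non-decreasing in n.

```python
def truncated_power_curve(power_by_n, stop=0.99):
    """Cap predicted power at 1 and stop the curve once it reaches `stop`."""
    curve = []
    for n in sorted(power_by_n):
        power = min(power_by_n[n], 1.0)
        curve.append((n, power))
        if power >= stop:
            break
    return curve


def minimum_sample_size(power_by_n, target=0.80):
    """Smallest n whose (capped) predicted power reaches the target, or None if never reached."""
    for n in sorted(power_by_n):
        if min(power_by_n[n], 1.0) >= target:
            return n
    return None


# Hypothetical predicted power values, including one above 1 before capping.
predicted = {25: 0.30, 50: 0.55, 75: 0.78, 100: 0.86, 150: 0.97, 200: 0.995, 250: 1.02}
n_80 = minimum_sample_size(predicted)   # 100
```

The returned sample size is where a vertical reference line would be drawn on the power curve.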

3. Simulation Studies

To evaluate the performance of the proposed power estimation method for mediation analysis, we conducted simulation studies across a range of sample sizes and model parameters. The procedure, implemented in the open-source R package mmaPower, computes statistical power by comparing bootstrap-based estimates of indirect effects with pre-generated critical quantile thresholds under the null hypothesis.

3.1. Power Analysis

To illustrate the application of the proposed method, we present a power analysis conducted using the mmaPower() function from the R package mmaPower. In this example, we consider a mediation model with a binary exposure, continuous mediator, and binary outcome, where a = 0.5, b = 0.5, and c = 2, with standard deviation parameters $\sigma_{X}$ = 3, $\sigma_{M}$ = 0.5, and $\sigma_{Y}$ = 0.5. The analysis evaluates the power for the alternative hypothesis H₁: θ ≠ 0 at a significance level of α = 0.05 with a sample size of 350. The power calculation was implemented in R as follows:

mmaPower(a = 0.5, b = 0.5, c = 2, x = "binary", m = "continuous",
         y = "binary", xsig = 3, msig = 0.5, ysig = 0.5, n = 350,
         alpha = 0.05, test = "twosided", plot = FALSE)

Table 2 summarizes the key arguments supplied to the mmaPower() function: the path coefficients, the standard deviation parameters of the variables, the sample size, the significance level, the type of test, and whether a plot of the results is generated.

To illustrate the utility and flexibility of the proposed algorithm, the function was used to generate power estimates for different scenarios. Table 3 presents the estimated statistical power to detect the indirect effect across varying sample sizes, assuming the exposure, mediator, and outcome are continuous. As expected, power increased with larger sample sizes and stronger effect magnitudes. For small effect sizes (0.02), power remained low at smaller sample sizes but reached the conventional 0.80 threshold at n = 500. For medium (0.15) and large (0.35) effect sizes, power exceeded 0.80 at n = 75 and n = 50, respectively.

Table 4 summarizes the estimated power when both the exposure and the mediator are binary and the outcome is continuous. Compared with the fully continuous scenario, power increased more slowly with sample size, particularly for small effect sizes. Power for a small effect size reached approximately 0.80 only near n = 500, reflecting the lower information content of binary predictors. For medium and large effects, power remained adequate even at moderate sample sizes (n = 50–75). These findings indicate that while binary predictors somewhat reduce sensitivity, the quantile-based approach remains robust and computationally efficient.

Table 5 shows the power estimates for the case in which the exposure, mediator, and outcome are all binary. This configuration yielded the lowest overall power across comparable sample sizes. For small effects, power increased gradually and reached 0.80 only at n > 500. For medium and large effects, power exceeded 0.80 at around n = 350 and n = 200, respectively.

Table 6 displays the power estimates for a mediation model with a binary exposure, continuous mediator, and time-to-event outcome, assuming a 24-month recruitment period and a 60-month study duration. For small effects, power increased gradually but remained limited, reaching 0.80 only at n > 500. For medium and large effects, power exceeded 0.80 at around n = 225 and n = 100, respectively.

3.2. Sample Size Calculation

The mmaPower() function can also be used to generate a plot of statistical power as a function of sample size. For the mediation model specified with parameters a = 0.5, b = 0.5, c = 2, and standard deviations $\sigma_{X}$ = 3, $\sigma_{M}$ = 0.5, and $\sigma_{Y}$ = 0.5, the power versus sample size plot is obtained as follows:

mmaPower(a = 0.5, b = 0.5, c = 2, x = "binary", m = "continuous",
         y = "binary", xsig = 3, msig = 0.5, ysig = 0.5, n = 350,
         alpha = 0.05, test = "twosided", plot = TRUE)

Figure 2 shows the resulting power curve for a two-sided test using the proposed method. For the chosen parameter values, the estimated statistical power at n = 350 is approximately 0.75, slightly below the conventional 0.80 threshold; a somewhat larger sample would therefore be needed to reach 80% power for the given indirect effect with the bootstrap-quantile approach implemented in the R package.

To illustrate the practical utility of the proposed methodology, we present two distinct research scenarios drawn from the literature. These examples demonstrate how applied researchers can utilize the proposed framework to determine the necessary sample size for detecting specific indirect mechanisms.

Scenario 1: Behavioral Weight Loss Interventions (continuous outcome). Consider a researcher designing a randomized controlled trial similar to the SHED-IT intervention [23]. The objective is to determine whether a new internet-based lifestyle intervention (binary exposure, X) reduces Body Mass Index (continuous outcome, Y) by increasing the duration of moderate-to-vigorous physical activity (continuous mediator, M). Estimating power for this specific pathway (X to M to Y) is challenging because physical activity data are often skewed, which undermines the normality assumptions on which the Sobel test relies. To ensure that the study is not underpowered, the investigator can use the proposed quantile-based framework. The inputs are effect sizes derived from pilot data: for example, the intervention increases activity by a = 0.4 SD, and activity reduces BMI by b = −0.3 SD. The variable types are specified as binary X, continuous M, and continuous Y. The algorithm then calculates the minimum sample size required to detect this indirect effect with 80% power at α = 0.05, here N = 285.

Scenario 2: Biobehavioral Mechanisms in Cancer Survival (Time-to-Event Outcome). Another application arises in psycho-oncology, similar to the biobehavioral studies conducted by Antoni et al. [24]. Consider a trial designed to test whether a Cognitive Behavioral Stress Management (CBSM) intervention (binary exposure, X) prolongs disease-free survival (time-to-event outcome, Y) in breast cancer patients. The hypothesized mechanism is that the intervention lowers stress-related biomarkers, such as serum cortisol (continuous mediator, M), which in turn slows tumor progression. The proposed general mediation framework can be used to plan this trial. To model censoring, the researcher specifies the recruitment period (24 months) and the total study duration (60 months). Next, the researcher inputs the expected reduction in cortisol, a, and the log hazard ratio associated with the biomarker, b. The algorithm simulates Weibull-distributed survival times and censoring patterns to provide an estimated sample size, N = 450, that accounts for the loss of information due to censoring. This capability allows researchers to justify their sample sizes for grant applications involving complex survival mechanisms without writing custom Monte Carlo simulations.

4. Discussion

In this study, we developed and evaluated a simulation-based method for estimating statistical power in mediation analysis using precomputed null distributions and quantile-based rejection regions. The method leverages simulated null distributions to calculate empirical critical values, which are then compared against indirect effect estimates generated under user-specified parameters. By interpolating between stored critical values with techniques such as smoothing splines, the approach provides flexible and efficient power estimation across a wide range of sample sizes and effect sizes without rerunning computationally intensive simulations for each new scenario.
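The interpolation step can be illustrated with a short Python sketch. The paper fits monotone smoothing splines (shape-constrained regression in the spirit of Pya and Wood [22]); as a simpler, dependency-free stand-in, this sketch uses Fritsch–Carlson monotone cubic Hermite interpolation, which passes exactly through the stored points instead of smoothing them but likewise preserves the monotone decrease of critical values with sample size. The grid of critical values and the name `monotone_cubic` are hypothetical placeholders, not output from or part of the package.

```python
import bisect
import math


def monotone_cubic(xs, ys):
    """Return a Fritsch-Carlson monotone cubic Hermite interpolant through (xs, ys).

    xs must be strictly increasing; if ys is monotone, so is the interpolant.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    delta = [(ys[i + 1] - ys[i]) / h[i] for i in range(n - 1)]

    # Initial slopes: one-sided secants at the ends, averaged secants inside,
    # and zero wherever adjacent secants disagree in sign or one is flat.
    m = [delta[0]] + [0.0] * (n - 2) + [delta[-1]]
    for i in range(1, n - 1):
        if delta[i - 1] * delta[i] > 0:
            m[i] = (delta[i - 1] + delta[i]) / 2.0

    # Fritsch-Carlson limiter: shrink slopes so that alpha^2 + beta^2 <= 9.
    for i in range(n - 1):
        if delta[i] == 0.0:
            m[i] = m[i + 1] = 0.0
            continue
        alpha, beta = m[i] / delta[i], m[i + 1] / delta[i]
        s = alpha * alpha + beta * beta
        if s > 9.0:
            tau = 3.0 / math.sqrt(s)
            m[i], m[i + 1] = tau * alpha * delta[i], tau * beta * delta[i]

    def f(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the simulated grid; no extrapolation")
        # Interval [xs[k], xs[k+1]] containing x (the right end maps to the last one).
        k = min(bisect.bisect_right(xs, x) - 1, n - 2)
        t = (x - xs[k]) / h[k]
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return (h00 * ys[k] + h10 * h[k] * m[k]
                + h01 * ys[k + 1] + h11 * h[k] * m[k + 1])

    return f


# Hypothetical upper critical values of the indirect-effect estimate on a
# sample-size grid (placeholders, not values from the paper).
grid_n = [25, 50, 100, 200, 350, 500]
crit = [0.120, 0.075, 0.045, 0.026, 0.017, 0.013]
crit_at = monotone_cubic(grid_n, crit)
print(crit_at(285))
```

Raising an error outside the grid, rather than extrapolating, mirrors the caution about applying the method beyond the range of the simulation library.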

A key benefit of this approach is its balance between accuracy and computational efficiency. Traditional analytic approximations often rely on restrictive parametric assumptions. Monte Carlo power analysis, by contrast, requires the researcher to generate and fit thousands of regression models for each candidate sample size [11,15,25], and bootstrap methods become computationally intensive when repeated across many parameter settings; together, these can require minutes to hours of computation for a single study design [12]. Our framework avoids both extremes by reusing tables of precomputed critical values and applying nonparametric regression to interpolate between simulated grid points. This makes the method particularly suitable for applied researchers who need rapid power estimation or sample size determination for mediation studies, without sacrificing statistical rigor.

Despite these advantages, the method has limitations. First, its accuracy depends on the density and coverage of the precomputed simulation grid. Sparse grids may necessitate interpolation over wide parameter ranges, which can introduce approximation error. Second, although the framework is designed to be generalizable, its performance in highly complex or strongly nonlinear mediation scenarios has not yet been fully validated. Third, while the approach is computationally efficient compared to full-scale simulation studies, it still requires moderate computational resources, particularly for large-scale or high-dimensional analyses. Lastly, as with any simulation-based method, results are only as reliable as the underlying data-generating assumptions and parameter ranges used to build the simulation library, and caution is advised when applying the method outside these bounds.
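The effect of interpolating across a coarse grid can be seen directly in the tabulated results. This Python sketch (the function name `required_n` is hypothetical) inverts a power table by linear interpolation to find the smallest n that reaches 80% power. For the small effect in Table 3 it returns n = 425, somewhat above the n = 410 reported in the abstract for continuous models; the tabulated power curve is concave in this region (its slope falls from 0.0018 to 0.0012 per participant between successive intervals), so a straight line between n = 350 and n = 500 understates power and overstates the required n. For the fully binary model in Table 5, the grid never reaches 80% power, and the function declines to extrapolate.

```python
import math


def required_n(ns, powers, target=0.80):
    """Smallest n whose linearly interpolated power reaches `target`.

    Returns None when the tabulated grid never reaches the target, because
    extrapolating beyond the simulated range is not supported.
    """
    if powers[0] >= target:
        return ns[0]
    for i in range(len(ns) - 1):
        p0, p1 = powers[i], powers[i + 1]
        if p0 < target <= p1:
            frac = (target - p0) / (p1 - p0)
            n_star = ns[i] + frac * (ns[i + 1] - ns[i])
            # Tolerance guards against floating-point round-up (e.g. 425.00000000000006).
            return math.ceil(n_star - 1e-9)
    return None


ns = [25, 50, 75, 100, 200, 350, 500]
table3_small = [0.06, 0.11, 0.16, 0.22, 0.44, 0.71, 0.89]  # Table 3, ab = 0.02
table5_small = [0.01, 0.02, 0.03, 0.05, 0.10, 0.18, 0.26]  # Table 5, ab = 0.02
print(required_n(ns, table3_small))  # 425
print(required_n(ns, table5_small))  # None: 80% power is not reached by n = 500
```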

Future work should extend this framework in several directions. While the current framework establishes the validity of the quantile-based approach for single-mediator models, contemporary research frequently employs designs with multiple mediators. Extending this methodology to parallel mediation models is a priority for future development and involves three specific statistical challenges. First, the framework must account for associations among mediators: high covariance between mediators inflates the standard errors of the estimated mediator effects, which can substantially reduce power for specific indirect effects. Second, the calculation must adjust for the proportion of outcome variance explained (R²) by the additional mediators. As more valid mediators are included in the model, the residual variance of the outcome decreases, potentially altering the signal-to-noise ratio and the resulting power estimates. Third, testing multiple indirect pathways simultaneously inflates the family-wise error rate. Future iterations of this package will integrate multiple testing corrections, such as Bonferroni or Holm–Bonferroni adjustments, directly into the power calculation to ensure robust inference when exploring multiple mechanisms. In addition, the precomputed library could be expanded to a broader set of models, such as nonlinear or multilevel mediation models, to enhance generalizability. Adaptive grid refinement strategies could reduce interpolation error by increasing simulation density in regions where power estimates change rapidly. Finally, extending the approach to accommodate more complex mediation pathways would further enhance its practical applicability.
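As an illustration of the Holm–Bonferroni adjustment mentioned above, the following Python sketch applies the step-down rule to the p-values of several specific indirect effects. How the package will fold such corrections into its power calculation is not specified here; one conservative option is to evaluate each pathway at the Bonferroni level α/k, which is the threshold Holm applies to the smallest p-value. The function name `holm_reject` and the p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns booleans, in input order, marking which hypotheses are rejected
    while controlling the family-wise error rate at level alpha.
    """
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (k - rank).
        if p_values[i] <= alpha / (k - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


# Hypothetical p-values for three specific indirect effects in a parallel model.
print(holm_reject([0.010, 0.040, 0.030]))  # [True, False, False]
print(holm_reject([0.001, 0.020, 0.040]))  # [True, True, True]
```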

Author Contributions

Conceptualization, Q.Y., W.C. and N.R.; methodology, Q.Y. and N.R.; software, Q.Y.; validation, Q.Y., A.B. and N.R.; formal analysis, N.R.; resources, Q.Y.; writing—original draft preparation, N.R.; writing—review and editing, Q.Y., A.B. and W.C.; visualization, N.R.; supervision, Q.Y.; funding acquisition, Q.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially supported by the National Institute on Minority Health and Health Disparities of the National Institutes of Health under grant number 2R15MD012387-02, by the National Cancer Institute under grant number R01CA275089, and by the National Institute of Environmental Health Sciences (NIEHS) under grant number P42ES013648.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The simulated data presented in this study are available in the article. The code used to generate the simulation results is available from the corresponding author upon request.

Acknowledgments

During the preparation of this manuscript, the authors used Google Gemini, December 2025 version, for the purposes of refining the manuscript for clarity and flow. The authors have reviewed and edited the output and take full responsibility for the content of this publication. We acknowledge that part of this research was conducted with high-performance computational resources provided by the Louisiana Optical Network Infrastructure.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Baron, R.M.; Kenny, D.A. The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol. 1986, 51, 1173–1182.
2. Gunzler, D.; Chen, T.; Wu, P.; Zhang, H. Introduction to mediation analysis with structural equation modeling. Shanghai Arch. Psychiatry 2013, 25, 390–394.
3. Hayes, A.F. Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach; Guilford Publications: New York, NY, USA, 2017.
4. MacKinnon, D. Introduction to Statistical Mediation Analysis; Routledge: Oxfordshire, UK, 2012.
5. Preacher, K.J. Advances in mediation analysis: A survey and synthesis of new developments. Annu. Rev. Psychol. 2015, 66, 825–852.
6. Rucker, D.D.; Preacher, K.J.; Tormala, Z.L.; Petty, R.E. Mediation analysis in social psychology: Current practices and new recommendations. Soc. Personal. Psychol. Compass 2011, 5, 359–371.
7. Yu, Q.; Li, B. mma: An R Package for Mediation Analysis with Multiple Mediators. J. Open Res. Softw. 2017, 5, 11.
8. MacKinnon, D.P.; Lockwood, C.M.; Hoffman, J.M.; West, S.G.; Sheets, V. A comparison of methods to test the significance of the mediated effect. Psychol. Methods 2002, 7, 83–104.
9. Hayes, A.F. Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Commun. Monogr. 2009, 76, 408–420.
10. Fritz, M.S.; MacKinnon, D.P. Required Sample Size to Detect the Mediated Effect. Psychol. Sci. 2007, 18, 233–239.
11. Muthén, L.K.; Muthén, B.O. How to use a Monte Carlo study to decide on sample size and determine power. Struct. Equ. Model. 2002, 9, 599–620.
12. Thoemmes, F.; MacKinnon, D.P.; Reiser, M.R. Power analysis for complex mediational designs using Monte Carlo methods. Struct. Equ. Model. 2010, 17, 510–534.
13. Qin, X. Sample size and power calculations for causal mediation analysis: A Tutorial and Shiny App. Behav. Res. Methods 2024, 56, 1738–1769.
14. Schoemann, A.M.; Boulton, A.J.; Short, S.D. Determining Power and Sample Size for Simple and Complex Mediation Models. Soc. Psychol. Personal. Sci. 2017, 8, 379–386.
15. Zhang, Z. Monte Carlo based statistical power analysis for mediation models: Methods and software. Behav. Res. Methods 2014, 46, 1184–1198.
16. Miočević, M.; MacKinnon, D.P.; Levy, R. Power in Bayesian Mediation Analysis for Small Sample Research. Struct. Equ. Model. Multidiscip. J. 2017, 24, 666–683.
17. Pan, H.; Liu, S.; Miao, D.; Yuan, Y. Sample size determination for mediation analysis of longitudinal data. BMC Med. Res. Methodol. 2018, 18, 32.
18. Sim, M.; Kim, S.-Y.; Suh, Y. Sample Size Requirements for Simple and Complex Mediation Models. Educ. Psychol. Meas. 2022, 82, 76–106.
19. Sobel, M.E. Asymptotic confidence intervals for indirect effects in structural equation models. Sociol. Methodol. 1982, 13, 290–312.
20. Bollen, K.A.; Stine, R. Direct and indirect effects: Classical and bootstrap estimates of variability. Sociol. Methodol. 1990, 20, 115–140.
21. Yu, Q.; Li, B. mma: Multiple Mediation Analysis. 2023. Available online: https://cran.r-project.org/web/packages/mma/mma.pdf (accessed on 5 January 2025).
22. Pya, N.; Wood, S.N. Shape constrained additive models. Stat. Comput. 2015, 25, 543–559.
23. Morgan, P.J.; Lubans, D.R.; Collins, C.E.; Warren, J.M.; Callister, R. Exploring the mechanisms of weight loss in the SHED-IT intervention for overweight men: A mediation analysis. Int. J. Behav. Nutr. Phys. Act. 2009, 6, 76.
24. Antoni, M.H.; Lechner, S.C.; Kazi, A.; Wimberly, S.R.; Sifre, T.; Urcuyo, K.R.; Phillips, K.; Glück, S.; Carver, C.S. How stress management improves benefit finding, quality of life, and health status in women with early-stage breast cancer. J. Consult. Clin. Psychol. 2006, 74, 1143–1152.
25. Preacher, K.J.; Selig, J.P. Advantages of Monte Carlo confidence intervals for indirect effects. Commun. Methods Meas. 2012, 6, 77–98.

Figure 1. Conceptual diagram of the mediation pathway.

Figure 2. Plot showing the power vs. sample size required. The arrow denotes the required sample size.

Table 1. Summary of key assumptions for the proposed methodology.

Assumption	Description
A1: No X-Y Confounding	No unmeasured confounding of the exposure–outcome relationship.
A2: No X-M Confounding	No unmeasured confounding of the exposure–mediator relationship.
A3: No M-Y Confounding	No unmeasured confounding of the mediator–outcome relationship.
A4: No M-M Causality	Assumes that in multi-mediator models, one mediator does not causally affect another (independent mediators).

Table 2. Arguments for the R Function mmaPower().

Parameter	Description
a	Effect size from exposure to mediator
b	Effect size from mediator to outcome
c	Direct effect from exposure to outcome
x	Exposure type (continuous, binary)
m	Mediator type (continuous, binary)
y	Outcome type (continuous, binary, surv)
xsig	Standard deviation of the exposure variable
msig	Standard deviation of the mediator variable
ysig	Standard deviation of the outcome variable
Duration (optional)	Study duration for time-to-event outcome
Recruitment (optional)	Recruitment period for time-to-event outcome
n (optional)	Sample size (used to estimate power)
power (optional)	Desired power (used to estimate required sample size)
alpha (optional)	Significance level (default = 0.05)
test	Alternative hypothesis (less than 0, greater than 0, two-sided)
plot (optional)	Logical flag to generate power vs. sample size plot when both n and power are NULL

Table 3. Estimated power for detecting indirect effect at varying sample sizes when exposure, mediator, and outcome are continuous.

Sample Size (n)	Power for Small Effect (ab = 0.02)	Power for Medium Effect (ab = 0.15)	Power for Large Effect (ab = 0.35)
25	0.06	0.24	0.61
50	0.11	0.62	0.94
75	0.16	0.86	0.99
100	0.22	0.96	0.99
200	0.44	1.00	0.99
350	0.71	1.00	0.99
500	0.89	1.00	0.99

Table 4. Estimated power for detecting indirect effect at varying sample sizes when exposure and mediator are binary, and outcome is continuous.

Sample Size (n)	Power for Small Effect (ab = 0.02)	Power for Medium Effect (ab = 0.15)	Power for Large Effect (ab = 0.35)
25	0.06	0.24	0.61
50	0.11	0.62	0.94
75	0.16	0.86	0.99
100	0.22	0.96	0.99
200	0.44	0.99	0.99
350	0.71	0.99	0.99
500	0.89	0.99	0.99

Table 5. Estimated power for detecting indirect effect at varying sample sizes when exposure, mediator, and outcome are binary.

Sample Size (n)	Power for Small Effect (ab = 0.02)	Power for Medium Effect (ab = 0.15)	Power for Large Effect (ab = 0.35)
25	0.01	0.05	0.14
50	0.02	0.12	0.35
75	0.03	0.20	0.55
100	0.05	0.29	0.70
200	0.10	0.60	0.95
350	0.18	0.85	0.99
500	0.26	0.95	0.99

Table 6. Estimated power for detecting indirect effect at varying sample sizes when exposure is binary, mediator is continuous, and outcome is time-to-event.

Sample Size (n)	Power for Small Effect (ab = 0.02)	Power for Medium Effect (ab = 0.15)	Power for Large Effect (ab = 0.35)
25	0.02	0.09	0.22
50	0.04	0.20	0.48
75	0.06	0.35	0.72
100	0.09	0.48	0.86
200	0.18	0.76	0.98
350	0.31	0.94	0.99
500	0.42	0.99	0.99

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rizvi, N.; Bam, A.; Cao, W.; Yu, Q. Sample Size Calculation and Power Analysis for the General Mediation Analysis Method. Stats 2026, 9, 19. https://doi.org/10.3390/stats9010019

