# Interpreting Randomized Controlled Trials

## Abstract

## Simple Summary

## 1. Introduction

## 2. Sampling Theory and Experimental Design

$\mathbf{Y} = \{Y_1, \ldots, Y_n\}$ may be used to compute a statistical estimator of θ, and P(Y|θ) may be used to determine the estimator’s probability distribution. For example, a sample mean may be used to estimate a population mean. The distribution of the sample mean may be approximated by a normal distribution (bell-shaped curve) if the sample size is sufficiently large, and a 95% confidence interval (CI) around the observed sample mean may be computed to quantify uncertainty by giving us an idea of how closely we can estimate θ from the sample. Another example is that, given (X,Y) data on a numerical outcome variable Y and a covariate X, a regression model P(Y|X, α, β) with linear conditional mean E(Y|X) = α + βX may be assumed to characterize how Y varies with X. If Y is a binary (0/1) indicator of response, then a logistic regression model log{Pr(Y = 1|X)/Pr(Y = 0|X)} = α + βX can be used. In each case, the parameters θ = (α, β) may be estimated from a sample of (X,Y) pairs to make inferences about the population from which the sample was taken, provided that the sample accurately represents the population. For example, a simple random sample of size n must be obtained in such a way that all possible sets of n objects from the population are equally likely to be the sample.
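The logistic model quoted above can be inverted to express the response probability directly. This is a standard algebraic step, shown here only as a worked equation using the same symbols α, β, and X:

$$
\log\frac{\Pr(Y = 1 \mid X)}{\Pr(Y = 0 \mid X)} = \alpha + \beta X
\quad\Longleftrightarrow\quad
\Pr(Y = 1 \mid X) = \frac{e^{\alpha + \beta X}}{1 + e^{\alpha + \beta X}}
$$

Under this model, a one-unit increase in X multiplies the odds of response by $e^{\beta}$.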

## 3. Bayesian and Frequentist Inference

$\{Y_1, \ldots, Y_n\}$ into the posterior distribution $P(\theta \mid Y_1, \ldots, Y_n)$ using Bayes’ theorem. The posterior distribution reflects our updated knowledge about θ owing to the information contained in the sample $\{Y_1, \ldots, Y_n\}$, and quantifies our final beliefs about θ. Bayesian inferences thus are based on the posterior. For example, if L is the 2.5th percentile and U is the 97.5th percentile of the posterior, then [L, U] is a 95% posterior CrI for θ; that is, θ is in the interval [L, U] with a probability of 0.95 based on the posterior, written as Pr(L < θ < U|data) = 0.95.

Let $Y_1, \ldots, Y_n$ denote n patients’ binary response indicators, with $Y_i = 1$ if a response is observed from the ith patient and $Y_i = 0$ otherwise. We then have $R = Y_1 + \cdots + Y_n$. Assuming conditional independence of the n observations given θ, R follows a binomial distribution with parameters n and θ. A beta(a,b) distribution over the unit interval (0, 1) is a very tractable prior for θ. The beta(a,b) prior has mean a/(a + b) and effective sample size (ESS) = a + b, which quantifies the informativeness of the prior. The beta prior is commonly used because it is conjugate for the binomial likelihood; the posterior of θ given observed R and n is also a beta distribution, but with updated parameters, beta(a + R, b + n − R). In the RMC example, we define response as complete response (CR) or partial response (PR) on imaging at 3 months and assume a beta(1,1) prior distribution, also known as Laplace’s prior. Beta(1,1) is the uniform distribution over the interval (0, 1). That is, under beta(1,1), all values in the unit interval between 0 and 1 are equally likely a priori, so the prior can be viewed as noninformative (Figure 2A). It also has ESS = 2 and thus encodes little prior knowledge about θ. Because this cancer is rare and there is no comparator treatment, it is not feasible to conduct a randomized study. Suppose that a single-arm pilot study with n = 10 patients is conducted to establish feasibility and R = 7 responses are observed. This dataset allows us to update the uniform prior to the beta(1 + 7, 1 + 3) = beta(8,4) posterior, which has ESS = 12, posterior mean 8/12 = 0.67, and 95% posterior CrI 0.39–0.89 (Figure 2B). The Bayesian posterior estimator 0.67 shrinks the empirical estimate 7/10 = 0.70 toward the prior mean 0.50, which is characteristic of Bayesian estimation. Frequentist methods, such as those used in Least Absolute Shrinkage and Selection Operator (LASSO) or ridge regression, also achieve shrinkage by including penalty terms, a concept known as penalization [26]. In general, shrinkage and penalization improve the estimation of unknown parameters and enhance prediction accuracy.
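The conjugate update described above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the standard library; the helper names (`update_beta`, `beta_cdf_int`, `beta_quantile_int`) are illustrative choices and not part of the original analysis. The quantile routine relies on the identity between the beta CDF with integer parameters and a binomial tail probability, so it applies only when a and b are positive integers, as they are in this example.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of beta(a, b) at x for positive integer a and b.

    Uses the identity I_x(a, b) = Pr(Bin(a + b - 1, x) >= a),
    which avoids any dependency beyond the standard library.
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_quantile_int(q, a, b, tol=1e-10):
    """Quantile of beta(a, b), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def update_beta(a, b, responses, n):
    """Conjugate update: beta(a, b) prior plus R responses in n patients."""
    return a + responses, b + n - responses


# Uniform beta(1, 1) prior updated with the pilot data: R = 7 of n = 10.
a_post, b_post = update_beta(1, 1, responses=7, n=10)
post_mean = a_post / (a_post + b_post)
ess = a_post + b_post
cri_low = beta_quantile_int(0.025, a_post, b_post)
cri_high = beta_quantile_int(0.975, a_post, b_post)

print(f"posterior: beta({a_post},{b_post}), ESS = {ess}")
print(f"posterior mean = {post_mean:.2f}")
print(f"95% CrI = {cri_low:.2f} to {cri_high:.2f}")
```

For non-integer beta parameters, a numerical library routine for the regularized incomplete beta function would be needed instead of the binomial identity.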

$Y_1, \ldots, Y_{10}$ for the pilot study and $Y_{11}, \ldots, Y_{60}$ for the second study. Assume also that the subjects of the second study are sampled randomly from the same population as those of the first pilot study, a strong assumption that will be further explored in later sections. Furthermore, assume that $Y_1, \ldots, Y_{60}$ are conditionally independent given θ. These assumptions allow the two Bayesian posterior computations described above to be executed in one step by treating $Y_1, \ldots, Y_{10}, Y_{11}, \ldots, Y_{60}$ as a single sample, assuming the first beta(1,1) prior, and directly obtaining the beta(28,34) posterior for θ in one step. If, instead, the second study were executed without observing the pilot study results, then it would be appropriate to use a uniform beta(1,1) prior, so 20 responses in 50 patients would lead to beta(1 + 20, 1 + 30) = beta(21,31) posterior. This has mean 21/(21 + 31) = 0.40 and 95% CrI 0.27–0.54 (Figure 3A,B). This is different from the posterior in Figure 2D because the two analyses begin with different priors, a beta(1,1) prior to seeing the pilot study results versus a beta(8,4) prior using the observed pilot study data. However, if the data from the pilot study are revealed afterward (Figure 3C,D), then the final posterior distribution will be the same as in Figure 2D. This is an example of the general fact that, if data are generated from the same distribution over time, then repeated application of Bayes’ Law is coherent in that it gives the same posterior that would be obtained if the sequence of datasets were observed in one study.
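The coherence property can be checked numerically with the counts quoted in the text (7 responses in 10 pilot patients, 20 responses in 50 patients in the second study). The sketch below is illustrative only; `update_beta` is a hypothetical helper name, and the check assumes, as the text does, that both studies sample the same population.

```python
def update_beta(a, b, responses, n):
    """Conjugate update of a beta(a, b) prior with R responses in n patients."""
    return a + responses, b + n - responses


PRIOR = (1, 1)    # uniform beta(1, 1) prior
PILOT = (7, 10)   # (responses, patients) in the pilot study
SECOND = (20, 50)  # (responses, patients) in the second study

# Pilot first, then second study (the posterior becomes the next prior).
after_pilot = update_beta(*PRIOR, *PILOT)
sequential = update_beta(*after_pilot, *SECOND)

# Second study first, pilot data revealed afterward.
second_only = update_beta(*PRIOR, *SECOND)
reverse_order = update_beta(*second_only, *PILOT)

# Both datasets pooled and analyzed in one step.
pooled = update_beta(*PRIOR, PILOT[0] + SECOND[0], PILOT[1] + SECOND[1])

print("pilot only:      beta", after_pilot)
print("second only:     beta", second_only)
print("sequential:      beta", sequential)
print("reverse order:   beta", reverse_order)
print("pooled one-step: beta", pooled)
```

All three routes to the final posterior give the same beta parameters, while the analysis that ignores the pilot data gives a different one.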

## 4. Confirmations and Refutations

$H_0$: HR = 1.0 of no treatment difference, no probability distribution is assumed for the parameter HR. A frequentist test compares the observed value $T^{\text{obs}}$ of a test statistic T to the distribution of T that would result from an infinite number of repetitions of the experiment that generates the data, assuming that $H_0$ is true. If $T^{\text{obs}}$ is very unlikely to be observed based on the distribution of T under $H_0$, this serves as refutational evidence against $H_0$ (Figure 4B). This can be quantified by a p-value, which is defined as $2 \times \Pr(T > |T^{\text{obs}}|)$ for a two-sided test, computed under $H_0$ and under specific model assumptions [35,36,37].
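As a minimal sketch of the two-sided p-value defined above, the function below assumes that the test statistic T follows a standard normal distribution when the null hypothesis is true. That distributional choice is an assumption made here for illustration; the text does not fix a particular null distribution, and the observed values passed in are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(t_obs):
    """Return 2 * Pr(T > |t_obs|), assuming T is standard normal under H0."""
    upper_tail = 1 - NormalDist().cdf(abs(t_obs))
    return 2 * upper_tail


# Hypothetical observed statistics, chosen only to show the behavior.
for t_obs in (0.0, 1.0, 1.96, 3.0):
    print(f"T_obs = {t_obs:4.2f} -> p = {two_sided_p_value(t_obs):.4f}")
```

The sign of the observed statistic does not matter for a two-sided test, which is why the absolute value is taken before the tail area is computed.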

$H_0$ [39]. A practical solution for this problem is to quantify the level of surprise provided by a p-value as refutational evidence against a given hypothesis in terms of bits of information, which are easy to interpret. This can be done by transforming a p-value into an S-value [12,40], defined as $S = -\log_2(p)$. Bearing in mind that a p-value is a statistic because it is computed from data, if $H_0$ is true then a p-value is uniformly distributed between 0 and 1. This implies that under $H_0$, a p-value has a mean of 1/2 and, for example, the probability that p < 0.05 is 0.05. The rationale for computing S is that the probability of observing all tails in S flips of a fair coin equals $(1/2)^S$, so $p = (1/2)^S$ gives S as a simple, intuitive way to quantify how surprising a p-value should be [12,41,42,43,44]. S represents the number of coin flips, typically rounded to the nearest integer. Suppose that an HR of 0.71 is observed and a p-value of 0.0016 is obtained against the null hypothesis of HR = 1.0. Since $-\log_2(0.0016) = 9.3$, rounding this to the nearest integer gives S = 9 bits of refutational information against the null hypothesis of HR = 1. This may be interpreted as the degree of surprise that we would have after observing all tails in 9 consecutive flips of a coin that we believe is fair. A larger S indicates greater surprise, which is stronger evidence to refute the belief that the coin is fair, which corresponds to the belief that $H_0$ is true. Thus, the surprise provided by an S-value is refutational for $H_0$. In this case, S = 9 quantifies the degree of surprise that should result from observing a p-value of 0.0016 if $H_0$ is true. We provide a simple calculator (Supplementary File S1) that can be used by clinicians to convert p-values to S-values. Of note, S-values quantify refutational information against the fairness of the coin tosses, which includes but is not limited to the possibility that the coin itself is biased towards tails. An unbiased coin can be tossed unfairly to result in all tails in S flips. For simplicity herein, our assertion that the coin is “fair” encompasses all these scenarios.
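The conversion from a p-value to an S-value is a one-line computation. The sketch below is not the calculator from Supplementary File S1; it is an independent illustration, and the function name `s_value` is a hypothetical choice.

```python
from math import log2


def s_value(p):
    """Convert a p-value into bits of refutational information, S = -log2(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in the interval (0, 1]")
    return -log2(p)


# The p-value quoted in the text for the observed HR of 0.71.
bits = s_value(0.0016)
coin_flips = round(bits)

print(f"S = {bits:.1f} bits, comparable to {coin_flips} tails in a row")
```

Because S is on a logarithmic scale, halving the p-value always adds exactly one bit, regardless of where the p-value started.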

… × $10^{-7}$), which corresponds to obtaining all tails in 22 tosses of a fair coin [45].

$H_0$, which is wrong because a null hypothesis can almost never be confirmed. In the example, this misinterpretation would say that there was no meaningful difference between zoledronic acid versus control. The correct interpretation is that there was no strong evidence against the claim of no difference between zoledronic acid versus control in time to the first SRE. Using the S-value, p = 0.39 supplies approximately 1 bit of information against the null hypothesis of no difference, which is equivalent to asserting that a coin is fair after tossing it only once. This is why a very large p-value, by itself, provides very little information [35].

$-\log_2(0.05) \approx 4$, assuming that the statistical model assumptions are correct. Thus, the data from the RCT suggest that HR values within the interval bounded by 0.57 and 0.87 are at most as surprising as seeing 4 tails in 4 fair coin tosses. Values lying outside this range have more than 4 bits of refutational information against them, and the point estimate HR of 0.71 is the value with the least refutational information against it. Similarly, frequentist 99% CIs correspond to a p-value threshold of 1 − 0.99 = 0.01 and thus contain values that have at most $-\log_2(0.01) \approx 7$ bits of refutational information against them, which is no more surprising than seeing 7 tails in 7 tosses of a fair coin. A number of recent reviews provide additional guidance on converting statistical outputs into intuitive information measures [11,12,36,37,40,48].
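The link between an interval and bits of information can be illustrated by computing an S-value for any hypothesized HR. The sketch below uses the HR estimate of 0.71 and the interval limits 0.57 and 0.87 quoted in the text. It assumes a normal approximation on the log(HR) scale and back-calculates the standard error from the interval limits; both steps are assumptions introduced here, and because the published figures are rounded, the bits at the two limits come out near 4 rather than exactly equal.

```python
from math import log, log2
from statistics import NormalDist

HR_ESTIMATE = 0.71            # point estimate quoted in the text
CI_LOW, CI_HIGH = 0.57, 0.87  # 95% interval limits quoted in the text

# Assumption: log(HR) is approximately normal, so the standard error can be
# back-calculated from the width of the 95% interval on the log scale.
z_95 = NormalDist().inv_cdf(0.975)
se_log_hr = (log(CI_HIGH) - log(CI_LOW)) / (2 * z_95)


def s_value_against(hr_hypothesized):
    """Bits of refutational information against a hypothesized HR value."""
    z = (log(HR_ESTIMATE) - log(hr_hypothesized)) / se_log_hr
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return log2(1 / p)


for hr in (0.57, 0.71, 0.87, 1.00):
    print(f"HR = {hr:.2f}: S = {s_value_against(hr):.1f} bits")

# Thresholds implied by the interval level itself.
bits_95 = -log2(0.05)  # about 4.3 bits
bits_99 = -log2(0.01)  # about 6.6 bits
print(f"95% interval: {bits_95:.1f} bits, 99% interval: {bits_99:.1f} bits")
```

The point estimate carries zero bits against it, values near the interval limits carry roughly 4 bits, and the null value HR = 1.0 carries considerably more.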

$H_0$ is false or that the assumed model does not fit the data well. For simplicity, hereafter we will follow the common convention used in medical RCTs of assuming that the model is adequate, and thus that a small p-value yields refutational evidence only against the tested hypothesis, which typically is the null hypothesis of no treatment difference.

## 5. Inferences and Decisions

… × $10^{-5}$, which corresponds to 15 bits of refutational information against the null hypothesis. On the other hand, the p-value of 0.013 reported by COSMIC-313 supplied only 6 bits of refutational information against the null hypothesis that the triplet combination of cabozantinib + nivolumab + ipilimumab yields the same average PFS as the control arm. Therefore, although both trials were considered to show a “positive” PFS signal using the same p-value cutoff of 0.05 based on prespecified decision-theoretic criteria, METEOR yielded more than twice the refutational information against its null hypothesis compared with COSMIC-313.

## 6. Pre Hoc and Post Hoc Power

$H_a$: HR = HR*), it is not used in the interpretation of a completed RCT. While there is typically only one null hypothesis, that the HR = 1.0, there are infinitely many potential alternative HR* values. Since a typical power computation is based on one arbitrary value for the alternative hypothesis and essentially is a device for computing sample size, most power computations have very little value and may be misleading after a trial has been completed. For example, the stated power of the CLEAR phase 3 RCT in metastatic ccRCC was determined based on the selected alternative value HR* of 0.714 for the primary endpoint of PFS, but upon completion of the trial, the estimated HR was 0.39 [66]. Post hoc power calculations conducted using the observed results after RCT completion are simply a re-expression of the observed p-value, and they provide no additional information [67]. This is why knowing the power of an RCT is useful during the design stage of the trial, mainly as a rationale for sample size, but has no value when analyzing the trial’s data. After the RCT is completed, the main interest for causal inferences lies in the uncertainty intervals of comparative parameters such as HRs or differences between means [35,67]. Due to the arbitrariness of HR*, it may be argued that a typical power computation is little more than a device to rationalize a computed sample size and that a plotted curve or table of power figures for a range of HR* values is much more honest and informative.
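A table of power across a range of HR* values, as suggested above, can be sketched with Schoenfeld's normal approximation for a two-sided log-rank test with 1:1 allocation. That approximation is a technique brought in here for illustration, not a method stated in the text, and the number of events (371) is a hypothetical value that does not reproduce the design of CLEAR or any other trial.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def logrank_power(hr_alt, n_events, alpha=0.05):
    """Approximate power of a two-sided log-rank test (Schoenfeld, 1:1 allocation).

    power ~ Phi( sqrt(d) / 2 * |log(HR*)| - z_{1 - alpha/2} ), with d events.
    The negligible probability of rejecting in the wrong direction is ignored.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(sqrt(n_events) / 2 * abs(log(hr_alt)) - z_alpha)


N_EVENTS = 371  # hypothetical number of events, chosen only for illustration

power_table = {
    hr_alt: logrank_power(hr_alt, N_EVENTS)
    for hr_alt in (0.50, 0.60, 0.714, 0.80, 0.90)
}

for hr_alt, power in power_table.items():
    print(f"HR* = {hr_alt:.3f}: power = {power:.2f}")
```

The same number of events gives very different power depending on which HR* is assumed, which is why a single quoted power figure says little once the trial has been completed.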

## 7. Variability and Uncertainty

$\sigma^2$, which is a parameter defined as the expected value of $(Y - \mu)^2$ for Y, a random variable of a population with mean μ. A population variance typically is estimated by a sample variance $s^2$, which is often used to compute the $\mathrm{SE} = s/\sqrt{n}$ of a sample mean. As the sample size increases, the SE decreases, and the CI for the mean becomes narrower [68,69,70].
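The relationship between sample size, SE, and CI width can be made concrete with a short calculation. In the sketch below, the sample standard deviation s = 2.0 is a hypothetical value held fixed across sample sizes, and the interval uses the large-sample normal approximation; for small samples a t-based interval would be more appropriate.

```python
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def standard_error(s, n):
    """Standard error of a sample mean: SE = s / sqrt(n)."""
    return s / sqrt(n)


def ci_width(s, n):
    """Full width of the normal-approximation 95% CI for the mean."""
    return 2 * Z_95 * standard_error(s, n)


SAMPLE_SD = 2.0  # hypothetical sample standard deviation

widths = {n: ci_width(SAMPLE_SD, n) for n in (25, 100, 400)}
for n, width in widths.items():
    print(f"n = {n:3d}: SE = {standard_error(SAMPLE_SD, n):.3f}, "
          f"95% CI width = {width:.3f}")
```

Because the SE scales with the square root of n, quadrupling the sample size only halves the width of the interval.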

## 8. Aleatory and Epistemic Probabilities

## 9. Random Sampling and Random Allocation

## 10. Comparative and Group-Specific Inferences

… × $10^{-5}$, yielding 14 bits of refutational information against the null hypothesis. Figure 7B,C shows two RCTs with large p-values for the relative treatment effect measured by the HR estimate. However, the RCT in Figure 7B shows a consistent signal of no meaningful effect size difference between the treatment and control groups, as can be determined by looking at the consistently narrow cumulative event curve difference represented by the shaded gray area. The RCTs in Figure 7A,B yielded informative signals as evidenced by the narrow cumulative event curve difference. Conversely, Figure 7C shows the results of an uninformative RCT. This low signal is evident from the wide cumulative event curve difference throughout the survival plot and is consistent with the wide 95% CI of 0.54–1.30 for the HR estimate. Therefore, no inferences can be made at any time point for the survival curves presented in Figure 7C. Readers inspecting the noisy data in Figure 7C may mistakenly conclude that there exists a signal of a survival difference favoring the treatment over the control group at the tail end of the curve from approximately 40 months onward. However, the wide cumulative event curve difference shows that the estimated curves from that time point onward are based almost exclusively on noise. This important visual information would be missed in survival plots that do not show the comparative uncertainty estimates for the differences between the RCT groups. In general, any Kaplan–Meier estimate becomes progressively less precise over time as the numbers at risk decrease, and at the tail end of the curve, it provides a much less reliable estimate due to the low numbers of patients followed at those time points. Indeed, only 21/159 = 13.2% of patients in the RCT shown in Figure 7C were in the risk set at 40 months.
It has been proposed, accordingly, to refrain from presenting survival plots after the time point where only around 10% to 20% of patients remain at risk of the failure event [94]. A key point is that if we are to make decisions regarding a test hypothesis, such as the null hypothesis, then the binary decision to either reject or accept is inadequate because it cannot distinguish between the two very different scenarios shown in Figure 7B,C. Instead, we can more appropriately use the trinary of “reject” (Figure 7A), “accept” (Figure 7B), or “inconclusive” (Figure 7C).

## 11. Blocking and Stratification

## 12. Forward and Reverse Causal Inference

$\mu_0$ and $\mu_1$, then the difference between the sample means is an unbiased estimator of the between-treatment effect $\Delta = \mu_1 - \mu_0$ for the population to which the sample corresponds. For example, if Y indicates response, then the sample average treatment effect is the difference between the two treatments’ estimated response probabilities, and the difference between the sample response probabilities follows a probability distribution with mean Δ; that is, it is unbiased. Assume that $h_1$ is the hazard function (event rate) for the treatment group and $h_0$ is the hazard function for the control group as previously described [1,2]. For $\mathrm{HR} = h_1/h_0$, the relative treatment effect may be written as $\Delta = \log(\mathrm{HR}) = \log(h_1) - \log(h_0)$, and the sample log(HR) provides an unbiased estimator of Δ. The key assumptions to ensure this are that (1) whichever treatment, X = 0 or 1, is given to a patient, the observed outcome must equal the potential outcome, Y = Y(X); (2) given any patient covariates, treatment choice is conditionally independent of the future potential outcomes (that is, one cannot see into the future); and (3) both treatments must be possible for the patient. In terms of a DAG (Figure 6B) [87], randomization removes any arrows from observed or unknown variables to treatment X, so the causal effect of X on Y cannot be confounded with the effects of any other variables. In particular, randomization removes the treatment decision from the physician or the patient, who would otherwise use the patient’s covariates or preferences to choose treatments (Figure 6A). An additional statistical tool is the central limit theorem (CLT), which says that, for a sufficiently large sample size, the distribution of the sample estimator is approximately normal with mean Δ and specified variance. This may be used to test hypotheses and compute uncertainty measures such as confidence intervals and p-values [120].
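The unbiasedness of the difference in sample response probabilities under randomization can be illustrated with a small simulation. Everything in the sketch below is hypothetical: the 200 patients, the response probabilities of 0.30 and 0.50 used to generate potential outcomes, and the 2000 repeated randomizations. Each patient has two fixed potential outcomes, only one of which is observed, and treatment is assigned by shuffling so that it cannot depend on those outcomes.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
N_PATIENTS = 200

# Hypothetical potential outcomes: response (1) or no response (0) that each
# patient would have under control and under treatment.
y_control = [int(rng.random() < 0.30) for _ in range(N_PATIENTS)]
y_treated = [int(rng.random() < 0.50) for _ in range(N_PATIENTS)]

# Average treatment effect in this sample of patients.
true_effect = (sum(y_treated) - sum(y_control)) / N_PATIENTS


def randomized_estimate():
    """Randomize half to each arm and return the difference in response rates."""
    ids = list(range(N_PATIENTS))
    rng.shuffle(ids)
    half = N_PATIENTS // 2
    treated, control = ids[:half], ids[half:]
    mean_treated = sum(y_treated[i] for i in treated) / len(treated)
    mean_control = sum(y_control[i] for i in control) / len(control)
    return mean_treated - mean_control


estimates = [randomized_estimate() for _ in range(2000)]
average_estimate = sum(estimates) / len(estimates)

print(f"sample average treatment effect: {true_effect:.3f}")
print(f"mean of randomized estimates:    {average_estimate:.3f}")
```

Any single randomized estimate can miss the target, but the estimates are centered on the treatment effect, which is what unbiasedness means.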

## 13. Generalizability and Transportability of Causal Effects

$\mu_0$ and $\mu_1$, such as mean survival times, may differ between the sample’s actual population and the target population, the between-treatment effect $\Delta = \mu_1 - \mu_0$ is the same for the two populations. Transportability from experimental subjects to future patients seen in the clinic who share relevant mechanistic causal properties is a standard scientific assumption [2,80,81,82]. For example, inferences from an RCT comparing therapies that target human epidermal growth factor receptor 2 (HER2) signaling in breast cancer may be transported if a patient seen in the clinic has breast cancer driven by HER2 signaling, despite the fact that the clinic population is otherwise completely separate in space and time from the sample enrolled in the RCT [2,82].

## 14. Representativeness and Inclusiveness

## 15. Relevance and Robustness

## 16. Intention to Treat and Per Protocol

## 17. Prognostic and Predictive Effects

## 18. Superiority and Noninferiority

## 19. Enthusiastic and Skeptical Priors

## 20. Intermediate Endpoints and Overall Survival

## 21. Synergy, Additivity, and Independence

_{50}), and efficacy, such as fractional cancer cell kill, to generate dose-response curves. However, commonly obtained dose-response data using survival outcomes in RCTs are often insufficient to determine additivity, synergy, or antagonism [209]. This highlights the need for careful dose-finding of therapy combinations in the early phases of development [4,139,216,217,218] and elucidation of patient-specific differences in drug pharmacokinetics and pharmacodynamics [219,220]. Tailored RCT designs such as factorial RCTs can be used to efficiently determine the contribution of each therapy by randomly allocating participants to receive neither, one or the other, or both interventions [8,221].
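The allocation scheme of a 2 × 2 factorial RCT can be sketched as follows. This is an illustrative, simplified balanced randomization with hypothetical names (`allocate_factorial`, therapies A and B, 40 participants); real trials typically use blocked or stratified schemes specified in the protocol.

```python
import random
from collections import Counter

# Each arm is a pair (receives therapy A, receives therapy B).
ARMS = [(False, False), (True, False), (False, True), (True, True)]


def allocate_factorial(participant_ids, seed=None):
    """Randomly allocate participants to the four arms in equal numbers."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    # After shuffling, dealing participants out in rotation gives balanced arms.
    return {pid: ARMS[k % len(ARMS)] for k, pid in enumerate(ids)}


allocation = allocate_factorial(range(40), seed=7)
arm_counts = Counter(allocation.values())
n_receiving_a = sum(1 for gets_a, _ in allocation.values() if gets_a)
n_receiving_b = sum(1 for _, gets_b in allocation.values() if gets_b)

print("participants per arm:", dict(arm_counts))
print("receiving A:", n_receiving_a, "| receiving B:", n_receiving_b)
```

Half of all participants receive each therapy, so every participant contributes to the comparison for both therapies, which is the source of the design's efficiency.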

## 22. Systematic and Random Biases

## 23. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Msaouel, P.; Lee, J.; Thall, P.F. Making Patient-Specific Treatment Decisions Using Prognostic Variables and Utilities of Clinical Outcomes. Cancers
**2021**, 13, 2741. [Google Scholar] [CrossRef] [PubMed] - Msaouel, P.; Lee, J.; Karam, J.A.; Thall, P.F. A Causal Framework for Making Individualized Treatment Decisions in Oncology. Cancers
**2022**, 14, 3923. [Google Scholar] [CrossRef] [PubMed] - Lee, J.; Thall, P.F.; Lim, B.; Msaouel, P. Utility-based Bayesian personalized treatment selection for advanced breast cancer. J. R. Stat. Soc. Ser. C Appl. Stat.
**2022**, 71, 1605–1622. [Google Scholar] [CrossRef] [PubMed] - Lee, J.; Thall, P.F.; Msaouel, P. Bayesian treatment screening and selection using subgroup-specific utilities of response and toxicity. Biometrics
**2022**, 79, 2458–2473. [Google Scholar] [CrossRef] - Marshall, I.J.; Nye, B.; Kuiper, J.; Noel-Storr, A.; Marshall, R.; Maclean, R.; Soboczenski, F.; Nenkova, A.; Thomas, J.; Wallace, B.C. Trialstreamer: A living, automatically updated database of clinical trial reports. J. Am. Med. Inform. Assoc.
**2020**, 27, 1903–1912. [Google Scholar] [CrossRef] - Kruskal, W.; Mosteller, F. Representative sampling, IV: The history of the concept in statistics, 1895–1939. Int. Stat. Rev./Rev. Int. De Stat.
**1980**, 48, 169–195. [Google Scholar] [CrossRef] - Kruskal, W.; Mosteller, F. Representative sampling, III: The current statistical literature. Int. Stat. Rev./Rev. Int. De Stat.
**1979**, 48, 245–265. [Google Scholar] [CrossRef] - Senn, S. Statistical Issues in Drug Development, 3rd ed.; John Wiley and Sons, Ltd.: Hoboken, NJ, USA, 2021. [Google Scholar]
- Greenland, S. For and Against Methodologies: Some Perspectives on Recent Causal and Statistical Inference Debates. Eur. J. Epidemiol.
**2017**, 32, 3–20. [Google Scholar] [CrossRef] - Greenland, S. Analysis goals, error-cost sensitivity, and analysis hacking: Essential considerations in hypothesis testing and multiple comparisons. Paediatr. Perinat. Epidemiol.
**2021**, 35, 8–23. [Google Scholar] [CrossRef] - Greenland, S.; Mansournia, M.A.; Joffe, M. To curb research misreporting, replace significance and confidence by compatibility: A Preventive Medicine Golden Jubilee article. Prev. Med.
**2022**, 164, 107127. [Google Scholar] [CrossRef] - Rafi, Z.; Greenland, S. Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise. BMC Med. Res. Methodol.
**2020**, 20, 244. [Google Scholar] [CrossRef] - Fisher, R.A. Design of experiments. Br. Med. J.
**1936**, 1, 554. [Google Scholar] [CrossRef] - Armitage, P. Fisher, Bradford Hill, and randomization. Int. J. Epidemiol.
**2003**, 32, 925–928, discussion 945–928. [Google Scholar] [CrossRef] [PubMed] - Preece, D.A. RA Fisher and Experimental Design: A Review. Biometrics
**1990**, 46, 925–935. [Google Scholar] [CrossRef] - Marks, H.M. Rigorous uncertainty: Why RA Fisher is important. Int. J. Epidemiol.
**2003**, 32, 932–937, discussion 945–938. [Google Scholar] [CrossRef] [PubMed] - Craiu, R.V.; Gong, R.; Meng, X.-L. Six Statistical Senses. Annu. Rev. Stat. Its Appl.
**2023**, 10, 699–725. [Google Scholar] [CrossRef] - Efron, B. Modern Science and the Bayesian-Frequentist Controversy; Division of Biostatistics, Stanford University: Stanford, CA, USA, 2005. [Google Scholar]
- Thall, P.F. Statistical Remedies for Medical Researchers; Springer International Publishing: New York, NY, USA, 2019. [Google Scholar]
- Gelman, A.; Simpson, D.; Betancourt, M. The Prior Can Often Only Be Understood in the Context of the Likelihood. Entropy
**2017**, 19, 555. [Google Scholar] [CrossRef] - Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; Taylor & Francis: Abingdon, UK, 2013. [Google Scholar]
- Msaouel, P.; Hong, A.L.; Mullen, E.A.; Atkins, M.B.; Walker, C.L.; Lee, C.H.; Carden, M.A.; Genovese, G.; Linehan, W.M.; Rao, P.; et al. Updated Recommendations on the Diagnosis, Management, and Clinical Trial Eligibility Criteria for Patients with Renal Medullary Carcinoma. Clin. Genitourin. Cancer
**2019**, 17, 1–6. [Google Scholar] [CrossRef] - Msaouel, P.; Malouf, G.G.; Su, X.; Yao, H.; Tripathi, D.N.; Soeung, M.; Gao, J.; Rao, P.; Coarfa, C.; Creighton, C.J.; et al. Comprehensive Molecular Characterization Identifies Distinct Genomic and Immune Hallmarks of Renal Medullary Carcinoma. Cancer Cell
**2020**, 37, 720–734.e713. [Google Scholar] [CrossRef] - Wiele, A.J.; Surasi, D.S.; Rao, P.; Sircar, K.; Su, X.; Bathala, T.K.; Shah, A.Y.; Jonasch, E.; Cataldo, V.D.; Genovese, G.; et al. Efficacy and Safety of Bevacizumab Plus Erlotinib in Patients with Renal Medullary Carcinoma. Cancers
**2021**, 13, 2170. [Google Scholar] [CrossRef] - Wilson, N.R.; Wiele, A.J.; Surasi, D.S.; Rao, P.; Sircar, K.; Tamboli, P.; Shah, A.Y.; Genovese, G.; Karam, J.A.; Wood, C.G.; et al. Efficacy and safety of gemcitabine plus doxorubicin in patients with renal medullary carcinoma. Clin. Genitourin. Cancer
**2021**, 19, e401–e408. [Google Scholar] [CrossRef] - Lyman, G.H.; Msaouel, P.; Kuderer, N.M. Risk Model Development and Validation in Clinical Oncology: Lessons Learned. Cancer Investig.
**2023**, 41, 1–11. [Google Scholar] [CrossRef] [PubMed] - Olsson, E.J. Bayesian Epistemology. In Introduction to Formal Philosophy; Hansson, S.O., Hendricks, V., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; pp. 431–442. [Google Scholar]
- Carnap, R. Testability and Meaning. Philos. Sci.
**1936**, 3, 419–471. [Google Scholar] [CrossRef] - van Zwet, E.; Schwab, S.; Greenland, S. Addressing exaggeration of effects from single RCTs. Significance
**2021**, 18, 16–21. [Google Scholar] [CrossRef] - van Zwet, E.; Schwab, S.; Senn, S. The statistical properties of RCTs and a proposal for shrinkage. Stat. Med.
**2021**, 40, 6107–6117. [Google Scholar] [CrossRef] - van Zwet, E.W.; Cator, E.A. The significance filter, the winner’s curse and the need to shrink. Stat. Neerl.
**2021**, 75, 437–452. [Google Scholar] [CrossRef] - Greenland, S. Probability logic and probabilistic induction. Epidemiology
**1998**, 9, 322–332. [Google Scholar] [CrossRef] [PubMed] - Greenland, S. Induction versus Popper: Substance versus semantics. Int. J. Epidemiol.
**1998**, 27, 543–548. [Google Scholar] [CrossRef] - Popper, K.R. Conjectures and Refutations: The Growth of Scientific Knowledge; Routledge and Kegan Paul: London, UK, 1963. [Google Scholar]
- Greenland, S.; Senn, S.J.; Rothman, K.J.; Carlin, J.B.; Poole, C.; Goodman, S.N.; Altman, D.G. Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. Eur. J. Epidemiol.
**2016**, 31, 337–350. [Google Scholar] [CrossRef] [PubMed] - Greenland, S. Divergence vs. Decision P-values: A Distinction Worth Making in Theory and Keeping in Practice—Or, How Divergence P-values Measure Evidence Even When Decision P-values Do Not. Scand. J. Stat.
**2023**, 50, 54–88. [Google Scholar] [CrossRef] - Cole, S.R.; Edwards, J.K.; Greenland, S. Surprise! Am. J. Epidemiol.
**2021**, 190, 191–193. [Google Scholar] [CrossRef] - McShane, B.B.; Gal, D. Statistical Significance and the Dichotomization of Evidence. J. Am. Stat. Assoc.
**2017**, 112, 885–895. [Google Scholar] [CrossRef] - Amrhein, V.; Greenland, S.; McShane, B. Scientists rise up against statistical significance. Nature
**2019**, 567, 305–307. [Google Scholar] [CrossRef] - Mansournia, M.A.; Nazemipour, M.; Etminan, M. P-value, compatibility, and S-value. Glob. Epidemiol.
**2022**, 4, 100085. [Google Scholar] [CrossRef] - Pearl, J. Bayesianism and Causality, or, Why I am Only a Half-Bayesian. In Foundations of Bayesianism; Corfield, D., Williamson, J., Eds.; Springer Netherlands: Dordrecht, The Netherlands, 2001; pp. 19–36. [Google Scholar]
- Carmona-Bayonas, A.; Jimenez-Fonseca, P.; Gallego, J.; Msaouel, P. Causal Considerations Can Inform the Interpretation of Surprising Associations in Medical Registries. Cancer Investig.
**2022**, 40, 1–13. [Google Scholar] [CrossRef] - Bareinboim, E.; Correa, J.D.; Ibeling, D.; Icard, T.F. On Pearl’s Hierarchy and the Foundations of Causal Inference. In Probabilistic and Causal Inference: The Works of Judea Pearl; ACM Books: New York, NY, USA, 2022; pp. 507–556. [Google Scholar]
- Greenland, S. The Causal Foundations of Applied Probability and Statistics. In Probabilistic and Causal Inference: The Works of Judea Pearl; Association for Computing Machinery: New York, NY, USA, 2022; Volume 36, pp. 605–624. [Google Scholar]
- Junk, T.R.; Lyons, L. Reproducibility and Replication of Experimental Particle Physics Results. arXiv
**2020**, arXiv:2009.06864. [Google Scholar] - Smith, M.R.; Halabi, S.; Ryan, C.J.; Hussain, A.; Vogelzang, N.; Stadler, W.; Hauke, R.J.; Monk, J.P.; Saylor, P.; Bhoopalam, N.; et al. Randomized controlled trial of early zoledronic acid in men with castration-sensitive prostate cancer and bone metastases: Results of CALGB 90202 (alliance). J. Clin. Oncol.
**2014**, 32, 1143–1150. [Google Scholar] [CrossRef] [PubMed] - Morey, R.D.; Hoekstra, R.; Rouder, J.N.; Lee, M.D.; Wagenmakers, E.-J. The fallacy of placing confidence in confidence intervals. Psychon. Bull. Rev.
**2016**, 23, 103–123. [Google Scholar] [CrossRef] [PubMed] - Amrhein, V.; Trafimow, D.; Greenland, S. Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don’t Expect Replication. Am. Stat.
**2019**, 73, 262–270. [Google Scholar] [CrossRef] - Greenland, S. Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values. Am. Stat.
**2019**, 73, 106–114. [Google Scholar] [CrossRef] - Royall, R. On the Probability of Observing Misleading Statistical Evidence. J. Am. Stat. Assoc.
**2000**, 95, 760–768. [Google Scholar] [CrossRef] - Xie, M.-G.; Singh, K. Confidence Distribution, the Frequentist Distribution Estimator of a Parameter: A Review. Int. Stat. Rev.
**2013**, 81, 3–39. [Google Scholar] [CrossRef] - Meng, X.-L. Double Your Variance, Dirtify Your Bayes, Devour Your Pufferfish, and Draw your Kidstrogram. N. Engl. J. Stat. Data Sci.
**2022**, 1, 4–23. [Google Scholar] [CrossRef] - Efron, B.; Hastie, T. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science; Cambridge University Press: New York, NY, USA, 2016; p. xix. 475p. [Google Scholar]
- Choueiri, T.K.; Tomczak, P.; Park, S.H.; Venugopal, B.; Ferguson, T.; Chang, Y.H.; Hajek, J.; Symeonides, S.N.; Lee, J.L.; Sarwar, N.; et al. Adjuvant Pembrolizumab after Nephrectomy in Renal-Cell Carcinoma. N. Engl. J. Med.
**2021**, 385, 683–694. [Google Scholar] [CrossRef] [PubMed] - Msaouel, P.; Jimenez-Fonseca, P.; Lim, B.; Carmona-Bayonas, A.; Agnelli, G. Medicine before and after David Cox. Eur. J. Intern. Med.
**2022**, 98, 1–3. [Google Scholar] [CrossRef] [PubMed] - Greenland, S. Bayesian perspectives for epidemiological research: I. Foundations and basic methods. Int. J. Epidemiol.
**2006**, 35, 765–775. [Google Scholar] [CrossRef] - Gelman, A.; Hill, J.; Vehtari, A. Regression and Other Stories; Cambridge University Press: Cambridge, UK, 2020. [Google Scholar]
- Ioannidis, J.P. Why most discovered true associations are inflated. Epidemiology
**2008**, 19, 640–648. [Google Scholar] [CrossRef] - Greenland, S.; Hofman, A. Multiple comparisons controversies are about context and costs, not frequentism versus Bayesianism. Eur. J. Epidemiol.
**2019**, 34, 801–808. [Google Scholar] [CrossRef] - Senn, S. You May Believe You Are a Bayesian But You Are Probably Wrong. Ration. Mark. Morals
**2011**, 2, 42. [Google Scholar] - Strevens, M. The Knowledge Machine: How irrationality Created Modern Science, 1st ed.; Liveright Publishing Corporation: New York, NY, USA, 2020; p. x. 350p. [Google Scholar]
- Choueiri, T.K.; Escudier, B.; Powles, T.; Mainwaring, P.N.; Rini, B.I.; Donskov, F.; Hammers, H.; Hutson, T.E.; Lee, J.L.; Peltola, K.; et al. Cabozantinib versus Everolimus in Advanced Renal-Cell Carcinoma. N. Engl. J. Med. **2015**, 373, 1814–1823.
- Msaouel, P. Less is More? First Impressions From COSMIC-313. Cancer Investig. **2023**, 41, 101–106.
- Choueiri, T.K.; Powles, T.; Albiges, L.; Burotto, M.; Szczylik, C.; Zurawski, B.; Yanez Ruiz, E.; Maruzzo, M.; Suarez Zaizar, A.; Fein, L.E.; et al. Cabozantinib plus Nivolumab and Ipilimumab in Renal-Cell Carcinoma. N. Engl. J. Med. **2023**, 388, 1767–1778.
- Altman, D.G.; Bland, J.M. How to obtain the confidence interval from a P value. BMJ **2011**, 343, d2090.
- Motzer, R.; Alekseev, B.; Rha, S.Y.; Porta, C.; Eto, M.; Powles, T.; Grunwald, V.; Hutson, T.E.; Kopyltsov, E.; Mendez-Vidal, M.J.; et al. Lenvatinib plus Pembrolizumab or Everolimus for Advanced Renal Cell Carcinoma. N. Engl. J. Med. **2021**, 384, 1289–1300.
- Hoenig, J.M.; Heisey, D.M. The Abuse of Power. Am. Stat. **2001**, 55, 19–24.
- Msaouel, P. The Big Data Paradox in Clinical Practice. Cancer Investig. **2022**, 40, 567–576.
- Searle, S.R.; Casella, G.; McCulloch, C.E. Variance Components; Wiley: New York, NY, USA, 1992; p. xxiii. 501p.
- Greenland, S. Principles of multilevel modelling. Int. J. Epidemiol. **2000**, 29, 158–167.
- Greenland, S.; Robins, J.M. Identifiability, exchangeability and confounding revisited. Epidemiol. Perspect. Innov. **2009**, 6, 4.
- Cornfield, J. Recent methodological contributions to clinical trials. Am. J. Epidemiol. **1976**, 104, 408–421.
- Gelman, A. The Boxer, the Wrestler, and the Coin Flip. Am. Stat. **2006**, 60, 146–150.
- Stark, P.B. Pay No Attention to the Model Behind the Curtain. Pure Appl. Geophys. **2022**, 179, 4121–4145.
- Hall, N.S. RA Fisher and his advocacy of randomization. J. Hist. Biol. **2007**, 40, 295–325.
- Ludbrook, J.; Dudley, H. Issues in biomedical statistics: Statistical inference. Aust. N. Z. J. Surg. **1994**, 64, 630–636.
- Shapiro, D.D.; Msaouel, P. Causal Diagram Techniques for Urologic Oncology Research. Clin. Genitourin. Cancer **2021**, 19, 271.e1–271.e7.
- Lipsky, A.M.; Greenland, S. Causal Directed Acyclic Graphs. JAMA **2022**, 327, 1083–1084.
- Greenland, S.; Pearl, J.; Robins, J.M. Causal diagrams for epidemiologic research. Epidemiology **1999**, 10, 37–48.
- Bareinboim, E.; Pearl, J. Causal inference and the data-fusion problem. Proc. Natl. Acad. Sci. USA **2016**, 113, 7345–7352.
- Bareinboim, E.; Pearl, J. Transportability of Causal Effects: Completeness Results. Proc. AAAI Conf. Artif. Intell. **2021**, 26, 698–704.
- Msaouel, P. Impervious to Randomness: Confounding and Selection Biases in Randomized Clinical Trials. Cancer Investig. **2021**, 39, 783–788.
- Correa, J.; Tian, J.; Bareinboim, E. Adjustment criteria for generalizing experimental findings. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 1361–1369.
- Bareinboim, E.; Pearl, J. Controlling Selection Bias in Causal Inference. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, La Palma, Canary Islands, Spain, 21–23 April 2012; pp. 100–108.
- Hernan, M.A.; Hernandez-Diaz, S.; Robins, J.M. A structural approach to selection bias. Epidemiology **2004**, 15, 615–625.
- Lu, H.; Cole, S.R.; Howe, C.J.; Westreich, D. Toward a Clearer Definition of Selection Bias When Estimating Causal Effects. Epidemiology **2022**, 33, 699–706.
- Greenland, S. Randomization, statistics, and causal inference. Epidemiology **1990**, 1, 421–429.
- Senn, S.J.; Auclair, P. The graphical representation of clinical trials with particular reference to measurements over time. Stat. Med. **1990**, 9, 1287–1302.
- Senn, S. Controversies concerning randomization and additivity in clinical trials. Stat. Med. **2004**, 23, 3729–3753.
- Albiges, L.; Tannir, N.M.; Burotto, M.; McDermott, D.; Plimack, E.R.; Barthelemy, P.; Porta, C.; Powles, T.; Donskov, F.; George, S.; et al. First-line Nivolumab plus Ipilimumab Versus Sunitinib in Patients Without Nephrectomy and With an Evaluable Primary Renal Tumor in the CheckMate 214 Trial. Eur. Urol. **2022**, 81, 266–271.
- Motzer, R.J.; Tannir, N.M.; McDermott, D.F.; Aren Frontera, O.; Melichar, B.; Choueiri, T.K.; Plimack, E.R.; Barthelemy, P.; Porta, C.; George, S.; et al. Nivolumab plus Ipilimumab versus Sunitinib in Advanced Renal-Cell Carcinoma. N. Engl. J. Med. **2018**, 378, 1277–1290.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018.
- Vickers, A.J.; Sjoberg, D.D. Methods Modernizing Statistical Reporting in Medical Journals: Challenges and Future Directions. Eur. Urol. **2022**, 82, 575–577.
- Pocock, S.J.; Clayton, T.C.; Altman, D.G. Survival plots of time-to-event outcomes in clinical trials: Good practice and pitfalls. Lancet **2002**, 359, 1686–1689.
- Laupacis, A.; Sackett, D.L.; Roberts, R.S. An assessment of clinically useful measures of the consequences of treatment. N. Engl. J. Med. **1988**, 318, 1728–1733.
- Hutton, J.L. Number needed to treat: Properties and problems. J. R. Stat. Soc. Ser. A Stat. Soc. **2000**, 163, 381–402.
- Hutton, J.L. Number needed to treat and number needed to harm are not the best way to report and assess the results of randomised clinical trials. Br. J. Haematol. **2009**, 146, 27–30.
- Hutton, J.L. Misleading Statistics. Pharm. Med. **2010**, 24, 145–149.
- Senn, S. Mastering variation: Variance components and personalised medicine. Stat. Med. **2016**, 35, 966–977.
- Senn, S. Testing for baseline balance in clinical trials. Stat. Med. **1994**, 13, 1715–1726.
- Senn, S. Seven myths of randomisation in clinical trials. Stat. Med. **2013**, 32, 1439–1450.
- Pijls, B.G. The Table I Fallacy: P Values in Baseline Tables of Randomized Controlled Trials. J. Bone Joint. Surg. Am. **2022**, 104, e71.
- Elwert, F.; Winship, C. Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable. Annu. Rev. Sociol. **2014**, 40, 31–53.
- Pocock, S.J.; Simon, R. Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics **1975**, 31, 103–115.
- Taves, D.R. Minimization: A new method of assigning patients to treatment and control groups. Clin. Pharmacol. Ther. **1974**, 15, 443–453.
- Proschan, M.; Brittain, E.; Kammerman, L. Minimize the use of minimization with unequal allocation. Biometrics **2011**, 67, 1135–1141.
- Pond, G.R. Statistical issues in the use of dynamic allocation methods for balancing baseline covariates. Br. J. Cancer **2011**, 104, 1711–1715.
- Hasegawa, T.; Tango, T. Permutation test following covariate-adaptive randomization in randomized controlled trials. J. Biopharm. Stat. **2009**, 19, 106–119.
- Friedman, L.M.; DeMets, D.L.; Furberg, C.D.; Granger, C.B.; Reboussin, D.M. Fundamentals of Clinical Trials; Springer: Berlin/Heidelberg, Germany, 2015.
- Greenland, S. On the Logical Justification of Conditional Tests for Two-By-Two Contingency Tables. Am. Stat. **1991**, 45, 248–251.
- Holmberg, M.J.; Andersen, L.W. Adjustment for Baseline Characteristics in Randomized Clinical Trials. JAMA **2022**, 328, 2155–2156.
- Harrell, J.F.E. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis; Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2015.
- Greenland, S.; Pearl, J.; Robins, J.M. Confounding and Collapsibility in Causal Inference. Stat. Sci. **1999**, 14, 29–46, 18.
- Hernan, M.A. A definition of causal effect for epidemiological research. J. Epidemiol. Community Health **2004**, 58, 265–271.
- Holland, P.W. Statistics and Causal Inference. J. Am. Stat. Assoc. **1986**, 81, 945–960.
- Russell, B. On the notion of cause. In Proceedings of the Aristotelian Society; Oxford University Press: Oxford, UK, 1912; Volume 13, pp. 1–26.
- Gelman, A.; Imbens, G. Why Ask Why? Forward Causal Inference and Reverse Causal Questions; National Bureau of Economic Research: Cambridge, MA, USA, 2013.
- Rubin, D.B. Causal Inference Using Potential Outcomes. J. Am. Stat. Assoc. **2005**, 100, 322–331.
- Pearl, J.; Bareinboim, E. Note on “Generalizability of Study Results”. Epidemiology **2019**, 30, 186–188.
- Brooks, D. The Sampling Distribution and Central Limit Theorem; CreateSpace Independent Publishing Platform: Scotts Valley, CA, USA, 2012.
- Degtiar, I.; Rose, S. A Review of Generalizability and Transportability. Annu. Rev. Stat. Its Appl. **2023**, 10, 501–524.
- Dahabreh, I.J.; Robertson, S.E.; Steingrimsson, J.A.; Stuart, E.A.; Hernan, M.A. Extending inferences from a randomized trial to a new target population. Stat. Med. **2020**, 39, 1999–2014.
- Dahabreh, I.J.; Hernan, M.A. Extending inferences from a randomized trial to a target population. Eur. J. Epidemiol. **2019**, 34, 719–722.
- Campbell, D.T. Factors relevant to the validity of experiments in social settings. Psychol. Bull. **1957**, 54, 297–312.
- Findley, M.G.; Kikuta, K.; Denly, M. External Validity. Annu. Rev. Political Sci. **2021**, 24, 365–393.
- Rothman, K.J.; Gallacher, J.E.; Hatch, E.E. Why representativeness should be avoided. Int. J. Epidemiol. **2013**, 42, 1012–1014.
- Richiardi, L.; Pizzi, C.; Pearce, N. Commentary: Representativeness is usually not necessary and often should be avoided. Int. J. Epidemiol. **2013**, 42, 1018–1022.
- Ebrahim, S.; Davey Smith, G. Commentary: Should we always deliberately be non-representative? Int. J. Epidemiol. **2013**, 42, 1022–1026.
- Rothman, K.J.; Gallacher, J.E.; Hatch, E.E. Rebuttal: When it comes to scientific inference, sometimes a cigar is just a cigar. Int. J. Epidemiol. **2013**, 42, 1026–1028.
- Bradburn, M.J.; Lee, E.C.; White, D.A.; Hind, D.; Waugh, N.R.; Cooke, D.D.; Hopkins, D.; Mansell, P.; Heller, S.R. Treatment effects may remain the same even when trial participants differed from the target population. J. Clin. Epidemiol. **2020**, 124, 126–138.
- Brookes, S.T.; Whitely, E.; Egger, M.; Smith, G.D.; Mulheran, P.A.; Peters, T.J. Subgroup analyses in randomized trials: Risks of subgroup-specific analyses; power and sample size for the interaction test. J. Clin. Epidemiol. **2004**, 57, 229–236.
- Wallington, S.F.; Dash, C.; Sheppard, V.B.; Goode, T.D.; Oppong, B.A.; Dodson, E.E.; Hamilton, R.N.; Adams-Campbell, L.L. Enrolling Minority and Underserved Populations in Cancer Clinical Research. Am. J. Prev. Med. **2016**, 50, 111–117.
- Schmotzer, G.L. Barriers and facilitators to participation of minorities in clinical trials. Ethn. Dis. **2012**, 22, 226–230.
- Behring, M.; Hale, K.; Ozaydin, B.; Grizzle, W.E.; Sodeke, S.O.; Manne, U. Inclusiveness and ethical considerations for observational, translational, and clinical cancer health disparity research. Cancer **2019**, 125, 4452–4461.
- Shlomo, N.; Skinner, C.; Schouten, B. Estimation of an indicator of the representativeness of survey response. J. Stat. Plan. Inference **2012**, 142, 201–211.
- Messiah, A.; Castro, G.; de la Vega, P.R.; Acuna, J.M. Random sample community-based health surveys: Does the effort to reach participants matter? BMJ Open **2014**, 4, e005791.
- Apolo, A.B.; Msaouel, P.; Niglio, S.; Simon, N.; Chandran, E.; Maskens, D.; Perez, G.; Ballman, K.V.; Weinstock, C. Evolving Role of Adjuvant Systemic Therapy for Kidney and Urothelial Cancers. Am. Soc. Clin. Oncol. Educ. Book **2022**, 42, 1–16.
- Liu, K.; Meng, X.-L. There Is Individualized Treatment. Why Not Individualized Inference? Annu. Rev. Stat. Its Appl. **2016**, 3, 79–111.
- Lee, J.; Thall, P.F.; Msaouel, P. Precision Bayesian phase I-II dose-finding based on utilities tailored to prognostic subgroups. Stat. Med. **2021**, 40, 5199–5217.
- Kaelin, W.G.J. Common pitfalls in preclinical cancer target validation. Nat. Rev. Cancer **2017**, 17, 425–440.
- Rubin, D. Interview with Don Rubin. Obs. Stud. **2022**, 8, 77–94.
- Greenland, S. An introduction to instrumental variables for epidemiologists. Int. J. Epidemiol. **2018**, 47, 358.
- Mansournia, M.A.; Higgins, J.P.; Sterne, J.A.; Hernan, M.A. Biases in Randomized Trials: A Conversation Between Trialists and Epidemiologists. Epidemiology **2017**, 28, 54–59.
- Bretthauer, M.; Loberg, M.; Wieszczy, P.; Kalager, M.; Emilsson, L.; Garborg, K.; Rupinski, M.; Dekker, E.; Spaander, M.; Bugajski, M.; et al. Effect of Colonoscopy Screening on Risks of Colorectal Cancer and Related Death. N. Engl. J. Med. **2022**, 387, 1547–1556.
- Rudolph, J.E.; Naimi, A.I.; Westreich, D.J.; Kennedy, E.H.; Schisterman, E.F. Defining and Identifying Per-protocol Effects in Randomized Trials. Epidemiology **2020**, 31, 692–694.
- Kent, D.M.; Paulus, J.K.; van Klaveren, D.; D’Agostino, R.; Goodman, S.; Hayward, R.; Ioannidis, J.P.A.; Patrick-Lake, B.; Morton, S.; Pencina, M.; et al. The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement. Ann. Intern. Med. **2020**, 172, 35–45.
- Greenland, S. Effect Modification and Interaction. In Wiley StatsRef: Statistics Reference Online; Wiley Online Library: Hoboken, NJ, USA, 2014; pp. 1–5.
- Cuzick, J. Prognosis vs. Treatment Interaction. JNCI Cancer Spectr. **2018**, 2, pky006.
- Slamon, D.J.; Clark, G.M.; Wong, S.G.; Levin, W.J.; Ullrich, A.; McGuire, W.L. Human breast cancer: Correlation of relapse and survival with amplification of the HER-2/neu oncogene. Science **1987**, 235, 177–182.
- Slamon, D.J.; Godolphin, W.; Jones, L.A.; Holt, J.A.; Wong, S.G.; Keith, D.E.; Levin, W.J.; Stuart, S.G.; Udove, J.; Ullrich, A.; et al. Studies of the HER-2/neu proto-oncogene in human breast and ovarian cancer. Science **1989**, 244, 707–712.
- Cooke, T.; Reeves, J.; Lanigan, A.; Stanton, P. HER2 as a prognostic and predictive marker for breast cancer. Ann. Oncol. **2001**, 12 (Suppl. S1), S23–S28.
- Hayes, D.F. HER2 and Breast Cancer—A Phenomenal Success Story. N. Engl. J. Med. **2019**, 381, 1284–1286.
- Wang, X.; Zhou, J.; Wang, T.; George, S.L. On Enrichment Strategies for Biomarker Stratified Clinical Trials. J. Biopharm. Stat. **2018**, 28, 292–308.
- Thall, P.F. Adaptive Enrichment Designs in Clinical Trials. Annu. Rev. Stat. Appl. **2021**, 8, 393–411.
- Park, Y.; Liu, S.; Thall, P.F.; Yuan, Y. Bayesian group sequential enrichment designs based on adaptive regression of response and survival time on baseline biomarkers. Biometrics **2022**, 78, 60–71.
- Hahn, A.W.; Dizman, N.; Msaouel, P. Missing the trees for the forest: Most subgroup analyses using forest plots at the ASCO annual meeting are inconclusive. Ther. Adv. Med. Oncol. **2022**, 14, 17588359221103199.
- Heng, D.Y.; Xie, W.; Regan, M.M.; Harshman, L.C.; Bjarnason, G.A.; Vaishampayan, U.N.; Mackenzie, M.; Wood, L.; Donskov, F.; Tan, M.H.; et al. External validation and comparison with other models of the International Metastatic Renal-Cell Carcinoma Database Consortium prognostic model: A population-based study. Lancet Oncol. **2013**, 14, 141–148.
- Harrington, D.; D’Agostino, R.B.S.; Gatsonis, C.; Hogan, J.W.; Hunter, D.J.; Normand, S.T.; Drazen, J.M.; Hamel, M.B. New Guidelines for Statistical Reporting in the Journal. N. Engl. J. Med. **2019**, 381, 285–286.
- Kent, D.M.; Steyerberg, E.; van Klaveren, D. Personalized evidence based medicine: Predictive approaches to heterogeneous treatment effects. BMJ **2018**, 363, k4245.
- Schuirmann, D.J. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokinet. Biopharm. **1987**, 15, 657–680.
- Gauthier, J.; Wu, Q.V.; Gooley, T.A. Cubic splines to model relationships between continuous variables and outcomes: A guide for clinicians. Bone Marrow Transpl. **2020**, 55, 675–680.
- Dickler, M.N.; Barry, W.T.; Cirrincione, C.T.; Ellis, M.J.; Moynahan, M.E.; Innocenti, F.; Hurria, A.; Rugo, H.S.; Lake, D.E.; Hahn, O.; et al. Phase III Trial Evaluating Letrozole As First-Line Endocrine Therapy With or Without Bevacizumab for the Treatment of Postmenopausal Women With Hormone Receptor-Positive Advanced-Stage Breast Cancer: CALGB 40503 (Alliance). J. Clin. Oncol. **2016**, 34, 2602–2609.
- Birtle, A.; Johnson, M.; Chester, J.; Jones, R.; Dolling, D.; Bryan, R.T.; Harris, C.; Winterbottom, A.; Blacker, A.; Catto, J.W.F.; et al. Adjuvant chemotherapy in upper tract urothelial carcinoma (the POUT trial): A phase 3, open-label, randomised controlled trial. Lancet **2020**, 395, 1268–1277.
- Cuzick, J. Forest plots and the interpretation of subgroups. Lancet **2005**, 365, 1308.
- Pfeffer, M.A.; McMurray, J.J.; Velazquez, E.J.; Rouleau, J.L.; Kober, L.; Maggioni, A.P.; Solomon, S.D.; Swedberg, K.; Van de Werf, F.; White, H.; et al. Valsartan, captopril, or both in myocardial infarction complicated by heart failure, left ventricular dysfunction, or both. N. Engl. J. Med. **2003**, 349, 1893–1906.
- Blume, J.D.; D’Agostino McGowan, L.; Dupont, W.D.; Greevy, R.A.J. Second-generation p-values: Improved rigor, reproducibility, & transparency in statistical analyses. PLoS ONE **2018**, 13, e0188299.
- Wang, Y.; Zhang, D.; Du, G.; Du, R.; Zhao, J.; Jin, Y.; Fu, S.; Gao, L.; Cheng, Z.; Lu, Q.; et al. Remdesivir in adults with severe COVID-19: A randomised, double-blind, placebo-controlled, multicentre trial. Lancet **2020**, 395, 1569–1578.
- DeMets, D.L.; Cook, T. Challenges of Non-Intention-to-Treat Analyses. JAMA **2019**, 321, 145–146.
- Mauri, L.; D’Agostino, R.B.S. Challenges in the Design and Interpretation of Noninferiority Trials. N. Engl. J. Med. **2017**, 377, 1357–1367.
- Soonawala, D.; Dekkers, O.M.; Vandenbroucke, J.P.; Egger, M. Noninferiority is (too) common in noninferiority trials. J. Clin. Epidemiol. **2016**, 71, 118–120.
- Flacco, M.E.; Manzoli, L.; Ioannidis, J.P. Noninferiority is almost certain with lenient noninferiority margins. J. Clin. Epidemiol. **2016**, 71, 118.
- Zampieri, F.G.; Casey, J.D.; Shankar-Hari, M.; Harrell, F.E.J.; Harhay, M.O. Using Bayesian Methods to Augment the Interpretation of Critical Care Trials. An Overview of Theory and Example Reanalysis of the Alveolar Recruitment for Acute Respiratory Distress Syndrome Trial. Am. J. Respir. Crit. Care Med. **2021**, 203, 543–552.
- Spiegelhalter, D.J.; Freedman, L.S.; Mahesh, K.B.P. Bayesian Approaches to Randomized Trials. J. R. Stat. Soc. Ser. A Stat. Soc. **1994**, 157, 357–416.
- Ruberg, S.J.; Beckers, F.; Hemmings, R.; Honig, P.; Irony, T.; LaVange, L.; Lieberman, G.; Mayne, J.; Moscicki, R. Application of Bayesian approaches in drug development: Starting a virtuous cycle. Nat. Rev. Drug Discov. **2023**, 22, 235–250.
- Combes, A.; Hajage, D.; Capellier, G.; Demoule, A.; Lavoue, S.; Guervilly, C.; Da Silva, D.; Zafrani, L.; Tirot, P.; Veber, B.; et al. Extracorporeal Membrane Oxygenation for Severe Acute Respiratory Distress Syndrome. N. Engl. J. Med. **2018**, 378, 1965–1975.
- Harrington, D.; Drazen, J.M. Learning from a Trial Stopped by a Data and Safety Monitoring Board. N. Engl. J. Med. **2018**, 378, 2031–2032.
- Goligher, E.C.; Tomlinson, G.; Hajage, D.; Wijeysundera, D.N.; Fan, E.; Juni, P.; Brodie, D.; Slutsky, A.S.; Combes, A. Extracorporeal Membrane Oxygenation for Severe Acute Respiratory Distress Syndrome and Posterior Probability of Mortality Benefit in a Post Hoc Bayesian Analysis of a Randomized Clinical Trial. JAMA **2018**, 320, 2251–2259.
- Weir, C.J.; Taylor, R.S. Informed decision-making: Statistical methodology for surrogacy evaluation and its role in licensing and reimbursement assessments. Pharm. Stat. **2022**, 21, 740–756.
- Ionan, A.C.; Paterniti, M.; Mehrotra, D.V.; Scott, J.; Ratitch, B.; Collins, S.; Gomatam, S.; Nie, L.; Rufibach, K.; Bretz, F. Clinical and Statistical Perspectives on the ICH E9(R1) Estimand Framework Implementation. Stat. Biopharm. Res. **2022**, 15, 554–559.
- Mayo, S.; Kim, Y. What Can Be Achieved with the Estimand Framework? Stat. Biopharm. Res. **2023**, 15, 549–553.
- Korn, E.L.; Freidlin, B.; Abrams, J.S. Overall survival as the outcome for randomized clinical trials with effective subsequent therapies. J. Clin. Oncol. **2011**, 29, 2439–2442.
- Stewart, D.J. Before we throw out progression-free survival as a valid end point. J. Clin. Oncol. **2012**, 30, 3426–3427.
- Booth, C.M.; Eisenhauer, E.A. Progression-free survival: Meaningful or simply measurable? J. Clin. Oncol. **2012**, 30, 1030–1033.
- Anderson, K.C.; Kyle, R.A.; Rajkumar, S.V.; Stewart, A.K.; Weber, D.; Richardson, P. Clinically relevant end points and new drug approvals for myeloma. Leukemia **2008**, 22, 231–239.
- Hussain, M.; Goldman, B.; Tangen, C.; Higano, C.S.; Petrylak, D.P.; Wilding, G.; Akdas, A.M.; Small, E.J.; Donnelly, B.J.; Sundram, S.K.; et al. Prostate-specific antigen progression predicts overall survival in patients with metastatic prostate cancer: Data from Southwest Oncology Group Trials 9346 (Intergroup Study 0162) and 9916. J. Clin. Oncol. **2009**, 27, 2450–2456.
- Bashir, Q.; Thall, P.F.; Milton, D.R.; Fox, P.S.; Kawedia, J.D.; Kebriaei, P.; Shah, N.; Patel, K.; Andersson, B.S.; Nieto, Y.L.; et al. Conditioning with busulfan plus melphalan versus melphalan alone before autologous haemopoietic cell transplantation for multiple myeloma: An open-label, randomised, phase 3 trial. Lancet Haematol. **2019**, 6, e266–e275.
- Thall, P.F.; Millikan, R.E.; Sung, H.G. Evaluating multiple treatment courses in clinical trials. Stat. Med. **2000**, 19, 1011–1028.
- Chakraborty, B.; Moodie, E.E.M. Statistical Methods for Dynamic Treatment Regimes: Reinforcement Learning, Causal Inference, and Personalized Medicine; Springer: New York, NY, USA, 2013.
- Tsiatis, A.A. Dynamic Treatment Regimes: Statistical Methods for Precision Medicine; CRC Press: Boca Raton, FL, USA; Taylor & Francis Group: Abingdon, UK, 2020.
- Wang, X.; Chakraborty, B. The Sequential Multiple Assignment Randomized Trial for Controlling Infectious Diseases: A Review of Recent Developments. Am. J. Public Health **2023**, 113, 49–59.
- Murphy, S.A. An experimental design for the development of adaptive treatment strategies. Stat. Med. **2005**, 24, 1455–1481.
- Almirall, D.; Lizotte, D.J.; Murphy, S.A. SMART Design Issues and the Consideration of Opposing Outcomes: Discussion of “Evaluation of Viable Dynamic Treatment Regimes in a Sequentially Randomized Trial of Advanced Prostate Cancer” by Wang, Rotnitzky, Lin, Millikan, and Thall. J. Am. Stat. Assoc. **2012**, 107, 509–512.
- Almirall, D.; Nahum-Shani, I.; Sherwood, N.E.; Murphy, S.A. Introduction to SMART designs for the development of adaptive interventions: With application to weight loss research. Transl. Behav. Med. **2014**, 4, 260–274.
- Motzer, R.J.; Jonasch, E.; Agarwal, N.; Alva, A.; Baine, M.; Beckermann, K.; Carlo, M.I.; Choueiri, T.K.; Costello, B.A.; Derweesh, I.H.; et al. Kidney Cancer, Version 3.2022, NCCN Clinical Practice Guidelines in Oncology. J. Natl. Compr. Cancer Netw. **2022**, 20, 71–90.
- Chakraborty, B.; Murphy, S.A. Dynamic Treatment Regimes. Annu. Rev. Stat. Appl. **2014**, 1, 447–464.
- Boele, F.; Harley, C.; Pini, S.; Kenyon, L.; Daffu-O’Reilly, A.; Velikova, G. Cancer as a chronic illness: Support needs and experiences. BMJ Support. Palliat. Care **2019**.
- Wang, L.; Rotnitzky, A.; Lin, X.; Millikan, R.E.; Thall, P.F. Evaluation of Viable Dynamic Treatment Regimes in a Sequentially Randomized Trial of Advanced Prostate Cancer. J. Am. Stat. Assoc. **2012**, 107, 493–508.
- Wahed, A.S.; Thall, P.F. Evaluating Joint Effects of Induction-Salvage Treatment Regimes on Overall Survival in Acute Leukemia. J. R. Stat. Soc. Ser. C Appl. Stat. **2013**, 62, 67–83.
- Huang, X.; Choi, S.; Wang, L.; Thall, P.F. Optimization of multi-stage dynamic treatment regimes utilizing accumulated data. Stat. Med. **2015**, 34, 3424–3443.
- Xu, Y.; Muller, P.; Wahed, A.S.; Thall, P.F. Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times. J. Am. Stat. Assoc. **2016**, 111, 921–935.
- Thall, P.F.; Mueller, P.; Xu, Y.; Guindani, M. Bayesian nonparametric statistics: A new toolkit for discovery in cancer research. Pharm. Stat. **2017**, 16, 414–423.
- Murray, T.A.; Yuan, Y.; Thall, P.F. A Bayesian Machine Learning Approach for Optimizing Dynamic Treatment Regimes. J. Am. Stat. Assoc. **2018**, 113, 1255–1267.
- Valenti, V.; Jimenez-Fonseca, P.; Msaouel, P.; Salazar, R.; Carmona-Bayonas, A. Fooled by Randomness. The Misleading Effect of Treatment Crossover in Randomized Trials of Therapies with Marginal Treatment Benefit. Cancer Investig. **2022**, 40, 184–188.
- Isbary, G.; Staab, T.R.; Amelung, V.E.; Dintsios, C.M.; Iking-Konert, C.; Nesurini, S.M.; Walter, M.; Ruof, J. Effect of Crossover in Oncology Clinical Trials on Evidence Levels in Early Benefit Assessment in Germany. Value Health **2018**, 21, 698–706.
- Tap, W.D.; Jones, R.L.; Van Tine, B.A.; Chmielowski, B.; Elias, A.D.; Adkins, D.; Agulnik, M.; Cooney, M.M.; Livingston, M.B.; Pennock, G.; et al. Olaratumab and doxorubicin versus doxorubicin alone for treatment of soft-tissue sarcoma: An open-label phase 1b and randomised phase 2 trial. Lancet **2016**, 388, 488–497.
- Tap, W.D.; Wagner, A.J.; Schoffski, P.; Martin-Broto, J.; Krarup-Hansen, A.; Ganjoo, K.N.; Yen, C.C.; Abdul Razak, A.R.; Spira, A.; Kawai, A.; et al. Effect of Doxorubicin Plus Olaratumab vs Doxorubicin Plus Placebo on Survival in Patients with Advanced Soft Tissue Sarcomas: The ANNOUNCE Randomized Clinical Trial. JAMA **2020**, 323, 1266–1276.
- Goss, P.E.; Ingle, J.N.; Pritchard, K.I.; Robert, N.J.; Muss, H.; Gralow, J.; Gelmon, K.; Whelan, T.; Strasser-Weippl, K.; Rubin, S.; et al. Extending Aromatase-Inhibitor Adjuvant Therapy to 10 Years. N. Engl. J. Med. **2016**, 375, 209–219.
- Laber, E.B.; Davidian, M. Dynamic treatment regimes, past, present, and future: A conversation with experts. Stat. Methods Med. Res. **2017**, 26, 1605–1610.
- Plana, D.; Palmer, A.C.; Sorger, P.K. Independent Drug Action in Combination Therapy: Implications for Precision Oncology. Cancer Discov. **2022**, 12, 606–624.
- Worthington, R.J.; Melander, C. Combination approaches to combat multidrug-resistant bacteria. Trends Biotechnol. **2013**, 31, 177–184.
- Richman, D.D. HIV chemotherapy. Nature **2001**, 410, 995–1001.
- Tamma, P.D.; Cosgrove, S.E.; Maragakis, L.L. Combination therapy for treatment of infections with gram-negative bacteria. Clin. Microbiol. Rev. **2012**, 25, 450–470.
- Kerantzas, C.A.; Jacobs, W.R.J. Origins of Combination Therapy for Tuberculosis: Lessons for Future Antimicrobial Development and Application. mBio **2017**, 8, 10–1128.
- Frei, E., III; Holland, J.F.; Schneiderman, M.A.; Pinkel, D.; Selkirk, G.; Freireich, E.J.; Silver, R.T.; Gold, G.L.; Regelson, W. A comparative study of two regimens of combination chemotherapy in acute leukemia. Blood **1958**, 13, 1126–1148.
- Chou, T.C. Theoretical basis, experimental design, and computerized simulation of synergism and antagonism in drug combination studies. Pharmacol. Rev. **2006**, 58, 621–681.
- Msaouel, P.; Goswami, S.; Thall, P.F.; Wang, X.; Yuan, Y.; Jonasch, E.; Gao, J.; Campbell, M.T.; Shah, A.Y.; Corn, P.G.; et al. A phase 1-2 trial of sitravatinib and nivolumab in clear cell renal cell carcinoma following progression on antiangiogenic therapy. Sci. Transl. Med. **2022**, 14, eabm6420.
- Lee, J.; Thall, P.F.; Msaouel, P. A phase I-II design based on periodic and continuous monitoring of disease status and the times to toxicity and death. Stat. Med. **2020**, 39, 2035–2050.
- Yuan, Y.; Nguyen, H.Q.; Thall, P.F. Bayesian Designs for Phase I-II Clinical Trials; CRC Press: Boca Raton, FL, USA, 2017.
- de Lima, M.; Couriel, D.; Thall, P.F.; Wang, X.; Madden, T.; Jones, R.; Shpall, E.J.; Shahjahan, M.; Pierre, B.; Giralt, S.; et al. Once-daily intravenous busulfan and fludarabine: Clinical and pharmacokinetic results of a myeloablative, reduced-toxicity conditioning regimen for allogeneic stem cell transplantation in AML and MDS. Blood
**2004**, 104, 857–864. [Google Scholar] [CrossRef] [PubMed] - Gerard, E.; Zohar, S.; Thai, H.T.; Lorenzato, C.; Riviere, M.K.; Ursino, M. Bayesian dose regimen assessment in early phase oncology incorporating pharmacokinetics and pharmacodynamics. Biometrics
**2022**, 78, 300–312. [Google Scholar] [CrossRef] - Montgomery, A.A.; Peters, T.J.; Little, P. Design, analysis and presentation of factorial randomised controlled trials. BMC Med. Res. Methodol.
**2003**, 3, 26. [Google Scholar] [CrossRef] - Palmer, A.C.; Sorger, P.K. Combination Cancer Therapy Can Confer Benefit via Patient-to-Patient Variability without Drug Additivity or Synergy. Cell
**2017**, 171, 1678–1691.e1613. [Google Scholar] [CrossRef] - Kotecha, R.R.; Hsu, D.J.; Lee, C.H.; Patil, S.; Voss, M.H. In silico modeling of combination systemic therapy for advanced renal cell carcinoma. J. Immunother. Cancer
**2021**, 9, e004059. [Google Scholar] [CrossRef] - Frei, E., III; Freireich, E.J.; Gehan, E.; Pinkel, D.; Holland, J.F.; Selawry, O.; Haurani, F.; Spurr, C.L.; Hayes, D.M.; James, G.W. Studies of sequential and combination antimetabolite therapy in acute leukemia: 6-mercaptopurine and methotrexate. Blood
**1961**, 18, 431–454. [Google Scholar] [CrossRef] - Logothetis, C.J.; Gallick, G.E.; Maity, S.N.; Kim, J.; Aparicio, A.; Efstathiou, E.; Lin, S.H. Molecular classification of prostate cancer progression: Foundation for marker-driven treatment of prostate cancer. Cancer Discov.
**2013**, 3, 849–861. [Google Scholar] [CrossRef] - Farewell, V.T. Mixture Models in Survival Analysis: Are They Worth the Risk? Can. J. Stat./La Rev. Can. Stat.
**1986**, 14, 257–262. [Google Scholar] [CrossRef] - Amico, M.; Keilegom, I.V. Cure Models in Survival Analysis. Annu. Rev. Stat. Its Appl.
**2018**, 5, 311–342. [Google Scholar] [CrossRef] - Senn, S.J. Falsificationism and clinical trials. Stat. Med.
**1991**, 10, 1679–1692. [Google Scholar] [CrossRef] - Mansournia, M.A.; Nazemipour, M.; Etminan, M. Causal diagrams for immortal time bias. Int. J. Epidemiol.
**2021**, 50, 1405–1409. [Google Scholar] [CrossRef] [PubMed] - Giobbie-Hurder, A.; Gelber, R.D.; Regan, M.M. Challenges of guarantee-time bias. J. Clin. Oncol.
**2013**, 31, 2963–2969. [Google Scholar] [CrossRef] - Senn, S. Lessons from TGN1412 and TARGET: Implications for observational studies and meta-analysis. Pharm. Stat.
**2008**, 7, 294–301. [Google Scholar] [CrossRef] [PubMed] - Senn, S. Tea for three: Of infusions and inferences and milk in first. Significance
**2012**, 9, 30–33. [Google Scholar] [CrossRef] - Senn, S. A Conversation with John Nelder. Stat. Sci.
**2003**, 18, 118–131. [Google Scholar] [CrossRef] - Greenland, S.; Mansournia, M.A. Limitations of individual causal models, causal graphs, and ignorability assumptions, as illustrated by random confounding and design unfaithfulness. Eur. J. Epidemiol.
**2015**, 30, 1101–1110. [Google Scholar] [CrossRef] - Weele, T.J.V. Confounding and effect modification: Distribution and measure. Epidemiol. Methods
**2012**, 1, 55–82. [Google Scholar] [CrossRef] - Suzuki, E.; Shinozaki, T.; Yamamoto, E. Causal Diagrams: Pitfalls and Tips. J. Epidemiol.
**2020**, 30, 153–162. [Google Scholar] [CrossRef] [PubMed] - Breskin, A.; Cole, S.R.; Hudgens, M.G. A Practical Example Demonstrating the Utility of Single-world Intervention Graphs. Epidemiology
**2018**, 29, e20–e21. [Google Scholar] [CrossRef] [PubMed] - Richardson, T.S.; Robins, J.M. Single world intervention graphs: A primer. In Second UAI Workshop on Causal Structure Learning; Bellevue: Washington, DC, USA, 2013. [Google Scholar]
- Ocampo, A.; Bather, J.R. Single-world intervention graphs for defining, identifying, and communicating estimands in clinical trials. Stat. Med.
**2023**, 42, 3892–3902. [Google Scholar] [CrossRef] [PubMed]

**Figure 1.** Information processing model of the two major schools of statistical inference. The unobserved collection of mechanisms in nature generates phenomena known as data-generating processes. These physical mechanisms generate data, which are then processed by statistical models that use probability distributions to generate information that can be quantified in binary digits (bits) of surprisal. Information can be used to make inferences about both the data-generating process and the unobserved underlying nature.

**Figure 2.** Bayesian updating of response probability to an investigational therapy in patients with chemotherapy-refractory renal medullary carcinoma (RMC). Prior probability distributions are colored blue and posterior probability distributions are colored red. (**A**) Uniform prior, also known as the Laplace prior, encoding the assumption that all response values in the unit interval of (0, 1) are equally likely. (**B**) Posterior probability distribution updated from the uniform prior after 7 out of 10 patients with RMC treated in a pilot feasibility study showed response. (**C**) Prior probability distribution encoding the knowledge obtained from the pilot study before conducting the main study. (**D**) Posterior probability distribution updated after 20 out of 50 patients with RMC treated in the main study showed response.

**Figure 3.** Bayesian updating of response probability to an investigational therapy in patients with chemotherapy-refractory renal medullary carcinoma (RMC). Prior probability distributions are colored blue and posterior probability distributions are colored red. (**A**) Uniform prior, also known as the Laplace prior, encoding the assumption that all response values in the unit interval of (0, 1) are equally likely. (**B**) Posterior probability distribution updated from the uniform prior after 20 out of 50 patients with RMC who were treated in the main study showed response. (**C**) Prior probability distribution encoding the knowledge obtained from the main study. (**D**) Posterior probability distribution updated after incorporating the results of the pilot study wherein 7 out of 10 patients with RMC showed response.
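The two updating orders described in Figures 2 and 3 can be checked with a minimal sketch. It assumes a conjugate Beta-binomial model, which the captions do not state explicitly: the uniform prior is taken as Beta(1, 1) and each study as binomial response data. The function and variable names are illustrative only.

```python
def update_beta(prior, responders, n_patients):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    a, b = prior
    return (a + responders, b + (n_patients - responders))


def beta_mean(params):
    """Mean of a Beta(a, b) distribution."""
    a, b = params
    return a / (a + b)


uniform_prior = (1, 1)   # assumed Beta(1, 1) encoding of the uniform prior
pilot = (7, 10)          # 7 responders out of 10 patients (pilot study)
main = (20, 50)          # 20 responders out of 50 patients (main study)

# Figure 2 ordering: pilot study first, then main study.
after_pilot = update_beta(uniform_prior, *pilot)
fig2_final = update_beta(after_pilot, *main)

# Figure 3 ordering: main study first, then pilot study.
after_main = update_beta(uniform_prior, *main)
fig3_final = update_beta(after_main, *pilot)

print(fig2_final, fig3_final, round(beta_mean(fig2_final), 3))
```

Under this assumed model, both orderings end at the same posterior because the update only adds responder and non-responder counts to the prior parameters.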

**Figure 4.** Frequentist and Bayesian Inference. (**A**) In a randomized controlled trial (RCT) testing a new therapy versus control, the null hypothesis is expressed as θ = 0 for the relative treatment effect difference between the new therapy and the control. Bayesian models can be used to obtain posterior probabilities of a treatment effect being correct relative to alternative treatment effect values (confirmationist inference) or wrong (refutationist inference). (**B**) Frequentist models do not use prior distributions but can be used to investigate purely refutational RCT evidence against the embedded statistical model and the assumption that the test hypothesis (typically the null hypothesis of no treatment difference) is true. For example, if the null hypothesis and all other model assumptions are true, the physical act of random treatment assignment would be expected to generate a random distribution of the data D yielded by repeated replications of the RCT. The amount of divergence of the observed data from this expected random distribution is a measure of refutational evidence against the null hypothesis that θ = 0 and all other underlying model assumptions. Similar considerations can be applied to generate refutational evidence against other tested hypotheses corresponding to different values of θ.

**Figure 5.** Bayesian updating of the DFS HR estimate of the KEYNOTE-564 phase 3 RCT that compared adjuvant pembrolizumab versus placebo in ccRCC. The informative prior probability distribution (blue) is designed to account for the winner’s curse based on an empirical analysis of the results of 23,551 medical RCTs of relative treatment efficacy available in the Cochrane Database of Systematic Reviews. The likelihood (black) is based on the reported frequentist results of KEYNOTE-564, demonstrating an HR of 0.68 with a 95% frequentist CI of 0.53 to 0.87. The posterior distribution (red) combines the prior information (blue) with the information from the data (black) and lies in between the two. It accounts for the winner’s curse and yields a Bayesian posterior mean HR of 0.76 with 95% posterior CrI 0.59–0.96. The posterior probability that the HR is larger than 1.0 is 0.8%.
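As a rough illustration of how a likelihood can be reconstructed from a reported HR and its 95% CI, the sketch below assumes a normal approximation on the log-HR scale. It then combines that likelihood with a purely hypothetical zero-centered normal prior (standard deviation 0.3 on the log scale). This is not the empirical Cochrane-based prior used in the figure, so the output is not expected to reproduce the reported posterior; it only shows the direction of the shrinkage.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def loghr_likelihood(hr, ci_low, ci_high):
    """Approximate the likelihood as normal on the log-HR scale (an assumption)."""
    estimate = math.log(hr)
    std_error = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    return estimate, std_error


def normal_update(prior_mean, prior_sd, estimate, std_error):
    """Precision-weighted combination of a normal prior and a normal likelihood."""
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / std_error ** 2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, math.sqrt(post_var)


# Reported KEYNOTE-564 DFS result from the caption: HR 0.68, 95% CI 0.53 to 0.87.
est, se = loghr_likelihood(0.68, 0.53, 0.87)

# HYPOTHETICAL prior: centered on no effect (log HR = 0) with SD 0.3.
post_mean, post_sd = normal_update(0.0, 0.3, est, se)
post_hr = math.exp(post_mean)

print(f"log-HR SE: {se:.4f}, posterior HR under the hypothetical prior: {post_hr:.2f}")
```

With any prior centered on no effect, the posterior HR lands between the reported estimate and 1.0, which mirrors the caption's description of the posterior lying between the prior and the likelihood.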

**Figure 6.** Selection diagrams distinguishing the causal effects of the two major types of random procedures used in research. (**A**) In a nonrandomized trial, the baseline covariates of patients can confound the estimation of the relative treatment effect because they can influence both treatment assignment and the outcome of interest. The selection node S indicates that sampling biases influence the enrichment of these baseline patient covariates in the study. (**B**) In an RCT, the treatment assignment of each patient or other study unit is only influenced by the random allocation procedure. Therefore, the baseline patient covariates can no longer be systematic confounders of the relative treatment effect but still influence the outcome, thus serving as prognostic factors. The physical act of randomization justifies the estimation of uncertainty measures as random errors for the relative treatment effect parameter comparing the enrolled groups (comparative inference). (**C**) In survey studies, the random sampling of patients from the population of interest removes systematic sampling biases and provides a physically justifiable distribution for the probability that the enrolled sample estimates for each sampled group are generalizable to the broader population. (**D**) In pure randomization inference, random allocation and random sampling remove systematic confounding and sampling bias, thus allowing the physically justifiable estimation of uncertainty estimates for both the relative treatment effect and sample generalizability.

**Figure 7.** Example Kaplan–Meier survival plots from three hypothetical RCTs. The shaded gray area represents the midpoint of the treatment and control group survival estimates plus or minus the half-width of the 95% CI for the difference of each group’s Kaplan–Meier probability estimates. This gray polygon is centered at the midpoint between the two groups so that if it crosses one survival curve, it will also cross the other. Thus, at time points where the gray polygon crosses the survival curves, p > 0.05 (not multiplicity adjusted) for the null hypothesis of no treatment group difference at that time point. HRs and their CIs and p-values were estimated using a univariable Cox proportional hazards model. (**A**) Example RCT with consistent signal of survival difference between the treatment and control (p < 0.05, corresponding to at least 4 bits of information against the null hypothesis). The corresponding Cox regression model yielded 14 bits of refutational information against the null hypothesis of no difference under the assumption that all other background model assumptions are correct. (**B**) Example RCT with no strong survival difference signal between the treatment and control groups, as indicated by the gray area consistently crossing the survival curves. The consistently narrow width of the gray polygon indicates that the trial results are compatible at the 0.05 level with no clinically meaningful difference between the treatment and control groups throughout the study. This is supported by the corresponding Cox model, which yielded only 2 bits of refutational information against the null hypothesis, as well as a 95% CI compatible with HR effect sizes ranging from 0.74, favoring the treatment group, to 1.1, favoring the control group. (**C**) This example RCT also has no strong survival difference signal between the treatment and control groups. The consistently wide gray area indicates that the signal is very low at all time points. Therefore, no inferences can be made on whether or not there is a treatment difference based on these survival curves. Accordingly, the corresponding Cox model yielded very low refutational information against the null hypothesis and a very wide 95% CI compatible with HR effect sizes as low as 0.54, strongly favoring the treatment group, and as high as 1.30, strongly favoring the control group.
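The bits of refutational information quoted in Figure 7 correspond to S values, which are obtained from a p-value as S = −log2(p). A minimal sketch of the conversion in both directions follows; the function names are illustrative.

```python
import math


def s_value(p_value):
    """Bits of refutational information (surprisal) carried by a p-value."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must lie in (0, 1]")
    return -math.log2(p_value)


def p_from_bits(bits):
    """Inverse conversion: the p-value that carries the given number of bits."""
    return 2.0 ** (-bits)


# p = 0.05 carries a little more than 4 bits, matching "at least 4 bits" in panel A.
print(round(s_value(0.05), 2))

# The 14 bits in panel A and the 2 bits in panel B expressed as p-values.
print(p_from_bits(14), p_from_bits(2))
```

One way to read the scale: 2 bits is as surprising as two fair coin tosses both landing heads, whereas 14 bits corresponds to 14 heads in a row.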

**Figure 8.** Selection diagrams distinguishing the causal effects of stratification, covariate-adaptive randomization, and blocking. (**A**) Surveys can obtain samples from explicitly specified stratification variables, which divide the population into smaller subgroups called “strata”. This induces a selection bias specifically for the stratification variables. Patients are then selected randomly from each stratum to form the final sample. (**B**) Clinical trials can ensure balance of specific baseline patient covariates by choosing the treatment assignment of each patient after adaptively accounting for their baseline patient covariates and for the treatment assignment of previously enrolled patients. Minimization is the most commonly used covariate-adaptive randomization method in clinical trials. This covariate-adaptive “randomization” is actually a largely nonrandom treatment allocation method because it is influenced by the characteristics of earlier patients along with the baseline covariates of the current patient. (**C**) RCTs can limit the random allocation of treatments in such a way that each treatment group is balanced with respect to explicitly specified blocking variables, reducing the heterogeneity of the outcome. An additional non-mutually exclusive strategy would be to covariate adjust in the statistical model for the effect of the blocking variables on the outcome.

**Figure 9.** Selection diagrams distinguishing per intention to treat (ITT), per protocol (PP), and as treated (AT) in RCTs. (**A**) Diagram illustrating the scenario whereby patients randomly assigned to a treatment did not always receive it. The relative treatment effect parameter from the PP analysis is more relevant for direct patient care but is susceptible to confounding biases from covariates that may have influenced treatment receipt. (**B**) Diagram illustrating the scenario whereby patients randomly assigned to a treatment did not always receive it, and those that received it did not always use it. The relative treatment effect parameter from the AT analysis is more relevant for direct patient care but is susceptible to confounding biases from covariates that may have influenced treatment receipt and treatment use.

**Figure 10.** Selection diagrams representing the data-generating processes of prognostic and predictive effects in RCTs. (**A**) Prognostic biomarkers are baseline patient variables that directly influence the outcome and not the relative treatment effect. Thus, relative treatment effect parameters such as HRs and odds ratios (ORs) are assumed to be stable for all patients in the RCT cohort. (**B**) Predictive biomarkers are baseline patient variables that influence the relative treatment effect via their effect on the mediator pathway that transmits the effect of treatment assignment on the RCT outcome. HRs, ORs, and other relative treatment effect parameters can change depending on the values of the predictive biomarker. (**C**) In patients with breast cancer, HER2 amplification status acts as both a prognostic and predictive biomarker.

**Figure 11.** Example forest plot from a hypothetical RCT of an investigational treatment versus control. The forest plot is used to look for predictive effects expressed as differences in HR estimates in different subgroups compared with the overall RCT cohort. The dotted vertical line highlights the relative treatment effect point estimate for the overall cohort, also known as the main effect. The size of the black squares corresponds to the sample size of each subgroup. The white square represents the overall RCT cohort. The horizontal lines represent the 95% CIs. The shaded gray area represents the indifference zone for the HR estimate in the overall cohort, assuming that relative treatment effects between 80% of the lower bound and 125% of the upper bound of the 95% CI for the overall cohort do not represent clinically meaningful differences between each subgroup and the overall cohort. In this example, the 95% CI for the HR in the overall cohort is 0.36–0.73, corresponding to an indifference zone of 0.29–0.91. Therefore, treatment effect homogeneity is suggested for all subgroups whose 95% CIs are compatible only with values within the indifference zone (gray area). Treatment effect heterogeneity is suggested in subgroups whose 95% CIs do not overlap with the dotted vertical line. All other subgroups are inconclusive.
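The indifference zone arithmetic and the three-way reading of subgroup CIs described in Figure 11 can be sketched as follows. The overall point estimate (0.51) and the subgroup CIs are hypothetical values chosen for illustration, since the caption does not report them. Checking homogeneity before heterogeneity is also an assumption, made to resolve cases where a CI could satisfy both rules.

```python
def indifference_zone(ci_low, ci_high, lower_factor=0.80, upper_factor=1.25):
    """Zone spanning 80% of the lower bound to 125% of the upper bound of the overall 95% CI."""
    return ci_low * lower_factor, ci_high * upper_factor


def classify_subgroup(sub_ci, zone, overall_hr):
    """Apply the caption's rules to one subgroup 95% CI (homogeneity checked first)."""
    low, high = sub_ci
    zone_low, zone_high = zone
    if zone_low <= low and high <= zone_high:
        return "homogeneity suggested"
    if high < overall_hr or low > overall_hr:
        return "heterogeneity suggested"
    return "inconclusive"


# Overall cohort 95% CI from the caption.
zone = indifference_zone(0.36, 0.73)
print(tuple(round(bound, 2) for bound in zone))

OVERALL_HR = 0.51  # HYPOTHETICAL point estimate for the overall cohort

# HYPOTHETICAL subgroup 95% CIs.
for sub_ci in [(0.40, 0.80), (0.80, 1.60), (0.30, 1.20)]:
    print(sub_ci, classify_subgroup(sub_ci, zone, OVERALL_HR))
```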

**Figure 12.** Selection diagrams representing the data-generating processes of clinical endpoints in RCTs. (**A**) In RCTs where no subsequent options are available, intermediate events such as disease progression will directly correlate with survival. Thus, the prognostic variables that influence disease progression will also influence survival directly or indirectly via the disease progression pathway. Blocking or adjusting for these variables will increase the reliability of disease progression and survival estimates. (**B**) In RCTs where subsequent therapies are available, random allocation removes all other causal influences on the treatment assignment of the first therapy, physically justifying the use of uncertainty estimates of the direct relative treatment effect on patient survival and the relative treatment effect for intermediate endpoints such as disease progression. These are the parameters used for intermediate survival endpoints such as PFS or DFS. However, the effect of the original treatment assignment on survival will also be mediated indirectly by subsequent therapies and disease progression events, which can be confounded by patient covariates at the time of subsequent treatment allocation. (**C**) Example RCT to evaluate the effect of adjuvant therapy or placebo in patients with localized ccRCC. Baseline prognostic factors, such as tumor stage, that influence disease recurrence can be balanced by blocking and adjusting in the statistical model to facilitate estimation of the DFS endpoint. However, upon disease recurrence, the choice of subsequent therapies will be influenced by covariates such as the International Metastatic Renal Cell Carcinoma Database Consortium (IMDC) risk score for metastatic RCC. This confounding influence and mediating effect of subsequent therapies and disease progression need to be modeled for reliable estimation of the OS endpoint.

**Figure 13.** Selection diagrams representing the data-generating processes of clinical endpoints in RCTs to evaluate treatment regimes. (**A**) RCTs evaluating static treatment regimes prespecify a fixed subsequent treatment strategy that all enrolled patients will use upon disease progression to the randomly assigned first treatment. Thus, the only variable that influences whether a patient receives the subsequent treatment is the presence of disease progression to the first treatment. (**B**) RCTs evaluating dynamic treatment regimes may randomly allocate both the first and subsequent treatment assignment. This facilitates reliable estimation of the effect of sequential decision rules for the initial and subsequent therapy strategy to optimize long-term outcomes such as OS.

**Figure 14.** Selection diagram representing the data-generating processes of clinical endpoints in RCTs that allow crossover. Random allocation removes all other causal influences on the assignment of the first therapy, physically justifying the use of uncertainty estimates of the direct relative treatment effect on patient survival and the relative treatment effect for intermediate endpoints such as disease progression. These parameters are used for intermediate survival endpoints such as PFS or DFS. Due to potential crossover, the randomly assigned initial treatment will influence the choice of subsequent treatment. The effect of the original treatment assignment on survival will be mediated indirectly by such subsequent therapy choices and disease progression events, which can also be confounded by patient covariates at the time of subsequent treatment allocation. Depending on how the first treatment assignment influences the subsequent treatment during crossover, the OS parameter can be biased toward a false-positive or false-negative direction.

RCT Measure | Examples | Role in RCT Interpretation | Additional Comments |
---|---|---|---|
Uncertainty estimates for the outcome differences between groups | CIs for HR, RR, OR, mean survival difference, or 1-year risk reduction | The major goal of RCTs is to generate valid uncertainty estimates for the differences between groups (comparative inference). This is achieved via random allocation. | Point estimates can be extrapolated from uncertainty intervals |
Point estimates for the outcome differences between groups | HR, OR, RR, mean survival difference, or 1-year risk reduction | The differences between groups are the focus of RCTs | Point estimates alone without uncertainty estimates can be misleading |
p values for the outcome differences between groups | p value for the null hypothesis of HR = 1.0 | Refutational signals for tested hypotheses (usually the null hypothesis) and the background assumptions of the embedded statistical models | Can be converted into bits of refutational information (S values) |
Group-specific measures | Median or mean survival, objective response rate, 1-year survival probability for each group | Descriptive measures providing information on the characteristics of the enrolled patients | Uncertainty measures such as SEs and CIs are valid in RCTs where random sampling has also been performed. Otherwise, measures of variability such as standard deviation or interquartile range are more appropriate. |

Goal | Approach Used in Random Sampling | Approach Used in Random Allocation | Additional Comments |
---|---|---|---|
Study design | Sampling theory | Experimental design | Random allocation may refer to random treatment assignment in RCTs, natural genetic variation in Mendelian randomization, or other natural random allocation processes used as instrumental variables |
Describe the population of enrolled patients | Sample | Cohort | Cohorts of patients are not randomly sampled. They are randomly allocated to different exposures such as a treatment or control. |
Use of uncertainty measures | Justified for group-specific parameters | Justified for comparative parameters representing differences between groups | Measures of variability such as interquartile range and standard deviation are preferred for group-specific parameters in the absence of random sampling |
External validity | Generalizability from sample to broader population | Transportability from cohort to target population | Refers to the extension of knowledge from one population (sample or cohort) to another |
Study underserved populations or minorities | Representative sampling | Representative causal mechanisms | Ethical oversight is warranted to ensure inclusiveness of RCTs with the goal of reducing healthcare disparities |
Mitigate imbalances induced by the random procedure | Stratification | Blocking | Covariate adjustment can also account for random imbalances in RCTs |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Msaouel, P.; Lee, J.; Thall, P.F.
Interpreting Randomized Controlled Trials. *Cancers* **2023**, *15*, 4674.
https://doi.org/10.3390/cancers15194674
