Exploring Study Design Foibles in Randomized Controlled Trials on Convalescent Plasma in Hospitalized COVID-19 Patients

Background: Sample size estimation is an essential step in the design of randomized controlled trials (RCTs) evaluating a treatment effect. Sample size is a critical variable in determining statistical significance and, thus, it significantly influences RCTs’ success or failure. During the COVID-19 pandemic, many RCTs tested the efficacy of COVID-19 convalescent plasma (CCP) in hospitalized patients but reported different efficacies, which could be attributed to, in addition to timing and dose, inadequate sample size estimates. Methods: To assess the sample size estimation in RCTs evaluating the effect of treatment with CCP in hospitalized COVID-19 patients, we searched the medical literature between January 2020 and March 2024 through PubMed and other electronic databases, extracting information on expected size effect, statistical power, significance level, and measured efficacy. Results: A total of 32 RCTs were identified. While power and significance level were highly consistent, heterogeneity in the expected size effect was substantial. Approximately one third of the RCTs did not reach the planned sample size for various reasons, with the most important one being slow patient recruitment between the pandemic’s peaks. RCTs with a primary outcome in favor of CCP treatment had a significantly lower median absolute difference in the expected size effect than unfavorable RCTs (20.0% versus 33.9%, P = 0.04). Conclusions: The analyses of sample sizes in RCTs of CCP treatment in hospitalized COVID-19 patients reveal that many underestimated the number of participants needed because of excessively high expectations on efficacy, and thus, these studies had low statistical power. This, in combination with a lower-than-planned recruitment of cases and controls, could have further negatively influenced the primary outcomes of the RCTs.


Introduction
Similar to what happened in several previous infectious outbreaks, plasma collected from recovered subjects was the first antibody-based therapy used to fight the recent COVID-19 pandemic [1,2]. In the USA, COVID-19 convalescent plasma (CCP) was first deployed under a registry [3]. After an analysis of registry data had identified a signal of efficacy [4], the Food and Drug Administration (FDA) issued an emergency use authorization (EUA), which led to CCP transfusions in more than 500,000 COVID-19 patients [5]. In addition, CCP was the most intensively studied anti-SARS-CoV-2 therapeutic agent in the COVID-19 pandemic, with nearly 50 randomized controlled trials (RCTs) focusing on CCP being published (Supplementary Table S1). Such trials have identified the correct place for CCP among therapies against COVID-19: it is more effective in blocking viral replication and disease progression when transfused with a high concentration of anti-SARS-CoV-2 neutralizing antibodies (nAbs) at an early stage (i.e., within 5 days from symptom onset), particularly in seronegative immunocompromised patients [53,54,55]. While previous RCTs have consistently shown that early CCP administration in COVID-19 outpatients is effective in reducing disease progression and the risk of hospitalization, later studies on the in-hospital use of CCP have yielded mixed results [56,57]. Various underlying factors have been suggested to contribute to this discrepancy in results, including differences in nAb titers, inpatient characteristics, and the timing of CCP transfusion [58]. Furthermore, differences in study design among RCTs conducted during the four-year pandemic period may have been another reason. In particular, heterogeneity in sample size estimation, which is a parameter of power analysis closely related to the treatment effect and is crucial for determining the success or failure of a trial, is likely to have played a critical role in suboptimal study designs [59,60].
In this study, we systematically investigated the sample size calculations of published RCTs evaluating CCP treatment in hospitalized COVID-19 patients.

Material and Methods
The aim of this systematic review was to evaluate sample size calculation and its possible influence on study results in RCTs conducted on CCP treatment in patients hospitalized for COVID-19. A literature search of the PubMed (through Medline), EMBASE, Cochrane central, medRxiv, and bioRxiv databases was carried out for the period between 1 January 2020 and 31 March 2024, using the English language as a filter. The Medical Subject Heading (MeSH) and search queries used were as follows: "("COVID-19" OR "SARS-CoV-2" OR "coronavirus disease 2019") AND ("convalescent plasma" OR "immune plasma" or "hyperimmune plasma") AND ("randomized trial" OR "RCT")". We also screened the reference lists of all the retrieved studies and review articles for additional studies not captured in our initial literature search. Finally, a PRISMA flowchart of the literature reviewing process was produced, and it is shown in Figure 1.
Only RCTs that enrolled patients hospitalized with COVID-19 of any disease severity and treated with CCP were included in this systematic review. The exclusion criteria were outpatient setting and the absence of a sample size calculation (sample size estimation was retrieved from the 'Method' section of each published study and/or from the study design of the registered protocol). The CCP treatment (intervention) was compared with any controls (i.e., standard treatment, placebo, CCP, or standard plasma). The following parameters were extracted from each study (Table 1 and Supplementary Table S1): study design, date of initial recruitment of patients, sample size estimation (expected size effect, statistical power, and significance level), number of expected and number of actually enrolled cases and controls, early termination of the study (and cause of study termination), primary outcome, and 28-day mortality rate (when present, reported as primary or secondary) in the CCP and control arms. When possible, the expected size effect was calculated as absolute or relative difference to render the results from different studies homogeneous and comparable. The articles underwent an independent evaluation for inclusion by two assessors (M.F. and D.F.), and disagreements were resolved by a third senior assessor (C.M.).
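To illustrate how an expected size effect stated in one form can be re-expressed in another, the following is a minimal sketch and not the procedure actually used for the extraction. It assumes a binary, unfavorable primary outcome (e.g., 28-day mortality); the function names and the 30% control-arm event rate are hypothetical and chosen only for illustration.

```python
def effect_measures(p_control, p_treatment):
    """Express one expected effect as absolute difference (AD),
    relative difference (RD), and odds ratio (OR).

    Both arguments are event proportions strictly between 0 and 1.
    The event is assumed to be unfavorable, so a positive AD means benefit.
    """
    for p in (p_control, p_treatment):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    ad = p_control - p_treatment            # absolute difference
    rd = ad / p_control                     # relative difference (risk reduction)
    odds_ratio = (p_treatment / (1 - p_treatment)) / (p_control / (1 - p_control))
    return {"AD": ad, "RD": rd, "OR": odds_ratio}


def treatment_rate_from_rd(p_control, rd):
    """Treatment-arm event rate implied by a relative difference."""
    return p_control * (1 - rd)


def treatment_rate_from_or(p_control, odds_ratio):
    """Treatment-arm event rate implied by an odds ratio."""
    odds = odds_ratio * p_control / (1 - p_control)
    return odds / (1 + odds)


# Hypothetical example: 30% control-arm event rate, 50% expected relative reduction.
CONTROL_RATE = 0.30
measures = effect_measures(CONTROL_RATE, treatment_rate_from_rd(CONTROL_RATE, 0.50))
print({name: round(value, 3) for name, value in measures.items()})
```

Note that every conversion requires the assumed control-arm event rate, which is why effects reported only as a relative difference or an odds ratio cannot be compared across trials without it.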
Within-trial Risk of Bias (ROB) was assessed using the Cochrane ROB tool for RCTs. The Cochrane ROB tool for RCTs addresses six specific domains: sequence generation, allocation concealment, blinding, incomplete data, selective outcome reporting, and other issues relating to bias [61]. The protocol was registered on PROSPERO (registration number: CRD42024537859).
Regarding statistical analysis, categorical variables were compared using a Chi-square test and presented as frequency and percentages, while continuous variables were compared with an independent t-test and paired t-test and presented as the mean ± standard deviation (SD). A p value less than 0.05 was considered statistically significant.
Note: RCTs, randomized controlled trials. 1 The size effect was reported as absolute difference (AD), relative difference (RD), or as an odds ratio (OR) in the primary outcome between cases and controls.

Results
A total of 247 studies were initially identified after querying electronic databases and manual searching. After the removal of 23 duplicates, we screened the titles and abstracts of 224 studies. After the exclusion of 130 records, 94 full-text articles were identified and assessed for eligibility, resulting in the selection of 48 RCTs. Finally, after the exclusion of 16 RCTs (see Supplementary Table S1 for the reasons for their exclusion), 32 RCTs [7,8,10,12–18,20–22,25,26,28,30–32,34,35,37–39,41,43,45–49,51] were included in the systematic review. The study selection process is summarized in the PRISMA flow diagram in Figure 1. The main characteristics of the studies included in the systematic review are summarized in Table 1. All studies included in this analysis involved patients hospitalized for COVID-19 of various degrees of severity, with the exception of two RCTs [16,47] which recruited critically ill patients admitted to an intensive care unit (ICU). All but two RCTs [31,49] began recruitment in 2020, during the first or second pandemic wave. In 10 of the 32 RCTs (31.2%), the number of cases/controls enrolled was lower than that planned by the study design. The reasons reported by the authors for early study termination were futility at interim analysis (three RCTs) [13,14,43], the EUA from the FDA (one RCT) [10], the presence of high-titer nAbs in recipients at admission before CCP transfusion (one RCT) [15], vaccination coverage and the availability of anti-SARS-CoV-2 monoclonal antibodies (mAbs) (one RCT) [49], and slow recruitment due to the trial taking place during the interpandemic period (four RCTs) [21,26,31,32]. For 19 of the 32 (59.4%) selected RCTs, the primary outcome also included mortality rate, with the primary endpoint being reached in 6 studies (18.8%). The 28-day mortality rates differed widely among the studies, with the highest rate being recorded in the CONFIDENT [16] (35.4% in CCP group and 45.0% in control group) and COPLA-II [20] (53.2% in CCP group and 46.8% in control group) trials and the lowest rate being recorded in the ConPlas [17] (3.9% in CCP group and 8.2% in control group) and TSUNAMI [51] (6.1% in CCP group and 7.9% in control group) studies. Two studies [21,49] did not report deaths in either the treatment or control arm. Among the 28 RCTs reporting deaths as a primary or secondary outcome, two RCTs [16,34] (7.1%) reported a 28-day mortality rate significantly lower in CCP-treated patients than controls. Regarding sample size estimation, the great majority of RCTs were designed with a statistical power of 80% and a 5% level of significance. Wide inter-study variation in the expected relative difference in the primary outcome between the CCP treatment and control groups was observed, ranging from 25% to 50%. This heterogeneity among the RCTs was also evident when the expected absolute improvement in the primary outcome of CCP-treated patients was considered, ranging from 15% to 50%.
Regarding the ROB analysis of the included studies, we judged seven studies to have a low risk of bias in all the items considered (see Supplementary Figure S2). The remaining 25 studies were judged to have a high or unclear risk of bias for one or more domains. Nearly 70% of studies were open-label studies; hence, they were at risk of performance or detection bias (in studies with unmasked evaluators).

Discussion
Since the publication of the first RCTs on the use of CCP, it has been evident that the clinical effect of CCP depends on several factors, with the most important ones being the phase of the viral infection (the earlier the plasma is transfused, the more effective the CCP treatment is) and its content in nAbs (the more nAbs there are in the CCP, the more effective it is). The latter factor matches the serologic status of CCP recipients: patients with a reduced or absent antibody response against SARS-CoV-2, such as immunocompromised patients, respond better to high-titer CCP therapy [55]. In addition to the timing and dosing of CCP, there are other key factors that determine whether CCP effectiveness can be demonstrated, among which the study design must be mentioned, in particular the sample size estimation [62,63]. Sample size calculation is an essential component of a study protocol. The ex ante determination of the minimum number of observations that have to be recorded is essential in order to detect a supposed treatment effect, and thus, it is closely related to the success or the failure of the trial [64]. In turn, the calculation of the sample size of a new trial depends on the expected effectiveness of the treatment compared to the control. The greater the expected difference, the smaller the number of events to be collected. In other words, the sample size needed to assess the treatment effectiveness is higher when the real treatment effect is lower (Supplementary Table S2 and Supplementary Figure S1). Generally, RCTs with a small sample size are easy to conduct and economically sustainable, particularly for independent, non-sponsored trials, and, thus, they are usually preferred over those with a very large sample size, which are expensive and time consuming. By contrast, small-size RCTs are prone to having low statistical power and promote misleading inferences, while large-size RCTs are generally far more efficient in producing consistent evidence [65]. The a priori estimation of a given difference in the efficacy between an intervention and comparator during the planning of a study design is usually based on previous trials on the same topic, but, unfortunately, this is not possible for a new disease. As COVID-19 was a new illness, the investigators could not design RCTs using prior experience (the majority of the RCTs started concomitantly in 2020 during the first or second pandemic wave), and thus, they utilized results regarding CCP efficacy from trials conducted in previous coronavirus epidemics, such as the SARS and MERS epidemics, or results from uncontrolled SARS-CoV-2 studies. Thus, as shown by the analyses of the sample sizes of 32 RCTs, many estimated a 30 to 50% a priori relative reduction (or improvement) in the primary outcome by the intervention (CCP) to calculate the number of hospitalized COVID-19 patients to enroll. This approach was, however, wrong for at least two reasons. First, it became evident immediately after the outbreak of the pandemic that the COVID-19 pandemic had a different degree of severity compared to the two previous coronavirus epidemics. Second, studies conducted (and published) in the early phase of the COVID-19 pandemic clearly showed that a "Lazarus effect" was not possible with either CCP or other antibody-based or small-molecule antivirals, meaning that a drug could be considered effective if it led to an improvement in the primary outcome in a range between 10% and 20% [57]. In fact, the closest historical account of the use of CCP in a pandemic was the use of convalescent serum in the 1918 influenza pandemic, where a favorable size effect of 20% was estimated from retrospective analysis [66]. A posteriori, this was also confirmed to be true for CCP, as documented by the most recent literature review, which showed an overall 13% reduced risk of mortality of CCP compared with standard-of-care treatment or placebo in hospitalized COVID-19 patients [56]. This issue was further complicated by the wide heterogeneity in the methodologies used by the different RCTs for sample size estimation (relative or absolute difference and odds ratios), hampering among-study comparisons.
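The inverse relationship between expected effect and required sample size can be made concrete with a short calculation. The following is a minimal sketch, not the method used by any of the reviewed RCTs: it assumes a two-arm trial with 1:1 allocation, a binary primary outcome, a two-sided test, and the standard normal-approximation formula for comparing two proportions with unpooled variance. The function name and the 30% control-arm event rate are hypothetical; the 80% power and 5% significance level mirror the values most trials reported, and the 13% relative reduction is the pooled mortality estimate cited above [56].

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, power=0.80, alpha=0.05):
    """Participants needed in each arm to detect p_control vs p_treatment.

    Normal approximation for two independent proportions (unpooled variance),
    two-sided significance level alpha, equal allocation to both arms.
    """
    if p_control == p_treatment:
        raise ValueError("the two event rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value of the test
    z_beta = NormalDist().inv_cdf(power)            # quantile for the target power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


# Hypothetical control-arm event rate; relative reductions span the range discussed.
CONTROL_RATE = 0.30
for relative_reduction in (0.50, 0.30, 0.13):
    treated_rate = CONTROL_RATE * (1 - relative_reduction)
    n = n_per_arm(CONTROL_RATE, treated_rate)
    print(f"relative reduction {relative_reduction:.0%}: {n} per arm")
```

Because the required number of participants scales with the inverse square of the absolute difference, designing for a 50% rather than a 13% relative reduction shrinks the target enrolment by more than an order of magnitude under these assumptions.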
In addition to this methodological heterogeneity, COVID-19 presented challenges that are not usually encountered in RCTs, including the fact that the efficacy of antibody therapy varied with length of illness and the rapid progression of the underlying disease. For example, the efficacy of CCP in preventing the progression of disease to hospitalization exceeds that of monoclonal antibodies when given in the first 5 days of infection [54], but CCP has little or no efficacy when administered after the third day of hospitalization [4]. Likewise, whereas the overall survival benefit associated with CCP, considering over thirty RCTs, was 13%, the estimated efficacy in reducing mortality when administered in the first three days of hospitalization using high-titer plasma was 37% [56]. Given that the efficacy of CCP diminishes rapidly with time, the inevitable delays associated with enrollment, randomization, and CCP administration in RCTs served to further reduce the likelihood of finding a favorable effect. Hence, the rapid progression of COVID-19, combined with the reduced efficacy of CCP as a function of time, significantly increased the heterogeneity of patients enrolled in RCTs, which further reduced the likelihood of finding statistically significant effects given the sample sizes studied.
Approximately one third of the RCTs evaluated in the present systematic review were terminated prematurely for a series of reasons, but no company-sponsored RCTs of antibody-based or small-molecule antivirals were terminated prematurely. Among the reasons for premature termination, the most relevant were problems with patient enrolment between the pandemic waves and the EUA of CCP granted by the FDA, both of which deprived the US-based RCTs on CCP of many patients. Despite this, it is noteworthy that at least two RCTs including hospitalized patients were completed in the USA [18,36], showing that it was possible to test CCP under EUA. It is also noteworthy that many CCP RCTs did not have commercial sponsors and, thus, in terms of patient recruitment, they did not have the financial incentives that are often associated with pharmaceutical trials. Likewise, the cessation of some RCTs at interim analysis for futility could have been due to the vicious circle created by the discrepancy between the virtual (expected) and the real (observed) treatment effect. Adding to these hurdles, the enrolment of patients in CCP trials often made them ineligible to participate in other RCTs for other COVID-19 therapies supported by pharmaceutical companies that usually provide payment for each participant, thus creating further disincentives for continued recruitment in CCP RCTs during the highs and lows of the SARS-CoV-2 waves.
Such RCTs were therefore unlikely to have sufficient statistical power for two reasons: the planned sample size was already too small because it had been calculated by overestimating the CCP treatment effect, and actual enrolment then fell short of even that target. In support of the former argument, we observed that the expected absolute difference between intervention and comparator in the primary outcome was significantly lower in favorable versus unfavorable studies. Furthermore, it is noteworthy that an industry-led double-blinded phase 3 RCT testing the efficacy of the mAb combination tixagevimab-cilgavimab versus placebo in hospitalized COVID-19 patients showed benefits for the mAb combination; its design assumed a 20% effect (power 90%, 5% significance level) for sample size estimation, and the expected recruitment target was fully achieved [67].
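The combined effect of an overestimated treatment effect and incomplete enrolment on statistical power can be illustrated numerically. The following is a minimal sketch under stated assumptions and does not reproduce any analysis from the reviewed trials: it uses the normal approximation for a two-sided comparison of two proportions with equal arm sizes and ignores the negligible opposite-tail rejection probability. The function name, the 30% control-arm event rate, the planned size of 118 per arm (what this approximation yields for a 50% relative reduction at 80% power), and the 70% enrolment fraction are all hypothetical; the 13% relative reduction is the pooled estimate cited above [56].

```python
from math import sqrt
from statistics import NormalDist


def achieved_power(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation with unpooled variance and n_per_arm participants
    in each arm; the rejection probability in the wrong direction is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sqrt(
        (p_control * (1 - p_control) + p_treatment * (1 - p_treatment)) / n_per_arm
    )
    return NormalDist().cdf(abs(p_control - p_treatment) / standard_error - z_alpha)


# Hypothetical trial planned for a 50% relative reduction from a 30% control rate.
CONTROL_RATE = 0.30
PLANNED_N = 118                 # per arm, from the design assumption
ENROLLED_N = 83                 # per arm, roughly 70% of the planned target

scenarios = {
    "design assumption, full enrolment": (0.15, PLANNED_N),
    "design assumption, 70% enrolment": (0.15, ENROLLED_N),
    "13% true reduction, full enrolment": (CONTROL_RATE * 0.87, PLANNED_N),
    "13% true reduction, 70% enrolment": (CONTROL_RATE * 0.87, ENROLLED_N),
}
for label, (treated_rate, n) in scenarios.items():
    print(f"{label}: power = {achieved_power(CONTROL_RATE, treated_rate, n):.2f}")
```

Under these illustrative numbers, the shortfall in enrolment lowers power somewhat, but the larger loss comes from the gap between the assumed and the plausible treatment effect.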
In conclusion, the results of our systematic review, which was performed on 32 RCTs, clearly underline the important role of sample size calculation in the design of studies evaluating CCP efficacy in hospitalized COVID-19 patients. From our systematic analysis of the literature on this topic, we have found that the sample size estimate is a key determinant of whether treatment effectiveness can be demonstrated. Indeed, many RCTs lacked statistical power because of an overestimated size effect, which could have negatively influenced their results and prevented them from reaching enough events, in both cases and controls, to correctly evaluate the effect of CCP treatment. But what would have been the results of these studies if they had set their size effect at a lower (below 20%), and more realistic, difference level? At this time, it is not possible (and not methodologically correct) to try to answer this question ex post. Perhaps, however, if this had happened, the history of the results of many RCTs and of the clinical use of CCP worldwide during the pandemic would have turned out differently.

Figure 1. Flow chart of study inclusion process.


Table 1. Characteristics of the 32 RCTs included in the analysis.