3.1. Description of the Included Studies
Our search retrieved 745 interventional studies. After excluding diagnostic, prognostic, and preventive studies, phase 1 trials, and those not directly focusing on the management of COVID-19, we selected 415 studies for inclusion in this systematic survey, including 178 phase 2 and 237 later phase RCTs (Figure A1, Table A1).
Most of the included trials are conducted by academic investigators (75.7%), and only one in four is sponsored by the pharmaceutical industry. The planned recruitment ranges between 7 and 12,000 participants (median: 160, interquartile range [IQR]: 67–400). Most trials include two intervention arms (74.8%), but one in four evaluates more than two, with up to 19 interventions. Moreover, 79.8% of the trials are conducted in a hospital setting, including 6.5% conducted in the intensive care unit (ICU), while 15.2% are conducted in the community. Descriptions of disease severity are heterogeneous, with the recruitment setting being the most consistent measure of disease severity. Details on the characteristics of the included studies are available in Table 1.
Overall, 3948 unique outcomes are evaluated in the included studies, including 1691 from phase 2 trials and 2257 from later phase trials. We identified 25 generic outcome categories (Table 2). A similar number of outcomes is evaluated in phase 2 (median: 8.5, IQR: 5–13) and later phase (median: 7, IQR: 4–11) trials (Figure A2 and Figure A3). Mortality and adverse events, the most frequently assessed outcomes, are only assessed in 64.6% and 48.4% of all trials, respectively. All remaining outcomes are evaluated in less than half of the trials, highlighting an important heterogeneity in outcome selection (Table 3 and Table 4). Treatment success or failure is only evaluated in 41.6% of phase 2 trials and 44.1% of later phase trials. Interestingly, the frequencies with which different outcomes are evaluated, either as any outcome or as a primary outcome, are very similar for phase 2 and later phase trials.
The most frequently reported outcomes among studies conducted in a community setting (thus recruiting less severely ill patients) were viral detection or load (55.6%), need for hospital admission (50.8%), and symptoms (49.2%). In contrast, the most frequently evaluated outcomes in studies recruiting patients with more severe COVID-19 were mortality and adverse events, which were evaluated in 71.6% and 50.3% of studies recruiting hospitalized patients, and in 88.9% and 66.7% of those recruiting critically ill patients, respectively.
3.2. Outcome Measurement Instruments
3.2.1. Mortality/Survival (Assessed by 284 Outcomes)
All-cause mortality is evaluated in all but six trials measuring mortality. When mortality was not further described, we presumed it referred to all-cause mortality. Time to death is assessed in 16 trials, and cause-specific mortality in six, mainly focusing on SARS-CoV-2 mortality, but also including mortality due to pulmonary or cardiovascular complications.
3.2.2. Clinical Outcomes
1. (Time to) Treatment success or treatment failure: Treatment success or the time to treatment success is evaluated by 220 outcomes. Ordinal scales describing different levels of COVID-19 severity are used for assessing treatment success in 113 (51.4%) of these outcomes. Most scales are very similar to the most frequently used WHO scale, which is a 9-point ordinal scale (from 0 to 8), with each point describing a worse clinical status (Table 5). Among the outcomes using an ordinal scale such as the WHO clinical progression scale to evaluate treatment success, success is defined as an improvement by 2 points in 57.5% and by 1 point in 24.8%, while the remaining outcomes provide no specific threshold. Complete resolution of the symptoms and signs of COVID-19 (clinical recovery) is used as a measure of treatment success in 51/220 (23.2%) outcomes and clinical improvement in 38/220 (17.3%) outcomes. The definition of complete resolution varies. Often, no further information is provided. In the remaining cases, it is defined as a composite outcome including several of the following components: complete resolution of breathlessness, tachypnoea, hypoxia, desaturation, cough, anosmia, myalgia, fever, or of oxygen requirements; a negative COVID-19 PCR; hospital discharge; or radiological resolution. A definition of clinical improvement as an outcome is also frequently lacking. In the remaining cases, it is defined as an improvement in several of the previously listed components. Improvement is based either on prespecified thresholds or on the clinician's subjective judgement. Finally, 14 outcomes (6.4%) use specific thresholds (0, ≤2 or ≤4) of the National Early Warning Score (NEWS or NEWS-2) to define treatment success.
Treatment failure, or time to treatment failure, is evaluated by 76 outcomes. In most cases (40/76, 52.6%), treatment failure is defined as a composite outcome consisting of several components with clear thresholds, such as death, need for ICU admission, need for invasive ventilation, need for other organ support (e.g., vasopressors or renal replacement therapy), need for non-invasive ventilation (NIV), need for supplemental oxygen, a deterioration in oxygenation, need for hospital admission, re-admission, or emergency visit, or ventricular tachyarrhythmia. Ordinal clinical severity scales such as the WHO scale are used to define treatment failure in 16/76 (21.1%) outcomes, while the need for rescue therapy is used in 9/76 (11.8%) outcomes. The remaining 11 (14.5%) outcomes do not provide specific criteria and/or state that treatment failure will be based on the clinician's judgement of deterioration in the clinical condition of the patient.
2. Severity scores: Standardized scores are used to evaluate disease severity and progression in 277 outcomes. Ordinal disease severity scales (such as the WHO scale) are the most frequently used scores (144/277 outcomes, 51.2%), followed by the Sequential Organ Failure Assessment (SOFA) Score [19], a validated score for describing the severity of organ dysfunction (54/277 outcomes, 19.5%), and the NEWS score [20]. Acute Physiology and Chronic Health Evaluation II (APACHE II, 5/277), clinical sign score (5/277), Pneumonia Severity Index (PSI, 3/277), BRESCIA-COVID, Murray score, Sepsis Induced Coagulopathy, Small Identification Test, SMART-COP score, and the Vienna Vaccine Safety Initiative (ViVI) disease severity score are used less often.
3. Symptoms: 188 outcomes focus on symptoms, which are assessed using either visual analogue scales or validated instruments. Composite scores evaluating several symptoms, including breathlessness, cough, sputum production, pyrexia, anosmia, myalgia, headache, or gastrointestinal symptoms, are used in 40 outcomes (21.3%). Four composite outcomes specifically assess respiratory symptoms (2.2%). Each of the remaining outcomes focuses on a single symptom. These include fever (72/188, 38.3%), breathlessness (18, 9.6%), cough (12, 6.4%), and less often anxiety, depressive symptoms, anosmia, cognitive dysfunction, nausea, insomnia, or fatigue. In this category we also included the assessment of heart rate (8, 4.3%) or blood pressure (5, 2.7%).
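The threshold-based definition of treatment success described in item 1 above can be illustrated with a minimal Python sketch. It assumes a 9-point ordinal scale scored from 0 to 8, where a higher score denotes a worse clinical status, so that improvement corresponds to a decrease in score. The function name, the default threshold, and the example scores are hypothetical and are not taken from any included trial.

```python
def is_treatment_success(baseline: int, follow_up: int, min_improvement: int = 2) -> bool:
    """Return True if the ordinal score improved by at least `min_improvement` points.

    Assumption: a 0-8 ordinal scale where higher scores mean worse clinical
    status, so an improvement is a decrease in the score.
    """
    for score in (baseline, follow_up):
        if not 0 <= score <= 8:
            raise ValueError(f"score {score} is outside the 0-8 scale")
    return (baseline - follow_up) >= min_improvement


# Hypothetical participant: score 5 at baseline and 4 at follow-up.
success_with_2_point_threshold = is_treatment_success(5, 4, min_improvement=2)
success_with_1_point_threshold = is_treatment_success(5, 4, min_improvement=1)
print(success_with_2_point_threshold, success_with_1_point_threshold)
```

In this hypothetical example, the same participant is counted as a treatment success under the 1-point threshold but not under the 2-point threshold, which suggests why differing thresholds may limit comparability across trials.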
3.2.3. Physiological Outcomes
1. Oxygenation (evaluated by 215 outcomes): Oxygenation is evaluated using the partial pressure of oxygen (PaO2), fraction of inspired oxygen (FiO2), oxygen saturation (SatO2), or respiratory rate. Oxygenation is often measured as the PaO2 or SatO2 corrected for FiO2 (95/215, 44.2%). In this category we also included measurements of the partial pressure of carbon dioxide (PaCO2) and pH, which are only rarely evaluated as outcomes.
2. Pulmonary function and physiology (28 outcomes): There is significant heterogeneity in this domain, with different outcomes evaluating peak flow rate, forced vital capacity (FVC), the ratio of forced expiratory volume in 1 second (FEV1) to FVC, vital capacity, diffusing capacity, lung compliance, and respiratory muscle function.
3. Viral detection and load (235 outcomes): The vast majority assess virologic clearance by a specific timepoint, or the time until virologic clearance. A small number of outcomes track changes in viral load over time, or differences in the viral detection and load when using different samples (nasal, nasopharyngeal, oropharyngeal swabs or sputum).
4. Viral antibodies: The development of antibodies against SARS-CoV-2 is assessed in 31 outcomes. Evaluation of specific antibody types (IgA, IgG, or IgM) is only described in five trials.
5. Radiological outcomes (61 outcomes): Definitions of these outcomes are inadequate. In most cases, it is broadly stated that the progression, regression, or resolution of the radiological findings is monitored. Details are only provided in six outcomes, which monitor the extent of the lesion as a proportion of the full lung volume, or perform lung densitometry. Development of fibrosis is evaluated in seven outcomes. Computed tomography (CT) is used in 21 (34.4%) outcomes, a chest X-ray (CXR) in 8 (13.1%), either a CT or a CXR in three, either CT, CXR, or lung ultrasound in one, and nuclear imaging in one outcome. The imaging modality used is not declared in the remaining 28 (45.9%) outcomes.
6. Inflammatory biomarkers (321 outcomes, each describing either a single or multiple biomarkers): The most frequently evaluated biomarkers are the total white cell count, neutrophils, lymphocytes, eosinophils, monocytes, C-reactive protein, and interleukins 1, 6, and 8, followed by other interleukins, procalcitonin, tumour necrosis factors, complement components, lymphocyte subtypes, immunoglobulins, and other inflammatory biomarkers.
7. Other biomarkers: 309 outcomes evaluate either a single or multiple non-inflammatory biomarkers. Mostly, these are surrogates for safety or adverse events. The most frequently captured biomarkers are D-dimers, cardiac enzymes, kidney function, liver function, clotting, red blood cells, and haemoglobin, followed by a variety of other molecules.
8. Pharmacokinetics/Pharmacodynamics: Here, we categorized 33 outcomes, mostly evaluating plasma drug concentrations (12/33, 36.4%), but also half-life, maximum/minimum observed concentration, time to reach the maximum/minimum observed concentration, and area under the plasma concentration-time curve.
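The correction of PaO2 or SatO2 for FiO2 mentioned in item 1 above is commonly expressed as a ratio of the oxygen measurement to the inspired oxygen fraction. The following is a minimal Python sketch of that arithmetic, assuming that FiO2 is expressed as a fraction between 0.21 (room air) and 1.0. The function name and the example values are illustrative only and do not come from the included trials, which may operationalize the correction differently.

```python
def oxygenation_ratio(oxygen_measure: float, fio2: float) -> float:
    """Divide an oxygen measurement by the fraction of inspired oxygen.

    `oxygen_measure` may be PaO2 (e.g., in mmHg) or SatO2 (in %).
    Assumption: `fio2` is a fraction between 0.21 and 1.0, not a percentage.
    """
    if not 0.21 <= fio2 <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return oxygen_measure / fio2


# Hypothetical patient receiving 40% oxygen.
pao2_fio2 = oxygenation_ratio(80.0, 0.40)   # PaO2 of 80 mmHg
sato2_fio2 = oxygenation_ratio(92.0, 0.40)  # SatO2 of 92%
print(round(pao2_fio2), round(sato2_fio2))
```

Expressing oxygenation relative to FiO2 allows patients receiving different amounts of supplemental oxygen to be compared on a single scale.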
3.2.4. Adverse Events
Adverse events (448 outcomes): 108 (24.1%) outcomes evaluate any adverse event, either their frequency or the number of participants experiencing at least one adverse event. Eighty (17.9%) outcomes specifically assess serious adverse events, as defined by the Common Terminology Criteria for Adverse Events (CTCAE). Nineteen (4.2%) outcomes focus on drug reactions, 14 (3.1%) on grade 3 or 4 adverse events, as defined by the CTCAE, and 22 (4.9%) assess the rate of study drug discontinuation due to adverse events or due to any reason. The remaining outcomes focus on specific adverse events, mostly cardiac (38, 10.3%), secondary infections (37, 10.0%), thrombotic or bleeding events (29, 8.1%), or local administration reactions (13, 3.6%).
3.2.5. Life Impact (13 Outcomes)
The EuroQol 5 Dimensions (EQ-5D) is used in four outcomes, followed by the Research and Development Corporation's (RAND) 36-Item Health Survey (SF-36), which is used in three outcomes. Other instruments include the WHO Disability Assessment Schedule (WHODAS 2.0), the Control, Autonomy, Self-realisation and Pleasure scale (CASP-19), and the Nottingham Health Profile.
3.2.6. Resource Use
1. Need for a (higher) level of care (352 outcomes): Need for hospital admission is evaluated by 68 outcomes (19.3%), need for hospital re-admission by 9 (2.6%), need for intensive care admission by 82 (23.4%), need for invasive ventilation by 167 (47.4%), and need for extracorporeal membrane oxygenation (ECMO) by 26 (7.4%; merged with the outcome need for ventilation in the tables). In studies conducted in the hospital setting, need for hospital admission at a specific follow-up timepoint refers to the proportion of patients who remain inpatients at that timepoint. Similarly, in studies conducted in the ICU, need for ICU admission refers to the proportion of patients who remain in the ICU at that timepoint.
In this category, we also included composite outcomes consisting of one of the above outcomes and mortality (e.g., need for ICU admission or death). These composite outcomes focus on the need for a higher level of care, while death is added to account for patients who die before accessing the higher level of care, or those who are not eligible for a higher level of care due to their baseline clinical status. Such approaches could be crucial to account for bias, especially in situations such as the COVID-19 pandemic, when hospitals and ICUs are over-burdened and not infrequently unable to accommodate a significant proportion of the patients, leading to the introduction of stricter criteria for triaging patients. Moreover, some outcomes in this category also evaluate the time to a higher level of care (e.g., time to hospital admission).
2. Duration of stay in a specific level of care (469 outcomes): Of those, 206 (43.9%) focus on the length of hospital stay, 96 (20.5%) on the length of ICU stay, and 167 (35.6%) on the duration of invasive ventilation. Delays in discharging medically optimized patients due to social or other reasons could introduce bias in the outcome length of hospital stay. To account for this issue, 11 outcomes are defined as the time to discharge or to a NEWS ≤2 maintained for 24 h, and another outcome as the time until participants are deemed medically optimized for discharge by a clinician.
3. Need for supplemental oxygen or NIV: This category includes 105 outcomes evaluating the need for supplemental oxygen or NIV in any setting. Most evaluate the need for supplemental oxygen administration at specific follow-up timepoints; 34 (32.4%) outcomes assess the need for NIV (including continuous positive airway pressure [CPAP] or bilevel positive airway pressure [BiPAP]), and 21 (20.0%) the need for high-flow oxygen. One outcome evaluates the need for domiciliary oxygen after hospital discharge.
4. Duration of supplemental oxygen or NIV (95 outcomes): Twelve (12.6%) evaluate the duration of NIV, and seven (7.4%) evaluate the duration of high-flow oxygen.
5. Need for other organ support (other than invasive ventilation, 44 outcomes): 26 (59.1%) outcomes focus on the need for vasopressors, and 18 (40.9%) on the need for renal replacement therapy.
6. Other outcomes: Here, we grouped 145 outcomes that could not be categorized in the previous categories and were evaluated in <10 RCTs each. Need for concurrent treatments is assessed in 22 outcomes, including 7 that specifically focus on the administration of antibiotics. Exercise capacity is assessed by 13 outcomes (mostly using the 6-minute walk test), COVID-19 transmission by 9, and resource requirements and costs by 8 outcomes. Other outcomes include the use of prone positioning, ability to perform activities of daily living, incidence and progression of cytokine storm syndrome, resilience, lost workdays, and discharge destinations.
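The outcome "time to discharge or to a NEWS ≤2 maintained for 24 h" described in item 2 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the definition used by any particular trial: times are measured in hours since randomisation, NEWS observations are sorted by time, and the event time for a sustained low NEWS is taken as the start of the qualifying run (a trial could equally use the time at which the 24 h are completed). All function and parameter names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def hours_to_sustained_low_news(
    observations: Sequence[Tuple[float, int]],
    threshold: int = 2,
    sustain_hours: float = 24.0,
) -> Optional[float]:
    """Return the start time of the first run of NEWS <= threshold lasting
    at least `sustain_hours`, or None if no such run is observed.

    `observations` holds (hours_since_randomisation, news_score) pairs,
    assumed to be sorted by time.
    """
    run_start: Optional[float] = None
    for time, score in observations:
        if score <= threshold:
            if run_start is None:
                run_start = time  # a new run of low scores begins here
            if time - run_start >= sustain_hours:
                return run_start
        else:
            run_start = None  # the run is broken by a higher score
    return None


def hours_to_discharge_or_low_news(
    discharge_hour: Optional[float],
    observations: Sequence[Tuple[float, int]],
) -> Optional[float]:
    """Return whichever comes first: discharge or a sustained NEWS <= 2."""
    news_hour = hours_to_sustained_low_news(observations)
    candidates = [t for t in (discharge_hour, news_hour) if t is not None]
    return min(candidates) if candidates else None


# Hypothetical participant: the low NEWS at 12 h is not sustained, but the
# run starting at 36 h lasts 24 h; discharge is delayed until 72 h.
news_series = [(0, 5), (12, 2), (24, 3), (36, 2), (48, 1), (60, 2)]
event_hour = hours_to_discharge_or_low_news(72, news_series)
print(event_hour)
```

In this hypothetical example the participant meets the outcome at 36 h rather than at the 72 h discharge, illustrating how such a definition separates clinical readiness from non-medical delays in discharge.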