The Predictive Performance of Risk Scores for the Outcome of COVID-19 in a 2-Year Swiss Cohort

Various scoring systems are available for COVID-19 risk stratification. This study aimed to validate their performance in predicting severe COVID-19 course in a large, heterogeneous Swiss cohort. Scores such as the National Early Warning Score (NEWS), CURB-65, 4C mortality score (4C), Spanish Society of Infectious Diseases and Clinical Microbiology score (COVID-SEIMC), and COVID Intubation Risk Score (COVID-IRS) were assessed in patients hospitalized for COVID-19 in 2020 and 2021. Predictive accuracy for severe course (defined as all-cause in-hospital death or invasive mechanical ventilation (IMV)) was evaluated using receiver operating characteristic curves and the area under the curve (AUC). The new ‘COVID-COMBI’ score, combining parameters from the top two scores, was also validated. This study included 1,051 patients (median age 65 years, 60% male), with 162 (15%) experiencing severe course. Among the established scores, 4C had the best accuracy for predicting severe course (AUC 0.76), followed by COVID-IRS (AUC 0.72). COVID-COMBI showed significantly higher accuracy than all established scores (AUC 0.79, p = 0.001). For predicting in-hospital death, 4C performed best (AUC 0.83), and, for IMV, COVID-IRS performed best (AUC 0.78). The 4C and COVID-IRS scores were robust predictors of severe COVID-19 course, while the new COVID-COMBI showed significantly improved accuracy but requires further validation.


Introduction
COVID-19, caused by the virus SARS-CoV-2 and primarily affecting the respiratory system, led to a global pandemic that originated in Wuhan, China [1]. The World Health Organization declared a public health emergency of international concern in January 2020, which lasted until May 2023 [2][3][4]. While, nowadays, most infected individuals experience asymptomatic or mild to moderate COVID-19, courses requiring hospitalization are still prevalent, and severity ranges from the need for simple oxygen supplementation to Acute Respiratory Distress Syndrome (ARDS), respiratory failure, and death [5,6]. The severe course of COVID-19 requires immediate medical attention and intensive care management [5]. In addition to pharmacological therapy, effective management includes supportive care and mechanical ventilation [7]. Despite the advances in treatment options, changes in the virulence of SARS-CoV-2 variants, and the comprehensive availability of vaccines, recently reported in-hospital mortality for COVID-19 still ranges from 6% to 14%, depending on the population and virus variant studied [8,9].
The listed prediction models were either not specifically designed to predict the outcome of COVID-19 or were built based on data from early 2020, when COVID-19 treatment options were limited and vaccinations were not yet available [22][23][24]. Furthermore, data used for the development of these risk scores were mostly from small cohorts of 100 to 200 patients. External validation and comparison of the established scores in a large, heterogeneous Swiss cohort are still missing to date.
When aiming to predict a severe course of COVID-19, it is important to take into account that the implementation of invasive mechanical ventilation is always an individual decision that does not depend on objective indications alone. A high proportion of patients or their relatives explicitly decide against such an invasive step, regardless of the acuteness of their respiratory situation [27]. Thus, when mechanical ventilation is clinically indicated, individual patients' beliefs and circumstances play an important role in the respective outcome. Consequently, 'in-hospital death' and 'invasive mechanical ventilation' should not be separated in terms of severe outcomes because both could indicate the same level of severity.
Even though the SARS-CoV-2 pandemic has strongly subsided, the identification of risk factors and the prediction of a severe course remain important. The early identification of individuals at high risk can prevent the progression of COVID-19 to ARDS by allowing physicians to initiate prompt interventions and appropriate treatment strategies. Additionally, the prediction of a severe course can provide valuable prognostic information for healthcare providers, affected patients, and their relatives. Finally, the identification of risk factors for a severe course of COVID-19 helps define suitable eligibility criteria for clinical trials further investigating preventive and therapeutic options for COVID-19.

Objectives
This study aimed to externally validate different available scoring systems in a large, heterogeneous Swiss cohort and compare their predictive accuracy for a severe course of COVID-19. A secondary aim was the validation of a prediction model that combines parameters used in the two best-performing scores.

Study Design and Setting
This project was a retrospective, observational, single-center study. Adult patients who were hospitalized for COVID-19 for at least one night at the Cantonal Hospital Baselland, Switzerland (KSBL), between March 2020 and December 2021 and who fulfilled the eligibility criteria (see Study Population) were included in this study. The KSBL is a public teaching hospital providing medical care for a population of approximately 250,000.

Study Population
Adult patients (18 years or older) who were hospitalized for COVID-19 as their main diagnosis for at least one night at the KSBL were eligible for inclusion in this study. Patients who were transferred to the KSBL from another acute care hospital or declined the hospital's general research consent were excluded. In cases where a patient was hospitalized multiple times for COVID-19 within the given period, only the first hospitalization was included in the data collection to avoid bias.

Outcomes and Scores
The primary outcome was a severe course of COVID-19, defined as the composite of either in-hospital death or invasive mechanical ventilation. The outcomes of secondary interest were the individual endpoints in-hospital death and invasive mechanical ventilation. In-hospital death was defined as all-cause death during the hospitalization of interest at KSBL or during an immediate subsequent hospitalization in the case of transfer to a different acute care institution. Invasive mechanical ventilation was defined as endotracheal intubation, tracheostomy, or extracorporeal membrane oxygenation during the hospitalization of interest at KSBL or during an immediate subsequent hospitalization in the case of transfer to a different acute care institution.
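The outcome definitions above can be written down as a small sketch. The class and field names below are hypothetical and chosen only for illustration; they do not come from the study database.

```python
from dataclasses import dataclass


@dataclass
class Hospitalization:
    # All field names are illustrative assumptions, not the study's variables.
    # Each flag covers the index stay at KSBL or an immediately subsequent
    # stay after transfer to another acute care institution.
    died_in_hospital: bool          # all-cause death
    endotracheal_intubation: bool
    tracheostomy: bool
    ecmo: bool                      # extracorporeal membrane oxygenation


def invasive_mechanical_ventilation(h: Hospitalization) -> bool:
    """Secondary endpoint: any of the three invasive procedures."""
    return h.endotracheal_intubation or h.tracheostomy or h.ecmo


def severe_course(h: Hospitalization) -> bool:
    """Primary endpoint: composite of death or invasive ventilation."""
    return h.died_in_hospital or invasive_mechanical_ventilation(h)
```

A patient who was intubated and survived counts toward the primary endpoint exactly as a patient who died without intubation does, which is the point of the composite.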
The quick Sequential Organ Failure Assessment (qSOFA), National Early Warning Score (NEWS), CURB-65 score (CURB-65), 4C mortality score (4C), Spanish Society of Infectious Diseases and Clinical Microbiology score (COVID-SEIMC), and COVID Intubation Risk Score (COVID-IRS) at the time of admission were retrospectively calculated for 1051 patients consecutively hospitalized for COVID-19 at KSBL in 2020 and 2021. Table 1 provides an overview of the scores that were compared in this study, together with their original purpose and the parameters used. Their respective formulas for calculation can be found in Appendix A, Tables A1-A6.
The above-mentioned scores were designed to predict either in-hospital death or mechanical ventilation. Since the severe course was a composite of these two outcomes, we defined a new score, 'COVID-COMBI', as a combination of the best-performing scores for each outcome: 4C for in-hospital death and COVID-IRS for mechanical ventilation. The COVID-COMBI represents a plain summation of the two scores, except that the only common parameter, 'respiratory rate', was weighted only once, according to the rule in the COVID-IRS. The COVID-COMBI ranges between 0 and 32, with higher scores indicating a higher risk of a severe course. Table 2 presents the formula for the calculation of COVID-COMBI.
Neutrophil-lymphocyte ratio <4 4-7.9 8-13.9 ≥14 2.5
a The number of comorbidities out of chronic cardiac disease, chronic respiratory disease (excluding asthma), chronic renal disease (estimated glomerular filtration rate ≤30), mild to severe liver disease, dementia, chronic neurological conditions, connective tissue disease, diabetes mellitus (diet-, tablet-, or insulin-controlled), HIV or AIDS, and malignancy. Abbreviations: SpO2: peripheral oxygen saturation. GCS: Glasgow Coma Scale. FiO2: fraction of inspired oxygen.
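The combination rule described above (plain summation, with the shared parameter counted once under the second score's rule) can be sketched as follows. The parameter names and point values in the usage example are invented for illustration and are not the published weights; the actual point assignments are those given in Table 2.

```python
def combine_scores(points_first: dict, points_second: dict) -> float:
    """Sum two parameter->points mappings.

    Parameters present in both mappings are counted once, using the
    points from `points_second` (in the paper's case, respiratory rate
    weighted according to the COVID-IRS rule).
    """
    total = 0.0
    for name, points in points_first.items():
        if name in points_second:
            continue  # shared parameter: taken from the second score below
        total += points
    total += sum(points_second.values())
    return total


# Hypothetical patient; every number here is made up for the example.
example_4c = {"age": 4, "sex": 1, "comorbidities": 1, "respiratory_rate": 2}
example_irs = {"respiratory_rate": 1, "ldh": 2, "nlr": 1}
example_total = combine_scores(example_4c, example_irs)
```

In this made-up example, the respiratory-rate points from the first mapping are discarded, so the total is 6 (remaining first-score points) plus 4 (second-score points).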

Data Collection and Management
After the verification of eligibility, patient outcomes and parameters for score calculations were manually extracted from the electronic health records. These included discharge reports, nursing documentation, emergency reports, intensive care unit (ICU) reports, laboratory records, and radiology findings. Data were entered into a REDCap® (Research Electronic Data Capture) database.
Scores were calculated with parameters available at the time of admission or, for laboratory values, up to 24 h after admission. Specifically, relevant vital signs and symptoms and records of mental status were taken from the emergency department (ED) documentation (first documented in-house measurement). FiO2 was estimated from the oxygen supplementation flow rate by means of the Vincent formula [28,29]. For relevant laboratory values and radiological findings, the first in-house result up to 24 h after admission was used for the score calculations, when indicated. Information about relevant comorbidities and patient history was taken from documented anamnesis and previously documented diagnosis lists.
The data presented in this study are not publicly available due to restrictions in data privacy but are available upon reasonable request from the corresponding author.

Statistical Analysis
Patient data were analyzed descriptively and presented as absolute and relative frequencies or median and interquartile ranges (IQRs). For the score calculation, variables with missing values were imputed using the k-nearest neighbor algorithm [30]. As a sensitivity analysis, all validations were additionally performed on the original, non-imputed dataset.
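To illustrate the idea behind k-nearest neighbor imputation, the following is a minimal pure-Python sketch. It is not the authors' implementation (they used the R package 'bnstruct'); the distance definition, the choice of k, and the use of a plain mean over donors are assumptions made for this example, and a real analysis would typically also scale the variables first.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that column among the k nearest
    rows in which the column is observed.

    Distance between two rows is the root mean squared difference over
    the columns observed in both rows (an assumption of this sketch).
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf  # no overlap: this row cannot serve as a donor
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows with column j observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Toy data: the third row is missing its second value.
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [10.0, 100.0]]
toy_filled = knn_impute(toy, k=2)
```

With k = 2, the missing value is filled from the two rows closest in the first column (2.0 and 1.0), while the distant row (10.0) is ignored.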
Predictive performance for each outcome was compared by means of receiver operating characteristic (ROC) curves and the respective area under the curve (AUC) with 95% confidence intervals (95% CI). The resulting ROC curves were compared pairwise with a Z-test following DeLong's method [31]. All reported p-values were two-sided at a significance level of 0.05. Data imputation and analysis were performed with R version 4.1.0 using the packages 'bnstruct', 'rms', and 'pROC' [32].
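For readers who want to see what the pairwise comparison involves, the sketch below computes two AUCs on the same patients and a DeLong-type Z-test for their difference. It is a plain-Python illustration written for this text, not the authors' code (they used the R package 'pROC'); the function names and the toy data are assumptions, and the sketch needs at least two patients with and two without the outcome.

```python
import math


def _psi(x, y):
    # 1 if the case scores higher than the control, 0.5 for ties, else 0.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(scores, labels):
    """Return the AUC and the DeLong structural components."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b)
               for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided Z-test for the difference of two correlated AUCs.

    Returns (auc_a, auc_b, z, p). labels: 1 = outcome, 0 = no outcome.
    """
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    variance = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b)
         - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b)
           - 2 * _cov(v01_a, v01_b)) / n
    )
    if variance <= 0:
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy data: six patients, three with the outcome; values are invented.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_score_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_score_b = [0.6, 0.3, 0.5, 0.4, 0.7, 0.2]
toy_auc_a, toy_auc_b, toy_z, toy_p = delong_test(
    toy_score_a, toy_score_b, toy_labels)
```

Because both scores are evaluated on the same patients, the covariance terms matter: ignoring them would treat the two AUCs as independent and misstate the variance of their difference.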

Ethical Considerations
This study was approved by the ethics committee of Northwestern and Central Switzerland (EKNZ, BASEC Project-ID 2022-01636, approved on 22 September 2022). Patients who declined consent to the further use of their clinical routine data for research purposes (general research consent) were excluded from this study.

Patient Characteristics and Outcomes
Between March 2020 and December 2021, 1274 patients were hospitalized with a main diagnosis of COVID-19 for at least one night at the KSBL. The first hospitalization for COVID-19 at the KSBL occurred on 3 March 2020. After excluding 150 patients who denied consent for the use of their data for research purposes and 73 patients who were admitted from another hospital, 1051 patients were included in this study.
Table 3 summarizes the characteristics and outcomes of the included patients. The median age was 65 years (IQR: (54, 79)), ranging from 19 to 99 years, and 59.7% of the patients were male (n = 627). The most prevalent comorbidity was arterial hypertension (45.8%, n = 481), followed by obesity (31.0%, n = 286), diabetes (23.0%, n = 242), and chronic kidney disease (19.5%, n = 205). The majority of the included patients were hospitalized at a stage when vaccination was not yet available (68.8%, n = 615). This included patients who were hospitalized when vaccines were already approved in Switzerland but not yet recommended for their respective age and risk group. After the comprehensive availability of vaccines, almost two-thirds of the patients were still not vaccinated upon admission (63.8%, 178 out of 279). The most common COVID-19 symptom upon admission was coughing (68.5%, n = 715), while 46.2% were suffering from dyspnea (n = 483). New onset of confusion was rare (3.5%, n = 37). While median heart rate and body temperature were within the physiological range at presentation, 38.4% of patients were febrile with a body temperature ≥ 38 °C (n = 397). Median systolic blood pressure was slightly increased at 134 mmHg. The majority of patients (60.3%, n = 634) presented with a peripheral oxygen saturation (SpO2) ≥ 92% at room air. The remaining 39.7% (n = 417) either presented with a SpO2 < 92% at room air or were already supplemented with oxygen by the paramedics. White blood cell count was predominantly within the normal range upon admission (median 6.1, IQR (4.5, 8.0)), but the majority of patients presented with an elevated NLR (median 5.1, IQR (3.1, 8.5)), indicating high levels of inflammation. Accordingly, elevated CRP, urea, and LDH values were common. The majority of patients presented with slightly reduced kidney function, with an eGFR < 90 mL/min/1.73 m² (69.9%, n = 719 out of 1029).
Out of the 1051 included patients, 162 patients experienced a severe course (15.4%). In total, 112 patients died (10.7%) and 74 patients were mechanically ventilated (7.0%) during their hospitalization. Out of the mechanically ventilated patients, 67.6% were discharged alive (n = 50). Table 4 summarizes the calculated scores of the overall population at admission and by outcome, respectively. Patients who suffered from a severe course and those who died in-hospital scored higher for all established scores, except qSOFA, compared to the overall population. Patients who required invasive mechanical ventilation scored higher than the overall population for NEWS, 4C, and COVID-IRS scores but not qSOFA, CURB-65, and COVID-SEIMC scores. qSOFA scores did not differ amongst the overall population and the sub-groups of patients with adverse outcomes. In the new COVID-COMBI score, patients with any of the three adverse outcomes presented with higher values than the overall population.
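The reported counts can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in the text; the one assumption is that ventilated patients who were not discharged alive are counted among the in-hospital deaths, which is how the overlap between the two endpoints is derived here.

```python
# Cohort flow (numbers from the text).
screened = 1274
declined_consent = 150
transferred_in = 73
included = screened - declined_consent - transferred_in

# Outcomes (numbers from the text).
deaths = 112
ventilated = 74
ventilated_discharged_alive = 50

# Assumption: ventilated patients not discharged alive died in hospital,
# so they appear in both single endpoints and must be counted once.
ventilated_and_died = ventilated - ventilated_discharged_alive
severe_course_count = deaths + ventilated - ventilated_and_died
severe_course_percent = round(100 * severe_course_count / included, 1)
```

Under that assumption, the inclusion-exclusion count reproduces the 162 patients with a severe course reported above, and 50 of 74 ventilated patients corresponds to the stated 67.6%.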

Prediction of Severe Course, In-Hospital Death, and Invasive Mechanical Ventilation
Predictive accuracy was assessed on the imputed dataset. Figure 1 presents the ROC curves of all analyzed scores for the prediction of the respective outcomes. Among the established scores, 4C had the best accuracy in predicting severe course (AUC 0.76, 95% CI: (0.72, 0.79)), followed by the COVID-IRS (AUC 0.72, 95% CI: (0.67, 0.76)). The new COVID-COMBI score showed significantly better accuracy in predicting severe course, with an AUC of 0.79 (95% CI: (0.75, 0.82), p = 0.001, see Figure 1a).
The predictive accuracy of the qSOFA score was poor for all analyzed outcomes (all AUC < 0.7).

Discussion
This study evaluated the accuracy of six scores (qSOFA, NEWS, CURB-65, 4C, COVID-SEIMC, and COVID-IRS) in predicting severe course, in-hospital death, and invasive mechanical ventilation for hospitalized COVID-19 patients. Additionally, a newly developed composite score, COVID-COMBI, which combined the parameters of the two best-performing scores (4C and COVID-IRS), was validated. Its predictive accuracy was compared with that of the established scores. Our study has three main findings:
1. The 4C and COVID-IRS both showed good accuracy for the prediction of severe course.
2. The new COVID-COMBI score showed significantly better performance than all other established scores in predicting severe course.
3. The new COVID-COMBI score showed good accuracy for the prediction of in-hospital death and invasive mechanical ventilation.

Predictive Accuracy of Established Scores
The population studied in this project was a large, heterogeneous cohort, with patients hospitalized for COVID-19 in Switzerland within a two-year period. The 4C mortality score, COVID-SEIMC, and COVID-IRS are specifically designed for COVID-19 risk stratification. Hence, it is not surprising that they performed well in the prediction of the outcomes they were designed for. In the prediction of in-hospital death, the 4C reached an excellent AUC of 0.83 in our cohort, followed by the COVID-SEIMC, with an AUC of 0.80. Previous studies reported similar AUCs between 0.79 and 0.85 for the 4C [24,33,34] and between 0.75 and 0.85 for the COVID-SEIMC [23,33,35]. The good predictive accuracies of 4C and COVID-SEIMC confirm older age, male sex, and comorbidities as risk factors for COVID-19-related death. However, indicators of the acuteness of the situation, such as SpO2 and the laboratory parameters urea, CRP, LDH, and NLR, also seem to play an important role.
The COVID-IRS, on the other hand, reached a reasonable AUC of 0.78 for the prediction of invasive mechanical ventilation in our cohort, which is a lower predictive accuracy than that reported in previous studies. In an internal validation by Garcia-Gordillo et al., the COVID-IRS reached an AUC of 0.88 [22]. An external validation on a cohort of 285 patients hospitalized in Taiwan in May and June 2021 reported an AUC of 0.82 [35]. The higher AUC in the Taiwanese cohort could be explained by the fact that patients with a 'do-not-intubate' order were excluded from the study [35]. However, in our study, COVID-IRS still reached the highest predictive accuracy for invasive mechanical ventilation amongst the analyzed scores, followed by NEWS. This result highlights the strong predictive value of vital signs indicating acute respiratory status (high respiratory rate, low peripheral oxygen saturation, or low SaO2/FiO2 ratio) and biomarkers of inflammation (LDH, NLR) for intubation, which has been previously reported [36][37][38][39][40].
Overall, in view of the different setting and the heterogeneity of our cohort, these results indicate the 4C's and COVID-IRS's robustness against changing virulence, treatment options, and vaccination status. Notably, the NEWS score showed moderate predictive accuracy for invasive mechanical ventilation (AUC 0.75), aligning with its intended purpose of detecting early clinical deterioration and previous results [6,17,41].
The six different scores were designed for different populations and different endpoints; hence, their direct comparison is tentative. However, previous studies reported good predictive accuracy of scores for COVID-19 outcomes they were not designed for [16,18,20]. The results of this study confirm that scores such as the NEWS, although developed for a different outcome, can compete with COVID-IRS and 4C in the prediction of mechanical ventilation and in-hospital mortality, respectively. Especially when seeking to predict the composite outcome 'severe course', it is pragmatic to search for predictive parameters among those that predict the respective single outcomes.
When examining the composite endpoint 'severe course', the prediction accuracy of both 4C and COVID-IRS remained acceptable, with AUCs of 0.76 and 0.72, respectively. This result was rather unexpected since the two scores are based on a completely different set of parameters: while the 4C takes demographics (age, sex) and comorbidities into account, the COVID-IRS focuses solely on vital signs and laboratory values (Table 1). The only parameter used by both scores is the patient's respiratory rate upon admission. This observation suggested that a combination of these factors could further improve prediction accuracy.
Interestingly, the qSOFA did not effectively predict any of the adverse outcomes (AUC < 0.7 for all), which suggests limited utility in the COVID-19 patient population. This finding is consistent with previous studies, indicating that it may not be suitable for predicting severe outcomes in COVID-19 patients [33,41,42].

Predictive Accuracy of COVID-COMBI
The validation of COVID-COMBI (Table 2), combining elements of both the 4C and COVID-IRS, confirmed the above-stated hypothesis: the COVID-COMBI score reached significantly better prediction accuracy than all other assessed scores, with an AUC of 0.79. This result suggests that a severe course of COVID-19 is often influenced by multiple factors and, therefore, can only be reliably predicted by a combination of pre-existing risk factors, indicators of acute respiratory status, and inflammatory markers. Furthermore, the prediction accuracy of COVID-COMBI for in-hospital death, with an AUC of 0.82, was just marginally worse than that of the 4C. On the other hand, in the prediction of mechanical ventilation, COVID-COMBI performed significantly worse than the COVID-IRS, even though it incorporates all parameters also used in the COVID-IRS. This finding may reflect the complexity of predicting mechanical ventilation, which depends not only on clinical severity but also on non-intubation decisions. As reported multiple times before, older patients and those with a high number of comorbidities are at high risk for a severe course of COVID-19 [43]. At the same time, these patients are particularly likely not to be mechanically ventilated in case of respiratory deterioration, either due to a poor individual prognosis or their personal beliefs and life circumstances [44]. This relationship may result in poor predictive power of age and number of comorbidities for mechanical ventilation or even a negative association.
With an AUC of 0.79, the accuracy of COVID-COMBI for the prediction of severe course can be classified as moderate to good. It is important to keep in mind that the COVID-COMBI is a plain combination of previously validated scoring systems. Predictive accuracy could possibly be further improved by including in the scoring system other reportedly strong predictors that are routinely available, such as COVID-19 vaccination status, immunosuppression, and obesity [45][46][47][48].

Limitations
Our study has several limitations. First, the single-center, retrospective design with a population of only 1051 patients may limit the generalizability of its results. Second, the unavailability of certain parameters in the clinical routine data prevented us from calculating other relevant scores. Third, missing values in some parameters, despite being imputed, might have introduced some bias; however, sensitivity analyses of the non-imputed dataset yielded consistent findings. Finally, we did not stratify our cohort by virus variant, vaccination status, or pharmacological treatment due to the small size of the subsets. In an even larger cohort, such stratification could potentially enhance the accuracy of the results.

Conclusions
Our study demonstrates that both the 4C mortality score and the COVID-IRS score have robust predictive accuracy for the prediction of severe course in patients hospitalized for COVID-19. Notably, the newly developed COVID-COMBI score outperformed both scores in the prediction of severe disease progression. Additionally, the COVID-COMBI score performed well in the prediction of in-hospital death and invasive mechanical ventilation. The findings from this study highlight the potential utility of the COVID-COMBI score in clinical practice. By integrating comprehensive risk factors and clinical routine parameters, the COVID-COMBI can help healthcare providers identify high-risk patients early, enabling timely interventions and potentially improving patient outcomes. The score's robust performance across various outcomes indicates its practical value in managing COVID-19 patients. Future research should focus on the prospective and multicentric external validation of COVID-COMBI to confirm its predictive accuracy in diverse healthcare settings.

Figure 1. Receiver operating characteristic curves for the prediction of (a) severe course, (b) in-hospital death, and (c) invasive mechanical ventilation. Abbreviations: AUC: area under the curve. qSOFA: quick Sequential Organ Failure Assessment. NEWS: National Early Warning Score. 4C: 4C mortality score. SEIMC: Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica (Spanish Society of Infectious Diseases and Clinical Microbiology). IRS: intubation risk score.


Table 2. New COVID-COMBI score for the risk stratification of severe course in patients hospitalized for COVID-19. For score calculation, all points are added up.

Table 4. Scores at admission, overall and by patient outcome. Values are median and interquartile range.