Lung Ultrasound, Clinical and Analytic Scoring Systems as Prognostic Tools in SARS-CoV-2 Pneumonia: A Validating Cohort

At the moment, several COVID-19 scoring systems have been developed. It is necessary to determine which one better predicts a poor outcome of the disease. We conducted a single-center prospective cohort study to validate four COVID-19 prognosis scores in adult patients with confirmed infection at ward. These are National Early Warning Score (NEWS) 2, Lung Ultrasound Score (LUS), COVID-19 Worsening Score (COWS), and Spanish Society of Infectious Diseases and Clinical Microbiology score (SEIMC Score). Our outcomes were the combined variable “poor outcome” (non-invasive mechanical ventilation, intubation, intensive care unit admission, and death at 28 days) and death at 28 days. Scores were analysed using univariate logistic regression models, receiver operating characteristic curves, and areas under the curve. Eighty-one patients were included, from which 21 had a poor outcome, and 9 died. We found a statistically significant correlation between poor outcome and NEWS2, LUS > 15, and COWS. Death at 28 days was statistically correlated with NEWS2 and SEIMC Score although COWS also performs well. NEWS2, LUS, and COWS accurately predict poor outcome; and NEWS2, SEIMC Score, and COWS are useful for anticipating death at 28 days. Lung ultrasound is a diagnostic tool that should be included in COVID-19 patients evaluation.


Introduction
Pneumonia caused by SARS-CoV-2 (COVID-19) has been an unprecedented health disruption in recent decades. In December 2019, a lung disease outbreak due to a previously unknown pathogen was announced in the city of Wuhan, China. Since then, according to the Coronavirus Resource Center at Johns Hopkins University, as of 1 June 2021, more than 170 million cases and 3.5 million deaths have been reported all around the world [1]. Of these data, more than 50 million cases correspond to Europe and, in particular, more than 3.5 million to Spain [2]. This has resulted in 250,000 hospital admissions in Spain, from which 25,000 went to intensive care units [3].
One of the difficulties in assessment and management lies on knowing which patients will have a more unfavourable evolution. Acute respiratory distress syndrome (ARDS) is the worse consequence of SARS-CoV-2 inflammatory process in the lungs, needing frequent high-flow nasal oxygen therapy or noninvasive mechanical ventilation (NIMV), invasive mechanical ventilation (IMV), or extracorporeal membrane oxygenation (ECMO) in critical cases [4]. This leads to an overuse of intensive care unit resources, extended hospital stays, and a high mortality rate of this population.
Several scoring systems have tried to classify patients at admission [5][6][7][8][9]. Classifying patients has the utility of trying to predict which of them may have a worse outcome in order to provide them an earlier administration of ventilatory support or drugs that change the patient's prognosis. These scores are usually based on clinical and/or analytical data.
However, lung ultrasound has been reported to be a promising tool for the diagnosis and follow-up of pneumonia, interstitial diseases, and ARDS [10,11]. Lung damage originated from COVID-19 is typically patchy and subpleural, which makes COVID-19 specially interesting to be evaluated by lung ultrasound [12]. Its role in the pandemic has so far been secondary, but it has proven to be a useful tool for the stratification of severity and risk of death in these patients [13], as supported by a strong correlation between lung ultrasound and CT, with greater accessibility and speed at lower cost [14].
The aim of our study is to evaluate some of these scoring systems in order to validate them and assess which one of them better predicts a worse outcome in our cohort.

Study Design and Setting
We conducted a single-center prospective cohort study to validate four COVID-19 prognosis scores. The study protocol was approved by the regional ethics committee with the code 0259-N-21. This study was written in accordance with the TRIPOD statement for risk prediction models [15]. It was conducted at the San Cecilio University Hospital in Granada (Spain), with a reference population of almost 500,000 people.

Selection of Participants
This study included consecutive adult patients (age 18 years or older) by quota sampling, with infection confirmed by real-time reverse transcription-polymerase chain reaction (RT-PCR) to be SARS-CoV-2 positive [16], in the first 24 h of admission to the COVID-19 inpatient ward from 1 January 2021 to 20 March 2021.
Patients with suspected or confirmed bacterial pulmonary superinfection, those unable to cooperate with the examination (for physical or cognitive reasons), and those who voluntarily refused to participate were excluded from the study. All patients were given a study summary and signed an informed consent form with the possibility of revocation.

Measurements
Registration data, anthropometric variables, symptoms and onset date, comorbidities, vital signs, radiological severity, analytical data, and ultrasound lung involvement were compiled. The researchers on the COVID-19 transitional hospitalisation ward collected those clinical variables at the time of evaluation. The Emergency Department requested the analytical data at the first hospital contact, whereas the missing data were completed during the examination.
Ultrasound imaging was performed with an ultrasound scanner PHILIPS ® SPARQ (Koninklijke Philips N. V.; Amsterdam, The Netherlands). The predefined lung programme was established, using a convex transducer 6-2 MHz. The Clinical Care Ultrasound Group of the hospital trained 6 Internal Medicine physicians in lung ultrasound, who obtained the images. At least 2 examiners were present when the ultrasound images were acquired in order to reduce the inherent inter-individual variability in this technique. We used a 12 lung fields protocol, as it is the protocol that has shown the best balance between accuracy and acquisition time [17]. The clavicular midline and paravertebral line divided each hemithorax into anterior, lateral, and posterior fields. These fields were further subdivided into superior and posterior fields, resulting in 6 fields in each lung ( Figure 1). Prior examination of the patients medical history was avoided to prevent possible biases in image acquisition. into superior and posterior fields, resulting in 6 fields in each lung ( Figure 1). Prior examination of the patients medical history was avoided to prevent possible biases in image acquisition.

Outcomes
Outcomes collected were non-invasive mechanical ventilation or high-flow nasal oxygen therapy (NIMV), intubation (IMV), intensive care unit (ICU) admission, and death at 28 days. The combined outcome variables NIMV or IMV and NIMV or IMV or ICU admission or death at 28 days were also elaborated. Our primary variable was the combined variable poor outcome (NIMV or IMV or ICU admission or death at 28 days) and the secondary variable death at 28 days.

Scores Selection
The studied scores cover clinical, analytical, and imaging variables associated with poor prognosis in the evolution of COVID-19.
National Early Warning Score (NEWS) 2 [5], recognised in previous studies as a useful prognostic tool and superior to other scales in COVID 19 for both severe illness and in-hospital mortality [18], was used to analyse the prognostic value of clinical data. This takes into account vital signs, such as heart rate, systolic blood pressure, respiratory rate, peripheral oxygen saturation (SaO2), and body temperature; and it also includes the need for oxygen supply and the level of consciousness.

Outcomes
Outcomes collected were non-invasive mechanical ventilation or high-flow nasal oxygen therapy (NIMV), intubation (IMV), intensive care unit (ICU) admission, and death at 28 days. The combined outcome variables NIMV or IMV and NIMV or IMV or ICU admission or death at 28 days were also elaborated. Our primary variable was the combined variable poor outcome (NIMV or IMV or ICU admission or death at 28 days) and the secondary variable death at 28 days.

Scores Selection
The studied scores cover clinical, analytical, and imaging variables associated with poor prognosis in the evolution of COVID-19.
National Early Warning Score (NEWS) 2 [5], recognised in previous studies as a useful prognostic tool and superior to other scales in COVID 19 for both severe illness and inhospital mortality [18], was used to analyse the prognostic value of clinical data. This takes into account vital signs, such as heart rate, systolic blood pressure, respiratory rate, peripheral oxygen saturation (SaO 2 ), and body temperature; and it also includes the need for oxygen supply and the level of consciousness.
Furthermore, the most widely recognised international index for lung ultrasound in COVID-19 is the Lung Ultrasound Score (LUS) [11]. It scores from 0 to 3 each of the fields analysed, depending on the severity of the findings in each of them, as shown in Figure 1. The final sum of the score for all fields is used as a prognostic value. It was developed only correlating the score with clinical severity [7].
In addition to the scores mentioned above, two other indexes that include two or more of these categories were analysed. The COVID-19 Worsening Score (COWS) measures both LUS >15 points and clinical and analytical variables of the previously validated COVID-GRAM score [19], including dyspnea, symptom days, arterial oxygen pressure to oxygen inspired fraction ratio, and number of co-morbidities, including dementia, chronic obstructive pulmonary disease (COPD), chronic renal disease (CRD), diabetes mellitus (DM), hypertension, obesity, neoplasm, hepatitis B virus, cerebrovascular disease, cardiovascular disease, heart failure, and immunosuppression. The COWS measures the likelihood of developing severe disease in hospitalised patients [8]. Another prognostic tool is the COVID-19 Spanish Society of Infectious Diseases and Clinical Microbiology score (SEIMC Score), which studies clinical and analytical variables, such as age, low age-adjusted SaO 2 , neutrophil to lymphocyte ratio, estimated glomerular filtration rate, dyspnea, and sex in hospitalised patients and correlates it with 30-day mortality [9].

Analysis
Based on the data available at the beginning of the study, an estimated sample calculation of 22 patients was made to perform mean differences. We performed a simple logistic regression assuming an alpha of 0.05, a beta of 0.8, a probability of the event in the null hypothesis of 0.3, and a probability of the event in the alternative hypothesis of 0.7, with an estimated proportion of the population with the alternative hypothesis of 0.3. We set >15 as the severity cutoff point in the LUS, as used by COWS [8].
Descriptive statistics on the patients subjected to the ultrasonography exploration were performed by means and ranges for discrete variables, median, and inter-quartile ranges for continuous variables. For this analysis, normality was verified using the Shapiro-Wilk test. Non-normal variables means differences were tested using the Mann-Whitney test and the chi-square test for categorical variables. The main statistical analyses were derived in order to explore the relationship between our principal and second outcomes with several indices detailed above, such as NEWS2, LUS, COWS, or SEIMC Score. The methods consisted of univariate logistic regression models, estimating the risk of low index versus high index per unit depending on the classification. Receiver operating characteristic (ROC) curves were calculated to evaluate the predictive performance of these five scores for our primary and secondary outcomes. Areas under the curve (AUC) and 95% confidence intervals (CI) were also calculated. The cut-off point for the AUC was set at 0.7. This value indicates a favourable result, especially for the validation of a new score [20].
All tests were two-sided, and we determined a p-value < 0.05 and an AUC > 0.7 as significant. Data derivation procedure was calculated using SSPS software, version 24 (IBM Corp. Released 2017. IBM SPSS Statistics for Windows, Version 25.0. Armonk, NY, USA: IBM Corp.).

Characteristics of Study Subjects
Between 1 January and 20 March, a total of 769 patients were admitted to the COVID-19 hospitalisation ward of the San Cecilio University Hospital with a positive SARS-CoV-2 PCR, and 656 of those were admitted diagnosed of COVID-19 pneumonia. Of the remaining, 198 had one or more of the exclusion criteria described above. Finally, of the 458 eligible patients, 81 successive patients could be included in the study depending on the availability of human and material resources, as described in Figure 2.
The baseline characteristics of the participants are listed in Appendix A. The median age was 62 years (interquartile range 52-72), being significantly higher in dead patients (73) than in alive patients (59). The majority of participants were male (66.7%) although among the deceased, there was an equal number of males and females (4). Some comorbidities were more frequent in poor outcome and dead patients, such as COPD, CRD, immunosuppression (these three were statistically significant in death outcome), obesity, neoplasia, and heart failure. Radiologically, the median chest X-ray severity according to RALE Diagnostics 2021, 11, 2211 5 of 13 was 4 (statistically significant for both main outcomes) and the incidence of pulmonary thromboembolism 8.6%, more frequent in favourable outcome (10% vs. 5%) and death (11.1% vs. 8.3%) without reaching statistical significance. In clinical variables, the median number of symptom days was higher in the living (8 vs. 5) and RR in poor outcome (26 vs. 21, p < 0.01), while in poor outcome and death, altered level of consciousness (5% vs. 3.3% and 22.2% vs. 1.3% with p < 0.01) and supplementary O 2 (85% vs. 56.7% with p < 0.05 and 77.8% vs. 62.5%) were more frequent. Analytically, poor outcome and dead patients presented a higher frequency of levels above the established cut-off points in LDH and CRP scores although lower in D-dimer and neutrophil-lymphocyte ratio. They also had lower PaO 2 -FiO 2 ratio and median eGFR (p < 0.01 in death outcome), lymphocyte, and platelet values (p < 0.05 in both outcomes) while higher NT-proBNP levels (p < 0.05 in death outcome). The baseline characteristics of the participants are listed in Appendix A. The median age was 62 years (interquartile range 52-72), being significantly higher in dead patients (73) than in alive patients (59). The majority of participants were male (66.7%) although among the deceased, there was an equal number of males and females (4). Some comorbidities were more frequent in poor outcome and dead patients, such as COPD, CRD, immunosuppression (these three were statistically significant in death outcome), obesity, neoplasia, and heart failure. Radiologically, the median chest X-ray severity according to RALE was 4 (statistically significant for both main outcomes) and the incidence of pulmonary thromboembolism 8.6%, more frequent in favourable outcome (10% vs. 5%) and death (11.1% vs. 8.3%) without reaching statistical significance. In clinical variables, the median number of symptom days was higher in the living (8 vs. 5) and RR in poor outcome (26 vs. 21, p < 0.01), while in poor outcome and death, altered level of consciousness (5% vs. 3.3% and 22.2% vs. 1.3% with p < 0.01) and supplementary O2 (85% vs. 56.7% with p < 0.05 and 77.8% vs. 62.5%) were more frequent. Analytically, poor outcome and dead patients presented a higher frequency of levels above the established cut-off points in LDH and CRP scores although lower in D-dimer and neutrophil-lymphocyte ratio. They also had lower PaO2-FiO2 ratio and median eGFR (p < 0.01 in death outcome), lymphocyte, and platelet values (p < 0.05 in both outcomes) while higher NT-proBNP levels (p < 0.05 in death outcome).

Main Results
Differences between the subgroups for the two main outcomes under study are described in Table 1. In total, 21 patients in our cohort had a poor outcome, and nine patients died. Univariate analysis of each of the scores showed a statistically significant correlation between poor outcome and NEWS2 (OR 1.611, p < 0.001), LUS > 15 (OR 3.5, p = 0.019), and COWS (OR 3.968, p = 0.019). The study of death at 28 days showed statistically significant

Main Results
Differences between the subgroups for the two main outcomes under study are described in Table 1. In total, 21 patients in our cohort had a poor outcome, and nine patients died. Univariate analysis of each of the scores showed a statistically significant correlation between poor outcome and NEWS2 (OR 1.611, p < 0.001), LUS > 15 (OR 3.5, p = 0.019), and COWS (OR 3.968, p = 0.019). The study of death at 28 days showed statistically significant correlations in NEWS2 (OR 1.315, p = 0.041) and SEIMC Score (OR 1.190, p = 0.008). The performance of each of the scores was represented in the pooled ROC curves (Figure 3), detailing their AUC, optimal cut-point value, sensitivity, and specificity in Table 2. As in the univariate analysis, NEWS (AUC 0.785), LUS > 15 (0.617), and COWS (AUC 0.751) showed the best performance at poor outcome. It also coincides in death at Diagnostics 2021, 11, 2211 6 of 13 28 days, with NEWS (AUC 0.654) and SEIMC Score (AUC 0.854) showing the best results although COWS (AUS 0.690) also performs well. Statistical differences between the AUC of the scores were also studied according to DeLong's non-parametric method [21], as shown in Appendix B. As a result, NEWS (p = 0.028), SEIMC (p = 0.004), and COWS (p < 0.001) are superior to LUS as a continuous variable for predicting poor outcome, while COWS is also superior to LUS for predicting death at 28 days (p = 0.007). There is also significant superiority of NEWS over SEIMC for predicting poor outcome (p = 0.049).
The performance of each of the scores was represented in the pooled ROC curves (Figure 3), detailing their AUC, optimal cut-point value, sensitivity, and specificity in Table 2. As in the univariate analysis, NEWS (AUC 0.785), LUS >15 (0.617), and COWS (AUC 0.751) showed the best performance at poor outcome. It also coincides in death at 28 days, with NEWS (AUC 0.654) and SEIMC Score (AUC 0.854) showing the best results although COWS (AUS 0.690) also performs well. Statistical differences between the AUC of the scores were also studied according to DeLong's non-parametric method [21], as shown in Appendix B. As a result, NEWS (p = 0.028), SEIMC (p = 0.004), and COWS (p < 0.001) are superior to LUS as a continuous variable for predicting poor outcome, while COWS is also superior to LUS for predicting death at 28 days (p = 0.007). There is also significant superiority of NEWS over SEIMC for predicting poor outcome (p = 0.049).

Discussion
Most physicians agree on the importance of clinical findings of haemodynamic instability [22] and analytical findings of hyperinflammation syndrome [23] in the management of patients with COVID-19, supported by the classical radiological studies of chest X-ray and chest CT [24]. However, weighing each of these findings separately as a better predictor of the disease outcome proves to be very complicated. In this context, different scores that group several of these variables and assign a weight to them have emerged continuously up to the present time [25][26][27] as well as alternatives to conventional diagnostic tests, such as lung ultrasound [28]. However, a new question arises when trying to use these indices in clinical practice, where effectiveness in the management of time and resources is essential in a pandemic context. For this reason, we conducted this comparative study between representative scores of the most commonly used variables in medical practice (clinical, analytical, and imaging), avoiding some that have raised doubts about their usefulness [29].
Although several studies have attempted to deal with this problem, some of them have focused on validating previous indices not specific to COVID-19 [30,31], comparing them with new indices developed particularly for this disease [32,33] or comparing scores that evaluate one or two categories of the variables used to assess the severity of the disease [34]. However, none of them have validated COVID-specific indices taking into account the new diagnostic alternative of lung ultrasound or scores derived from it. More and more clinicians are becoming proficient in this technique, and it is no longer a diagnostic test offered by the radiology department but an increasingly accessible tool in the emergency department and on the hospital ward.
The results described above validate many of the original studies that developed the scores analysed. NEWS2 was confirmed as a useful tool both for predicting severe disease and for studying in-hospital death, showing significant results for both outcomes in our cohort. LUS findings had already shown good correlation with disease severity, repeating these results in this study with a cut-off point of 15. The SEIMC Score data for 30-day mortality were also confirmed, being significantly associated as well with 28-day death in our study. Finally, the COWS was developed to measure the probability of developing severe disease and, in our study, not only confirmed this result but also showed an acceptable diagnostic ability for 28-day mortality according to the AUC (however, it did not reach the pre-determined statistical significance), with a high sensitivity for both outcomes.
From the detailed analysis of the results, we appreciate the importance of the cut-off points. Setting the same cut-off point for the two prognostic variables (death and poor prognosis) may imply decreasing the utility, predicting one of the variables in favour of the other. In the case of LUS, on the other hand, it seems that its usefulness as a discrete variable is less than as an ordinal variable. Categorising into mild and severe groups with a cut-off point of 15 (as used by the COWS) has greater statistical significance in our sample than its AUC. The NEWS is a clinical scale, which measures the impact of COVID-19 at the time of the evaluation. Perhaps for this reason it is a better predictor of poor outcome, as clinical variables are established criteria for starting mechanical ventilation. However, its usefulness in predicting death as an outcome is less in the AUC analysis although it is possible that with greater sample power it would reach statistical significance, as it does in simple logistic regression. The COWS has a similar issue, in which a larger sample size would possibly show statistical significance in the analysis of death as an outcome (AUC 0.69, having used 0.7 as the cut-off point for statistical significance). Finally, the SEIMC Score appears to be an index developed specifically to predict death as an outcome, including the variables that, in the cohort analysed, best predicted this outcome. Specifically, the inclusion of age and eGFR as variables in their calculation may be two of the reasons for this. Our study provides external validity to these results but confirms that it is not a useful score for predicting poor prognosis.
Among the strengths of our study, as an underused clinical tool equal to other more commonly used techniques such as blood tests, it is worth noting the inclusion of lung ultrasound and the scores derived from it. Finally, the outcomes measured included both poor outcome and death at 28 days, in contrast to other studies that assessed only one or the other, and there were been any missing data among participants.
The study was designed with the purpose to reduce the limitation of being an operatordependent technique, with two or more operators always required to be present during image acquisition and interpretation. However, the biases in lung ultrasound performance by our team of operators, although mitigated by the presence of other colleagues, were still present and cannot be completely ruled out. Potential limitations of the study could arise from the sample size. Although the initial calculation showed us that the sample of 24 patients was sufficient, probably with a larger sample, we could perform a mutivariate analysis, including joint clinical and ultrasound parameters, and confirm the clear tendency of some variables to reach statistical significance without reaching it. Participants included among the eligible patients were recruited on a randomised basis, so we believe that the patients included are a representative sample of the total population admitted to the inpatient ward of our hospital and therefore do not represent a bias in itself. In addition, some parameters, such as D-dimer or cardiovascular disease, are more frequent in favourable outcomes than in other studies, which could probably change with more cases favouring regression to the mean. Though a careful selection was made from among the many existing scores, selecting representative examples of the variables most commonly used in clinical practice, the continuous proliferation of these indices makes it impossible to compare all of them. At last, the exclusion of patients with potential bacterial superinfection in the lung may be a bias that eliminates patients with more severe evolution, while the inclusion of lung ultrasound implies an operator-dependent technique that may make reproducibility in other studies difficult.
This study raises some issues that will need to be addressed in further studies. Which type of transducer (convex or linear) is most suitable for lung ultrasound should be defined and demonstrated. Most of these scores have also been developed and validated in the hospitalised population, and studies are needed to test their usefulness in the out-ofhospital setting. Our study indicates the necessity of establishing LUS cut-off points for mild, moderate, and severe disease in future studies. We also consider it necessary to establish guidelines in the elaboration of these scores and to combine those that measure the same variables into indices with greater statistical power.

Conclusions
In summary, NEWS2, LUS, and COWS are good scores for predicting poor outcome of COVID-19, while NEWS2, SEIMC Score, and COWS are useful scores for predicting death at 28 days. Both COWS and NEWS are the scores that best predict the evolution of COVID Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patient(s) to publish this paper.

Data Availability Statement: Not applicable.
Acknowledgments: To Valme Sanchez Cabrera, for her support in managing the database, and to Alejandra Ciria García, for her assistance in text edition.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.