Validation of Prognostic Scores in Extracorporeal Life Support: A Multi-Centric Retrospective Study

Multiple prognostic scores have been developed for both veno-arterial (VA) and veno-venous (VV) extracorporeal membrane oxygenation (ECMO), mostly in single-center cohorts. The aim of this study was to compare and validate different prediction scores in a large multicenter ECMO-population. Methods: Data from five ECMO centers included 300 patients on VA and 329 on VV ECMO support (March 2008 to November 2016). Different prognostic scores were compared between survivors and non-survivors: APACHE II, SOFA, SAPS II in all patients; SAVE, modified SAVE and MELD-XI in VA ECMO; RESP, PRESET, ROCH and PRESERVE in VV ECMO. Model performance was compared using receiver-operating-curve analysis and assessment of model calibration. Survival was assessed at intensive care unit discharge. Results: The main indication for VA ECMO was cardiogenic shock; overall survival was 51%. ICU survivors had higher Glasgow Coma Scale scores and pH, required cardiopulmonary resuscitation (CPR) less frequently, had lower lactate levels and shorter ventilation time pre-ECMO at baseline. The best discrimination between survivors and non-survivors was observed with the SAPS II score (area under the curve [AUC] of 0.73 (95% CI 0.67–0.78)). The main indication for VV ECMO was pneumonia; overall survival was 60%. Lower PaCO2, higher pH, lower lactate and lesser need for CPR were observed among survivors. The best discrimination between survivors and non-survivors was observed with the PRESET score (AUC 0.66 (95% CI 0.60–0.72)). Conclusion: The prognostic performance of most scores was moderate in ECMO patients. The use of such scores to decide about ECMO implementation in potential candidates should be discouraged.


Introduction
Multiorgan failure (MOF) is a common complication in critically ill patients requiring intensive care unit (ICU) admission and is associated with a high mortality rate. Therefore, multiple scoring systems such as the sepsis-related organ failure assessment (SOFA) [1], the simplified acute physiology score (SAPS II) [2] and the acute physiology and chronic health evaluation score (APACHE II) [3] have been developed to quantify the severity of illness, to understand the evolution of the acute illness, to evaluate the impact of treatment and to predict outcome in critically ill patients [1][2][3]. Due to the rapid progression in therapeutic options for such patients, prognostic scores have also been developed for those undergoing extracorporeal membrane oxygenation (ECMO) to eventually allocate expensive and complex resources.
According to the Extracorporeal Life Support Organization (ELSO) [4], the indications for veno-arterial (VA) and veno-venous (VV) ECMO are severe refractory cardiogenic shock and respiratory failure with an expected mortality risk above 50%, respectively. However, these indications are still controversial and differ among centers. Therefore, scoring systems might be helpful to identify subgroups of patients in whom the initiation of ECMO would be very beneficial or associated with a very low likelihood of survival.
For VA ECMO patients, the SAVE score [5], the modified SAVE score (with addition of lactate) [6] and the MELD-XI [7] are widely used. In VV ECMO patients, the RESP score [8], the PRESERVE score [9], the ROCH score [10] and the PRESET score [11] are reportedly used to predict outcome and to guide decisions on whom to support with ECMO. Such guidance would be valuable when resources are limited, as during the COVID-19 pandemic, because it could enable better allocation. Most scores are derived from small single-center cohorts [6][9][10][11] and have not been validated in large multicenter cohorts.
In our study we compared specific ECMO scores with general ICU scores in a large multicenter cohort of patients from five European high-volume ECMO centers and analyzed which scores performed most accurately in the two most used ECMO modes.

Study Population
Consecutive patients with severe ARDS or cardiogenic shock requiring ECMO in either VV or VA mode between March 2008 and November 2016 were included from five European high-volume ECMO centers (Brussels, Milan, Stockholm, Pavia, and Regensburg). Patients younger than 18 years and those with configurations other than VV or VA were excluded. The requirement of individual patient consent and necessity of approval for the data report complied with the Declaration of Helsinki and were waived by the local ethics committee because of the study's design and data collection from routine care.
Indications for ECMO were based on local ECMO protocols and ELSO guidelines [4]. Contraindications followed ELSO guidelines [4] and included advanced age, chronic irreversible organ dysfunction, malignancies with fatal prognosis within 1 year, and contraindication for therapeutic anticoagulation.

Data Collection
Routine data (e.g., demographics, diagnosis group, biochemistry, cardiac and respiratory parameters) were assessed before ECMO initiation and were extracted from the electronic patient data management systems. Survival was assessed at ICU discharge.
The following scores assessing the severity of illness were applied to both the VA and VV cohorts: APACHE II [3], SOFA [1] and SAPS II [2]. Additionally, the ECMO-specific SAVE [5], modified SAVE [6], and MELD-XI [7] scores were assessed in the VA cohort, whereas the RESP [8], PRESET [11], PRESERVE [9], and ROCH [10] scores were evaluated in the VV cohort. More details of each score are presented in Supplemental Tables S1-S10. Only patients with a complete data set were included in the analysis. The primary objective of this retrospective multicenter study was to compare ECMO-specific scores with general ICU scores and to predict mortality in VA and VV ECMO. The secondary objective was to identify the most accurate predictive score for each subgroup of patients.

Statistical Analyses
Unless otherwise indicated, descriptive data were expressed as medians and interquartile ranges (IQR) or as frequencies (%) of each category. The subgroups of patients (survivors and non-survivors) were compared using the Chi-square test for categorical variables and the Mann-Whitney U test for continuous variables. Scores were retrospectively calculated according to the original publications [1][2][3][5][6][7][9][10][11]. To assess discrimination and calibration, each score was entered as the predictor variable, with mortality (no/yes) as the outcome variable, in a univariate logistic regression analysis. Discrimination was assessed by the area under the receiver-operating characteristic curve (AUC), where an AUC of 0.50 suggests no discrimination, 0.50 to 0.69 is considered moderate, 0.70 to 0.79 acceptable, 0.80 to 0.89 excellent, and more than 0.9 outstanding [12]. AUCs were compared using the algorithm suggested by DeLong et al. [13]. Calibration was assessed with the Hosmer-Lemeshow (HL) test and visually by calibration plots using the module pmcalplot in Stata [14]. Model comparison also included calculation of the Akaike and Bayesian Information Criteria (AIC and BIC, respectively), which assess model fit while penalizing the number of estimated parameters. The model with the lowest AIC and BIC was preferred. A two-sided p-value < 0.05 was considered statistically significant. Data analyses were performed with the software package Stata (v.16.0, StataCorp, 4905 Lakeway Drive, College Station, TX 77845, USA).
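The discrimination and model-fit measures described above can be illustrated with a minimal Python sketch. It is not the Stata code used in the study: the function names (auc, discrimination_label, aic_bic) and the toy score values are hypothetical, and band boundaries are treated as inclusive lower limits, which is an assumption. The AUC is computed in its rank-based form, as the probability that a randomly chosen non-survivor has a higher score than a randomly chosen survivor.

```python
import math


def auc(scores_nonsurvivors, scores_survivors):
    """Rank-based AUC: share of (non-survivor, survivor) pairs in which the
    non-survivor has the higher score; tied pairs count as one half."""
    wins = 0.0
    for d in scores_nonsurvivors:
        for a in scores_survivors:
            if d > a:
                wins += 1.0
            elif d == a:
                wins += 0.5
    return wins / (len(scores_nonsurvivors) * len(scores_survivors))


def discrimination_label(value):
    """Map an AUC to the bands quoted in the text [12].
    Assumption: each band includes its lower limit."""
    if value >= 0.90:
        return "outstanding"
    if value >= 0.80:
        return "excellent"
    if value >= 0.70:
        return "acceptable"
    if value > 0.50:
        return "moderate"
    return "none"


def aic_bic(log_likelihood, n_params, n_obs):
    """Standard definitions: AIC = 2k - 2 lnL, BIC = k ln(n) - 2 lnL."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic


# Hypothetical toy data: higher score means higher predicted risk.
toy_nonsurvivors = [60, 55, 70, 40]
toy_survivors = [30, 45, 55, 35]
toy_auc = auc(toy_nonsurvivors, toy_survivors)
print(toy_auc, discrimination_label(toy_auc))

# Labels for two AUCs reported in the abstract (SAPS II in VA, PRESET in VV).
print(discrimination_label(0.73), discrimination_label(0.66))
```

For a univariate logistic model, n_params would be 2 (intercept and score coefficient); a lower AIC or BIC indicates the preferred model when both are fitted to the same patients.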

Results
A total of 629 ECMO patients were included in this study; 300 in the VA and 329 in the VV ECMO cohort.

VA ECMO Population
The cohort consisted mainly of men (66.3%) with a median age of 57 years (Table 1). The main indications for VA ECMO were cardiogenic shock (53%), septic shock (20%), and refractory cardiac arrest (19%). A total of 153 (51%) patients survived to ICU discharge. Pre-ECMO cardiac arrest and mechanical ventilation > 7 days were observed less frequently in survivors than in non-survivors (26% vs. 50%, p < 0.001, and 40% vs. 51%, p = 0.045, respectively). Blood gas analysis before ECMO initiation revealed lower levels of lactate and higher levels of bicarbonate and pH among survivors (Table 1). Predictive scores for VA ECMO are presented in Table 2. The APACHE II, SAPS II, SAVE and modified SAVE scores, but not MELD-XI and SOFA, differed significantly between survivors and non-survivors (Figure 1). Expected mortality rates differed considerably between scores, ranging from 8.5% to 76%. Compared to the observed mortality rate, the greatest overestimation was observed with the SAPS II and SAVE scores (Figure 2). The best discrimination for ICU survival was offered by the SAPS II and APACHE II scores (AUC = 0.727 (95% CI: 0.669 to 0.784) and AUC = 0.716 (95% CI: 0.658 to 0.774), respectively), with good calibration (HL χ² statistic of 13.23 (p = 0.10) and 8.11 (p = 0.42), respectively). Other scores, such as SOFA, SAVE, modified SAVE, and MELD-XI, performed less accurately (Figure 3). Calibration plots for each score are depicted in Figure S1. APACHE II showed the best calibration, whereas SAVE and SAPS II deviated in calibration for extreme scores. Poor calibration was observed for MELD-XI and SOFA (Figure S1).
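As a plausibility check, the p-values attached to the two Hosmer-Lemeshow statistics reported above can be recomputed from the chi-square distribution. The sketch below assumes 8 degrees of freedom (10 risk groups minus 2); the number of groups is not stated in the text, so this is an assumption. The helper chi2_sf_even_df is a hypothetical name and uses the closed-form upper tail that holds for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with an
    even number of degrees of freedom:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


HL_DF = 8  # assumption: 10 risk groups, hence 10 - 2 = 8 degrees of freedom

p_saps2 = chi2_sf_even_df(13.23, HL_DF)    # HL statistic reported for SAPS II
p_apache2 = chi2_sf_even_df(8.11, HL_DF)   # HL statistic reported for APACHE II
print(round(p_saps2, 2), round(p_apache2, 2))
```

Under this assumption the recomputed values round to 0.10 and 0.42, in line with the reported p-values; a non-significant HL test means that no lack of fit was detected, not that calibration is proven.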

Discussion
This study provides new insights in the validation of established general ICU and dedicated ECMO scores in a large-scale mixed cohort of patients supported with either VA or VV ECMO from five high-volume European ECMO centers. In total, 629 ECMO patients were included and analyzed.
Survivors on VA support were younger, had higher GCS, higher pH, and lower levels of lactate, and were less often ventilated >7 days compared to non-survivors. In this patient cohort, mortality was overestimated by APACHE II, SAPS II, and SAVE, and underestimated by SOFA, modified SAVE, and MELD-XI. General ICU scores such as APACHE II and SAPS II best discriminated survivors from non-survivors. The ECMO-specific scores, SAVE and modified SAVE, were inferior. The SOFA score performed the worst.
Similar results were seen in the VV cohort. Expected and observed mortality rates were best matched by the APACHE II, SOFA, and RESP score. However, the absolute values only partly reflected AUC values, in which PRESET score discriminated best, although suboptimally.
Overall, general ICU scores were superior to those devised for ECMO in the VA cohort. This did not apply to the VV cohort. These differences might stem from the fact that general ICU scores include more variables reflecting cardiac than respiratory parameters [2,3].
The number of included variables differed between scores (Table S1). General ICU scores such as APACHE II [3] and SAPS II [2] consist of 15 and 17 variables, respectively, whereas VA ECMO scores are composed of 9 or 10 variables [5,6]. The same is true for VV ECMO scores (using up to 10) [8][9][10][11]. The SOFA score comprises six variables; however, it performed the worst in the VA cohort.
In general, scores performed worse in the current study than in the score derivation studies [1][2][3][5][6][7][8][9][10][11], and none performed exceptionally well [12]. In contrast to previous studies based on registry data [5,8], in which, for example, full physiologic data were available for only 23% of patients in the SAVE study [5], this analysis uses complete datasets only. Therefore, these two European cohorts (VV and VA) challenge the performance of the scores in a heterogeneous population. Unfortunately, the discrimination between survivors and non-survivors was moderate at best. In the current study, the discrepancy between predicted and observed mortality was large, up to 41% in the VA cohort and up to 53% in the VV cohort. One can argue that for the general ICU scores these mismatches might relate to the different patient populations (e.g., the septic patient cohort used to compile the SOFA score [1]). However, similar mismatches were seen for the ECMO-specific scores in the current analysis. Thus, clinicians should compare their own patient population with the population in which a score was derived before using it to guide further management.
Primary endpoints differed between studies and ranged from ICU mortality to survival at 6 months [1,9]. For the current analysis, we chose successful discharge from ICU because this value can be easily assessed without any nonresponse bias. The PRESERVE study chose survival at 6 months post-ICU discharge, and therefore our data might be limited when applied to this score. However, ICU survival in the current cohort was even lower than the 6-month survival predicted by the PRESERVE score.
Simple scores might be less accurate, while more complex scores may be difficult for the bedside clinician to use. However, thanks to improvements in technology, most scores can be calculated automatically at the bedside by patient data management systems. The current validation study in a large ECMO cohort reflects day-to-day clinical routine: scores might be helpful, but they are only one piece of the complex puzzle of a critically ill patient, whose care involves a bundle of therapeutic issues. Therefore, a clinical decision should not rely solely on risk scores but should be embedded in the complex interaction of clinical status, experience, clinical studies, patients' wishes, and variables not evaluated in these scores, such as frailty [15]. Indeed, in the ICU it is hard to mirror patient status with only 3-17 score parameters; until further evidence is available and patient status can be better translated into absolute score numbers, we have to find a compromise between evidence-based and eminence-based practice.

Limitations
A direct causal relationship cannot be inferred due to the retrospective study design. The participating units are highly experienced ECMO centers. Therefore, the results might not be generalizable. However, due to the multicenter approach, differences might be harmonized. Survival was defined as successful discharge from ICU, in contrast to some of the derivation studies [2][3][5][6][7][8][9][10]. However, the observed mortality rates in the current analysis were higher than expected according to the predicted mortality rates of many of the derivation studies. Comparison between centers was not performed since the aim was to apply the scores in a large-scale multicenter cohort. The ENCOURAGE score [16] was not assessed due to missing values. In contrast to other studies [5,8], the current data were not derived from registries, which should be considered a strength. Only complete patient data sets were included in the analysis, and the data generated from five independent centers likely eliminate single-center specifics and increase the potential generalizability of the results. Further prospective studies are needed.

Conclusions
The performance of most risk scores was suboptimal in patients on VV and VA ECMO. In VA ECMO patients, best discrimination between survivors and non-survivors was seen using non-ECMO scores, whereas in VV, PRESET score performed best. The use of such scores to decide about ECMO implementation in potential candidates should be discouraged.
Supplementary Materials: The following are available online at https://www.mdpi.com/2077-0375/11/2/84/s1, Figure S1: Comparison of predictive performance for all VA ECMO scores; Figure S2: Comparison of predictive performance for all VV ECMO scores; Table S1: Comparison of general ICU scores and ECMO scores; Table S2: APACHE II score: expected mortality rate according to scoring; Table S3: SOFA score: expected mortality rate according to scoring.
Author Contributions: C.F. and M.V.M. were responsible for conceiving and designing the study and its hypotheses, acquiring study funding, collecting, analyzing, and interpreting the data, and writing and revising the manuscript prior to submission. L.A.R.-G., T.B.E., F.S.T., L.M.B., M.B., L.N. and F.P. were involved in the collection and interpretation of data and critical revision of the manuscript prior to submission. T.B.E. was involved in data analysis and critical revision of the manuscript prior to submission. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: The requirement of individual patient consent and necessity of approval for the data report complied with the Declaration of Helsinki and were waived by the local ethics committee because of the study's design and data collection from routine care.
Informed Consent Statement: Patient consent was waived because of the study's design and data collection from routine care.
Data Availability Statement: The database will be available upon reasonable request to the authors.