Modeling Hospital Resource Management during the COVID-19 Pandemic: An Experimental Validation

: One of the main challenges posed by the healthcare crisis generated by COVID-19 is to avoid hospital collapse. The occupation of hospital beds by patients diagnosed by COVID-19 implies the diversion or suspension of their use for other specialities. Therefore, it is useful to have information that allows efﬁcient management of future hospital occupancy. This article presents a robust and simple model to show certain characteristics of the evolution of the dynamic process of bed occupancy by patients with COVID-19 in a hospital by means of an adaptation of Kaplan-Meier survival curves. To check this model, the evolution of the COVID-19 hospitalization process of two hospitals between 11 March and 15 June 2020 is analyzed. The information provided by the Kaplan-Meier curves allows forecasts of hospital occupancy in subsequent periods. The results shows an average deviation of 2.45 patients between predictions and actual occupancy in the period analyzed.


Introduction
One of the conclusions drawn from the analysis of previous health crises is that prevention strategies are necessary to cushion the impact of a pandemic of the characteristics of the COVID-19 disease (Lai et al. 2004;Zhao et al. 2020). Currently, prevention strategies are also key to control the occurrence of new outbreaks of this disease. Goscé et al. (2020) shows that immediate action in the early stages of the epidemic in the affected London boroughs would have slowed the spread. Unpredictability in the implementation of measures to contain an epidemic has the immediate consequence of overloading the healthcare system. The result is massive and unexpected hospital admissions, creating a national problem of unforeseeable dimensions.
Public health agencies report epidemic curves periodically, focusing on laboratory confirmed cases of COVID-19 using the various diagnostic tests. However, epidemic curves based on laboratory-confirmed cases show disease detection, but do not help to efficiently forecast hospital bed occupancy or intensive care unit (ICU) bed occupancy (García-Basteiro et al. 2020). The main challenge facing the governments of countries affected by the pandemic is to prevent the healthcare system from collapsing or becoming so stressed that it cannot adequately care for patients with COVID-19 or other pathologies.
Following the outbreak of the COVID-19 pandemic, a remarkable effort has been made to obtain reliable information on mortality and critical events in patients affected by the disease. For example, Vaid et al. (2020) show an application of machine learning techniques to this question, using Extreme Gradient Boosting together with comparable baseline methods to provide some prospective insight into relevant changes in patient status. These models are able to identify at-risk patients and underlying correlations that were not explicitly observed before by experts. However, relevant to the present work, this provides valuable information on the status of the pandemic, but also useful data for the management of a hospital's resources: knowing how many intensive care beds could be available a week earlier could save lives. More information on machine learning techniques for management during the COVID-19 pandemic can be found in, for example, (Roimi et al. 2021) or (Domínguez-Olmedo et al. 2021) and the references therein. To improve prospective techniques and adapt current machine learning technology, other procedures that blend statistical analysis and artificial intelligence have also proved useful. The reader can find applications of the so-called state-space models for inference in the paper (Zhou and Ji 2020) and the references cited in it (see Imani and Ghoreishi 2021) for some applicable current developments of this technique. Other aspects of COVID-19 management that are relevant and affect public health have also been addressed using machine learning tools.
Reliability Analysis or Survival Analysis is the set of procedures used to analyze data in which the latent variable is the time that elapses from a well-defined initial instant to the occurrence of a certain event or final instant. This variable represents the time of life, for example relapse of a patient with respect to a certain pathology, death, or the occupation of a bed. From a mathematical point of view, this variable is represented by a non-negative random variable (Cox and Oakes 1984). One of the most popular models to describe how to estimate and graph survival curves is the Kaplan-Meier (KM) model (Kaplan and Meier 1958). We use this model to estimate hospital occupancy for patients who have been diagnosed with COVID-19.
The Kaplan-Meier curve of bed occupancy by COVID-19 patients (KM curve) represents the management capacity of hospital institutions, and can be considered as the "fingerprint" of the system's response to an increasing number of admissions. It acts as the transfer function in the control system that mathematically expresses the balance between inputs to the system (new patients) and outputs (discharged or deceased patients), which can be represented as a convolution of the inputs with the KM curve (Calabuig et al. 2020). 1 The KM curve we refer to is the one that represents the statistical distribution of the probability that an admitted patient remains in the hospital; in this sense, it can be understood as the survival function of the virus. Our treatment is independent of the disease process during the course of the patients' stay in the hospital, and of the factors that aggravate the disease, (Vincent and Taccone 2020), focusing attention on the statistical distribution of the time a patient stays in the hospital.
The aim of this paper is to present the survival analysis performed on the data of hospital admissions and discharges of COVID-19 patients during the second quarter of 2020 in two hospitals in Spain. The aim is to show how the procedure can be used systematically to provide simple and easy-to-analyze evidence of the state of the healthcare systems at each point in time and prospectively in a short interval of one-two weeks. The main methodological hypothesis that we assume for the application of the model is that, basically, hospital management will be similar in all the waves of contagion that are being suffered to the first one that was recorded before the summer of 2020 in Spain. However, the model can be improved by feeding back data in different periods. The implementation of a machine learning algorithm would provide more optimal information for future prediction of hospital occupancy. The unavailability of more recent data has prevented us from refreshing our model and, therefore, from applying these techniques with updated data. In any case, from the data available to us, it is possible to extract the relevant KM curves that allow us to predict the behaviour of hospitals in crisis situations, although the dynamics of admissions would not be the same (although similar). Moreover, although estimating the input function is the weak point of any model, the convolution of its extrapolation with the KM curve can help to dampen the error made in this approximation. This information provides us with an estimate of hospital occupancy, allowing us to adapt in the following weeks the future occupancy of other specialties to the estimated needs. Additionally, according to (O'Hagan and Stevens 2004), it is possible to estimate the average total cost per patient and obtain an economic evaluation generated by the occupancy due to this disease. After the emergence of the COVID-19 crisis, numerous investigations on the estimation of the risk of severe disease in similar mathematical frameworks have been developed (see, for example, Liang et al. 2020;Zhao et al. 2020).
In addition to the one we propose in this work, there are different methodologies that allow estimating the bed capacity of hospitals during the pandemic. Garcia-Vicuña et al. (2020) analyze on the one hand, the inflows of patients affected by SARS-CoV-2 where the Gompertz curve is proposed after studying models based on the growth curves of the twenty most affected countries until 15 June 2020. On the other hand, they analyze the implementation of the simulation model (DAS) applied in the first outbreak in the Autonomous Community of Navarra (Spain), based on different procedures according to the information available on hospitalized patients. Bhandari et al. (2020) analyzed the survival curves of 987 COVID-19 patients from SMS Medical College, Jaipur (Rajasthan, India) where censoring is also established, as in our case, by hospital discharge or death. Although the data reported in the cited article differ significantly from those analyzed in the present work, they do show some coincidences; it should be taken into account that the health systems of the two countries on which our analysis is based (Spain and India), are different.
Given the assumptions we make-which we have already explained-, the following considerations must be taken into account for the application of our model.

1.
It should be borne in mind that this is essentially an extrapolation, so the results should be interpreted with the greatest of reservations.

2.
It should also be noted that, since data input was closed on that day, it is assumed that on August 31 the number of patients admitted for COVID-19 was zero. 3.
All patients are considered to complete their hospital stay, i.e., there are no cases that are not followed up. This has relevance within the classic Kaplan-Meier model (empty set of censored population).

4.
Some statistical aspects related to the reliability of the solution (error bars, expected deviations, etc.) are only partially presented, since we intend to convey the most relevant information of the solution of this complex problem. This information can be found in part in the tables presented at the end of the paper.

Method
The Kaplan-Meier survival model consists of estimating the probability that an individual who has a given property-which is lost over time-will continue to keep it over time t (Kaplan and Meier 1958). The time variable takes discrete values and is counted by days, starting at t = 0. If we write P(t) for this probability, we can estimate it by where N = n(0) is the total amount of individuals in the population, n(t) is the number of individuals still having the property at the time t, and d(t) is the number of individuals leaving the group-deceased, either not having the property or not being under control anymore-after the time t. The canonical example is given by the so-called survival curves: for example, the distribution of survival time of patients after heart transplantation. In our context, we apply these curves for the case in which we consider a given population of COVID-19 patients admitted to a hospital, and n(t) the number of these individuals still in the hospital after t days. This allows us to describe the hospital dynamics, i.e., the number of hospital beds that have been used by COVID-19 patients, when considering a flow of new cases. The function P(t) is then an estimate of an ideal distribution function P (t) that would describe the global behavior of the flow of patients through the hospital. We can then apply the classical statistical framework to estimate some relevant parameters of the flow. Let us recall that, in general, if we have a continuous probability distribution p(t) and we want to determine the expectation of the value of any random variable X(t) in an interval [0, t], we can write it as the integral However, the Kaplan-Meier curve does not give such a distribution function but an estimate of the probability of the virus to survive in the control group of patients after t days, and so we have to rewrite this equation to get the desired estimated value of any random variable in an interval [0, t]. Let [0, T] be a discrete interval measured in days. Let us consider a real function J : [0, T] → R that gives any property that can be assumed to be proportional to the number of patients (for example, the number of beds to be used, hospital rooms, number of intensive-care ventilators, or the number of patients itself). Our model is based on the construction of an estimate of the total amount of this quantity by means of the estimate of the number of new cases entering the hospital, and its convolution with the "pressure function" J representing the "new needs" of the variable J at the time t. Let us compute the total amount H of the "need" J after some days t = 0, 1, 2, . . . . At the day t = 1, if there is a need equal in t = 0 equal to J(0), it produces a real need given by the rate of the value of J(0) for the estimate of the rate of the patients that are still at the hospital, P(1). Therefore, we have that at the day t = 1 the estimate of H is given by For t = 2, we have the part of I(0) corresponding to the rate of patients that are still in the center, P(2)-that is, J(0)P(2)-, plus the part of J(1) corresponding the rate of patients that have been 2 − 1 = 1 days, that is J(1)P (1), what gives a total amount of H(2) = J(0)P(2) + J(1)P(2) Following this rule, we can write the general formula of the accumulated need H at the day t as the convolution equation Note that we are starting to compute J at t = 0-to initialize the original need-, but the model starts to work at t = 1. Using this equation, if we consider J(t) to be the function I(t) of the number of new cases entering the hospital on the day t, we have that the estimate of the total number of infected patients D(t) still in the hospital on the day t is given by the equation Of course, we do not know the exact value of the function P, but we can use the estimate of the probability given by the formula we started from to obtain an estimate of D. Anyway, we can also give a parameterization of such a function P and calculate an error band, as will be seen later, although it is not necessary to use this method. This simple model allows us to calculate, from the information obtained during the first wave of the epidemic (before August 2020), the number of patients in the hospital at a given time, once a good estimate of the function of new cases I is known. Also, as has been explained above, the model can be used for doing a prospective of any particular service of the hospital, for example the number of occupied beds in rooms of intensive care. It is not our intention to explain here the construction and main properties of our adaptation of the Kaplan-Meier model to this special case; the interested reader can find all the explanations and main equations in (Calabuig et al. 2020) and the references therein. To estimate the occupancy, we programmed an algorithm using R software (R Core Team 2020). In addition, we use several standard R packages for the computations and plots, as survival (Therneau and Grambsch 2000) or ggplot2 (Wickham 2016).

Material
Based on the anonymized statistical data provided on admissions and discharges of COVID-19 patients from the Andalusian Health Service in the hospitals of Granada from March 2020 to September 2020, the Kaplan-Meier survival curve for the SARS-COV2 in these hospitals was calculated. The sample consists of 1.176 patients who were admitted between 11 March and 15 August, the date on which the de-escalation period ended in the Andalusia region.
Let us provide a basic statistical description of the database of admissions and discharges by group. The database consists of 625 men and 551 women. Figure 1a shows the box plot that allows us to visualize the differences and the distribution of observations by gender. Considering the age of the men and women hospitalized for COVID-19, the mean age of the 625 men is 65.4 years with a standard deviation of 15.7 and the mean age of the 551 women is 66.9 years and a standard deviation of 18.5. There were only two outliers in both men and women, very young patients. A computing unpaired two-samples Wilcoxon test provides significant differences between the ages at admission at 5% significance with a p-value of 0.03069. Figure 1b shows the box plot with respect to the number of days of hospital stay. An asymmetric distribution is observed with a large number of patients having prolonged stays. Patients who suffer a worsening of the disease and do not die, have prolonged stays in intensive care units and later in recovery units of more than 30 days. Table 1 shows the results for stays by gender in the period analyzed.

Kaplan-Meier Curves Obtained
The Kaplan-Meier curve for the period analyzed in the hospitals of Granada can be seen in Figure 2, and the results obtained are shown in Table 2. For example, the probability that a patient is occupying a bed for 8 days is between 0.47 and 0.54 assuming a significance level of 1%. For a set of observations of 1159 patients, the mean estimate for the survival curve gives 11.764 days of average length of stay, with 9 days being the median estimate. The estimate of the deviation yields a value of 0.333 days.  Another issue to take into account is described by the risk function 2 , that is, the probability that a patient admitted for COVID-19 will leave the hospital on day t. While the survival function focuses primarily on the "non-occurrence" of the event (probability that the patient is occupying a bed on day t), the hazard function focuses on the "occurrence" of the event, i.e., that the patient leaves the hospital. This function provides useful information because it provides an occupancy rate. On the other hand, the cumulative risk function shown in Figure 3, provides the cumulative sum for the considered event (a patient leaves a free bed) on day t.
The results of the Kaplan-Meier curves and hazard ratios by gender and age groups are presented in the corresponding figures below. Figure 4 shows the Kaplan-Meier curve by gender for hospital occupancy by COVID-19 patients in hospitals in Granada between 11 March-15 June 2020. We used the Bonferroni method (see for example Tripathi and Pandey 2017) to verify that there are significant differences between both curves, being the probability of staying at least t days higher in men than in women. The computations show statistical differences between man and woman with a p-value equal to 0.00015. The risk functions are shown in Figure 5.   Figure 6 shows the Kaplan-Meier curves by age ranges for hospital occupancy by COVID-19 patients in Granada hospitals between 11 March and 15 June 2020. Using the Bonferroni method, we studied whether there were significant differences between all of them. The only pair that does not present significant differences is between the curves of the age ranges 31-49 years and 50-65 years with a p-value of 0.3371. Therefore, these two age ranges can be analyzed together when estimating hospital occupancy. Additionally, the cumulative hazard function (Figure 7) shows us that the risk of leaving the hospital is high for patients aged 0-30 years after 25 days. Patients in the 31-49 age range have a constant risk of leaving the hospital after 55 days. The cumulative risk rises for patients over 65 years of age at 75 days and finally, at 90 days, for patients between 50 and 65 years of age.

Occupancy in Intensive Care Units
Patient occupancy in intensive care units (ICU) represents the most sensitive problem in the management of this disease for several reasons. Firstly, because the number of beds is more limited. Secondly, because the occupancy rate has remained around 75% between 2013 and 2018, a level close to saturation (Eurostat 2020). The third and most worrying reason lies in the fact that patients in the Intensive Care Units of the COVID-19 have a much longer average length of stay. Consequently, the analysis of KM curves in these units becomes more necessary than in the rest.
We analyzed the KM curve of virus survival in the Intensive Care Units (ICU). We test our model based on the data provided by the Andalusian Health Service for the ICU of the Virgen de las Nieves Hospital during the months of March to June. The sample consisted of 75 patients who were admitted between 20 March and 15 August. Figure 8 shows the KM curve for the analyzed unit. On this case a manifest prolongation of stays is observed, for example, the probability of being at least 19 days in the ICU is 50% and between 9 and 11 days is 75%. The estimation of the cumulative hazard function estimated by the maximum likelihood method is shown in Figure 9 and Table 3 shows the events, risk and confidence interval for the probability that a patient is at least t days occupying an ICU bed. The mean length of stay in this unit is 20.40 days, with a standard deviation of 2.32 and a median of 14 days.  Analogous to what happened in the occupancy of beds by patients in the COVID-19, Figures 10 and 11 show a significant difference in occupancy by gender and age interval. As in the previous case, men have longer stays and, on the other hand, patients over 50 years of age show the same behaviour.

Estimated Hospital Occupancy
Based on the data provided on admissions and discharges of patients with COVID-19 by the Andalusian Health Service during the months of March to June (first wave of COVID-19 infections), Kaplan-Meier curves were calculated to analyze the incidence on occupancy in hospitals in Granada. According to (Calabuig et al. 2020), it has been explained that an estimate of hospital occupancy can be made by convolving the KM curve with the estimated admissions function.
We will show how we can estimate the prediction error using validation or test sets. To do this, we use patient admission and discharge data from the COVID-19 between 16 June and 9 September to test the goodness of fit. The survival curve calculated in the period from 11 March to 15 June provides a COVID-19 patient occupancy model. With this nonparametric estimation we can approximate the balance of patient occupancy between 16 June and 9 September from the convolution of a (linear) extrapolation of the admissions during the previous period with the KM curve. This estimate can be compared with the actual COVID-19 patient occupancy data for the same period. Figure 12 shows the calculated simulated balance curve (blue line). On the other hand, the red curve represents the real balance for the analyzed period. It can be seen that, although linear extrapolation is a rather poor model for estimating the number of new patients, the result is quite good, revealing that convolution with the KM curve gives a robust model that provides reasonable results even with naive estimates of the admission functions. To quantify the error of the prediction, we use the root mean square error where y i represents the real balance of occupancy and y i denote the simulated occupancy for each day i ∈ {1, . . . , n}. We obtain a value for the RMSE of 2.4245 patients. The simulated curve is higher than the real one because in the analyzed period the incidence in occupancy was much lower. The knowledge of treatments for the fight against the disease has led a decrease in patient stays, reducing them to 7.88 days. Even so, the deviation obtained by the RMSE is sufficiently small and does not significantly affect the occupancy of a hospital floor. Some researchers have endeavored to compute good estimates of this function using, for example, Gompertz-type curves, as done by (García-Basteiro et al. 2020), or SIR-type models as in (Cooper et al. 2020) or (Varotsos and Krapivin 2020). However, this is, in fact, the weak point of the models, since, as it has been shown during the whole crisis, it is really difficult to model the spread of the virus and, therefore, the number of new infections. Therefore, our model, which provides relatively good estimates with poor estimates of the admission function, could be useful at least for short-term forecasting.
On the basis of admissions to the ICU units of the Virgen de las Nieves hospital, from 16 August to 15 October 2020, we tested the goodness of fit of the model. In order to do it, we first estimated the discharges from this unit through convolution with the Kaplan-Meier curves, that were obtained in the first period using ICU data. Using this estimate, the balance of admissions and discharges or exitus of this unit can be calculated. Thus, a simulated bed occupancy curve for the ICU units of the hospital can be obtained. On the other hand, we have the real data, which correspond to the discharges or exitus in the same period. Figure 13 shows both balances. We can calculate the root mean square error (RMSE) between the actual values y i of ICU occupancy and the estimated values y i . We obtain a value for RMSE of 2.1533 patients. As with non-ICU bed occupancy, we obtain an overestimation of the model. This variation could be partly due to a substantial improvement in survival among SARS-COV2 patients admitted to critical care, (Dennis et al. 2021).

Conclusions
This paper presents a methodology to provide a short-term estimate of hospital bed occupancy. When contagion in a given region is communal, hospitals suffer high stress, generating serious occupancy problems. Predicting patient occupancy allows healthcare managers to update hospital activity in the short term in order to optimize available resources.
The Kaplan-Meier curve of the bed occupancy by COVID-19 patients, together with convolution-type mathematical operators, provide a prediction of the balance of available beds in the following weeks. Additionally, the hazard function estimates the probability that a patient will leave a bed free on day t. Based on the data provided by the Andalusian Health Service on admissions and discharges of patients with COVID-19 in the hospitals of Granada from March to August, Kaplan-Meier curves and virus survival risk curves are calculated. The sample consists of 1176 patients who were admitted between March 11 and August 15. Assuming that no pharmacological solutions and treatments have been found to significantly reduce the severity of the disease, it can be expected that the temporal distribution of bed occupancy by COVID-19 patients will not change significantly in the coming months. In order to test the goodness of fit of the model, we estimate the balances based on the discharges of new patients with COVID-19 between August 16 and October 15 at the Hospital Virgen de las Nieves in the city of Granada.
The comparison between the estimated occupancy balances and the actual occupancy balances gives satisfactory results. The experiment was repeated with the occupancy of patients in the intensive care units, obtaining also satisfactory comparative results. The comparative study between COVID-19 patients in the ICUs and the rest of the COVID-19 patients is necessary, since the descriptive statistics show different results. The most seriously ill patients with this disease require longer hospitalizations and more exhaustive care, and therefore this group and the management of the corresponding service should be studied independently. It should be noted that the estimation of ICUs occupancy is crucial to analyze the activity of a hospital and its elasticity to provide a suitable response in a crisis. In summary, we can affirm that the models presented here and based on survival curves can be useful tools for the management of occupancy in the different units of a hospital, as well as for the management of all human resources and general services.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data available on request due to restrictions eg privacy. The data presented in this study are available on request from the corresponding author.