Is It Possible to Predict the Length of Stay of Patients Undergoing Hip-Replacement Surgery?

The proximal fracture of the femur and hip is the most common reason for hospitalization in orthopedic departments. In Italy, 115,989 hip-replacement surgeries were performed in 2019, showing the economic relevance of studying this type of procedure. This study analyzed the data relating to patients who underwent hip-replacement surgery in the years 2010–2020 at the “San Giovanni di Dio e Ruggi d’Aragona” University Hospital of Salerno. The multiple linear regression (MLR) model and regression and classification algorithms were implemented in order to predict the total length of stay (LOS). Lastly, the impact of COVID-19 was evaluated using a statistical analysis. The results obtained from the regression analysis showed that the best model was MLR, with an R² value of 0.616, compared with XGBoost, Gradient-Boosted Tree, and Random Forest, with R² values of 0.552, 0.543, and 0.448, respectively. The t-test showed that the variables that most influenced the LOS, with the exception of pre-operative LOS, were gender, age, anemia, fracture/dislocation, and urinary disorders. Among the classification algorithms, the best result was obtained with Random Forest, with a sensitivity for the longest-LOS class of over 89%. In terms of the overall accuracy, Random Forest and Gradient-Boosted Tree achieved a value of 71.76% and an error of 28.24%, followed by Decision Tree, with an accuracy of 71.13% and an error of 28.87%, and, finally, Support Vector Machine, with an accuracy of 65.06% and an error of 34.94%. The chi-squared test and the Mann–Whitney test showed a significant difference in the cardiovascular disease, fracture/dislocation, and post-operative LOS variables in the comparison between 2019 (before COVID-19) and 2020 (in full pandemic emergency conditions).


Introduction
The proximal fracture of the femur and hip is the most common reason for hospitalization in orthopedic departments. Hip fractures put patients at risk of cardiovascular, pulmonary, thrombotic, infectious, and bleeding complications that can lead to death [1]. The only strategy to prevent immediate negative outcomes is to proceed in a timely manner with surgery. Despite the procedure, however, patients experience increased mortality, health complications, and reduced quality of life [2][3][4].
Although hip fractures account for less than 20% of all osteoporosis-associated fractures [5] (osteoporosis itself being considered second only to cardiovascular disease by the World Health Organization [6]), they are often used as an indicator of the health of the population and to evaluate the economic impact of this condition. In fact, they account for the majority of morbidity-related and mortality-related health expenditure in men and women over the age of 50 [7,8]. Specifically, 1.3 million fractures were reported globally in 1990, and this figure is estimated to reach 7–21 million by 2050, with an associated expenditure
Several studies use advanced data processing to support doctors in the prevention, diagnosis, and treatment of diseases [15][16][17][18][19][20][21] or in the management of hospital resources [22][23][24][25][26]. In the orthopedic field, many articles study the performance associated with the flow of patients who are admitted for fractures of the lower limbs. For example, Lefaivre et al. determined the effect of delayed surgery on discharge times, in-hospital death, the presence of major and minor medical complications, and the incidence of sores in hip fracture patients [27]. Bracy et al., on the other hand, showed how the institution of orthopedic-hospitalist comanagement (OHC) improves the efficiency of hip-fracture management, as measured by inpatient LOS and time to surgery [28]. Fisher et al. showed how early mobilization helps reduce the total LOS [29]. With the aim of reducing the total LOS, Fast Tracks were introduced: a combination of clinical and organizational factors optimized to reduce convalescence and perioperative morbidity, including functional recovery, with a consequent reduction in hospitalizations. Husted et al. highlighted the benefits of orthopedic Fast Track in Denmark [30].
Furthermore, in Italy, several studies have been conducted to investigate the epidemiology of the problem [31,32] and the choice of prostheses [33], and to improve the process. Scala et al. showed that a Lean Six Sigma approach reduced the total LOS by 39% for patients admitted with fractures of the femur [34]. Latessa et al. instead used the same methodology to implement Fast Track, obtaining a statistically significant reduction of 12.7% in the LOS [35]. Although there are national and international studies that use predictive algorithms to study the total LOS [36][37][38][39], there are no other studies in the literature that analyze hip fractures in a large number of patients while including multiple clinical variables and the impact of COVID-19. The hypothesis of this paper is that particular clinical conditions or patient demographics may have a significant impact on LOS, and that healthcare management needs to focus more on these factors to achieve benefits, including cost containment. In addition, the COVID-19 pandemic, with all the protocols put in place, may have further affected the process under consideration.

Materials and Methods
This study analyzed the data relating to patients who underwent hip-replacement surgery in the years 2010-2020 at the "San Giovanni di Dio e Ruggi d'Aragona" University Hospital of Salerno (Italy). Specifically, all patients who had hip surgery as their primary procedure were selected on the basis of the ICD-9 code of the procedure.
Our data, provided by the Hospital's Health Department, are completely anonymous, and no personal information is linked or linkable to a specific person. The output is the total LOS in days, obtained as the difference between the date of discharge and the date of admission. All clinical variables were obtained by analyzing the main and secondary diagnoses reported in the discharge form. Therefore, without a detailed characterization of the clinical picture of each patient, the variables simply indicate the presence (1 = Yes) or absence (0 = No) of conditions related to that comorbidity. The fracture/dislocation variable makes it possible to differentiate the proportion of elderly patients who underwent elective surgery from those who suffered a traumatic event. Figure 1 shows the distribution of all the variables in the dataset.
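The derivation of the output and of the binary clinical variables described above can be sketched as follows. This is only an illustrative sketch: the record layout, the helper names `total_los` and `comorbidity_flag`, and the dates and diagnosis labels are assumptions made for the example and do not come from the hospital dataset.

```python
from datetime import date


def total_los(admission: date, discharge: date) -> int:
    """Total length of stay in days: discharge date minus admission date."""
    if discharge < admission:
        raise ValueError("discharge precedes admission")
    return (discharge - admission).days


def comorbidity_flag(diagnoses, labels_of_interest) -> int:
    """Return 1 if any recorded diagnosis belongs to the comorbidity, else 0."""
    return int(any(d in labels_of_interest for d in diagnoses))


# Hypothetical discharge record (invented values, for illustration only).
record = {
    "admission": date(2019, 3, 1),
    "discharge": date(2019, 3, 12),
    "diagnoses": ["anemia", "hypertension"],
}

los_days = total_los(record["admission"], record["discharge"])
has_anemia = comorbidity_flag(record["diagnoses"], {"anemia"})
has_diabetes = comorbidity_flag(record["diagnoses"], {"diabetes"})
print(los_days, has_anemia, has_diabetes)
```

In this sketch, a comorbidity is flagged whenever it appears among the main or secondary diagnoses, which mirrors the presence/absence coding described in the text without capturing severity.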

Regression and Classification Models
The 15 variables defined above (i.e., gender, age, pre-operative LOS, diabetes, hypertension, obesity, anemia, vitamin D deficiency, tumor, fracture/dislocation, brain disorders, urinary disorders, cardiovascular disease, respiratory disease, and anticoagulant therapy) were used as inputs for the study of total LOS, i.e., the output. The first processing step involved the implementation of the MLR model. To this end, IBM SPSS Statistics Version 26.0 software (IBM Corp., Armonk, NY, USA) was used. This software was also used to verify all the preliminary hypotheses on residuals, autocorrelation, the presence of outliers, and multicollinearity. After this first processing step, further regression algorithms were used, i.e., Random Forest (RF), Gradient-Boosted Tree (GBT), XGBoost, and Linear Regression (LR). RF is a supervised-learning ensemble algorithm in which multiple decision trees are combined to improve performance. Although it can overfit, the resulting model is generally accurate and powerful. GBT is a non-parametric statistical learning algorithm used for both classification and regression problems. As with RF, the decision model produced is a set of simple forecasting models, typically decision trees, which are progressively added at each step to improve the result obtained by the previous weak learner. The Decision Tree (DT) is a tree-like decision model in which the target value is predicted by simple decision rules identified from the data. DTs are simple to understand and require little data preparation, but their disadvantages include overfitting and the creation of biased trees if some classes dominate. XGBoost is a gradient-boosting algorithm, built through the progressive addition of decision trees in order to improve the performance of the previous tree. In addition, models are fitted using any arbitrary differentiable loss function and a gradient descent optimization algorithm. This gives the technique its name, "gradient boosting", since the loss is minimized by following its gradient as the model is fit, in a similar manner to a neural network. LR is a model that assumes a linear relationship between output and input. Different techniques can be used to prepare or train the linear regression equation from data, the most common of which is called Ordinary Least Squares. Learning, in this case, means estimating the values of the coefficients, starting from the available data.
Next, the classification algorithms, i.e., Random Forest (RF), Decision Tree (DT), Gradient-Boosted Tree (GBT), and Support Vector Machine (SVM), were implemented. The SVM algorithm finds a hyperplane in an N-dimensional space (N being the number of features) that has the maximum margin, i.e., the maximum distance from the nearest data points of both classes. To this end, a loss function is used. SVM is effective in high-dimensional spaces, but it does not directly provide probability estimates. The other algorithms are defined above. This second part was developed with Knime Analytics Platform. For all algorithms, the dataset was split into a training set and a test set, at 80% and 20%, respectively.
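The idea of progressively adding weak learners, each fitted to what the previous ones left unexplained, can be illustrated with a minimal sketch. It is not the Knime or XGBoost implementation used in the study: it assumes a single input, one-split regression stumps as weak learners, a squared-error loss (whose negative gradient is simply the residual), and invented data; the function names, the number of rounds, and the learning rate are all hypothetical choices.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Boosting under squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # initial constant prediction
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient equals the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def mse(model, x, y):
    """Mean squared error of a model on the given data."""
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Invented data: pre-operative days (input) and total LOS in days (output).
pre_op_days = [0, 1, 2, 3, 4, 5, 6, 7]
total_los_days = [5, 6, 6, 8, 9, 11, 12, 14]

mean_los = sum(total_los_days) / len(total_los_days)
baseline_mse = mse(lambda xi: mean_los, pre_op_days, total_los_days)
model = gradient_boost(pre_op_days, total_los_days)
boosted_mse = mse(model, pre_op_days, total_los_days)
print(baseline_mse, boosted_mse)
```

The comparison with the constant baseline is made on the training data only, so it shows the mechanism of boosting rather than any out-of-sample performance.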

Statistical Analysis
To analyze the impact of COVID-19 on the sample under examination, two subgroups were extracted:
• Group 1: Patients discharged in 2019 and, therefore, before COVID-19.
• Group 2: Patients discharged in 2020, in the midst of the pandemic.
Statistical tests were implemented to identify any differences between the two groups. Before the statistical tests were selected, the Kolmogorov-Smirnov test was performed, which showed the non-normality of the two distributions. For this reason, the Mann-Whitney U (MW) test and the chi-squared test were used, at a 95% confidence level.
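The two test statistics used for the group comparison can be sketched as follows. This is a simplified illustration under stated assumptions: the sample values and the contingency table are invented, the chi-squared statistic is computed without continuity correction, and no p-values are derived (a statistical package would normally be used for that step).

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: pairs (a_i, b_j) with a_i > b_j; ties count 0.5."""
    u = 0.0
    for ai in a:
        for bj in b:
            if ai > bj:
                u += 1.0
            elif ai == bj:
                u += 0.5
    return u


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Invented post-operative LOS values (days) for the two groups.
los_2019 = [9, 10, 12, 8, 11]
los_2020 = [7, 8, 6, 9, 7]
u_2019 = mann_whitney_u(los_2019, los_2020)
u_2020 = mann_whitney_u(los_2020, los_2019)

# Invented 2x2 table: rows = year (2019, 2020), columns = comorbidity (no, yes).
table = [[10, 20], [20, 10]]
chi2 = chi_squared_statistic(table)
print(u_2019, u_2020, chi2)
```

For a 2x2 table (one degree of freedom), the statistic would be compared with the critical value of 3.841 at the 5% significance level.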

Results
Before the model was implemented, the hypotheses underlying the MLR model were verified. The Durbin-Watson test returned a value of 1.934. The statistic always ranges between 0 and 4, and a value close to 2.0 indicates the absence of autocorrelation; the observed value therefore suggested that no autocorrelation was present in the sample. Continuing with the analysis of the residuals, the plot of the "standardized expected value regression" on the x-axis against the "standardized residual regression", shown in Figure 2, displayed a random distribution around zero, which supported the hypothesis of homoscedasticity. The residuals therefore had a constant variance.
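For reference, the Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The following sketch computes it on invented residual sequences; the study itself obtained the value from SPSS, so the function name and the data here are purely illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Invented residual sequences illustrating the two extremes of the statistic.
alternating = [1.0, -1.0, 1.0, -1.0]  # strong negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]       # strong positive autocorrelation
dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)
print(dw_alternating, dw_constant)
```

Values near 0 point to positive autocorrelation and values near 4 to negative autocorrelation, which is why a result close to 2 is read as the absence of autocorrelation.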

Concluding the residual analysis, the quantile-quantile plot (Q-Q plot) presented in Figure 3 was used to evaluate the distribution of the residuals. If the two sets of quantiles come from populations with the same distribution, the points are expected to fall approximately along the reference line. The greater the departure from this reference line, the greater the evidence that the two data sets come from populations with different distributions.
Although the curve did not exactly retrace the ideal line, the slight variation did not affect the good performance of the model. Before implementing the model, the absence of multicollinearity was tested using the Pearson correlation and the tolerance and variance inflation factor (VIF), while the presence of outliers was determined through the calculation of Cook's distance. Table 1 shows the results of the Pearson correlation.
The results of the Pearson correlation showed that the LOS had the highest correlation with the pre-operative LOS, included by definition in LOS, while for the other variables, the correlation was always lower than 0.7.
For the tolerance and VIF, the former always assumed a value greater than 0.2, while the latter was always less than 10, suggesting the absence of multicollinearity. Lastly, Cook's distance was always less than 1.
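The relationship between the Pearson correlation, the tolerance, and the VIF can be sketched for the simplest case. The sketch assumes only two predictors, for which the R² of regressing one predictor on the other equals the squared Pearson correlation; with the 15 predictors of the study, each R² would come from a regression on all the other predictors. The data and function names are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Two-predictor case only: tolerance = 1 - r^2 and VIF = 1 / tolerance."""
    r_squared = pearson_r(x1, x2) ** 2
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Invented predictor values: age (years) and pre-operative LOS (days).
age = [60, 65, 70, 75, 80, 85]
pre_op_los = [2, 1, 3, 2, 4, 3]
tolerance, vif = tolerance_and_vif(age, pre_op_los)

# Thresholds quoted in the text: tolerance above 0.2 and VIF below 10.
no_multicollinearity = tolerance > 0.2 and vif < 10
print(tolerance, vif, no_multicollinearity)
```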
Having verified the hypotheses, the MLR model was implemented. Table 2 shows an R² value just above the 0.5 threshold, showing that the model was quite representative of the specific case study. Table 3 shows the details of the coefficients and the t-test applied to the variables at a 95% confidence level.
The results of the t-test highlighted that gender, age, pre-operative LOS, anemia, fracture/dislocation, and urinary disorders were significantly correlated with the total LOS. Standardized coefficients help to compare the effect of each individual independent variable on the dependent variable. In this case, with each comorbidity coded as 0 when absent, anemia had the greatest effect on the dependent variable, having the highest beta coefficient once the pre-operative LOS was excluded. In addition, according to the beta column, being female (gender coded as 1 = male, 2 = female) and advanced age (a continuous variable) significantly influenced the dependent variable of the model.
In addition to the MLR model, further regression algorithms were tested. Table 4 shows the results obtained in terms of R² and root mean squared error. Among the algorithms, XGBoost and LR had the best performance, with an R² value of 0.552, followed by GBT, with 0.543, and, finally, RF, with 0.448. However, even the best value of R², obtained with XGBoost/LR, did not improve on the performance of the MLR model. The results obtained with the best algorithms are shown in graphic form in Figures 4 and 5.
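The two metrics used to compare the regression algorithms can be written out explicitly. The sketch below uses the standard definitions of R² (one minus the ratio of the residual to the total sum of squares) and of the root mean squared error; the observed and predicted LOS values are invented and do not reproduce the figures in Table 4.

```python
from math import sqrt


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the output (days)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Invented observed and predicted total LOS values (days).
observed = [5, 7, 9, 12, 15]
predicted = [6, 7, 10, 11, 13]
print(r_squared(observed, predicted), rmse(observed, predicted))
```

A model that always predicts the mean LOS obtains an R² of 0, so the reported values can be read as the share of LOS variance explained beyond that baseline.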
After the regression models, four different classification algorithms were tested. For implementation, the LOS was divided into three categories, as indicated below:
1. LOS ≤ 6 days.
2. 6 days < LOS ≤ 12 days.
3. LOS > 12 days.
With an accuracy of 71.76% and an error of 28.24%, RF and GBT had the best performance, followed by DT, with an accuracy of 71.13% and an error of 28.87%, and, finally, SVM, with an accuracy of 65.06% and an error of 34.94%. For all the algorithms, optimal results were not achieved in all three categories. The results, however, showed a high ability to predict longer LOS, which weigh heavily on healthcare costs. The details of the classification for the best algorithm are shown in Table 6.
To analyze the global feature importance, a Global Surrogate Random Forest was used. Global Surrogate Random Forest is a Random Forest model trained to approximate the predictions of already implemented RF models. Random Forest is trained on standard pre-processed input data with optimized parameters "tree depth", "number of models," and "minimum child node size". The surrogate model was trained successfully. Specifically, focusing on class 3, that is, the class corresponding to the longest stay and the one of greatest relevance to health management, the model returned an accuracy of 0.942 and the global feature importance shown in Figure 6.
The variables that most affected the model for class 3, in accordance with the specific procedure analyzed and excluding the pre-operative LOS, were age, fracture/dislocation, and vitamin D deficiency. Gender, anemia, and urinary disorders, which in the MLR model were significantly related to total hospitalization, in this case had a non-significant impact and were grouped under the variable "other".
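The three LOS classes and the accuracy, error, and per-class sensitivity reported above can be sketched as follows. The thresholds are those listed in the text; the observed and predicted LOS values are invented, and the helper names are hypothetical rather than taken from the Knime workflow.

```python
def los_class(los_days):
    """Map a LOS in days to the three classes used in the study."""
    if los_days <= 6:
        return 1
    if los_days <= 12:
        return 2
    return 3


def accuracy(y_true, y_pred):
    """Share of cases whose predicted class matches the observed class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def sensitivity(y_true, y_pred, cls):
    """Share of the cases truly in class cls that were predicted as cls."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    actual = sum(1 for t in y_true if t == cls)
    return true_positives / actual


# Invented observed and predicted LOS values (days).
observed_los = [4, 6, 7, 12, 13, 20]
predicted_los = [5, 8, 9, 10, 15, 11]
y_true = [los_class(v) for v in observed_los]
y_pred = [los_class(v) for v in predicted_los]

acc = accuracy(y_true, y_pred)
err = 1.0 - acc  # the error is the complement of the accuracy
sens_class3 = sensitivity(y_true, y_pred, 3)
print(acc, err, sens_class3)
```

Reporting sensitivity per class, rather than overall accuracy alone, is what makes it possible to see that a model can be strong on the longest stays even when its overall accuracy is modest.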
Lastly, the impact of COVID-19 on the model parameters was analyzed. Specifically, the pre-COVID-19 (year 2019) and during-COVID-19 (year 2020) data were compared using statistical analysis. The results are reported in Table 7. The statistical tests highlighted a significant difference in cardiovascular disease, fracture/dislocation, and post-operative LOS.

Discussion
In this study, a set of variables was analyzed in order to be able to predict the LOS for hip-replacement surgery. The analysis was conducted at "San Giovanni di Dio e Ruggi d'Aragona" University Hospital of Salerno (Italy), analyzing the data recorded from 2010 to 2020.

Results of Regression and Classification Models
This work is an extension of a previous work, published by the same research group, in which MLR and ML algorithms were used to investigate the LOS only for the years 2019-2020 [14]. Using this previous article as a reference, the same tools were used in this study. The results obtained for the regression models showed that the best was MLR, with an R² value of 0.616, which was slightly lower than the previous result of 0.687. The model was therefore quite representative of the case study in which it was implemented. The statistical test instead showed that the variables that most influenced the model, with the exception of the pre-operative LOS, which, by definition, is included in the total LOS, were gender, age, anemia, fracture/dislocation, and urinary disorders. This result was in line with those previously reported in the literature. For example, Ricci et al. [40] and Latessa et al. [35] highlighted a different LOS according to gender, while Scala et al. [34] showed an influence of cardiovascular diseases. Husted et al. [41], on the other hand, showed that age, sex, comorbidity, and pre- and post-operative hemoglobin levels influence post-operative outcomes in general, including LOS and patient satisfaction, while Calgue et al. [42] showed that significant effects are also due to the type of fracture.
The classification models did not show significant results for the three categories envisaged by the work. With an accuracy of 71.76% and an error of 28.24%, RF and GBT had the best performance, which did not reach the accuracy of over 83% obtained by GBT in [14]. Although the model as a whole could not be validated, the confusion matrix showed the high capacity of the model in predicting cases with LOS greater than 14 days. This is strategically important for healthcare facilities, as these are the cases that have the greatest impact on resource consumption and healthcare costs.

COVID-19's Impact
The impact of the SARS-CoV-2 pandemic on the sample was analyzed. Comparing the same variables for the year 2019 (pre-COVID-19) and the year 2020 (during COVID-19), the statistical tests highlighted a significant difference in terms of cardiovascular disease, fracture/dislocation, and post-operative LOS. In particular, there was an increase in the number of patients undergoing surgery with cardiovascular comorbidities or a diagnosis of fracture/dislocation. This, unlike the results reported in the literature [35,41,42], did not cause an increase in post-operative LOS, which actually decreased. This phenomenon can be explained both by the protocols put in place to contain the pandemic and limit the time spent in hospital and by the reduced number of beds, which were mostly dedicated to COVID-19 patients.

Uniqueness of the Present Study, Clinical Implications, and Limitations
The strength of the work is that it considers a large amount of data and a large number of variables that help to further characterize the sample, also including the changes caused by the pandemic. The ability to understand which variables have the greatest impact on the LOS can help healthcare managers to allocate resources or implement specific pathways, such as fast tracks [30], for privileged access to treatment and the elimination of inefficiencies.
However, this work is not without limitations. In particular, the effect that multiple procedures have on LOS is not considered, and the results cannot be generalized, since this is a single-center study. In addition, variables that could be used to analyze the socioeconomic status of the patients were not included, and the data source, hospital discharge records, did not allow the precise characterization of the degree of severity of the comorbidities studied.

Conclusions
In this study, the data of 2515 patients undergoing hip-replacement surgery at "San Giovanni di Dio e Ruggi d'Aragona" University Hospital of Salerno (Italy) in the years 2010-2020 were processed using regression and classification models. Both analyses showed that the variables that most influenced the LOS were age and the presence of fracture/dislocation. These results, together with the good performance of the models, could be used by healthcare managers to create specific pathways, according to the age or the main diagnoses that lead to interventions. This can support not only bed management, through LOS prediction and turnover planning, but also the management of all other hospital resources. The analysis of the impact of COVID-19, therefore, could be an important pointer to capture the inadvertent positive effects of the pandemic from an organizational perspective, such as the establishment of specific protocols that led to the effective and efficient use of hospital facilities.
Future developments will include the implementation of additional data processing and classification techniques, focusing in more detail on patients' pathways and how they have changed due to the pandemic. Furthermore, additional variables will be included in the models, in addition to a more specific characterization of those already provided.

Data Availability Statement:
The datasets generated and/or analyzed during the current study are not publicly available for privacy reasons, but they are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.