# Forecasting Hospital Readmissions with Machine Learning


## Abstract


## 1. Introduction

## 2. The Data

## 3. Methodology

#### 3.1. Support Vector Machines

#### 3.1.1. Kernel Methods

#### 3.1.2. Over-Fitting

#### 3.1.3. Weights

#### 3.2. Random Forests

#### 3.3. Performance Metrics

#### 3.4. Empirical Results

#### 3.4.1. SVM Models

#### 3.4.2. Random Forest Models

#### 3.4.3. Comparative Results

## 4. Conclusions and Future Work

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest


**Figure 1.** Hyperplane selection and support vectors. The points with a pronounced black contour are the support vectors (SVs), which define the margins drawn as dashed lines. The solid line is the separating hyperplane.
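The geometry in Figure 1 can be illustrated with a minimal sketch, assuming a linear SVM in its conventional form: the separating hyperplane is w·x + b = 0, the margins are w·x + b = ±1, and the support vectors are the points that lie on a margin. The weight vector `w`, bias `b`, and the points below are hypothetical toy values, not quantities fitted to the study data.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the two margin lines w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def support_vectors(w, b, points, tol=1e-9):
    """Points lying on a margin, i.e. |w.x + b| equals 1 up to tol."""
    return [x for x in points if abs(abs(decision_value(w, b, x)) - 1.0) <= tol]

# Hypothetical toy values, chosen only for illustration.
w, b = (1.0, 1.0), -3.0
points = [(1.0, 1.0), (2.0, 2.0), (0.0, 0.0), (3.0, 3.0)]

print(support_vectors(w, b, points))  # the two points that sit on the margins
print(margin_width(w))                # 2 / ||w||
```

Only the points on the margins determine the hyperplane; moving the other two points further away would not change `w` or `b`.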

**Figure 2.** The non-separable two-class scenario in the input space (**left**) and the same two-dimensional data in a three-dimensional feature space after the projection (**right**). The two classes are shown in different colors: blue and red.
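The idea behind Figure 2 can be sketched with a toy example. The feature map below, (x1, x2) → (x1, x2, x1² + x2²), is an assumption chosen for illustration and is not necessarily the projection used in the figure. Under it, an inner cluster surrounded by an outer ring cannot be split by a straight line in two dimensions, yet a flat plane separates the two classes in three.

```python
def project(point):
    """Hypothetical feature map: (x1, x2) -> (x1, x2, x1^2 + x2^2)."""
    x1, x2 = point
    return (x1, x2, x1 * x1 + x2 * x2)

# Toy data: the inner points lie inside the convex hull of the outer points,
# so no straight line in the 2-D input space separates the two classes.
inner = [(0.0, 0.5), (0.5, 0.0), (-0.5, 0.0), (0.0, -0.5)]   # class 0
outer = [(0.0, 2.0), (2.0, 0.0), (-2.0, 0.0), (0.0, -2.0)]   # class 1

def classify(point, threshold=1.0):
    """Linear rule in the projected space: the plane z = threshold."""
    return 1 if project(point)[2] > threshold else 0

print([classify(p) for p in inner])  # all class 0
print([classify(p) for p in outer])  # all class 1
```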

**Figure 3.** Overview of a 3-fold cross-validation training scheme. Each fold is used once as the testing sample while the remaining folds are used to train the model, and the procedure is repeated for each combination of parameter values.
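The scheme in Figure 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: `k_fold_indices`, `grid_search_cv`, and the `fit_and_score` callback are hypothetical names, and the folds are taken in order without shuffling or stratification.

```python
from itertools import product

def k_fold_indices(n_samples, k=3):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def grid_search_cv(param_grid, fit_and_score, n_samples, k=3):
    """Return (best_params, best_mean_score) over every parameter combination.

    fit_and_score(params, train_idx, test_idx) is a caller-supplied function
    that trains on the training indices and returns a score on the test fold.
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train, test)
                  for train, test in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

The combination with the highest mean score across the folds is retained; in practice `fit_and_score` would train a classifier with the given parameters and return a metric such as recall or AUC.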

No | Independent Variables | Characterization of Each Variable |
---|---|---|
Panel A: General Information/Patient Data | | |
1 | Patient Age | Quantitative variable, Integer |
2 | Patient Gender | Qualitative variable, Categorical |
3 | Length of Stay | Quantitative variable, Integer |
4 | Patient Transfer | Qualitative variable, Binary |
5 | ICD-10 Diagnosis on Admission | Qualitative variable, Categorical |
6 | ICD-10 Diagnosis at Discharge | Qualitative variable, Categorical |
7 | Admission Clinic | Qualitative variable, Categorical |
8 | Discharge Clinic | Qualitative variable, Categorical |
9 | Clinic Change | Qualitative variable, Binary |
10 | Hospitalization Outcome | Qualitative variable, Categorical |
11 | Past Hospitalization | Qualitative variable, Binary |
Panel B: Operational Status of the Clinic | | |
12 | Clinic’s Occupancy Rate | Quantitative variable, Continuous |
13 | Clinic’s Number of Doctors | Quantitative variable, Integer |
14 | Clinic’s Number of Nurses | Quantitative variable, Integer |
Panel C: Laboratory Results | | |
15 | Blood Sugar (Glucose) | Quantitative variable, Continuous |
16 | Indication (Normal Range) Blood Sugar | Qualitative variable, Categorical |
17 | Potassium | Quantitative variable, Continuous |
18 | Indication (Normal Range) Potassium | Qualitative variable, Categorical |
19 | Sodium | Quantitative variable, Continuous |
20 | Indication (Normal Range) Sodium | Qualitative variable, Categorical |
21 | Blood Urea Nitrogen | Quantitative variable, Continuous |
22 | Indication (Normal Range) Blood Urea Nitrogen | Qualitative variable, Categorical |
23 | Blood Creatinine | Quantitative variable, Continuous |
24 | Indication (Normal Range) Blood Creatinine | Qualitative variable, Categorical |

**Table 2.** Classification results using a confusion matrix. True positives (TP): number of samples correctly classified as readmissions. True negatives (TN): number of samples correctly classified as non-readmissions. False positives (FP): number of samples incorrectly classified as readmissions. False negatives (FN): number of samples incorrectly classified as non-readmissions.

Actual (rows) / Predicted (columns) | 0 | 1 |
---|---|---|
0 | TN (True Negatives) | FP (False Positives) |
1 | FN (False Negatives) | TP (True Positives) |
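The four cells of Table 2 can be counted directly from the actual and predicted labels. The sketch below is illustrative only; it assumes labels coded as 1 for a readmission and 0 for a non-readmission, and the function name and toy labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TN, FP, FN and TP for binary labels (1 = readmission, 0 = not)."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1          # readmission correctly flagged
        elif a == 0 and p == 0:
            tn += 1          # non-readmission correctly left unflagged
        elif a == 0 and p == 1:
            fp += 1          # non-readmission wrongly flagged
        else:
            fn += 1          # readmission missed
    return {"TN": tn, "FP": fp, "FN": fn, "TP": tp}

# Hypothetical toy labels, for illustration only.
actual    = [0, 0, 1, 1, 1, 0]
predicted = [0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))
# {'TN': 2, 'FP': 1, 'FN': 1, 'TP': 2}
```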

Model | Parameter C | Parameter γ |
---|---|---|
SVM Linear Kernel | 0.06 | n/a |
SVM RBF Kernel | 194.38 | 0.0001 |
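To make the role of γ concrete, the sketch below evaluates the two kernels in their conventional textbook forms: the linear kernel K(x, y) = x·y and the RBF kernel K(x, y) = exp(−γ‖x − y‖²). That the models above use exactly these forms is an assumption. The parameter C does not appear in either kernel; it is the penalty on margin violations in the SVM objective. The input vectors are hypothetical.

```python
import math

def linear_kernel(x, y):
    """Inner product x.y."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.0001):
    """exp(-gamma * ||x - y||^2); the default gamma is the value reported above."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical feature vectors, for illustration only.
print(linear_kernel((1.0, 2.0), (3.0, 4.0)))  # 11.0
print(rbf_kernel((1.0, 2.0), (1.0, 2.0)))     # 1.0 for identical points
print(rbf_kernel((0.0, 0.0), (3.0, 4.0)))     # exp(-0.0025), just below 1
```

With a γ this small, the kernel value decays slowly with distance, so even fairly distant points remain similar under the kernel and the resulting decision boundary is smooth.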

Confusion Matrix (SVM, Linear Kernel)

Actual (rows) / Predicted (columns) | 0 | 1 |
---|---|---|
0 | TN 732 | FP 220 |
1 | FN 68 | TP 98 |

Confusion Matrix (SVM, RBF Kernel)

Actual (rows) / Predicted (columns) | 0 | 1 |
---|---|---|
0 | TN 730 | FP 222 |
1 | FN 67 | TP 99 |

Model | Recall | Accuracy | Precision | F1-Score | AUC |
---|---|---|---|---|---|
SVM Linear Kernel | 0.59 | 0.74 | 0.31 | 0.40 | 0.77 |
SVM RBF Kernel | 0.60 | 0.74 | 0.31 | 0.41 | 0.76 |
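The recall, accuracy, precision, and F1-score reported for the two SVM models follow from their confusion-matrix counts using the standard definitions. The sketch below recomputes them from those counts; the function name is hypothetical. AUC is omitted because it depends on the ranking of the classifier scores and cannot be derived from a single confusion matrix.

```python
def classification_metrics(tn, fp, fn, tp):
    """Standard metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)                      # share of readmissions detected
    precision = tp / (tp + fp)                   # share of flagged cases that are readmissions
    accuracy = (tp + tn) / (tn + fp + fn + tp)   # share of all cases classified correctly
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "accuracy": accuracy, "precision": precision, "f1": f1}

# Counts taken from the two SVM confusion matrices reported above.
linear = classification_metrics(tn=732, fp=220, fn=68, tp=98)
rbf = classification_metrics(tn=730, fp=222, fn=67, tp=99)

for name, metrics in (("linear", linear), ("rbf", rbf)):
    print(name, {key: f"{value:.2f}" for key, value in metrics.items()})
# linear {'recall': '0.59', 'accuracy': '0.74', 'precision': '0.31', 'f1': '0.40'}
# rbf {'recall': '0.60', 'accuracy': '0.74', 'precision': '0.31', 'f1': '0.41'}
```

The low precision alongside moderate recall reflects the class imbalance in the test sample: 166 readmissions against 952 non-readmissions.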

Model | Total Number of Decision Trees |
---|---|
Weighted Random Forest | 25 |
Balanced Random Forest | 730 |

Confusion Matrix (Weighted Random Forest)

Actual (rows) / Predicted (columns) | 0 | 1 |
---|---|---|
0 | TN 942 | FP 10 |
1 | FN 125 | TP 41 |

Confusion Matrix (Balanced Random Forest)

Actual (rows) / Predicted (columns) | 0 | 1 |
---|---|---|
0 | TN 704 | FP 248 |
1 | FN 50 | TP 116 |

Model | Recall | Specificity | Accuracy | Precision | F1-Score | AUC |
---|---|---|---|---|---|---|
Weighted Random Forest | 0.25 | 0.98 | 0.88 | 0.80 | 0.38 | 0.74 |
Balanced Random Forest | 0.70 | 0.74 | 0.73 | 0.32 | 0.44 | 0.78 |

**Table 11.** Feature importance ranking: the significance of each feature in the classification of the random forest model, in decreasing order.

Importance | Feature |
---|---|
0.141501 | ICD-10 Diagnosis at Discharge |
0.129996 | ICD-10 Diagnosis on Admission |
0.059492 | Clinic’s Occupancy Rate |
0.056195 | Hospitalization Outcome |
0.05464 | Blood Urea Nitrogen |
0.054075 | Patient Age |
0.052316 | Potassium |
0.051854 | Blood Sugar (Glucose) |
0.048263 | Length of Stay |
0.043971 | Blood Creatinine |
0.042001 | Sodium |
0.030861 | Discharge Clinic |
0.03023 | Clinic’s Number of Doctors |
0.029398 | Clinic’s Number of Nurses |
0.024448 | Indication (Normal Range) Blood Sugar |
0.020731 | Admission Clinic |
0.020721 | Patient Gender |
0.020681 | Past Hospitalization |
0.019997 | Indication (Normal Range) Blood Creatinine |
0.019523 | Indication (Normal Range) Potassium |
0.016082 | Indication (Normal Range) Blood Urea Nitrogen |
0.01222 | Patient Transfer |
0.01015 | Indication (Normal Range) Sodium |
0.00112 | Clinic Change |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Michailidis, P.; Dimitriadou, A.; Papadimitriou, T.; Gogas, P.
Forecasting Hospital Readmissions with Machine Learning. *Healthcare* **2022**, *10*, 981.
https://doi.org/10.3390/healthcare10060981
