Machine Learning for Evaluating Hospital Mobility: An Italian Case Study

Santamato, Vito; Tricase, Caterina; Faccilongo, Nicola; Iacoviello, Massimo; Pange, Jenny; Marengo, Agostino

doi:10.3390/app14146016

by Vito Santamato ¹, Caterina Tricase ², Nicola Faccilongo ², Massimo Iacoviello ³, Jenny Pange ⁴ and Agostino Marengo ⁵,*

¹ Department of Clinical and Experimental Medicine, University of Foggia, Viale Luigi Pinto, 71122 Foggia, Italy

² Department of Economics, University of Foggia, Via Romolo Caggese 1, 71121 Foggia, Italy

³ Department of Surgical and Medical Sciences, University of Foggia, Viale Luigi Pinto, 71122 Foggia, Italy

⁴ Laboratory of New Technologies and Distance Learning, Department of Early Childhood Education, School of Education, University of Ioannina, Panepistimioupoli, 45110 Ioannina, Greece

⁵ Department of Agricultural Sciences, Food, Natural Resources, and Engineering, University of Foggia, Via Napoli 25, 71121 Foggia, Italy

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(14), 6016; https://doi.org/10.3390/app14146016

Submission received: 23 April 2024 / Revised: 5 June 2024 / Accepted: 5 July 2024 / Published: 10 July 2024

Abstract

This study delves into hospital mobility within the Italian regions of Apulia and Emilia-Romagna, interpreting it as an indicator of perceived service quality. Utilizing logistic regression alongside other machine learning techniques, we analyze the impact of structural, operational, and clinical variables on patient perceptions of quality, thus influencing their healthcare choices. The analysis of mobility trends has uncovered significant regional differences, emphasizing how the regional context shapes perceived service quality. To further enhance the analysis, SHAP (SHapley Additive exPlanations) values have been integrated into the logistic regression model. These values quantify the specific contributions of each variable to the perceived quality of service, significantly improving the interpretability and fairness of evaluations. A methodological innovation of this study is the use of these SHAP impact scores as weights in the data envelopment analysis (DEA), facilitating a comparative efficiency analysis of healthcare facilities that is both weighted and normative. The combination of logistic regression and SHAP-weighted DEA provides a deeper understanding of perceived quality dynamics and offers essential insights for optimizing the distribution of healthcare resources. This approach underscores the importance of data-driven strategies to develop more equitable, efficient, and patient-centered healthcare systems. This research significantly contributes to the understanding of perceived quality dynamics within the healthcare context and promotes further investigations to enhance service accessibility and quality, leveraging machine learning as a tool to increase the efficiency of healthcare services across diverse regional settings. These findings are pivotal for policymakers and healthcare system managers aiming to reduce regional disparities and promote a more responsive and personalized healthcare service.

Keywords: hospital mobility; machine learning; healthcare quality assessment

1. Introduction

Italy’s National Health Service is steadfast in providing high-quality medical care to all citizens. However, the regionalized health structure introduces performance disparities across regions, prompting experts to scrutinize and benchmark regional health systems to discern models of excellence, areas needing improvement, and best practices [1].

Our analysis zeroes in on the hospital systems of Apulia and Emilia-Romagna, chosen for their distinct characteristics. Emilia-Romagna, as per the GIMBE Report 2023, is a frontrunner in delivering Essential Levels of Assistance (LEA), showcasing its prowess in providing essential services. This region’s success is partly attributed to the Guarantee System, ensuring quality, appropriateness, and uniformity in health service delivery, a domain where Emilia-Romagna shines (GIMBE Foundation, http://www.gimbe.org/ (accessed on 20 March 2024)).

Conversely, Apulia, with its unique history, demography, and geography, demonstrates remarkable adaptability and innovation in healthcare, tackling specific challenges head-on. The comparative study between Emilia-Romagna and Apulia aims not to rank their services but to reveal how differing contexts, resources, and strategies influence healthcare service efficiency and effectiveness. Emilia-Romagna leverages its extensive experience to offer proven solutions, whereas Apulia introduces innovative approaches to address unique challenges, potentially applicable in other settings.

A pivotal aspect of our research is evaluating hospital efficiency through the lens of resident patient quality perception, considering Apulia and Emilia-Romagna as a unified territory for health mobility analysis. This methodology enables a nuanced assessment of healthcare service efficiency and quality, factoring in patient experiences and preferences in healthcare facility selection.

Patient perceptions of hospital service quality are vital for elevating health standards. An analysis of Facebook reviews using machine learning techniques has established a significant correlation between hospital accreditation and online emotional expressions, emphasizing the importance of valuing patient feedback [2]. Additionally, the deployment of a specific conceptual framework has equipped hospital administrators with a robust tool for scrutinizing and enhancing service quality across various hospital settings [3]. These strategies highlight the critical need to integrate patient perspectives in healthcare service optimization, ensuring state-of-the-art, patient-centric care. The analysis of the territorial distribution of healthcare facilities and accessibility offers an interesting perspective on the importance of integrating the healthcare dimension into territorial planning to address inequalities in service access [4]. Furthermore, the study on healthcare inequalities and disparities in disadvantaged areas gathers evidence and field experiences, providing recommendations to tackle inequities and improve access to care [5], thus enriching our understanding of regional healthcare dynamics in Italy with examples and lessons from international contexts.

Hospital efficiency is a cornerstone for ensuring quality healthcare in Italy. The Hub & Spoke network model, transient efficiency strategies, prudent management of hospital and intensive care beds, and the adoption of a prospective payment system are instrumental in promoting hospital efficiency, contributing to a more agile and sustainable funding model [6,7,8,9]. These initiatives underscore Italy’s commitment to a comprehensive health system renewal, aiming to bolster efficiency and better meet the needs of all citizens.

The examination of perceived quality underscores the significance of considering patient hospitalization propensity as an insightful indicator. This propensity, reflecting on hospitalization outcomes and patient mobility, directly impacts the perceived quality of hospital services. Patient mobility, indicative of the quest for quality care, influences perceptions of hospital service quality through patient transfers across health facilities, both within and beyond regional borders.

Furthermore, the endeavor to enhance hospital service quality incorporates complex nuances, including patient mobility, proposing a holistic framework for the continuous improvement of hospital service quality [10]. This comprehensive approach provides a clearer, more in-depth view of perceived quality within the hospital systems of Apulia and Emilia-Romagna.

In an era marked by global interconnectedness and unparalleled knowledge exchange, understanding the variances and commonalities between different regional hospital systems is imperative. This understanding is crucial not only for national improvement but also for offering valuable insights to other countries, especially in navigating health crises like the COVID-19 pandemic.

This comparative analysis between Apulia and Emilia-Romagna transcends mere performance evaluation, offering a holistic perspective on the dynamics, processes, and strategies that can elevate care quality and citizen health. Through critical analysis and mutual learning, the aim is to forge a health system that is equitable, resilient, and efficient, catering to the needs of all citizens.

The text is structured as follows: it begins by defining the problem and outlining the goals of the research, then moves on to describe the methodological background and the practical example to which this methodology was applied. This is followed by a chapter dedicated to the details of the investigation carried out, highlighting the most significant findings, which are subsequently critically analyzed in another chapter. This latter section also includes a reflection on the originality, potential, and constraints of the suggested approach. The conclusions provide insights into the implications of using the methodology for decision support at various levels and propose directions for future research.

2. Background

Within the varied landscape of the Italian healthcare system, regional differences outline a unique context for analyzing the dynamics of access to care. Hospital networks, with their territorial specificities, offer a lens through which to explore how healthcare facilities meet the needs of a diverse population. In this scenario, our study focuses on investigating patient mobility, a crucial aspect that reflects individual choices in accessing hospital care. Recent research has confirmed that the endowment of hospital beds is a determining factor in patient mobility, especially in decentralized healthcare contexts, suggesting a direct area of interest for our work [11].

By adopting a predictive model based on logistic regression, we aim to uncover the factors that guide patients in their choice of hospital, paying particular attention to organizational, outcome, and structural variables. This advanced methodological approach, situated at the intersection of data science and healthcare research, allows for an in-depth analysis of expected mobility towards hospitals, offering valuable insights to optimize the distribution of healthcare resources and improve accessibility and quality of care.

Recent related research confirms the significant impact of mobility limitations on health outcomes among the elderly. One study highlights that severe mobility restrictions are associated with an increase in the use of healthcare services, with marked disparities observed across different regional hospital networks [12]. Further analyses indicate that mobility limitations are also correlated with cognitive deteriorations and structural brain changes, suggesting that reduced mobility may serve as an early indicator of cognitive decline [13]. In Italy, healthcare mobility for surgical interventions is strongly influenced by travel times, highlighting the importance of geographic location and the capacity of hospital facilities [14]. Similarly, in the United Kingdom, hospital choice for breast cancer surgery varies based on age, health conditions, and patient location, with a preference for specialized hospitals beyond immediate clinical needs [15]. Patients tend to select hospitals that offer robotic surgery and other advanced technologies, showing that choices are more influenced by the perception of overall quality and technological innovation than by the specific outcomes of cancer treatment [16]. The analysis of interregional pediatric healthcare mobility has highlighted deficiencies in pediatric services in some Italian regions, indicating the need for a strengthening of healthcare offerings [17].

The interplay between energy management and hospital performance points to how energy efficiency represents a critical factor for the overall efficiency of a hospital. Incorporating machine-learning-based energy monitoring methods can significantly contribute to understanding and improving hospital energy demands, directly impacting operational efficiency [18]. Our work is set within a research context that extends previous investigations examining the interaction between energy management and hospital performance, highlighting the importance of improved management practices [19]. Our study broadens the research horizon by exploring patient mobility as a key indicator of hospital efficiency and the perceived quality of healthcare services [20].

This study offers an innovative perspective on the mobility choices of patients in Italy, highlighting how a detailed analysis of these dynamics can significantly contribute to the strategic planning and management of healthcare services. Through data analysis and the application of machine learning techniques, we aim to outline strategies for a more efficient, effective, and patient-centered care system, promoting a balance between the quality of care and economic sustainability in the national healthcare context.

2.1. Application Context

Incorporating data from 2021, our study provides a sharp and scientifically rigorous overview of healthcare facilities in the regions of Apulia and Emilia-Romagna, following the guidelines of the Ministry of Health. In Apulia, the healthcare system is organized around six Local Health Authorities (ASL), each responsible for delivering a comprehensive range of health services to the local population. Similarly, Emilia-Romagna’s healthcare services are structured around eight Local Health Units (USL), which play a pivotal role in ensuring access to health services across the region. These entities, ASLs in Apulia and USLs in Emilia-Romagna, are crucial for implementing national health policies at the local level, adapting them to meet the specific needs of their communities. Through detailed analysis, significant differences in the distribution and efficiency of healthcare facilities emerge, reflecting the diversity and complexity of the Italian healthcare system.

First-level hospitals, providing essential services such as emergency care, diagnostics, regular hospitalization, and outpatient services, represent the community’s first contact with healthcare. Their widespread presence across the national territory ensures access to primary care for the local community. Second-level hospitals, characterized by greater specialization compared to first-level hospitals, offer more complex services such as specialized surgery, intensive care, and hemodynamics services. These centers, distributed in numerous regions of Italy, serve as reference points for the provision of advanced care. Basic hospitals, primarily focused on primary care functions, offer basic care, outpatient services, and primary level diagnostics. Their role is crucial in connecting primary care with more specialized hospital facilities, ensuring continuity in healthcare.

Institutes of Scientific Research and Care (IRCCS), dedicated to scientific research and highly specialized healthcare, play a fundamental role in the development of new therapies and medical research, offering highly specialized care. Accredited Private Healthcare Facilities, which have obtained accreditation from the National Health Service (SSN), collaborate with the public healthcare system by providing care and rehabilitation services, operating in compliance with the quality and safety standards set by the SSN.

2.2. Overview of Hospital Infrastructure in Apulia and Emilia-Romagna

To make an effective comparison between the two regions in terms of healthcare services, a hypothetical macroregion was conceived, focusing exclusively on the mobility of residents, to offer a clear and precise picture of the health situation and patient mobility exclusively between Emilia-Romagna and Apulia. The analysis, based on 2021 data, aims to evaluate the efficiency, accessibility, and quality of the present healthcare facilities, as well as to understand the dynamics of choice and preference of patients in relation to the healthcare services offered by the two regions. Table 1 details the distribution of these facilities in the two regions, highlighting a marked predominance of private hospitals in Emilia-Romagna, with 27 base hospitals and 15 second-level hospitals, while Apulia shows a balanced distribution, with a significant presence of 5 public second-level hospitals.

Table 2 presents the results of the χ² test, conducted to assess the differences in the distribution of hospitals and hospital complexes in the two regions and overall.

The very low p-value (<0.001) indicates a significant association between the region and the distribution of hospitals by type and sector, suggesting notable differences in the distribution of hospitals between the two regions, with a different distribution by type (first level, second level, basic level) and sector (public and private) in each region.

In Emilia-Romagna, the predominance of accredited private base-level hospitals highlights a financial accessibility issue to basic services, with the cost of such services potentially being unsustainable for all segments of the population. Moreover, although the region shows considerable commitment to research and specialized care through the presence of IRCCS (Institutes of Scientific Research and Care) and second-level hospitals, the geographical distribution and capacity of these advanced centers may not be optimal. The current configuration can cause disparities in access to specialized care, negatively affecting the equity of the regional healthcare system.

The hospital network configuration in Apulia presents significant challenges due to its asymmetric composition. The absence of base-level hospitals in the private sector leads to a marked reliance on public facilities for essential healthcare services, exacerbating the risk of overload in these institutions and potentially degrading the efficacy of the healthcare system. This distribution pattern, focused on second-level hospitals, risks inadequately meeting the basic needs of the local population, especially in less urban areas, where access to specialized services may be limited.

In Emilia-Romagna, the private hospital network, accredited to the regional health system, is strongly oriented towards basic services, with 27 base-level hospitals out of a total of 46. The presence of 4 IRCCS (one in the private sector and three in the public sector) and 12 second-level hospitals in the public sector highlights a substantial commitment towards research and specialized care.

In Apulia, the private hospital network, also accredited, is mainly composed of first-level structures (4 out of 30), with no presence of base-level hospitals. This distribution suggests a focus on specialized care in the private sector. However, in the public sector, the presence of 5 second-level structures underlines a parallel commitment to provide a broad spectrum of health services.

Considering the macroregion as a whole, the difference in the distribution of hospital levels becomes evident. While Emilia-Romagna focuses on basic and specialized services, Apulia shows a greater emphasis on second-level structures in the public sector, compensating for the absence of base-level hospitals in the private sector. This interregional balance could reflect strategic complementarity, with each region covering different aspects of the population’s health needs.

The configuration of the hospital network in Apulia, with a strong presence of second-level structures in the public sector, highlights a commitment to ensuring specialized care, even in the absence of base-level hospitals in the private sector. This may indicate a strategy of focusing resources on specialized and advanced care. However, it is essential to ensure that access to basic care is not compromised, and that there is an adequate geographical distribution of facilities to ensure accessibility for all residents.

3. Materials and Methods

In our study, we used data from 2021. Our selection focused on a carefully curated set of key variables presented in aggregated form. This set includes the total number of available beds and departments, healthcare personnel (nurses, doctors, and other professionals), as well as crucial data on mortality rates, readmissions, and surgical procedures. We also introduced categorical variables to distinguish between hospital networks (public vs. private) and to classify the level of healthcare facilities. These variables serve as fundamental features in our predictive model, which aims to forecast the active kilometer mobility of patients.

Our analytical journey was structured into four key phases to ensure the derivation of accurate and insightful results, depicted in Figure 1:

Data preprocessing: This initial phase was dedicated to improving data quality and uniformity. We carefully addressed missing values, eliminated outliers, and standardized variables, preparing the ground for a consistent and homogenized dataset. The significance of this step is echoed in a study that emphasizes the critical role of data preprocessing in healthcare analytics, highlighting how such practices can significantly enhance the accuracy of predictive models [21].
Cluster analysis: In the subsequent phase, we used the k-means algorithm for a nuanced categorization of the target variable, setting the stage for more detailed insights into patient mobility patterns. This approach is supported by research that demonstrates the utility of k-means clustering in healthcare for identifying patterns and improving patient care management, further validating the choice of this method for our analysis [22].
Predictive modeling: In the third phase, we developed a predictive model based on the logistic regression algorithm. The model targets the mobility variable categorized by the cluster analysis and uses the initially selected health variables as features. The importance of logistic regression in healthcare predictive modeling is highlighted in a review that discusses the role of predictive modeling in healthcare research, underscoring how effectively researchers can make decisions based on predictive modeling [23]. Through the application of the SHAP algorithm, we identified the impact scores of the healthcare features on the predictive model and used these scores as lambda weights in the subsequent data envelopment analysis.
Data envelopment analysis: In the final phase, we applied output-oriented data envelopment analysis (DEA) under the assumption of variable returns to scale (VRS) to evaluate the efficiency of the 127 healthcare facilities in the Apulia–Emilia-Romagna macroregion. The model employs 8 inputs, namely the healthcare features weighted according to their impact on the predictive model (and therefore on patient choice), and a single output, healthcare mobility, with the objective of maximizing this output while keeping the inputs constant. This methodology allows the relative performance of hospitals to be compared against an “efficiency frontier”. DEA is highly relevant in the healthcare context for quantifying how effectively resources are transformed into healthcare outputs, providing a comprehensive assessment of the productivity and efficiency of the healthcare facilities.
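For orientation, the output-oriented VRS model named in the final phase is commonly written as the linear program below, solved once for each evaluated facility o. This is the textbook formulation, not necessarily the exact model configured in the software used for this study. The symbols are assumptions of this sketch: x_{ij} is input i of facility j (m = 8 inputs), y_j is the single output (mobility) of facility j, n = 127 is the number of facilities, and λ_j are the intensity variables that build the reference point on the frontier. How the SHAP impact scores enter is not spelled out at this point in the text; one possibility, stated here purely as an assumption, is that each input is rescaled as x_{ij} = w_i \tilde{x}_{ij} before solving.

```latex
\begin{aligned}
\max_{\varphi,\,\lambda} \quad & \varphi \\
\text{s.t.} \quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, \qquad i = 1, \dots, m \\
& \sum_{j=1}^{n} \lambda_j y_{j} \ge \varphi \, y_{o} \\
& \sum_{j=1}^{n} \lambda_j = 1, \qquad \lambda_j \ge 0, \quad j = 1, \dots, n
\end{aligned}
```

Under this formulation a facility with \varphi^{*} = 1 lies on the efficiency frontier, while \varphi^{*} > 1 indicates that its output could in principle be expanded by that factor without increasing any input. The convexity constraint \sum_j \lambda_j = 1 is what distinguishes variable from constant returns to scale.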

The entire workflow is encapsulated in Figure 1, offering a visual map of our methodological approach.

Our analyses were powered by Orange Data Mining v3.36, running on an advanced Apple M1 Pro system with 16 GB of RAM and 1 TB of storage, under macOS Sonoma 14.2.1. This high-level setup, combined with sophisticated machine learning techniques, ensured the efficiency and reproducibility of our analyses. The workflow utilized with Orange software version 3.36.2 is depicted in Figure 2. The crucial role of such machine learning methodologies in extracting meaningful insights and predictive models from complex datasets has been previously underscored and validated in foundational studies, such as those focusing on machine learning for predicting neurodevelopmental disorders in children [24].

This study not only advances our understanding of healthcare dynamics but also showcases the transformative power of machine learning in navigating and interpreting complex data landscapes.

Each step of the Orange workflow will be discussed in the following paragraphs. The resulting dataset will subsequently be uploaded into the PIM-DEA software version 3 for the application of data envelopment analysis.

3.1. Data Preprocessing

In the initial phase of preprocessing in the machine learning context (red box in Figure 2), we executed the following procedures:

We built the aggregated variables nurses and physicians with the sum function in the “Formula” widget, adding together the values recorded separately for females and males.
We selected the 10 features under study with the “Select Columns” widget: 8 numeric features (number of beds, departments, doctors, nurses, staff, deaths, readmissions, and interventions) and the 2 categorical features that distinguish the hospital network and the level of the facility.
We standardized the chosen variables using the “Continuize” widget, setting the mean to 0 and the standard deviation to 1. This step underscores the importance of standardization in data processing, as highlighted by a study that developed a standardization algorithm for categorical laboratory tests, demonstrating how such practices can facilitate the handling of clinical big data and minimize manual standardization efforts [25].

Additionally, it was noted that the dataset has no missing values and includes a total of 127 instances (hospitals) in the hypothetical macroregion of Apulia–Emilia-Romagna.
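The standardization step described above was performed in Orange. As a rough illustration of what rescaling to mean 0 and standard deviation 1 does, the following minimal Python sketch applies a z-score to a single feature. The function name and the bed counts are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant the widget applies.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a list of numbers to mean 0 and standard deviation 1 (z-score)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical bed counts for five hospitals (illustrative values only).
beds = [120, 340, 85, 410, 230]
z_beds = standardize(beds)
```

After this transformation every feature is expressed in units of its own spread, so features measured on very different scales (beds versus readmissions, for example) contribute comparably to a regularized model.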

3.2. Cluster Analysis

The workflow used for cluster analysis in Orange software is illustrated in the orange box of Figure 2. To calculate intraregional active mobility in kilometers, we initially determined the interpolated distance between the capital city of the ASL/USL where the patient resides and the city where the service-providing hospital is located (Dist km_Hospi). We then aggregated the total active hospitalizations by ASL/USL (Hospi_ASL/USL) and by territorial area (Hospi_Area) for each hospital. The intraregional active mobility in kilometers was calculated using the following formula:

\text{Active mobility} = \left( \text{Hospi}_{\text{ASL/USL}} + \text{Hospi}_{\text{Area}} \right) \times \text{Dist km}_{\text{Hospi}} \qquad (1)

This calculation considers patient movement to a different territory from their residence as motivated by a perceived higher quality of service.
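As a small worked example of Equation (1), the sketch below computes the active kilometric mobility for one hospital. The function name and the input figures are hypothetical and serve only to show the arithmetic.

```python
def active_mobility(hospi_asl_usl, hospi_area, dist_km_hospi):
    """Equation (1): total active hospitalizations multiplied by distance in km."""
    return (hospi_asl_usl + hospi_area) * dist_km_hospi


# Hypothetical hospital: 150 hospitalizations aggregated by ASL/USL,
# 50 aggregated by territorial area, at an interpolated distance of 42.5 km.
example = active_mobility(150, 50, 42.5)
```

With these illustrative inputs the result is (150 + 50) × 42.5 = 8500 hospitalization-kilometers, which makes clear that the measure grows with both the number of incoming patients and the distance they cover.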

To transform active kilometric mobility, originally a continuous variable, into a categorical variable reflecting a mobility gradient (low, medium, high) based on the distance patients travel to the hospital, we employed the K-means clustering algorithm. This approach allowed us to investigate the impact of various variables more accurately on hospital choice, enhancing our understanding of the perceived quality of care.

The K-means algorithm is crucial in data analysis for its ability to divide a dataset into K distinct clusters, minimizing the sum of squared distances between the data points and the cluster centroids. This process is effectively synthesized by the following formula:

J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^{2} \qquad (2)

where k is the number of clusters, C_i is the set of points in cluster i, x is a point in C_i, and μ_i is the centroid of cluster i. The objective J expresses how the algorithm minimizes the within-cluster variance in order to group the data according to their intrinsic characteristics. Its applicability ranges from exploratory data analysis to market segmentation, highlighting its importance in various fields such as biology, marketing, and network optimization [26]. The simplicity and efficiency of K-means in processing large datasets make it well suited to identifying hidden patterns and guiding data-based decisions. Its ability to adapt to complex optimization problems and improve network coverage highlights its versatility, making it a valuable tool for optimizing resources and strategies [27].
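To make the objective in Equation (2) concrete, here is a minimal sketch of Lloyd's algorithm, the usual K-means iteration, applied to a one-dimensional variable such as kilometric mobility. It is not the Orange implementation: the function names, the deterministic initialization, and the mobility values are all assumptions chosen for illustration.

```python
def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    # Assumption: deterministic start, spreading centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


def objective(values, centroids, labels):
    """Within-cluster sum of squared distances, i.e. J in Equation (2)."""
    return sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))


# Hypothetical mobility values (km) for eight hospitals.
mobility = [12_000, 15_000, 18_000, 110_000, 125_000, 130_000, 390_000, 410_000]
centroids, labels = kmeans_1d(mobility, k=3)
J = objective(mobility, centroids, labels)
```

On these illustrative values the three groups correspond to low, medium, and high mobility. Production implementations typically use randomized initialization with several restarts, which this sketch omits for reproducibility.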

The implementation of the K-means clustering algorithm in Orange software, using the K-means widget and the Silhouette widget, aimed to categorize active kilometric mobility into three distinct levels: low, medium, and high.

This methodological choice was supported by a preliminary analysis, which revealed a silhouette coefficient of 0.724, indicating good separation and internal cohesion among the identified clusters. The average kilometric mobility values for the three clusters are illustrated in the box plots in Figure 3. We outlined three main groups: cluster C1, with 81 hospitals and an average mobility distance of 15,191.7 km, indicative of “low” mobility; cluster C2, comprising 17 hospitals with an average distance of 400,618 km, associated with “high” mobility; and cluster C3, aggregating 29 hospitals with an average distance of 122,101 km, corresponding to “medium” mobility. The term “average mobility” refers to the average kilometers traveled in 2021 by patients residing in the hypothetical macroregion (Apulia–Emilia-Romagna) to the chosen facility.
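The silhouette coefficient reported above compares, for each point, its mean distance to the other members of its own cluster (a) with its mean distance to the nearest other cluster (b), giving s = (b − a) / max(a, b). The following sketch computes the mean silhouette for a one-dimensional variable. The function name, the use of absolute difference as the distance, and the toy data are assumptions; it does not reproduce the 0.724 value from the study.

```python
def silhouette_score_1d(values, labels):
    """Mean silhouette coefficient, using absolute difference as the distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        # a: mean distance to the other points of the same cluster.
        own = [abs(v - w) for j, (w, l) in enumerate(zip(values, labels))
               if l == lab and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(abs(v - w) for w, l in zip(values, labels) if l == other)
            / labels.count(other)
            for other in clusters if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight, well-separated groups.
toy_values = [1, 2, 10, 11]
toy_labels = [0, 0, 1, 1]
score = silhouette_score_1d(toy_values, toy_labels)
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.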

This segmentation provides a solid empirical foundation for delving into healthcare access dynamics, highlighting the importance of the distance patients travel in choosing a hospital, and offers key insights for optimizing healthcare services based on population mobility needs.

Table 3 provides fundamental descriptive statistics, including mean, median, minimum, and maximum values, for the eight key numeric features of the dataset: beds, departments, hospital staff, deaths, interventions, readmissions, nurses, and physicians. These quantitative data are crucial for assessing the variation and distribution of hospital resources and healthcare performance across the facilities examined in the study. The metrics offer an immediate view of central trends and extremes in the data, facilitating comparative analyses and efficiency evaluations.

Table 4 details the distribution of the “mobility level” variable, which categorizes patient mobility into HIGH, LOW, and MEDIUM. It is evident that the majority of the observations fall into the “LOW” category (63.8%), followed by “MEDIUM” (22.8%) and “HIGH” (13.4%). These percentages provide insights into general mobility trends within the dataset and are crucial for understanding patient flow dynamics.

3.3. Prediction Model

The predictive model will employ the logistic regression algorithm to estimate the mobility gradient of hospitals in 2021, categorizing them into three classes: LOW, MEDIUM, and HIGH mobility, based on the distance patients travel to reach the hospital. This choice is supported by studies demonstrating the effectiveness of logistic regression in predicting mobility behaviors and health risks, such as the analysis of travel behavior during the COVID-19 pandemic [28] and the assessment of cardiovascular risk [29], highlighting its applicability in complex healthcare contexts.

To optimize the model’s performance, we configured the parameters with Ridge (L2) regularization, C=1, and without differentially weighting the classes, reflecting a standardized approach to maximize computational efficiency and predictive performance. Logistic regression, known for its ability to handle multinomial classifications and quick training times, is ideal for analyzing large datasets in the healthcare sector, providing a solid foundation for interpreting the influence of various features on hospital mobility.

In the logistic regression process, feature values are combined into a weighted sum and transformed through the logistic function, producing probability values that determine the mobility classification of each hospital. This method not only provides clear predictions but also insights into the features that most significantly influence hospital mobility, supporting informed decisions to improve the accessibility and efficiency of healthcare services. This choice is based on the demonstrated effectiveness of such configurations in analyzing complex data, as highlighted in recent studies that have explored the application of logistic regression in healthcare contexts.

To address the challenge of training and validation in a context of limited data, we adopted an integrated approach, dividing the dataset of 127 instances (hospitals) into a training set (70%, equivalent to 89 hospitals) and a validation set (30%, corresponding to 38 hospitals) through a data sampling process. This was accomplished using the Data Sampler widget, as illustrated in the blue box in Figure 2. Combined with cross-validation, which makes complete and iterative use of the dataset, this strategy allowed us to mitigate the risk of overfitting and support the model’s generalizability.
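The scoring step described above, in which feature values are combined into a weighted sum and passed through the logistic function, can be sketched for the three-class case as follows. This is a minimal illustration, assuming a multinomial (softmax) formulation; the coefficients, intercepts, and feature values are hypothetical placeholders, not the fitted model.

```python
import math

CLASSES = ["LOW", "MEDIUM", "HIGH"]


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(features, weights, intercepts):
    """One weighted sum per class, followed by the softmax transform."""
    scores = [
        b + sum(w * x for w, x in zip(w_row, features))
        for w_row, b in zip(weights, intercepts)
    ]
    return softmax(scores)


# Hypothetical coefficients for two standardized features (one row per class).
weights = [[-1.2, -0.4], [0.1, 0.2], [1.1, 0.2]]
intercepts = [1.0, 0.0, -1.0]

# Hypothetical standardized feature values for a single hospital.
probs = predict_proba([-0.6, -0.5], weights, intercepts)
predicted = CLASSES[probs.index(max(probs))]
print({c: round(p, 3) for c, p in zip(CLASSES, probs)}, predicted)
```

The predicted class is simply the one with the largest probability, which is how a probabilistic output becomes a LOW, MEDIUM, or HIGH label.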

We paid particular attention to data quality, committing to rigorous data preparation and cleaning to provide the model with accurate and significant inputs. This approach maximized the effectiveness of the available information, strengthening the robustness and reliability of the model in a scenario characterized by a limited number of observations. The model training was designed to be replicable, using a method of stratified cross-validation sampling with 10 folds.
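The 70/30 division and the stratified 10-fold scheme can be sketched in plain Python as below. The study performed these steps with widgets in its workflow, so this is only an illustrative stand-in: the per-class counts (81, 29, 17) are inferred from the Table 4 percentages applied to 127 hospitals, stratifying the 70/30 split is an assumption, and the function names and seed are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, valid_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, valid = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_fraction)
        train.extend(idxs[:cut])
        valid.extend(idxs[cut:])
    return sorted(train), sorted(valid)


def stratified_folds(labels, indices, k, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        for idx in idxs:
            folds[position % k].append(idx)
            position += 1
    return folds


# Class counts inferred from Table 4 (assumption): 81 LOW, 29 MEDIUM, 17 HIGH.
labels = ["LOW"] * 81 + ["MEDIUM"] * 29 + ["HIGH"] * 17
train_idx, valid_idx = stratified_split(labels, 0.70)
folds = stratified_folds(labels, train_idx, k=10)

print(len(train_idx), len(valid_idx))
print(Counter(labels[i] for i in train_idx))
print([len(fold) for fold in folds])
```

Sampling within each class keeps the LOW/MEDIUM/HIGH proportions similar in every subset, which matters when the smallest class has fewer than twenty hospitals.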

3.4. Data Measurements

In our study, we developed a predictive model to classify hospitals in 2021 into three mobility categories: LOW, MEDIUM, and HIGH, based on the distance patients traveled. Evaluating the model’s effectiveness is crucial for understanding its potential to support clinical decisions and patient management strategies.

The model’s performance is evaluated using several crucial metrics to ensure accurate and reliable predictions:

AUC–ROC (area under the curve–receiver operating characteristics): This metric assesses the discriminative ability of the model across the outcome classes, either individually or in an aggregated manner through macro or micro averaging. A value of AUC–ROC equal to 1 indicates a perfect classification, while a value of 0.5 indicates a random classification.

\mathrm{AUC}_{macro} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{AUC}_{i} \qquad (3)

where n is the total number of classes and AUC_i is the AUC calculated for class i. For a specific class i in a multiclass context, where class i is considered positive and all other classes are combined as negative, AUC_i is calculated through the following integral:

\mathrm{AUC}_{i} = \int_{0}^{1} \mathrm{TPR}_{i} [\mathrm{FPR}_{i}^{-1}(u)] \, du \qquad (4)

Here, TPR_i and FPR_i are, respectively, the true positive rate and false positive rate for class i, and u is the false positive rate level reached as the decision threshold varies. This integral covers all possible decision thresholds, providing a comprehensive measure of predictive accuracy for class i.
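Equations (3) and (4) can be illustrated with a short one-vs-rest computation: for each class, the ROC curve is traced by sweeping the threshold over the predicted probabilities, its area is approximated with the trapezoidal rule, and the per-class areas are averaged. The probabilities and labels below are hypothetical toy data, and the helper names are illustrative rather than part of the study's workflow.

```python
def roc_points(scores, is_positive):
    """(FPR, TPR) points obtained by lowering the decision threshold."""
    pos = sum(is_positive)
    neg = len(is_positive) - pos
    order = sorted(zip(scores, is_positive), key=lambda pair: -pair[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(order):
        threshold = order[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(order) and order[i][0] == threshold:
            tp += order[i][1]
            fp += 1 - order[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def macro_auc(prob_rows, labels, classes):
    """Per-class one-vs-rest AUC and their unweighted mean."""
    per_class = {}
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        is_pos = [1 if y == cls else 0 for y in labels]
        per_class[cls] = auc_trapezoid(roc_points(scores, is_pos))
    return sum(per_class.values()) / len(classes), per_class


classes = ["LOW", "MEDIUM", "HIGH"]
labels = ["LOW", "LOW", "MEDIUM", "HIGH", "MEDIUM", "HIGH"]
prob_rows = [  # hypothetical predicted probabilities, one row per hospital
    [0.80, 0.15, 0.05],
    [0.60, 0.30, 0.10],
    [0.30, 0.50, 0.20],
    [0.10, 0.20, 0.70],
    [0.50, 0.40, 0.10],
    [0.20, 0.50, 0.30],
]
macro, per_class = macro_auc(prob_rows, labels, classes)
print(per_class, macro)
```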

Accuracy: Accuracy measures the percentage of correctly classified cases relative to the total number of cases. It is a general metric that provides an overview of the overall effectiveness of the model.

\mathrm{Accuracy} = \frac{\sum_{i=1}^{n} \mathrm{TruePositives}_{i} + \sum_{i=1}^{n} \mathrm{TrueNegatives}_{i}}{\sum_{i=1}^{n} \mathrm{TotalPopulation}_{i}} \qquad (5)

Precision: Precision measures the proportion of true positive cases correctly identified relative to all cases identified as positive by the model. It quantifies the accuracy of positive predictions made by the model.

\mathrm{Precision}_{i} = \frac{\mathrm{TruePositives}_{i}}{\mathrm{TruePositives}_{i} + \sum_{j=1, j \neq i}^{n} \mathrm{FalsePositives}_{ji}} \qquad (6)

where FalsePositives_{ji} is the number of times that class j was incorrectly predicted as class i.

Recall (sensitivity): Recall measures the proportion of true positive cases correctly identified relative to all actually positive cases. It quantifies the model’s ability to capture all positive cases, minimizing false negatives.

\mathrm{Recall}_{i} = \frac{\mathrm{TruePositives}_{i}}{\mathrm{TruePositives}_{i} + \sum_{j=1, j \neq i}^{n} \mathrm{FalseNegatives}_{ij}} \qquad (7)

where FalseNegatives_{ij} is the number of times that an instance of class i was incorrectly predicted as class j.

F1 score: The F1 score is a harmonic mean of precision and recall. This metric is useful when a balance between precision and recall is desired, and a single measure is sought to evaluate the model’s performance.

\mathrm{F1Score}_{i} = 2 \times \frac{\mathrm{Precision}_{i} \times \mathrm{Recall}_{i}}{\mathrm{Precision}_{i} + \mathrm{Recall}_{i}} \qquad (8)

The overall F1 score can be calculated by averaging the F1 scores for each class.

Matthews correlation coefficient (MCC): MCC is a measure of the overall quality of the model’s predictions, considering both positive and negative cases. It ranges from −1 to 1, where 1 indicates perfect prediction, 0 indicates random prediction, and −1 indicates completely wrong prediction.

\mathrm{MCC} = \frac{C \times S - \sum_{k} P_{k} \times T_{k}}{\sqrt{(S^{2} - \sum_{k} P_{k}^{2}) \times (S^{2} - \sum_{k} T_{k}^{2})}} \qquad (9)

where

C is the sum of all elements on the diagonal of the confusion matrix (correct predictions).
S is the total sum of all elements in the confusion matrix.
P_k is the sum of all elements in column k of the confusion matrix (predictions for class k).
T_k is the sum of all elements in row k of the confusion matrix (actual values for class k).

These metrics are calculated using the confusion matrix (Figure 4), which is structured with columns representing the predicted classes and rows representing the actual classes. “True Positives” for a class i are those elements where both the predicted and the actual class are i. Similarly, “True Negatives” for class i are all the correct predictions that do not belong to class i, “False Positives” are those instances where a class other than i is incorrectly predicted as class i, and “False Negatives” are those instances where class i is incorrectly predicted as not being class i. The confusion matrix shows the classification results with colors indicating accuracy: purple represents correct predictions on the main diagonal, white indicates 0.0% (no instances), and pink highlights other percentages of incorrect predictions.
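As a worked illustration of how these quantities follow from a confusion matrix with rows as actual classes and columns as predicted classes, the sketch below computes accuracy, per-class precision, recall and F1, the macro-averaged F1, and the multiclass MCC of Equation (9). The matrix is hypothetical and is not the one shown in Figure 4; accuracy is taken here as the share of diagonal entries, which is the usual multiclass reading.

```python
import math


def classification_metrics(cm):
    """cm[i][j] = number of cases with actual class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)                              # S
    correct = sum(cm[i][i] for i in range(n))                        # C
    row_sums = [sum(cm[i]) for i in range(n)]                        # T_k (actual)
    col_sums = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # P_k (predicted)

    precision = [cm[i][i] / col_sums[i] if col_sums[i] else 0.0 for i in range(n)]
    recall = [cm[i][i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    f1 = [2 * p * r / (p + r) if p + r else 0.0 for p, r in zip(precision, recall)]

    numerator = correct * total - sum(p * t for p, t in zip(col_sums, row_sums))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in col_sums))
        * (total ** 2 - sum(t * t for t in row_sums))
    )
    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "macro_f1": sum(f1) / n,
        "mcc": numerator / denominator if denominator else 0.0,
    }


# Hypothetical 3-class confusion matrix (order: LOW, MEDIUM, HIGH).
example = [
    [3, 1, 0],
    [1, 2, 1],
    [0, 0, 2],
]
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(name, value)
```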

3.5. Data Envelopment Analysis

Data envelopment analysis (DEA) is a boundary-based optimization model that allows for the assessment of operational units by comparing their performance. It is particularly useful in the healthcare sector for measuring the effectiveness with which hospitals use available resources to maximize outcomes in terms of services provided. DEA allows for the examination of efficiency variations over time and the assessment of the impact of incentivization policies on hospital performance, highlighting that such reforms can significantly influence operational efficiency [30].

In our study on the analysis of healthcare mobility in the Apulia–Emilia-Romagna macroregion, we applied an output-oriented data envelopment analysis (DEA) to evaluate the efficiency of the 127 hospital facilities in the region. This analysis aims to investigate mobility at the hospital level and within the public or private network. In this analysis, we considered kilometric mobility as the output and the eight healthcare characteristics identified in our predictive model as inputs. To ensure a weighted assessment of healthcare characteristics, we used the impact scores from our model as weights during output calculation. The main goal of the analysis was to maximize the total output of hospital facilities while considering the relative efficiency of each facility in influencing patient mobility in the region. This approach allowed us to identify facilities that operate efficiently in terms of patient mobility and understand the dynamics that determine their effectiveness in facilitating patient movement within the region. The relative efficiency of each hospital facility was calculated using the following formula:

\mathrm{ER}_{i} = \max \frac{\sum_{i=1}^{N} λ_{i} \cdot y_{i}}{\sum_{i=1}^{N} λ_{i} \cdot x_{i}} \qquad (10)

subject to

\sum_{i=1}^{n} λ_{i} \cdot x_{i} \leq x_{j}^{*}, \quad j = 1, 2, \dots, m; \qquad λ_{i} \geq 0, \quad i = 1, 2, \dots, n \qquad (11)

where y_i represents the output (kilometric mobility) of facility i for characteristic j; λ_i represents the weights (the impact scores) assigned to healthcare features, normalized to the range between 0 and 1; N represents the total number of hospital facilities; x_{ij} is the value of input j for decision entity i; and x_j^* is the maximum value of input j among all decision entities. The results of the DEA analysis will range between 0 and 1, indicating the degree of relative efficiency of each hospital facility.
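The weighted output-to-input ratio in Equation (10) can be illustrated with the following simplified sketch. It is not a full DEA optimization (which would typically solve a linear program per facility) and is not the authors' implementation: it assumes that each input is scaled by its maximum x_j^*, that the scaled inputs are aggregated with fixed weights, and that the resulting ratios are rescaled by the best observed ratio so that scores fall in (0, 1]. The three facilities, two inputs, and weights are hypothetical.

```python
def weighted_efficiency(inputs, outputs, weights):
    """Output divided by weighted, max-scaled input, rescaled to (0, 1]."""
    n_inputs = len(weights)
    # x_j^*: largest observed value of each input across all facilities.
    col_max = [max(row[j] for row in inputs) for j in range(n_inputs)]
    ratios = []
    for row, y in zip(inputs, outputs):
        weighted_input = sum(
            w * x / m for w, x, m in zip(weights, row, col_max)
        )
        ratios.append(y / weighted_input)
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical data: rows are facilities, columns are two inputs.
inputs = [[100, 50], [200, 80], [150, 40]]
outputs = [30, 40, 45]      # hypothetical kilometric mobility
weights = [0.6, 0.4]        # hypothetical normalized impact scores

scores = weighted_efficiency(inputs, outputs, weights)
print([round(s, 3) for s in scores])
```

A facility scores 1 when it achieves the best output per unit of weighted input in the sample; the others are expressed relative to it.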

4. Experimental Results and Discussions

The model achieved high performance, with an area under the curve (AUC) of 0.969, an accuracy (CA) of 87.6%, an F1 score of 0.869, a precision (PREC) of 87.1%, a recall of 87.6%, and a Matthews correlation coefficient (MCC) of 0.758. For a more in-depth analysis of the model’s performance, the confusion matrix and the receiver operating characteristic (ROC) curves were also examined. The confusion matrix (Figure 4) showed a significant match between the model’s predictions and the actual classifications, with 90.2% of low mobility cases correctly identified (low–low), 83.3% of high mobility cases accurately classified (high–high), and 75.0% of medium mobility cases correctly recognized (medium–medium).

Classification errors were found in cases of medium mobility, with 12.5% mistakenly classified as low mobility and 12.5% as high mobility; 9.8% of low mobility cases were classified as medium, and 16.7% of high mobility as medium. The ROC curves for the three classes (low, medium, and high mobility) provided an effective visual representation of the model’s ability to discriminate between these categories, demonstrating excellent separation with high AUC values, a sign of the model’s strong ability to correctly classify hospitals based on their active mobility (Figure 5, Figure 6 and Figure 7). The confusion matrix and the ROC curve representations are produced using the Confusion Matrix widget and the ROC Analysis widget, respectively, as shown in the dark red box in Figure 2.

The ROC curves for the low, medium, and high mobility classes provide a graphical representation of the model’s ability to discriminate between these different categories. The prominently marked performance curve in black that we see in the graphs is the visual representation of the true positive rate (sensitivity) as a function of the false positive rate (1–specificity) across various decision thresholds. Sensitivity, depicted on the vertical axis, shows the extent to which the model correctly identifies actual cases of each class, while the horizontal axis indicates the percentage of cases erroneously classified, known as false positives.

The ROC curve stands out in the graph as a trajectory exploring the trade-off between sensitivity and specificity: a curve that arches towards the top-left corner signals high-quality discrimination by the model. In each of the graphs, the area under the curve, represented by the gray zone, reveals the overall classification capability of the model; an area approaching unity signals excellent predictive capacity. The red lines indicate the chosen optimal cut-off points, balancing the capture of true positives against the increase in false positives. High AUC values in these curves demonstrate the model’s strong ability to accurately classify hospitals based on the degree of active patient mobility, as highlighted in the graphs for the three mobility classes.

4.1. Feature Contributions to the Hospital Mobility Model

The analysis of feature importance, as depicted in Figure 8 and quantitatively detailed in Table 5, reveals a complex interaction between structural, operational, and systemic factors that influence the model’s classification performance. SHAP values (SHapley Additive exPlanations) quantify the absolute average impact of each feature on the magnitude of the model’s output, which predicts the distance patients travel to reach a hospital. Each feature’s MEAN and STD scores provide further quantification of their respective influences. For example, a higher standard deviation indicates greater variability in a feature’s influence across different model iterations, pointing to potential instabilities in its impact. The bar chart in Figure 8 illustrates the reduction in model accuracy, measured by the area under the curve (AUC), when each feature is individually removed. The graphical representation was facilitated by the Features Importance widget, displayed in the green box in Figure 2. The length of the bars indicates the extent to which the omission of a specific feature, such as DEATHS or PHYSICIANS, reduces the AUC, thereby underscoring its importance in the predictive model. A longer bar represents a greater decrease in AUC, highlighting the significant role of the feature in determining patient hospital mobility.

The evaluation of the importance of various medical and structural features through SHAP values (SHapley Additive exPlanations) illustrates how each factor influences the distance patients are willing to travel to access a hospital. The variable “Deaths” (DEATHS), with an average impact value of 0.128 and a standard deviation of 0.012, emerges as the most significant; it indicates that high mortality rates tend to dissuade patients from choosing a hospital, reflecting concerns about the quality of care provided. The λ_i values in the table represent the normalized coefficients ranging from 0 to 1, whose total sum is 1, of the mean impact scores of the features. They indicate the proportion of contribution of each feature to the model output, taking into account the relative importance of each feature compared to others in influencing the model’s predictions. The λ_i values are calculated using the following formula:

λ_{i} = \frac{\mathrm{MEAN}_{i}}{\sum_{j=1}^{n} \mathrm{MEAN}_{j}} \qquad (12)

where λ_i is the normalized value of feature i; MEAN_i is the mean impact score of feature i; and j is an index running over the features, from 1 to n, the total number of features considered.

Similarly, the presence of an adequate number of doctors (PHYSICIANS), which has an average impact of 0.042 and a standard deviation of 0.014, is perceived positively and encourages patients to opt for specific healthcare facilities, associating a higher number of doctors with better quality of care. Likewise, the use of interventions (INTERVENTIONS) records an average impact of 0.012 and a notable standard deviation of 0.08, highlighting how the volume of surgical interventions can variably influence perceptions of the hospital’s capacity and specialization.

Readmissions (READMISSIONS), with an average impact of 0.010 and a standard deviation of 0.007, can be seen as an indicator of deficiencies in the quality of care, pushing patients to consider more reliable alternatives. Moreover, the number of available beds (BEDS), with an average value of 0.023 and a standard deviation of 0.008, suggests that a greater capacity to accommodate can make a hospital more attractive, especially in areas with limited healthcare infrastructure.

Hospital staff (HOSPITAL STAFF), with an average impact of 0.011 and a standard deviation of 0.009, demonstrates that a large and qualified workforce is essential to attract patients, viewed as a sign of service efficiency and quality. The number of specialized departments (DEPARTMENTS), with an average impact of 0.016 and a standard deviation of 0.008, highlights how the availability of specialized treatments can be crucial for patients needing specific care not available locally.

Finally, the influence of nurses (NURSES), with an average impact of 0.037 and a standard deviation of 0.011, is particularly significant. This underscores the critical role nurses play in patient management and in shaping their perceptions of care quality, with a direct impact on patients’ decisions to travel longer distances to receive assistance in well-equipped hospitals.
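The normalization described in this section, which yields coefficients between 0 and 1 that sum to 1, amounts to dividing each mean impact score by the total of all mean impact scores. The sketch below applies this to the eight mean values quoted in the text; the resulting figures are illustrative and may differ from the tabulated λ_i values through rounding.

```python
# Mean impact scores as quoted in the text for the eight features.
mean_impact = {
    "DEATHS": 0.128,
    "PHYSICIANS": 0.042,
    "NURSES": 0.037,
    "BEDS": 0.023,
    "DEPARTMENTS": 0.016,
    "INTERVENTIONS": 0.012,
    "HOSPITAL STAFF": 0.011,
    "READMISSIONS": 0.010,
}

total = sum(mean_impact.values())
lambdas = {name: value / total for name, value in mean_impact.items()}

# Print the features from the largest to the smallest share.
for name, lam in sorted(lambdas.items(), key=lambda item: -item[1]):
    print(f"{name:15s} {lam:.3f}")
```

With these inputs, DEATHS carries by far the largest share of the total, consistent with its ranking as the most influential feature.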

4.2. Specific Feature Contributions by Target Class in the Hospital Mobility Model

We adopted SHAP (SHapley Additive exPlanations) for the fair interpretation of variables in our logistic regression model, crucial for informed decisions in healthcare. This technique, which assigns an impact value to each feature, has been applied in studies demonstrating its value in the analysis of complex models [31].

Figure 9 provides a visual representation of how different features influence the classification of patients into LOW, MEDIUM, and HIGH mobility categories through SHAP analysis. The gray boxes highlight the base probability values and the predictions modified by SHAP values. The probability scale is indicated along the sidebar, with SHAP values shown in white and numerical feature values in black. Red ribbons indicate an increase in the probability of belonging to a particular mobility class, while blue ribbons indicate a decrease.

The graphical representation was facilitated by the Explain Prediction widget, displayed in the green box in Figure 2. For the LOW target class, which identifies patients with low mobility, meaning those less inclined to travel long distances for hospital care, SHAP value analysis reveals which features weigh most heavily in their decision to choose a hospital. A lower availability of beds (BEDS), indicated by a SHAP value of 0.046 (VALUE of −0.8), correlates with an increased probability that patients will turn to closer facilities, suggesting a direct link between resource availability and the decision not to travel for medical care. The number of departments (DEPARTMENTS) with a SHAP value of 0.042 (VALUE of −0.68) and hospital staff (HOSPITAL STAFF) with a SHAP value of 0.031 (VALUE of −0.54) demonstrate how diversified healthcare offerings and a broad hospital group can be reassuring elements for patients, who may then choose closer hospitals rather than exploring farther options. Mortality rate (DEATHS) emerged as the most decisive factor, with a SHAP value of 0.145 (VALUE of −0.63), indicating that higher mortality rates significantly dissuade patients from choosing certain facilities. This highlights how the perception of quality of care is a critical factor affecting healthcare mobility. Finally, nursing care, measured by the number of nurses (NURSES), with a SHAP value of 0.056 (VALUE of −0.57), confirms that good nursing care is valued by patients, who tend to prefer nearby facilities perceived as better equipped to provide immediate and quality care. The model starts with a base probability of 65%, and considering the set of analyzed features, it predicts a probability of 95% that patients will opt for low mobility.
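SHAP values are additive: for a given instance, the base value plus the sum of all feature contributions equals the model output. The short check below applies this to the LOW-class figures quoted above. Only five of the eight contributions are itemized in the text, so the gap between the reconstructed and the reported probability is treated here as the combined effect of the features not quoted plus rounding, which is an assumption.

```python
base_probability = 0.65       # base probability for the LOW class (from the text)
reported_prediction = 0.95    # predicted probability for the LOW class (from the text)

# SHAP contributions for the LOW class as quoted in the text.
shap_low = {
    "DEATHS": 0.145,
    "NURSES": 0.056,
    "BEDS": 0.046,
    "DEPARTMENTS": 0.042,
    "HOSPITAL STAFF": 0.031,
}

itemized = sum(shap_low.values())
reconstructed = base_probability + itemized
# Whatever is left over is attributed to unquoted features and rounding.
residual = reported_prediction - reconstructed

print(f"itemized contributions: {itemized:.3f}")
print(f"base + itemized:        {reconstructed:.3f}")
print(f"residual:               {residual:.3f}")
```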

In the MEDIUM target class, the model highlights how certain hospital characteristics negatively influence patient mobility, pushing the base probability of 0.22 towards a lower predictive value of 0.05. The negative SHAP value for departments (DEPARTMENTS) of −0.032 (VALUE of −0.68) suggests that a wide range of specializations within the hospital reduces the need for patients to travel average distances, as their needs can be met locally. Similarly, a high perception of hospital staff quality (HOSPITAL STAFF), indicated by a SHAP value of −0.023 (VALUE of −0.54), and adequate nursing staffing (NURSES), with a SHAP value of −0.059 (VALUE of −0.57), reinforce the trend of patients preferring nearby hospitals. These characteristics, indicating comprehensive and quality hospital care, thus seem to discourage patients from seeking care at farther facilities. The strongest signal encouraging patients to stay nearby is given by the mortality rate (DEATHS), with a significant SHAP value of −0.15 (VALUE of −0.63). A lower mortality rate within a hospital is a clear indicator of care quality and safety, which appears to be a decisive factor in limiting patient mobility.

For the HIGH target class, which considers patients with expected high hospital mobility, it is observed that certain features, when modified, have an effect that reduces the prediction probability from 0.13 to 0. In other words, a decrease in hospital staff (HOSPITAL STAFF) with a SHAP value of 0.01 (VALUE of −0.54), a reduction in the number of departments (DEPARTMENTS) with a SHAP value of 0.01 (VALUE of −0.68), and a decrease in the number of physicians (PHYSICIANS) with a SHAP value of 0.03 (VALUE of −0.34), are associated with a lower probability of patients being classified as high mobility. Similarly, an increase in interventions (INTERVENTIONS) with a SHAP value of 0.03 (VALUE of 0.02) and an increase in readmissions (READMISSIONS) with a SHAP value of 0.01 (VALUE of 0.19) also contribute to this decrease. These changes reflect how the model predicts that modifications to these specific characteristics reduce the probability that a patient will travel significant distances to receive care, potentially preferring closer options.

4.3. Evaluating Hospital Efficiency in Healthcare Mobility

Table 6 presents the general descriptive statistics for the DEA efficiency scores across all hospital facilities involved in the analysis. These results indicate that the DEA efficiency scores do not follow a normal distribution, as confirmed by the significant Shapiro–Wilk test result (p-value < 0.001). This non-normality justifies the use of the nonparametric Kruskal–Wallis test to analyze efficiency differences among groups of hospitals categorized according to various criteria.

Table 7 categorizes hospitals based on their operational level and highlights the efficiency with which each level utilizes healthcare characteristics (inputs) to maximize patient mobility (output). The efficiency scores are calculated using DEA, aiming to optimize the output relative to the inputs used.

In the DEA efficiency score analysis, base-level hospitals demonstrate high efficiency, with an average score of 0.897 and a median of 0.906, indicating effective use of healthcare characteristics to facilitate patient mobility. In contrast, first-level hospitals have the lowest average score of 0.837, suggesting challenges in resource optimization, possibly due to the complexity of cases handled. IRCCS and private nursing homes excel with average scores of 0.922 and 0.973, respectively, highlighting their superior ability to transform healthcare inputs into high patient mobility, likely due to advanced resources and effective management. On the other hand, second-level hospitals show more variability and lower scores, with an average of 0.784, reflecting potential inefficiencies or greater complexities in treatments.

Table 8 segregates the DEA efficiency scores based on the hospital management network—private versus public—providing insights into how these networks leverage healthcare characteristics to maximize patient mobility.

Private network hospitals showcase significantly higher efficiency, with an average score of 0.936 and a median of 0.977. This superior performance is indicative of their ability to effectively convert healthcare inputs into desirable outputs. The high median and relatively low standard deviation (0.0872) suggest that private hospitals maintain consistent efficiency across the board, possibly due to more flexible management practices and targeted investments in healthcare technologies. Conversely, public network hospitals display a broader range of efficiency scores, evidenced by a lower average of 0.830 and a median of 0.875, along with a higher standard deviation of 0.1453. The minimum score dipping to 0.387 points to significant inefficiencies within some public hospitals. This variability could be attributed to disparate operational challenges, including bureaucratic constraints and varying levels of resource allocation, which may affect their capacity to optimize healthcare inputs effectively.

To further delve into the efficiency analysis of hospital facilities in the Apulia and Emilia-Romagna macroregion, we employed the nonparametric Kruskal–Wallis ANOVA test. This method was chosen to compare DEA efficiency scores across different groups of hospitals, particularly suitable given the non-normal distribution of the scores. The Kruskal–Wallis test allows us to determine if there are statistically significant differences between the medians of the groups, a critical condition for guiding healthcare policy decisions and managerial interventions.
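For readers unfamiliar with the test, the Kruskal–Wallis statistic can be sketched as follows: all scores are pooled and ranked, and H measures how far the rank sums of the groups depart from what would be expected if the groups shared the same distribution. The version below uses average ranks for ties but omits the tie-correction factor, and the efficiency scores are hypothetical; it is not the software used in the study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic without tie correction."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical efficiency scores for three groups of hospitals.
groups = [
    [0.97, 0.99, 0.93, 0.98],
    [0.88, 0.91, 0.84, 0.90],
    [0.78, 0.70, 0.83, 0.75],
]
print(round(kruskal_wallis_h(groups), 3))
```

Under the null hypothesis, H is compared against a χ² distribution with one fewer degree of freedom than the number of groups, which is why the results are reported as χ² values.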

The first set of analyses, presented in Table 9, examines the differences in DEA efficiency scores between private and public hospital networks.

A χ² value of 21.0 with a p-value less than 0.001 indicates that the differences in efficiency between hospitals in private and public networks are statistically significant. This result supports the hypothesis that the type of management significantly influences the efficiency with which hospitals can utilize resources to promote patient mobility.

The second set of analyses, detailed in Table 10, assesses the differences in efficiency among hospitals categorized by operational level.

A χ² of 36.4 with a p-value less than 0.001 suggests that there are significant differences in efficiency scores across various hospital levels. This highlights how different levels of hospitals, from base to second level, including IRCCS and private nursing homes, manage their healthcare resources differently.

Finally, Table 11 presents pairwise comparisons among hospital levels, using the Dwass–Steel–Critchlow–Fligner test to examine specific differences between the groups.

Table 11 reveals significant differences in DEA efficiency scores among hospital levels, particularly highlighting the effectiveness of private care facilities. Dwass–Steel–Critchlow–Fligner tests confirm that private care facilities significantly outperform both base-level hospitals, with a W value of 5.57, and first-level ones, with a W value of 5.16, both with p-values below 0.05. These results indicate that private care facilities manage resources exceptionally effectively, optimizing patient mobility far more than observed in traditional hospital levels. Furthermore, base-level hospitals demonstrate greater efficiency compared to second-level ones, with a W value of −4.33 and a p-value of 0.019, suggesting that even base-level hospitals manage resources more effectively than hospitals facing greater operational complexity. The most notable discrepancy is observed between private care facilities and second-level hospitals, where a W value of −7.35 highlights the overwhelming superiority of the former.

These analyses demonstrate the critical importance of efficient management and optimal operational practices in determining hospital efficiency, with private care facilities emerging as examples of excellence in utilizing healthcare resources to maximize patient mobility output. Such results provide valuable insights for decision-makers in the healthcare sector, who must consider restructuring hospital practices to enhance overall efficiency.

4.4. Experiments

In our analysis on hospital mobility prediction, we employed advanced methodologies to identify the optimal predictive model.

Our comparative assessment is summarized in Table 12, which includes the following models:

Logistic regression: Known for its simplicity and efficiency in binary classification problems, it proved effective in handling linear relationships between the predictive features and the target variable [32].
Random forest: This ensemble method, which builds multiple decision trees and merges them to obtain a more accurate and stable prediction, was particularly useful for its robustness against overfitting and its ability to handle nonlinear data [33].
Gradient boosting: Another ensemble technique that optimizes predictive accuracy by combining multiple weak predictive models to form a strong predictor. It performed well in scenarios requiring the handling of various types of data irregularities [34].
Support vector machine (SVM): Designed to find the hyperplane that best divides a dataset into classes, SVM was valuable for its effectiveness in high-dimensional spaces, particularly when the number of dimensions exceeds the number of samples [35].
Neural networks: Extremely versatile and powerful, this model can capture complex nonlinearities in data through layers of neurons that simulate human decision-making processes. Neural networks are particularly effective in large datasets and with variable complexity, adapting well to multiple application scenarios [36].
k-nearest neighbors (kNN): A simple, instance-based learning algorithm where the function is only approximated locally and all computation is deferred until classification. The kNN approach was beneficial due to its interpretability and minimal assumption about the structure of the data [37].
Naive Bayes: A group of simple probabilistic classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between the features. It is often remarkably efficient in large datasets [38].
AdaBoost: A boosting algorithm that can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms (“weak learners”) is combined into a weighted sum that represents the final output of the boosted classifier [39].

We evaluated these models based on several key performance metrics such as area under the curve (AUC), accuracy, F1 score, precision, recall, and Matthews correlation coefficient (MCC). This selection of metrics allows for a comprehensive assessment of performance, guiding the choice of the most suitable model for hospital mobility prediction. One study examined the effectiveness of predictive models for the early diagnosis of diabetes, emphasizing the critical role of model selection in healthcare outcomes [40]. Another work discussed the development and deployment of predictive models in the healthcare sector, providing practical insights into predictive modeling in healthcare [41]. Furthermore, the comparison of predictive models for hospital readmission of heart failure patients was analyzed, highlighting the importance of cost considerations in model evaluation [42].

Logistic regression, chosen for predicting hospital mobility levels (low, medium, high), is distinguished by an AUC of 0.969. This index, being the primary metric for comparing prediction algorithms, reflects the model’s high ability to differentiate between the predicted classes, a critical aspect for ensuring precision in clinical and operational decisions. The AUC, by measuring the model’s quality across the entire spectrum of classification thresholds, provides an assessment independent of the specific distribution of classes in the dataset, a fundamental aspect when considering multiple outcome categories. Logistic regression, with its probabilistic nature, offers a robust interpretative framework and flexibility in adapting to multiclass dependent variables, making it particularly suitable for addressing our three-class target variable. The same validation protocol was applied to all models, using 10-fold cross-validation and a split of the dataset into 70% for training and 30% for testing, thus supporting the robustness and generalizability of the predictive performances.

4.5. Impact of Machine Learning on Hospital Mobility: Perspectives and Challenges

The use of logistic regression and SHAP values to analyze variables influencing hospital mobility opens new perspectives for understanding patients’ perceptions of healthcare service quality. These techniques, augmented by advanced machine learning methods, significantly improve the transparency and interpretability of predictive models, allowing for the elucidation of complex variable relationships. Recent studies on behavior in strategy games and on seismic vulnerability have shown that SHAP values deliver detailed predictive insights across diverse domains [43,44]. In our setting, they highlight how specific factors influence patient decisions about hospital mobility, providing valuable insights for optimizing healthcare services.
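To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a tiny logistic model by averaging each feature's marginal contribution over all feature orderings. It is an illustration under simplifying assumptions, not the procedure used in the study: the weights, bias, and feature values are hypothetical, and "absent" features are replaced by a single baseline point, whereas SHAP implementations typically average over a background dataset.

```python
from itertools import permutations
from math import exp

def shapley_values(predict, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        z = list(baseline)                 # start with every feature "absent"
        prev = predict(z)
        for i in order:
            z[i] = x[i]                    # switch feature i to its actual value
            cur = predict(z)
            phi[i] += cur - prev           # marginal contribution of feature i
            prev = cur
    return [p / len(orders) for p in phi]

# Hypothetical standardized features (e.g., deaths, physicians, nurses) and weights.
WEIGHTS = [1.2, 0.6, 0.4]
BIAS = -0.5

def log_odds(z):
    return BIAS + sum(w * v for w, v in zip(WEIGHTS, z))

def probability(z):
    return 1.0 / (1.0 + exp(-log_odds(z)))

x = [1.5, -0.3, 0.8]
baseline = [0.0, 0.0, 0.0]
phi_prob = shapley_values(probability, x, baseline)    # attribution on the probability scale
phi_logit = shapley_values(log_odds, x, baseline)      # attribution on the log-odds scale
```

The attributions always sum to the difference between the prediction for `x` and the prediction for the baseline; on the log-odds scale of a linear model each one reduces to the weight times the feature's deviation from the baseline. Full enumeration grows factorially with the number of features, so it is practical only for small feature sets such as the eight used here.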

Comparing two regions with distinct healthcare contexts has enriched this analysis, demonstrating how regional nuances can affect service quality perception. This approach is supported by studies of perceived and technical healthcare quality in primary care settings, such as the work conducted in Ghana, where the findings had major implications for the sustainability of the national health insurance scheme [45]. Additionally, a comparison between the Lombardy Region and national Italian data revealed substantial differences in hospital care quality and clinical outcomes, emphasizing the importance of regional context in healthcare quality assessments [46]. These examples underscore the critical role of regional comparisons in understanding and improving healthcare quality.

Despite its significant contributions, this study has limitations, including its geographical scope confined to Apulia and Emilia-Romagna. Expanding the analysis to other regions or comparing Italian data with that of other countries could provide a more comprehensive perspective. Research has shown that interregional healthcare mobility within a decentralized healthcare system is influenced by factors such as regional income, hospital capacity, organizational structure, performance, and technology [47]. These factors are crucial in guiding patient healthcare choices outside their home region, offering insights for more effective health policies [48].

Moreover, access to and the quality of data are critical aspects that can affect the generalizability of results. This study nonetheless marks a significant step towards using machine learning to analyze and understand hospital mobility and perceptions of healthcare service quality.

Future research should expand the geographical scope and pursue broader, more diversified data collection, incorporating interdisciplinary perspectives for a more holistic understanding of hospital mobility dynamics. Such work would support targeted strategies to address the challenges posed by interregional healthcare mobility, ensuring equity and efficiency in access to care across the national territory.

4.6. Machine Learning Policies for Optimizing the Italian Healthcare System

In the current context of the Italian healthcare system, marked by significant regional disparities in the distribution and effectiveness of healthcare facilities, the adoption of machine learning methods such as logistic regression, combined with data envelopment analysis (DEA), emerges as a crucial tool for optimizing healthcare mobility and enhancing patient perceptions of service quality. These methods enable a rigorous and dynamic evaluation of the operational efficiency of healthcare facilities, highlighting how hospitals with high-quality standards and low mortality rates attract more patients, regardless of their geographic location. The standardization of quality and performance metrics on a regional scale, supported by a robust data infrastructure, would not only facilitate a more equitable management of healthcare resources but also promote a more cohesive and integrated healthcare policy. Furthermore, the implementation of public–private partnerships could accelerate the transfer of managerial skills and efficiencies from the private sector to the public sector, catalyzing substantial improvements in the quality and accessibility of care. This integration of predictive analytics and continuous benchmarking is therefore a keystone for targeted healthcare reforms aimed at overcoming existing disparities and enhancing the efficiency of the Italian healthcare system as a whole.

5. Conclusions

The research presented herein explored healthcare mobility within the Emilia-Romagna and Apulia regions, utilizing advanced machine learning methodologies to decipher the factors influencing patient choices regarding healthcare facilities based on the perception of service quality. Results derived from logistic regression techniques and data envelopment analysis (DEA) provide a nuanced portrait of how geographic distribution and operational efficiency of healthcare facilities impact patient mobility. Perceived quality, which includes variables such as mortality rates and medical staff availability, was identified as the dominant factor in patient choice, suggesting that improvements in hospital performance could attract more patients, thus more evenly distributing workload and enhancing access to care. Specifically, hospitals with lower mortality rates and higher availability of medical professionals were associated with increased patient mobility, indicating that patients are willing to travel greater distances for higher-quality care.

Machine learning techniques, including logistic regression for mobility prediction and the k-means algorithm for cluster analysis, facilitated an effective and detailed segmentation of mobility based on travel distances, enabling the identification of specific patient behavior patterns. These methodologies have not only confirmed the importance of clinical and structural variables in hospital choice but also highlighted how quality perceptions significantly influence such decisions. The adoption of advanced predictive models and DEA allowed for a rigorous comparison of efficiency among healthcare facilities, revealing that private institutions tend to exhibit higher efficiency than public ones. These findings suggest that efficient management practices, commonly observed in the private sector, could be successfully implemented in the public sector to elevate the overall quality and efficiency of healthcare services.

In conclusion, this study demonstrates how the integration of predictive technologies and analytical techniques can provide valuable insights for optimal public health management and evidence-based policy formulation. The implications of these findings are extensive and underscore a clear path toward enhancing healthcare planning and adopting advanced management strategies to overcome regional disparities and improve the effectiveness of the Italian healthcare system. These results emphasize the need for further research to explore the applicability of these techniques in other regions and contexts, to generalize and expand the effectiveness of the proposed solutions.

Author Contributions

Conceptualization, V.S., C.T., N.F., M.I., J.P. and A.M.; methodology, V.S., N.F., J.P. and A.M.; data curation, V.S., C.T., N.F., M.I., J.P. and A.M.; writing—original draft preparation, V.S. and A.M.; writing—review and editing, V.S., C.T., N.F., M.I., J.P. and A.M.; visualization, V.S. and A.M.; supervision, V.S., C.T., N.F., M.I., J.P. and A.M.; project administration, A.M.; funding acquisition, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript (presented in alphabetical order):

ASL	Local Health Authorities
AUC	Area Under the Curve
CA	Accuracy
CNN	Convolutional Neural Network
DEA	Data Envelopment Analysis
F1	F1 score
k-NN	k-Nearest Neighbor
MCC	Matthews Correlation Coefficient
ML	Machine Learning
Prec	Precision
ROC	Receiver Operating Characteristic
SHAP	SHapley Additive exPlanations
SVM	Support Vector Machine
USL	Local Health Units

References

1. Chisari, G.; Lega, F. Impact of austerity programs: Evidence from the Italian national health service. Health Serv. Manag. Res. 2023, 36, 145–152.
2. Rahim, A.A.; Ibrahim, M.I.; Musa, K.I.; Chua, S.-L.; Yaacob, N.M. Assessing Patient-Perceived Hospital Service Quality and Sentiment in Malaysian Public Hospitals Using Machine Learning and Facebook Reviews. Int. J. Environ. Res. Public Health 2021, 18, 9912.
3. Pai, Y.P.; Chary, S.T.; Pai, R.Y. Patient-perceived hospital service quality: An empirical assessment. Int. J. Health Care Qual. Assur. 2018, 31, 76–91.
4. Lahmar, B.; Dridi, H.; Akakba, A. Territorial health approach outputs of geo-governance of health facilities: Case study of Batna, Algeria. GeoJournal 2021, 86, 2305–2319.
5. Alvarez-Elías, A.C.; Lou-Meda, R.; Exeni, R.; Exantus, J.; Bonilla-Felix, M.; González-Camac, S.; de Ferris, M.E.D.-G. Addressing Health Inequities and Disparities in Children with Kidney Disease in Disadvantaged Areas: The Latin American and Caribbean Experience. Curr. Pediatr. Rep. 2023, 11, 40–49.
6. Cavalieri, M.; Guccio, C.; Lisi, D.; Pignataro, G. Does the Extent of Per-Case Payment System Affect Hospital Efficiency? Evidence from the Italian NHS (SSRN Scholarly Paper 2515772). Public Financ. Rev. 2014, 46, 117–149.
7. Colombi, R.; Martini, G.; Vittadini, G. Determinants of transient and persistent hospital efficiency: The case of Italy. Health Econ. 2017, 26, 5–22.
8. Pecoraro, F.; Clemente, F.; Luzi, D. The efficiency in the ordinary hospital bed management in Italy: An in-depth analysis of intensive care unit in the areas affected by COVID-19 before the outbreak. PLoS ONE 2020, 15, e0239249.
9. Rosa, A. Il modello di rete Hub Spoke: Fattori critici di successo e barriere organizzative. Mecosan Manag. Ed Econ. Sanit. 2018, 2018, 33–56.
10. Rose, R.C.; Uli, J.; Abdul, M.; Ng, K.L. Hospital service quality: A managerial challenge. Int. J. Health Care Qual. Assur. Inc. Leadersh. Health Serv. 2004, 17, 146–159.
11. Guarducci, G.; Messina, G.; Carbone, S.; Nante, N. Identifying the Drivers of Inter-Regional Patients’ Mobility: An Analysis on Hospital Beds Endowment. Healthcare 2023, 11, 2045.
12. Musich, S.; Wang, S.S.; Ruiz, J.; Hawkins, K.; Wicker, E. The impact of mobility limitations on health outcomes among older adults. Geriatr. Nurs. 2018, 39, 162–169.
13. Demnitz, N.; Zsoldos, E.; Mahmood, A.; Mackay, C.E.; Kivimäki, M.; Singh-Manoux, A.; Dawes, H.; Johansen-Berg, H.; Ebmeier, K.P.; Sexton, C.E. Associations between Mobility, Cognition, and Brain Structure in Healthy Older Adults. Front. Aging Neurosci. 2017, 9, 155.
14. Ferrari, A.; Seghieri, C.; Giannini, A.; Mannella, P.; Simoncini, T.; Vainieri, M. Driving time drives the hospital choice: Choice models for pelvic organ prolapse surgery in Italy. Eur. J. Health Econ. 2023, 24, 1575–1586.
15. Aggarwal, A.; Han, L.; Lewis, D.; Costigan, J.; Hubbard, A.; Taylor, J.; Rigg, A.; Purushotham, A.; van der Meulen, J. Association of travel time, patient characteristics, and hospital quality with patient mobility for breast cancer surgery: A national population-based study. Cancer 2024, 130, 1221–1233.
16. Aggarwal, A.; Han, L.; Boyle, J.; Lewis, D.; Kuyruba, A.; Braun, M.; Walker, K.; Fearnhead, N.; Sullivan, R.; van der Meulen, J. Association of Quality and Technology with Patient Mobility for Colorectal Cancer Surgery. JAMA Surg. 2023, 158, e225461.
17. De Curtis, M.; Bortolan, F.; Diliberto, D.; Villani, L. Pediatric interregional healthcare mobility in Italy. Ital. J. Pediatr. 2021, 47, 139.
18. Zini, M.; Carcasci, C. Machine learning-based energy monitoring method applied to the HVAC systems electricity demand of an Italian healthcare facility. Smart Energy 2024, 14, 100137.
19. Santamato, V.; Esposito, D.; Tricase, C.; Faccilongo, N.; Marengo, A.; Pange, J. Assessment of Public Health Performance in Relation to Hospital Energy Demand, Socio-Economic Efficiency and Quality of Services: An Italian Case Study. In Computational Science and Its Applications—ICCSA 2023 Workshops; Gervasi, O., Murgante, B., Rocha, A.M.A.C., Garau, C., Scorza, F., Karaca, Y., Torre, C.M., Eds.; Springer Nature: Cham, Switzerland, 2023; pp. 505–522.
20. Santamato, V.; Tricase, C.; Faccilongo, N.; Marengo, A.; Pange, J. Healthcare performance analytics based on the novel PDA methodology for assessment of efficiency and perceived quality outcomes: A machine learning approach. Expert Syst. Appl. 2024, 252, 124020.
21. Malkusch, S.; Hahnefeld, L.; Gurke, R.; Lötsch, J. Visually guided preprocessing of bioanalytical laboratory data using an interactive R notebook (pguIMP). CPT Pharmacomet. Syst. Pharmacol. 2021, 10, 1371–1381.
22. Awad, F.H.; Hamad, M.M.; Alzubaidi, L. Robust Classification and Detection of Big Medical Data Using Advanced Parallel K-Means Clustering, YOLOv4, and Logistic Regression. Life 2023, 13, 691.
23. Panda, N.R. A Review on Logistic Regression in Medical Research. Natl. J. Community Med. 2022, 13, 265–270.
24. Toki, E.I.; Tsoulos, I.G.; Santamato, V.; Pange, J. Machine Learning for Predicting Neurodevelopmental Disorders in Children. Appl. Sci. 2024, 14, 837.
25. Kim, M.; Shin, S.-Y.; Kang, M.; Yi, B.-K.; Chang, D.K. Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study. JMIR Med. Inform. 2019, 7, e14083.
26. Wang, C.; Wu, D.-H. A K-Means Clustering-Based Multiple Importance Sampling Algorithm for Integral Global Optimization. J. Oper. Res. Soc. China 2023, 11, 157–175.
27. Bharadwaj, P.; Gupta, R.; Gurjar, R.; Singh, A. Importance of CURE Clustering Algorithm over K-Means Clustering Algorithm for Large Data-set. In Proceedings of the 2023 Third International Conference on Secure Cyber Computing and Communication (ICSCCC), Jalandhar, India, 26–28 May 2023; pp. 421–426.
28. Mazanec, J.; Harantová, V.; Štefancová, V.; Brůhová Foltýnová, H. Estimating Mode of Transport in Daily Mobility during the COVID-19 Pandemic Using a Multinomial Logistic Regression Model. Int. J. Environ. Res. Public Health 2023, 20, 4600.
29. Xi, Y.; Wang, H.; Sun, N. Machine learning outperforms traditional logistic regression and offers new possibilities for cardiovascular risk prediction: A study involving 143,043 Chinese patients with hypertension. Front. Cardiovasc. Med. 2022, 9, 1025705.
30. Lindaas, N.A.; Anthun, K.S.; Magnussen, J. New Public Management and hospital efficiency: The case of Norwegian public hospital trusts. BMC Health Serv. Res. 2024, 24, 36.
31. Yao, Z.; Chen, M.; Zhan, J.; Zhuang, J.; Sun, Y.; Yu, Q.; Yu, Z. Refined Landslide Susceptibility Mapping by Integrating the SHAP-CatBoost Model and InSAR Observations: A Case Study of Lishui, Southern China. Appl. Sci. 2023, 13, 12817.
32. Musa, A.B. Comparative study on classification performance between support vector machine and logistic regression. Int. J. Mach. Learn. Cyber. 2013, 4, 13–24.
33. Liu, Y.; Wang, Y.; Zhang, J. New Machine Learning Algorithm: Random Forest. In Information Computing and Applications. ICICA 2012; Liu, B., Ma, M., Chang, J., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7473.
34. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21.
35. Awad, M.; Khanna, R. Support Vector Machines for Classification. In Efficient Learning Machines; Apress: Berkeley, CA, USA, 2015.
36. Taherdoost, H. Deep Learning and Neural Networks: Decision-Making Implications. Symmetry 2023, 15, 1723.
37. Taunk, K.; De, S.; Verma, S.; Swetapadma, A. A Brief Review of Nearest Neighbor Algorithm for Learning and Classification. In Proceedings of the 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 15–17 May 2019; pp. 1255–1260.
38. Wei, W.; Visweswaran, S.; Cooper, G.F. The application of naive Bayes model averaging to predict Alzheimer’s disease from genome-wide data. J. Am. Med. Inform. Assoc. 2011, 18, 370–375.
39. Zhang, P.-B.; Yang, Z.-X. A Novel AdaBoost Framework with Robust Threshold and Structural Optimization. IEEE Trans. Cybern. 2018, 48, 64–76.
40. Jahani, M.; Mahdavi, M. Comparison of Predictive Models for the Early Diagnosis of Diabetes. Healthc. Inform. Res. 2016, 22, 95–100.
41. Stiglic, G. Tutorial: Developing and Deploying Healthcare Predictive Models in R. In Proceedings of the 2014 IEEE International Conference on Healthcare Informatics, Verona, Italy, 15–17 September 2014; Volume 363.
42. Landicho, J.A.; Esichaikul, V.; Sasil, R.M. Comparison of predictive models for hospital readmission of heart failure patients with cost-sensitive approach. Int. J. Healthc. Manag. 2021, 14, 1536–1541.
43. Greenwood, G.W.; Abbass, H.; Hussein, A. Interpretation of Neural Network Players for a Generalized Divide the Dollar Game Using SHAP Values. In Proceedings of the 2023 IEEE Symposium Series on Computational Intelligence (SSCI), Mexico City, Mexico, 5–8 December 2023; pp. 1808–1813.
44. Karampinis, I.; Iliadis, L.; Karabinis, A. Rapid Visual Screening Feature Importance for Seismic Vulnerability Ranking via Machine Learning and SHAP Values. Appl. Sci. 2024, 14, 2609.
45. Alhassan, R.K.; Duku, S.O.; Janssens, W.; Nketiah-Amponsah, E.; Spieker, N.; van Ostenberg, P.; Arhinful, D.K.; Pradhan, M.; Rinke de Wit, T.F. Comparison of Perceived and Technical Healthcare Quality in Primary Health Facilities: Implications for a Sustainable National Health Insurance Scheme in Ghana. PLoS ONE 2015, 10, e0140109.
46. Signorelli, C.; Pennisi, F.; Lunetti, C.; Blandi, L.; Pellissero, G.; Fondazione Sanità Futura, W.G. Quality of hospital care and clinical outcomes: A comparison between the Lombardy Region and the Italian national data. Ann. Ig. Med. Prev. Comunita 2024, 36, 234–249.
47. Balia, S.; Brau, R.; Marrocu, E. Interregional patient mobility in a decentralized healthcare system. Reg. Stud. 2018, 52, 388–402.
48. Nante, N.; Guarducci, G.; Lorenzini, C.; Messina, G.; Carle, F.; Carbone, S.; Urbani, A. Inter-Regional Hospital Patients’ Mobility in Italy. Healthcare 2021, 9, 1182.

Figure 1. Methodological workflow.

Figure 2. Orange software workflow.

Figure 3. Box plots of kilometric mobility for the three hospital clusters.

Figure 4. Confusion matrix.

Figure 5. ROC curve for target class LOW.

Figure 6. ROC curve for target class MEDIUM.

Figure 7. ROC curve for target class HIGH.

Figure 8. Representation of SHAP values quantifying feature impacts on model output.

Figure 9. SHAP analysis for predicting mobility class.

Table 1. Contingency table: Relationship between hospital network nature and hospital level in the Apulia and Emilia-Romagna Region.

Region	Network	Measure	Base Level	First Level	IRCCS	Private Nursing Homes	Second Level	Total
Apulia	Private	N.	0	4	2	24	0	30
Apulia	Private	%	0.0	6.8	3.4	40.7	0.0	50.8
Apulia	Public	N.	9	13	2	0	5	29
Apulia	Public	%	15.3	22.0	3.4	0.0	8.5	49.2
Apulia	Total	N.	9	17	4	24	5	59
Apulia	Total	%	15.3	28.8	6.8	40.7	8.5	100
Emilia Romagna	Private	N.	27	0	1	15	3	46
Emilia Romagna	Private	%	39.7	0.0	1.5	22.1	4.4	67.6
Emilia Romagna	Public	N.	1	6	3	0	12	22
Emilia Romagna	Public	%	1.5	8.8	4.4	0.0	17.6	32.4
Emilia Romagna	Total	N.	28	6	4	15	15	68
Emilia Romagna	Total	%	41.2	8.8	5.9	22.1	22.1	100
Total	Private	N.	27	4	3	39	3	76
Total	Private	%	21.3	3.1	2.4	30.7	2.4	59.8
Total	Public	N.	10	19	5	0	17	51
Total	Public	%	7.9	15.0	3.9	0.0	13.4	40.2
Total	Total	N.	37	23	8	39	20	127
Total	Total	%	29.1	18.1	6.3	30.7	15.7	100

Table 2. χ² test.

Region		Value	df	p
Apulia	χ²	42.8	4	<0.001
Apulia	N	59
Emilia Romagna	χ²	49.2	4	<0.001
Emilia Romagna	N	68
Total	χ²	64.5	4	<0.001
Total	N	127
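The χ² statistics in Table 2 can be checked directly against the counts in Table 1. The sketch below applies the standard Pearson χ² test of independence to the private/public by hospital-level counts; the function and variable names are illustrative, and only the statistic and degrees of freedom are computed (no p-value).

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df

# Counts from Table 1. Rows: Private, Public.
# Columns: Base Level, First Level, IRCCS, Private Nursing Homes, Second Level.
apulia = [[0, 4, 2, 24, 0],
          [9, 13, 2, 0, 5]]
emilia_romagna = [[27, 0, 1, 15, 3],
                  [1, 6, 3, 0, 12]]
# Pooled table: cell-by-cell sum of the two regions.
total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(apulia, emilia_romagna)]

results = {name: chi_square(t) for name, t in
           [("Apulia", apulia), ("Emilia Romagna", emilia_romagna), ("Total", total)]}
```

Rounded to one decimal, these counts give 42.8, 49.2, and 64.5 with 4 degrees of freedom each, in agreement with Table 2.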

Table 3. Descriptive statistics for 8 numeric features.

	Mean	Median	Maximum
BEDS	210.9	103	1291
DEPARTMENTS	15.6	7	135
HOSPITAL STAFF	728.9	302	5493
DEATHS	1057.7	199	8493
INTERVENTIONS	748.3	388	4658
READMISSIONS	446.0	281	2540
NURSES	325.1	91	2599
PHYSICIANS	151.1	78	903

Table 4. Frequencies of target variable mobility level.

MOBILITY LEVEL	Counts	% of Total	Cumulative %
HIGH	17	13.4%	13.4%
LOW	81	63.8%	77.2%
MEDIUM	29	22.8%	100.0%

Table 5. Numerical SHAP values quantifying feature impacts on model output.

FEATURE	MEAN	STD	$λ_{i}$
DEATHS	0.128	0.012	0.575
PHYSICIANS	0.042	0.014	0.148
NURSES	0.037	0.011	0.130
BEDS	0.023	0.008	0.080
DEPARTMENTS	0.016	0.008	0.050
INTERVENTIONS	0.012	0.08	0.007
READMISSIONS	0.010	0.007	0
HOSPITAL STAFF	0.011	0.009	0.010

Table 6. Descriptive statistics of DEA efficiency scores.

							Shapiro–Wilk
	N	Mean	Median	SD	Minimum	Maximum	W	p
DEA EFFICIENCY SCORES	127	0.893	0.950	0.125	0.387	1.00	0.824	<0.001

Table 7. DEA efficiency scores by hospital level.

	LEVEL	N	Mean	Median	SD	Minimum	Maximum
DEA EFFICIENCY SCORES	BASE LEVEL	37	0.897	0.906	0.094	0.669	1.00
DEA EFFICIENCY SCORES	FIRST LEVEL	23	0.837	0.842	0.160	0.387	1.00
DEA EFFICIENCY SCORES	IRCCS	8	0.922	0.969	0.129	0.620	1.00
DEA EFFICIENCY SCORES	PRIVATE NURSING HOMES	39	0.973	0.999	0.047	0.802	1.00
DEA EFFICIENCY SCORES	SECOND LEVEL	20	0.784	0.748	0.129	0.515	1.00

Table 8. DEA efficiency scores by hospital network.

	NETWORK	N	Mean	Median	SD	Minimum	Maximum
DEA EFFICIENCY SCORES	PRIVATE	76	0.936	0.977	0.0872	0.669	1.00
DEA EFFICIENCY SCORES	PUBLIC	51	0.830	0.875	0.1453	0.387	1.00

Table 9. Kruskal–Wallis by hospital network.

	χ²	df	p
DEA EFFICIENCY SCORES	21.0	1	<0.001

Table 10. Kruskal–Wallis by hospital level.

	χ²	df	p
DEA EFFICIENCY SCORES	36.4	4	<0.001

Table 11. Dwass–Steel–Critchlow–Fligner pairwise comparisons.

		W	p
BASE LEVEL	FIRST LEVEL	−1.63	0.780
BASE LEVEL	IRCCS	1.81	0.703
BASE LEVEL	PRIVATE NURSING HOMES	5.57	<0.001
BASE LEVEL	SECOND LEVEL	−4.33	0.019
FIRST LEVEL	IRCCS	2.05	0.593
FIRST LEVEL	PRIVATE NURSING HOMES	5.16	0.002
FIRST LEVEL	SECOND LEVEL	−2.38	0.446
IRCCS	PRIVATE NURSING HOMES	1.11	0.936
IRCCS	SECOND LEVEL	−3.35	0.124
PRIVATE NURSING HOMES	SECOND LEVEL	−7.35	<0.001

Table 12. Performance parameters of prediction models.

Model	AUC	CA	F1	Prec	Recall	MCC
Logistic Regression	0.969	0.876	0.869	0.871	0.876	0.758
SVM	0.959	0.876	0.866	0.874	0.876	0.760
Random Forest	0.957	0.865	0.865	0.865	0.865	0.739
Neural Network	0.953	0.865	0.859	0.858	0.865	0.735
kNN	0.945	0.865	0.865	0.865	0.865	0.741
Gradient Boosting	0.930	0.876	0.874	0.873	0.876	0.759
Naïve Bayes	0.924	0.787	0.791	0.814	0.787	0.629
AdaBoost	0.870	0.854	0.855	0.858	0.854	0.726

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
