Identification of Health Expenditures Determinants: A Model to Manage the Economic Burden of Cardiovascular Disease

The purpose of this paper is to investigate the determinants influencing the costs of cardiovascular disease (CVD) in the regional health service in Italy’s Apulia region from 2014 to 2016. Data for patients with acute myocardial infarction (AMI), heart failure (HF), and atrial fibrillation (AF) were collected from the hospital discharge registry. Generalized linear models (GLM) and generalized linear mixed models (GLMM) were used to identify the role of random effects in improving the model performance. The study was based on socio-demographic variables and disease-specific variables (diagnosis-related group, hospitalization type, hospital stay, surgery, and economic burden of the hospital discharge form). First, both models indicated an increase in health costs in 2016 and lower spending for women (p < 0.001). GLMM indicates a significant increase in health expenditure with increasing age (p < 0.001). Day-hospital has the lowest cost, surgery increases the cost, and AMI is the most expensive pathology, in contrast to AF (p < 0.001). Second, AIC and BIC take their lowest values for the GLMM, indicating the random effects’ relevance in improving the model performance. This study is the first that considers real data to estimate the economic burden of CVD from the regional health service’s perspective. It is significant for its ability to provide a large set of estimates of the economic burden of CVD, providing information to managers for health management and planning.


Introduction
The ongoing evolution of demographic dynamics, and the consequent modification of the population's health needs, with a growing share of elderly patients and those with chronic diseases, requires health systems to be structurally and organizationally redesigned [1].
Chronic diseases are a near-term challenge for countries worldwide, given the constantly increasing trend of risk factors [2]. Costs rise when complications occur, which emphasizes the need for secondary preventive approaches in health management [3]; actively preventing complications can limit a disease's impact on quality of life, decrease costs, and improve the health system [4]. National health services are currently challenged to research determinants and impacts and to develop strategies that reduce the predisposition to chronic degenerative diseases and their burden on public accounts. A careful economic evaluation of the determinants reveals the paramount importance of starting new cost-effective strategies to optimize the performance of health expenditure [5].
Around the world, governments, academia, and experts in health economics are faced with increased healthcare costs and are searching for tools to reform and to reduce health expenditures [6,7]. Healthcare costs are increasing more and more across all of the world's treating the disease (costs for diagnosis, treatment, and management of disease progression and patients' own costs), as well as indirect costs, such as loss in productivity resulting from time off from employment. A very small number of literature reviews have been devoted to analysis of the determinants influencing CVD costs. However, use of data and a COI model to estimate the economic burden of CVD from the regional health service's perspective has not been previously studied.
Hospital discharge data are usually used to address issues of public safety, including the identification of disease rates, patient characteristics, hospitalization costs, and outcomes for a specific disease, spreading the development of disease prevention and control programs. Discharge data can be useful to better understand hospitalization patterns for an exact area, to plan for better allocation of resources, to identify services that a population lacks, and to assess the potential impact of hospital changes [32].
The models used in the literature have two main disadvantages. First, they do not consider intra-patient variance. Second, they differ in perspective: studies in the literature adopt a "patient-cost" or "patient-insurance" perspective, while this research adopts a "regional-system" perspective.
This work attempts to fill this gap and aims to define the determinants that affect the costs of CVD in the regional health service of Italy's Apulia region in the 2014-2016 period by fitting two models (GLM and GLMM) and then identifying the better one. The study therefore contributes to research on the determinants of health sector expenditure.
The present research is organized as follows: The next section defines the study design, population, and variables used in the study, taking into account two models of investigation. Then, the results are presented and discussed. Finally, the work concludes with limitations and implications for the healthcare sector.

Study Design
This retrospective longitudinal population-based study was conducted using a large regional health administrative database. Data for patients with acute myocardial infarction (AMI), heart failure (HF), and atrial fibrillation (AF) were extracted from electronic medical records included in the hospital discharge registry (HDR). The HDR is held at the regional epidemiological observatory in the Apulia region, in southern Italy, which has a population of about 4 million (6.67% of the Italian population). HDR collects data on discharge diagnoses (one main and up to five secondary diagnoses) and procedures of all admissions to regional health service (RHS) hospitals in the Apulia region, using the International Classification of Diseases, Ninth Edition, Clinical Modification (ICD-9 CM) coding system for diagnoses.
In addition, this registry contains information on ordinary inpatient hospitalizations and day-hospital services delivered to all patients by healthcare organizations managed by the regional service, and healthcare services provided by the independent sector that the RHS commissioned. RHS is funded by the National Health System and ensures 'unlimited coverage' to all its residents, who generally should pay only part of the costs of drugs or services (ticket). Patients who have a specific condition, such as a severe chronic disease, are exempted from co-paying.
This study was conducted from the RHS' perspective using healthcare costs. Unit costs of medical records were applied to estimate direct healthcare costs. Since RHS updates its data every three years, the last available period for carrying out the analysis was 2014-2016. All reference costs for this period were included. The approval of the ethics committee was not requested as all the data processed were treated in an aggregate manner with full respect of privacy.

Study Population
In order to estimate CVD costs and their evolution over time, a unique database matching the records extracted from data sources by using the personal anonymous ID code was created.
Any patient who, during the observational period, was discharged (excluding voluntary and inter-ward discharge) from an Apulian hospital with a diagnosis of AMI, AF, or HF was considered. The first occurrence of a diagnosis code in at least one of the six discharge diagnosis fields was identified as the inclusion criterion. The following groups of ICD-9-CM codes were considered: (1) 410 for AMI; (2) 427.31 and 427.32 for AF; (3) 428 for HF [33]. The description of the ICD-9-CM codes for each disease of interest is shown in Table 1. Residents that were hospitalized outside the regional territory of competence were excluded from the analysis. Furthermore, according to Italian legislation, the principal diagnosis coding needs to be based on the health status diagnosed at the end of hospitalization. It represents the main cause of some treatments and/or diagnostic tests used and is mainly responsible for one's use of health resources [34]. The index date was identified as the date of occurrence of the first criterion above.
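The inclusion criterion above can be illustrated with a minimal sketch. It assumes that diagnosis codes are available as dotted ICD-9-CM strings (for example "410.71") and that each discharge record exposes its six diagnosis fields as a list; the function and variable names are hypothetical and are not taken from the HDR.

```python
from typing import Iterable, Set

# ICD-9-CM code prefixes taken from the inclusion criteria in the text.
# Assumption: codes are stored as dotted strings such as "410.71".
DISEASE_PREFIXES = {
    "AMI": ("410",),
    "AF": ("427.31", "427.32"),
    "HF": ("428",),
}


def classify_discharge(diagnoses: Iterable[str]) -> Set[str]:
    """Return the diseases of interest found in a record's diagnosis fields."""
    found = set()
    for code in diagnoses:
        code = code.strip()
        for disease, prefixes in DISEASE_PREFIXES.items():
            # A record qualifies if any diagnosis field starts with a listed code.
            if any(code.startswith(prefix) for prefix in prefixes):
                found.add(disease)
    return found


# Hypothetical record: one main and five secondary diagnosis fields.
record = ["410.71", "401.9", "427.31", "", "", ""]
print(sorted(classify_discharge(record)))
```

A record can match more than one disease, which is consistent with the co-occurring pathologies reported in the sample characteristics.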

Study Variables
The hospital discharge registry (HDR) collects data of all hospitalizations and represents a valid tool for clinical-cost assessments of the use of healthcare resources. In the HDR, the diagnosis-related group (DRG) code appears to be the key element of the records of each patient admitted to a care organization. The DRG code refers to a system that classifies all patients discharged from a care organization into homogeneous groups by absorption of committed resources. This code is attributed by a software called "DRG-grouper" and was developed to classify all medical records, to define the categories based on the clinical and demographic information collected for each hospitalization through the hospital discharge form, and to identify each subject to whom information is attributed on diagnoses, any surgical interventions and diagnostic procedures, age, and on the hospital-discharge type [35].
In this study, the only dependent variable is the economic burden of the hospital discharge form (hereafter, hdf_value). Hdf_value is a variable strictly correlated with the DRG. It represents an economic and monetary appraisal (in Euros) of diverse healthcare services involving the patient's life during the hospitalization, such as practitioner consultations, inpatient stays, emergency department visits, laboratory and imaging tests, etc.

Explanatory Variables
Two types of variables were used: socio-demographic (gender, age, citizenship, residential location in the region and the province), and disease-specific variables. Citizenship was assigned according to the ISTAT country code. The list of foreign countries, with the related codes, is constantly updated following changes in the global geopolitical configuration. Each foreign country is identified by the statistical code and the geographical name (source: https://www.istat.it/it/archivio/6747 accessed on 2 March 2021). The sample's citizenship was identified with the following codes: (1) European citizenship; (2) African citizenship; (3)  Second, the disease-specific variables AMI, AF, and HF were associated with every patient. Each of the variables implies healthcare costs associated with inpatient care. The variables considered were: DRG, hospitalization-type, hospital stay, surgery (presence or absence), and hdf_value. The healthcare costs from the RHS viewpoint were calculated. Costs were computed using charges that the RHS reimbursed to the healthcare providers.
The economic evaluation of the HDR is given by the monetary value (in Euros) assigned to each DRG. DRG is a discrete variable that, as aforementioned, represents the hospital's "final product" classification tool.
Hospitalization-type is a dummy variable for classifying patients based on the type of hospitalization. This variable includes either the ordinary hospitalization option or the day-hospitalization one.
Hospital stay is a day-count variable and indicates the patient's stay in the healthcare organization.
Surgery is a dummy variable indicating whether a patient has undergone surgery. A more detailed explanation of all the variables used in the study is provided in Supplementary Materials Table S1. Data on prescription pharmaceutical costs during inpatient stays were not available and were therefore not included.

Statistical Models and Analysis
Descriptive statistics were computed for each variable. The Kruskal-Wallis test was carried out to determine whether hdf_value differed significantly across the levels of each categorical variable. In order to analyze the determinants influencing the economic burden of the hospital discharge form, two models were considered: the generalized linear model (GLM) (model 1) and the generalized linear mixed model (GLMM) (model 2), given the presence of heteroscedasticity and random effects [36]. In addition, the distribution of expenses is often skewed to the right because of a moderate number of hospitalizations that involve high expenses (Figure 1), and these models allow use of the log link function.
The GLM (model 1) generalizes a linear model and is composed of two fundamental elements. The first is the link function between the expected value of the result and the linear predictor, and the second one is the variance function, in which the variance is expressed as a function of the mean. To explain the structure of the GLM, the linear model starts from:

$$E(Y) = \mu \qquad (1)$$

where μ = Xβ. In order to generalize the model indicated in Equation (1), three parts must be considered. The first one is the random part, according to which the components of Y have independent Gaussian distributions with E(Y) = μ and constant variance σ². The second part is the systematic component, in which the covariates x_1, x_2, ..., x_h yield a linear predictor η, given by:

$$\eta = \sum_{j=1}^{h} x_j \beta_j$$

Finally, the third part is the link between the random and the systematic component:

$$\eta_i = g(\mu_i)$$

where g(·) is the link function, and the distribution of Y can be chosen from a family (for example, normal, gamma, binomial, Poisson, etc.). The distribution family was chosen based on the modified Park test (λ). According to Park [37], if the value of the test gives a coefficient λ = 0, it indicates that the best choice is the Gaussian family; if λ = 1, the Poisson family is the best choice; if λ = 2, the gamma family; and if λ = 3, the inverse Gaussian family.
In this study, the coefficient was 2, thus this sample follows the gamma distribution.
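The family selection rule can be sketched as follows. This is a simplified illustration, not the procedure run in Stata by the authors: it assumes the Park-type coefficient is estimated by an ordinary least squares regression of ln((y - ŷ)²) on ln(ŷ), and that the estimate is mapped to the nearest of the four values listed above. The function names and the toy data are hypothetical.

```python
import math
from typing import Sequence

# Mapping from the Park coefficient to the distribution family, as in the text.
FAMILY_BY_LAMBDA = {0: "Gaussian", 1: "Poisson", 2: "gamma", 3: "inverse Gaussian"}


def park_slope(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """OLS slope of ln((y - yhat)^2) on ln(yhat) (simplified Park-type regression).

    Assumes all predictions are positive and no residual is exactly zero.
    """
    xs = [math.log(p) for p in predicted]
    ys = [math.log((y - p) ** 2) for y, p in zip(observed, predicted)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def family_from_lambda(lam: float) -> str:
    """Pick the family whose reference value is closest to the estimate."""
    nearest = min(FAMILY_BY_LAMBDA, key=lambda k: abs(k - lam))
    return FAMILY_BY_LAMBDA[nearest]


# Toy data (not from the study): residuals proportional to the prediction,
# so the variance grows with the square of the mean.
predicted = [100.0, 200.0, 400.0, 800.0]
observed = [1.1 * p for p in predicted]
lam = park_slope(observed, predicted)
print(round(lam, 3), family_from_lambda(lam))
```

When the variance is proportional to the squared mean, as in the toy data, the slope is 2 and the gamma family is selected, which mirrors the result reported for this sample.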
The second model applied was the GLMM (model 2). The GLMM is an extension of the GLM. The GLM includes in the mathematical model only the fixed effects (in this case, age, year, DRG, etc.), estimating their influence on the dependent variable (hdf_value). Instead, the GLMM considers, in addition to the fixed effects, the random effects, i.e., blocks in observational studies or experiments, which are replicated across sites or times [38]. The random effects of the GLMM were the patients, since the variability of costs may depend on the intrinsic characteristics of the patients, who may enter the register several times. The general form of the mixed model is:

$$Y = X\beta + Zu + e$$

where Y is the vector of observations, X is the matrix of known constants associated with the fixed effects, β is the vector of fixed effects, Z is the matrix of known constants associated with the random effects, u is the vector of random model effects, and e is the vector of random errors [39].
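For illustration, one way to write model 2 on the scale of the log link is given below. It is a sketch under two assumptions that the text does not state: that the patient effect enters as a single random intercept, and that this intercept is normally distributed. The indices i (patient) and j (hospitalization of that patient) and the variance σ_u² are notation introduced here.

```latex
\log \mathrm{E}\left(Y_{ij} \mid u_i\right) = x_{ij}^{\top}\beta + u_i,
\qquad u_i \sim N\left(0, \sigma_u^2\right),
\qquad Y_{ij} \mid u_i \sim \mathrm{Gamma}
```

Setting u_i = 0 for every patient recovers the gamma GLM with log link (model 1), which is why the two models can be compared on the same covariates.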
Both model 1 and model 2 were specified with the gamma family and a log link. The goodness of fit of the models was tested by the Akaike information criterion (AIC) and Bayesian information criterion (BIC); the best model is the one that minimizes these values. All statistical analyses were conducted in Stata 14.0 (Stata Corp LP, College Station, TX, USA). A critical value of p < 0.05 was specified a priori as the statistical significance threshold for all analyses.
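The comparison rule can be made concrete with a short sketch using the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations, and ln L the maximized log-likelihood. The log-likelihoods and parameter counts below are invented for illustration and are not the values in Table 4.

```python
import math
from typing import Dict, Optional, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def preferred_model(fits: Dict[str, Tuple[float, int, int]]) -> Optional[str]:
    """Return the model minimizing both criteria, or None if they disagree.

    `fits` maps a model name to (log_likelihood, n_params, n_obs).
    """
    by_aic = min(fits, key=lambda m: aic(fits[m][0], fits[m][1]))
    by_bic = min(fits, key=lambda m: bic(*fits[m]))
    return by_aic if by_aic == by_bic else None


# Hypothetical fits: the GLMM has one extra parameter (the random-effect variance).
fits = {"GLM": (-1000.0, 10, 500), "GLMM": (-950.0, 11, 500)}
for name, (ll, k, n) in fits.items():
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n), 1))
print("preferred:", preferred_model(fits))
```

Both criteria penalize the extra parameter, so the mixed model is preferred only when its gain in log-likelihood outweighs that penalty.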

Sample Characteristics
In total, 98,829 eligible subjects were identified. The sample had a mean age of 75.46 years (range 1-107) and consisted of 52.8% males and 47.2% females. Most of the patients were Italian (99.1%) and 0.4% were of European nationality. As regards the pathologies investigated, AF was most commonly detected among the Apulian residents (39.5%). Patients who presented two of the three pathologies at the same time were suffering from HF and AF (17.7%), HF and AMI (3.8%), or AF and AMI (1.6%). Only 1.3% of the sample had all three diseases at the same time (Table 2).
Finally, Figure 2 (single disease) and Figure 3 (double disease) show the hdf_value errorbar charts with 95% confidence intervals, by sex and year, highlighting a gender gap in all years, especially for AF&AMI. Regarding gender, except in 2014, hdf_value was higher for males than females.

GLM and GLMM Models
To evaluate the determinants that influence the costs, two models were considered. Table 4 shows the results of the GLM and GLMM models. Each of them was specified with a gamma distribution and a log link. The GLM and the GLMM showed similar results.
Compared to 2014, the models indicate that hdf_value increased in 2016 (p < 0.001 for both the GLM and the GLMM). Spending is on average lower for females (both models p < 0.001). GLMM indicates a significant increase in health expenditure with increasing age (p < 0.001). Although the category under 20 is the most expensive in terms of care (Table 3), young patients have very little impact on costs in general, being a category that suffers very little from this type of pathology. On the other hand, CVDs greatly increase in frequency in old age, with a greater impact on costs, as evidenced by both models. As expected, due to the nature of the covariates, hdf_value increases significantly with the DRG and with hospital stay (both models p < 0.001). Regarding the hospitalization type, the day-hospital modality results in a lower cost than the ordinary hospitalization modality (both models p < 0.001) and surgery increases the cost (both models p < 0.001). As for the pathologies, considered individually, the most expensive is AMI, while the cheapest is AF (for both models p < 0.001). The two models agree on two important aspects: the absolute values of the coefficients for each determinant are often very similar, and the coefficients are often significant in both models. This concordance indicates robustness, confirming that the included determinants have significant relevance in costing. Comparing the goodness of fit, the GLMM showed better adequacy (lower values of AIC and BIC). This indicates that the patient has an important role in the variability of the phenomenon, and that accounting for it improves the performance of the model.

Conclusions
The identification of the best model allowed identification of the determinants and their weight influencing the costs of cardiovascular disease by adopting a regional health perspective and using regional data for the period 2014-2016 in the Apulia region (Italy).
As highlighted by the research, although the health administrative databases were not generated with the aim of assessing the economic impact of diseases, they are effective tools for these purposes, as they can provide correct information directly from target populations (not just from samples) observed for related follow-up in a short time period [16]. The research supports the investigation scope of health expenditure determinants from two aspects. First, it represents a surplus compared to the other studies already present in the literature, because this research takes into account determinants of health expenditure not considered in the literature, by inserting them into complex and flexible statistical models, such as the GLMM. They allow for a data structure-oriented model among local geographic territories rather than a standard and inflexible model for investigation, providing information to managers for health planning. Secondly, since most literature has focused on determinants of health expenditures involving national empirical data [27], this study undertook for the first time a comprehensive analysis with real Apulian data using a COI technique.
Specifically, the study focused on three types of CVD (AMI, HF, and AF) that are widespread in the Apulia region. These were selected for different reasons. The AMI pathology in Apulia ranks first for low mortality. This information is confirmed by the data presented here, as the lowest number of patients with AMI disease was identified. This excellent performance, given by the low mortality indicator, is closely related to the efficiency of the service provided by the emergency medicine-urgency departments in the Apulia region. It, therefore, expresses the presence of a regional model of clinical efficacy and management efficiency.
As for HF, the increase in hospitalizations for heart failure has been confirmed by other studies [40]. Data shows that the most worrying problem for health management is the economic and organizational impact on the national health system that this heart disease causes. For this reason, in order to reduce the high rate of early hospitalization and its economic burden on the regional health service, the organizational forecasting of management paths for HF patients is a necessary activity.
The third heart disease examined in this study, AF, is a disease destined to increase dramatically over time, reflecting Italy's demographic trend. From the EUROSTAT data, it can be deduced that, in the elderly population of the 28 European Union countries, the prevalent cases in 2016 were 7.6 million, and they are projected to nearly double, reaching 14.4 million in 2060 [15]. These analyses were confirmed in the present study, since the majority of the patients analyzed were affected by AF (39.5% of the sample).
GLM and GLMM models were applied to achieve the main objective of the study, which was to evaluate the determinants that influence the costs. The main findings are provided as follows.
Both statistical models agreed on which determinants caused an increase or a decrease in the hdf_value. The models confirmed that hdf_value increased in 2016 and that it increases as the hospital stay lengthens. In addition, both models confirmed some determinants that led to a decrease in the hdf_value: female patients and hospitalization in day-hospital generated cost savings. As for the analysis by disease, HF and AMI caused greater expense than AF, which is the cheapest. This result suggests that much should be done, especially in the prevention of AMI and HF diseases. Prevention campaigns that suggest healthier lifestyles and more accurate doctor-patient relationships in the pre-hospital setting could guarantee a lower economic burden on public spending centers. Furthermore, the models confirmed that the surgery variable has the greatest impact on the increase of the hdf_value (GLM: 0.3370466, GLMM: 0.2874097, p < 0.001), whereas day-hospital care allows the greatest savings (GLM: −0.5666091, GLMM: −0.6481928, p < 0.001). Day hospitalization should also be encouraged compared to ordinary hospitalization when surgery is not necessary. Indeed, increasing the number of day hospitals for each patient, at the expense of longer hospitalizations, would allow better prevention without leading to more serious illness situations.
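Because both models use a log link, a coefficient β can be read as a multiplicative effect exp(β) on the expected hdf_value, holding the other covariates fixed (and, for the GLMM, conditional on the patient effect). The sketch below applies this reading to the coefficients quoted above; it assumes that the reference categories are "no surgery" and "ordinary hospitalization", which the text implies but does not state explicitly, and the dictionary keys are hypothetical labels.

```python
import math

# Coefficients as quoted in the text (log-link scale).
REPORTED = {
    "surgery": {"GLM": 0.3370466, "GLMM": 0.2874097},
    "day_hospital": {"GLM": -0.5666091, "GLMM": -0.6481928},
}


def percent_change(beta: float) -> float:
    """Percent change in the expected outcome implied by a log-link coefficient."""
    return (math.exp(beta) - 1.0) * 100.0


for variable, by_model in REPORTED.items():
    for model, beta in by_model.items():
        print(f"{variable:12s} {model:4s} {percent_change(beta):+.1f}%")
```

Under this reading, surgery is associated with an expected hdf_value roughly 40% (GLM) and 33% (GLMM) higher, and day-hospital care with one roughly 43% (GLM) and 48% (GLMM) lower than the respective reference category.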
The comparison of the two models highlighted that the best model is the GLMM, as the Akaike and Bayesian information criteria have the lowest values (Table 4). Since the GLMM is the only model that considers the random effect, and this effect is the patient, the economic burden sustained by the region is linked to the individual patient's access to the regional health service. This is because each patient's characteristics differ from those of the others. Moreover, by investigating the results of descriptive statistics and statistical models, it can be argued that female patients generate cost savings. Although Mosca et al. [41] state that women experience greater morbidity and mortality than men after being diagnosed with cardiovascular disease, they are able to generate savings for the regional health service. In other words, decades of research show that early identification in women, and particular attention to risk factors, such as high blood pressure, high cholesterol levels, obesity, family history of cardiovascular disease, and diabetes, can substantially reduce disease economic burden and deaths. Lifestyle changes, smoking cessation, and the right use of drugs are proven mainstays of women's habits that allow greater cost-effectiveness for the regional health service [41,42]. In addition, this result confirms the validity of the recent approach to gender medicine, since significant gender disparities are prevalent in presentation, management, and outcomes of adults with cardiovascular diseases [43,44].
The present study has some potential limitations. First, to our knowledge, no other study in the literature has used these models in the same way, so the results cannot be readily compared with those obtained in other studies. Second, the level of reimbursement given to health organizations is established first by the national health system, and, secondly, by the loans that the national health system has provided to the individual regions. This practice can be a limitation as it is another way in which overall costs are assessed. Third, some healthcare costs were probably underestimated, due to the absence of the following information in the administrative databases: costs related to cardiovascular risk factors (hypertension, diabetes, smoking, drug abuse, atrial fibrillation, family history, and previous personal history of CVD); costs related to outpatient consultations, laboratory tests, and imaging and diagnostic tests; costs related to prescribed medications; costs related to the use of intensive care units; costs related to the performance of physiotherapy sessions; and post-discharge costs. Finally, the costs of visits to general practitioners (GPs) were not considered. Almost all residents in Italy are enrolled with a GP, from whom they receive primary care in case of need. GPs are paid on a capitation basis, regardless of the actual care provided to their patients; therefore, it is not possible to quantify the costs attributable to visits for cardiovascular reasons.
As future research, it could be useful to integrate administrative databases with new ones in which other types of data, i.e., costs related to inpatient and outpatient consultations, medications, and care treatments, are collected and registered, in order to generate more complete data sources for assessment of the economic burden of diseases.
In conclusion, the findings of this study are of interest to the public healthcare sector, as economic evaluation is one of the most important components of studies focused on public health management. Public organizations of the health system fulfill a crucial role for the provision of public services to citizens [45]. The "public service" concept includes all goods and services that the public administration recognizes as being of social utility and therefore ensures their production, distribution, and supply to guarantee free use to the community [46]. Although public services are very different from each other in technical characteristics and delivery methods, they together respond to a need considered public [47]. Ensuring the well-being of citizens, such as supporting campaigns for cardiovascular risk prevention, is one of the fundamental objectives of the public administration [48]. These organizations, by supporting important prevention campaigns, play a fundamental role in guaranteeing huge savings for the public health system [49]. The statistical models analyzed represent a valid decision-making tool both to assess the value-related determinants influencing diseases and to improve the economic management of the health system as a whole.

Funding:
Published with a contribution from 5 × 1000 IRPEF funds in favor of the University of Foggia, in memory of Gianluca Montel.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki. The data was used in aggregate form. No data attributable to a subject was used.

Conflicts of Interest:
The authors declare no conflict of interest.