Multilevel Zero-One Inflated Beta Regression Model for the Analysis of the Relationship between Exogenous Health Variables and Technical Efficiency in the Spanish National Health System Hospitals

Background: This article proposes a methodological innovation in health economics for the second stage analysis of technical efficiency in hospitals. It investigates the relationship between the installed capacity in regions and hospitals and their ownership structure. Methods: A multilevel zero-one inflated beta regression model is employed to model pure technical efficiency more adequately than other models frequently used in econometrics. Results: Compared to publicly managed hospitals, the mean efficiency index of hospitals with public-private partnership (PPP) formulas was 4.27-fold. This figure was 1.90-fold for private hospitals. Concerning the efficiency frontier, the odds ratio (OR) of PPP models vs. public hospitals was 42.06. The OR of private hospitals vs. public hospitals was 8.17. A one standard deviation increase in the percentage of beds in intensive care units increases the odds of being situated on the efficiency frontier by 50%. Conclusions: The proportion of hospital beds in intensive care units relates to a higher chance of being on the efficiency frontier. Hospital ownership structure is related to the mean efficiency index of Spanish National Health Service hospitals, as well as the odds of being situated on the efficiency frontier.


Introduction
Fluctuations in the demand for healthcare have an impact on the efficiency of hospitals, which has created uncertainty about the optimal use of installed capacity [1][2][3]. This uncertainty is most intense in hospital emergency departments [4][5][6] and emergency healthcare services [6,7]. Both are services on the front line of hospital care, which influences the composition of hospital caseloads [8], and requires a provision of resources which is highly dependent on uncertain fluctuations in demand. This may explain, in part, the observed differences in hospital efficiency [1,9]. Primary care services also play an important role as gatekeepers to the health care system. They indirectly influence the complexity of hospital caseloads and their expenditure levels by avoiding unnecessary and costly hospital diagnostic and therapeutic procedures [10,11].
Traditionally, the literature on hospital efficiency differentiates between the characteristics of hospitals (such as technological provision, teaching nature of the center, rural or urban location, type of ownership) [12][13][14], and those of the environment in which they operate (region, gross domestic product per capita, population density) [13][14][15][16]. However, there is no previous research which analyzes the relationship between hospital efficiency and the characteristics of the emergency department and primary care structure as a whole. Only one study in Germany has evaluated the relationship between emergency departments and the technical efficiency of hospitals [16]. The study used a two-stage double bootstrap data envelopment analysis approach with truncated regression and found a negative relationship between the dispersion of emergency departments, their caseload, and hospital efficiency.
The relationship between hospital efficiency and ownership structure has been studied more recently [17,18]. Studies in European countries have evaluated the technical efficiency of public and private hospitals with data envelopment analysis (DEA) [19][20][21][22][23][24], which has led to ambiguous and contradictory results. The regulatory and management framework to which they are subject presents as more relevant than ownership be it public or private [14,25]. The efficiency of the various public-private partnership (PPP) formulas, which have experienced significant growth in recent years, has also been investigated [26][27][28][29]. PPP is defined as a long-term contract between a private actor and a government agency to provide a public asset or service, in which the private actor assumes the risks and management responsibility [30,31]. The available evidence on the efficiency of PPP models compared to other hospital management structures is limited and contradictory [26,[32][33][34][35][36][37][38][39][40][41].
In the literature on hospital efficiency, the predominant approach is a combination of non-parametric frontier models, particularly DEA, with different types of multivariate regressions (Tobit regression, ordinary least squares regression, logistic regression, and truncated regression) employed as second-stage analyses which relate the level of technical efficiency obtained to different hospital characteristics [42,43].
Recently, the use of multilevel or hierarchical models, both linear and logistic, has made it possible to incorporate variables from hospital contexts, modelling intra-and interlevel variability. These models allow for the estimation of the effect of explanatory variables (both hospital and contextual), as well as studies of the variability among hospitals and among the contexts in which they are located [13][14][15][44][45][46][47]. However, a common problem is inappropriateness of efficiency scores in a normal distribution function, which can lead to goodness-of-fit problems in the linear model. Pure technical efficiency (PTE) is a continuous variable, with values restricted to the interval (0, 1). Furthermore, the units of analysis that reach the efficiency frontier show accumulation points at value 1 ( Figure 1).
These characteristics of PTE mean that its density function, f (PTE), can be represented by a one-inflated Beta(α 1 , α 2 ) distribution, α 1 > 0 y α 2 > 0. This density function is integrated in the zero-one inflated family distributions, with the expression: where ω 01 is the probability of zero-one inflation (probability that the efficiency variable takes the value 0 or the value 1), ω 1 is the conditional probability of inflation one (probability that the efficiency variable takes the value 1 instead of 0) and f Beta (PTE) is the density function of the Beta(α 1 , α 2 ), distribution, with mean u = α 1 α 1 +α 2 [48]. This probability distribution is the basis for the zero-one inflated beta regression model, which allows variables of this type to be modelled [49]. The model assumes that the dependent variable has a mixed distribution, consisting of two distributions: a continuous one, described by the beta distribution, and a binary one, with probability mass at zero or one. The density function of the beta distribution can take multiple forms, depending on the two parameters which define it. This flexible quality allows for an optimal data fit. The parameters of the mixture are modelled in terms of the regression coefficients, which can be estimated using maximum likelihood methods or Bayesian inference. Bayesian inference is usually preferred for this type of model due to the complexity of the estimates and the convergence difficulties of the maximum likelihood method [50].
The zero-one inflated beta regression model can be extended to a multilevel model, where the information is clustered hierarchically at several levels. This is a recent line of research in the field of mathematical statistics, which is currently under development. In this paper we have carried out such a multilevel extension, proposing, for the first time, a multilevel zero-one inflated beta regression model for PTE second stage analysis. The use of this model has permitted analysis of two strategic issues in health policy design. Firstly, the investigation of the relationship of hospital efficiency to installed capacity at both the hospital and regional levels to cope with potential sudden increases in demand due to exogenous global health shocks. Secondly, to study how the ownership structure of the centers affects both the level of technical efficiency of the hospital and the probability of being located on the efficiency frontier.

Decision Making Units (DMUs)
This empirical analysis was performed on a set of general hospitals of public, private, or PPP ownership which make up the Spanish National Health System (SNHS). The exclusion criteria were: having less than 50 beds, not having registered activity in the emergency department and not having information presented on all inputs and outputs considered. In addition, hospitals located in autonomous cities (Ceuta and Melilla) were excluded due to the particularities of these territories. Under these criteria, the group of hospitals analyzed consisted of 173 centers.

Variables
The inputs used for DEA are purchases and external services acquired, installed beds, healthcare personnel and non-health personnel. To avoid larger hospitals being penalized, the purchases and external services variable was standardized and refers to expenditure per bed [51]. The outputs used are total adjusted case-by-case discharges and outpatient activity ( Table 1). a Annual average of the installed provision, regardless of whether or not they have actually been in operation throughout the year. b Includes number of full-time professionals and number of part-time professionals (calculating part-time work at 50%). c In Euros. To avoid larger hospitals being penalized, the purchases and external services variable was standardized and refers to expenditure per bed. d Hospital discharges adjusted by applying the average Spanish weight (also known as case-mix index or casuistry index). The average weight is the weighted average of the diagnosis-related group weights of all patients in a given unit, group or a provider. e Integrates outpatient consultations attended in the hospital itself and in its peripheral specialty centers, patients treated in hospital emergency departments who do not require admission or who have voluntarily discharged themselves, transferred to another health center, or died and subsidiary surgery processes performed under general anesthesia, local anesthesia, or sedation which do not require hospital admission. Source: Prepared by the authors, based on the Statistics on Specialised Healthcare Centres (SIAE) and the Basic Minimum Hospitalisation Data Collective (CMBD-H) from the Spanish Ministry of Health, Consumption, and Social Welfare.
As independent variables potentially related to PTE, this study includes a set of exogenous hospital and regional variables ( Table 2). In each hospital, the training of specialists and the availability of resources in hospital emergency departments and intensive care units (emergency physicians, beds, and their occupancy) were considered. The ownership of the health centers was also considered, differentiating between public, private, and PPP formulas. Each Spanish region includes two economic variables (public healthcare expenditure and average annual income per household), and a set of variables that characterize the provision of hospital resources (private bed, public general bed, intensive care unit public beds, and emergency physician population rates), primary care (outpatient emergency center, doctor and nurse population rates) and healthcare emergencies (emergency professionals per 1000 inhabitants) available to confront a potential pandemic or health crisis in the future (Appendix A).

Sources of Information
Data on hospitals were obtained from the Specialised Care Information System (SIAE) and the Specialised Care Activity Register (RAE-CMBD), both produced by the Spanish Ministry of Health, Consumption, and Social Welfare as part of the Government of Spain. The rest of the variables included in the analysis were obtained from the system of key indicators of the SNHS provided by the aforementioned Ministry, as well as from the statistics published by the Spanish Institute for National Statistics.

Data Analysis
The data analysis consisted of two phases: analysis of each hospital PTE by means of DEA, and a second-stage analysis to identify exogenous variables related to technical efficiency using a multilevel zero-one inflated beta regression model.

Data Envelopment Analysis
Frontier models base their methodological strategy on the explicit construction of an efficiency frontier. This efficiency frontier is determined by the DMUs whose management is considered unbeatable within the set of all those evaluated from the point of view of relative efficiency. Initially proposed by Charnes et al., mathematical programming was used to construct the technological frontier which represents best practices, and to calculate an efficiency index for each DMU. This efficiency index was obtained by comparing the position of each with respect to the frontier [52,53].
In line with this methodology, this study employs a BCC model, which considers variable returns to scale, to determine the value of the PTE [54], and an output maximization orientation. The latter implies that it would not be possible for efficient DMUs to increase their outputs given the inputs used. The analytical formulation of the output-oriented BCC model is: Max φ This optimization program consists of a vector of n DMUs made up of m inputs and s outputs, such that: x ij = quantity of input i consumed by DMU j (there being n DMUs). y rj = quantity of output r produced by DMU j. φ = proportion by which the outputs can be increased. λ j = intensity of DMU j in the construction of the reference DMU.
The value of φ cannot be less than one and provides information on the technical efficiency of each DMU. Specifically, φ − 1 represents the proportional increase in outputs which could be obtained by keeping inputs constant. Furthermore, 1/φ represents the relative measurement of pure technical efficiency and has a range between 0 and 1. The DMU is efficient and is located on the production frontier when the resulting value is 1. On the other hand, values lower than 1 indicate that DMUs are inefficient, meaning that, through better input management, output could be increased without changing the amount of inputs used [55].

Second-Stage Analysis
A multilevel zero-one inflated beta regression model was used to study the variables related to PTE. This modelling considers the hierarchical structure of the information, where the hospitals (level 1) are grouped by autonomous communities (level 2). Thus, for each hospital i = 1, ..., 173 in the autonomous community j = 1, ..., 17, the parameters of the function f PTE ij , described in (1), was modelled as follows: logit ω o1ij = log ω 01ij is the mean of the Beta α 1ij , α 2ij distribution for hospital i = 1, . . . , 173 in the region j = 1, . . . , 17. Furthermore, the value I ij = u ij 1−u ij can be interpreted as a mean efficiency index (MEI), where a value greater than 1 indicates that the mean efficiency of the hospital, u ij , exceeds the inefficiency, 1 − u ij .
Both the mean of the beta distribution, u ij , and the zero-one inflation probability, ω 01ij , transformed using the logit function, are modelled through a linear combination of the independent variables x (1) rij and x (2) rj representing, respectively, level 1 (r = 1, ..., l (1) ) and level 2 (r = 1, ..., l (2) ) variables. All quantitative independent variables were standardized to obtain dimensionless variables with mean zero and standard deviation one. The intercept term β 0j = β 0 + u 0j is a random effect whose error, represented by j u 0j , follows a normal distribution with mean 0 and variance σ 2 u 0 . The coefficients of the independent variables, represented respectively by β (1) r , β (2) r , and γ (1) r y γ (2) r , are fixed effects. The conditional probability of one-inflation, ω 1ij , and the precision, α 1ij + α 2ij , remain constant.
The estimation of the model parameters was performed by full Bayesian inference, assuming improper flat prior distributions for all fixed effects and normal distribution for the variance of the random error. In Markov Chain Monte Carlo (MCMC) sampling, 1000 warmup iterations and 5000 subsequent updates were used. Convergence of the estimates was ensured using four chains and a diagnostic using the parameterR, the bulk effective sample size (Bulk-ESS) and the tail effective sample size (tail-ESS). Convergence was obtained withR < 1, 01, Bulk-ESS > 400 and tail-ESS > 400 [56]. Gamma and normal a priori distributions were used for sensitivity analysis [57].
In Bayesian statistics, model coefficients are not considered fixed parameters, but random variables which have a distribution function. Bayesian inference estimates the posterior distribution function of the coefficient, taking the a priori function and the information provided by the data into account. The mean of the posterior distribution function is considered as the coefficient estimate. However, all other values of this function could also be possible values of the coefficient.
The analysis of the a posteriori distribution function of each coefficient yielded the 2.5 and 97.5 percentiles, which formed the credibility interval with 0.95 probability. In addition, the positive coefficient probability (PCP) and the negative coefficient probability (NCP) were used to test the hypothesis of a positive or negative relationship of each independent variable with the dependent variable.
Once the model is estimated, exp β (k) r is the ratio of mean efficiency indices (RMEI) among hospitals which are not on the efficiency frontier and exp γ (k) r is the odds ratio (OR) of being situated on the efficiency frontier, k = 1, 2.

Results
18.5% of the hospitals analyzed (32) were efficient, and are situated on the efficiency frontier. The mean of PTE was 0.78 (78%), i.e., for the individual levels of inputs used, the average increase in output which could be achieved with improved management was 0.22 points (22%).
Within the second-stage analysis, the multilevel zero-one inflated beta regression model showed values ofR equal to 1 for all coefficients. Furthermore, the effective sample sizes were greater than 400, indicating a good performance of the estimation method in achieving adequate convergence.
The form of hospital management and the number of resident doctors showed coefficients with credible intervals which did not contain the value 0, or had a very high positive coefficient probability in the posterior distribution function (Table 3) (Appendix B). Compared to publicly managed hospitals, the mean efficiency index of the hospitals subject to PPP formulas was 4.27-fold. This value was 1.90-fold in the case of private hospitals. A one standard deviation increase in the number of resident medical professionals per 100 faculty members increased the average efficiency index by 25%. The rest of the hospital and regional variables considered showed coefficients with credible intervals containing the value 0, or a smaller positive coefficient probability in the posterior distribution function (Table 3). For example, variables related to the installed capacity in hospitals (such as emergency physicians and operating beds in intensive care units) or in the region (such as the number of primary care physicians and nurses, emergency physicians or public beds in intensive care units per thousand inhabitants), which are necessary to cope with demand shocks, showed coefficients with credible intervals containing the value 0 or a smaller positive coefficient probability.  a Quantitative variables are standardized to obtain dimensionless variables with mean 0 and standard deviation 1. b 2.5th percentile and 97.5th percentile of the posterior distribution function of the coefficient. c Probability of positive coefficient and probability of negative coefficient in the a posteriori distribution function. d Ratio of mean efficiency indices: change in the MEI for one standard deviation increase in the quantitative independent variable, or change in the MEI for a category of the qualitative independent variable with respect to the reference category. Source: Prepared by the authors.
When modelling the probability of being on the efficiency frontier, the form of hospital management, the proportion of beds in intensive care units, and the average annual income per household showed coefficients with credible intervals not containing the value 0 or a very high positive coefficient probability in the a posteriori distribution function (Table 3) (Appendix B). On the other hand, the provision of private beds per thousand inhabitants showed a very high negative coefficient probability. The OR of PPP management formulas vs. public hospitals was 42.06. The OR of private hospitals vs. public hospitals was 8.17.
A one standard deviation increase in the percentage of beds in Intensive Care Units increased the odds of being on the efficiency frontier by 50%. The odds increased by 6.43 for every one standard deviation increase in the average annual income per household. A one standard deviation increase in private beds per thousand population in the region decreased the odds of being on the efficiency frontier by 63%. The rest of the variables showed coefficients with credible intervals containing the value 0, smaller positive coefficient probability, or small negative coefficient probability in the posterior distribution function as, for example, in this situation, we found in the percentage of occupancy in intensive care units, public beds in the region in intensive care units, or out-of-hospital emergency centers per thousand inhabitants.

Discussion
This paper is innovative in the econometric field by applying a multilevel zero-one inflated beta regression model to study the hospital and regional variables which are related to both PTE and the probability of being situated on the efficiency frontier. The multilevel zero-one inflated beta regression model fits the characteristic density function of PTE provided by the DEA, and is appropriate for analyzing hierarchically structured information. These two features make the model preferable to other alternatives used hitherto [13][14][15]45,46]. Additionally, multilevel zero-one inflated beta regression models simultaneously predict both the value of PTE and the probability of being on the efficiency frontier as a function of hospital and contextual characteristics. This provides public decision-makers with a higher level of information. Finally, the description of the a posteriori density function using percentiles and probabilities of occurrence in addition to the mean, helped to determine the relationship between each independent and dependent variable. This new approach is an important step in the methodological improvement of efficiency studies in health economics.
The second objective of this article was to analyze the relationship between the installed capacity at both a hospital and regional level and the technical efficiency of SNHS hospitals in order to assess their ability to cope with demand shocks and to evaluate whether hospital efficiency is affected by the hospital provisions in primary care and emergency healthcare resources, which are critical for dealing with these situations.
The results suggest that increasing the number of public beds in hospital intensive care units increases the probability of a hospital being situated on the efficiency frontier. Only one study in Germany assessed the relationship between emergency departments and the technical efficiency of hospitals [16], finding a negative relationship between the dispersion of emergency departments, their caseloads, and hospital efficiency. However, the studies are not comparable due to the different emergency variables used, as well as the different methods and objectives. Contrary to the German study, our results focus on the efficiency frontier, not on the relationship between technical efficiency and the characteristics of hospital emergency departments. The result obtained establishes a link between a higher number of beds in intensive care units and greater odds of being efficient. Further studies should establish whether the association found is robust and analyze possible explanatory hypotheses. The other variables analyzed, related to the ability of the installed capacity to cope with sudden global increases in demand, do not seem to be associated to the level of efficiency of hospitals.
Private ownership and PPP models performed better than public hospitals, both because they achieved higher levels of efficiency and because of the greater odds of being placed on the efficiency frontier. These results are in line with previous research in Spain [14,15,45,46]. Regional health services with centralized governance structures, no management autonomy in public hospitals, and a rigid civil servant labor regime, may partially explain these results. At the regional level, only two variables have shown a relevant probabilistic relationship to the possibility of being on the efficiency frontier: the number of private beds per thousand inhabitants and the average annual income per household.
The number of private beds per thousand inhabitants decreased by 63% the odds of being positioned on the efficiency frontier. Previous work did not find this relationship significant [47], although the divergence of results is probably due to the different estimation method used. In this sense, the results of this study are more robust when using an econometric specification in line with the DEA technical efficiency density function. The relationship found suggests that a higher private hospital supply at the regional level decreases the odds of SNHS hospitals being on the efficiency frontier, irrespective of ownership structure, i.e., a higher provision of private beds per thousand inhabitants in a region impedes the possibility of SNHS hospitals being on the efficiency frontier in that region.
The regional average annual income per household increased the odds of reaching the efficiency frontier 6.43-fold for every one standard deviation increase in the average annual income per household. This result is in line with previous results [47], and confirms the importance of a region's income level in improving the possibility of its hospitals' reaching the efficiency frontier.
There are several limitations to the analysis. The cross-sectional nature of the data limits the robustness of the results, and further studies with panel data are needed to extend this analysis. Secondly, it is possible that there are problems in the specification of the exogenous factors included in the model or that explanatory variables which could modify the impact of the variables analyzed on technical efficiency have been omitted.
As lines of future research, multilevel zero-one inflated beta regression models may be suitable for re-evaluating the second-stage analysis of the efficiency of healthcare organizations for a broad set of hospital and contextual characteristics beyond the two examples developed in this article. The effect of pandemics such as COVID-19, on hospital PTE may also be suitable for modelling where data are available. Pandemics are global demand shocks which test installed capacity at a territorial level, including primary care, emergency departments, and intensive care units, and their ability to cope with them. The relationship between these dimensions and hospital PTE in the context of a pandemic may be relevant for policy makers.

Conclusions
The statistical model proposed in this paper represents a methodological innovation for hospital efficiency studies in the field of health economics. It integrates the nonparametric frontier approach provided by DEA with a second stage analysis using a multilevel zero-one inflated beta regression model. This allows for the analysis of which variables, at both hospital and contextual level, are related to the degree of hospital efficiency, and the probability of being on the technical efficiency frontier. The method improves PTE modelling when compared to its alternatives.
The proportion of hospital beds in intensive care units is related to a higher probability of being on the efficiency frontier. Other variables related to hospital and regional installed capacity and the ability to cope with demand shocks, such as those generated by a pandemic, were not relevant from a probabilistic point of view.
Hospitals with private or PPP management formulas integrated into the SNHS have higher technical efficiency values, and higher odds of being on the technical efficiency frontier than traditional publicly owned hospitals.

Informed Consent Statement: Not applicable.
Data Availability Statement: The datasets analysed during the current study are not publicly available and were provided to the authors by the Sub-directorate General of Health Information of the Ministry of Health, Consumption and Social Welfare of the Government of Spain, but are available from the corresponding author on reasonable request.

Acknowledgments:
To the Sub-directorate General of Health Information (Directorate General of Public Health. Quality and Innovation) of the Ministry of Health. Consumption and Social Welfare as part of the Government of Spain for its collaboration and the facilities provided to conduct the study.

Appendix B
The posterior distribution function of the coefficient of each standardized variable is shown below. Variable names starting with "Efficiency" correspond to the distribution of the coefficients in the modelling of the Mean Efficiency Index (MEI). Variable names starting with "Efficiency frontier" correspond to the distribution of the coefficients in the modelling of the odds of being placed on the efficiency frontier. Both PCP (Positive Coefficient Probability) and NPC (Negative Coefficient Probability) are obtained from these distribution functions.   Figure A1. Posterior distribution function of the coefficient of each standardized variable. Variable names starting with "Efficiency" correspond to the distribution of the coefficients in the modelling of the Mean Efficiency Index (MEI). Variable names start-ing with "Efficiency frontier" correspond to the distribution of the coefficients in the mod-elling of the odds of being placed on the efficiency frontier.