Hierarchical Bayesian Models to Estimate the Number of Losses of Separation between Aircraft in Flight

Air transport is considered to be the safest mode of mass transportation. Air traffic management (ATM) systems constitute one of the fundamental pillars that contribute to these high levels of safety. In this paper we wish to answer two questions: (i) What is the underlying safety level of ATM systems in Europe? and (ii) What is the dispersion, that is, how far does each ATM service provider deviate from this underlying safety level? To do this, we develop four hierarchical Bayesian inference models that allow us to infer and predict the common rate of occurrence of SMIs, as well as the specific rates of occurrence for each air navigation service provider (ANSP). This study shows the usefulness of hierarchical structures when it comes to obtaining parameters that enable risk to be quantified effectively. The models developed have been found to be useful in explaining and predicting the safety performance of 29 European ATM systems with common regulations and work procedures, but with different circumstances and numbers of aircraft, each managing traffic of differing complexity.


Introduction
Air transport is considered to be the safest mode of mass transportation [1]. ATM systems constitute one of the fundamental pillars that contribute to these high levels of safety. The essential function of ATM systems is to ensure that aircraft flying in the same airspace are kept separate from each other and from the ground.
The International Civil Aviation Organization (ICAO) defines ATM as "the dynamic, integrated management of air traffic and airspace including air traffic services, airspace management, and air traffic flow management-safely, economically, and efficientlythrough the provision of facilities and seamless services in collaboration with all parties and involving airborne and ground-based functions". Each country is responsible for ensuring that its airspace has an adequate ATM system.
The main objective of an ATM system is, therefore, to manage and reduce the risk of accidents. As such, the minimum separation distances between aircraft are defined as the minimum separation between two aircraft in the airspace to ensure that they do not collide with one other. The ATM is responsible for ensuring that these separation minima between aircraft are not violated and, in this way, managing the risk of collision between aircraft.
The safety of ATM systems has steadily improved over the last decades thanks to a number of measures including better equipment, technologies, and work procedures, as well as the deployment of additional safety barriers. However, the sustained growth of air transport imposes increasing demands and requirements on the safety, capacity, and efficiency of the ATM system. ATM systems are currently facing their greatest challenge to date, namely the evolution towards multi-faceted, hyper-dimensional, highly distributed, and mutually dependent systems, with levels of complexity that were unimaginable just a few decades ago [2]. The Single European Sky ATM Research (SESAR) project proposes a new paradigm for the Data on the number of losses of separation between aircraft is scarce and incomplete, partly due to their low occurrence and partly due to the sensitive nature of this information. Furthermore, the statistical models and techniques applied in the sector do not exploit the potential of the most advanced inference methods. It is of utmost importance that the most advanced techniques and methods are widely used to infer the rates of occurrence of safety events and to predict adverse safety effects.
In this paper, we wish to answer two questions: (i) What is the underlying safety level in Europe? and (ii) What is the dispersion, that is, how far does each ATM service provider deviate from this underlying safety level? To do this, we will use hierarchical Bayesian inference models that allow us to infer and predict the common rate of occurrence of SMIs, as well as the specific rates of occurrence for each ANSP.
Studies and models regarding SMIs have focused, over the past few years, on technical aspects and human error as main causes of unsafe situations [10,11]. Some authors have partially analyzed some design aspects such as traffic mix [5,12,13], relative geometry between aircraft [14], complexity of airspace sectors [4], air traffic controller workload, or flight efficiency. However, none of these approaches incorporate predictive capabilities, nor allow benchmarking between ANSP. Additionally, given the high level of safety in the industry, SMIs respond at very low frequencies of occurrence, and it is very difficult have a sufficient number of data for relevant conventional statistical analysis [15][16][17].
Hierarchical and Bayesian models have been used in recent decades to overcome these limitations and to predict safety incidents in other modes of transport, specifically in road transport [18]. However, within the aviation context, there is yet little research in this area [19].
Bayesian models are useful for performing this type of analysis, because they permit inference of the statistical parameters and the distributions being studied in spite of the fact that little data is available [20,21]. They allow a priori knowledge about the phenomenon under study to be incorporated [22][23][24]. They also permit various levels or hierarchies to be considered when explaining the different phenomena. Furthermore, they allow information from different sources to be integrated in an orderly manner [22]. As such, in this article, we propose and evaluate several Bayesian models that consider the hierarchical relationship between the ANSPs and illustrate the underlying mechanism in the generation of SMIs from explanatory variables that represent the main local differences.
These models are intended to explain and predict the safety performance of 29 European organizations, with common regulations and work procedures, but with different circumstances and numbers of aircraft, each managing traffic of differing complexity. Figure 1 outlines the process and highlights the original contributions of the paper. Appl. Sci. 2021, 11, x FOR PEER REVIEW 4 of 25

Hierarchical Bayesian Models
Hierarchical models are mathematical representations that involve multiple parameters in such a way that credible values of some of the parameters depend significantly on the values of other parameters. Hierarchical Bayesian models have been used in different industries to analyze different aspects of safety [18,20,[23][24][25] but have not yet been regularly used in aviation [11,21,22,26,27].
What makes hierarchical modelling so effective is that the estimate of each individual parameter is simultaneously informed by data from all other parameters. All parameters inform the top-level parameters, which in turn restrict all individual parameters. These structural dependencies provide better-informed, that is, more precise estimates of all parameters.
Consider that each ATM system will experience a certain number of SMIs per year. Each ATM will provide service to a certain number of aircraft in its airspace. Parameter defines the rate of occurrence of SMIs for each ATM system.
can be estimated from the number of SMIs that have occurred in the past, the number of flights managed, and their complexity. Similarly, given that all ATM systems operate according to the same rules and procedures, each parameter will depend on a global parameter that defines the rate of occurrence of SMIs of a generic ATM system that applies the rules and procedures deriving from the European regulations. This hierarchy of dependencies between parameters could be extended to even more levels, bearing in mind that, over and above European regulations, ATM systems around the world are governed by ICAO standards.
In hierarchical models, parameters at different levels co-exist in a joint parameter space. The joint prior distribution can be factored or decomposed so that some parameters depend on others. In other words, chains of dependencies can be established between parameters. These factorizations are not unique, and each model can choose the most convenient parameterization at any time.
Given a series of parameters and ω, and a set of data D, we can apply Bayes' theorem as follows: What characterizes a hierarchical model is that the terms on the right side of Equation (1) can be factored into a chain of dependencies, as seen in Equation (2):

Hierarchical Bayesian Models
Hierarchical models are mathematical representations that involve multiple parameters in such a way that credible values of some of the parameters depend significantly on the values of other parameters. Hierarchical Bayesian models have been used in different industries to analyze different aspects of safety [18,20,[23][24][25] but have not yet been regularly used in aviation [11,21,22,26,27].
What makes hierarchical modelling so effective is that the estimate of each individual parameter is simultaneously informed by data from all other parameters. All parameters inform the top-level parameters, which in turn restrict all individual parameters. These structural dependencies provide better-informed, that is, more precise estimates of all parameters.
Consider that each ATM system will experience a certain number of SMIs per year. Each ATM will provide service to a certain number of aircraft in its airspace. Parameter θ s defines the rate of occurrence of SMIs for each ATM system. θ s can be estimated from the number of SMIs that have occurred in the past, the number of flights managed, and their complexity. Similarly, given that all ATM systems operate according to the same rules and procedures, each parameter θ s will depend on a global parameter ω that defines the rate of occurrence of SMIs of a generic ATM system that applies the rules and procedures deriving from the European regulations. This hierarchy of dependencies between parameters could be extended to even more levels, bearing in mind that, over and above European regulations, ATM systems around the world are governed by ICAO standards.
In hierarchical models, parameters at different levels co-exist in a joint parameter space. The joint prior distribution can be factored or decomposed so that some parameters depend on others. In other words, chains of dependencies can be established between parameters. These factorizations are not unique, and each model can choose the most convenient parameterization at any time.

of 24
What characterizes a hierarchical model is that the terms on the right side of Equation (1) can be factored into a chain of dependencies, as seen in Equation (2): According to this refactoring, the data, D, depends only on each parameter θ s . This means that once the value of the parameter θ s has been established, the data becomes independent of all other parameters. It is also clear that the value of each parameter θ s depends on the value of ω, and its value is conditionally independent of all other parameters. In other words, when the value of ω is set, θ s becomes independent of all other parameters. Any model that can be factored or decomposed into a chain of dependencies like that given in Equation (2) is a hierarchical model.
Hierarchical dependencies between parameters enable all available data to be used to jointly inform the estimated values of the parameters. In our case, this means that data from each ATM system can be used to estimate the θ s parameters of the other ATM service providers. In turn, the data from all providers can be used together to estimate the ω parameter, which gives the rate of occurrence of SMIs of a generic ATM system.

Data Used
The data used in this study were obtained from a number of European institutions involved in monitoring the performance of ATM systems, including the Performance Review Body (PRB), the Performance Review Unit (PRU), and EASA.
The PRB is an advisory body of the European Commission that assists the Commission and national authorities in the supervision of the performance of ATM systems. The PRU is part of the EUROCONTROL Single Sky Directorate. It provides information and data analysis on the performance of European ATM. EASA collaborates with the two aforementioned institutions by collecting the necessary safety data in each state.
The study covers a total of 29 European states. The value to be estimated and predicted y i,s is the number of annual SMIs that have taken place in each of the 29 ATM systems included in the study. The variable y is to the number of annual SMIs. Subscript i relates to the year, and subscript s relates to the specific ATM system.
Specialist literature on the topic suggests that the two factors that best explain the occurrence of SMIs are the volume of air traffic and the complexity of this traffic. To verify these dependencies, the following explanatory variables are analyzed in the study. x1 i, s is the volume of air traffic, i.e., number of annual flight hours, defined as the number of total flight hours per year of aircraft operating under Instruments Flight Rules (IFR) in the airspace of each of the countries considered in the study. The subscripts i and s refer to the year and ATM system, respectively.
To assess the complexity of air traffic, Eurocontrol has defined a set of indicators that can be used for ANSP benchmarking [28]. The complexity indicators are based on the "interactions" that arise when there are two aircraft in the same "place" at the same "time". The variable "complexity score" x2 i, s is defined as the total duration in minutes per flight hour of all interactions between controlled aircraft in a given volume of airspace.
The indicator "complexity score" is the product of two components: the adjusted density x3 i, s and the structural index x4 i, s . The variable x3 i, s measures the relative concentration of aircraft in the airspace. The airspace is divided into a discrete grid of cells measuring 20 × 20 nautical miles in the horizontal and 3000 feet high. An interaction is defined as the simultaneous presence of two planes in one of these cells. The variable x4 i, s is the sum of horizontal, vertical, and velocity interactions.
In this study, the values for complexity score, adjusted density, and structural index are aggregated on an annual basis. Therefore, the variables x2, x3, x4 refer to the annual mean value (calculated as the sum of the daily values of a year divided by the total flight hours in the year). The subscript i indicates the year in question, and the subscript s indicates the ATM system analyzed.
Appl. Sci. 2021, 11, 1600 6 of 24 The above data was obtained for 29 countries and their ATM systems over seven consecutive years. The identity of the countries is not given in the study. Each has been assigned a number from 1 to 29. In total, the data sample comprises five variables with 203 records per variable. Figure 2 shows the relationship between variables y, x1, x2, x3, x4 by means of scatterplots and the calculation of the correlations between them, as well as the density functions of each variable. The value of the variable x1 "number of flight hours" is divided by 100,000 so that it is more easily readable in the figure.
, x FOR PEER REVIEW 6 of 25 assigned a number from 1 to 29. In total, the data sample comprises five variables with 203 records per variable. Figure 2 shows the relationship between variables , 1, 2, 3, 4 by means of scatterplots and the calculation of the correlations between them, as well as the density functions of each variable. The value of the variable 1 "number of flight hours" is divided by 100,000 so that it is more easily readable in the figure.

Proposed Hierarchical Models
The models analyzed in this study combine regression analysis with hierarchical structures [29,30].
Let y i,s be the predicted variable and let x1 i,s , x2 i,s , x3 i,s , x4 i,s be the predictors. There is a likelihood function that expresses the probability of the values of the predicted variable as a function of the values of the predictors. Generalized Linear Models (GLM) are used that express the combined influence of the predictors as their weighted sum. Function lin(x i, s ) of the predictors is defined as: where i refers to each measurement and s relates to each ATM system.
Each coefficient βj s represents the expected variation in the predicted variable due to a unit increase in the value of the predictor xj i,s .
Likewise, prior distributions are defined for the coefficients βj s : θ s depends in turn on other hyperparameters that enable the relationship between the different ATM systems and the general safety standard to be expressed via a hierarchical relationship. θ s is defined as a Beta distribution of parameters (a s , b s ).
According to [31], analysis of the first and second moments of a distribution Beta(a, b) can be reparametrised as follows: Parameter ϑ represents the mean or overall influence of the variable xj on the precicted variable, while parameter K is a measure of the dispersion around parameter ϑ. Parameter K is inversely proportional to the variance (Var), as such, it is an appropriate indicator of the dispersión of ϑ characteristic of each ATM system. The higher the value of the paramater K for an ATM system, the lower the variance, and, therefore, the influence of the variable xj on the number of SMIs will be closer to the overall mean value for Europe. Conversely, the lower the value of the parameter K, the greater the dispersión.
Taking this reparameterization into account, the prior distributions of the coefficients βj s are: Once the predictors are combined, they are mapped with the predicted variable. Firstly, an inverse link function f () must be defined according to the following equation: where µ represents the central tendency of the prediction of the values of y i, s . This central tendency µ i, s may correspond to the mean or any applicable measure of central tendency, such as mode, median, etc. The only thing that remains to do is to specify the probability density function, abbreviated as pd f , which enables the measurements y i, s to be generated from the central tendency µ i, s . The literature generally refers to this probability distribution as the noise distribution.
Appl. Sci. 2021, 11, 1600 9 of 24 The shape of the probability density function pd f will depend on the scale of measurement of the predicted variable. In our problem, all the predictive variables correspond to metric data, while the predicted variable corresponds to count data, a particular case of metric data.
If the predicted variable corresponds to count data, the typical pd f distribution is usually a Poisson distribution. The canonical link function for Poisson distribution is a log-link function [32]. The combination of both results in a Poisson regression, as indicated below: Inverse link function : As can be seen, a Poisson regression is a type of generalized linear model (GLM) that enables a non-negative integer response, that is, a natural number, to be modelled against a linear predictor via an exponential link function. The exponential link function allows us to transform the expected values of the response found on the scale of (0, ∞) into the scale of the linear predictor, that is (−∞, ∞).
To mitigate the limitations due to overdispersion in the data, three additional models have been tested.
Hierarchical negative binomial regression model. The negative binomial model is parameterized in terms of the mean (λ i ) and the scale factor (r) [33]. It can be likened to a hierarchical two-stage process in which the response is modelled against a Poisson distribution whose expected recount is in turn modelled by a Gamma distribution with a mean λ i and a constant scale parameter (r).
where the expected value of y i is given by µ i and the variance is i ω . The negative binomial model is appropriate for situations of over-dispersion caused by clustering (due to the influence of other factors that have not been measured). JAGS is Just Another Gibbs Sampler, a program for analysis of Bayesian hierarchical models using MCMC) that uses a parametrization of the negative binomial distribution based on the parameters p and r. Direct estimation of the parameters p and r of the binomial distribution usually implies a bad autocorrelation of the MCMC chains. To avoid this problem, we will use a reparameterization in which we will set priors for the mean λ s and the parameter r, while the variance and the parameter p will be obtained from λ s and the parameter r.
Hierarchical zero-inflated regression model regression. This model is appropriate when the overdispersion is due to a higher-than-expected number of zeros in a Poisson distribution [34]. Zero-inflated models combine a binary logistic regression model with a Poisson regression.
Hierarchical normal quadratic regression with variance proportional to the number of flights. This last model combines the advantages of Poisson distributions with a normal model [35,36]. This allows a variance proportional to the number of SMIs to be combined with the effect of stable confidence intervals, especially if the number of incidents is high, which gives rise to a quadratic regression.
The data points y i, s are assumed to derive from a normal distribution. The model establishes a different variance for each ATM system proportional to the number of flights handled by each organization. A Gaussian noise is added to the regression to account for cases where the variance is constant. The hierarchical structure of parameters is similar to that of the previous models. Table 1 summarizes the mathematical formulation of the models developed.

Evaluation of the Models
All the models in this study have been developed using MCMC [37] simulations based on Gibbs using the JAGS simulation program [38]. The most significant results of each model are summarized below. Figures 6 and 7 give the overall result and precision of Model 1. Figure 6 shows the values predicted by Model 1 (in red) and the real values (in black). It also gives the confidence intervals at 2.5% and 97.5%, respectively (red lines), and the mean value of the prediction (black line). It is clear from Figure 4 that the 95% interval does not contain all of the SMIs and that the prediction is less optimal for high values of flight hours. Figure 7 shows the residuals of the model, the predicted values versus residuals, and the Q-Q plot. The residual plot shows a high density of points close to the origin and a low density of points away from the origin. It is symmetric about the origin, and there are not any patterns in the value of the residuals as we move along the x-axis. A small number of cases have notable errors. There are nine measurements with errors greater than 50. These correspond to ANSPs 9,11,22,28,8,20, and 12. Figure 8 comprises caterpillar plots giving the distributions of each hyperparameter K js of the model (K, K1, K2, K3, and K4). Each distribution is indicated by a horizontal line representing the 95% interval and a dot showing the mean value. Each provider is non-dimensionalized by assigning a number to it. The vertical red line represents the global mean of the means of the posterior distributions. Each K js parameter is a measure of the dispersion of the regression model coefficients for each service provider. The higher the value of K js , the lower the variance and, therefore, the nearer the value of the parameter for an ANSP to the mean value of the ϑ j parameters at a European level. Conversely, the smaller the value of K js , the greater the dispersion. It should be borne in mind that all values of ϑ j (ϑ, ϑ 1 , ϑ 2 , ϑ 3 and ϑ 4 ) are common for all ANSPs. Figures 6 and 7 give the overall result and precision of Model 1. Figure 6 shows the values predicted by Model 1 (in red) and the real values (in black). It also gives the confidence intervals at 2.5% and 97.5%, respectively (red lines), and the mean value of the prediction (black line). It is clear from Figure 4 that the 95% interval does not contain all of the SMIs and that the prediction is less optimal for high values of flight hours.    The numbers show that the mean values of are generally high. This is in agreement with the hypothesis that the coefficients of the regression model are of the same order for the ANSPs and similar to the corresponding value. However, in each case, there is a set of service providers whose behavior deviates significantly from the mean. Extreme values of , that is, values that are unusually high or low compared to the mean values, can be helpful in identifying ANSPs with atypical rates of incidents.

Model 1
For different ANSPs, the parameters that show the most variation are K1, K2, and K3. In other words, the variability in response between ANSPs is greater for the variables x1 "number of flight hours", x2 "Complexity Score", and x3 "Adjusted Density", in that order.
Finally, the values of ( , 1 , 2 , 3 , and 4 ) for this model are summarized in Table 2.  The numbers show that the mean values of K js are generally high. This is in agreement with the hypothesis that the coefficients of the regression model are of the same order for the ANSPs and similar to the corresponding ϑ j value. However, in each case, there is a set of service providers whose behavior deviates significantly from the mean. Extreme values of K js , that is, values that are unusually high or low compared to the mean values, can be helpful in identifying ANSPs with atypical rates of incidents.
For different ANSPs, the parameters that show the most variation are K1, K2, and K3. In other words, the variability in response between ANSPs is greater for the variables x1 "number of flight hours", x2 "Complexity Score", and x3 "Adjusted Density", in that order.

Model 2
Figures 9 and 10 give the overall results and the precision of Model 2. In this model, the 95% interval contains all of the SMIs, but at the expense of very wide confidence intervals, which grow exponentially with increasing number of flight hours. It is also at the expense of poor precision for high values of number of flight hours, as can be seen in the right-hand margin of Figure 9.
The precision and fit of this model are worse than the previous one. Some values shown in Figure 8 have very high residuals (greater than 600). The plot of residuals versus predicted values shows an increase in error and spread as the predicted values increase. This same effect can be seen at the edges of the Q-Q plot. These effects are especially due to the inability of the model to reproduce the seven data points at the far right of the diagram. This data corresponds entirely to a single ANSP, number 9. For the remaining data and ANSPs the model fits well. Furthermore, there are eight measurements with errors greater than 50, which correspond to ANSPs 9, 22, 28, 20, and 12. Figure 11 comprises caterpillar plots giving the distributions of each hyperparameter K js of the model. The numbers show that the mean values of K js are generally high. The graphs show that in the case of parameters K and K4 there is no dispersion, and that for parameters K1, K2, and K3, dispersion is limited to a few service providers, notably ANSPs 13,8,22,19,6,10,and 29. This coincides with the analysis of Figure 10.

Model 2
Figures 9 and 10 give the overall results and the precision of Model 2. In this model, the 95% interval contains all of the SMIs, but at the expense of very wide confidence intervals, which grow exponentially with increasing number of flight hours. It is also at the expense of poor precision for high values of number of flight hours, as can be seen in the right-hand margin of Figure 9.  The precision and fit of this model are worse than the previous one. Some values shown in Figure 8 have very high residuals (greater than 600). The plot of residuals versus predicted values shows an increase in error and spread as the predicted values increase. This same effect can be seen at the edges of the Q-Q plot. These effects are especially due to the inability of the model to reproduce the seven data points at the far right of the dia-  Finally, the values of ( , 1 , 2 , 3 , and 4 ) for this model are summarized in Table 2. Figures 12 and 13 show the overall result and precision of Model 3. Model 3 behaves in a similar way to Model 1. It must be remembered that the zero-inflated model has been designed as a variation of the Poisson model, with a component that introduces a binomial model to account for an excess of zeros. However, the estimate of parameter (see Table  3) with a mean value of 980 and a similar standard deviation, namely 997, indicates that the binomial model does not give a good fit.  Finally, the values of ϑ j (ϑ, ϑ 1 , ϑ 2 , ϑ 3 , and ϑ 4 ) for this model are summarized in Table 2. Figures 12 and 13 show the overall result and precision of Model 3. Model 3 behaves in a similar way to Model 1. It must be remembered that the zero-inflated model has been designed as a variation of the Poisson model, with a component that introduces a binomial model to account for an excess of zeros. However, the estimate of parameter γ 0 (see Table 3) with a mean value of 980 and a similar standard deviation, namely 997, indicates that the binomial model does not give a good fit.  This model contributes very little in addition to Model 1. This is because the difficulty in achieving a good fit is not caused by an excess of zeros but rather due to difficulties in reproducing the high-number SMIs. Table 3 shows that the values of the hyperparameters ( , 1 , 2 , 3 , and 4 ) are similar to those obtained using Model 1. Similarly, in this model This model contributes very little in addition to Model 1. This is because the difficulty in achieving a good fit is not caused by an excess of zeros but rather due to difficulties in reproducing the high-number SMIs. Table 3 shows that the values of the hyperparameters ϑ j (ϑ, ϑ 1 , ϑ 2 , ϑ 3 , and ϑ 4 ) are similar to those obtained using Model 1. Similarly, in this model there are eight measurements with errors greater than 50, which correspond to the ANSPs 9, 22, 28, 20, and 12. Figure 14 comprises caterpillar plots giving the distributions of each hyperparameter K js of the model (K, K1, K2, K3, and K4).  Figures 15 and 16 give the overall result and the precision of Model 4. The main advantage of this model, compared to the previous ones, is that the confidence levels give a very good fit to the data and the predictions are, in all cases, in line with the initial data.  Figures 15 and 16 give the overall result and the precision of Model 4. The main advantage of this model, compared to the previous ones, is that the confidence levels give a very good fit to the data and the predictions are, in all cases, in line with the initial data. This model has more parameters than the previous ones. In particular, the introduction of the error parameter ε s permits fine-tuning of the data of each ANSP. This means that it is flexible and capable of quantifying the dispersion of the data.

Model 4
In this model, the hyperparameters K js (K, K1, K2, K3, and K4) and ϑ j (ϑ, ϑ 1 , ϑ 2 , ϑ 3 , and ϑ 4 ) behave in a similar manner to the way they do in the other models. Figure 17 gives an analysis of the caterpillar plots of the βj s parameters of the regression function. This enables us to have a more intuitive view of the influence of the variables xj s on each ANSP.  Figures 15 and 16 give the overall result and the precision of Model 4. The main advantage of this model, compared to the previous ones, is that the confidence levels give a very good fit to the data and the predictions are, in all cases, in line with the initial data.   This model has more parameters than the previous ones. In particular, the introduction of the error parameter permits fine-tuning of the data of each ANSP. This means that it is flexible and capable of quantifying the dispersion of the data.

Model 4
In this model, the hyperparameters (K, K1, K2, K3, and K4) and ( , , , , and ) behave in a similar manner to the way they do in the other models. Figure 17  Additionally, in this model, there is another consideration with respect to the parameter 1 in that it corresponds to the coefficient of the quadratic term in the regression. This parameter can be used to reflect the dependence of the rate of occurrence of SMIs on the number of flight hours. In aviation this phenomenon is known as the stress effect and refers to the increase in accident rates as a function of the number of operations or flight hours. This has not been considered in the other three models. The value of the quadratic coefficient in the regression parameter is a measure of the importance of the stress effect. The higher the value of 1 , the greater the positive correlation between the rate of occurrence of SMIs and the square of the number of flight hours and, consequently, the greater the stress effect. ANSPs that experience this effect are considered to be saturated, since variations in the number of flight hours have a significant effect on the occurrence of SMIs. Therefore, special attention must be paid to this factor.
Analysis of the parameters , the error term in the linear regression, and , the variance proportional to the number of hours of operation, is especially relevant in this case. Figure 18 shows the caterpillar plots for these variables. Additionally, in this model, there is another consideration with respect to the parameter β1 s in that it corresponds to the coefficient of the quadratic term in the regression. This parameter can be used to reflect the dependence of the rate of occurrence of SMIs on the number of flight hours. In aviation this phenomenon is known as the stress effect and refers to the increase in accident rates as a function of the number of operations or flight hours. This has not been considered in the other three models. The value of the quadratic coefficient in the regression parameter is a measure of the importance of the stress effect. The higher the value of β1 s , the greater the positive correlation between the rate of occurrence of SMIs and the square of the number of flight hours and, consequently, the greater the stress effect. ANSPs that experience this effect are considered to be saturated, since variations in the number of flight hours have a significant effect on the occurrence of SMIs. Therefore, special attention must be paid to this factor.
Analysis of the parameters ε s , the error term in the linear regression, and σ s , the variance proportional to the number of hours of operation, is especially relevant in this case. Figure 18 shows the caterpillar plots for these variables.
Practically all the ANSPs have values of σ s equal to zero with the exception of ANSPs 29,18,19,21,25,7,and 6. All of these correspond to small service providers with low numbers of flight hours.
Finally, the values of ϑ j (ϑ, ϑ 1 , ϑ 2 , ϑ 3 , and ϑ 4 ) for all the models are summarized in Table 2 Practically all the ANSPs have values of equal to zero with the exception of AN-SPs 29, 18, 19, 21, 25, 7, and 6. All of these correspond to small service providers with low numbers of flight hours.
Finally, the values of ( , , , , and ) for all the models are summarized in Table 2.

Comparison of the Models
The four hierarchical models in the study use the following probability density functions: Poisson, negative binomial, zero-inflated, and normal. We will use deviation information criteria (DIC) to compare the models [39] via MCMC simulations.
This parameter calculates the deviation as the mean of the posterior predictive function: This criterion is a measure of the goodness of fit of the data and, at the same time, introduces a term to evaluate the complexity of the model. This criterion is asymptotically the same as performing cross-validation using part of the data to estimate the data and the rest to calculate the goodness of the predictions [40]. The lower the calculated DIC value, the more accurate the model's predictions. One model can be considered superior to another when its DIC is more than five points below that of its competitor [41].
According to the values obtained in the simulation, Model 4 appears to be the most efficient and Model 2 the least efficient. This coincides with the analysis carried out in the previous sections. The table confirms that Models 1 and 4 are better at predicting SMIs.

Comparison of the Models
The four hierarchical models in the study use the following probability density functions: Poisson, negative binomial, zero-inflated, and normal. We will use deviation information criteria (DIC) to compare the models [39] via MCMC simulations.
This parameter calculates the deviation as the mean of the posterior predictive function: This criterion is a measure of the goodness of fit of the data and, at the same time, introduces a term to evaluate the complexity of the model. This criterion is asymptotically the same as performing cross-validation using part of the data to estimate the data and the rest to calculate the goodness of the predictions [40]. The lower the calculated DIC value, the more accurate the model's predictions. One model can be considered superior to another when its DIC is more than five points below that of its competitor [41].
According to the values obtained in the simulation, Model 4 appears to be the most efficient and Model 2 the least efficient. This coincides with the analysis carried out in the previous sections. The The hierarchical relationship framework underlying all the models includes parameters ϑ j and K js . Each of the models includes a parameter or set of parameters ϑ s that synthesizes the general level of safety of the EU ATM system. Parameter ϑ s represents the mean or overall influence of variable xj on the predicted variable. Parameter K js is a measure of the dispersion of an ANSP around parameter ϑ j . These parameters could be used in further studies to benchmark among different regions in the world (USA, Middle East, etc.), providing similar data set are available for all the regions.
If the values of K js are high and similar to each other for all ANSPs, this means that there is a central tendency around the European mean, since K js is inversely proportional to the variance of ϑ j . Conversely, small values of K js indicate that an ANSP has an unusually low or high number of SMIs, compared to the European average. A value of K js that is significantly different to that of other ANSPs will alert us to those ANSPs whose behavior deviates from the ideal.
To illustrate the meaning of K js , let us analyze in detail an example. Let us look at parameter K4 in Model 1 (Figure 8d). The estimated values of K4 for each of the 29 ANSPs are presented in Table 4. K4 is related to the influence of the variable x4, "complexity index", in the number of SMIs. K4 indicates the deviation of each ANSP with respect to the influence of complexity index in the number of SMIs from the overall European ATM system. It can be observed that the values of K4 are higher than 900 for almost all ANSPs, except for ANSP-12 and ANSP-9. K js is inversely proportional to the variance of ϑ j . For those ANSPs with elevated K4 values, the influence of complexity index in the occurrence of SMIs is similar and close to that of the overall European ATM system. However, ANSPs 12 and 9 exhibit lower values of K4, 78 and 335, respectively, indicating that for these two providers the dependency of the SMIs with the variable complexity index does not behave as the overall EU ATM system.
In general, ANSPs exhibiting higher errors in each model happen to have low values for one or more K js parameters, which indicates higher deviation of this ANSP with respect to the influence of one explanatory variable in the number of SMIs, regarding the overall European ATM system.
Furthermore, the final model has two additional features. Firstly, it combines the advantages of Poisson distributions with a normal model. To do this, it introduces a variance proportional to the number of SMIs. When the variance of an ANSP is constant for the entire range of SMIs, the Gaussian noise ε s added to the regression, allowing σ s to be set to 0, if necessary. This model also has a quadratic term in the regression equation, which reflects the dependence of the rate of occurrence of SMIs on the square of the number of flight hours. In aviation, this phenomenon is known as the stress effect, and refers to the increase in accident rates as a function of number of flight hours.

Conclusions
This study makes use of the advantages of hierarchical Bayesian inference models to quantify the levels of safety in European airspace. The study has three objectives: • Generate predictive models that enable the number of losses of separation between aircraft in the airspace to be predicted using predictive variables such as the number of flight hours or the complexity of the airspace. • Estimate the general underlying safety level of European ATM systems. • Estimate how much each of the European ANSPs deviates from this general value.
For this, four models were developed that combine a hierarchical approach with a regression model. These enable us to infer and predict the common rate of occurrence of SMIs, as well as the specific rates of occurrence for each ANSP. The models take two factors into consideration: the hierarchical relationship between the ANSPs and the generation of SMIs from predictors that represent the main local differences.
The four models generally demonstrate good behavior. Model 4 is the most efficient and Model 2 the least efficient. The original pieces of knowledge provided by this research can be summarized as: • The development of explanatory and predictive models for the number of SMIs as a function of the "number of flight hours", "complexity score", "adjusted density", and "structural index".

•
The models quantify the underlying safety level of European ATM and how much each of the European ANSPs deviates from this general value.

•
The contribution of the European ATM regulation to a reduced number of SMIs has been quantified as an overall EU ATM system performance.

•
The models explain and predict SMIs for 29 European ATM systems with common regulations and work procedures, but with different local circumstances.

•
The models are compatible with a very reduced set of data and able to integrate available expert prior information.

•
The models prove to be able to capture hierarchical dependences between parameters.
The models developed have been found to be useful in explaining and predicting the safety performance of 29 European ATM systems with common regulations and work procedures, but with different circumstances and numbers of aircraft, each managing traffic of differing complexity.
This study shows the usefulness of hierarchical structures when it comes to obtaining parameters that enable risk to be quantified effectively. We can identify the parameters that are characteristic of the safe operation of each ATM system and of the entire European system. These parameters allow us to identify and quantify trends and establish benchmarks to compare the current year's performance with that of previous years.

Conflicts of Interest:
The authors declare no conflict of interest.