Neural Network-Based Aircraft Conflict Prediction in Final Approach Maneuvers

Conflict detection and resolution is one of the main topics in air traffic management. Traditional approaches to this problem use all the available information to predict future aircraft trajectories. In this work, we propose the use of a neural network to determine whether a particular configuration of aircraft in the final approach phase will violate the minimum separation requirements established by aviation rules. To achieve this, the network must be effectively trained with a sufficiently large database in which configurations are labeled as leading to conflict or not. We detail the way in which this training database has been obtained, as well as the subsequent neural network design and training process. Results show that a simple network can provide high accuracy, and therefore we consider that it may be the basis of a useful decision support tool for both air traffic controllers and airborne autonomous navigation systems.


Introduction
This work is part of a line of research aimed at improving air traffic management (ATM) procedures. More specifically, the line focuses on improving operations in the airport environment, and it is mainly motivated by the relentless increase in global air traffic [1].
All proposals in this field must strictly comply with the extensive regulations issued by the civil aviation authorities in this sector. These authorities include the ICAO (International Civil Aviation Organization), a United Nations agency that promotes aviation safety and the orderly development of international civil aviation worldwide. The ICAO establishes the standards and regulations necessary for aviation safety, efficiency, and environmental protection. One of the many safety requirements that air operations must meet has to do with maintaining a minimum lateral and vertical separation between aircraft in flight. For example, the minimum lateral separation is 5 NM for en-route airspace, and 3 NM inside the terminal radar approach control area (we use nautical miles (NM), feet (ft), and knots (kt), as they are the usual units of measurement in air navigation; in the International System of Units (SI), 1 NM = 1852 m, 1 ft = 0.3048 m, and 1 kt = 0.5144 m/s). On the other hand, the minimum vertical separation is 2000 ft above 29,000 ft, and 1000 ft below this altitude [2].
Air traffic conflict detection and resolution (CD&R) mechanisms [3,4] aim to maintain the separation between in-flight aircraft established by the aviation regulations. In this context, a "conflict" (or "collision") is an event in which two or more aircraft experience a loss of minimum separation. Traditional CD&R mechanisms use geometric [5][6][7] or probabilistic [8][9][10] techniques for predicting future aircraft trajectories starting from, for example, known flight plans or current radar information. Trajectory prediction allows conflicts to be detected in advance and actions to be triggered to prevent them from happening.

Neural Networks Basics
With the current increase in computing power and the popularization of GPU computing, neural network technology is now being applied to very diverse areas, covering a wide variety of topics and producing remarkably good results. From medicine to economics, it is currently one of the most used tools to solve problems that would traditionally require very complex mathematical models. We consider that it can also be applied to solve typical ATM problems, such as predicting conflicts between approaching aircraft. Artificial neural networks (or, simply, neural networks) are vaguely inspired by the biological neural networks that form animal brains. Like their biological counterparts, the key factor in neural networks is that they can learn to perform tasks by considering examples, instead of being specifically programmed for that end. A neural network is based on a collection of nodes, called artificial neurons, that model the neurons existing in the biological model. These nodes, connected among themselves, reflect the synapses between biological neurons by transferring (or not) the information processed in them, depending on a certain activation function.
The computational model for neural networks was developed by McCulloch and Pitts in 1943 [29]. Early implementations (called the Perceptron) were formed by a series of artificial neurons connected to the output layer. In 1975, Werbos developed his backpropagation algorithm, which made the training of multi-layer networks (called Multi-Layer Perceptrons, or MLPs) feasible and efficient [30]. As computing power increased through GPUs and distributed systems, the number of layers in neural network models could grow. These systems became known as deep learning networks and proved particularly good at solving image and visual recognition problems.
A neural network is often described as a black-box learning model. However, that does not mean that its mechanism is unknown or too intricate to understand. Rather, neural networks are called black boxes because of their complexity: once the complexity of the network starts to grow, with multiple layers and a large number of neurons per layer, the weights each neuron takes become uninterpretable, meaning nothing to a human observer. Nevertheless, we can look at the main elements of a neural network to get an idea of how those weights are computed and, later, used.
The first element to consider is the artificial neuron. There are several models of artificial neurons. The one used in the perceptron is still used in current neural networks and deep learning networks. It works by taking a set of binary inputs x_1, x_2, ..., x_n and producing a single binary output. Inside the neuron, these binary inputs are multiplied by a set of weights w_1, w_2, ..., w_n, which can be interpreted as the importance each input has for the final result. If the sum of all the inputs multiplied by their weights is greater than a threshold value, the neuron outputs a 1. Put in more precise algebraic terms:

output = 0 if Σ_j w_j x_j ≤ threshold; output = 1 if Σ_j w_j x_j > threshold (1)

That threshold, used to decide whether the neuron will activate, is usually called the bias, b, and it can be moved to the other side of the inequality in Equation (1) (with b ≡ −threshold). In addition, to simplify the notation, the sum of the products of the inputs and weights is usually written as the dot product of those two vectors. This way, we can define the neuron more compactly as:

output = 0 if w · x + b ≤ 0; output = 1 if w · x + b > 0 (2)

With this neuron model, a complete neural network can be devised to approximate a function. However, a problem arises when the neural network must learn from a set of inputs. When trying to adjust the different weights, the binary nature of these neurons has a catastrophic chain-reaction effect: if a single weight is changed, making a neuron flip from 0 to 1, it can activate or deactivate many of the following neurons that were working fine before. That is where logistic neurons appear. Instead of being completely binary, these neurons can receive any real number between 0 and 1. In addition, instead of the 0-or-1 step function for the output value, there is an activation function. Historically, this activation function has been the sigmoid, which is why these logistic neurons are also referred to as sigmoid neurons in some books.
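As a minimal sketch of this neuron model (written in Python for illustration, rather than the Matlab used later in this work; the function names and weight values below are our own illustrative choices):

```python
def perceptron(x, w, b):
    # Binary neuron: output 1 when the weighted sum of the inputs
    # plus the bias is greater than zero, otherwise output 0.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

# Example: a perceptron computing a logical AND of two binary inputs;
# only the input (1, 1) pushes the weighted sum above the threshold.
def and_gate(x):
    return perceptron(x, w=[1.0, 1.0], b=-1.5)
```

Moving the bias changes the threshold: with b = −0.5, the same weights would implement a logical OR instead.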
However, the recent increase in popularity of other activation functions, such as the hyperbolic tangent (tanh) or the Rectified Linear Unit (ReLU), has rendered the sigmoid-neuron terminology old. For simplicity, the sigmoid function will be the one used to explain the functioning of this kind of neuron, the other activation functions being interchangeable. The sigmoid function is defined as

σ(z) = 1 / (1 + e^(−z)) (3)

where, in this case, z is our dot product of weights and inputs plus the bias, that is, z = w · x + b.
While it might seem complex compared with the simple model described previously, it is actually not that far from it. If we look at the sigmoid function, we can see that when z is very large and positive, the sigmoid approaches 1, and when it is very negative, it approaches 0. In fact, if the σ function were a step function, we would have exactly the same kind of neuron as before. It is the smoothness of the sigmoid's shape that is crucial: by making slight changes in the weights and bias, we obtain slight changes in the output of the neuron, rather than an extreme change as before. This allows the learning algorithm to make small changes to each neuron without completely disrupting the model.
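A sigmoid neuron differs from the perceptron only in its activation; a short sketch under the same illustrative conventions (Python, with our own function names):

```python
import math

def sigmoid(z):
    # Smooth activation: sigma(z) = 1 / (1 + e^(-z)).
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    # Logistic neuron: same weighted input z = w . x + b as the perceptron,
    # but the hard step is replaced by the sigmoid, giving an output in (0, 1).
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)
```

A small change in a weight now produces a proportionally small change in the output, which is exactly the smoothness property the learning algorithm relies on.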
In order to learn, we define the following to be true: real_output = output + ∆output. As we want to improve our network, we must compute that ∆output in order to make small changes to the network. To compute it, we use the fact that small changes in the weights and bias produce small changes in the output, that is,

∆output ≈ Σ_j (∂output/∂w_j) ∆w_j + (∂output/∂b) ∆b (4)

It is here where the choice of activation function comes into play. To solve the derivatives, the exponential character of the sigmoid function plays an important role in facilitating the computations needed to adjust the weights and bias of each neuron. That is why not just any function with its shape is a proper candidate. The only consequence is that, now, our output layer will not output a definite Boolean value. However, when classifying, we can simply take a threshold, usually 0.5, to decide when the real number means true or false.
After defining both the neurons and the activation functions, we must define the learning procedure. We have previously stated that the backpropagation technique was a breakthrough in facilitating the computation of the different adjustments of the weights and bias, which have now been properly explained. First, we need a reference to know whether the neural network is working correctly. This function is called the loss or cost function and is represented as C. As a simple example, the mean squared error (MSE) can be used as a cost function. In fact, Matlab [31], which is the platform employed in this work, uses this particular cost function for most of its network implementations. This function is defined as

C(w, b) = (1 / 2n) Σ_x ‖y(x) − a‖² (5)

where w and b are our vectors of weights and bias, n is the number of training inputs, y(x) is the output of the network, and a is the desired output. We must take into account that the methods explained here work for supervised classification, which is the modality used for our final Matlab network. Once the cost function is defined, the set of weights and bias must be optimized to minimize it. This is done with the learning function. A popular choice is Stochastic Gradient Descent (SGD) [32], which will also be one of those used later when designing and training our network. It is an algorithm that seeks the minimization of the cost function using derivatives.
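For scalar outputs, the MSE cost just described can be sketched as follows (Python for illustration; the helper name is our own):

```python
def mse_cost(outputs, targets):
    # Quadratic cost C = 1/(2n) * sum over examples of (y(x) - a)^2.
    # The 1/2 factor cancels neatly when the derivative is taken during learning.
    n = len(outputs)
    return sum((y - t) ** 2 for y, t in zip(outputs, targets)) / (2 * n)
```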
The MSE is often used for the same reason as the sigmoid: it is a smooth function in which it is easy to make small changes that improve accuracy. In fact, other ad hoc cost functions are sometimes used that better suit whatever the network is approximating. However, for our task, the MSE works well enough, both for the explanation and, later, for the implementation.
The way of minimizing the error is going to be a local search. This means that it is possible to fall into a local minimum and be unable to exit that area when training the network. Even if finding the global minimum is a difficult task, avoiding shallow local minima is something that can be dealt with, as will be explained later.
Thus, to find the minimum of such a function, the natural solution is to compute its derivatives. However, when the number of variables (weights and biases) grows as it does in a neural network (a big neural network of thousands of neurons can have billions of weights and biases), solving for the minimum analytically becomes intractable. Instead of looking for the absolute minimum directly, we can just "peek" at where the slope is going: if a minimum exists, it should eventually be reached by following the slope downhill.
To move toward the minimum, the first thing is to define how the cost function evolves, that is,

∆C ≈ Σ_i (∂C/∂v_i) ∆v_i (6)

where each v_i is a variable of our network. In addition, we define the vector of changes in our variables as ∆v ≡ (∆v_1, ∆v_2, ..., ∆v_n) and the gradient of C, ∇C, as the vector of partial derivatives (∂C/∂v_1, ..., ∂C/∂v_n). With this, Equation (6) can be rewritten as

∆C ≈ ∇C · ∆v (7)

which proves interesting in showing a way in which we can make ∆C negative. In particular, the changes in the variables can be chosen as

∆v = −η∇C (8)

This would mean that

∆C ≈ ∇C · (−η∇C) = −η‖∇C‖² (9)

and, given that ‖∇C‖² is always positive, ∆C is always negative. Here, η is a small, positive parameter that is chosen when defining the network, called the learning rate. The smaller it is, the lower the chance that the changes in the variables will jump out of the local minimum, but the slower the computations will be. That is why choosing the learning rate properly can change the outcome of the network after its learning process. After this, the set of variables can be updated as

v → v′ = v − η∇C (10)

Finally, applying this to the components we have, that is, w and b,

w_k → w_k′ = w_k − η ∂C/∂w_k (11)
b_l → b_l′ = b_l − η ∂C/∂b_l (12)

A problem appears when we have a large number of inputs. Given that the gradient ∇C must be computed for every input, in the case of a very large database the computing time can be excessive. That is when SGD can be used. This approach takes a relatively small batch of training inputs, chosen randomly from the database, to adjust the weights. By applying this several times, the speed is greatly improved without losing much accuracy with respect to the true gradient over the whole database.
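The update rule v → v − η∇C can be illustrated on a one-variable cost; this toy sketch (Python, illustrative names) assumes C(v) = v², whose gradient 2v is known in closed form:

```python
def gradient_descent(grad, v0, eta=0.1, steps=200):
    # Repeatedly apply the update rule v -> v - eta * grad(v),
    # following the negative gradient toward a (local) minimum.
    v = v0
    for _ in range(steps):
        v = v - eta * grad(v)
    return v

# Minimizing C(v) = v^2, whose gradient is 2v: starting from v = 5,
# the iterates shrink geometrically toward the minimum at v = 0.
v_min = gradient_descent(lambda v: 2.0 * v, v0=5.0)
```

SGD follows the same update rule, but estimates the gradient on a small random mini-batch of training examples instead of the full database.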
Once we have defined all our elements and the gradient, the only thing missing is knowing how to compute that gradient. It is here where the backpropagation algorithm enters as the solution. For the rest of this section, the following notation will be used:
• w^l_jk will be the weight that connects the k-th neuron in the (l − 1)-th layer with the j-th neuron of the l-th layer.
• b^l_j will be the bias of the j-th neuron of the l-th layer.
• a^l_j will be the activation of the j-th neuron of the l-th layer.
• z^l_j will be what is called the weighted input, defined as z^l_j = Σ_k w^l_jk a^(l−1)_k + b^l_j.
• δ^l will be the error of the l-th layer.
With these definitions, we can define the activation (or output) of the j-th neuron of the l-th layer as

a^l_j = σ(Σ_k w^l_jk a^(l−1)_k + b^l_j) (13)

This notation is a bit cumbersome. To make it simpler to write and follow, the neurons will be written using a matrix approach, meaning that w^l is the matrix of all the weights of the l-th layer. Then, Equation (13) can be rewritten as

a^l = σ(w^l a^(l−1) + b^l) (14)

After this, the backpropagation algorithm is based on four fundamental equations. Proving them falls outside the scope of this work, but knowing them will allow us to explain how backpropagation works. The four equations are the following (⊙ denotes the elementwise product):
• δ^L = ∇_a C ⊙ σ′(z^L). Simply put, the error of the output layer can be computed as the derivative of the cost function with respect to the activations of that layer, multiplied elementwise by the derivative of the activation function at that layer.
• δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ′(z^l). This means that the error of a layer can be computed from the error of the following layer, multiplied by the transposed weights of the following layer and, elementwise, by the derivative of the activation function of the current layer. This is a key concept, because it gives a way of computing the error of a layer from the error of the following one.
• ∂C/∂b^l_j = δ^l_j. The rate of change of the cost with respect to any bias is exactly the error of that neuron.
• ∂C/∂w^l_jk = a^(l−1)_k δ^l_j. Lastly, the rate of change of the cost with respect to any weight can be computed as the activation of its input neuron multiplied by the error of its output neuron (a_in δ_out).
With all these elements, the backpropagation algorithm can finally be defined as follows:

1. Feedforward: for each layer l, compute z^l = w^l a^(l−1) + b^l and a^l = σ(z^l).
2. Output error: compute the error of the last layer as δ^L = ∇_a C ⊙ σ′(z^L). Using the MSE as the cost function and the sigmoid as the activation function, this derivative is simply δ^L = (a^L − t^L) ⊙ σ′(z^L), where t^L is the expected output for the network in array format.
3. Backpropagate the error: compute the error of every layer using the error of the following one, δ^l = ((w^(l+1))^T δ^(l+1)) ⊙ σ′(z^l). As the error of the last layer can be easily obtained, the rest of them can be computed iteratively. This is where the power of the algorithm lies: with just a forward and a backward pass, which have roughly the same computational cost, all the gradients are obtained. For the first layer, the activations are simply the input values, a^0 = x.
4. Compute the gradient and update the weights and biases. Putting everything together, the result is w^l_jk → w^l_jk − η a^(l−1)_k δ^l_j and b^l_j → b^l_j − η δ^l_j.
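The four steps above can be sketched end to end in a small self-contained implementation (Python for illustration; the networks in this work are built in Matlab, and the class and function names below are our own):

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

class TinyMLP:
    # Minimal multi-layer perceptron trained with the four backpropagation
    # equations and plain gradient descent. sizes = [inputs, hidden..., outputs].
    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        self.weights = [[[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                         for _ in range(n_out)]
                        for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        self.biases = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)]
                       for n_out in sizes[1:]]

    def feedforward(self, x):
        # Step 1: compute z^l = w^l a^(l-1) + b^l and a^l = sigma(z^l).
        a = list(x)
        zs, activations = [], [a]
        for w, b in zip(self.weights, self.biases):
            z = [sum(wjk * ak for wjk, ak in zip(wj, a)) + bj
                 for wj, bj in zip(w, b)]
            a = [sigmoid(zj) for zj in z]
            zs.append(z)
            activations.append(a)
        return zs, activations

    def train_step(self, x, t, eta):
        zs, acts = self.feedforward(x)
        # Step 2: output error delta^L = (a^L - t) * sigma'(z^L) for the MSE cost.
        delta = [(aj - tj) * sigmoid_prime(zj)
                 for aj, tj, zj in zip(acts[-1], t, zs[-1])]
        deltas = [delta]
        # Step 3: backpropagate, delta^l = (w^(l+1))^T delta^(l+1) * sigma'(z^l).
        for l in range(len(self.weights) - 1, 0, -1):
            w_next = self.weights[l]
            delta = [sum(w_next[j][k] * deltas[0][j]
                         for j in range(len(w_next))) * sigmoid_prime(zk)
                     for k, zk in enumerate(zs[l - 1])]
            deltas.insert(0, delta)
        # Step 4: gradient step, dC/dw = a_in * delta_out and dC/db = delta.
        for l, d in enumerate(deltas):
            for j, dj in enumerate(d):
                self.biases[l][j] -= eta * dj
                for k, ak in enumerate(acts[l]):
                    self.weights[l][j][k] -= eta * dj * ak
```

As a toy check, such a network with a single hidden layer can be trained on the XOR patterns; each `train_step` call performs one forward pass, one backward pass, and one gradient update.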

Conflict Detection in Approach Phase
In this section, we describe the dynamic model of aircraft movement, as well as the navigation procedure between waypoints that we assume the pilot follows. Subsequently, the criterion of conflict determination between two aircraft executing a runway approach maneuver is detailed.

Aircraft Dynamics
We assume that an approach procedure is completely defined by an ordered list or sequence of waypoints. Each waypoint consists of a tridimensional position and the required horizontal speed for the aircraft at this position. Formally, we define a tridimensional position as a vector p = [x y z], where x, y, and z (in m) are given with respect to a coordinate system whose origin is located at the point of contact between the aircraft and the runway.
The position and heading of an aircraft at a given moment is expressed by the state vector v = [p ψ], where ψ is the angle between the North direction and the aircraft longitudinal axis (in rad). Figure 1 illustrates the airspace model in the vicinity of a runway.


Pilot's Behavior

We formally define a waypoint as a vector w = [p s], where p establishes a tridimensional position and s refers to the aircraft horizontal speed (in m/s). The algorithm presented in Table 1 details the pilot's behavior. Let vector u = [F_s V_s Y_s] be the system input modeling the aircraft movement. The F_s and V_s parameters establish, respectively, the desired forward and vertical aircraft speeds (in m/s), and the Y_s parameter represents the angular speed of course change (yaw speed; in rad/s). In our model, the aircraft dynamics is driven directly by this input vector.

The pilot algorithm establishes that the desired forward speed equals the waypoint speed. The desired vertical speed is computed according to the relation between the vertical (d_z) and horizontal (d_xy) distances from the aircraft to the waypoint (see Figure 2). Additionally, the angular speed is computed to guarantee that the aircraft turns with a preestablished radius r (see Figure 3). In this work, we assume a turn radius r = 1.52 NM (2815 m) [33].

From the moment in which an aircraft enters the airport airspace, it must follow all the fly-by waypoints in the approach sequence. An aircraft should change to the next waypoint when it reaches the Distance-of-Turn Anticipation (DTA) to the current one. Figure 4 illustrates the geometry of this well-known problem. Let us assume that an aircraft located at v is following the sequence of waypoints {w_1 w_2 w_3}. Assuming that the aircraft turns with a fixed radius r, it should begin turning toward w_3 at a distance d = r tan(α/2) from w_2.
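The DTA geometry can be sketched as follows (Python for illustration; the function name is our own, and α is the course-change angle at the waypoint, in rad):

```python
import math

def turn_anticipation(r, alpha):
    # Distance-of-Turn Anticipation: start the turn d = r * tan(alpha / 2)
    # before the waypoint, for a turn radius r and a course change alpha (rad).
    return r * math.tan(alpha / 2.0)

# With the turn radius assumed in this work, r = 2815 m, a 90-degree course
# change requires starting the turn 2815 m before the waypoint.
d = turn_anticipation(2815.0, math.pi / 2.0)
```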

Conflict Detection
We have considered an ideal conflict detector that continuously compares the positions of all aircraft, analyzing whether, at any time, two of them are closer than they can legally be. Specifically, as indicated, according to current regulations, the minimum allowable separation between two aircraft in the approach phase is 1000 ft vertically and 3 NM horizontally [2]. Table 2 details a generic conflict detection algorithm. The symbol ∼ marks the position of the aircraft heading in the state vector, which is not used by the algorithm. The parameter height_th refers to a height threshold below which the check is not carried out.
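A minimal sketch of the separation test at the core of such a detector (Python for illustration; the algorithm in Table 2 additionally skips aircraft below height_th, which is omitted here, and the names are our own):

```python
import math

H_SEP_M = 3 * 1852.0       # 3 NM horizontal separation minimum, in m
V_SEP_M = 1000 * 0.3048    # 1000 ft vertical separation minimum, in m

def in_conflict(p1, p2):
    # Two aircraft are in conflict when BOTH the horizontal and the
    # vertical separation minima are violated at the same time;
    # positions are (x, y, z) tuples in m.
    d_xy = math.hypot(p1[0] - p2[0], p1[1] - p2[1])
    d_z = abs(p1[2] - p2[2])
    return d_xy < H_SEP_M and d_z < V_SEP_M
```

Note that both minima must be violated simultaneously: two aircraft 2 NM apart horizontally but 2000 ft apart vertically are still legally separated.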

Conflict Prediction Based on Neural Network
As stated, the neural network that we propose to use as conflict predictor in this work is a binary classifier that, once trained, will be able to determine whether two approaching aircraft will break the separation rules during the maneuver. A database is required for the training of this neural network. The entries in this database must be labeled with the correct output that the network should provide in each case. As we must only differentiate between two classes ("conflict" or "not conflict"), two network outputs are possible in this case. To generate such a database, we need to focus on a specific airport (with an explicit approach maneuver), on which we will proceed to deploy a set of approaching aircraft by applying the aircraft and pilot models described in Sections 3.1 and 3.2.

Next, we detail the scenario selected, the way in which the training database has been obtained, and the design and training process of our neural network.

Airport Scenario
In our study, we have considered the approach procedure to the "RWY 13" runway of Málaga Airport (in Spain). Figure 5a shows part of the Instrument Approach Procedure (IAP) chart for this runway. Approach procedures are composed of several segments, referred to as initial, intermediate, and final approach segments, and a missed approach segment. These segments begin and end at designated fixes or specified points where no fixes are available.

During the initial segment, starting at the Initial Approach Fix (IAF), aircraft transit from an en-route airway to the intermediate segment. In our scenario, we assume that aircraft appear at LOJAS (see Figure 5b), and then they fly to TOLSU, as this is the IAF in this case. The next approach segment, which starts at the Intermediate Fix (IF), allows descent to an intermediate altitude and alignment of the aircraft with the runway. In the approach considered, the IF is MG 402.
Lastly, during the final approach segment, starting at the Final Approach Point (FAP), the aircraft navigates to the runway by using navigation aids, such as the Instrument Landing System (ILS) [11], which are located at or near the runway. The FAP in our scenario is MG 401. If the landing is successful, the maneuver is over. If, on the other hand, the pilot decides to perform a missed approach maneuver, then they must follow the instructions in the approach chart. In the case of Málaga, the chart indicates that aircraft must fly to XILVI in the case of a missed approach. Table 3 details the complete sequence of waypoints defining this approach procedure, in both aeronautical and standard (SI) notation. As stated in Section 3.1, each waypoint consists of a tridimensional position and the required horizontal speed for the aircraft at this position.

Training Database
The training database has been obtained by using an airspace simulation tool developed in Matlab/Simulink [34]. This tool models, in a realistic way, the air traffic flow that approaches an airport runway. Figure 6 shows a snapshot of the graphical output provided by the simulator. As just detailed, the scenario considered in this work is one of the runways in Málaga Airport, and the aircraft dynamics modeled corresponds to the popular Airbus 320.

Instead of assuming that all aircraft approach from the LOJAS waypoint, we expanded the entry margin to a three-dimensional area of 20 km × 20 km × 500 m. Aircraft enter the airport airspace evenly distributed over that area. According to ICAO sequencing standards [2], we assume that a new aircraft appears every 60 s. In Figure 6, the green dots represent the initial detection positions of 1000 aircraft, allowing this area to be visually delimited. On the other hand, each red dot represents an aircraft executing the approach maneuver. We assume that all arriving aircraft complete the maneuver.
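The generation of entry states can be sketched as follows (Python for illustration; the names are our own, and the coordinates are given relative to the entry volume's own origin rather than the runway frame used by the simulator):

```python
import random

def generate_entries(n, seed=0):
    # Draw n aircraft entry states uniformly over the assumed
    # 20 km x 20 km x 500 m entry volume; a new aircraft appears
    # every 60 s, so entry i carries the timestamp i * 60 s.
    rng = random.Random(seed)
    return [{"x": rng.uniform(0.0, 20000.0),
             "y": rng.uniform(0.0, 20000.0),
             "z": rng.uniform(0.0, 500.0),
             "t": i * 60.0}
            for i in range(n)]
```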
If, at any given time during the simulation, there is a conflict between two flying aircraft (according to the criterion described in Section 3.3), the tool stores in the training database the positions at which both aircraft were first detected in the airspace, as well as the difference in time between their appearances. This entry is then labeled as conflict-generating. If, on the other hand, an aircraft manages to land without conflicting with any other aircraft in the airspace, then it can be paired with any of them, adding a new entry to the database (similarly including their respective initial positions and time difference) that is labeled as not generating a conflict.
For a neural network to train properly, the database must be balanced, that is, it must contain approximately the same number of examples of each of the classes that the network should predict at its output. In our case, the neural network will predict whether there is a conflict (1) or not (0) between a given pair of aircraft. Since, in each simulation, non-conflicting pairs greatly outnumber conflicting ones, only a third of the non-conflicting pairs have been included, in order to balance the database.
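This balancing step can be sketched as follows. The helper is hypothetical, not the authors' code; the 1/3 keep ratio mirrors the figure given in the text:

```python
import random

# Sketch of the balancing step: keep every conflict pair, but keep only a
# random subset of the (more numerous) non-conflict pairs so that both
# classes end up roughly the same size.
def balance(entries, keep_ratio=1 / 3, seed=0):
    rng = random.Random(seed)
    conflicts = [e for e in entries if e["label"] == 1]
    clear = [e for e in entries if e["label"] == 0]
    kept = [e for e in clear if rng.random() < keep_ratio]
    return conflicts + kept

# Toy example: 100 conflict entries vs. 300 non-conflict entries.
entries = [{"label": 1}] * 100 + [{"label": 0}] * 300
balanced = balance(entries)
```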
A training database should cover all the situations that may later be presented to the neural network in a query. This is often a laborious task that results in databases with hundreds of thousands or millions of entries. In our case, we have generated a training database with half a million entries, structured as shown in Table 4. As can be seen, approximately half of the entries represent conflict situations. A training database is usually broken down into three separate parts: the set of entries dedicated to the training itself, the set of inputs used to validate the neural network design decisions, and the set of inputs intended to evaluate the reliability of the resulting network. We have considered training, validation, and test sets of 80%, 10%, and 10%, respectively.

Next, we analyze the distribution of aircraft entry positions in the Málaga Airport airspace. We will look at the 504,680 entries in Table 4 separately, depending on whether they caused a conflict or not. Figure 7 shows the initial spatial distribution of the 267,329 analyzed aircraft pairs that turned out not to conflict (see Table 4). The figure contains two rows, showing the data for each of the aircraft involved. For each aircraft, the columns indicate its initial X, Y, and Z positions. The six plots in the figure are histograms: the horizontal axis is divided into fixed-width intervals, while the vertical axis shows the number of analyzed occurrences that fall into each interval. For example, we can see that, in almost 2000 aircraft pairs, the initial height of aircraft 1 is between 2640 and 2650 m.
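The 80/10/10 partition described above can be sketched as follows (a hypothetical helper, not the authors' code):

```python
import numpy as np

# Sketch of the 80/10/10 split: shuffle the database once, then cut it
# into training, validation, and test sets.
def split_dataset(data, rng, train=0.8, val=0.1):
    idx = rng.permutation(len(data))
    n_train = int(len(data) * train)
    n_val = int(len(data) * val)
    return (data[idx[:n_train]],
            data[idx[n_train:n_train + n_val]],
            data[idx[n_train + n_val:]])

# Toy example with 1000 entries -> 800 / 100 / 100.
data = np.arange(1000).reshape(-1, 1)
tr, va, te = split_dataset(data, np.random.default_rng(0))
```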
As indicated, to generate the initial positions, we have used a uniform random distribution within the established volume, so the histograms obtained are expected to look relatively flat. We can see how the database entries corresponding to pairs of nonconflicting locations indeed generate flat-looking histograms in the X, Y, and Z coordinates.

Figure 8 shows the initial spatial distribution of the 237,351 analyzed aircraft pairs that did come into conflict during their landing maneuver (see Table 4). The structure of this figure is similar to that of Figure 7 described previously. On this occasion, we can see that both the X and Y histograms have a clearly linear distribution, which indicates that the initial position in the plane has an effect on the generation of the conflict (as expected). By contrast, the Z histogram remains uniformly distributed, indicating that height does not affect the probability of conflict.

Figure 9 shows the same data as Figure 8, but in a way that makes it easier to explain the observed linear behavior.
First, we discard the Z component (as it remains uniformly distributed). Then, for each aircraft, we merge the X and Y histograms (both two-dimensional) into a single three-dimensional histogram, in which two dimensions are taken from the plane, and the third dimension is provided by a color scale with the accumulated occurrences. When an aircraft appears in the detection area, it heads toward the TOLSU waypoint, which is located to the southwest. Aircraft that are detected in the northeast are, therefore, more likely to conflict with subsequently detected aircraft, and constitute the "aircraft 1" section of a database entry. By contrast, aircraft that are detected in the southwest zone are more likely to conflict with previously detected aircraft, becoming the "aircraft 2" section of a database entry. This circumstance explains why the first aircraft presents a proportional distribution in the XY components, while the second aircraft has an inversely proportional distribution in those components.
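The merged XY view described above is a standard two-dimensional histogram. A minimal sketch, using placeholder uniform data rather than the paper's database:

```python
import numpy as np

# Sketch: instead of separate X and Y histograms, count occurrences on a
# grid of XY cells; the counts then drive the color scale of the plot.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 20_000.0, 10_000)   # placeholder initial X positions (m)
y = rng.uniform(0.0, 20_000.0, 10_000)   # placeholder initial Y positions (m)

counts, x_edges, y_edges = np.histogram2d(x, y, bins=20)
# counts[i, j] is the number of aircraft whose initial position falls in
# the cell [x_edges[i], x_edges[i+1]) x [y_edges[j], y_edges[j+1]).
```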

Neural Network Design and Training
We have designed and trained our neural network employing the Matlab Deep Learning Toolbox [35]. This tool provides the user with several interfaces to visualize the state of the network and its progress in real time. The creation tool is also a graphical interface in which the different kinds of layers can be dragged as blocks, building a graph that forms a very visual representation of the network design. It also offers the possibility of designing, creating, and training these networks through Matlab scripting, offering a flexibility comparable to that of Google's popular TensorFlow framework [36].
The design and training phases of a neural network are performed sequentially and iteratively, so that a design is trained, and its result helps in refining the original design. After training a huge number of networks, by varying the number of internal layers between 1 and 8, and the number of neurons per layer between 5 and 64, we have concluded that the problem can be solved by using the minimum MLP network shown in Figure 10.
In this design, Layer 1 contains five neurons whose function is to host the elements of the input vector [x₁ y₁ x₂ y₂ ∆t]⊺. Layer 2 acts as the inner hidden layer. It is fully connected, so that each neuron has five inputs and two outputs. Layer 3 is the output layer. It contains two neurons (with five inputs and two outputs) representing the possible responses provided by the network. If neuron 0 is activated, the network indicates no conflict. Conversely, if neuron 1 is activated, the network indicates the presence of a conflict. Layer 4 performs a post-processing of the output. In particular, the softmax() activation function increases the difference between the probabilities of getting a zero or a one [37]. Finally, Layer 5 incorporates a classifier that chooses which of the two options is most likely.
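A minimal forward pass through this architecture can be sketched as follows. The hidden-layer width of five neurons follows the layer description above; the weights are random placeholders (not the trained values from Table 5), and the tanh hidden activation is an assumption, since the text does not specify the hidden activation function:

```python
import numpy as np

# Placeholder weights for the two fully connected layers of Figure 10.
rng = np.random.default_rng(0)
W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)   # Layer 2 (hidden, 5 neurons)
W3, b3 = rng.normal(size=(5, 2)), np.zeros(2)   # Layer 3 (output, 2 neurons)

def predict(x):                        # x = [x1, y1, x2, y2, dt]
    h = np.tanh(x @ W2 + b2)           # hidden activation (assumed tanh)
    logits = h @ W3 + b3
    p = np.exp(logits - logits.max())  # Layer 4: softmax
    p /= p.sum()
    return int(np.argmax(p))           # Layer 5: classifier (0 or 1)
```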
Regarding the learning function, we have considered three variants of SGD (see Section 2) provided by the Matlab Deep Learning Toolbox. The variants are Stochastic Gradient Descent with Momentum (SGDM) [32], Root Mean Square Propagation (RMSprop) [38], and Adaptive Moment Estimation (Adam) [39], the last one being finally chosen, as it offered the best accuracy in the experiments carried out.
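For reference, a single Adam update follows the standard rule of [39]. In the sketch below, the learning rate matches the 5 × 10⁻⁵ value used later in the training, while the remaining hyperparameters are the usual defaults, not values taken from the paper:

```python
import numpy as np

# One Adam step for a parameter theta with gradient grad; m and v are the
# running first- and second-moment estimates, t is the step count (>= 1).
def adam_step(theta, grad, m, v, t, lr=5e-5, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad            # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The per-coordinate scaling by the second-moment estimate is what distinguishes Adam from plain SGDM and tends to make it less sensitive to the choice of learning rate.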
The neural network training process has been performed on a computer based on a ninth-generation Intel Core i9-9900K 3.6 GHz processor. Figure 11 shows its evolution. The upper part of the figure shows the network accuracy, while the lower part shows the loss. A batch size of 10 entries has been applied. This means that the network is fed with a set of 10 database entries, and then the adjustment of the neuron weights is performed according to the backpropagation algorithm detailed in Section 2. Therefore, the 403,744 entries in the training set (see Table 4) lead to 40,374 learning iterations. To avoid drastic fluctuations, the learning rate has been reduced to 5 × 10⁻⁵. However, the scale of the figure is so large that it does not allow such an evolution to be observed. Figure 12 shows a very short training process, which allows us to better appreciate the behavior of the accuracy over successive iterations.
Every 50 adjustment iterations, the accuracy of the network is checked against the 50,468 entries in the validation set. This value (shown in black in the plots) is the one that really tells us whether the network is learning properly. The training process has been configured to end after obtaining no improvement in 25 consecutive validations. Beyond this point, it makes no sense to continue the process by submitting the network to overtraining, in which, far from improving, learning actually worsens. Finally, the trained network correctly classifies 97.42% of the validation set. At this point, the accuracy obtained after presenting the 50,468 entries of the test set to the network is 97.476%.
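The early-stopping rule just described can be sketched as follows (a hypothetical helper; the patience of 25 validation checks is the value used in the paper):

```python
# Stop once the validation accuracy has failed to improve for `patience`
# consecutive checks; returns the index of the check that triggers the
# stop, or None if training ran to completion.
def stop_index(val_accuracies, patience=25):
    best, since = float("-inf"), 0
    for i, acc in enumerate(val_accuracies):
        if acc > best:
            best, since = acc, 0     # new best: reset the counter
        else:
            since += 1               # no improvement at this check
            if since >= patience:
                return i
    return None
```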
To conclude our study, Table 5 details the internal structure of the network resulting from the previous training. The table includes the weights for the five inputs of each neuron, as well as the bias applied to their output, for the fully connected layers 2 and 3.

Conclusions and Future Work
This work aims to verify the feasibility of using neural networks to implement an air traffic management support system that predicts the occurrence of conflicts between aircraft in the approach phase. A possible implementation of this conflict predictor, based on a relatively simple multi-layer perceptron architecture, has been detailed. The high accuracy of the prediction (above 97%) allows us to conclude that neural networks may be the basis for a support system for the air traffic controller or the aircraft pilot, since it is not intended to replace them, but to provide them with critical information for decision-making. Applying the methodology detailed in this paper, the proposed conflict prediction tool can be easily deployed for any airport, either at the control tower or as an airborne system (without ground support), after training the neural network with historical data on approaches at that airport. As future work, we plan to improve different air traffic management operations in terminal maneuvering areas, focusing not only on arrivals but also on departures. We also want to explore the possibility of extending the current study to the rest of the flight phases.