Article

A Conflict Resolution Strategy at a Taxiway Intersection by Combining a Monte Carlo Tree Search with Prior Knowledge

College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
* Author to whom correspondence should be addressed.
Aerospace 2023, 10(11), 914; https://doi.org/10.3390/aerospace10110914
Submission received: 24 September 2023 / Revised: 21 October 2023 / Accepted: 24 October 2023 / Published: 26 October 2023
(This article belongs to the Section Air Traffic and Transportation)

Abstract:
With the escalating complexity of surface operations at large airports, the conflict risk during aircraft taxiing has correspondingly increased. Typically, Air Traffic Controllers (ATCOs) issue route, speed and holding instructions to resolve conflicts. In this paper, we introduce a conflict resolution framework that incorporates prior knowledge by integrating a Multi-Layer Perceptron (MLP) neural network into the Monte Carlo Tree Search (MCTS) approach. The neural network is trained to learn the waiting-time allocation strategy extracted from actual aircraft taxiing trajectory data. Subsequently, the action probability distribution generated by the neural network is embedded into the MCTS algorithm as a heuristic evaluation function to guide the search for the optimal conflict resolution strategy. Experimental results show that the average conflict resolution rate is 96.8% across different conflict scenarios, and the taxiing time required to resolve conflicts is reduced by an average of 42.77% compared to actual airport surface operations.

1. Introduction

Amidst the rapid development of the global civil aviation industry, air traffic demand is rising sharply. According to the forecast of the International Air Transport Association (IATA 2022), the total number of air passengers will reach 4 billion by 2024 [1]. Simultaneously, major airports are actively considering expansion, involving the construction of additional runways and taxiways to accommodate the escalating demand for airport capacity. This also leads to more complex taxiway configurations, subsequently increasing the potential for taxiing conflicts among aircraft [2]. Furthermore, this change can intensify the workload for both pilots and Air Traffic Controllers (ATCOs), potentially impacting the safety and efficiency of airport surface operations [3]. To achieve safe, efficient, flexible and autonomous airport ground movement scheduling, the Next-Generation Air Transportation System (NextGen) proposes the automation of airport ground operations. It aims to utilize intelligent decision-making algorithms and advanced sensing technologies to assist ATCOs in real-time data processing, path planning and taxiing trajectory control, avoiding conflicts among aircraft while maintaining the safe and continuous operation of the airport surface [4]. In recent years, the emergence of advanced equipment such as Airport Surface Detection Equipment (ASDE-X), Automatic Dependent Surveillance–Broadcast (ADS-B) [5,6] and the Advanced Surface Movement Guidance and Control System (A-SMGCS) [7] has provided users with high-quality decision-making support and technological assistance for the automation of airport surface operations.
A Conflict Detection and Resolution (CD&R) system is one of the core modules of ground automation; after detecting potential conflicts by monitoring aircraft position, speed and other status information, it guides aircraft through a sequence of actions to avoid them. While most existing research focuses on data monitoring, trajectory prediction and route optimization, which can alert ATCOs to potential conflicts promptly, there is still a lack of emphasis on automated decision support for conflict resolution. Taxiing conflicts can be categorized into long-term and short-term conflicts based on the time horizon of conflict detection [8]. Long-term conflicts span a broader timeframe, often allowing proactive measures to be taken in advance. Research on long-term conflict resolution mainly involves dynamic route planning, which integrates conflict detection and resolution into path re-planning to generate new conflict-free paths for an aircraft. It can be tackled using exact or heuristic approaches, such as mixed-integer linear programming [9,10], genetic algorithms, the A-star algorithm and simulated annealing [11,12]. This method provides a global resolution for all aircraft, enabling the assignment of appropriate taxi routes for each aircraft and even precise time schedules. It ensures conflict avoidance during taxiing and prevents chain reactions of conflicts. However, these approaches heavily rely on conformance monitoring of the aircraft's actual taxi trajectory against the planned trajectory. Given the uncertainty and dynamics of airport surface operations, any deviation from the planned trajectory results in a misalignment between execution and the original plan, disrupting subsequent plans. Moreover, the computation time of the re-planning model tends to grow with the number of planned aircraft. This may require sacrificing solution quality, for example accepting a locally optimal solution to reduce computation time. As a result, the long-term conflict resolution approach may encounter challenges when applied at scale in the real world.
Short-term conflicts are impending conflicts. Due to the shorter timeframe, the resolution of short-term conflicts necessitates more immediate interventions. Certain researchers began to model the aircraft conflict problem as an agent decision-making issue. Deep reinforcement learning has significant advantages in addressing sequential decision problems, enabling substantial reductions in response time and facilitating continuous advisory functions for an aircraft. Zhou proposed a Deep Q Network (DQN) model to realize the dynamic control of aircraft taxiing speed in the taxi intersection area, but only considered the situation at a single taxiway intersection [13]; Shin-Lai designed and trained a DQN composed of convolutional neural networks, which is used to capture flight position and action based on images, to resolve conflicts [14]. Hasnain Ali, based on model-free reinforcement learning, used the Proximal Policy Optimization (PPO) algorithm to find an effective departure metering policy, which can alleviate airport traffic congestion and minimize potential conflicts in the taxiing process [15]. Deep reinforcement learning algorithms can make quick decisions and significantly improve computational efficiency; however, the training process is very time-consuming. Additionally, the quality of the collected samples has a significant impact on the model’s performance. In airport surface operation, the effectiveness of the conflict resolution model based on reinforcement learning can be affected due to the uneven sample quality resulting from the presence of diverse types of moving vehicles and complex operating conditions.
The above studies, however, lack consideration of human behavior; they primarily rely on predefined rules to control the aircraft taxiing process, such as limits on taxiing speed and the assignment of aircraft priority. For each conflict detected, aircraft priority is determined according to simple rules, such as 'first-come-first-served' or giving a departure aircraft priority over an arrival aircraft [16,17,18]. In such cases, the conflicting aircraft with lower priority has to deviate from its ideal path or slow down to ensure that it can safely enter the next area [19,20,21,22,23]. This kind of rule-based method can effectively guarantee safety during aircraft taxiing, but it depends on subjective assumptions and struggles to fully and accurately capture the characteristics of aircraft taxiing in the real world. Rules are often static and do not easily adapt to dynamic and rapidly changing environments, resulting in delays and inefficiencies in airport surface operations. An effective alternative uses real-world historical data to establish a conflict resolution model from a realistic perspective that is more acceptable to ATCOs [24]. Duggal proposed a random forest model to learn the controller's behavior in assigning aircraft priority, effectively resolving potential taxiing conflicts by instructing a low-priority aircraft to decelerate [25]; Pham integrated deep reinforcement learning with prior knowledge, employing Generative Adversarial Imitation Learning (GAIL) and the PPO algorithm to learn the distribution of an aircraft's taxi speed from historical taxi trajectories and provide acceleration recommendations at each time interval [26]. These studies have focused on learning the assignment of aircraft priority and the distribution of taxiing speeds in real-world scenarios. A critical aspect that warrants thoughtful consideration is the timing of aircraft release after waiting at designated points.
Controllers must consider factors such as surrounding traffic conditions and aircraft characteristics to allocate a reasonable and safe waiting period. Setting the waiting time too short compromises safety, a situation that is difficult for controllers to accept. Conversely, an overly long waiting time leads to inefficiency by prolonging taxiing times. Historical waiting time data can therefore serve as a reference, providing a foundational understanding of how controllers have approached similar situations.
Therefore, this paper introduces a conflict resolution framework incorporating prior knowledge, aiming to resolve short-term local conflicts at an intersection. It formulates the conflict resolution problem as a Markov Decision Process (MDP) and solves this MDP using MCTS combined with a neural network. The waiting times assigned by ATCOs, extracted from historical taxiing trajectories, are introduced as prior knowledge. By employing a neural network, we learn the waiting-time allocation strategy and integrate it into the Monte Carlo Tree Search algorithm. This integration effectively guides the action search process within MCTS, thereby refining the search space. The ultimate goal of this framework is to regulate aircraft movement by providing appropriate acceleration and waiting-time instructions at each time step, thus effectively preventing conflicts during taxiing.
The structure of this paper is organized as follows: Section 2 presents the description of the aircraft taxiing conflict problem at an intersection and establishes a conflict resolution model based on MDP. Section 3 gives the conflict resolution framework incorporating prior knowledge; Section 4 and Section 5 give the analysis of the simulation results and the final conclusions.

2. Problem Formulation

2.1. Problem Statement

A taxiing conflict occurs when two or more aircraft cross the same taxiway segment or intersection with a separation smaller than the minimal safe threshold, thereby disrupting orderly and safe taxiing. As illustrated in Figure 1, there are three different types of conflicts: intersection conflicts, head-on conflicts and rear-end conflicts. The taxiway system, connecting the ramp and runway areas, is a critical component of the airport infrastructure. Due to the intricate nature of taxiway networks, aircraft conflicts often arise at taxiway intersections. In this paper, we concentrate specifically on addressing conflicts that occur at taxiway intersections.
An intersection conflict arises when two or more aircraft occupy a given intersection at more or less the same time, creating a loss of separation. Figure 2 illustrates an intersection conflict scenario. Every aircraft in the scenario is represented by its position, velocity and heading angle. Aircraft $f_i$ and $f_j$ converge at an intersection from different directions; the positions and speeds of the two aircraft are $[p_i, v_i]$ and $[p_j, v_j]$. The line with the arrow denotes the aircraft heading; the green portion indicates the taxiway intersection area, which is assumed to be a square of $L \times L$; the yellow part is the aircraft protection area, a square of $B \times B$ accounting for the dimensions of the aircraft fuselage and wingspan. In actual operation, conflict resolution involves ATCOs issuing instructions to an aircraft to maintain a certain safety distance and continually monitoring the surface traffic until the conflict is completely resolved. Nevertheless, there are no specific minimum distance separation requirements between aircraft on the ground. In this work, we propose a concise definition of the conflict-free condition for an aircraft at the intersection based on the intersection's topological structure and the dimensional characteristics of the aircraft. It stipulates that two aircraft must not concurrently occupy the intersection zone. Specifically, by the time one of the aircraft reaches the conflict area, the other aircraft must have already passed the intersection:
$$\frac{SP_j + L + B}{v_j} < \frac{SP_i}{v_i} \qquad (1)$$
$$\frac{SP_i + L + B}{v_i} < \frac{SP_j}{v_j} \qquad (2)$$
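As a minimal illustration, the conflict-free condition above can be checked directly. The sketch below uses hypothetical variable names: sp_i and sp_j denote each aircraft's remaining distance to the intersection, and the defaults for L and B are the 80 m intersection and 50 m protection-zone dimensions given later in Section 4.

```python
def conflict_free(sp_i, v_i, sp_j, v_j, L=80.0, B=50.0):
    """Check the intersection conflict-free condition.

    sp_i, sp_j: distances (m) from each aircraft to the intersection entry;
    v_i, v_j: taxi speeds (m/s). The condition holds when one aircraft clears
    the intersection (travelling sp + L + B) before the other reaches it
    (travelling sp), in either order.
    """
    j_clears_before_i = (sp_j + L + B) / v_j < sp_i / v_i  # Equation (1)
    i_clears_before_j = (sp_i + L + B) / v_i < sp_j / v_j  # Equation (2)
    return j_clears_before_i or i_clears_before_j
```

For example, an aircraft 50 m from the intersection clears it well before one still 400 m away arrives at the same speed, whereas two aircraft at equal distance and speed fail both inequalities.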

2.2. MDP Formulation

Based on the above description, the conflict resolution process can be discretized into state transitions at each moment, representing a sequential decision problem, which we formulate as an MDP. This formalism models a simulated environment in which sequential decisions are made in order to receive a reward after a certain sequence of actions. However, the real conflict resolution process is complex and filled with uncertainties. To simplify our model, we made the following assumptions:
(1)
The impact of weather and trajectory deviation is negligible, and the next position of the aircraft only depends on the current state and the commands received from the ATCOs.
(2)
The flight deck is equipped with a speed display and suggestion device; the time used for voice communication between the controller and the pilot as well as the response time of the aircraft operation are not considered.
(3)
Aircraft prioritization is not considered in the taxiing process.
In intersection conflict scenarios, controllers regulate the aircraft's speed through a series of actions and interact with the surrounding environment to obtain a maximized long-term return. The interactive process can be modelled as a tuple $\langle S, P, A, R \rangle$ [27].
  • $S$ is the set of possible states of the environment (state space);
  • $A$ is the set of actions available in a state;
  • $P$ is the probability that an action performed in the current state $S_t$ leads to the next state $S_{t+1}$;
  • $R$ is the reward for reaching the next state $S_{t+1}$ with action $a$.
The key to solving an MDP is to find an execution strategy. A policy $\pi$ is a mapping from states $S$ to actions $A$, i.e., a probability distribution over actions in a given state:
$$\pi: S \rightarrow A \qquad (3)$$
The optimal policy $\pi^*$ is the policy that maximizes the expected reward:
$$\pi^* = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=0}^{T} R(S_t, a_t) \,\middle|\, \pi\right] \qquad (4)$$
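For intuition, the maximization over policies can be brute-forced on a tiny deterministic MDP by enumerating fixed action sequences and summing their rewards. This sketch is purely illustrative (all names are hypothetical); the paper solves the MDP with MCTS rather than enumeration, precisely because enumeration does not scale.

```python
import itertools

def best_action_sequence(step, s0, actions, horizon):
    """Brute-force the expected-reward maximization for a deterministic MDP.

    step(s, a) -> (next_state, reward) plays the role of P and R; a fixed
    action sequence stands in for a deterministic policy over a finite horizon.
    Returns the best sequence and its total reward.
    """
    best, best_return = None, float("-inf")
    for seq in itertools.product(actions, repeat=horizon):
        s, total = s0, 0.0
        for a in seq:
            s, r = step(s, a)
            total += r
        if total > best_return:
            best, best_return = seq, total
    return best, best_return
```

With a toy state (a scalar "speed"), actions in {-1, 0, 1} and a reward that penalizes distance from a target value, the enumerator recovers the obvious "accelerate toward the target" sequence.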

2.2.1. State Space

The state space comprises the necessary information at a given moment, including the flight type, position, speed and heading angle of the two aircraft, as well as the state information of the intersection area. The state vector of one aircraft can be denoted by $[v, x, y, \theta]$, where $v$ is the speed of the aircraft, $[x, y]$ its position and $\theta$ its heading angle in its own coordinate system; the taxiway intersection information is the central position coordinate $[x_c, y_c]$. The state at a given time is $S_t$, given by Equation (5).
$$S_t = [t, x_c, y_c, v_i, x_i, y_i, \theta_i, v_j, x_j, y_j, \theta_j] \qquad (5)$$

2.2.2. Action Space

Given that historical conflict resolution strategies often involve deceleration and waiting, the action set in this paper encompasses two categories of instructions: speed adjustment commands and wait-time instructions. The speed adjustment instructions comprise acceleration, deceleration and maintaining speed. When an aircraft transitions into a holding state, wait-time instructions become available. Furthermore, because conflicts occur between two or more aircraft, this study establishes that an instruction issued at each moment adjusts only one aircraft, while the remaining aircraft maintain their previous operational states. Consequently, every command must explicitly indicate the targeted aircraft. To encompass the full spectrum of possibilities, the complete action space includes the actions for both aircraft. The action space for a single aircraft is:
$$A_{flight} = \begin{cases} A_{speed} = [-1, 0, 1] \ \text{m/s}^2 \\ A_{halt} = [5, 10, 15, 20] \ \text{s} \end{cases} \qquad (6)$$
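Since every command must name its target aircraft, the full joint action space can be enumerated as (aircraft, action) pairs. The sketch below is a hypothetical encoding (identifiers and tuple layout are our own, not the paper's), with the wait-time actions exposed only when an aircraft is in a holding state, as described above.

```python
SPEED_ACTIONS = [-1, 0, 1]        # m/s^2: decelerate, maintain, accelerate
HALT_ACTIONS = [5, 10, 15, 20]    # s: wait-time instructions

def action_space(aircraft_ids=("f_i", "f_j"), holding=False):
    """Enumerate the (target aircraft, action) pairs available at one step.

    Each action is a tagged tuple ("speed", a) or ("halt", w); halt actions
    are included only when an aircraft is in a holding state.
    """
    actions = [(f, ("speed", a)) for f in aircraft_ids for a in SPEED_ACTIONS]
    if holding:
        actions += [(f, ("halt", w)) for f in aircraft_ids for w in HALT_ACTIONS]
    return actions
```

Without holding, two aircraft yield 6 joint actions; with wait-time instructions available, the space grows to 14.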

2.2.3. State Transition

The model assumes that a collision avoidance decision is made at every time step, selecting the best action from the action set. When the agent takes an action, a transition to the next state occurs. It is important to note that the transition probability in this paper is deterministic, because the MCTS algorithm selects a specific action. For the surface taxiing process, a kinematic model is employed to update the state information of each aircraft:
$$\begin{aligned} x' &= x + v\cos\theta\,\Delta t + \tfrac{1}{2} a\cos\theta\,(\Delta t)^2 \\ y' &= y + v\sin\theta\,\Delta t + \tfrac{1}{2} a\sin\theta\,(\Delta t)^2 \\ v' &= v + a\Delta t \end{aligned} \qquad (7)$$
where $v$ denotes the aircraft's current speed, $a$ the acceleration, $\theta$ the heading angle and $\Delta t$ the gap between two consecutive time steps. To control the aircraft more precisely, the model is discretized with $\Delta t = 1$ s, meaning that the aircraft's speed, heading and position are updated every 1 s.
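The kinematic update above translates directly into code; this is a sketch with a hypothetical function name, taking the heading angle in radians and defaulting to the paper's 1 s discretization.

```python
import math

def step_kinematics(x, y, v, theta, a, dt=1.0):
    """Advance one aircraft by one time step using the kinematic model.

    theta is the heading angle in radians; returns the updated (x, y, v).
    """
    x_new = x + v * math.cos(theta) * dt + 0.5 * a * math.cos(theta) * dt**2
    y_new = y + v * math.sin(theta) * dt + 0.5 * a * math.sin(theta) * dt**2
    v_new = v + a * dt
    return x_new, y_new, v_new
```

For instance, an aircraft taxiing along the x-axis at 10 m/s with a = 1 m/s² advances 10.5 m in one step and reaches 11 m/s.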

2.2.4. Reward Function

The reward function guides the agent towards learning in a direction that yields higher rewards. In this work, the reward function is constructed manually based on the model’s objectives. We propose a multi-objective reward function, derived from two pivotal aspects—safety and efficiency.
Safety is the assessment of the overall condition of the aircraft, judging whether the aircraft has successfully exited. Considering the scenario constraints described in the previous section, the terminal state of this MDP includes two different types of state:
  • When the two aircraft cannot maintain the non-conflict requirement, a conflict ensues and the state terminates instantaneously;
  • When both aircraft cross the intersection safely, it is regarded as the success state and the process terminates.
This safety objective can be handled with the following reward function:
$$R_{terminal} = \begin{cases} +1, & \text{success} \\ -1, & \text{collision} \\ 0, & \text{not done} \end{cases} \qquad (8)$$
Efficiency is an evaluation of the taxiing time and speed for each aircraft, primarily encompassing two aspects. Firstly, penalties are applied when an aircraft stops or taxis at a slow speed. Secondly, the stability of the aircraft’s acceleration during taxiing is taken into account, and penalties are imposed for fluctuations in aircraft acceleration.
$$R_{speed} = \begin{cases} \dfrac{V_n - 3.6}{3.6} \times 0.05, & \text{if } V_n \in [0, 3.6] \\ -0.01, & \text{if } a_n \neq a_{n-1} \end{cases} \qquad (9)$$
where  n  represents the current decision step;  V n  and  a n  denote the speed and acceleration in the current step.
Based on the above reward function, the total reward for the MDP is the sum of all individual rewards, shown as follows:
$$R = R_{terminal} + \sum_{i=0}^{2} R_{speed}^{f_i} \qquad (10)$$
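The reward terms above can be sketched as plain functions. This is an illustrative reading of the formulas, assuming the slow-speed penalty and the acceleration-fluctuation penalty are additive within one step and that the totals are summed over the aircraft involved; the status labels and signatures are our own.

```python
def terminal_reward(status):
    """R_terminal: +1 on success, -1 on collision, 0 otherwise."""
    return {"success": 1.0, "collision": -1.0, "not_done": 0.0}[status]

def speed_reward(v_n, a_n, a_prev):
    """R_speed: penalize slow taxiing and acceleration fluctuations.

    A slow-speed penalty applies when the current speed v_n falls in
    [0, 3.6]; a small additional penalty applies whenever the acceleration
    changes between consecutive decision steps.
    """
    r = 0.0
    if 0.0 <= v_n <= 3.6:
        r += (v_n - 3.6) / 3.6 * 0.05   # negative for v_n < 3.6
    if a_n != a_prev:
        r += -0.01
    return r

def total_reward(status, per_aircraft):
    """Total step reward: terminal term plus speed terms for each aircraft.

    per_aircraft is a list of (v_n, a_n, a_prev) tuples, one per aircraft.
    """
    return terminal_reward(status) + sum(speed_reward(*p) for p in per_aircraft)
```

A stopped aircraft (v = 0) incurs the maximum slow-speed penalty of -0.05 per step, which is what nudges the search away from unnecessary holding.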

3. Methodology

The Monte Carlo Tree Search algorithm is widely applicable to problems involving sequential decision making, including various games and real-world planning, optimization and control problems [28]. Compared with the high time cost of reinforcement learning, MCTS is an online heuristic search algorithm that does not require sample collection for training and possesses backward thinking capabilities. However, computation time is a crucial limiting factor for online search algorithms. The running time of MCTS will increase exponentially with an expansion of the action space and the simulation depth of the tree. In recent years, several researchers tackled this problem with great success by combining MCTS with machine learning [29]. Neural network architecture has emerged as a promising tool to alleviate human effort. The most remarkable accomplishment may be AlphaGo, which employs deep convolutional networks for modelling both value and policy functions, enabling action abstractions and rapid value estimations [30].
Therefore, we propose a conflict resolution framework that combines a deep neural network with MCTS. The overall structure of the framework is shown in Figure 3; it can be divided into three components:
  • The MDP model updates the state and provides feedback to MCTS;
  • MCTS is the main search component of the agent, used as an online MDP solver to generate optimal decisions;
  • The policy neural network is pre-trained on historical taxiing trajectories; it takes the “current state” explored by the MCTS as input and outputs an “action distribution” to guide the MCTS.

3.1. Policy Neural Network

To obtain a strategy aligned with the preferences of ATCOs based on their past experience, our approach incorporates a policy neural network. This network is designed to learn the decision-making processes employed by controllers during actual surface operations, thereby improving the efficiency and performance of the MCTS process. In real operations, ATCOs usually instruct an aircraft to slow down or wait after detecting a potential conflict. Due to the volatility of real ground radar trajectory data, acquiring precise and reliable aircraft acceleration data is challenging. As an alternative, we estimate the waiting time of conflicting aircraft near intersections to train the neural network. The specific steps for extracting the feature information of conflict scenes are as follows:
  • The taxiing trajectory data of each flight are preprocessed to eliminate singular points, match the taxiway map of the airport and obtain the actual path of every flight during taxiing.
  • Identify flights that encounter stop-and-go situations during the taxiing process, and confirm the intersection position where the flight is located when waiting, as well as the start time of the waiting.
  • Find other flights passing through the same intersection, compare their time differences and determine the conflict aircraft set.
  • Extract the feature information of the conflict scene when the aircraft began to wait, including the aircraft’s own flight number, model, speed, position and heading, as well as the center position of the intersection. Additionally, categorize the waiting time of the conflict aircraft into four intervals: 5 s, 10 s, 15 s and 20 s.
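The last two steps can be sketched as follows. Both functions are hypothetical: the paper does not specify how observed waiting times are snapped to the four intervals (here we take the nearest bin), and the field names of a "scene" record are our own placeholders, not the paper's schema.

```python
def waiting_time_category(wait_seconds, bins=(5, 10, 15, 20)):
    """Map an observed waiting time to the nearest of the four intervals.

    The paper categorizes waiting times into 5 s, 10 s, 15 s and 20 s
    classes; this sketch snaps an observation to the closest bin.
    """
    return min(bins, key=lambda b: abs(b - wait_seconds))

def conflict_scene_features(flight, intersection):
    """Assemble one training sample from a historical waiting scene.

    flight and intersection are plain dicts with illustrative field names:
    speed, position and heading of the waiting aircraft, plus the center
    position of the intersection.
    """
    return [flight["v"], flight["x"], flight["y"], flight["theta"],
            intersection["xc"], intersection["yc"]]
```

Each extracted scene thus becomes a fixed-length feature vector paired with one of four waiting-time labels, which is the supervised dataset used below.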
The neural network is trained to establish a mapping between real-world conflict scenarios and the waiting times assigned by controllers, allowing the network to output waiting times similar to those chosen under the ATCOs' strategy. In this study, we adopt the most classical neural network structure, the Multi-Layer Perceptron (MLP), which is widely used for supervised learning problems. An MLP is a feedforward artificial neural network consisting of fully connected neurons with nonlinear activation functions, organized in at least three layers. Our policy network consists of five fully connected layers: an input layer, three hidden layers and an output layer, as depicted in Figure 4. The input to the network is the feature information extracted from historical conflict scenarios, comprising the aircraft's position, velocity and heading, and the location of the intersection, as described in the feature extraction above. The output represents the waiting-time category for flights. Finally, a Softmax layer is added at the end of the network to output the probability distribution over the waiting-time categories. The activation functions for the input layer, hidden layers and output layer are ReLU, ReLU and Softmax, respectively, while the loss function is the Cross-Entropy loss. The accuracy of the network's predictions is evaluated on a separate test set, the details of which are provided in Section 4.
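The forward pass of such an MLP can be written in a few lines of NumPy. This is a structural sketch only: the layer widths used in the usage example are placeholders, not the paper's actual configuration (which is given in Table 1), and the weights here are random rather than trained.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())           # shift for numerical stability
    return e / e.sum()

def mlp_forward(x, weights, biases):
    """Forward pass of a fully connected policy MLP.

    weights[k] has shape (out_k, in_k); ReLU is applied after every layer
    except the last, whose logits go through a softmax to produce the
    probability distribution over the four waiting-time categories.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return softmax(weights[-1] @ h + biases[-1])
```

With six input features and four output classes, the result is always a valid probability vector, which is exactly the form MCTS consumes as a prior.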

3.2. MCTS Combined with PNN

This section provides an in-depth discussion of the integration of the MCTS and the policy neural network (PNN). As an iterative algorithm, MCTS is employed to probe the state space and identify the most promising action. It searches combinatorial spaces represented by trees. The problem tree is built asymmetrically, with the main focus on the most promising branches. To incorporate prior knowledge, the trained policy neural network is integrated into the MCTS process, guiding the construction of the search tree through probabilistic evaluation. This integration accelerates the search by limiting the available set of actions to the most promising ones. In such search trees, nodes denote states of the problem, whereas edges denote transitions (actions) from one state to another. In our MCTS-PNN model, every MCTS state transition is accompanied by a policy neural network evaluation, and the network's output actions and probabilities are saved to the edge. Therefore, each edge $(s, a)$ within the search tree stores its visit count $N(s, a)$, action value $Q(s, a)$ and prior probability $P(s, a)$.
Given the current time $T_0$, the initial state $S_0$ corresponds to the tree's root node. The tree is traversed via multiple iterations, each originating from the root node corresponding to the current state under consideration. As shown in Figure 5, each iteration of the MCTS algorithm has four steps: selection, expansion, simulation and backpropagation, elaborated in the following:
(1)
Selection: Starting from the root node, traverse down the child nodes already stored in the tree, choosing the most promising node to explore using the predefined tree policy. At this stage, the tree policy should ensure an appropriate balance between exploitation (selecting nodes with high rewards) and exploration (selecting nodes that have not been chosen before). The classic Upper Confidence Bound applied to Trees (UCT) is introduced to handle the exploitation–exploration trade-off. Each node has an associated UCT value, and during selection the child node with the highest value is always chosen [31]. We integrate the PNN into MCTS and modify the UCT formula as follows:
$$UCT(v, v') = \frac{Q(v')}{N(v')} + C \times P(v') \times \frac{\sqrt{N(v)}}{1 + N(v')} \qquad (11)$$
where $v$ is the current node, $v'$ is a child node of the current node, $Q(v')$ denotes the total reward of all playouts that passed through this child node, $N(v')$ is the number of visits to the child node, $P(v')$ is the prior probability of the child node and $C$ is a constant parameter controlling the balance between exploitation and exploration: the smaller the value, the more emphasis on exploitation; the larger the value, the more emphasis on exploration.
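The improved UCT rule above is a one-liner in code. This sketch (function names hypothetical) defines the exploitation term $Q/N$ as 0 for unvisited children, a common convention the paper does not spell out.

```python
import math

def uct_value(q_child, n_child, p_child, n_parent, c=1.0):
    """PNN-weighted UCT: mean playout value plus a prior-scaled bonus.

    q_child is the total reward accumulated through the child, n_child its
    visit count, p_child the prior probability from the policy network and
    n_parent the parent's visit count.
    """
    exploit = q_child / n_child if n_child > 0 else 0.0
    explore = c * p_child * math.sqrt(n_parent) / (1 + n_child)
    return exploit + explore

def select_child(children, n_parent, c=1.0):
    """Pick the (q, n, p, action) tuple with the highest UCT value."""
    return max(children,
               key=lambda ch: uct_value(ch[0], ch[1], ch[2], n_parent, c))
```

Note how the bonus term shrinks as a child accumulates visits, so a high-prior but often-visited child eventually yields to less-explored siblings.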
(2)
Expansion: This occurs when the currently selected node is in a non-terminal state and has not been visited before. In this case, at least one new child node can be added to the tree under its parent node (the previous state). The specific expansion operation relies on the PNN and the action set. During each expansion step, the state of the leaf node is processed through the PNN to derive a probability distribution over all possible child nodes. This distribution is subsequently used to determine the UCT value of the node during the action selection phase. Since the PNN only provides the action evaluation for waiting time, the expansion of nodes is divided into two situations. When the aircraft's speed drops to 0, signaling its transition into a holding state, the PNN processes the current node to ascertain the waiting-time actions and their corresponding probabilities (P ≤ 1); simultaneously, the probability of speed actions is uniformly set to 1 (P = 1), discouraging an aircraft from coming to a halt in the conflict area. Otherwise, the node is not passed to the PNN for evaluation, and child nodes are created only from the set of speed actions.
(3)
Simulation: After adding the new nodes to the tree, a node is randomly selected following a default policy for simulation, which generates a new state. The simulation continues in the new state using the random policy until reaching a terminal state with a final reward. In the context of this paper, the aircraft taxiing process is simulated using the aircraft kinematics model, the terminal state of MDP formulation is used to determine if the simulation process has ended and the final reward value is calculated based on the reward function.
(4)
Backpropagation: When simulating the whole process to a terminal state, the evaluation value will be retroactively propagated to all nodes along the path. This process involves updating the visit count and action value of each node. Specifically, the visit count of each node increases by 1, and the average reward value of the node is computed based on the final reward during the simulation and the number of visits accumulated. These updated parameters are then utilized to calculate and update the UCT value of each node.
Each iteration consists of the four steps outlined above. During each time step, the Monte Carlo tree repeats sufficient iterations to select the optimal action. The subsequent state of the two aircraft is then updated based on the dynamical model and returned to the agent as the next environmental state. Consequently, the process of conflict resolution can be represented as a sequence of actions, each accompanied by its execution time. At each execution time, the MCTS algorithm chooses an action to be executed by one of the aircraft until the conflict is successfully resolved.
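The four-step cycle can be condensed into a minimal, generic skeleton. This is a sketch under strong simplifications, not the paper's implementation: `priors` stands in for the PNN evaluation, the MDP interfaces (`step`, `legal_actions`, `is_terminal`) are abstracted as callbacks, and the two-case expansion logic for speed versus waiting actions is collapsed into a single prior lookup.

```python
import math, random

class Node:
    def __init__(self, state, parent=None, prior=1.0):
        self.state, self.parent, self.prior = state, parent, prior
        self.children, self.q, self.n = {}, 0.0, 0

def mcts_iteration(root, step, legal_actions, priors, is_terminal, c=1.0):
    """One selection/expansion/simulation/backpropagation cycle.

    step(state, action) -> (next_state, reward); priors(state) -> {action: p}
    stands in for the PNN; is_terminal(state) -> bool.
    """
    node = root
    # 1. Selection: descend via the prior-weighted UCT rule.
    while node.children and not is_terminal(node.state):
        node = max(node.children.values(),
                   key=lambda ch: (ch.q / ch.n if ch.n else 0.0)
                   + c * ch.prior * math.sqrt(node.n) / (1 + ch.n))
    # 2. Expansion: create children with priors, then pick one.
    if not is_terminal(node.state):
        p = priors(node.state)
        for a in legal_actions(node.state):
            node.children[a] = Node(step(node.state, a)[0], node, p.get(a, 1.0))
        node = random.choice(list(node.children.values()))
    # 3. Simulation: random rollout from the new node to a terminal state.
    state, reward = node.state, 0.0
    while not is_terminal(state):
        state, r = step(state, random.choice(legal_actions(state)))
        reward += r
    # 4. Backpropagation: update visit counts and values along the path.
    while node is not None:
        node.n += 1
        node.q += reward
        node = node.parent
    return reward
```

Running many iterations on a toy chain MDP (advance by 1 or 2, terminal at 3 with reward 1) shows the statistics accumulating at the root, after which the most-visited child would be returned as the chosen action.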

4. Experiment

4.1. Experimental Setup

Our experimental data are sourced from aircraft surface movement trajectory data at Guangzhou Baiyun Airport from February to October 2020. This dataset encompasses the aircraft taxiing trajectories and the geographical coordinates (longitude and latitude) of all nodes within the airport's road network structure. Following the actual airport layout and the description of conflict areas and aircraft protection zones in Section 2, an 80 m × 80 m square is set up for each taxiway intersection conflict area, and a 50 m × 50 m protection zone is set up for each aircraft.
Based on the feature extraction method in Section 3, the features of each waiting scene are extracted at the moment when an aircraft in the conflict aircraft set starts to wait. A total of 1880 historical aircraft waiting scenes were extracted as the neural network dataset, with the training and test sets split in a 4:1 ratio. As for the design of the scenes for MCTS simulation, we define the scenario in which two flights have just pulled onto the taxiways containing the conflicted intersection as the intersection conflict scene. A total of 1000 intersection conflict scenes were generated for simulation.
Moreover, from the historical trajectory data we confirmed that the average taxiing time in real operation is 140.4 s, with a maximum of 525 s. The average taxiing speed is 20.7 km/h, with a maximum of 40.8 km/h. The average distance between conflicting aircraft is 232 m, with a maximum of 717 m. For the simulation model, in line with airport operation requirements, we set the maximum taxi speed to 55 km/h; at turning points, the maximum taxi speed is set to 18.52 km/h to ensure safety and compliance [32].
All experiments were run on an HP Z840 workstation with an Intel(R) Core(TM) i5-7500 CPU @ 3.40 GHz, 32 GB of RAM and the Windows 10 operating system. The software environment was Python 3.7, with PyCharm (2021.1) as the integrated development environment (IDE).

4.2. Experiment Results

4.2.1. Training Results of PNN

To narrow the search space, a PNN is trained on the state–action pairs and outputs a prior probability for each transition. The Multi-Layer Perceptron (MLP) model is built in Python with the TensorFlow library and trained with the Adam optimizer for 1000 epochs, using a batch size of 32 and a learning rate of 0.001. Detailed parameters are presented in Table 1.
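As a framework-agnostic illustration, the forward pass of an MLP with the architecture in Table 1 (layer widths 9–9–9–9–4, ReLU hidden activations, a softmax over the four waiting-time categories) can be sketched in NumPy as follows. The random initialization is illustrative; the paper itself trains an equivalent TensorFlow model with Adam, cross-entropy loss, batch size 32 and learning rate 0.001.

```python
import numpy as np

# Layer widths from Table 1: 9 input features, three hidden layers of 9
# units, and 4 output classes (the waiting-time categories).
LAYER_SIZES = [9, 9, 9, 9, 4]

rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]
biases = [np.zeros(n) for n in LAYER_SIZES[1:]]

def forward(x):
    # ReLU hidden layers, then a softmax over the 4 output categories
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)
    logits = x @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

The resulting probability vector over the four categories is what the MCTS uses as the prior distribution during node expansion.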
The cross-entropy loss during training of the supervised policy network is depicted in Figure 6. The horizontal axis is the number of training steps and the vertical axis is the loss value; the orange curve shows the loss on the test set and the blue curve the loss on the training set. The loss gradually decreased as the number of iterations increased, indicating that the network progressively learned to imitate the controller's allocation of waiting time. On the held-out test set, the trained model predicted the correct aircraft waiting-time category for 70.1% of the samples.

4.2.2. Performance of MCTS Combined with PNN

(1) Parameter Settings
The efficacy of the MCTS algorithm depends significantly on two parameters: the number of iterations performed each time a decision is required, and the search depth. In this experiment, different values are tested to assess their impact on performance: the number of iterations ranges from 100 to 600, and the search depth is limited to 1, 2, 3 or 4. The candidate parameter values are listed in Table 2; based on the resulting performance comparison, the optimal parameters are chosen for the subsequent experiments.
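The sweep over the parameter grid in Table 2 can be sketched as follows. The `run_solver` callback is a placeholder standing in for running the MCTS-PNN solver over the simulated conflict scenes; its signature is an assumption.

```python
from itertools import product

# Candidate values from Table 2
ITERATIONS = (100, 200, 300, 400, 500, 600)
DEPTHS = (1, 2, 3, 4)

def sweep(run_solver):
    # run_solver(n_iter, depth) -> performance score (e.g. success rate);
    # the sweep evaluates every (iterations, depth) combination
    return {(n, d): run_solver(n, d) for n, d in product(ITERATIONS, DEPTHS)}
```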
(2) Evaluation metrics
From a practical perspective, the taxiing conflict resolution process should be safe and efficient. This study primarily assesses the performance of the Monte Carlo tree algorithm combined with PNN by considering metrics such as the success rate of conflict resolution, average computation time and aircraft taxiing time in the simulation environment.
The success rate (SR) of conflict resolution indicates the percentage of runs in which the conflicting aircraft successfully pass the intersection:
$$SR = \frac{n_{solved}}{n_{total}} \times 100\%$$
where $n_{solved}$ is the number of scenarios successfully resolved and $n_{total}$ is the total number of scenarios.
The average running time (ART) refers to the mean computation time at each decision step:
$$ART = \frac{1}{n}\sum_{i=1}^{n} \frac{T^{i}_{single\ scene}}{N^{i}_{decision\ step}}, \qquad n = n_{solved}$$
where $T^{i}_{single\ scene}$ is the computation time for scenario $i$ and $N^{i}_{decision\ step}$ is the number of decision steps in scenario $i$.
Efficiency is measured by the taxiing time needed to resolve the conflict with MCTS-PNN. We define efficiency as the taxiing time reduction rate, obtained as the average time difference (ATD) between the taxiing time in actual ground operation and the taxiing time obtained with the simulation model:
$$ATD\% = \frac{1}{n}\sum_{i=1}^{n} \frac{T_{h}^{i} - T_{s}^{i}}{T_{h}^{i}} \times 100\%, \qquad n = n_{solved}$$
where $T_{h}^{i}$ is the taxiing time in real surface operation and $T_{s}^{i}$ is the taxiing time in our simulation model.
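The three metrics above can be computed from per-scenario records as in the following sketch; the record field names are assumptions for illustration.

```python
def evaluate(scenarios):
    """Compute SR, ART and ATD% from per-scenario result records."""
    solved = [s for s in scenarios if s["solved"]]
    n = len(solved)
    # success rate: share of scenarios successfully resolved (%)
    sr = 100.0 * n / len(scenarios)
    # average running time: mean computation time per decision step
    art = sum(s["compute_time"] / s["decision_steps"] for s in solved) / n
    # average time difference: relative taxi-time reduction vs. real ops (%)
    atd = 100.0 * sum((s["t_real"] - s["t_sim"]) / s["t_real"]
                      for s in solved) / n
    return sr, art, atd
```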
(3) Result analysis
Figure 7 shows how the success rate varies with the number of iterations and the simulation depth. Increasing the simulation depth improves the success rate: greater depths allow the future movement of the aircraft to be simulated over a longer period, enabling proactive measures to avert conflicts. However, as the simulation depth increases further, the success rate converges to a stable range; for instance, increasing the search depth to 4 yields no significant additional improvement. Likewise, the success rate initially increases with the number of iterations but eventually plateaus. We also observed that a greater simulation depth leads to an earlier stabilization of the success rate: when the simulation depth exceeds 3, the success rate begins to stabilize at around 200 iterations.
As Figure 8 shows, the average running time grows with both the number of iterations and the simulation depth, since the number of strategy evaluations increases exponentially with depth. Considering the success rate and running time together, the configuration with N = 100 iterations and depth D = 4 achieves a higher success rate than N = 600 with D = 1, while its runtime is significantly shorter than that of N = 600. These findings suggest that the model's performance depends less on the number of iterations and more on the search depth.
Table 3 reveals an apparent correlation between taxiing time and simulation depth. Comparing the taxiing process of an aircraft at different depths within the same scenario, we found that deeper simulations result in longer taxiing durations. At greater depth, the model's forward prediction encourages a proactive safety strategy: anticipating potential conflicts, aircraft tend to decelerate and wait earlier, which increases the overall taxiing time. Conversely, at shallower depths the aircraft decelerate later and wait less often, which also explains the lower success rate observed at those depths.
Given real-time constraints, performing extensive calculations to obtain an optimal solution is impractical. Synthesizing these experiments, setting the number of simulations to 200 and the search depth to 3 yields satisfactory performance in terms of success rate, average taxiing time and average onboard computation time. Specifically, the success rate achieved was 96.8%, and the taxiing time required to resolve intersection conflicts decreased by 42.77% compared with the real trajectory data.

4.3. Discussion

To learn prior knowledge from ATCOs in real airport surface operation, we trained the neural network parameters using supervised learning. The training accuracy reflects how closely the waiting time allocation strategy output by the policy network matches the strategy actually employed by ATCOs. Compared with imitation learning, supervised learning maps data directly to behavior, updating the model by the error between predicted and labeled values; this not only simplifies implementation but also improves training efficiency and the degree of imitation. Combined with prior knowledge, the MCTS algorithm achieves a high success rate in conflict resolution while ensuring a certain level of safety. By contrast, most existing studies rely predominantly on predefined rules for conflict avoidance, while others integrate conflict resolution modules into path planning without a specific statistical analysis of conflict resolution. Zhou's research focused on conflict resolution at intersections, presenting a DQN algorithm that achieved a 70% success rate while reducing taxiing time through intersections by 13.16% [14]. Compared with Zhou's method, our approach demonstrated even more promising results. Additionally, our method accounts for the controller's waiting time allocation habits, making it more adaptable to real-world airport surface operations.

5. Conclusions

To address intersection conflicts during aircraft taxiing in airport surface operations, we present a conflict resolution framework that incorporates prior knowledge by combining MCTS with a neural network. The waiting time allocation strategy employed by ATCOs in historical conflict scenes is extracted as prior knowledge and used to train the neural network. Combined with this prior knowledge, the PNN-MCTS algorithm resolves conflicts with a success rate of 96.8%.
The deep neural network provides a heuristic to reduce the width of the search space, while the forward-looking nature of MCTS compensates for the deep network's reliance on historical data alone. Compared with traditional path planning methods, the PNN-MCTS algorithm requires less decision-making time, making it suitable for addressing short-term, local conflicts. Unlike rule-based methods that command an aircraft to wait before entering the conflict zone, it minimizes unnecessary waiting at intersections by adjusting speeds. In contrast to reinforcement learning algorithms, it does not require extensive model training time, allowing an online search for the best resolution strategy.
This work still has some limitations. It only deals with local conflicts at intersections, without considering whether conflict adjustments will trigger subsequent conflicts. In the future, we will integrate this intelligent conflict resolution method into overall airport ground operations. This extension will involve seamless air–ground integration, enhanced information sharing among airlines, airports and Air Traffic Controllers, improved situational awareness and larger safety margins for airport operations. Moreover, at the practical level, this paper considers conflict avoidance under ideal conditions and does not account for weather, differences in application scenarios, controller communication delays, etc. Incorporating the approach into real operations is crucial to ensuring its practical utility and effectiveness. One avenue is to integrate our method with traditional rule-based approaches and advanced airport traffic management infrastructure, thereby bolstering the safety of aircraft taxiing operations. This requires careful planning, collaboration and ongoing refinement to ensure successful integration into real-world scenarios.

Author Contributions

Conceptualization, D.S.; methodology, H.C.; software, H.C.; validation, H.C. and T.Z.; formal analysis, D.S.; investigation, D.S.; resources, D.S.; data curation, H.C.; writing—original draft preparation, H.C.; writing—review and editing, H.C.; visualization, H.C.; supervision, D.S., H.C. and T.Z.; project administration, D.S.; funding acquisition, D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Nanjing University of Aeronautics and Astronautics Research and Practice Innovation Program funded projects, xcxjh20220710 and xcxjh20220730.

Data Availability Statement

The data are not publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. IATA 2020 Air Passenger Numbers to Recover in 2024. Available online: https://www.iata.org/en/pressroom/2022-releases/2022-03-01-01/ (accessed on 30 September 2022).
  2. Vaddi, V.; Cheng, V.; Kwan, J.; Wiraatmadja, S.; Lozito, S.; Jung, Y. Air-ground integrated concept for surface conflict detection and resolution. In Proceedings of the 12th AIAA Aviation Technology, Integration, and Operations (ATIO) Conference and 14th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Indianapolis, Indiana, 17–19 September 2012; p. 5645. [Google Scholar]
  3. Yazgan, E.; Sert, E.; Şimşek, D. Overview of studies on the cognitive workload of the air traffic controller. Int. J. Aviat. Sci. Technol. 2021, 2, 28–36. [Google Scholar] [CrossRef]
  4. IBCA-2019-03; Research and Practice Report on Smart Airport Development. Department of Airports, Civil Aviation Administration of China: Beijing, China, 2019.
  5. National Airspace System (NAS) Subsystem Level Specification for Airport Surface Detection Equipment—Model X (ASDE-X), FAA-E-2942, Version 1.1; Department of Transportation, Federal Aviation Administration: Washington, DC, USA, 2001.
  6. Scardina, J. Overview of the FAA ADS-B Link Decision, Office of System Architecture and Investment Analysis; Federal Aviation Administration: Washington, DC, USA, 2002. [Google Scholar]
  7. ICAO. Advanced Surface Movement Guidance and Control Systems (A-SMGCS) Manual; International Civil Aviation Organization: Montreal, QC, Canada, 2004. [Google Scholar]
  8. Vaddi, V.; Sweriduk, G.; Cheng, V.; Kwan, J.; Lin, V.; Nguyen, J. Concept and Requirements for Airport Surface Conflict Detection and Resolution. In Proceedings of the 11th AIAA Aviation Technology, Integration, and Operations (ATIO) Conference, including the AIAA Balloon Systems Conference and 19th AIAA Lighter-Than, Virginia Beach, VA, USA, 20–22 September 2011; p. 7050. [Google Scholar]
  9. Evertse, C.; Visser, H.G. Real-time airport surface movement planning: Minimizing aircraft emissions. Transp. Res. Part C Emerg. Technol. 2017, 79, 224–241. [Google Scholar] [CrossRef]
  10. Clare, G.; Richards, A.G. Optimization of Taxiway Routing and Runway Scheduling. IEEE Trans. Intell. Transp. Syst. 2011, 12, 1000–1013. [Google Scholar] [CrossRef]
  11. Beke, L.; Uribe, L.; Lara, A.; Coello Coello, C.A.; Weiszer, M.; Burke, E.K.; Chen, J. Routing and Scheduling in Multigraphs with Time Constraints: A Memetic Approach for Airport Ground Movement. IEEE Trans. Evol. Comput. 2023, 1. [Google Scholar] [CrossRef]
  12. Weiszer, M.; Burke, E.K.; Chen, J. Multi-objective routing and scheduling for airport ground movement. Transp. Res. Part C Emerg. Technol. 2020, 119, 102734. [Google Scholar] [CrossRef]
  13. Ma, J.; Zhou, J.; Liang, M.; Delahaye, D. Data-driven trajectory-based analysis and optimization of airport surface movement. Transp. Res. Part C Emerg. Technol. 2022, 145, 103902. [Google Scholar] [CrossRef]
  14. Zhou, Y.; Liu, W. Taxiing Speed Intelligent Management of Aircraft Based on DQN for A-SMGCS. J. Phys. Conf. Ser. 2019, 1345, 042015. [Google Scholar] [CrossRef]
  15. Tien, S.L.; Tang, H.; Kirk, D.; Vargo, E. Deep Reinforcement Learning Applied to Airport Surface Movement Planning. In Proceedings of the 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC), San Diego, CA, USA, 8–12 September 2019; pp. 1–8. [Google Scholar]
  16. Ali, H.; Pham, D.T.; Alam, S. A Deep Reinforcement Learning Approach for Airport Departure Metering Under Spatial–Temporal Airside Interactions. IEEE Trans. Intell. Transp. Syst. 2022, 23, 23933–23950. [Google Scholar] [CrossRef]
  17. Zhao, N.; Li, N.; Sun, Y.; Zhang, L. Research on Aircraft Surface Taxi Path Planning and Conflict Detection and Resolution. J. Adv. Transp. 2021, 2021, 9951206. [Google Scholar] [CrossRef]
  18. Jiang, Y.; Liu, Z.; Hu, Z. A Priority-Based Conflict Resolution Strategy for Airport Surface Traffic Considering Suboptimal Alternative Paths. IEEE Access 2020, 9, 606–617. [Google Scholar] [CrossRef]
  19. Zhou, H.; Jiang, X. Research on Taxiway Path Optimization based on Conflict Detection. PLoS ONE 2015, 10, e0134522. [Google Scholar]
  20. Bode, S.; Feuerle, T.; Hecker, P. Local conflict resolution for automated taxi operations. In Proceedings of the 1st International Conference on Application and Theory of Automation in Command and Control Systems, Barcelona, Spain, 26–27 May 2011; pp. 60–67. [Google Scholar]
  21. Smeltink, J.W.; Soomer, M. An Optimisation Model for Airport Taxi Scheduling. Inf. J. Comput. Oper. Res. 2004, 11, 1. [Google Scholar]
  22. Luo, X.; Tang, Y.; Wu, H.; He, D. Real-time adjustment strategy for conflict-free taxiing route of A-SMGCS aircraft on airport surface. In Proceedings of the IEEE International Conference on Mechatronics and Automation (ICMA), Beijing, China, 2–5 August 2015; pp. 929–934. [Google Scholar]
  23. Zhu, X.P.; Lu, N. Airport Taxiway Head-on Conflict Prediction and Avoidance in ASMGCS. Appl. Mech. Mater. 2014, 574, 621–627. [Google Scholar] [CrossRef]
  24. Bastas, A.; Vouros, G. Data-driven prediction of Air Traffic Controllers reactions to resolving conflicts. Inf. Sci. 2022, 613, 763–785. [Google Scholar] [CrossRef]
  25. Duggal, V.; Tran, T.N.; Alam, S. Modelling Aircraft Priority Assignment by Air Traffic Controllers During Taxiing Conflicts Using Machine Learning. In Proceedings of the 2022 Winter Simulation Conference (WSC), Singapore, 11–14 December 2022; pp. 394–405. [Google Scholar]
  26. Pham, D.T.; Tran, T.N.; Alam, S.; Duong, V.N. A Generative Adversarial Imitation Learning Approach for Realistic Aircraft Taxi-Speed Modeling. IEEE Trans. Intell. Transp. Syst. 2022, 23, 2509–2522. [Google Scholar] [CrossRef]
  27. Lizotte, D.J.; Laber, E.B. Multi-objective Markov decision processes for data-driven decision support. J. Mach. Learn. Res. 2016, 17, 7378–7405. [Google Scholar]
  28. Browne, C.B.; Powley, E.; Whitehouse, D. A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 2012, 4, 1–43. [Google Scholar] [CrossRef]
  29. Świechowski, M.; Godlewski, K.; Sawicki, B.; Mańdziuk, J. Monte Carlo Tree Search: A review of recent modifications and applications. Artif. Intell. Rev. 2022, 56, 2497–2562. [Google Scholar] [CrossRef]
  30. Silver, D.; Huang, A.; Maddison, C.J. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484. [Google Scholar] [CrossRef]
  31. Kocsis, L.; Szepesvári, C. Bandit based Monte Carlo planning. Machine Learning: ECML 2006. In Proceedings of the 17th European Conference on Machine Learning, Berlin, Germany, 18–22 September 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 282–293. [Google Scholar]
  32. 737/600/700/800/900 Flight Crew Training Manual. Available online: https://pdf4pro.com/view/737-600-700-800-900-flight-crew-training-manual-6f7587.html (accessed on 1 April 1999).
Figure 1. Taxiing conflict types.
Figure 2. Intersection conflict scenario.
Figure 3. Conflict resolution framework.
Figure 4. Construction of neural network.
Figure 5. Monte Carlo Tree Search phases.
Figure 6. Training loss of the PNN.
Figure 7. Success rate of conflict resolution.
Figure 8. Average running time of each decision step.
Table 1. Neural Network Parameters.

Model Parameters                  | Parameters' Value
Number of layers in the network   | 5
Number of nodes in each layer     | (9, 9, 9, 9, 4)
Activation function               | ReLU
Loss function                     | Cross-entropy
Number of training epochs         | 1000
Network optimizer                 | Adam
Batch size                        | 32
Learning rate                     | 1 × 10⁻³
Table 2. MCTS parameters.

Algorithm Parameters   | Parameters' Value
Number of iterations   | (100, 200, 300, 400, 500, 600)
Search depth           | (1, 2, 3, 4)
Table 3. Percentage of taxiing time difference.

ATD%    | D = 1  | D = 2  | D = 3  | D = 4
N = 100 | 45.62% | 43.91% | 42.5%  | 36.24%
N = 200 | 47.83% | 45.84% | 42.77% | 36.36%
N = 300 | 46.02% | 42.82% | 39.7%  | 33.21%
N = 400 | 43.9%  | 41.43% | 37.19% | 31.48%
N = 500 | 42.12% | 39.08% | 35.35% | 29.17%
N = 600 | 41.85% | 39.11% | 35.23% | 29.98%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Sui, D.; Chen, H.; Zhou, T. A Conflict Resolution Strategy at a Taxiway Intersection by Combining a Monte Carlo Tree Search with Prior Knowledge. Aerospace 2023, 10, 914. https://doi.org/10.3390/aerospace10110914
