A Survey on Artificial Intelligence (AI) and eXplainable AI in Air Traffic Management: Current Trends and Development with Future Research Trajectory

Air Traffic Management (ATM) will become more complex in the coming decades due to the growth and increasing complexity of aviation, and it has to be improved in order to maintain aviation safety. It is agreed that, without significant improvement in this domain, the safety objectives defined by international organisations cannot be achieved, and a risk of more incidents and accidents is envisaged. Nowadays, computer science plays a major role in data management and decision making in ATM. Nonetheless, Artificial Intelligence (AI), one of the most researched topics in computer science, has not quite reached end users in the ATM domain. In this paper, we analyse the state of the art regarding the usefulness of AI within the aviation/ATM domain. This includes a review of the last decade of research on AI in ATM, the extraction of relevant trends and features, and the extraction of representative dimensions. We analyse how general and ATM eXplainable Artificial Intelligence (XAI) work, examining where and why XAI is needed, how it is currently provided, and its limitations, then synthesise the findings into a conceptual framework, named the DPP (Descriptive, Predictive, Prescriptive) model, and provide an example of its application in a scenario in 2030. We conclude that AI systems within ATM need further research for their acceptance by end-users. The development of appropriate XAI methods, including their validation by the appropriate authorities and end-users, is a key issue that needs to be addressed.


Air Traffic Management
Air Traffic Management (ATM) is a vast and complex domain [1] encompassing all activities carried out to ensure the safety and fluidity of air traffic. In a nutshell, ATM aims at efficiently managing and maximising the use of the different resources available to it (e.g., the airspace and its subdivisions such as the sectors, see Figure 1; the air routes, see Figure 2; the airport; the runways) by the users of these resources (e.g., aircraft, airlines), in any time-frame of their use of the resources (i.e., in the taxi phase at the airport, or any [...]). Nowadays, computer science plays a major role in data management and decision making in ATM. Even if humans remain the main agents, computer science is likely to play an even more relevant part in the future with increasing air traffic (notwithstanding the current COVID situation [5]) and its growing complexity, notably with the insertion of new aerial vehicles such as drones and e-VTOL into the airspace [6].
Artificial Intelligence (AI), being one of the most researched topics in computer science, should be part of the picture.

AI and XAI for ATM
The term 'Artificial Intelligence' was first used in 1956 for the "Dartmouth Summer Research Project on Artificial Intelligence", and generally refers to any machine that exhibits traits associated with a human mind, such as learning and problem-solving. Since then, the discipline has gone through several 'summers', periods of strong interest, and 'winters', periods of disinterest associated with scepticism [8]. In particular, AI experienced a new bloom during the 2010s, boosted by increasing access to massive volumes of data and by the discovery that graphics card processors are highly efficient at accelerating the computation of learning algorithms [9]. This bloom materialised in some significant public successes that boosted funding, such as IBM's Watson AI winning the television game show Jeopardy against two of its champions [10], Google X having an AI recognise cats in videos [11], or, later in the decade, AlphaGo (and its successor AlphaGo Zero) beating one of the world's top Go players [12]. The field of eXplainable Artificial Intelligence (XAI), the methods and techniques enabling humans to understand (i) the AI algorithm (i.e., global explanation or interpretability) or (ii) its solutions (i.e., local explanation or justification), is strongly linked to the systems it explains; it followed the same tendency and is currently in its third generation, according to Muller et al. [13]. Artificial Intelligence in Air Traffic Management roughly followed the same tendencies, with some delay. As the following shows, over the last decade AI in ATM roughly evolved from systems used to optimise the traffic to systems that predict various objects, such as 4D trajectories.
Despite historical research work on AI for ATM, a researcher facing a problem in the ATM domain will, to our knowledge, find no general guide presenting how to resolve this problem (or similar ones), nor the limitations of current work, which is detrimental to the domain and its evolution. Some reviews exist in the domain, but they are specialised in one category of AI algorithms (e.g., meta-heuristics [14,15], multi-agent systems [16]), focus on other aspects (e.g., communications [17]), or are outdated [18], and they focus more on the techniques than on their integration for end users.
Unfortunately, despite several research works already carried out on AI for the ATM domain, AI has not become 'fully operational', nor has it brought any benefits to end users. The slow progress of AI in the ATM domain can be explained by the fact that ATM is a critical domain with lives at stake, where safety is the topmost priority. Historically, safety in ATM has been achieved with humans in the loop, in particular (but not restricted to) the Air Traffic Controller (ATCO), and, as contended by the authors, it will most likely evolve through the design of tightly human-centered systems, which requires those systems to be understandable by end-users and to adapt to their characteristics (mental and physical) and to their psychological state. For example, if the operator's workload exceeds their cognitive capacity, or if some kind of incapacitation occurs, their cognitive state could be detected automatically by the system, and this assessment could be used to execute actions autonomously along an escalating scale of automation (i.e., adaptive automation) [19,20]. In other domains, such as healthcare and criminal justice, the increasing interest in AI to support high-consequence human decisions has spurred the field of XAI and of User-Centric eXplainable Artificial Intelligence (UCXAI) [21]. User Centered Design (UCD) refers to the methods employed when designing systems for end-users to validate novel algorithms, working methods, or interaction techniques [22][23][24][25]. This primordial aspect is yet to be fully assessed in ATM, but interest is growing [26,27].
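To make the idea of an escalating scale of automation more concrete, the following is a minimal sketch, not a method taken from [19,20]. It assumes a hypothetical normalised workload estimate in [0, 1], a hypothetical four-level automation scale, and arbitrary thresholds, and simply maps the detected operator state to a level on that scale.

```python
from enum import IntEnum


class AutomationLevel(IntEnum):
    """Hypothetical escalating scale of automation (illustrative only)."""
    MANUAL = 0       # operator performs the task
    ADVISORY = 1     # system suggests actions
    CONSENT = 2      # system acts once the operator approves
    AUTONOMOUS = 3   # system acts on its own


# Hypothetical thresholds on a normalised workload estimate, highest first.
THRESHOLDS = [
    (0.9, AutomationLevel.AUTONOMOUS),
    (0.75, AutomationLevel.CONSENT),
    (0.5, AutomationLevel.ADVISORY),
]


def select_level(workload: float, incapacitated: bool = False) -> AutomationLevel:
    """Map an estimated operator state to a level on the escalating scale."""
    if not 0.0 <= workload <= 1.0:
        raise ValueError("workload must be normalised to [0, 1]")
    if incapacitated:
        # If incapacitation is detected, the system acts autonomously.
        return AutomationLevel.AUTONOMOUS
    for threshold, level in THRESHOLDS:
        if workload >= threshold:
            return level
    return AutomationLevel.MANUAL


if __name__ == "__main__":
    for w in (0.3, 0.6, 0.8, 0.95):
        print(w, select_level(w).name)
```

A deployed system would additionally need validated workload measures and some hysteresis so that the level does not oscillate around a threshold; both are deliberately left out of this sketch.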
Based on these two observations, the goal of this article is to depict the trends of AI and XAI, and to set the trajectory that these works must take in order to reach end-users. Our main research questions stem from these observations:
• RQ1: What are the current trends of AI and XAI in ATM tasks?
• RQ2: What are the limitations that arise from the use of AI and XAI in ATM tasks?
• RQ3: How could the general XAI field benefit AI and XAI in ATM?
• RQ4: What limitations may arise from the use of general XAI in ATM?
• RQ5: What should the trajectory of AI and XAI be for this domain?
To answer these questions, this article is divided into two parts. (i) The first part is dedicated to the review of the last decade of research on AI in ATM: it presents the methodology employed (Section 2), the extraction of relevant trends and features and the clustering of these works into representative groups (Section 3), and the extraction of representative dimensions, which allows us to create a design space representing those works, then used to analyse the publications (Section 4). (ii) The second part builds on the dimensions extracted in the first part to analyse general and ATM XAI work, examining where and why XAI is needed, how it is currently provided, and its limitations (Sections 5 and 6); it then synthesises the findings into a conceptual framework (Section 7), which is applied to different scenarios (Section 8). Finally, we summarise the findings of this article (Section 9).

Paper Selection
This section details the procedure followed for the selection, inclusion, and exclusion of research articles. The review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [28]. The review covered several well-ranked conferences and journals judged representative of the domain, namely: Transportation Research Part C: Emerging Technologies (TR_C) and IEEE Transactions on Intelligent Transportation Systems (IEEE Trans. on ITS) (the top two journals in Transportation according to Google Scholar metrics [29]); the Journal of Air Transport Management (JATM) (the top ATM journal [29]); and the International Conference on Research in Air Transportation (ICRAT) and the Air Traffic Management Research and Development Seminar (ATM seminar) (the two ATM conferences supported by EUROCONTROL and the Federal Aviation Administration). Figure 4 describes the complete pipeline of the paper selection.

Identification
This research focuses on articles published in English from 2010 until the end of December 2021. The year 2010 was chosen as the starting point of the search so as to capture the trends and evolution of this vast field from the beginning of the new bloom of interest in AI (see Section 1.2), and to be sure to fully capture the essence of the research space. The inclusion and exclusion criteria of this review are shown in Table 1. To be initially selected (Identification), articles from the different journals and conferences were required (i) to be in the ATM domain, a characteristic obtained either from the source itself (i.e., ATM-specific conferences or journals) or by filtering with the keywords "air traffic" OR "airplane" OR "aircraft", hereafter named "ATM_Filter"; and (ii) to work with an AI algorithm, a characteristic filtered using the general wildcard query "Predict*" OR "Estimat*" OR "Optimi*" OR "Cluster*" OR "Analy*" OR "Visu*" OR "Learn*" OR "Explain*" OR "Model*" OR "Plan*" OR "Conflict" OR "Classif*". The number of articles identified per source is represented in Figure 4, and details about the matches of each keyword filter are given in Table 2. The keywords were first refined in a preliminary study performed to plan the systematic review, using different keyword-extraction techniques, ATM domain insights from interviews, and the authors' general domain knowledge. In a nutshell, these keywords represent the main tasks for which AI can be employed in ATM, along with some AI-specific keywords, such as the theory or framework employed (e.g., neural network, genetic algorithm).
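The two Identification criteria can be sketched as a filter over a paper's title or abstract text. This is an assumption about how such a query could be applied locally, not the behaviour of the databases' own search engines: the keyword lists come from the text above, while the tokenisation and the treatment of "*" as "any suffix" are illustrative choices.

```python
import fnmatch
import re

# Keyword lists from the Identification step; the matching logic is an assumption.
ATM_FILTER = ["air traffic", "airplane", "aircraft"]
AI_PATTERNS = [
    "predict*", "estimat*", "optimi*", "cluster*", "analy*", "visu*",
    "learn*", "explain*", "model*", "plan*", "conflict", "classif*",
]


def matches_atm(text: str, atm_venue: bool) -> bool:
    """ATM-specific venues pass automatically; others need an ATM_Filter term."""
    if atm_venue:
        return True
    lowered = text.lower()
    return any(term in lowered for term in ATM_FILTER)


def matches_ai(text: str) -> bool:
    """True if any word matches a wildcard pattern ('*' stands for any suffix)."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(fnmatch.fnmatchcase(w, p) for w in words for p in AI_PATTERNS)


def identify(text: str, atm_venue: bool = False) -> bool:
    """Identification step: criteria (i) and (ii) must both hold."""
    return matches_atm(text, atm_venue) and matches_ai(text)


if __name__ == "__main__":
    print(identify("Predicting taxi times at a hub airport", atm_venue=True))
    print(identify("Aircraft conflict resolution"))
    print(identify("Predicting rail demand"))
```

Because "Conflict" carries no wildcard, only the exact word "conflict" matches it, whereas truncated terms such as "Plan*" also catch unintended words (e.g., "plane"); this kind of over-matching is one reason the later manual screening phase is needed.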

Screening
Since the review drew on conference and journal databases, duplicates only came from external databases used in previous work and from forward and backward reference searching, hence the low number of exclusions for duplication (29 excluded).
The selected papers were then manually screened for relevance through a first superficial reading, using an empirical keyword list (containing mostly primary keywords and the methods employed), which resulted in the exclusion of 383 papers.

Eligibility
The remaining 366 articles were selected for full-text review and content analysis. For inclusion in the final list, articles had to be related to ATM (excluding, for example, articles where drones are used for surveillance) and to AI or XAI. These inclusion criteria resulted in 257 relevant articles. Regarding the exclusion criteria, articles about passenger experience and security were not included in this review, as they were not the core focus of this study. Furthermore, only articles written in English were considered.

Inclusion
The inclusion rate varied widely depending on the source of the papers. On the one hand, papers from ATM domain conferences (ICRAT and the ATM seminar) only needed to be consistent with the AI field, hence the low exclusion rate at the identification phase: 285 out of 383 + 343 = 726, roughly 39 percent (see Figure 4).
On the other hand, papers from more general journals (TR_C and IEEE Trans. on ITS) needed to be consistent with both the AI and ATM domains, hence the high exclusion rate at the identification phase (n = 7186), around 98.5 percent. In between, for papers from the Journal of Air Transport Management, which targets ATM and other aeronautical issues, the exclusion rate was medium-high (n = 870), around 70 percent.
Past this phase, exclusions mostly occurred in the screening and eligibility phases, because AI techniques were not used: in the screening phase, 383 papers (roughly 51.1 percent) clearly stated in their title or abstract the use of techniques outside the AI field; and in the eligibility phase, 109 papers (around 29.8 percent) did not meet the criteria. As a result, the overall exclusion rate (around 97.2 percent) is somewhat lower than in some systematic reviews, but still high in general.
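The per-phase percentages above follow directly from the reported counts. The short check below recomputes them, assuming (as the counts imply) that the screened set consists of the 383 excluded papers plus the 366 papers retained for full-text review.

```python
def exclusion_rate(excluded: int, total: int) -> float:
    """Percentage of papers excluded at a given phase."""
    return 100 * excluded / total


# Counts reported in the text.
conference_identification = exclusion_rate(285, 383 + 343)  # ICRAT + ATM seminar
screening = exclusion_rate(383, 383 + 366)   # screened = excluded + retained
eligibility = exclusion_rate(109, 366)       # full-text review
included = 366 - 109                         # articles kept after eligibility

print(f"{conference_identification:.1f} {screening:.1f} {eligibility:.1f} {included}")
# prints: 39.3 51.1 29.8 257
```

The final count matches the 257 relevant articles reported for the eligibility phase.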

Paper Clustering
This section provides details about data extraction from the publications, the different statistics on the extracted data, and the clustering of the publications into representative groups.

Data Extraction
The aim of this step is to create an information extraction form to derive accurate data from the selected articles. The relevant data were extracted from the selected articles using spreadsheets and reference management software.
The following primary features were extracted in this systematic review: Author(s), Publication Year, Title of the Study, Source Type, AI Theory or Framework, XAI Theory or Framework, Objective, and Factors. The description of these items is presented in Table 3.

Table 3. Extracted features from primary papers.
• Author(s): Name of the author(s).
• Publication Year: The year of publication of the paper.
• Title of the Study: The title of each paper, as visible in the search step.
• Source Type: Journal, book chapter, or conference proceeding.
• AI Theory or Framework: The AI theory or framework adopted by the study, e.g., Neural Network (NN).
• XAI Theory or Framework: The XAI theory or framework adopted by the study, e.g., LIME.
• Objective: The main objective of the paper.
• Factors: The examined factors of the studies, detailed in Table 4.
The extracted factors were refined in a preliminary study performed to plan the systematic review (and adapted when needed in the rest of the study), using different keyword-extraction techniques, ATM domain insights from interviews, and the authors' general domain knowledge. The refined extracted factors are presented in Table 4.

First Data Clustering on Additional Extracted Data
The clustering of the papers was performed in two steps. The first step was performed in the preliminary study, and resulted in the additional extracted features presented in Table 4.
In detail, while extracting the different features of Table 3, it first seemed promising to categorise the publications by the specific part of the ATM world that each publication benefited. However, although interesting, this feature was not selective enough to reveal any trends or categories across the works. Nonetheless, the main Objective from Table 3 appeared to be a promising feature to fully categorise the Design Space of AI in ATM. This clustering was performed by refining the Objective feature, extracting from it the subject, the time-frame, and any complementary information about the subject. For example, the Objective "Predict the future location of a general aviation aircraft" from [30] can be cut into "Predict", "the future", and "location of a general aviation aircraft". After this quite simple phase, the extracted data (i.e., subject, time-frame, complementary information) were analysed and clustered into more general categories. In a nutshell, the categories are grouped into "Object", subdivided into "Complement" and, if any, into "Sub-Complement". All Object categories except Time-Frame are hereafter named "Material Object" and represent the general subject of the objective, namely "Aircraft", "Traffic", "Airport/Controlled Traffic Region (CTR)", "Airspace", "ATCO", and "Pilot" (the distribution of the publications according to the object feature is represented in Figure 5). These categories were created because (i) they represent distinct parts of the domain for practitioners, and (ii) the publications across these categories appeared to differ partially in their Objective and in the algorithms and methodologies employed.

Route/Flight Plan
The publication focuses on the description of the intended flight.

4D Trajectory
The publication focuses on the description of the actual flight.

Indicators
The publication deals with any descriptor of the traffic, such as time buffer separation, or delay.

Conflict avoidance
The publication deals with the avoidance of separation losses between aircraft.

Optimisation
The publication deals with the optimisation of Aircraft trajectories.

Prediction
The publication deals with the prediction of Aircraft trajectories, and their potential interactions.

Simulation
The publication deals with the simulation of Aircraft trajectories, and their potential interactions.

Analysis
The publication deals with the analysis of Aircraft trajectories, and their potential interactions.
Airport/CTR Traffic

State
The publication deals with any descriptor of the state of the airport, e.g., the runway configuration.

Indicators
The publication deals with any descriptor of the ground traffic, e.g., taxi speed, Estimated Take-Off Time (ESOT), or arrival runway occupancy.

"5D" Traffic
The publication focuses on the trajectories of taxiing aircraft.

CTR Traffic
The publication deals with the arrival of aircraft (e.g., the sequencing of arriving aircraft), the departure, or both (e.g., optimisation of departures and arrivals).

State
Static Structural State
The publication deals with any descriptor of the state of all or a part of the airspace, e.g., the capacity of a sector, without modifying it.

State of Environment
The publication focuses on the weather, the wind or any other environmental descriptor.

Structure
Sector
The publication deals with the structure of the sector(s), e.g., the configuration of the sectors, or their geometrical structure.

Route
The publication deals with the route network structure.

Demand/Capacity Balancing
The publication focuses on the balancing of the demand and capacity.

ATCO
The publication focuses on the Air Traffic Controller (ATCO).

Pilot
The publication focuses on the Pilot.
Most publications targeting the ATCO or the Pilot focused on predicting or analysing their behaviour and decisions (the commands for the ATCO and the flight decisions for the Pilot), or on analysing their audio transmissions. Publications targeting these two categories are less represented than the others (see Figure 6); this most likely stems from the lower overall interest in these categories, and possibly from the databases used, which do not focus on this area.
The Airspace category groups publications that target (i) the analysis or prediction of the "State" of the airspace; (ii) the optimisation of the airspace "Structure" according to different criteria; or (iii) the optimisation of the demand/capacity balance. The Airport/CTR Traffic category groups any publication about the airport that either (i) predicts its state (e.g., open runways, or their configuration); (ii) models, optimises, automates, or predicts the "Ground Traffic", i.e., taxiing aircraft; or (iii) models, optimises, or automates the CTR traffic, i.e., aircraft taking off or landing. While Airport and CTR Traffic could be separated, many publications mix the two types of traffic, most of the time to optimise the whole area.
The Aircraft category groups any publication focusing on a single flying aircraft, publications about taxiing aircraft being already grouped in the Airport/CTR Traffic category. This category groups publications focusing on (i) predicting the state of an aircraft (e.g., its bearing); and (ii) modelling, predicting, analysing, or optimising its Trajectory. Sub-categories have been created in the Trajectory category, namely Indicators, Route/Flight Plan, and 4D Trajectory, because the intent behind the work (for indicators, mostly to foresee the indicator in question) or the algorithms and methodology employed differ between them.
The Traffic category groups any publication focusing on a set of more than one flying aircraft, publications about a set of aircraft on the ground being already grouped in the Airport/CTR Traffic category. These publications focus either on (i) predicting indicators of the traffic; or (ii) analysing, predicting, modelling, automating, or optimising the whole traffic.
Within the Aircraft, Traffic, and Airport/CTR categories, two types of clusters have been created, for publications working on (i) Indicators, measures on intangible objects (like a trajectory), such as indicators of traffic complexity; or (ii) a State, measures on tangible objects (like an aircraft), such as the mass of an aircraft.
The distribution of the articles according to the secondary extracted features from Table 4 is represented in detail in Figure 6, in which each rectangle represents a leaf of Table 4 (the lowest level of description among object, complement, and sub-complement), with the number of articles focusing on that feature.

Second Data Clustering on All Extracted Data
The second clustering of the publications was performed from an analysis of all the extracted features (listed in Tables 3 and 4), and it allowed the extraction of additional knowledge on general AI in ATM work.
This clustering resulted in the following four categories, tightly connected with AI in general, that globally define the purpose of the application: Prediction, Optimisation/Automation, Analysis, and Modelling/Simulation. This clustering was performed using the previously extracted features. Among them, the Time Frame feature (Table 4) and the Objective feature (Table 3) proved very valuable for categorising the different works (Table A1 in Appendix A details these two features per article). Based on these two features, the different groups could be clustered into three categories, further refined into four.
The papers from the first and second categories, namely the Prediction and Optimisation/Automation categories, belong to the "Pre-Flying" or "Flying" parts of the Time Frame feature, based on the current state of the considered "Material Object".
The Prediction category contains papers seeking to foresee the future behaviour of a "Material Object", answering "what if" questions about premises, e.g., "what will the trajectory of this aircraft, operated by this company, be between this city pair?". As an example, this category contains papers about the prediction of traffic or of runway occupancy time. This category is about foreseeing an event, not to be confused with the "prediction" of an AI algorithm on a subject, such as its "prediction" of the labels of an image.
The Optimisation/Automation category contains papers seeking to enhance the behaviour of a "Material Object" according to different criteria, the most represented being "avoiding separation losses between aircraft", and "optimising the traffic". As an example, this category contains papers about sequencing arrival aircraft, avoiding conflict, or optimising a trajectory.
The Analysis category contains papers focusing mainly on the "Post-Analysis" and "Flying" parts of the Time Frame feature. These papers seek to understand the observed behaviour of a "Material Object" (see Table 4), answering "why" and "why not" questions about the facts (i.e., what happened) and/or the foil (what is expected or plausible to happen). As an example, this category contains papers about assessing the workload of an ATCO in a sector, or evaluating the important factors influencing the arrival of an aircraft.
The Modelling/Simulation category contains papers that do not apply to any part of the Time Frame feature. These papers model the behaviour of a "Material Object" in order to simulate it, which in the long run could help answer "why", "what if", "why not", and "how to" questions, depending on the model constructed. As an example, this category contains papers about simulating the air traffic of an airspace, or modelling the arrival of aircraft.

Design Space: Trends of AI in ATM
This section analyses the trends of AI through the lens of the previously presented Design Space (composed of four categories: Prediction, Optimisation/Automation, Analysis, and Modelling/Simulation), first from a general point of view, and then from the category point of view. Figure 9.

General Insights
Through the lens of our categorisation, the growth in publications over the last four years originates from the growing work in the Prediction category (publications about Prediction tripled between 2013 and 2020) and the Optimisation category (doubled between 2013 and 2020). It seems that AI in ATM, and the number of its publications, highly benefited from the work of the AI community over the last decade (researchers and developer communities, e.g., scikit [31], tensorflow [32], keras [33]), which democratised AI model generation, made it far more accessible, and, in some areas, more effective, in particular in prediction and optimisation. The time window of the review does not allow us to assess trends in AI for ATM before 2010. However, previous work and other reviews [14] suggest that publications on AI in ATM have grown in the last decade but had a strong core base for many years, in particular in Optimisation, collision avoidance and traffic flows being arguably among the most consistent subjects of AI in ATM. The following section analyses the trends of AI through the lens of the Design Space from the category point of view.

Categorisation Insights
The primary Objective of the models already successfully categorises the different AI models used in ATM. The models used in the different categories are therefore presented in the following, using selected references (please refer to Table 4 for the description of the descriptors).
AI Prediction in ATM is performed using a vast range of AI models, the most used being (i) Multi-Agent Systems (MAS); (ii) Neural Networks (NN); (iii) Random Forests (RF); (iv) Gradient Boosting Machines (GBM); (v) Support Vector Machines (SVM); and (vi) Linear Regression. The five latter models (NN, RF, GBM, SVM, and linear regression) are mostly used to predict (a) a descriptor of the State or an Indicator of the Trajectory of an aircraft, e.g., mass estimation [34], descent length [35], phase of flight [36]; and (b) a Ground Traffic Indicator or a State descriptor of an airport, e.g., the estimated take-off time [37] or taxi speed [38]. Authors using these models mostly capitalise on the availability of frameworks in recent years, and the models are often used jointly for comparison, with linear regression often used as a baseline. Nonetheless, models (ii) to (v) have also been used for other types of predictions, such as Route choice [39], the Structure of a Sector configuration [40], the Environmental State of the Airspace, ATCO action prediction [41], or short-term 4D Trajectory prediction [42,43]. Multi-Agent Systems, on their side, have been used to model and predict more complex phenomena, like Indicators of the Traffic such as delay propagation on networks [44], 4D Trajectory, and, to a certain extent, 5D Traffic prediction [45] and CTR Traffic, the latter being easier to predict due to the large number of constraints.
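Since linear regression often serves as the baseline against models such as GBMs, the sketch below illustrates that comparison protocol in pure Python. It assumes a hypothetical one-dimensional, step-shaped target (not data from [37] or [38]) and implements a minimal gradient boosting machine with squared loss and depth-one regression trees (stumps); it is not any specific library's GBM implementation.

```python
"""Gradient-boosted stumps vs. a linear-regression baseline (hypothetical data)."""
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x (the baseline)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_stump(xs, residuals):
    """Depth-1 regression tree: choose the threshold minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def fit_gbm(xs, ys, n_rounds=20, learning_rate=0.5):
    """Squared-loss gradient boosting: each stump fits the current residuals."""
    base = mean(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mae(model, xs, ys):
    """Mean absolute error of a model on (xs, ys)."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


# Hypothetical step-shaped target, e.g. an indicator that jumps past a threshold.
xs = list(range(20))
ys = [10.0 if x < 10 else 30.0 for x in xs]

lin_mae = mae(fit_linear(xs, ys), xs, ys)
gbm_mae = mae(fit_gbm(xs, ys), xs, ys)
print(f"linear baseline MAE: {lin_mae:.2f}, boosted stumps MAE: {gbm_mae:.6f}")
```

For brevity, errors are computed on the training points; a real comparison, as in the surveyed works, would use held-out data or cross-validation.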
AI Optimisation and Automation works mostly use a more restricted range of AI models: (i) Multi-Agent Systems (MAS); (ii) Evolutionary Algorithms (EA), mostly genetic algorithms; (iii) Simulated Annealing (SA); and (iv) Reinforcement Learning (RL) (e.g., RL, multi-agent RL, Deep RL, and DQN, see Table 5). A majority of these works focus on optimising the traffic and/or avoiding collisions from the point of view of the trajectories. 5D Traffic Optimisation works generally focus on one flight phase, such as optimising (a) En-Route traffic, using centralised approaches, i.e., SA [46] or EA [47], or decentralised ones, i.e., MAS [48,49]; (b) arrival traffic [50]; (c) departure traffic [51]; (d) Ground Traffic ("5D" Traffic) (although this traffic does not really use altitude, it is still noted 5D Traffic in the categorisation to avoid confusion between traffic and trajectory); or (e) the whole CTR Traffic [52]. Other notable uses of AI models for optimisation are optimising the Airspace Structure, such as the Route network [53] or Sectors [54], optimising a 4D Trajectory [55], optimising the Demand/Capacity Balance, or optimising the Route of an aircraft.
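Centralised SA approaches such as [46] target en-route traffic; the sketch below instead applies simulated annealing to a small, hypothetical arrival-sequencing problem, only to show the mechanics: a swap neighbourhood, Boltzmann acceptance of worse moves, and geometric cooling. The separation values, arrival times, and parameters are illustrative assumptions, not operational data.

```python
import math
import random

# Hypothetical minimum time separation (s) between a leading and a trailing
# arrival, keyed by wake category (H/M/L). Illustrative, not regulatory values.
SEPARATION = {
    ("H", "H"): 96, ("H", "M"): 157, ("H", "L"): 196,
    ("M", "H"): 60, ("M", "M"): 69, ("M", "L"): 131,
    ("L", "H"): 60, ("L", "M"): 69, ("L", "L"): 82,
}

# Hypothetical arrivals: flight id -> (estimated time of arrival in s, wake category).
FLIGHTS = {
    "A1": (0, "H"), "A2": (30, "L"), "A3": (60, "H"),
    "A4": (90, "M"), "A5": (120, "L"), "A6": (150, "M"),
}


def total_delay(sequence, flights):
    """Sum of landing delays for a landing order on a single runway."""
    delay, prev_time, prev_cat = 0.0, None, None
    for fid in sequence:
        eta, cat = flights[fid]
        if prev_time is None:
            t = eta
        else:
            t = max(eta, prev_time + SEPARATION[(prev_cat, cat)])
        delay += t - eta
        prev_time, prev_cat = t, cat
    return delay


def anneal(flights, t0=200.0, cooling=0.995, steps=5000, seed=0):
    """Simulated annealing over landing orders, starting from first-come, first-served."""
    rng = random.Random(seed)
    current = sorted(flights, key=lambda f: flights[f][0])
    cur_cost = total_delay(current, flights)
    best, best_cost = current[:], cur_cost
    temp = t0
    for _ in range(steps):
        # Neighbour: swap the positions of two randomly chosen flights.
        i, j = rng.sample(range(len(current)), 2)
        cand = current[:]
        cand[i], cand[j] = cand[j], cand[i]
        cost = total_delay(cand, flights)
        # Accept improvements; accept worse moves with Boltzmann probability.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand[:], cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


best, best_cost = anneal(FLIGHTS)
print(best, best_cost)
```

Keeping track of the best sequence seen means the result can never be worse than the first-come, first-served starting order; with wake-dependent separations, reordering can reduce total delay, which is the kind of gain optimisation works in this category seek.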
Analysis of ATM activities using AI models mostly consists of (i) techniques that cluster 4D Trajectories (e.g., DBSCAN, BIRCH, or auto-encoder NNs) in order to analyse the different factors influencing the 4D Trajectory, the Route choice [56], the CTR Traffic, in particular arrivals [57], but also Traffic Indicators such as delays [58], either to understand the different influencing factors or as a first analysis to be used later in a predictive model; or (ii) more specific analyses, such as trajectory analysis to detect ATCO actions [59], or speech recognition and analysis of the utterances of ATCOs [60] or Pilots.
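To illustrate the clustering-based analysis in (i), here is a minimal pure-Python DBSCAN over 2-D trajectories that are assumed to be already resampled to the same number of points. The distance (mean point-wise Euclidean distance), the parameters, and the data are hypothetical choices for illustration; in the output, label -1 marks noise, for instance an atypical flight.

```python
import math


def traj_distance(a, b):
    """Mean point-wise Euclidean distance between two equally resampled trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def dbscan(items, eps, min_pts, dist):
    """Return one label per item: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(items)  # None = not yet visited

    def neighbours(i):
        # The eps-neighbourhood includes the item itself.
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = 0
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise (may later become a border point)
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # j is also a core point: expand
                queue.extend(j_nbrs)
        cluster += 1
    return labels


# Hypothetical trajectories: two route families plus one outlier.
def straight(offset):
    return [(t, offset) for t in range(5)]


def diagonal(offset):
    return [(t, t + offset) for t in range(5)]


trajectories = [straight(0), straight(0.2), straight(0.4),
                diagonal(0), diagonal(0.3), diagonal(0.5),
                [(t, 10) for t in range(5)]]
labels = dbscan(trajectories, eps=0.5, min_pts=2, dist=traj_distance)
print(labels)
```

Once flights are grouped this way, each cluster can be examined for the factors that distinguish it, or its label can be fed as an input feature to a predictive model, the two uses described above.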

Validating the Design Space
While the Design Space presented previously seems to capture the global structure of the articles on AI in ATM from our data set, one may wonder whether it is complete, whether its categories overlap, and whether it will remain valid in the future. Regarding completeness, as the Design Space was generated from an analysis of a representative data set of articles, it should cover all the AI in ATM work of the last decade. Furthermore, the Design Space focuses on abstract tasks, which makes it quite general: predicting the future state of an "object", understanding the current state of an "object", automating or optimising the behaviour or the state of an "object", and simulating an "object".
In terms of overlap, some boundaries between the categories might seem blurry, but they are in fact quite distinct. The Optimisation/Automation category is quite distinct from the other three categories, and this distinction is also noticeable in the algorithms used, in particular the use of meta-heuristics. The boundary between the Prediction and Modelling/Simulation categories might be seen as blurry, as the history of a realistic simulation (e.g., the trajectory flown by the aircraft during the simulation) can be used as a prediction. Nonetheless, the distinction comes from the fact that this history is only a side-effect of the technique, and not its main purpose, which is to simulate an "object" in a virtual environment. Similarly, the border between the Analysis and Modelling/Simulation categories could seem permeable, as a simulation can help to understand a past situation (e.g., understanding a collision alert afterwards), but this understanding would result from the analysis of an end-user. The analysis would be easier for the end-user with AI, but it would still be made by the human and not by the system. Finally, Analysis algorithms are often used as inputs for Prediction algorithms, but both fulfil very distinct ATM tasks.
Looking into the future and the sustainability of the Design Space, as the categories focus on abstract tasks on "objects", the Design Space should remain valid for at least the next decade. The "objects" might change (e.g., the aircraft feature from Table 4 might be divided into new sub-features), but the abstract tasks will remain. Nonetheless, new categories could appear, and sub-categories could be created.

Analysing General XAI
The concept of XAI is relatively recent compared with other AI concepts, and the term "explainability" is used in various research domains, including AI. It is therefore useful to develop a general understanding of the term explainability from the perspective of XAI. The main obstacle to building this ground knowledge of explainability in AI is the interchangeable use of several terms in the literature, such as interpretability, transparency, and explainability. Before proceeding to the literature review, the commonly used terms are briefly presented according to the definitions compiled by Barredo Arrieta et al. [261].
"Understandability", often termed "Intelligibility", is the characteristic of a model that makes a user understand its function, i.e., how the model works, without any need to explain the model's internal operations on the data. A similar term, "Comprehensibility", refers to the ability of an ML model to represent its learned knowledge to humans in an understandable way. The former terms thus concern how the model operates on the data, whereas the latter concerns the knowledge the model acquires from the data. In addition, the terms "Interpretability" and "Transparency" are mostly used to describe concepts similar to explainability, and refer to a model's ability to provide meaning or to explain itself in a way that is understandable to human beings. In particular, model transparency denotes a model's ability to be understandable to humans by itself. There are three types of transparent models [262]:
• Simulatable models have the capacity to make humans understand their structure and functioning entirely.
• Decomposable models can be decomposed into individual components, i.e., inputs, parameters and outputs, each of which admits an intuitive explanation.
• Algorithmically Transparent models behave "sensibly" in general, with some degree of confidence.
Above all, the term "Explainability" is associated with the notion of an interface between humans and a decision-maker that is, at the same time, comprehensible to humans and an accurate representation of the decision-maker [263]. In XAI, explainability is the interface between the models and the end-users through which an end-user obtains clarifications on the decisions of an AI/ML model.
AI/ML models learn the underlying characteristics of the available data and subsequently classify, predict, or cluster new data. The stage of explainability refers to the point in this process at which a model generates the explanation for the decision it provides. Two stages are found in the literature: ante-hoc and post-hoc [264]. The methods categorised by these stages can be briefly described as follows:
• Ante-hoc methods build the generation of explanations into the model from the very beginning of training, while aiming to achieve optimal performance. These methods are mostly used with transparent models, such as fuzzy models and tree-based models.
• Post-hoc methods comprise an external or surrogate model alongside the base model. The base model remains unchanged, and the external model mimics the base model's behaviour to generate an explanation for the users. These methods are generally associated with models whose inference mechanism remains unknown to users, e.g., Support Vector Machines and Neural Networks. Post-hoc methods are further divided into two categories: model-agnostic and model-specific. Model-agnostic methods are applicable to any AI/ML model, whereas model-specific methods are confined to particular models.
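As an illustration of the post-hoc, model-agnostic idea above, the sketch below treats a hypothetical black-box classifier as unchangeable, queries it on a grid of inputs, and fits a one-split decision stump to its outputs (not to ground truth) as a global surrogate. The two features, the scoring formula, and the grid are assumptions made up for illustration; the reported fidelity is the surrogate's agreement with the black box, not its accuracy on real data.

```python
from itertools import product

def black_box(x):
    # Hypothetical opaque model: 1 = "delayed", from two made-up features in [0, 1],
    # x[0] = traffic load and x[1] = wind strength. Internals treated as unknown.
    score = 0.8 * x[0] ** 2 + 0.3 * x[1]
    return 1 if score > 0.45 else 0

def fit_stump(X, y):
    """Fit a one-split surrogate: predict 1 if x[feature] > threshold, else 0.
    Returns (feature, threshold, agreement with y)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            preds = [1 if x[f] > t else 0 for x in X]
            agreement = sum(p == yi for p, yi in zip(preds, y)) / len(y)
            if best is None or agreement > best[2]:
                best = (f, t, agreement)
    return best

# Post-hoc: query the unchanged base model on a grid of inputs...
grid = [i / 10 for i in range(11)]
X = [list(p) for p in product(grid, grid)]
y_black_box = [black_box(x) for x in X]
# ...then fit the surrogate to the base model's outputs (not to ground truth).
feature, threshold, fidelity = fit_stump(X, y_black_box)
print(f"IF x[{feature}] > {threshold} THEN delayed  (fidelity {fidelity:.2f})")
```

A practical surrogate would usually be a deeper tree or a rule set, and its fidelity should be checked on inputs representative of operational data rather than on a uniform grid.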
The scope of explainability defines the extent of an explanation produced by an explainable method. After reviewing more than 200 scientific articles published on XAI, Vilone and Longo concluded that the scope of explainability can be either global or local [264]. That is, either the whole inference process of a model is made transparent or comprehensible to the user (global), e.g., by presenting a full decision tree, or only a single inference is explicitly presented to the user (local), e.g., by presenting the single branch of the tree followed for one instance.
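The global/local distinction can be made concrete with a small tree. The sketch below uses a hand-written decision tree with hypothetical features (`traffic_load`, `wind_kt`) and made-up thresholds: listing every root-to-leaf rule gives a global explanation, while returning only the branch followed by one input gives a local one.

```python
# Hypothetical hand-written tree for a runway-delay classifier.
# Internal node: (feature, threshold, subtree_if_le, subtree_if_gt); leaf: str.
TREE = ("traffic_load", 0.6,
        ("wind_kt", 25, "on time", "delayed"),
        "delayed")

def global_rules(node, conds=()):
    """Global explanation: every root-to-leaf rule of the whole model."""
    if isinstance(node, str):
        return [(list(conds), node)]
    feat, thr, le, gt = node
    return (global_rules(le, conds + (f"{feat} <= {thr}",)) +
            global_rules(gt, conds + (f"{feat} > {thr}",)))

def local_path(node, x):
    """Local explanation: only the branch followed by one instance x."""
    conds = []
    while not isinstance(node, str):
        feat, thr, le, gt = node
        if x[feat] <= thr:
            conds.append(f"{feat} <= {thr}")
            node = le
        else:
            conds.append(f"{feat} > {thr}")
            node = gt
    return conds, node

for conds, label in global_rules(TREE):
    print(" AND ".join(conds), "->", label)
print(local_path(TREE, {"traffic_load": 0.4, "wind_kt": 30}))
```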

XAI in Terms of Design Space
The main reason for adding explanations to the dimensions of the design space defined in Section 4 (Prediction, Optimisation/Automation, Analysis, and Modelling/Simulation) is to increase human trust in the decision-making process of AI/ML models [265] and to improve human decisions or predictions based on these models [266]. Moreover, human decisions can often be driven by biases and heuristics that may present limitations under certain working conditions [267]. At the same time, an abundance of relevant information does not necessarily help people make proper decisions. In human decision-making for the stated dimensions, heuristics and biases are mostly controlled by statistical constraints [268]. Regardless of their domain expertise, people generally cannot produce fully optimal decisions [269]. XAI, in contrast, can benefit human decision-making by increasing trust in automated systems. This can be investigated through human experiments comparing AI systems with explanations against traditional AI systems alone for the different dimensions of the design space.
The literature indicates that explanations mostly take four different forms to "explain" to humans the decisions of AI/ML models, as well as the process by which a decision is reached, for tasks from the different dimensions of the design space: numeric, rule-based, textual, and visual explanations. Some studies [270][271][272] investigated combinations of formats to make explanations more understandable and user-friendly. The available methods for adding these forms of explainability to existing and proposed AI/ML models are clustered according to the task categories of the AI in ATM publications: prediction, optimisation/automation, analysis, and modelling/simulation. In addition, the available explainable methods are grouped according to the scope and stage of explanation generation. The clustering is summarised in Table 6, whose bottom row presents the total number of articles contributing to each component of explainability and each task category of the design space. Most of the methods were deployed to add explainability to prediction tasks, whereas modelling/simulation tasks received the least attention. As for specific methods, the Adaptive Neuro-Fuzzy Inference System (ANFIS) [273] was used to generate all four types of explanations in optimisation/automation and analysis tasks, while Sequential Rule Mining (SRM) [274] algorithms were exploited in all task categories of the design space except prediction. Among the other methods available for adding explainability to intelligent systems, Local Interpretable Model-Agnostic Explanations (LIME) [270,272,275,276] and Shapley Additive Explanations (SHAP) [21,271,272,277] are worth mentioning due to their wide acceptance among researchers.
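SHAP is built on Shapley values from cooperative game theory. As a minimal sketch of the quantity it approximates, the code below computes exact Shapley values by enumerating all feature subsets, assuming one common value function in which a feature outside the coalition takes a baseline value. The toy model and its interpretation are hypothetical, and exact enumeration is exponential in the number of features, which is why libraries such as SHAP rely on approximations or model-specific algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at input x. Features outside a coalition
    are set to their baseline value (one common choice of value function)."""
    n = len(x)

    def value(coalition):
        return f([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                contribution += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(contribution)
    return phi

def toy_model(z):
    # Hypothetical model with an interaction between features 0 and 2.
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]

x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to f(x) - f(baseline) (the efficiency property), and the interaction term z[0] * z[2] is shared equally between features 0 and 2.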

Analysing XAI in ATM
Overall, the explainability of the AI algorithms used in the reviewed works is barely addressed.
In our review, the Prediction category is the only one in which explainability is clearly approached, even if it is secondary to the papers' main goals, and it is mainly restricted to predicting an indicator of the trajectory, i.e., the landing time [138] and the take-off time [37]. An exception is Xie et al. [27], whose main goal was to explain the results of their predictive model of the risk of incidents and accidents. Although these AI algorithms are already useful to the ATM community, fully understanding the underlying reasons for congestion, trajectory routes, and delays, i.e., answering "why" and "why not" questions, is necessary to improve future traffic, or more simply to predict it better. Predicting traffic and its delays is one key to improving traffic in general, reducing congestion, and better balancing demand and resources, so more research should focus on this topic. Another example of the need for explainability in these algorithms is predicting their future behaviour with other features or in a new environment, e.g., transposing a landing time prediction model from one airport to another. Moreover, being able to alert ATM users efficiently and effectively about a prediction result, and to explain the reasons behind the alert, has proven to be key to building trust in the system. For example, alerting the user that a prediction might be false (because the data set is class-imbalanced for that class), or that features needed for the analysis might be missing (e.g., because the trajectories are oversimplified), are all "explanations" that must be communicated in an appropriate way and at an appropriate time to the final human operator; otherwise, the advantages of automation support may be disregarded [327].
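The class-imbalance alert mentioned above can be sketched as a thin wrapper around any classifier. This is a hypothetical policy, not one taken from the reviewed works: the 5% threshold, the label names, and the toy predictor are assumptions. The wrapper simply attaches a textual warning when the predicted class was rare in the training labels.

```python
from collections import Counter

def predict_with_alert(predict, x, train_labels, min_share=0.05):
    """Return (label, alert). The alert is a textual warning when the predicted
    class made up less than `min_share` of the training labels."""
    label = predict(x)
    share = Counter(train_labels)[label] / len(train_labels)
    alert = None
    if share < min_share:
        alert = (f"'{label}' made up only {share:.1%} of training examples; "
                 "this prediction may be unreliable.")
    return label, alert

# Hypothetical training labels and a toy rule standing in for a trained model.
train_labels = ["on time"] * 97 + ["diverted"] * 3

def toy_model(x):
    return "diverted" if x["storm"] else "on time"

print(predict_with_alert(toy_model, {"storm": True}, train_labels))
print(predict_with_alert(toy_model, {"storm": False}, train_labels))
```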
Unfortunately, as shown in the previous section, AI algorithms used to Optimise and Automate are not the main target of the general explainable AI community, and they have not been studied further in the ATM field either. Nonetheless, optimising and/or automating traffic and avoiding collisions is one key (arguably the most important) to improving traffic in general and its safety. Due to the utmost importance of safety in ATM, fully understanding the underlying reasons for conflict avoidance procedures (e.g., explaining why one aircraft is moved away from its planned trajectory and not another), sequencing, or any other optimisation result is necessary for these results to be accepted and used by human operators such as ATCOs. Going further, it would be interesting to predict how the AI algorithm would perform in a different environment, e.g., how it would organise the traffic when part of the airspace changes and becomes unavailable (e.g., military zones). Furthermore, alerting the user to a possibly poor solution, for example one resulting from a bias in the training, could also prove essential to the acceptance of the algorithm.
In our review, no effort was made to explain either the results or the AI algorithms used to Analyse, despite some of these algorithms being strongly addressed by the general XAI community (see Section 5). Adding explainability could help the end-user understand the classification performed by the algorithm (e.g., the different factors influencing the analysis), how this analysis could evolve with new parameters or data, and how to modify the analysed "Material Object" to obtain the desired behaviour. Additionally, it would be interesting to be able to predict how the classes would change if a new feature were added, or how they would evolve with a modification of the problem structure, e.g., predicting the evolution of trajectory flows under a new route structure.
Similarly, no Modelling/Simulation AI algorithms were found to be explained either. In a sense, AI modelling most of the time starts from the underlying reasons motivating the actions of the different actors taking part in the simulated world. This is particularly true of MAS, which are dominant in this category: MAS designers try to represent the different entities in the system, their actions and their reasoning, allowing a global state to emerge from agent interactions. Nonetheless, explainability could be added to explain emergent behaviours (e.g., delay propagation) rather than local behaviours, with a valuable impact on global understanding.
In its current state, the effectiveness and acceptability of AI algorithms in ATM will be limited by the machine's inability to explain its thoughts and actions to human users in these critical situations, and to fully understand the needs and desires of the end-user. The following section formalises this lack of explainability in a conceptual framework, to further detail the need for explainability in ATM and the direction that must be taken to tackle it.

XAI in ATM Synthesis
As ATM is a critical domain where lives are at stake, the effectiveness of AI systems will be limited by the machine's inability to explain its thoughts and actions to human users in these critical situations, and to fully understand the needs and desires of the end-user. As seen in previous sections, the third wave of XAI has yet to be reached at the general XAI level. Nonetheless, we contend that current general XAI will not be sufficient on its own for AI to reach end-users such as ATCOs.
Indeed, existing XAI systems are aimed more at developers or debuggers than at final end-users [13]. Although some XAI work focuses on explanations that might be presented to non-developers [270,328], little justification is provided for choosing different explanation types or representations, and it is unclear whether these explanations would actually be useful to, or even understood by, real users [329]. As argued in [13,21], existing formal psychological theories, well summarised for XAI in [330][331][332][333], are seldom used to guide explanation facilities. This last concern is essential to move towards human-centric AI, since it is essential to understand how humans think and to be able to adapt to different ways of thinking. A second or parallel step could be to understand what information users seek and which biases impair their reasoning, so as to understand which reasoning methods current XAI facilities trigger [334], and how XAI can be leveraged to mitigate decision biases.
Furthermore, explanation is both a product and a process, in particular a social process [335]. XAI systems need to fully understand the user, which means adapting to the person who receives the explanation [21,336]. This is crucial to determine the explanation requirements for a given problem and to understand the 'why' behind user actions [337]. Understanding is also required to adapt to the socio-technical environment, since the AI user will interact with other humans beyond the one-to-one human-computer interaction, and trust should therefore be transitive to them [334]. Lastly, understanding the user requires the system to be able to interact with the user, which is beneficial in both directions, i.e., the human understanding the machine and the machine understanding the human. To enhance explainability by adapting while in use, and not only during the development process, XAI systems must be able to adapt to the user and provide information that is not only about the internal state of the AI [338]. The previous paragraphs on general XAI (and thus Section 5), together with Section 6, reveal user-centric XAI requirements for ATM that can be synthesised into the following three types of XAI:
• Descriptive XAI: any XAI that describes an AI algorithm or its outputs.
• Predictive XAI: any XAI that predicts the behaviour of an AI algorithm given a certain input or system modification.
• Prescriptive XAI: any XAI that detects errors or unwanted behaviours of an AI algorithm and prescribes a way to overcome them.
Descriptive XAI is largely covered by the current state of the art in general XAI, and is required for the end-user to understand the machine. Predictive XAI is required for the end-user to interact with the AI system and to ask it counterfactual 'what if' or 'why not' questions, whether about its internal behaviour or its output, at a more semantically accessible level. This level requires the descriptive level (as shown in Figure 10) to be intelligible to the end-user. Finally, prescriptive XAI is required for the end-user to express their intent to the machine, allowing 'how to' questions to be asked and errors and unwanted behaviours to be overcome. Prescriptive XAI requires the predictive level to analyse different outcomes and prescribe adequate modifications to the AI system, and the descriptive level to be intelligible to the end-user.
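To make the predictive ('what if') and prescriptive ('how to') levels more concrete, the sketch below wraps a hypothetical delay model: `what_if` compares the output before and after a modification, and `how_to` searches a list of candidate values for the smallest change to one input that reaches a target output, i.e., a brute-force counterfactual search. The model formula, the feature names, and the target are illustrative assumptions only, not part of the framework itself.

```python
def predicted_delay(inputs):
    # Hypothetical stand-in for an AI model: predicted arrival delay (minutes).
    return max(0.0, 2.0 * inputs["arrivals_per_hour"] / inputs["runways"] - 40.0)

def what_if(model, inputs, change):
    """Predictive XAI: model output before and after a hypothetical modification."""
    return model(inputs), model({**inputs, **change})

def how_to(model, inputs, feature, candidates, target):
    """Prescriptive XAI: the candidate value of `feature` closest to its current
    value for which the model output is at most `target` (None if none exists)."""
    feasible = [v for v in candidates if model({**inputs, feature: v}) <= target]
    if not feasible:
        return None
    return min(feasible, key=lambda v: abs(v - inputs[feature]))

current = {"arrivals_per_hour": 40, "runways": 1}
# 'What if a second runway were opened?'
print(what_if(predicted_delay, current, {"runways": 2}))
# 'How should the arrival rate change to keep the delay at or below 10 min?'
print(how_to(predicted_delay, current, "arrivals_per_hour", range(0, 41), 10.0))
```

Searching one feature over a fixed candidate list keeps the sketch simple; realistic counterfactual methods search several inputs jointly and add plausibility constraints on the suggested change.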
We illustrate these different types of XAI in the following section with a selected scenario about conflict avoidance.
Figure 10. Synthesis of eXplainable Artificial Intelligence (XAI) conceptual framework.

Scenarios XAI in ATM
In this section, we present scenarios and the benefits of our XAI framework through concrete aviation examples. Throughout the scenarios, safety in civil aviation is assumed by default to be of high importance.
Let us assume that a fully functional XAI is deployed in 2030 within the aviation domain and is being used by all stakeholders, including Air Traffic Controllers and management. We further assume that at point C (see Figure 11), aircraft A1 and A2 will fall below the minimum safety distance in X minutes if the pilots take no action. ATC is notified that action needs to be taken, e.g., through a safety warning with blinking notifications on the radar screen. Let us further assume that the controller in charge changes the flight level of the aircraft so that this dangerous situation is avoided. However, due to the change in flight level or flight path, aircraft A1 will reach its final destination during the peak landing hour. This increases the congestion and the holding time of several aircraft, which will drastically impact the take-off/landing sequences.
The situation leads to congested airspace, inducing stress and a high workload for several stakeholders (e.g., ATC, pilots, airline ground staff, or other stakeholders), and degrades the landing and take-off performance of several aircraft, e.g., in terms of delay and cost.
In the above scenario, based on the inputs from all interconnected complex systems, any safety event is identified well in advance (SAFETY PREDICTION), so that all actors, including ATC and pilots, can take appropriate actions.
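The safety prediction step in this scenario could, in its simplest form, rely on straight-line extrapolation of both aircraft. The sketch below computes the closest point of approach (CPA) for two aircraft in level flight at constant velocity and flags a predicted loss of separation. The 5 NM horizontal and 1,000 ft (10 flight levels) vertical minima, the 20-minute look-ahead, and the aircraft states are assumed values for illustration; an operational system would use uncertain trajectory predictions rather than constant velocities.

```python
import math

def closest_approach(p1, v1, p2, v2):
    """Time (min) and distance (NM) of closest approach of two aircraft flying
    straight at constant velocity. Positions in NM, velocities in NM/min."""
    dp = (p2[0] - p1[0], p2[1] - p1[1])  # relative position
    dv = (v2[0] - v1[0], v2[1] - v1[1])  # relative velocity
    dv2 = dv[0] ** 2 + dv[1] ** 2
    # Time minimising |dp + dv*t|, clipped to the present if already diverging.
    t = 0.0 if dv2 == 0 else max(0.0, -(dp[0] * dv[0] + dp[1] * dv[1]) / dv2)
    return t, math.hypot(dp[0] + dv[0] * t, dp[1] + dv[1] * t)

def predict_conflict(p1, v1, fl1, p2, v2, fl2,
                     min_nm=5.0, min_fl=10, horizon_min=20.0):
    """Flag a predicted loss of separation within the look-ahead horizon.
    Separation minima and horizon are assumed illustrative values."""
    t, d = closest_approach(p1, v1, p2, v2)
    vertically_separated = abs(fl1 - fl2) >= min_fl
    conflict = (not vertically_separated) and t <= horizon_min and d < min_nm
    return conflict, t, d

# Hypothetical A1 heading east and A2 heading north, both at FL350 and 480 kt.
print(predict_conflict((0, 0), (8, 0), 350, (40, -40), (0, 8), 350))
# Same geometry after A2 is moved to FL360.
print(predict_conflict((0, 0), (8, 0), 350, (40, -40), (0, 8), 360))
```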
In 2030, when AI modules are integrated into the complex Air Traffic Management System, algorithm transparency and explanation should provide all stakeholders, including ATC, with the following three major pieces of information: (1) Descriptive XAI: The system should be able to provide all users with a detailed description and the rationale of the action to be taken. In the above example, the XAI should be able to explain why the flight plan needs to change, namely because of a possible collision risk. It should also be able to provide information on potential congestion in the airspace during take-off or landing, which will help optimise, among other things, the efficiency of the whole system and its stakeholders, in addition to avoiding a safety catastrophe.
(2) Predictive XAI: In the above example, the XAI should be able to determine the 'what if' conditions, in other words, inform all stakeholders of the consequences of the actions that will be taken. In the above case, the XAI should be able to inform ATC that the actions they might perform to avoid the collision will lead to congestion at the airport. This will help ATC and other stakeholders understand the consequences of certain actions: 'what if I perform this action?'
(3) Prescriptive XAI: In addition to the above information, the integrated AI functions will be able to suggest appropriate actions and options, along with an appropriate explanation, so that stakeholders can decide on the next course of action. This next course of action will be based primarily on safety criteria, but will also take other considerations into account, such as congestion, weather information, the workload induced for ATC and pilots (taking human factors into account), cost benefits, and environmental benefits, to name a few. In the above scenario, the user can use the XAI prediction to assess the efficiency of potential actions ('what if'). The XAI prescription will provide sufficient information for the user to act immediately, without having to test each action. For instance, if the 'what if' function shows that aircraft A1 will induce a high delay in the landing sequence, the XAI prescription will provide an immediate solution to this issue, i.e., a different conflict resolution based on a change of path rather than a change of altitude.
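The prescriptive step can be sketched as filtering candidate resolutions by a safety criterion and then ranking the safe ones by a secondary criterion, together with a textual justification. In the sketch below, the candidate options and their predicted separation and landing delay are made-up numbers standing in for the outputs of a hypothetical 'what if' model; only delay is used as the secondary criterion, whereas the scenario above also mentions weather, workload, cost, and environmental impact.

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    min_separation_nm: float  # predicted by a hypothetical 'what if' model
    landing_delay_min: float  # predicted knock-on delay in the landing sequence

def prescribe(options, min_sep_nm=5.0):
    """Keep options predicted to restore separation, then recommend the one
    with the least predicted delay, together with a short justification."""
    safe = [o for o in options if o.min_separation_nm >= min_sep_nm]
    if not safe:
        return None, "No candidate restores separation; refer to the controller."
    best = min(safe, key=lambda o: o.landing_delay_min)
    reason = (f"Recommend '{best.name}': predicted separation "
              f"{best.min_separation_nm:.1f} NM, landing delay "
              f"{best.landing_delay_min:.0f} min")
    unsafe = [o.name for o in options if o.min_separation_nm < min_sep_nm]
    if unsafe:
        reason += f"; rejected as unsafe: {', '.join(unsafe)}"
    return best, reason

# Made-up predictions for illustration only (not results from the survey).
options = [
    Option("climb A1 to FL360", 6.0, 12.0),
    Option("vector A1 15 deg right", 5.5, 3.0),
    Option("no action", 0.0, 0.0),
]
best, reason = prescribe(options)
print(reason)
```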

Setting the Trajectory of AI and XAI in ATM
This paper presented a systematic review of the literature on Artificial Intelligence (AI) and eXplainable Artificial Intelligence (XAI) in Air Traffic Management (ATM). The review found that a wide range of ATM tasks has been considered (Sections 2 and 3). These applications include, but are not limited to, tasks related to AirSpace Management (ASM) (e.g., optimising the structure of the sectors), tasks related to Air Traffic Flow and Capacity Management (ATFCM) (e.g., predicting the network delay), and tasks related to the Air Traffic Controller (ATCO) (e.g., conflict detection and avoidance).
This application space has been clustered through an iterative process into four categories (Sections 3 and 4), namely (i) the "Prediction" category, papers seeking to foresee the future behaviour of a "Material Object" (see Table 4), (ii) the "Optimisation/Automation" category, papers seeking to enhance the behaviour of a "Material Object", (iii) the "Analysis" category, papers seeking to understand the observed behaviour of a "Material Object", and finally (iv) the "Modelling/Simulation" category, papers modelling the behaviour of a "Material Object" in order to simulate it.
These four categories, which constitute the authors' Design Space of AI in ATM, were then used to analyse current AI in ATM work (Section 4). In summary, the main challenge for the Optimisation/Automation category is to reach end users, who mostly perform high-risk, time-pressured tasks (since most of these works optimise actual traffic). MAS and RL could be of particular interest for potential UAV applications. The Modelling/Simulation category could also benefit greatly from using MAS more often to make simulations more realistic. As for the Prediction category, the main opportunity lies in enhancing its models and embracing the complexity of air traffic. Finally, works from the Analysis category would benefit from more transparency toward their users.
The Design Space was then used to analyse XAI in general and XAI in ATM (Sections 5 and 6). This analysis showed that AI in ATM still needs to apply XAI to its algorithms in order to reach end users, as almost no articles focused on explaining their results, and that XAI for ATM should move toward a more user-centric design, in which the AI system and the end-user can understand and interact with each other thanks to user-centric XAI (UCXAI).
This analysis of XAI in ATM through the lens of the Design Space revealed common points among the four categories, which were then summarised into a conceptual framework composed of three interconnected XAI levels required in ATM, namely (i) "Descriptive" XAI, which describes an AI algorithm or its outputs, (ii) "Predictive" XAI, which predicts the behaviour of an AI algorithm given a certain input or AI system modification, and (iii) "Prescriptive" XAI, which detects errors or unwanted behaviours of an AI algorithm and prescribes a way to overcome them.
The current investigation highlights that general XAI focuses on the Descriptive level, although some research provides Predictive characteristics, mostly the sensitivity of the prediction to some variables. However, more research on XAI at the predictive and prescriptive levels within ATM is required to capture the potential value that XAI in general can bring to the aviation community.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Principal Extracted Features from Articles
We present in this appendix the principal features extracted from all publications (i.e., the material object and time frame features from Table 4), along with their titles and references.

| Ref. | Title | Material Object | Time Frame |
|---|---|---|---|
|  | Leveraging local ADS-B transmissions to assess the performance of air traffic at general aviation airports | Traffic, Traffic Indicator | Post-Analysis |
| [196] | Modelling and Detecting Anomalous Safety Events in Approach Flights Using ADS-B Data | CTR, CTR Traffic | Post-Analysis |
| [257] | Prediction and extraction of tower controller commands for speech recognition applications | ATCO | Flying |
| [134] | Estimating the impact of COVID-19 on air travel in the medium and long term using neural network and Monte Carlo simulation | Traffic, Traffic Indicator | Pre-Flying |
| [169] | Ratio-based design hour determination for airport passenger terminal facilities | Airport, Airport State | Pre-Flying |
| [207] | An integrated SWOT-based fuzzy AHP and fuzzy MARCOS methodology for digital transformation strategy analysis in airline industry | Traffic, Traffic Indicator | Pre-Flying |
| [251] | Artificial neural network models for airport capacity prediction | Airport, Airport State | Pre-Flying |