Automatic Speech Recognition and Understanding in Air Traffic Management

A special issue of Aerospace (ISSN 2226-4310). This special issue belongs to the section "Air Traffic and Transportation".

Deadline for manuscript submissions: closed (1 November 2023) | Viewed by 24421

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editors


Guest Editor
Institute of Flight Guidance, Department of Controller Assistance, German Aerospace Center (DLR), 38108 Braunschweig, Germany
Interests: automatic speech recognition; ATM; ASR; air traffic control; speech recognition; machine learning

Guest Editor
Institute of Flight Guidance, Department of Controller Assistance, German Aerospace Center (DLR), 38108 Braunschweig, Germany
Interests: air traffic control; controller working position; multimodal interaction; speech understanding; eye tracking; gesture recognition

Special Issue Information

Dear Colleagues,

Since the arrival of Alexa, OK Google, and Siri, if not earlier, voice recognition has become part of everyday life. It not only lets us keep our hands free when we speak a new address into a navigation system; it can also reduce the workload of air traffic controllers (ATCos) and increase air traffic management (ATM) safety.

Voice communication between ATCos and pilots using radio equipment is still widely used in air traffic control (ATC). The ATCo issues verbal commands to the cockpit crew. Whenever the information from voice communication has to be digitized, ATCos must enter this already spoken information manually. Research results show that up to one third of controllers' working time is spent on these manual inputs. Radar label maintenance is one application of automatic speech recognition and understanding (ASRU). Supporting simulation pilots with ASRU is a long-established application. Another area is the offline evaluation of historical ATCo-pilot communication to answer questions such as:

  • How often do ATCos deviate from standard phraseology?
  • How many weather reports of pilots are digitized?
  • How often does an ATCo or pilot request “say again”?
  • How many visual approaches were flown?

The supreme discipline of ASRU, however, is automatic readback error detection, because noisy pilot utterances must also be recognized and understood semantically.

These are only some of the applications of automatic speech recognition and understanding in ATM that have been implemented during the last decade in Europe, the US, and Asia. Researchers are encouraged to publish their latest results. Special focus is on work that aims to bring ASRU from the lab environment into the operational environment of ATC centers in Europe and the US.

Prof. Dr. Hartmut Helmke
Dr. Oliver Ohneiser
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Aerospace is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • automatic speech recognition
  • automatic speech understanding
  • ontology
  • air traffic management
  • machine learning
  • readback error detection
  • safety
  • certification

Published Papers (12 papers)


Research

17 pages, 1801 KiB  
Article
Toward Effective Aircraft Call Sign Detection Using Fuzzy String-Matching between ASR and ADS-B Data
by Mohammed Saïd Kasttet, Abdelouahid Lyhyaoui, Douae Zbakh, Adil Aramja and Abderazzek Kachkari
Aerospace 2024, 11(1), 32; https://doi.org/10.3390/aerospace11010032 - 29 Dec 2023
Cited by 1 | Viewed by 1179
Abstract
Recently, artificial intelligence and data science have witnessed dramatic progress and rapid growth, especially Automatic Speech Recognition (ASR) technology based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs). Consequently, new end-to-end Recurrent Neural Network (RNN) toolkits were developed with higher speed and accuracy that can often achieve a Word Error Rate (WER) below 10%. These toolkits can nowadays be deployed, for instance, within aircraft cockpits and Air Traffic Control (ATC) systems in order to identify aircraft and display recognized voice messages related to flight data, especially for airports not equipped with radar. Hence, the performance of air traffic controllers and pilots can ultimately be improved by reducing workload and stress and enforcing safety standards. Our experiment conducted at Tangier’s International Airport ATC aimed to build an ASR model that is able to recognize aircraft call signs in a fast and accurate way. The acoustic and linguistic models were trained on the Ibn Battouta Speech Corpus (IBSC), resulting in an unprecedented speech dataset with approved transcription that includes real weather aerodrome observation data and flight information with a call sign captured by an ADS-B receiver. All of these data were synchronized with voice recordings in a structured format. We calculated the WER to evaluate the model’s accuracy and compared different methods of dataset training for model building and adaptation. Despite the high interference in the VHF radio communication channel and fast-speaking conditions that increased the WER level to 20%, our standalone and low-cost ASR system with a trained RNN model, supported by the Deep Speech toolkit, was able to achieve call sign detection rate scores up to 96% in air traffic controller messages and 90% in pilot messages while displaying related flight information from ADS-B data using a fuzzy string-matching algorithm.
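
The abstract does not say which fuzzy string-matching algorithm links the recognized call sign to the ADS-B data. As a minimal sketch, assuming the ASR output has already been converted into an ICAO-style call sign string, Python's standard-library `difflib.SequenceMatcher` can rank the call signs currently seen by the ADS-B receiver. The function name `best_callsign_match`, the 0.6 threshold, and the example call signs are hypothetical illustrations, not the authors' implementation.

```python
from difflib import SequenceMatcher


def best_callsign_match(recognized, adsb_callsigns, threshold=0.6):
    """Return (callsign, score) for the ADS-B call sign most similar to the
    recognized string, or (None, best_score) if no candidate reaches the
    hypothetical acceptance threshold."""
    best, best_score = None, 0.0
    for candidate in adsb_callsigns:
        # ratio() = 2 * matched_characters / (len(a) + len(b)), in [0, 1]
        score = SequenceMatcher(None, recognized.upper(), candidate.upper()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical example: the ASR misheard the last digit of "RAM123".
print(best_callsign_match("RAM128", ["RAM123", "AFR456", "RYR7421"]))
```

Here "RAM123" is selected with a score of about 0.83. The threshold lets the system report "no match" instead of highlighting a wrong flight when the spoken call sign belongs to an aircraft not present in the ADS-B data.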

16 pages, 4294 KiB  
Article
Analyzing Multi-Mode Fatigue Information from Speech and Gaze Data from Air Traffic Controllers
by Lin Xu, Shanxiu Ma, Zhiyuan Shen, Shiyu Huang and Ying Nan
Aerospace 2024, 11(1), 15; https://doi.org/10.3390/aerospace11010015 - 24 Dec 2023
Viewed by 955
Abstract
In order to determine the fatigue state of air traffic controllers from air talk, an algorithm is proposed for discriminating the fatigue state of controllers based on applying multi-speech feature fusion to voice data using a Fuzzy Support Vector Machine (FSVM). To supplement the basis for discrimination, we also extracted eye-fatigue-state discrimination features based on Percentage of Eyelid Closure Duration (PERCLOS) eye data. To merge the two classes of discrimination results, a new controller fatigue-state evaluation index is proposed that fuses the fatigue discrimination results for speech and the eyes at the decision level using the entropy weight method. The experimental results show that the fatigue-state recognition accuracy rate was 86.0% for the fatigue-state evaluation index, which was 3.5% and 2.2% higher than those for speech and eye assessments, respectively. The comprehensive fatigue evaluation index provides important reference values for controller scheduling and mental-state evaluations.
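
The entropy weight method named in the abstract gives larger weights to indicators whose values vary more across samples, because a nearly constant indicator carries little discriminating information. The following is a generic sketch of that method in plain Python, not the authors' code; it assumes each row holds hypothetical, non-negative per-sample fatigue scores from the speech-based and eye-based classifiers, and the names `entropy_weights` and `fused_index` are illustrative.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method.

    matrix: list of rows (samples); each column is one indicator, e.g. a
    hypothetical speech-based and eye-based fatigue score. Values must be
    non-negative and there must be at least two rows.
    """
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)  # normalizes each column's entropy to [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)  # an all-zero indicator carries no information
            continue
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:  # convention: 0 * ln(0) = 0
                entropy -= p * math.log(p)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - k * entropy))
    s = sum(divergences)
    if s == 0:
        return [1.0 / m] * m  # no indicator discriminates: fall back to equal weights
    return [d / s for d in divergences]


def fused_index(matrix):
    """Weighted sum of the indicators for each sample (decision-level fusion)."""
    weights = entropy_weights(matrix)
    return [sum(w * x for w, x in zip(weights, row)) for row in matrix]
```

For example, if the eye-based score were identical for every sample, its weight would drop to zero and the fused index would rely entirely on the speech-based score.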

23 pages, 2780 KiB  
Article
Ensuring Safety for Artificial-Intelligence-Based Automatic Speech Recognition in Air Traffic Control Environment
by Ella Pinska-Chauvin, Hartmut Helmke, Jelena Dokic, Petri Hartikainen, Oliver Ohneiser and Raquel García Lasheras
Aerospace 2023, 10(11), 941; https://doi.org/10.3390/aerospace10110941 - 3 Nov 2023
Cited by 1 | Viewed by 2094
Abstract
This paper describes the safety assessment conducted in SESAR2020 project PJ.10-W2-96 ASR on automatic speech recognition (ASR) technology implemented for air traffic control (ATC) centers. ASR already enables the automatic recognition of aircraft callsigns and various ATC commands including command types based on controller–pilot voice communications for presentation at the controller working position. The presented safety assessment process consists of defining design requirements for ASR technology application in normal, abnormal, and degraded modes of ATC operations. A total of eight functional hazards were identified based on the analysis of four use cases. The safety assessment was supported by top-down and bottom-up modelling and analysis of the causes of hazards to derive system design requirements for the purposes of mitigating the hazards. Assessment of achieving the specified design requirements was supported by evidence generated from two real-time simulations with pre-industrial ASR prototypes in approach and en-route operational environments. The simulations, focusing especially on the safety aspects of ASR application, also validated the hypotheses that ASR reduces controllers’ workload and increases situational awareness. The missing validation element, i.e., an analysis of the safety effects of ASR in ATC, is the focus of this paper. As a result of the safety assessment activities, mitigations were derived for each hazard, demonstrating that the use of ASR does not increase safety risks and is, therefore, ready for industrialization.

33 pages, 2312 KiB  
Article
Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding
by Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Driss Khalil, Srikanth Madikeri, Allan Tart, Igor Szoke, Vincent Lenders, Mickael Rigault and Khalid Choukri
Aerospace 2023, 10(10), 898; https://doi.org/10.3390/aerospace10100898 - 20 Oct 2023
Cited by 4 | Viewed by 2862
Abstract
Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). The handling of these voice communications requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts aim at integrating artificial intelligence (AI) into ATC communications in order to lessen ATCos' workload. However, the development of data-driven AI systems for the understanding of spoken ATC communications demands large-scale annotated datasets, which are currently lacking in the field. This paper explores the lessons learned from the ATCO2 project, which aimed to develop a unique platform to collect, preprocess, and transcribe large amounts of ATC audio data from airspace in real time. This paper reviews (i) robust automatic speech recognition (ASR), (ii) natural language processing, (iii) English language identification, and (iv) contextual ASR biasing with surveillance data. The pipeline developed during the ATCO2 project, along with the open-sourcing of its data, encourages research in the ATC field, while the full corpus can be purchased through ELDA. The ATCO2 corpora are suitable for developing ASR systems when little or no transcribed ATC audio data are available. For instance, the proposed ASR system trained with ATCO2 reaches a WER as low as 17.9% on public ATC datasets, which is 6.6% absolute better than a system trained on “out-of-domain” but gold transcriptions. Finally, the release of 5000 h of ASR-transcribed speech, covering more than 10 airports worldwide, is a step forward towards more robust automatic speech understanding systems for ATC communications.
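
Several abstracts in this issue report word error rate (WER). Its standard definition is WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions in the best alignment of the recognized text with a reference transcript of N words. A minimal, dependency-free sketch of this computation via word-level Levenshtein distance follows; the function name and the example utterance are hypothetical. An "absolute" WER difference, such as the 6.6% quoted above, is a difference in percentage points between two WER values.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


ref = "lufthansa four five six turn left heading two seven zero"
hyp = "lufthansa four five six turn heading two seven five"
print(word_error_rate(ref, hyp))  # one deletion + one substitution over 10 words
```

The example yields a WER of 0.2 (20%), although the callsign itself was recognized correctly, which is one reason why several papers also report callsign and command recognition rates.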

14 pages, 433 KiB  
Article
An Automatic Speaker Clustering Pipeline for the Air Traffic Communication Domain
by Driss Khalil, Amrutha Prasad, Petr Motlicek, Juan Zuluaga-Gomez, Iuliia Nigmatulina, Srikanth Madikeri and Christof Schuepbach
Aerospace 2023, 10(10), 876; https://doi.org/10.3390/aerospace10100876 - 10 Oct 2023
Cited by 2 | Viewed by 1195
Abstract
In air traffic management (ATM), voice communications are critical for ensuring the safe and efficient operation of aircraft. The pertinent voice communications, between the air traffic controller (ATCo) and the pilot, are usually transmitted in a single channel, which poses a challenge when developing automatic systems for air traffic management. Speaker clustering, i.e., identifying and grouping the utterances of the same speaker among different speakers, is one of the challenges when applying speech processing algorithms. We propose a pipeline that deploys (i) speech activity detection (SAD) to identify speech segments, (ii) an automatic speech recognition system to generate the text for audio segments, (iii) text-based speaker role classification to detect the role of the speaker (ATCo or pilot in our case), and (iv) unsupervised speaker clustering to create a cluster of each individual pilot speaker from the obtained speech utterances. The speech segments obtained by SAD are input into an automatic speech recognition (ASR) engine to generate the automatic English transcripts. The speaker role classification system takes the transcript as input and uses it to determine whether the speech was from the ATCo or the pilot. As the main goal of this project is to group the speakers in pilot communication, only the pilot data obtained from the classification system are employed. We present a method for separating the speech parts of pilots into different clusters based on the speaker’s voice using agglomerative hierarchical clustering (AHC). The performance of the speaker role classification and speaker clustering is evaluated on two publicly available datasets: the ATCO2 corpus and the Linguistic Data Consortium Air Traffic Control Corpus (LDC-ATCC). Since the pilots’ real identities are unknown, the ground truth is generated based on logical hypotheses regarding the creation of each dataset, timing information, and the information extracted from associated callsigns. In the case of speaker clustering, the proposed algorithm achieves an accuracy of 70% on the LDC-ATCC dataset and 50% on the noisier ATCO2 dataset.
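
Agglomerative hierarchical clustering starts with every pilot segment in its own cluster and repeatedly merges the two closest clusters until the remaining clusters are too far apart. The sketch below illustrates that idea in plain Python under stated assumptions: each segment has already been mapped to a fixed-length speaker embedding (the abstract does not specify the voice features), and cosine distance, average linkage, and the stopping threshold are illustrative choices, not the paper's configuration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_clusters(embeddings, threshold):
    """Average-linkage AHC: merge the two closest clusters until the closest
    pair is farther apart than the (hypothetical) distance threshold.
    Returns clusters as lists of segment indices."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = linkage(clusters[a], clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        if dist > threshold:
            break  # remaining clusters are treated as different speakers
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical 2-D "embeddings" of four pilot segments from two speakers.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(agglomerative_clusters(segments, threshold=0.3))
```

With these toy vectors, segments 0 and 1 end up in one cluster and segments 2 and 3 in another. Because the number of pilots is unknown in advance, a distance threshold rather than a fixed cluster count is used as the stopping rule here; library implementations such as SciPy's hierarchical clustering offer the same choice.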

0 pages, 4178 KiB  
Article
In-Vehicle Speech Recognition for Voice-Driven UAV Control in a Collaborative Environment of MAV and UAV
by Jeong-Sik Park and Na Geng
Aerospace 2023, 10(10), 841; https://doi.org/10.3390/aerospace10100841 - 27 Sep 2023
Cited by 2 | Viewed by 936
Abstract
Most conventional speech recognition systems have mainly concentrated on voice-driven control of personal user devices such as smartphones. Therefore, a speech recognition system used in a special environment needs to be developed with that environment in mind. In this study, a speech recognition framework for voice-driven control of unmanned aerial vehicles (UAVs) is proposed in a collaborative environment between manned aerial vehicles (MAVs) and UAVs, where multiple MAVs and UAVs fly together, and pilots on board MAVs control multiple UAVs with their voices. Standard speech recognition systems consist of several modules, including front-end, recognition, and post-processing. Among them, this study focuses on recognition and post-processing modules in terms of in-vehicle speech recognition. In order to stably control UAVs via voice, it is necessary to handle the environmental conditions of the UAVs carefully. First, we define control commands that the MAV pilot delivers to UAVs and construct training data. Next, for the recognition module, we investigate an acoustic model suitable for the characteristics of the UAV control commands and the UAV system with hardware resource constraints. Finally, two approaches are proposed for post-processing: grammar network-based syntax analysis and transaction-based semantic analysis. For evaluation, we developed a speech recognition system in a collaborative simulation environment between a MAV and a UAV and successfully verified the validity of each module. In recognition experiments on connected-word commands of two to five words, the recognition rates of hidden Markov model (HMM) and deep neural network (DNN)-based acoustic models were 98.2% and 98.4%, respectively. However, in terms of computational cost, the HMM model was about 100 times more efficient than the DNN model. In addition, the relative improvement in error rate with the proposed post-processing was about 65%.

37 pages, 3792 KiB  
Article
Safety Aspects of Supporting Apron Controllers with Automatic Speech Recognition and Understanding Integrated into an Advanced Surface Movement Guidance and Control System
by Matthias Kleinert, Oliver Ohneiser, Hartmut Helmke, Shruthi Shetty, Heiko Ehr, Mathias Maier, Susanne Schacht and Hanno Wiese
Aerospace 2023, 10(7), 596; https://doi.org/10.3390/aerospace10070596 - 29 Jun 2023
Cited by 4 | Viewed by 1325
Abstract
The information air traffic controllers (ATCos) communicate via radio telephony is valuable for digital assistants to provide additional safety. Yet, ATCos have to enter this information manually. Assistant-based speech recognition (ABSR) has proven to be a lightweight technology that automatically extracts and successfully feeds the content of ATC communication into digital systems without additional human effort. This article explains how ABSR can be integrated into an advanced surface movement guidance and control system (A-SMGCS). The described validations were performed in the complex apron simulation training environment of Frankfurt Airport with 14 apron controllers in a human-in-the-loop simulation in summer 2022. The integration significantly reduces the workload of controllers and increases safety as well as overall performance. Based on a word error rate of 3.1%, the command recognition rate was 91.8% with a callsign recognition rate of 97.4%. This performance was enabled by the integration of A-SMGCS and ABSR: the command recognition rate improves by more than 15% absolute by considering A-SMGCS data in ABSR.

42 pages, 9066 KiB  
Article
Assistant Based Speech Recognition Support for Air Traffic Controllers in a Multiple Remote Tower Environment
by Oliver Ohneiser, Hartmut Helmke, Shruthi Shetty, Matthias Kleinert, Heiko Ehr, Sebastian Schier-Morgenthal, Saeed Sarfjoo, Petr Motlicek, Šarūnas Murauskas, Tomas Pagirys, Haris Usanovic, Mirta Meštrović and Aneta Černá
Aerospace 2023, 10(6), 560; https://doi.org/10.3390/aerospace10060560 - 14 Jun 2023
Cited by 1 | Viewed by 1551
Abstract
Assistant Based Speech Recognition (ABSR) systems for air traffic control radiotelephony communication have shown their potential to reduce air traffic controllers’ (ATCos) workload. Related research activities mainly focused on utterances for approach and en-route traffic. This is one of the first investigations of how ABSR could support ATCos in a tower environment. Ten ATCos from Lithuania and Austria participated in a human-in-the-loop simulation to validate ABSR support within a prototypic multiple remote tower controller working position. The ABSR supports ATCos by (1) highlighting recognized callsigns, (2) inputting recognized commands from ATCo utterances in electronic flight strips, (3) offering correction of ABSR output, (4) automatically accepting ABSR output, and (5) feeding the digital air traffic control system. This paper assesses human factors such as workload, situation awareness, and usability when ATCos are supported by ABSR. Those assessments result from a system with a relevant command recognition rate of 82.9% and a callsign recognition rate of 94.2%. Workload reductions and usability improvement with p-values below 0.25 are obtained for the case when the ABSR system is compared to the baseline situation without ABSR support. This motivates the technology to be brought to a higher technology readiness level, which is also confirmed by subjective feedback from questionnaires and objective measurement of workload reduction based on a performed secondary task.

32 pages, 2982 KiB  
Article
Validating Automatic Speech Recognition and Understanding for Pre-Filling Radar Labels—Increasing Safety While Reducing Air Traffic Controllers’ Workload
by Nils Ahrenhold, Hartmut Helmke, Thorsten Mühlhausen, Oliver Ohneiser, Matthias Kleinert, Heiko Ehr, Lucas Klamert and Juan Zuluaga-Gómez
Aerospace 2023, 10(6), 538; https://doi.org/10.3390/aerospace10060538 - 5 Jun 2023
Cited by 3 | Viewed by 1894
Abstract
Automatic speech recognition and understanding (ASRU) for air traffic control (ATC) has been investigated in different ATC environments and applications. The objective of this study was to quantify the effect of ASRU support for air traffic controllers’ (ATCos’) radar label maintenance in terms of safety and human performance. Therefore, an implemented ASRU system was validated within a human-in-the-loop environment by ATCos in different traffic-density scenarios. In the baseline condition, ATCos performed radar label maintenance by entering verbally instructed ATC commands with a mouse and keyboard. In the proposed solution, ATCos were supported by ASRU, which achieved a command recognition rate of 92.5% with a command error rate of 2.4%. ASRU support reduced the number of wrong or missing inputs from ATCos into the radar label by a factor of two, which contemporaneously improved their situational awareness. Furthermore, ATCos were able to perform more successful secondary tasks when using ASRU support, indicating a greater capacity to handle unexpected events. The results from NASA TLX showed that the perceived workload decreased with a statistical significance of 4.3% across all scenarios. In conclusion, this study provides evidence that using ASRU for radar label maintenance can significantly reduce workload and improve flight safety.

29 pages, 1582 KiB  
Article
Effects of Language Ontology on Transatlantic Automatic Speech Understanding Research Collaboration in the Air Traffic Management Domain
by Shuo Chen, Hartmut Helmke, Robert M. Tarakan, Oliver Ohneiser, Hunter Kopald and Matthias Kleinert
Aerospace 2023, 10(6), 526; https://doi.org/10.3390/aerospace10060526 - 1 Jun 2023
Cited by 2 | Viewed by 1379
Abstract
As researchers around the globe develop applications for the use of Automatic Speech Recognition and Understanding (ASRU) in the Air Traffic Management (ATM) domain, Air Traffic Control (ATC) language ontologies will play a critical role in enabling research collaboration. The MITRE Corporation (MITRE) and the German Aerospace Center (DLR), having independently developed ATC language ontologies for specific applications, recently compared these ontologies to identify opportunities for improvement and harmonization. This paper extends the topic in two ways. First, this paper describes the specific ways in which ontologies facilitate the sharing of and collaboration on data, models, algorithms, metrics, and applications in the ATM domain. Second, this paper provides a comparative analysis of word frequencies in ATC speech in the United States and Europe to illustrate that, whereas methods and tools for evaluating ASRU applications can be shared across researchers, the specific models would not work well between regions due to differences in the underlying corpus data.

25 pages, 1000 KiB  
Article
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers
by Juan Zuluaga-Gomez, Amrutha Prasad, Iuliia Nigmatulina, Petr Motlicek and Matthias Kleinert
Aerospace 2023, 10(5), 490; https://doi.org/10.3390/aerospace10050490 - 22 May 2023
Cited by 9 | Viewed by 2740
Abstract
In this paper, we propose a novel virtual simulation-pilot engine for speeding up air traffic controller (ATCo) training by integrating different state-of-the-art artificial intelligence (AI)-based tools. The virtual simulation-pilot engine receives spoken communications from ATCo trainees, and it performs automatic speech recognition and understanding. Thus, it goes beyond only transcribing the communication and can also understand its meaning. The output is subsequently sent to a response generator system, which resembles the spoken read-back that pilots give to the ATCo trainees. The overall pipeline is composed of the following submodules: (i) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (ii) a high-level air traffic control (ATC)-related entity parser that understands the transcribed voice communication; and (iii) a text-to-speech submodule that generates a spoken utterance that resembles a pilot based on the situation of the dialogue. Our system employs state-of-the-art AI-based tools such as Wav2Vec 2.0, Conformer, BERT and Tacotron models. To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools. In addition, we develop a robust and modular system with optional submodules that can enhance the system’s performance by incorporating real-time surveillance data, metadata related to exercises (such as sectors or runways), or even a deliberate read-back error to train ATCo trainees to identify them. Our ASR system can reach word error rates (WERs) as low as 5.5% and 15.9% on high- and low-quality ATC audio, respectively. We also demonstrate that adding surveillance data into the ASR can yield a callsign detection accuracy of more than 96%.

16 pages, 1588 KiB  
Article
Automatic Flight Callsign Identification on a Controller Working Position: Real-Time Simulation and Analysis of Operational Recordings
by Raquel García, Juan Albarrán, Adrián Fabio, Fernando Celorrio, Carlos Pinto de Oliveira and Cristina Bárcena
Aerospace 2023, 10(5), 433; https://doi.org/10.3390/aerospace10050433 - 4 May 2023
Cited by 7 | Viewed by 1934
Abstract
In the air traffic management (ATM) environment, air traffic controllers (ATCos) and flight crews (FCs) communicate via voice to exchange different types of data such as commands, readbacks (confirmation of reception of the command) and information related to the air traffic environment. Speech recognition can be used in these voice exchanges to support ATCos in their work; each time a flight identification or callsign is mentioned by the controller or the pilot, the flight is recognised through automatic speech recognition (ASR) and the callsign is highlighted on the ATCo screen to increase their situational awareness and safety. This paper presents the work that is being performed within SESAR2020-funded solution PJ.10-W2-96 ASR in callsign recognition via voice by Enaire, Indra, and Crida using ASR models developed jointly by EML Speech Technology GmbH (EML) and Crida. The paper describes the ATCo speech environment and presents the main requirements impacting the design, the implementation performed, and the outcomes obtained using real operation communications and real-time simulations. The findings indicate a way forward incorporating partial recognition of callsigns and enriching the phonetization of company names to improve the recognition rates, currently set at 84–87% for controllers and 49–67% for flight crew.
