Operational Feasibility Analysis of the Multimodal Controller Working Position "TriControl"

Current air traffic controller working positions (CWPs) are reaching their capacity owing to increasing levels of air traffic. The multimodal CWP prototype TriControl combines automatic speech recognition, multitouch gestures, and eye-tracking, aiming for more natural and more efficient human interaction with air traffic control (ATC) systems. However, the prototype has not yet undergone a systematic feasibility evaluation. This paper evaluates the operational feasibility of the approach control CWP prototype TriControl, focusing on its system usability and its fulfillment of operational requirements. Fourteen controllers took part in a simulation study to evaluate the TriControl concept. The active approach controllers among the participants served as the core target subgroup. The ratings of all controllers in the TriControl assessment were, on average, in slight agreement, with only a few reaching statistical significance. However, the active approach controllers performed better and rated the system considerably more positively. They were strongly positive regarding the system usability and acceptance of this early-stage prototype. In particular, ease of use, user-friendliness, and learnability were perceived very positively. Overall, they were also satisfied with the command input procedure and would use it in their daily work. Thus, the participating controllers encourage further enhancements to TriControl.

Air traffic controllers (ATCOs) perform their work at a controller working position (CWP), which mainly comprises an interactive air traffic situation data display and the communication infrastructure.
First, controllers observe and analyze the air traffic at their situation data display. Thus, it is only a small step to utilizing eye-tracking technology to select the currently observed aircraft as the one to receive the next command. Secondly, verbal commands are an everyday concept in ATC; however, utterances can be reduced to articulating only the command values. Thirdly, performing simple and fast multitouch gestures for command types, as widely used nowadays on electronic consumer products, completes an easy and natural way of creating commands, which the controller finally confirms.
Multimodal HMIs combine different interaction modalities, aiming to support a natural [1] and efficient way of human communication [2,3]. Recent research has revealed that reasonable interaction technologies [4] for a CWP should recognize touch, speech, and gaze [5][6][7][8]. In accordance with these findings, the German Aerospace Center (DLR) has developed the multimodal CWP TriControl concept, which combines automatic speech recognition, multitouch gestures with one or multiple fingers on a touch input device, and eye-tracking via infrared sensors located at the bottom of the monitor. These modalities can be used to input the three basic ATC command parts, i.e., aircraft callsign, command type, and value, into the ATC system [9]. Hence, conventional subsequent command parts that are uttered verbally are replaced by parallel ATC system input with different modalities [10].
First analyses with the multimodal CWP prototype TriControl showed an acceleration of command input by up to 15% [10]. Furthermore, the artificial voice broadcast or data-link transmission of commands resulting from combined command parts of the parallel ATC system input in TriControl can also reduce misunderstandings in verbal communication caused by various "foreign language" English accents [11], that might even lead to serious accidents [12].
However, the operational feasibility including the system usability and acceptability of TriControl have not yet been systematically evaluated with ATCOs in a realistic environment [9]. As ATCOs work in a highly safety-critical domain, they and the air navigation service providers are very cautious with respect to new technologies [13].
The goal of this paper is to evaluate the multimodal CWP prototype TriControl in practice, i.e., mainly based on questionnaires after simulation runs, to receive input from target users for future development [14]. The evaluation concentrates on operational feasibility in terms of system usability and analyzing the fulfillment of operational requirements.
In the next section, we present the relevant background on multimodal HMIs, the validation methodology, and the TriControl system. Section 3 is the method part, introducing the participants, setup, and contents of the TriControl feasibility analysis study. All analysis results on system usability, acceptability, and performance are presented in Section 4. The main results and further comments are briefly presented in Section 5. Finally, Section 6 summarizes and concludes this paper, and sketches out future work.

Background of Multimodal Interfaces, Feasibility Analysis, and the CWP prototype TriControl
Many studies have investigated the advantages and disadvantages of specific interaction technologies as well as of multimodal HMIs. The most important results regarding multimodal HMIs and their relationship with the ATC domain are outlined in the following section. Furthermore, a theoretical background regarding the main aspects of the feasibility analysis study is outlined, i.e., concerning the validation methodology, the concepts of usability and acceptability, as well as the user-centered design. Finally, the functionality of the multimodal CWP prototype TriControl that was evaluated in a feasibility analysis is explained.

Multimodal HMIs and their benefits for ATC
Human-machine systems comprise reciprocal interaction between system components such as hardware and software as well as humans to achieve specific goals [15]. The communication channel for information between human and machine is called the "modality" [16]. In the ATC domain, automatic speech recognition usually utilizes an assistant-based recognition engine. This reduces the command error rate down to 1.7%. The respective radar label maintenance task supported by speech recognition reduces ATCO workload for ATC system input by more than 30% [53].
Eye-tracking interaction has already been analyzed in the air traffic domain [54,55]. The freeing of hands to be used for other manual interaction is one central advantage of this interaction means [56]. However, the visual selection of elements after a gaze dwell time does not seem beneficial [57]. The use of gestures in the ATC context has also been investigated. Earlier prototypes mostly include multitouch surfaces for more complex gestures [58]. Further examples from ATC research prototypes combine gestures with eye-tracking [59] or speech recognition [60]. Another application uses visual gesture recognition of air marshallers [61,62].
Touch gesture recognition has been investigated in the context of multimodal CWPs [13,63]. Multitouch-based interaction was evaluated as natural and fast enough for ATC applications [13]. Furthermore, users were able to work with the tested modalities quickly and perceived the interaction as easy [13]. Speed gains of up to 14% could be achieved compared to mouse inputs [64]. As ATCOs work in a highly safety-critical environment, their acceptance of and trust in their HMIs is essential.
The user-centered design process takes target users into account in each design step of the HMI. Hence, there are early opportunities to adapt the HMI development to the needs of ATCOs, even at low technology readiness levels and early validation phases.

Validation Methodology for Feasibility Analysis with Usability, Acceptability, and User-centred Design
The European Operational Concept Validation Methodology 3 (E-OCVM 3), developed by the European Organisation for the Safety of Air Navigation (EUROCONTROL) [65], provides a processual approach for the validation of air traffic management (ATM) operational concepts. The methodology is intended to include all relevant stakeholders and to support the development process. The E-OCVM concept lifecycle model encompasses eight steps for maturing concepts based on iterative loops of design and evaluation. The "validation phases" (V) are "ATM needs (V0)", "Scope (V1)", "Feasibility (V2)", "Pre-Industrial Development and Integration (V3)", "Industrialization (V4)", "Deployment (V5)", "Operations (V6)", and "Decommissioning (V7)", with the concept validation methodology focusing on V1 to V3. Many of those phases are similar to the more popular "technology readiness levels" (TRL) 1 to 9 [66]. Hence, V1 corresponds to TRL2 "Technology concept and/or application formulated", V2 corresponds to TRL4 "Component validation in laboratory environment", and V3 corresponds to TRL6 "System/subsystem model or prototype demonstration in an operational environment" [67].
The TriControl CWP prototype is assumed to fulfill step V2 "feasibility" of E-OCVM 3, or TRL4 "Component validation in laboratory environment", respectively. In this step, the technological concept of a prototype in the ATC domain should be elaborated to be operationally feasible in normal and non-normal conditions, the latter comprising, e.g., emergency flights or severe weather. An initial functional prototype should undergo a simulation for further analysis and revelation of further development needs. The aspect of feasibility itself is again subdivided into operability (usability), (system) acceptability, and performance.
Usability as one aspect of feasibility is defined as a construct with many facets, including being "easy to learn, efficient to use, easy to remember, having a low error rate, and meeting user satisfaction" [68]. It can also be seen as the extent to which a system can be used by specified users in a specified context to reach effectiveness, efficiency, and satisfaction [69]. Extensive background on the concept of "usability" is given in [70]. The focus on users and environments, next to just the tasks, in tool development is a central factor resulting from usability concerns [71,72]. If usability is taken into account, this can increase productivity, reduce training and support needs, improve users' acceptance [73], or even lead to higher efficiency [74]. Usability can be measured directly or indirectly [75,76]: via questionnaires and interviews on perceived usability, or via behavioral and interaction data from system experiments [77,78]. Therefore, a combination of evaluation methods improves the usability assessment [79]. System usability problems can be detected with small user sample sizes. In specific studies, five study subjects were able to find 80% of usability problems and 15 study subjects detected all usability problems [80,81]; however, there may be hierarchies among those problems, so that fixed numbers of subjects might not make sense [82].
Acceptability as another aspect of feasibility can be defined as the perceived usefulness and ease of use of a system to fulfill a task [83][84][85]. This affects the attitude towards the system as well as the behavioral intention and actual use of this system, following the Technology Acceptance Model (TAM) [86]. The TAM has been broadly applied and has shown high reliability, becoming a valuable acceptability assessment model [87]. If acceptability is taken into account during concept and system development, especially in complex environments, this can avoid user resistance [88] and negative use of the system such as obstruction or under-utilization [89]. Acceptability can be measured via Likert attitude scales [90] in questionnaires. A widely used questionnaire with 12 items [91] has high reliability [92].
Those aspects of feasibility can be assessed early in the applied "user-centered design" process. User-centered design encompasses the involvement of all relevant stakeholders in iterative design steps with an appropriate view on the requirements of tasks, users, and task distribution [73]. If user-centered design is applied, this can result in better user acceptance [93], higher satisfaction [94], improved usability [95], and lower training needs [96]. The iterative design loop of DIN EN ISO 9241-210 includes an analysis of the system usage context, followed by a deduction of requirements and the development of a design solution that is evaluated afterwards. User requirements not satisfied in the evaluation tests lead back to the beginning of the design loop. This methodology has also been applied to certain extents to other ATC interfaces [97] next to TriControl.

Multimodal CWP TriControl
Nowadays, an ATCO usually issues verbal clearances to pilots via radio telephony and enters the clearances' contents manually into the ATC system. The clearances include structured information about the necessary pilot actions. The central contents are an aircraft callsign, a command type, and a command value. Pilot actions after readback of the clearance can lead to trajectory changes of the aircraft, e.g., due to speed, altitude, and heading changes; or can be of an organizational nature, e.g., handing over the controlling responsibility for an aircraft to the ATCO of the adjacent airspace sector.
When using the multimodal CWP TriControl, which combines three input modalities, an ATCO is able to generate a clearance with an aircraft callsign via gaze, a command type via multitouch gesture, and a command value via verbal utterance (see Figure 1).

Figure 1.
Multimodal interaction with TriControl combining the inputs from eye-tracking via gaze, automatic speech recognition via utterance, and multitouch display via gesture to a controller command (adapted from [98]).
The eye-tracking device detects the spot on the radar screen that the ATCO is fixating for a certain dwell time. If there is an aircraft label or icon, the respective callsign is selected as the aircraft that will then receive the next command. Afterwards, the ATCO can input the command type and value in sequence or in parallel. The targeted aircraft is locked as soon as the system detects that the ATCO performs a gesture or utters something, to avoid the clearance being sent to another aircraft that may be looked at thereafter.
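The dwell-then-lock selection logic described above can be sketched as follows. This is a minimal illustration, not TriControl's actual implementation: the class name, method names, and the dwell threshold of 0.4 s are assumptions for the example.

```python
import time

class GazeSelector:
    """Selects an aircraft once the gaze rests on its label for a dwell
    time, and freezes the selection while a gesture/utterance is active.
    Illustrative sketch; the real TriControl parameters are not given."""

    def __init__(self, dwell_time_s=0.4):  # 0.4 s is a hypothetical value
        self.dwell_time_s = dwell_time_s
        self._candidate = None   # callsign currently under the gaze
        self._since = None       # when the gaze arrived there
        self.selected = None     # currently selected callsign
        self.locked = False      # True while command input is in progress

    def on_gaze(self, callsign_under_gaze, now=None):
        """Feed the callsign under the current gaze point (or None)."""
        if self.locked:          # selection frozen during command input
            return self.selected
        now = time.monotonic() if now is None else now
        if callsign_under_gaze != self._candidate:
            self._candidate, self._since = callsign_under_gaze, now
        elif self._candidate is not None and now - self._since >= self.dwell_time_s:
            self.selected = self._candidate
        return self.selected

    def on_input_started(self):
        """Gesture or utterance detected: lock the targeted aircraft."""
        if self.selected is not None:
            self.locked = True

    def on_command_done(self):
        """Command confirmed or cancelled: release the lock."""
        self.locked = False
```

The lock prevents the clearance from drifting to another aircraft that the controller merely glances at while gesturing or speaking.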
The command type is entered into the system via two-dimensional gestures on the multitouch display. The one-finger gestures are: swipe down for "descend", swipe up for "climb", swipe left for "reduce", swipe right for "increase", and long press for "cleared ILS"/"handover"/"direct to" depending on the value. A semi-circular rotation with two fingers is recognized as a "heading" type. In an additional multitouch interaction mode that can be activated and deactivated with a button press on the multitouch device, some aspects of the graphical user interface of TriControl can be adjusted. This is done with two further multifinger gestures: Five fingers are used to zoom in and out on the radar map via spreading/contracting, or they are simply moved to pan the map; two fingers are needed to attach the distance measurement tool circles to two selectable aircraft.
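The gesture-to-command-type mapping above can be summarized in a small lookup sketch. The function name, gesture identifiers, and the `value_kind` classification are illustrative assumptions, not TriControl's actual API; the overloaded long press is disambiguated by the kind of spoken value, as the text describes.

```python
# One-finger swipe gestures map directly to command types.
ONE_FINGER_GESTURES = {
    "swipe_down":  "DESCEND",
    "swipe_up":    "CLIMB",
    "swipe_left":  "REDUCE",
    "swipe_right": "INCREASE",
}

def command_type(gesture, fingers=1, value_kind=None):
    """Return the ATC command type for a recognized multitouch gesture.

    `value_kind` classifies the spoken value ("runway", "position",
    "waypoint") and disambiguates the overloaded long press.
    """
    if fingers == 2 and gesture == "rotate_semicircle":
        return "HEADING"
    if fingers == 1:
        if gesture == "long_press":
            return {"runway":   "CLEARED_ILS",
                    "position": "HANDOVER",
                    "waypoint": "DIRECT_TO"}.get(value_kind)
        return ONE_FINGER_GESTURES.get(gesture)
    return None  # unrecognized gesture/finger combination
```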
The command value is spoken and recorded with a microphone. This value only consists of digits or is a waypoint, runway, or controller position name, respectively, i.e., "one five zero", "two hundred", "delta lima four five five", "two three right", "tower", etc. The command type gesture and command value utterance can be entered in parallel as Figure 2 demonstrates.
The callsign, type, and value are then combined into a controller command displayed to the ATCO. The uttered value is displayed in yellow in the type field of the corresponding aircraft label. For the validation trial, there was an additional top bar on the radar screen [99] with the three input elements next to a yellow value in the corresponding command type label field of the respective aircraft, as shown in Figure 2. This visualization of the generated controller command before issuing it helps to detect mistakenly entered or falsely recognized command parts. Thus, the controller can either completely cancel the clearance or overwrite a wrong callsign as detected by gaze recognition, a wrong command type as analyzed by the multitouch device, or a wrong value as recognized by the speech recognition. Hence, TriControl even offers one additional manual check for correctness beyond listening to the pilot readback to determine whether a conventional, completely verbal clearance might contain an error.
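The assembly of the three command parts, with the option to overwrite falsely recognized parts or cancel before confirmation, can be sketched as a small data structure. The class and method names are hypothetical, chosen only to mirror the workflow described above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PendingCommand:
    """A clearance being assembled from the three modalities and shown to
    the ATCO for checking before confirmation (illustrative sketch)."""
    callsign: Optional[str] = None   # from eye-tracking
    cmd_type: Optional[str] = None   # from multitouch gesture
    value: Optional[str] = None      # from speech recognition

    def overwrite(self, **parts):
        """Correct any mistakenly entered or falsely recognized part."""
        for name, part in parts.items():
            setattr(self, name, part)

    def is_complete(self):
        return None not in (self.callsign, self.cmd_type, self.value)

    def confirm(self):
        """Return the finished command on the confirmation tap."""
        if not self.is_complete():
            raise ValueError("command incomplete; cannot confirm")
        return (self.callsign, self.cmd_type, self.value)
```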
If the ATCO acknowledges the completed command via a confirmation tap on the multitouch device, it is entered into the ATC system and can be further processed to influence the aircraft trajectory. Hence, the command could be sent to the aircraft via datalink or could be read by an artificial voice via the usual radio telephony channel. More details on the background of TriControl as well as its functionalities, especially with respect to command element input orders and timing, can be found in [10].

Figure 2. Input order of TriControl: starting with eye fixation (on the label of callsign DLH6421), followed by potentially parallel touch gesture (rotating two-finger semi-circle swipe) and speech utterance ("two one zero") being recognized, and terminated with a confirmation tap (short press) to finalize the command.

Multimodal CWP Feasibility Analysis
The participants, setup, and tasks of the feasibility analysis study as well as questionnaires and study hypotheses are explained in the following sections.

Evaluation Site and Study Participants' Characteristics
The TriControl human-in-the-loop feasibility analysis study took place at the German air navigation service provider DFS Deutsche Flugsicherung GmbH in Langen, Germany, in April 2017. Fourteen DFS ATCOs with an average age of 47 years (standard deviation (SD): 10 years) participated as study subjects. They had an average professional ATC experience of 21 years (SD: 12 years). The ATCOs worked at different positions, such as approach (7x APP), area control center (5x ACC), upper area center (2x UAC), tower (4x TWR), and as a generic instructor; multiple answers were possible, so multiple perspectives were obtained. Four ATCOs were identified as the core target group, being active APP ATCOs. This is because TriControl is an approach control CWP, and current controlling skills, i.e., not being retired, influence the performance during the simulation study.
Some characteristics and experiences of the ATCOs were surveyed, as these could be relevant for the efficient usage of the different input modalities. Five participants had previous experience with eye-tracking, three with gesture-based inputs, and ten with automatic speech recognition. Four participants did not use vision correction devices, two wore contact lenses, and eight wore glasses. Twelve of the 14 participants were right-handed. All participants had appropriate English language skills as needed for air traffic control. The participants' native languages were German (9), English (3), Hungarian (1), and Hindi (1).

Tasks during the Human-in-the-Loop Study for Feasibility Analysis
The complete study included four different phases: Introduction, training, simulation run, and evaluation with debriefing. The TriControl concept and functionalities were described during the 15 min introduction. This included a standardized presentation about project goals as well as the system handling with the three interaction modalities and the graphical user interface taught by the technical supervisor.
The training phase consisted of a practice human-in-the-loop simulation run and lasted roughly 15 min. The traffic scenario used Düsseldorf approach airspace and comprised fewer aircraft than the later evaluation trial. This gave study participants time for repetition and familiarization with multiple and very different command inputs. Furthermore, they could focus on gathering information using the new radar screen environment. In addition, the eye-tracking device was calibrated to the participants' physical requirements. This phase was accompanied by the technical and psychological supervisors, who answered open questions and corrected possible mistakes. As soon as a study participant stated their comfort and confidence with the system, the practice run was finished.
The simulation run in which participants worked with the TriControl CWP lasted a bit more than 30 min. The hardware comprised commercial-off-the-shelf devices (laptops, monitor, touchpad, eye-tracker, headset, foot switch). The Düsseldorf approach area with only active runway 23R was used as simulation setup (see Figure 3). The air traffic scenario comprised 38 arriving aircraft including seven of wake turbulence category "Heavy" and 31 "Medium". The scenario did not encompass departure traffic and was sufficient for one-hour maximum simulation time. Each participant's task was to work as a "Complete Approach" controller (meaning combined pickup/feeder ATCO in Europe and combined feeder/final ATCO in the US, respectively). The traffic scenario used standard arrival routes and there were no unusual traffic or weather conditions. The aircraft followed the issued command instructions directly after confirmation. Hence, the participants got an impression of TriControl's functional mechanisms in a standard ATC approach environment.
During the final evaluation and debriefing phase, all study participants filled out questionnaires regarding feasibility, covering usability and acceptability, demographics, and profession-related data, in the presence of the psychological supervisor. More precisely, 10 questions about personal data as well as 146 statements, comprising the system usability scale (SUS) and the topics TriControl concept (T), eye-tracking (E), clearances (C), gestures (G), speech recognition (S), input procedure (I), and radar screen (R), plus 30 lines for optional comments on certain elements had to be completed. Examples of those statements include the ability to guide air traffic with TriControl (topic T), the usefulness of eye-tracking (topic E), the ability to issue different command types (topic C), the learnability of gestures (topic G), the user-friendliness of speech recognition (topic S), satisfaction with the command input procedure (topic I), and the identification of radar information (topic R) (see Appendix A for all statements to be rated). Together with a further 17 categories for notes taken by the psychological supervisor during the experiment, this sums up to 203 lines of raw data for each of the 14 participants.
Three classes of requirements have been defined for the feasibility analysis of TriControl: (1) Multimodal interface fitness for intended use in the "TriControl concept", (2) "information retrieval", and (3) "command issuing". The developed questionnaires apply the norm DIN EN ISO 9241-11 (2017) with the subcategories effectiveness, efficiency, and satisfaction, as well as acceptability. For class (1-TriControl concept), the category effectiveness consisted of controlling air traffic as the core task of an ATCO. The category efficiency is oriented on the DIN EN ISO 9241-110 Dialogue Principles (2006) for HMIs. The general requirements of this main norm were used for the category satisfaction. The category acceptability is assessed with widely used items on system use aspects [86]. For class (2-information retrieval), the category effectiveness took the retrieval of information into account, the category efficiency is based on DIN EN ISO 9241-12 Presentation of Information (2000), and the categories satisfaction and acceptability used the same sources as for class (1). For class (3-command issuing), the category effectiveness again took the core task of issuing commands into account, and the categories satisfaction and acceptability consider the E-OCVM 3 demands.

System Usability and Feasibility Analysis Questionnaire
To answer the research questions on the basic feasibility and usability of the TriControl prototype in a quantitative and qualitative way, different assessment approaches were necessary. The quality of the operational concept and the system usability of TriControl in general needed to be analyzed with a globally comparable measure. Therefore, the System Usability Scale (SUS) [100,101], a subjective system usability assessment tool, was chosen. The SUS questionnaire consists of 10 statements to be rated on a scale comprising five possible answers coded as 0 to 4 points. The statements alternate between positive and negative formulations to prevent bias [102]. The sum of all ten item scores is multiplied by 2.5 to span a range from 0 to 100, where a higher score indicates better perceived system usability. The SUS proved to be highly reliable with an α of 0.911 [103] and to represent an overall trend [104]. Furthermore, the SUS scale has been used to evaluate TriControl in an earlier phase [9] and thus allows for better comparability and a continued system usability assessment.
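The SUS scoring step described above can be expressed as a minimal sketch, assuming the ten item scores are already coded 0 to 4 (with the negatively formulated items inverted, as done in the tables later in the paper):

```python
def sus_score(item_points):
    """Compute the total SUS score from ten item scores coded 0..4
    (answers to negatively formulated statements already inverted)."""
    if len(item_points) != 10:
        raise ValueError("SUS needs exactly ten item scores")
    if not all(0 <= p <= 4 for p in item_points):
        raise ValueError("each item score must lie in 0..4")
    # The sum of ten items (0..40) times 2.5 spans the 0..100 SUS range.
    return 2.5 * sum(item_points)
```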
For the current analysis, the SUS score should reach a sufficient value to represent a system usability of at least "ok", i.e., be at least 50.9 as investigated in the literature [104]. Thus, the formal null hypothesis on the system usability of TriControl is:

H01: x̄(SUS) < 50.9. (1)

The SUS score represents an overall score of ten recorded items [100] that is used for a point estimation. The SUS score is also analyzed with a confidence interval (α = 0.05) for an interval estimation, which investigates possible significant deviations from the critical cutoff value [79].
The feasibility was tested with a newly developed Likert-scale questionnaire based on user requirements. The self-developed assertions aimed to evaluate the single elements of the TriControl system in a systematic manner [105]. The respective scale ranged from 1 (strongly disagree) to 6 (strongly agree) and included two further items, (not important) and (not affected) [106]. The newly developed feasibility questionnaire should indicate at least a positive evaluation above the average score of 3.5 on the Likert scale [90] ranging from 1 to 6 for all items. Hence, the formal null hypothesis on the feasibility of TriControl's operational concept is, per item:

H02: x̄(item) < 3.5. (2)

The non-parametric binomial test was used for the statistical significance analysis due to the small sample size of N = 14. However, taking into account the robust binomial distribution supporting the null hypothesis, results will less likely be significant with respect to the desired direction [107]. The binomial test for each item included the n answers actually given by the ATCOs, a test ratio of 0.5, an α of 0.05, and an expected mean value of 3.5, as the answers lay within 1 to 6. The further qualitative analysis was structured content-wise to deduce recommendations for certain feasibility elements according to [108].
Additionally, verbal remarks by the study subjects on the human-machine interface during non-task-interfering times were noted, in a manner similar to the Thinking Aloud technique [109]. Furthermore, non-verbal mistakes when using the prototype were noted [110].

Results of the Feasibility Study
The questionnaire results on system usability and feasibility as well as the most important comments of the 14 ATCOs are reported in the following sections.

Score of System Usability Scale (SUS)
The average SUS of TriControl for all 14 ATCOs was 60.9 (SD = 21.9; lower and upper confidence interval limits: 48.3/73.5). Hence, the mean value indicates a system usability between "ok" (50.9) and "good" (71.7) [104]. However, the confidence interval overlapped the cutoff value, so the mean value does not significantly deviate from the null hypothesis value of 50.9. Thus, it cannot be concluded that the TriControl prototype offers a valid operational concept for ATC at the current stage. However, when reducing the sample set to the core target group of active approach (APP) ATCOs (N = 4), the results improved dramatically, i.e., the mean increased to 79.4 (SD = 9.7; lower and upper confidence interval limits: 73.8/85.0). This would indicate a system usability evaluation of TriControl between "good" and "excellent" [104], as shown in Table 1. The SUS score of 79 also equaled an older non-systematic pre-evaluation of TriControl [9]. Table 1 also lists the 10 single SUS items S01-S10, representing a similar result regarding the ratings of the active APP ATCOs. They did not perceive TriControl as cumbersome (S08) or inconsistent (S06) and would even like to use the system frequently (S01), with 3.5 points or more on the scale up to 4 points. Furthermore, the four usability statements S11-S14 on the three different input modalities and their combination were rated above the scale mean and, again, better by the active APP ATCOs.

1 Rating per single item from 0 "worst rating" to 4 "best rating", multiplied by 2.5 for the total SUS score. M represents the mean, SD the standard deviation. 2 Statement rating has been "inverted" due to negative formulation, i.e., 0.5 points in the raw data are presented as 3.5 points here to enable better comparability of all items.
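A minimal sketch of the interval estimation, assuming a Student-t confidence interval of the mean (mean ± t × SD/√N): with the two-sided critical value of 2.160 for 13 degrees of freedom, the reported full-sample interval of roughly 48.3 to 73.5 is reproduced. The critical value is hardcoded here instead of drawing it from a statistics library; whether the paper used exactly this procedure is an assumption.

```python
from math import sqrt

def mean_ci(mean, sd, n, t_crit):
    """95% confidence interval of the mean: mean ± t_crit * sd / sqrt(n).
    t_crit is the two-sided Student-t critical value for n-1 degrees of
    freedom (2.160 for df = 13), hardcoded for this sketch."""
    half_width = t_crit * sd / sqrt(n)
    return mean - half_width, mean + half_width

# All 14 ATCOs: the interval overlaps the cutoff value of 50.9,
# so the null hypothesis cannot be rejected.
lo, hi = mean_ci(60.9, 21.9, 14, t_crit=2.160)
```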

Feasibility Questionnaire Ratings
All 25 statements on the TriControl concept (T) are presented in Table A1. The 44 statements on command input in different categories (E/C/G/S/I) are shown in Table A2. Table A3 lists all 63 statements on the used prototypic radar screen (R). The tables include values for the ratings' mean, standard deviation, number of answers, and number of positive answers. They also list the p-value of the binomial test for the significance analysis, i.e., to assess whether the mean value significantly deviates from the null hypothesis value of 3.5. In roughly 85% of all 132 items in those three tables (the main exception being the majority of items in category "R1.2 Coordination"), the rating of the active APP ATCOs was equal to or better than the rating of all ATCOs. More than 55% of the active APP ATCO ratings, on average, were equal to or even above 5 points on the six-point scale. Some meaningful results per category are highlighted in the following.

Ratings on TriControl concept (T)
The active APP ATCOs rated the statements on the TriControl concept with an average of 4.8 points (on a scale from 1 to 6, see Table A1). Except for the statement on "T7.1 need of suitability for individualization", the active APP ATCO ratings were better than those of all ATCOs, i.e., almost one point higher.
ATCOs (in particular active APP ATCOs) were able to guide aircraft to their destination in an efficient way, following the common safety requirements, with TriControl (Controlling T1.1-T1.3). The TriControl interface was rated as appropriate for the intended use. ATCOs, especially with the parallel command input, felt supported to quickly and effectively achieve their best performance (Task Adequacy T2.1-T2.3). According to the average ratings, all ATCOs were aware of the TriControl command input states and knew which actions could be executed, and how, to perform their controlling tasks (Self-Descriptiveness T3.1-T3.4). They were also able to intuitively interact with TriControl, as it matched common CWP conventions (Expectation Conformity T4.1-T4.2).
Furthermore, the statements on the timing and issuing of commands were rated above the scale mean (Controllability T5.1-T5.3). Particularly, active APP ATCOs felt safe issuing commands with little extra time and mental effort in case of a mistake (Error Tolerance T6.1). Active APP ATCOs were less inclined than all ATCOs on average to want to adapt TriControl's interface to personal preferences, even though they preferred to have the settings options (Suitability for Individualization T7.1). The satisfaction, notably of the active APP ATCOs, with TriControl was good (T8.6). There were high ratings for ease of use, user-friendliness, and learnability (Satisfaction and Acceptability of TriControl T8.1-T8.8). Some even wished to use TriControl in their daily work if they had the option.
To sum up, almost all ratings were in the positive half of the scale, indicating a feasible TriControl concept, even if not statistically significant in all cases. Some reservations remained about stating that TriControl, even in its current prototypic stage, is preferred over common ATC interfaces.

Ratings on command input
Every single active APP ATCO rating on the command input statements was better than that of the group of all ATCOs, i.e., more than 0.7 points better on average on the six-point scale (see Table A2). Almost one-third of the statements even had a significantly positive rating.

Ratings on eye-tracking (E)
The eye-tracking modality worked well for aircraft selection. ATCOs perceived eye-tracking as useful, user-friendly, and easy to use and learn (Aircraft Selection E1.1-E1.2; Satisfaction and Acceptability of the Eye-Tracking Feature E2.1-E2.8). The ratings of active APP ATCOs were mostly in the positive scale range.

Ratings on clearances (C)
According to the ratings, ATCOs were able to issue each type of clearance that TriControl offers. They also knew which command state they were in and could even enter command type and value simultaneously. Almost all statements were rated statistically significantly positive (Issuing Commands C2.1-C2.9).

Ratings on gestures (G)
The multitouch gestures for inputting the command type were perceived as useful, user-friendly, and easy to use and learn (Satisfaction and Acceptability of the Gesture-based Command Type Input G2.1-G2.8) and could, according to the ratings, be an option for ATCOs' daily-life CWPs.

Ratings on speech recognition (S)
On average, ATCOs were also satisfied with the automatic speech recognition modality for command value input; this was especially true for the active APP ATCOs (Satisfaction and Acceptability of the Speech-Recognition-based Command Value Input S2.1-S2.9). Speech recognition was rated as useful, user-friendly, and easy to use and learn. Furthermore, the majority had no problems verbalizing only the value instead of a whole command.

Ratings on input procedure (I)
The ratings on the complete command input are very similar to those of the single input modalities (Satisfaction and Acceptability of the Complete Command Input Procedure I2.1-I2.8). TriControl received positive ratings for usefulness, user-friendliness, and ease of use and learning. In particular, active APP ATCOs were satisfied with the eye-tracking, multitouch gesture recognition, speech recognition, and command confirmation elements of TriControl.
If all ATCOs are considered, the ratings on daily work use of TriControl and on its preference over conventional ATC interfaces are only around the scale mean. It is noticeable that all statements on the effectiveness of the single interaction modes and of TriControl as a compound (E/G/S/I.2.2 and T8.2) were rated rather negatively, below the scale mean of 3.5. As TriControl would replace or support an APP CWP, it is also not expected to be more effective: controlling air traffic via commands is still possible and remains the valid method to actively guide the traffic. The term effectiveness says nothing about the efficiency of TriControl; the potential for efficiency gains, comparing TriControl with pure speech commands followed by manual ATC system input, has been reported in [10].

Ratings on radar screen (R)
The majority of ATCOs' ratings on the radar screen used for TriControl were in the positive scale range and even statistically significantly positive (aircraft within and aircraft heading to ATCOs' sector, orientation aids, centerline separation range (CSR), information design, as well as satisfaction and acceptability R1.1.1-R6.8). Active APP ATCOs had some difficulties obtaining weight classes and alphanumeric distances at the linker line between two aircraft, as the appearance differed from their usual radar screen. The basic radar screen appearance should represent a common, state-of-the-art radar display. This was confirmed by ATCOs, as it was usable for monitoring, user-friendly in design, and easy to learn (R6.3-R6.5). The active APP ATCO ratings on the radar screen (see Table A3) were slightly better than those of all ATCOs, i.e., more than two-thirds of the statements had better scores. Moreover, more than 60% of the statements also had a significantly positive rating among all ATCOs.
For the TriControl concept itself, it was important that ATCOs positively rated the statement on the discriminability between the different command states within the aircraft label (inactive, active, received, confirmed) (T3.2).

Feasibility Questionnaire Comments of all ATCOs
On the one hand, there are hints for improvements. Some ATCOs recommended that speech recognition should recognize multiple accents better, as it did not work reliably for some ATCOs during the simulation trials. One ATCO perceived the foot pedal for speech recording as not helpful; a button at the headset microphone would be preferred. The technical issue with the unreliable confirmation gesture recognition should of course be solved. One ATCO disliked the two-finger selection for separation assessment and noted that the left hand is completely unused. Moreover, the additional graphical user interface mode of the multitouch device offers too few functionalities to deserve its own category.
A number of ATCOs reported that the eye-tracking reacted too slowly for them. Furthermore, aircraft were deselected when the ATCO looked away during command issue phases. The precision of the mouse was rated better than that of eye-tracking, especially for vertically separated aircraft with partially overlapping labels. The label itself also seems to present too much information.
Furthermore, many requests for additional functionalities were formulated, such as the option to input combined and conditional clearances, to enter vertical speeds, to differentiate between left and right turns, to also see the pilot statements in the aircraft radar labels, and a possibility for rotating or moving labels. The labels themselves can be analyzed separately, as there are other dedicated studies and developments of labels and label interactivity; the main focus of this paper is on the multimodality.
Some further comments were made with respect to the simulation capabilities and the radar screen layout, which were not central aspects of the feasibility study. For instance, ATCOs wanted a better trail history and heading needles that are not overlapped by radar labels. Some aircraft did not follow instructions, and the descent rates were too slow.
Some ATCOs wanted better highlighting of heavy aircraft and information about whether an aircraft is on their frequency. Furthermore, a traffic forecast for the next ten minutes would be helpful. For some ATCOs, it was unfamiliar to work with a dark-background radar map and without minimum vector altitude and airspace maps.
One ATCO remarked that the focus would change from "watching the traffic" to "watching if the system complies"; thus, the system feels like an extra step. In addition, there would be great potential for confusion and errors due to convoluted features. Command issuing via voice was perceived as easier by some ATCOs because it allows better situational awareness. In addition to a safety assessment, a fallback redundancy would be necessary during further system development.
On the other hand, there are many aspects that ATCOs liked. Some ATCOs would prefer TriControl over mouse and screen input. Another ATCO still liked the Paperless Strip System (PSS) better; however, its mouse menu in the labels was perceived as worse than in TriControl. Other ATCOs remarked that using TriControl is fun, that it is easy to learn, and that they liked the system. The idea of combining three input methods was appreciated. One ATCO experienced no problems at all. For another, the speech recognition worked fantastically. The eye-tracking input was interesting and, for a number of ATCOs, worked well after short practice; hence, it has potential with more development. According to other ATCOs, if the system input was successful, the response is much quicker than with common systems, saving time overall. It could also lead to fewer misunderstandings. Further thoughts concerned its helpfulness in ATC training: an on-the-job-training instructor, who teaches ATCOs to be instructors, noted that TriControl would be a good system for easily seeing what the controller is thinking and doing. The centerline separation range support functionality was especially appreciated, as it was helpful without requiring deduction. It was also reported that the plausibility check of command elements is a good idea and better than the solutions of competitors. Many ATCOs reported that their performance improved with practice and would improve further, so TriControl would be a good aid to ATCOs. They also encouraged following up on the project.

Discussion of TriControl Feasibility
The system usability of TriControl, as rated by all ATCOs, was in a range up to a "good" result. This system usability score increased to a range up to "excellent" when considering only the active APP ATCOs. However, these values should be interpreted with caution due to the small sample size of only four active APP ATCOs; the difference can also indicate a bias with respect to the working positions the ATCOs usually work at. The system usability results were reflected in the same range by the single system usability statements and the additional items for the specific interaction modalities.
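The adjective bands "good" and "excellent" mentioned above are commonly attached to System Usability Scale (SUS) scores. Assuming the study's usability assessment followed the standard 10-item SUS (an assumption; the paper excerpt does not name the instrument), a score is computed as sketched below. The response values are invented example data.

```python
# Sketch of standard SUS scoring (assumption: the study used the
# conventional 10-item SUS with five-point response options).
# Odd-numbered items are positively worded, even-numbered negatively.

def sus_score(responses):
    """responses: ten answers on a 1..5 scale, item 1 first.
    Odd items contribute (response - 1), even items (5 - response);
    the 0..40 sum is rescaled to 0..100."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


# One hypothetical participant's answer sheet
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```

Under common benchmark interpretations, scores around 72 and above correspond to "good" and scores around 85 and above to "excellent", which is the sense in which the subgroup difference above is reported.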
The 132 feasibility analysis statements on the TriControl concept, command input, and radar screen were rated slightly positive, while active APP ATCOs again agreed far more positively on the majority of items. In particular, ATCOs appreciated the user-friendliness, usefulness, and ease of use and learning.
It is worth mentioning that there were great differences in the TriControl concept ratings depending on some personal and technical abilities of ATCOs during the simulation run. The TriControl concept was rated much better by those ATCOs who:
• Were able to perform parallel input with different modalities,
• Hardly experienced any malfunction with the multitouch pad correspondence,
• Did not forget to perform the confirmation gesture after command completion,
• Did not perform wrong gestures,
• Did not experience troubles with eye-tracking,
• Experienced more reliable speech recognition,
• Did not make other interaction mistakes, such as:
o Pressing too long for confirmation, thus turning it into a direct_to command,
o Forgetting to toggle back from the multitouch device's graphical user interface mode,
o Pressing the foot pedal for voice recording during complete command creation.
All of the above criteria fit the active APP ATCOs. However, it is not completely clear why the four active APP ATCOs performed much better and almost error-free compared to the other 10 ATCOs, even though TriControl was designed as an APP CWP. The average age might be an indicator: the four active APP ATCOs were 37 years old on average, the other ATCOs 52. Assuming that younger people are more familiar with modern interaction technologies from their daily life, this could explain the better ratings of active APP ATCOs. Furthermore, the simulation run time of 30 min might have been too short for ATCOs usually working at other positions to familiarize themselves with the APP environment in addition to the new input modalities.

Summary, Conclusions, and Outlook
The feasibility of the multimodal CWP prototype TriControl, integrating eye-tracking, multitouch gestures, and speech recognition for command input, has been analyzed with 14 ATCOs in a human-in-the-loop study. Feasibility, system usability, and acceptability were judged slightly positive. According to the statistical analysis of the questionnaire results, the subgroup of active approach controllers agreed even more positively. They also encouraged further improvement of the TriControl system to bring it closer to operational needs, as the achieved feasibility scores do not indicate significant showstoppers.
The SESAR2020 (Single European Sky ATM Research Programme) project PJ.16-04 CWP HMI "Workstation, Controller productivity" also dealt with automatic speech recognition, multitouch inputs, and eye-tracking in three different activities. However, those interaction technologies were not combined but investigated stand-alone. Further research activities on the three interaction technologies will be continued in SESAR2020's wave 2 projects PJ.10-96 "HMI Interaction modes for ATC centre" and PJ.05-97 "HMI Interaction modes for Airport Tower". Hence, there is and will be research on modern interaction technologies in air traffic control. However, TriControl is one of the few concepts integrating multiple promising technologies to extract the benefits of each of them.
Recent iterations of ATC system development in general, and interface developments in particular, have resulted in significant efficiency gains in the ability to process increased traffic levels; real-world traffic, however, soon rises to reach the new system limitations. A significant limitation in all further developments seems to be the "bottleneck" of frequency congestion: the process of getting clearances clearly and safely transmitted from the ATCO to aircraft and checking pilot readbacks for correctness is time consuming. The TriControl system seeks to address this and could potentially remove this existing bottleneck, allowing a greatly improved capacity increase.
Based on the above study results and on the already revealed potential for increased efficiency, further development of the early prototype TriControl will be performed to overcome the revealed malfunctions and integrate the many suggestions for improvement. Afterwards, TriControl should be applied in different contexts, also comprising non-nominal conditions such as weather influence, high-density air traffic, and emergency aircraft. Then, TriControl's operational concept can be compared with current systems, including a cognitive workload assessment. Overall, the feasibility analysis is a motivation to foster multimodal interaction for air traffic control.

Patents
TriControl can serve as an input means and usable environment for the command generator of European patent application 17158692.8.

Acknowledgments:
We would like to thank all ATCOs who participated in the human-in-the-loop study with TriControl. We are also grateful for the support of Dr. Konrad Hagemann (DFS Planning and Innovation) in preparing the simulation trials in Langen. Prof. Dr. Sebastian Pannasch (Technische Universität Dresden) provided valuable input during initial reviews and for the master thesis contents of Axel Schmugler regarding the feasibility analysis. Thanks also to Malte Jauer (DLR) for assisting during the study and for performing earlier implementations.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.