1. Introduction
At present, the introduction of Remotely Piloted Aircraft Systems (RPAS) into controlled airspace, and thus the shift from manned to unmanned aviation, is underway. However, a seminal, cross-platform review of a large number of incidents and accidents from 2004 [1] found that RPAS tend to have higher accident rates than conventional aviation. Since this study, no comprehensive update at a comparable scale has been published. Design deficiencies in the Ground Control Stations (GCS), the means by which the operator or remote pilot may control, supervise and monitor the RPAS, were identified as a major factor contributing to RPAS accidents and incidents [2,3,4,5,6]. Example design deficiencies include pop-up windows obscuring safety-critical information, the proliferation of display screens, complicated menu structures, the reliance on text displays, multifunctional controls and displays, inadequate feedback on the system settings and state, as well as overloading the remote pilots with raw data [4,5,6]. These design deficiencies were found to lead to confusion about the system’s behavior, automation surprises, as well as attentional and visual scanning failures [6]. In particular, attentional and visual scanning failures can in turn lead to remote pilots perceiving safety-critical information and warnings too late, and therefore increase the probability of human error. In addition, the lack of a standardized approach to display design has led to an increasingly diverse range of GCS display examples, which tend to be composed of multiple screens with different orientations and adaptable configurations.
Future displays in GCSs should ensure that the remote pilot can effortlessly perceive the current states of safety-critical aircraft systems. Numerous studies have highlighted the advantages of icons over written text in terms of recognition [7], cognitive processing [8], error rates [9], and learning and performance [10]. Many studies have also shown an increase in human performance when icons are used to assist with understanding the associated text [11]. In addition, icon-based interfaces have been shown to be effective in supporting good interaction between the remote pilot and the RPAS [12], and to support the remote pilot in quickly perceiving safety-critical aircraft system states [13,14]. Examples of icon-based interfaces for RPAS are described by Friedrich and Lieb [13] and Friedrich and Vollrath [14]. The authors present and evaluate displays for a GCS (called U-FLY) that would enable a single remote pilot to control and supervise (via supervisory control [15]) multiple highly automated RPAS [13] or small unmanned aircraft systems [14].
Figure 1 depicts the U-FLY GCS during an observation mission (including a simulation of an electro-optical camera and a Lidar sensor payload [16]) with three RPAS.
Instead of relying on the text-based presentation of safety-critical information and warnings, the displays utilize hue-changing icons to make the remote pilot aware of the states of safety-critical system functions at a glance. The goal of using function-specific icons that change hue in accordance with the current system state is to enable the remote pilot to quickly perceive a critical situation without having to read and interpret textual indications, which would add workload. As such, the use of unambiguous icons aims to keep the demands on cognitive resources (particularly mental workload) at acceptable levels, especially in situations in which workload is already high or vigilance is low, and the remote pilot is experiencing pressure or fatigue. It is, however, extremely important that the icons are unambiguous and easy to understand. Otherwise, an icon has the opposite effect and increases workload, because the pilot does not understand it and has to try to interpret its meaning.
In the icon display (see left side of Figure 1 and Figure 2), each column represents a different RPAS, and each row a safety-critical system function. If the user (or remote pilot) hovers the mouse over an icon, a small text field appears, showing the meaning of the icon (e.g., HYD for hydraulics or ALT for altitude). This feature was implemented to avoid potential confusion in case an icon is ambiguous for the user (which should be avoided). To indicate different system states, the icons change their hue from grey (indicating a non-critical, nominal system state) to cyan for an expected change (e.g., intended heading change), yellow for caution states (e.g., low fuel) or red for warning states (e.g., loss of hydraulic pressure). The system functions represented by the icons were derived through a work domain analysis of an Airbus A320 aircraft [17], and thus represent safety-critical functions related to the operation of an aircraft (e.g., altitude control or hydraulic supply). In a study by Friedrich and Lieb [13], pilots were able to quickly identify critical system failures and subsequent secondary failures (such as a hydraulic pump failure leading to the inability to operate the flaps and the landing gear; cf. Figure 2).
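The state-to-hue scheme and the column/row layout described above can be sketched in a few lines of Python. This is only an illustration of the mapping as stated in the text; the names `SystemState`, `icon_hue` and `hue_grid`, as well as the example states, are assumptions made for this sketch and are not taken from the U-FLY implementation.

```python
from enum import Enum


class SystemState(Enum):
    """System states named in the text (identifiers are hypothetical)."""
    NOMINAL = "nominal"                  # non-critical, nominal state
    EXPECTED_CHANGE = "expected_change"  # e.g., intended heading change
    CAUTION = "caution"                  # e.g., low fuel
    WARNING = "warning"                  # e.g., loss of hydraulic pressure


# Hue assigned to each state, as described in the text.
HUE_BY_STATE = {
    SystemState.NOMINAL: "grey",
    SystemState.EXPECTED_CHANGE: "cyan",
    SystemState.CAUTION: "yellow",
    SystemState.WARNING: "red",
}


def icon_hue(state: SystemState) -> str:
    """Return the hue of an icon for a given system state."""
    return HUE_BY_STATE[state]


def hue_grid(states_by_function: dict) -> dict:
    """Map each row (system function) to one hue per column (RPAS)."""
    return {
        function: [icon_hue(state) for state in per_rpas]
        for function, per_rpas in states_by_function.items()
    }


if __name__ == "__main__":
    # Hypothetical snapshot for three RPAS (one list entry per aircraft).
    snapshot = {
        "HYD": [SystemState.NOMINAL, SystemState.WARNING, SystemState.NOMINAL],
        "ALT": [SystemState.EXPECTED_CHANGE, SystemState.NOMINAL, SystemState.CAUTION],
    }
    for function, hues in hue_grid(snapshot).items():
        print(function, hues)
```

Keeping the mapping in a single table means that every icon in the grid follows the same color convention, which is the property the display relies on for at-a-glance perception.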
2. Icon Design Considerations
The design of intuitive icons that accurately represent their intended meaning is challenging. While icons may allow for the effective transmission of information, their meaning can be ambiguous if they are not chosen appropriately [18,19,20,21]. This is of significant concern in safety-critical domains, where icons must transmit their intended meaning accurately. Therefore, it is important that icons are designed and used according to empirically established icon design principles. Numerous studies have investigated icon characteristics that were found to contribute significantly and positively to the performance of an icon. The performance of an icon can be measured by the so-called icon-function fit, which describes how well an icon intuitively represents a specified function [18,19,20,21,22].
The most commonly reported icon characteristics are concreteness, complexity, familiarity, meaningfulness and semantic distance, all of which are obtained by subjective ratings. Concreteness describes how well an icon depicts the real-world object it intends to represent. Sometimes more detail within an icon can assist in conveying the icon’s meaning. However, an icon that is too complex cannot be perceived as quickly. Thus, it is important to also measure the complexity of the icon. The remaining three icon characteristics relate to how familiar the icon is in terms of how often it is encountered in everyday life (familiarity), how much meaning the icon conveys (meaningfulness) and how close the icon is perceived in relation to its meaning, i.e., what the icon is intended to represent (semantic distance). Measures for these icon characteristics are obtained by asking participants to rate an icon on a Likert scale from one to five.
Table 1 describes each of the five icon characteristics in detail.
In addition to the icon characteristics that are measured via subjective ratings, various studies also investigated objective metrics for measuring icon-function fit. The most common objective icon-function fit metrics are name agreement and concept agreement [18,21]. These are measures for icon ambiguity and are obtained by asking participants to guess the meaning of each icon. Concept agreement is the percentage of participants who provided the actual intended meaning of the icon. Name agreement is commonly defined as the percentage of participants who gave the same, most common name for an icon or a picture [22,23]. Sometimes, participants attribute different names to an icon. In these instances, we can calculate an agreement measure which, in addition to name agreement, considers the number of alternative meanings that were provided for an icon. This agreement measure is the information statistic H [24,25].
H is defined as:
$$H = \sum_{i=1}^{k} p_i \log_2\left(\frac{1}{p_i}\right)$$
In the equation, k refers to the number of different meanings and p_i is the proportion of participants that provided the respective meaning. An icon for which only one meaning was given by all participants has an H value of 0. When two different meanings were given with an equal frequency, the icon has an H value of 1. An increasing H value would thus indicate a decrease in name agreement. As H accounts for the number of alternative meanings, it is considered a better indicator for name agreement than the name agreement metric alone [25].
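As a minimal sketch of how the two name-related measures behave, the following Python snippet computes name agreement and the information statistic H, taken here as the sum over all given meanings of p_i · log2(1/p_i), which reproduces the values of 0 and 1 stated above. The function names and all response data are hypothetical and serve illustration only; they are not data from this study.

```python
from collections import Counter
from math import log2


def name_agreement(responses):
    """Percentage of participants who gave the most common name."""
    counts = Counter(responses)
    most_common_count = counts.most_common(1)[0][1]
    return 100.0 * most_common_count / len(responses)


def information_statistic_h(responses):
    """H = sum of p_i * log2(1 / p_i) over the k different meanings given."""
    counts = Counter(responses)
    n = len(responses)
    # n / c equals 1 / p_i for a meaning given by c of n participants.
    return sum((c / n) * log2(n / c) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical responses of ten participants to two different icons.
    # In both cases six participants gave the same, most common name.
    one_alternative = ["gear"] * 6 + ["wheel"] * 4
    four_alternatives = ["gear"] * 6 + ["wheel", "tire", "brake", "strut"]

    for label, responses in [("one alternative", one_alternative),
                             ("four alternatives", four_alternatives)]:
        print(label,
              "name agreement:", name_agreement(responses),
              "H:", round(information_statistic_h(responses), 3))
```

Both hypothetical icons have the same name agreement, but H is larger for the icon with four alternative names, which illustrates why H is considered the more sensitive indicator.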
Several studies have reported interdependencies between the icon characteristics and agreement metrics. McDougall et al. [18] found correlations between concreteness and meaningfulness (r = 0.82), as well as concreteness and semantic distance (r = 0.65). McDougall et al. [26,27] further suggested that concreteness was the best predictor for the successful transmission of the intended meaning, especially in situations that required rapid understanding of an icon, and also when icons were used infrequently. This would be especially relevant during the first exposures to the icons, with its effect diminishing with progressive icon exposure [26,27,28,29]. In addition, strong associations have been found between the icon characteristics of familiarity and meaningfulness (r = 0.93), as well as meaningfulness and name agreement (r = 0.86) [18]. In contrast to concreteness, the effect of icon familiarity as a predictor of ease of icon identification does not diminish with increasing experience with the icon [29,30]. Furthermore, McDougall et al. and Smythwood and Hadzikadic [31,32] reported that icon complexity negatively affected search efficacy, which would suggest that simple icons are detected faster than more detailed, complex icons.
Taking these previous findings into account, we can begin to suggest what represents a “good” icon. The icon should resemble the real-world object that is being represented as much as possible (concreteness), entail familiar features (familiarity) and be kept simple (complexity). Addressing these icon characteristics should then positively impact semantic distance, meaningfulness and name agreement (H). As a consequence, the icon will be less ambiguous, more distinct, interpretable, and easier to detect, thus enhancing the icon-function fit.
3. Icon Design Standards and Norms
Several design standards exist for aircraft display symbology, such as the U.S. Air Force (USAF) MIL-STD-1787D [33] and the Radio Technical Commission for Aeronautics (RTCA) standards DO-257A [34] and DO-365 [35]. These icon design standards define symbology requirements, describe fundamental relationships between symbol motion and aircraft system states (MIL-STD-1787D), and prescribe the shapes and colors of icons used to transmit specific meanings related to detect and avoid functionalities (DO-365) and for the depiction of navigation information (DO-257A). The standards thus tend to focus on the symbology used in displays for a specific purpose (such as a navigation or a detect and avoid display), but do not cover symbology used for representing broader safety-critical aircraft functions, such as hydraulic and electric supply or engine thrust. However, for the remote supervisory control of (multiple) RPAS, it is necessary to have a holistic overview of the states of safety-critical functions in order to always have an adequate understanding of the current state of the systems. The aforementioned standards are therefore not applicable to the design of icons representing safety-critical aircraft functions, such as the icons used in the U-FLY GCS [13].
Furthermore, no study so far has reported norms for icon characteristic and agreement metrics to measure icon-function fit (such as concreteness, semantic distance or name agreement) for aircraft cockpit symbology-related icons. However, for a quantitative icon-function fit evaluation, these norms would be essential. In order to evaluate how well a newly designed icon represents the intended function, its characteristic ratings and agreement measures would need to be compared against previously established norms for icons of the same domain (i.e., aircraft cockpit symbology).
Outside the aviation domain, different studies report norms for icon characteristic and agreement metrics for standard icon sets, emojis and emoticons [18,22,23,25,36,37,38,39,40,41]. However, Souza et al. [42] found that normative studies are still quite sparse. Most studies examining icon characteristic and agreement metrics have tended to focus on icon sets in general, and not on a specific domain. However, the context in which a given icon set is used is key when considering how the user interacts with the icons. This is especially pertinent when we examine safety-critical applications, where time and error are significant factors. Monitoring and controlling a single (or multiple) RPAS is one of these safety-critical applications where multiple information sources need to be presented to the remote pilot. The information that needs to be conveyed to the remote pilot relates not only to the asset(s) they are in command of, but also to the associated sub-systems that may very well impact the safety and efficiency of the mission. This, along with considerations of varying flight conditions, presents a significant amount of information for a single person to process and act upon. Even with the introduction of automated (or autonomous) support, there is still the need to present the human with the appropriate amount of information. Through the use of icons, the information can be presented in a manageable way, provided that the icons convey their meaning effectively.
4. Icon Design
Since none of the above-mentioned icon design standards and norms cover the required icons for representing safety-critical aircraft system functions, new icons were designed and evaluated. Due to the lack of appropriate norms for aircraft display symbology, it is not possible to evaluate newly designed icons by comparing their characteristic ratings and agreement scores to previously established norms.
In order to ensure good icon-function fit of the newly designed icons, a design that embraced an understanding of the introduced icon characteristics was adopted. The design especially considered concreteness, familiarity and complexity. Since the icons were designed to be used by pilots who would operate a GCS such as the U-FLY, an online evaluation study with pilots was conducted in order to gather their views on the icons. Alongside the subjective ratings on concreteness, familiarity, complexity, meaningfulness and semantic distance, objective measures on concept and name agreement were computed for each icon. The results of the analysis serve as a first step towards establishing norms for icon characteristic and agreement metrics for icons representing safety-critical aircraft systems.
7. Discussion
7.1. Discussion of Results
This study designed and evaluated eighteen icons that represent safety-critical aircraft system functions. The icons were designed using established icon design principles. The design specifically took the icon characteristics of familiarity, complexity and concreteness into account, whilst also using these characteristics to reinforce the individual’s likelihood of relating to the icons, particularly in terms of icon-function fit. Previous research has suggested that by strictly adhering to these icon characteristics in the design, the resulting icon set would be less ambiguous, more distinct, more interpretable and easier to detect [21,27,32]. All icons in this study were evaluated via an online study, in which pilots guessed the meanings of the icons and rated each icon against the different icon characteristics.
The best results in terms of name agreement H and concept agreement were observed for the electrics, landing gear and lights icons. For these icons, the agreement scores were the highest, and the differences between the name agreement and concept agreement scores were fairly low. Additionally, the icon characteristic ratings for these icons were among the highest in the icon set. The worst results for name agreement H were observed for the collision avoidance, hydraulics and flight controls icons, whereas for concept agreement the worst icons were collision avoidance, radio communications, engine secondary and ADIR. While all icons should transmit their intended meaning unambiguously, the severity of the consequences of misinterpreting an icon depends on the criticality of the function that the icon represents. For example, misinterpreting the radio communications or hydraulics icon poses a lower threat to safety than misinterpreting the collision avoidance, ADIR or flight control icons, particularly when they are used to capture the pilot’s attention in an emergency situation. As such, the collision avoidance, ADIR and flight controls icons in particular need to be redesigned. Further, icon characteristic norms may need to be adjusted in accordance with the safety threat that a misunderstood icon could pose. This means that, particularly for safety-critical icons, such as collision avoidance or fire warnings, the norms should be stricter than for icons representing less safety-critical functions.
The results also show that emphasizing concreteness and familiarity in the design of the icons worked well, as can be seen, for example, in the results for the gear and altitude icons. The findings further show that with increasing concreteness and familiarity, concept agreement and name agreement (H) improve (cf. Figure 5). However, the findings also suggest that emphasizing familiarity only increased icon-function fit as long as the familiarity was directed at aviation-related artifacts. One should not use domain-spanning shapes to increase familiarity, as the results for the ADIR, radio communication and engine secondary icons show. These icons contain shapes familiar from other domains, such as photography (ADIR icon) or the IT domain (radio communication icon). As a consequence, the pilots mostly agreed on an unintended meaning, leading to high name but very low concept agreement scores. Additionally, one should not apply a part–whole design, as the use of the shape of a hydraulic piston for representing the hydraulic systems or the use of a yoke for representing the flight controls demonstrates. The agreement scores for these icons are extremely low. The findings suggest that the design approach of showing only a part of the intended system function led to confusion with regard to the icon’s intended meaning. It is worth noting that despite the poor results for the hydraulics icon, the pilots in the study by Friedrich and Lieb [13] were able to quickly identify a hydraulic failure through the hue change of the icon, which is consistent with the finding that the effect of icon concreteness diminishes with progressive icon exposure [29]. The results further show that for some icons, the meaning was correctly recognized, but was named differently. This is, for example, true for the altitude icon, for which various different yet correct meanings were given, such as height, altitude above reference or altitude or height which has to be flown. Since each of the given meanings is counted as a different name, name agreement decreases, although most given meanings represent the same concept: altitude. This example shows that when it comes to using icons to transmit (safety-critical) information, concept agreement seems to be more informative than name agreement or H.
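The altitude example can be made concrete with a small Python sketch. It assumes that a rater has coded which of the given names count as the intended concept; the response list and the set of accepted names below are hypothetical and are not the data collected in this study.

```python
from collections import Counter


def name_agreement(responses):
    """Percentage of participants who gave the most common name."""
    counts = Counter(responses)
    return 100.0 * counts.most_common(1)[0][1] / len(responses)


def concept_agreement(responses, accepted_names):
    """Percentage of participants whose answer matches the intended concept.

    `accepted_names` is the rater's coding of which names express the
    intended meaning (an assumption of this sketch).
    """
    hits = sum(1 for response in responses if response in accepted_names)
    return 100.0 * hits / len(responses)


if __name__ == "__main__":
    # Hypothetical answers of ten pilots for an altitude icon.
    responses = (["altitude"] * 4
                 + ["height"] * 3
                 + ["altitude above reference"] * 2
                 + ["speed"])
    # Names the rater accepts as expressing the concept "altitude".
    accepted = {"altitude", "height", "altitude above reference"}

    print("name agreement:", name_agreement(responses))
    print("concept agreement:", concept_agreement(responses, accepted))
```

In this constructed case, fewer than half of the participants share the most common name, while nearly all of them identified the intended concept, which mirrors the pattern described for the altitude icon.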
7.2. Establishing Icon Characteristic Norms for Aviation
As mentioned before, the currently existing symbology standards in aviation focus on icons used in displays for specific purposes, such as detect and avoid. They do not cover the symbology requirements for representing safety-critical aircraft functions, such as hydraulic and electric supply or engine thrust. However, icons representing safety-critical aircraft functions are needed in order to provide a holistic overview of the states of safety-critical functions of the RPAS through an icon-based display.
Furthermore, no norms for the icon characteristics and agreement scores of icons representing safety-critical functions of aircraft exist. Consequently, it is currently not possible to evaluate newly designed icons representing safety-critical aircraft system functions by comparing their icon characteristic ratings and agreement scores to previously established norms. For standard icons, many studies have already proposed icon characteristic and agreement norms [18,25,38,41]. However, these studies used a wide array of different icons/picture stimuli, with no research being conducted on aerospace-specific icons that would establish norms for icons used in this domain. As such, there is a need to develop standardized norms of icon characteristics and agreement metrics for the aviation domain, against which newly designed icons can be compared. The results obtained in this study can serve as a first step in establishing such norms.
Further, various studies can be found in the literature that investigate the use of icons on the flight deck, for example, on head-up displays [44,45,46]. The icons used in these studies, as well as the icons used on modern flight decks, prescribed for example by DO-257A and DO-365 [34,35], could be used as icon sets to establish these norms.
7.3. Limitations and Next Steps
This study has three main limitations. First, only one pilot was consulted in the design process, which may have resulted in some less adequate icons in the icon set. This could have been avoided by asking for direct feedback from several other pilots. However, it was also somewhat intentional to have non-optimal designs in the icon set to create a greater variance in the ratings. Second, the discussed lack of appropriate norms to which the icons could be compared limits the significance of the results. Third, the relatively small sample sizes of 29 pilots and 18 icons limit the explanatory power of the findings. However, the adequacy of the sample size should be interpreted in terms of the size of the target population. Since the population of ATPL pilots tends to be small compared to the general population, one could argue that a sample size of 29 is adequate to represent the population of ATPL pilots. The same argument may be applied to the sample size of the icons. The number of plausible icons that could be used to represent the intended functions is likely rather small, justifying the small sample size.
The icon set can now be used by other researchers, and potentially be validated further. To evaluate the icons further, additional performance data could be gathered within the operational environment in simulation studies, such as the study by Friedrich and Lieb [13]. It would also be worth investigating whether the evaluation results differ between ATPL pilots and pilots of large RPAS, such as the Global Hawk [47].
7.4. Implications
The findings of this study may well serve as a first step towards defining norms for icon characteristic and agreement metrics. They also provide relevant information for the design of icons representing safety-critical aircraft system functions, and reveal two central aspects that need to be taken into consideration for icon design. The use of shapes that remind the pilot of icons familiar from non-aviation domains may lead to confusion, and should thus be avoided. Further, reflecting only a part of an entity as a means to represent the whole entity (e.g., a piston to represent the hydraulic system) partially leads to a false understanding of the icon. This design approach should therefore be used with great caution.
Indeed, the icon ratings from this study can be used as a stepping stone for future developments in designing specific icon sets for remote pilots. Designing icons that are composed of good characteristics is only a single step within the design process. The icon set would then need to be considered against the complexity of how the user interacts with the system being designed, and the level of decision support (via automation/autonomy) that is provided. Adopting a cognitive engineering approach that understands how the individual would process icon information within the wider operational context would facilitate the wider design of the ground station. For example, an icon could be presented as an important “anchor point” from which the user may choose to interrogate or expand beyond the abstraction that can sometimes be represented via a single icon [48].
8. Conclusions
In summary, this study demonstrates how established principles can be applied to design and evaluate icons representing safety-critical aircraft system functions for which no standards or norms currently exist. The findings from the literature on icon design could be confirmed: emphasizing concreteness and familiarity in the design of the icons led to less ambiguous and more meaningful icons. In addition, some further icon design guidelines emerged. First, the use of domain-spanning shapes to increase familiarity should be avoided. Second, showing only a part of the intended system in order to represent the entire function should also be avoided. Third, the findings suggest that for evaluating safety-critical icons, concept agreement seems to be a more informative measure than name agreement or H. Another important aspect that the findings revealed is that, for the creation of icon characteristic standards and norms, the norms should potentially be stricter for safety-critical icons, such as collision avoidance or fire warnings, than for icons representing less safety-critical functions.
The findings of this study may well serve as a first step towards defining norms for icon characteristic and agreement metrics, and they provide relevant information for the design of icons representing safety-critical aircraft system functions. The icon set can now be used by other researchers, and be further validated and developed.