Standardized Test Procedure for External Human–Machine Interfaces of Automated Vehicles

Abstract: Research on external human–machine interfaces (eHMIs) has recently become a major area of interest in the field of human factors research on automated driving. The broad variety of methodological approaches renders the current state of research inconclusive and comparisons between interface designs impossible. To date, there are no standardized test procedures to evaluate and compare different design variants of eHMIs with each other and with interactions without eHMIs. This article presents a standardized test procedure that enables the effective usability evaluation of eHMI design solutions. First, the test procedure provides a methodological approach to deduce relevant use cases for the evaluation of an eHMI. In addition, we define specific usability requirements that must be fulfilled by an eHMI to be effective, efficient, and satisfying. To prove whether an eHMI meets the defined requirements, we have developed a test protocol for the empirical evaluation of an eHMI with a participant study. The article elucidates underlying considerations and details of the test protocol that serves as framework to measure the behavior and subjective evaluations of non-automated road users when interacting with automated vehicles in an experimental setting. The standardized test procedure provides a useful framework for researchers and practitioners.


Introduction
With the introduction of automated vehicles into mixed traffic environments, drivers may be (temporarily) allowed to engage in non-driving-related tasks while driving. As a consequence, the drivers of automated vehicles will often be unavailable for communication while their vehicle is interacting with non-automated road users. To face this change and to ensure safe interactions, there is a broad acceptance among practitioners and researchers that in some situations, automated vehicles may need to replace the informal communication of human drivers (such as gestures and eye contact) with external human-machine interfaces (eHMIs) [1,2]. Currently, eHMI systems represent a completely new and immature technology. Before introducing such a new technological system to the market and to the traffic environment, it is important to carefully determine its usability.
Since 2017, a large body of research has been investigating the impact of different eHMI approaches on the subjective evaluations and behavior of non-automated road users. Previously studied eHMI approaches differed with regard to the content of communication (e.g., maneuver intention, automation status, and request for action) [3] and concrete interface design solutions (e.g., the position and modality of the signal) [4–8]. Results are inconclusive regarding the benefit of using an eHMI to signal maneuver intentions of automated vehicles. In some studies, communicating the maneuver intention of the automated vehicle increased the subjective ratings of interaction partners in comparison to interactions without an eHMI [9–12]. In other studies, such eHMI concepts did not have any impact on pedestrians' perceived trust and safety [1] or even had a negative effect on pedestrians' workload during interactions with automated vehicles [13]. Moreover, it is still unclear whether communicating the vehicle's automation status with eHMI systems improves the subjective experiences of interaction partners. On the one hand, eHMI systems that signaled the automation status with light-emitting diode (LED) strips had a positive effect on pedestrians' emotional experience [11] and perceived safety [12] compared to interactions without an eHMI. On the other hand, other studies did not reveal an impact of communicating the automation status on pedestrians' perceived stress [14] and perceived safety [1], or on cyclists' reported behavior [15]. Furthermore, previous studies have offered contradictory findings on the effect of eHMI signals on the behavioral decisions of non-automated road users. eHMI concepts that communicated the vehicle's intention to stop [10] or gave a concrete request for action ("Walk!" or "Ok") [16] increased pedestrians' willingness to cross the road in a shared space compared to interactions without an eHMI.
In addition, two studies found that pedestrians needed less time to make their decision to cross or not cross the road with than without an eHMI [6,17]. However, the results of [14,18] revealed that pedestrians focused to a higher degree on vehicle speed and distance to the vehicle when making crossing decisions than on eHMI signals. Deb et al. [19] found that a verbal warning saying "safe to cross" shortened the time pedestrians needed to cross the street compared to no eHMI, while different visual eHMI concepts had no effect on crossing time.
Overall, although extensive human factors research has been carried out on eHMIs, a systematic understanding of the usability of different eHMI concepts is still lacking. Previous research has used very different methodological approaches and has had methodological limitations. Methodological limitations include a lack of behavioral measurements [8,9], small sample sizes [12,13], and vague result reports [4,20]. In some studies, participants evaluated the eHMI after they had received a thorough briefing and explanation of the signal meanings [9,11,12]. In other studies, participants reported their subjective ratings of the situation even though some of them had not even perceived the eHMI [1,14]. Another limitation pertains to the research environments used in previous studies. Commonly used methods such as the Wizard of Oz technique, virtual reality (VR) pedestrian simulators, and video or photo studies use only simplified behavioral measurements, resulting in limited external validity. For example, participants were instructed to simply report their behavior [15], to press a button [10], or to take only one step forward to indicate their intention to cross [14]. The outlined methodological differences and limitations render comparisons of different eHMI variants impossible. Therefore, results are inconclusive regarding the required content of communication (e.g., maneuver intention, automation status, and detection feedback), interface design requirements (e.g., modality, position, and text or symbols), the operational design domain for eHMIs (e.g., urban environment, crosswalks, and intersections), and the role of the interaction partner (e.g., pedestrian, cyclist, or manual driver).
Furthermore, most studies have not provided an explanation for the selection of the use case under investigation. The majority of studies have examined interactions in urban areas in a low speed range where communication was required to negotiate the right of way. The most frequently investigated use cases so far have been interactions with pedestrians at crosswalks [4,6,9,12,17–19,21–23] or crossing situations with an ambiguous right of way, e.g., shared spaces or parking areas [1,8,10–14,16,18,20,24,25]. While prior work has already developed frameworks to derive use cases to test the in-vehicle HMIs of automated driving systems [26–29], there has been very limited research on taxonomies for use cases of eHMIs [30].
To date, there are no standardized test procedures to assess the usability of eHMIs of automated vehicles. There is no consensus on relevant use cases, evaluation requirements, and proper experimental designs yet. To advance the development of eHMIs, there is a necessity to standardize the evaluation process of eHMIs. Standardized test procedures allow for reliable and meaningful conclusions and enable comparisons between different studies and interface designs. Standardized methods already exist for other research areas of traffic psychology, e.g., for the evaluation of the in-vehicle HMIs of vehicles with automated driving systems [31] or to measure the eyes-off-road time as an indicator of distraction potential when interacting with in-vehicle information systems [32,33]. In their review article on the current state of research on eHMIs, Rouchitsas and Alm [34] declared that the "standardization of relevant procedures is a fundamental requirement for effective interface evaluations and meaningful comparisons. Therefore, future conceptual and empirical work in the field should primarily be concerned with producing standardized procedures for evaluating and comparing different implementations" (p. 10). The present article provides a response to this request. We propose a newly developed methodological framework that standardizes the usability evaluation process of eHMIs. This standardized test procedure consists of three parts:
1. Definition of relevant use cases: The selection of relevant use cases represents the basis for a test procedure to evaluate the usability of eHMIs. We developed a methodology to deduce relevant use cases for a given eHMI from an exhaustive set of all possible use cases.
2. Definition of usability requirements: We define the usability requirements of an eHMI according to the International Organization for Standardization (ISO 9241-11) [35]. Thus, to ensure the usability of an eHMI, it needs to be effective, efficient, and satisfying. To be able to evaluate whether an eHMI meets these requirements, we derived appropriate parameters and criteria for each requirement.
3. Test protocol for empirical studies: The test protocol provides an experimental framework to empirically evaluate a given eHMI with a user study. We outline the methodological details of the test protocol, e.g., sample, test environment, and instruction.

Definition of Use Cases
Prior research has mainly focused on vehicle-pedestrian interactions at crosswalks or at ambiguous crossing points in urban environments at a low speed. However, this only represents a limited selection of the possible use cases of an eHMI. To evaluate the usability of an eHMI in a standardized way, it is important that study participants encounter the eHMI in a set of relevant use cases. Thus, the definition of relevant use cases is the core of each evaluation process [31], as it ensures that the test procedure generates meaningful and comparable results. Fuest, Sorokin, Bellem, and Bengler [30] published a taxonomy of traffic situations that is intended to serve as a basis to assess the communication between automated vehicles and human road users. Their taxonomy provides an overview of attributes and associated value facets that are considered to influence implicit and explicit communication in traffic, e.g., the attribute "right of way" with the value facets automated vehicle, human road user, or undefined. To define a traffic situation, one can choose and combine attributes and value facets that are relevant for the research question at hand. The combination of all listed value facets results in 373,248 situations. The authors do not provide instructions on how to deduce relevant use cases. Furthermore, the taxonomy lacks attributes that specify the approach direction of the interaction partners and the currently executed driving maneuver of the automated vehicle. Therefore, we developed a new methodological approach to deduce relevant use cases for a given eHMI.
We used a multi-stage gradual methodological approach that claims to consider an exhaustive set of use cases of an eHMI. These use cases are subsequently reduced step-by-step by applying different filters. More specifically, the collection and combination of use cases and their specifications alternate with stepwise reductions of use cases based on redundancies and theoretical and practical considerations. Figure 1 illustrates an overview of the procedure of this approach.

Defining a Use Case of an eHMI
The basis of the approach was the definition of a use case of an eHMI. A use case of an eHMI is defined as a situation where an automated vehicle and at least one non-automated road user intend to "occupy the same region of space at the same time in the near future" [36]. This situation requires the interactive behavior of at least one involved road user to avoid a potential traffic conflict. Interactive behavior signifies that the road user adapts its initially planned behavior to the anticipated behavior of the other road user, e.g., by changing speed or trajectory. Traffic conflicts arise when "two or more road users approach each other in space and time to such an extent that a collision is imminent if their movements remain unchanged" [37]. The use of eHMIs as communication aids of automated vehicles can potentially support non-automated road users in understanding and anticipating the interactive behavior of the automated vehicle. From this, the users can draw conclusions for their own interactive behavior.

System-Based Approach
The system-based approach was used to collect all possible driving maneuvers that an automated vehicle can execute. Driving maneuvers were divided into lateral and longitudinal maneuvers. Lateral maneuvers consist of driving straight ahead, turning (left, right), and changing lanes (left, right). When the vehicle is in motion, longitudinal maneuvers are keeping a constant speed, decelerating, and accelerating. When at a standstill, longitudinal maneuvers are keeping a constant speed (0 km/h), starting to drive forward, and reversing. Filter A (Figure 1) reduced the number of collected maneuvers based on the assumption that an eHMI should only be used in situations in which it adds benefit over conventional lighting. Consequently, all lateral maneuvers and the reversing maneuver were filtered out, as they can be signaled by the turn signal and the reversing light. In principle, acceleration, deceleration, and keeping a constant speed can be perceived by other road users by observing the automated vehicle. However, these cues are often very subtle, and eHMIs could support the perception by signaling these maneuver intentions prior to action execution. In conclusion, the resulting maneuvers are keeping a constant speed (while driving or at standstill), accelerating (while driving or from standstill), and decelerating (while driving).
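As a minimal sketch, the maneuver collection and Filter A could be expressed as a simple list filter. The `Maneuver` record, its field names, and the "any" state for lateral maneuvers are illustrative assumptions and not part of the test procedure itself.

```python
from typing import NamedTuple


class Maneuver(NamedTuple):
    axis: str   # "lateral" or "longitudinal"
    state: str  # "moving", "standstill", or "any" (assumed encoding)
    name: str


# Maneuvers collected by the system-based approach, as listed in the text.
ALL_MANEUVERS = [
    Maneuver("lateral", "any", "driving straight ahead"),
    Maneuver("lateral", "any", "turning left"),
    Maneuver("lateral", "any", "turning right"),
    Maneuver("lateral", "any", "changing lanes to the left"),
    Maneuver("lateral", "any", "changing lanes to the right"),
    Maneuver("longitudinal", "moving", "keeping a constant speed"),
    Maneuver("longitudinal", "moving", "decelerating"),
    Maneuver("longitudinal", "moving", "accelerating"),
    Maneuver("longitudinal", "standstill", "keeping a constant speed"),
    Maneuver("longitudinal", "standstill", "starting to drive forward"),
    Maneuver("longitudinal", "standstill", "reversing"),
]


def filter_a(maneuvers):
    """Drop maneuvers that conventional lighting already signals.

    Following the text, this removes all lateral maneuvers (turn signal)
    and the reversing maneuver (reversing light).
    """
    return [
        m for m in maneuvers
        if m.axis == "longitudinal" and m.name != "reversing"
    ]


remaining = filter_a(ALL_MANEUVERS)
for m in remaining:
    print(f"{m.name} ({m.state})")
```

In this encoding, "starting to drive forward" corresponds to accelerating from standstill.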

Generic Situation-based Approach
The generic situation-based approach considered all factors that characterize interactions between traffic participants. This approach is generic because it does not consider the context in which a situation takes place, e.g., urban context, highway, intersection, or parking area. The first factor represents the intended moving direction of the interaction partner, which may be in the opposite direction to the automated vehicle, at a crossing angle to the automated vehicle, laterally approaching the automated vehicle in the same direction, and driving in the same direction as the automated vehicle (see first row of Figure 2). The second factor represents the position of the interaction partner relative to the automated vehicle (see second row of Figure 2). A combination of these two factors leads to certain combinations that would never result in traffic conflicts between the two traffic participants (see definition of eHMI use cases), e.g., when the interaction partner is located next to the automated vehicle while driving in the opposite direction. Filter B (Figures 1 and 2) was used to reduce those combinations that cannot lead to traffic conflicts. The remaining combinations represent situations that would result in traffic conflicts without the interactive behavior of at least one involved road user (see third row of Figure 2). We hypothesized that the driving direction of the interaction partner (left or right) and the exact start position of the interaction partner in a merging situation do not lead to relevant differences between the resulting use cases. These redundant situations are indicated by blue boxes in Figure 2. Thus, Filter C (Figures 1 and 2) filtered out these redundant situations. The resulting three generic situations are shown in the bottom row of Figure 2: The interaction partner approaches the automated vehicle frontally, orthogonally from the side, or merges in front of the automated vehicle with a lateral approach direction. 
Figure 3 illustrates possible ways to implement these three situations in a driving simulation with a cyclist as the interaction partner.

Combination of Maneuvers and Situations: Context-Independent Use Cases
In the next step, the remaining maneuvers of the automated vehicle and generic situations were combined (see Figures 1 and 4). The resulting context-independent use cases are illustrated in Figure 4. For example, the interaction partner could approach the automated vehicle orthogonally from the side while the automated vehicle keeps a constant speed.
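The combination step amounts to a Cartesian product of the remaining maneuvers and the generic situations. The following sketch illustrates this under an assumed encoding of maneuvers as (vehicle state, maneuver) pairs; the labels are hypothetical, and the resulting count follows from this encoding rather than from a number reported in the text.

```python
from itertools import product

# Maneuvers remaining after Filter A, as (vehicle state, maneuver) pairs.
MANEUVERS = [
    ("moving", "keeping a constant speed"),
    ("moving", "accelerating"),
    ("moving", "decelerating"),
    ("standstill", "keeping a constant speed"),
    ("standstill", "accelerating"),
]

# Generic situations remaining after Filters B and C.
SITUATIONS = [
    "frontal approach",
    "orthogonal approach from the side",
    "merging in front with a lateral approach",
]

# Every situation is paired with every remaining maneuver.
use_cases = [
    (situation, state, maneuver)
    for situation, (state, maneuver) in product(SITUATIONS, MANEUVERS)
]

for use_case in use_cases:
    print(use_case)
```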

Collection of Situation-Specific Factors
In order to ensure an exhaustive set of use cases, we collected a set of all situation-specific factors that could potentially influence an interaction between the automated vehicle and its interaction partner. Following the procedure by Fuest et al. [30], we assigned value facets to the collected factors. Table 1 presents the collected situation-specific factors and their value facets. Filter D was used to remove certain value facets or complete factors (filtered factors and value facets are marked with footnote 2 in Table 1). This reduction was based on the guideline that the use cases should be used to evaluate the usability of an eHMI. Accordingly, if we expected that a certain factor and/or its corresponding value facets would not lead to different requirements for an eHMI, they were not further considered. The following paragraphs elucidate the collected situation-specific factors and the application of Filter D.
The type of road can be either urban, rural, or a highway [38]. We assumed that the usability of an eHMI would not differ depending on the type of road on which an interaction partner experiences the system. For example, a certain eHMI signal should have the same usability during a merging maneuver regardless of whether the maneuver takes place on an urban or rural road. Independent of the type of road, an eHMI must be able to communicate whether it is letting the interaction partner merge or whether he/she must brake and merge behind the vehicle. Furthermore, use cases can take place in different traffic environments, such as at intersections, in parking areas, or somewhere on the road. It was hypothesized that the system perception and interpretation and, thus, the requirements for an eHMI will not change depending on the traffic environment. Regardless of communicating at an intersection or on the road, the eHMI must signal if the automated vehicle will let the interaction partner cross or not. Thus, by applying Filter D, the factors of the type of road and traffic environment were not further considered as situation-specific factors.

Table 1. Situation-specific factors, their value facets, and the application of Filter D.
[Table body not recoverable from the source. Columns: Situation-Specific Factor; Value Facets. Surviving entries: Type of road (value facets missing); Distance between automated vehicle and interaction partner at beginning of interaction: X meters.]
1 These factors are based on the taxonomy by Fuest et al. [30].
2 These factors and value facets are filtered by Filter D.
3 These value facets were combined by Filter D.

The right of way can be either assigned to the automated vehicle (e.g., green traffic light), to the interaction partner (e.g., crosswalk), or can be undefined [30]. To test the usability of an eHMI, the eHMI should be the only means that influences the interaction between the automated vehicle and the non-automated road user. Thus, we decided to filter out those value facets in which clear traffic rules determine the right of way for one of the interaction partners. The use cases for the eHMI test procedure should take place in a traffic environment without right-of-way rules.
The type of interaction partner can be either motorized vehicles (cars, powered two-wheelers, and trucks) or non-motorized vulnerable road users (VRUs) such as pedestrians and cyclists. To date, there have been no studies that systematically compare the impact of eHMI signals on the interaction of automated vehicles with different types of road users. Interactions with motorized vehicles are usually more dynamic (higher velocities) than with VRUs. The drivers of motorized vehicles often have a different visual perspective on the automated vehicle than VRUs. However, prior research and technical developments have suggested that automated vehicles will primarily use vehicle-to-X (V2X) technology to communicate with manual car drivers. With this technology, automated vehicles can send messages directly to the in-vehicle displays of manually-driven vehicles, e.g., about their intent, their willingness to cooperate, or requests for cooperative behavior of the human driver [39]. Thus, with V2X communication, automated vehicles do not necessarily need an eHMI to communicate with the human drivers of manually-driven vehicles. Furthermore, unsuccessful interactions usually have more severe consequences for VRUs than for the drivers of motorized vehicles. Compared to pedestrians, interactions with cyclists are evaluated to be more critical because they move at higher speeds, and, thus, interactions evolve more dynamically [15]. These differences might lead to different requirements for an eHMI when the automated vehicle interacts with different types of interaction partners. In principle, the use cases should be experienced from the perspective of a manual car driver (as the most common representative of a motorized interaction partner), as well as from the perspective of a cyclist (as the worst-case representative of a VRU).
However, because V2X technology offers another communication aid between automated and manually-driven vehicles, we recommend focusing primarily on use cases with VRUs as interaction partners.
The automation level represents a further potentially relevant factor. The categorization published by the Society of Automotive Engineers defines six automation levels [40]. Driving on automation levels 0-2 does not represent a use case of an eHMI, as the human driver is responsible for monitoring the driving environment and must remain attentive. Thus, the driver himself or herself can still communicate with other road users. On automation levels 3-5, the driver is allowed to engage in non-driving-related tasks as soon as the automated driving system is activated. The system makes decisions about upcoming driving maneuvers and could communicate these to other road users via an eHMI. In general, levels 3-5 can be considered as a single use case because the requirements for an eHMI do not differ. In comparison to levels 4 and 5, however, an automated driving system at level 3 could potentially hand over control to the driver during an interaction situation when a system limit is reached. Such a takeover situation results in the additional requirement that the interaction partner needs to understand that the previous eHMI signal might no longer be valid once the driver has taken control. As a consequence, a takeover situation during an interaction with an automated vehicle at level 3 should be considered as an additional special use case.
Visibility conditions might influence the perceptibility of an eHMI and, thus, might lead to different requirements of an eHMI. However, these requirements rather relate to the pure visibility of eHMI signals in different visibility conditions than to the usability of the system. Thus, Filter D neglects different visibility conditions. Use cases to test the usability of an eHMI should take place under normal visibility conditions.
As an additional factor, the speed of both interaction partners determines how fast an interaction builds up and develops. This might lead to different requirements of the eHMI with regard to the degree of detail and the required velocity of communication. For example, it is conceivable that the communication with eHMI signals must be faster when the driving speed is higher. Additionally, a more detailed eHMI signal could be more useful at a low speed than at a high speed. The speed of the automated vehicle at the beginning of an interaction depends on the automated driving system used and its operational design domain. According to the taxonomy by Fuest et al. [30], 0 km/h represents a vehicle at a standstill, 30 km/h is considered as a low speed range, 50 km/h is considered as an urban speed range, and 130 km/h is the permissible maximum speed in most European countries. These different speeds of the automated vehicle should be considered as use cases within the scope of the operational design domain of the respective automated driving system. The speed of the interaction partner at the beginning of an interaction depends on the type of interaction partner. Motorized vehicles could theoretically approach the automated vehicle at many different speeds. We assumed an average speed of 4.4 km/h for pedestrians [30,41] and 17.5 km/h for cyclists [30,42]. Additionally, speeds of 0 km/h (interaction partner at standstill), 30 km/h (low speed range), 50 km/h (urban speed range), and 130 km/h (maximum speed) should be considered. The reduction of these value facets depends on the type of interaction partner.
The distance between the automated vehicle and the interaction partner at the beginning of the interaction depends on their current speeds. It can be calculated from three inputs: the initial speed of both interaction partners, the prerequisite that both should theoretically arrive at their "virtual crossing" point at the same time (see Figure 3), and a predefined time for the interaction partner to perceive and interpret the eHMI, to make a behavioral decision, and to execute an action. For example, consider the use case in which the interaction partner (cyclist = 17.5 km/h) approaches the automated vehicle (low speed range of 30 km/h) frontally. If we assume a time interval of 4 s, the cyclist travels about 19.4 m (17.5 km/h ≈ 4.86 m/s) and the vehicle about 33.3 m (30 km/h ≈ 8.33 m/s) in this time until they reach the virtual crossing point. Thus, the total initial distance must be about 52.8 m. Based on this procedure, the distance does not represent an independent factor but results from other factors.
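The distance calculation for a frontal approach can be sketched in a few lines: each speed is converted from km/h to m/s, multiplied by the assumed time interval, and the two path lengths are added. The function names are hypothetical, and the sketch covers only the frontal situation, in which both paths lie on the same line.

```python
def kmh_to_ms(speed_kmh: float) -> float:
    """Convert a speed from km/h to m/s."""
    return speed_kmh / 3.6


def frontal_initial_distance(av_speed_kmh: float,
                             partner_speed_kmh: float,
                             interaction_time_s: float):
    """Return (vehicle path, partner path, total initial distance) in meters.

    Both road users are assumed to keep their initial speed and to reach
    the virtual crossing point after exactly interaction_time_s seconds.
    """
    av_path_m = kmh_to_ms(av_speed_kmh) * interaction_time_s
    partner_path_m = kmh_to_ms(partner_speed_kmh) * interaction_time_s
    return av_path_m, partner_path_m, av_path_m + partner_path_m


# Example from the text: vehicle at 30 km/h, cyclist at 17.5 km/h, 4 s.
av_m, cyclist_m, total_m = frontal_initial_distance(30.0, 17.5, 4.0)
print(f"vehicle: {av_m:.1f} m, cyclist: {cyclist_m:.1f} m, total: {total_m:.1f} m")
```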

Combination of Context-Independent Use Cases and Situation-Specific Factors
In the next step, the context-independent use cases and remaining situation-specific factors were combined (Figure 1), which still left 864 possible combinations from which to deduce use cases. Filter E deleted implausible use cases from the full use-case set (Figure 1). This reduction was based on an analysis of realistic and unrealistic combinations of the type of interaction partner, speed, situation, and maneuver of the automated vehicle. For example, when the automated vehicle is at standstill, it can only remain at standstill or start from standstill (lower part of Figure 4). A deceleration maneuver is not possible. Other examples are realistic speeds for the three situations (upper part of Figure 4). An initial speed of 130 km/h for those situations in which the interaction partner approaches the automated vehicle frontally or orthogonally is not realistic for any of the interaction partners.
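Filter E can be thought of as a plausibility predicate applied to every candidate combination. The sketch below encodes only the two example rules named above; the `UseCase` record, the reduced set of value facets, and the reading that the 130 km/h rule applies to either road user are assumptions, so the counts it prints do not reproduce the 864 combinations of the full set.

```python
from itertools import product
from typing import NamedTuple


class UseCase(NamedTuple):
    situation: str
    maneuver: str
    av_speed_kmh: float
    partner: str
    partner_speed_kmh: float


SITUATIONS = ("frontal", "orthogonal", "merging")
MANEUVERS = ("constant speed", "accelerating", "decelerating")
AV_SPEEDS_KMH = (0, 30, 50, 130)
# Illustrative (partner type, speed) pairs; not the full set of value facets.
PARTNERS = (("pedestrian", 4.4), ("cyclist", 17.5), ("car", 50), ("car", 130))


def is_plausible(uc: UseCase) -> bool:
    """Apply the two example plausibility rules of Filter E."""
    # A vehicle at standstill cannot decelerate.
    if uc.av_speed_kmh == 0 and uc.maneuver == "decelerating":
        return False
    # 130 km/h is unrealistic for frontal or orthogonal approaches
    # (assumed here to apply to the speed of either road user).
    if uc.situation in ("frontal", "orthogonal") and 130 in (
        uc.av_speed_kmh, uc.partner_speed_kmh
    ):
        return False
    return True


candidates = [
    UseCase(s, m, v, p, pv)
    for s, m, v, (p, pv) in product(SITUATIONS, MANEUVERS, AV_SPEEDS_KMH, PARTNERS)
]
plausible = [uc for uc in candidates if is_plausible(uc)]
print(len(candidates), "candidates ->", len(plausible), "plausible use cases")
```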

Deduction of Relevant Use Cases
In the last step, Filter F serves to select those use cases that are relevant for testing the usability requirements defined in Section 2.2 with the eHMI and the automated driving system under investigation. For example, we would like to test the usability of an eHMI of an "urban pilot" with the following specifications: The operational design domain of the system is in urban areas with a speed range between 0 and 30 km/h. If the system detects another road user within a radius of 60 meters, it will not accelerate due to safety reasons. Furthermore, the eHMI signal for keeping a constant speed is the same when the vehicle is at standstill or is moving. When deducing the relevant test cases from the use case set, these specifications further reduce the number of relevant test cases. Filter F can be applied to test different eHMI variants of automated driving systems with varying specifications.
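To illustrate how system specifications could drive Filter F, the following sketch applies the three "urban pilot" specifications to a small, made-up list of use cases. All names, the example use cases, and their distances are hypothetical. Treating the shared constant-speed signal as a reason to test only one constant-speed case per situation is one possible reading of the specification, not a rule stated in the text.

```python
from typing import NamedTuple


class UseCase(NamedTuple):
    situation: str
    maneuver: str
    av_speed_kmh: float
    initial_distance_m: float


class SystemSpec(NamedTuple):
    min_speed_kmh: float
    max_speed_kmh: float
    no_accel_radius_m: float            # no acceleration if a road user is this close
    shared_constant_speed_signal: bool  # same signal at standstill and while moving


URBAN_PILOT = SystemSpec(0, 30, 60, True)


def filter_f(use_cases, spec):
    """Select the use cases that are relevant for one system specification."""
    selected = []
    constant_speed_seen = set()
    for uc in use_cases:
        # Outside the operational design domain.
        if not spec.min_speed_kmh <= uc.av_speed_kmh <= spec.max_speed_kmh:
            continue
        # The system never accelerates with a road user inside the radius.
        if uc.maneuver == "accelerating" and uc.initial_distance_m <= spec.no_accel_radius_m:
            continue
        # Assumption: one constant-speed case per situation is enough
        # when the signal is identical at standstill and while moving.
        if spec.shared_constant_speed_signal and uc.maneuver == "constant speed":
            if uc.situation in constant_speed_seen:
                continue
            constant_speed_seen.add(uc.situation)
        selected.append(uc)
    return selected


EXAMPLE_USE_CASES = [
    UseCase("frontal", "constant speed", 0, 20),
    UseCase("frontal", "constant speed", 30, 53),
    UseCase("frontal", "accelerating", 0, 20),
    UseCase("frontal", "decelerating", 30, 53),
    UseCase("orthogonal", "constant speed", 50, 80),
    UseCase("merging", "accelerating", 30, 90),
]

test_cases = filter_f(EXAMPLE_USE_CASES, URBAN_PILOT)
for tc in test_cases:
    print(tc)
```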
The advantage of this methodological approach is that it provides a reproducible and clear procedure to select relevant use cases to test the usability of any given eHMI. The present set of use cases represents all scenarios that are relevant to test the usability of eHMIs during interactions with automated vehicles. It should be noted that controllability or misuse tests might need different procedures for reducing and selecting relevant use cases. Furthermore, it must be emphasized that this method cannot and will not cover all conceivable use cases and situations; in particular, sound adaptations will be required for corner cases. Accordingly, researchers and practitioners who want to use this method will have to apply it with care, thereby extending and strengthening its validity.

Usability Requirements, Parameters, and Criteria
Prior research on eHMIs has not yet provided consensus on specific requirements for the usability of eHMIs. For the evaluation of the in-vehicle HMIs of automated driving systems, the National Highway Traffic Safety Administration (NHTSA) has defined minimum requirements that must be fulfilled by an HMI [43]. However, there are no published standardized requirements to assess the usability of eHMIs.
In order to define evaluation requirements, it is important to recall the initial considerations for the development of eHMIs. There were concerns that interactions between automated vehicles and other road users could result in difficulties and dangerous situations because the driver/passenger will not be available for informal communication [18]. Therefore, automated vehicles must ensure safe and efficient interactions with other road users [3]. The implementation of eHMIs is one possible way to support non-automated road users during interactions with automated vehicles. An alternative or complementary approach is to informally communicate driving behavior and intentions to other road users by developing appropriate driving strategies of automated vehicles [44,45]. In order to justify the implementation of an eHMI, it must have advantages for interaction partners compared to automated vehicles without an eHMI. At least, it should not deteriorate the quality of interaction. Thus, the basic requirement for an eHMI is its usability. According to the usability definition by ISO 9241-11 [35], the usability of a system is determined by its effectiveness, efficiency, and satisfaction. To be effective, an eHMI must support the non-automated road user in choosing an accurate behavioral decision during interactions with automated vehicles. An eHMI improves the interaction partner's efficiency if it has a positive effect on the time and mental effort required for a successful interaction. To be satisfying, the interaction partner must perceive the use of the eHMI as pleasant. This is relevant to facilitate its use and acceptance. As such, we defined three usability requirements for an eHMI:
1. The eHMI must be effective.
2. The eHMI must be efficient.
3. The eHMI must be satisfying.
The test procedure needs to differentiate between eHMIs that fulfill or do not fulfill these requirements. To decide whether a certain eHMI meets the defined requirements, it is necessary to define parameters for each requirement. These parameters are used to make the respective usability requirement measurable. The following paragraphs define specific parameters for each usability requirement (effectiveness, efficiency, and satisfaction) and propose methods for how to assess these parameters. To finally decide whether an eHMI is compliant with the respective requirement, it is necessary to define a pass/fail criterion for each parameter. In sum, only when an eHMI passes the specified criteria of all parameters per requirement does it fulfil the specific usability requirement as a whole.
Such parameters can be assessed by behavioral or self-reported measures. Behavioral measures can indicate whether and how fast the interaction partner is able to understand the eHMI signal and whether they are able to deduce correct behavioral decisions. However, there is a certain probability that the interaction partner makes the correct decision by chance (e.g., when choosing between continuing to drive and braking/stopping). Furthermore, the driving behavior of the automated vehicle serves as an additional indicator for the interaction partner to make an appropriate behavioral decision. Thus, correct behavioral decisions of the interaction partner cannot be exclusively explained by their correct understanding of the eHMI signal. Self-reported measures are therefore additionally necessary to assess whether the interaction partner understands the eHMI signal correctly or not. On the other hand, self-reported measures alone would be insufficient because it must be ensured that a correct system understanding leads to correct behavior. Therefore, we propose a combination of both behavioral and self-reported measures.
Compared to an interaction without an eHMI, an eHMI should improve the effectiveness and efficiency of an interaction. At a minimum, it should not worsen the interaction. To assess this difference between interactions with and without an eHMI, a baseline condition without an eHMI is required.
With this methodological approach, relative criteria can be used to assess the effectiveness and efficiency of an eHMI. However, certain parameters require an absolute instead of a relative criterion. For example, an eHMI should completely prevent the safety-critical behavior of interaction partners. Thus, the investigation should not focus on the question of whether there are fewer safety-critical situations with than without an eHMI. Instead, it is most important that there are no safety-critical situations with an eHMI at all (absolute criterion). Additionally, to evaluate the satisfaction with an eHMI, absolute criteria appear to be more appropriate than relative criteria.
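The pass/fail logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the test procedure itself: the class and function names are hypothetical, and the pass/fail outcomes in the example are invented purely to show how a requirement is only fulfilled when every one of its parameters passes its criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterResult:
    """Outcome of one parameter for one eHMI (hypothetical structure)."""
    requirement: str  # "effectiveness", "efficiency", or "satisfaction"
    parameter: str    # e.g., "system comprehension"
    criterion: str    # "absolute" or "relative" (relative = vs. baseline)
    passed: bool


def requirement_fulfilled(results, requirement):
    """A requirement is fulfilled only if all of its parameters pass."""
    relevant = [r for r in results if r.requirement == requirement]
    if not relevant:
        raise ValueError(f"no parameters assessed for {requirement!r}")
    return all(r.passed for r in relevant)


def usability_summary(results):
    """Return the pass/fail status of each of the three requirements."""
    requirements = ("effectiveness", "efficiency", "satisfaction")
    return {req: requirement_fulfilled(results, req) for req in requirements}


# Invented example outcomes, for illustration only.
example_results = [
    ParameterResult("effectiveness", "system comprehension", "absolute", True),
    ParameterResult("effectiveness", "minimum distance", "absolute", True),
    ParameterResult("efficiency", "mental workload", "relative", True),
    ParameterResult("efficiency", "time to cross", "relative", False),
    ParameterResult("satisfaction", "satisfaction", "absolute", True),
]

summary = usability_summary(example_results)
```

In this invented example, the efficiency requirement fails as a whole because one of its parameters (time to cross) does not pass, even though mental workload does.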

Parameters and Criteria to Prove the Effectiveness of an eHMI
The effectiveness of an eHMI can be assessed by the parameters system comprehension and correctness of the behavioral decision. To measure system comprehension without giving participants the possibility to additionally consider the observed driving behavior of the automated vehicle as a confounding factor, we propose the occlusion method (see Section 2.3.2 for a detailed explanation). After the view of the automated vehicle has been occluded, participants need to answer the open-ended question "What will the automated vehicle do next?" The experimenter categorizes the answer as either correct or incorrect. The occlusion method does not allow for a comparison with the baseline condition because the screen is blanked before participants can deduce the vehicle's intention from its driving behavior. An absolute criterion can therefore be used to evaluate the system comprehension. We propose a criterion of 85% correct answers for each use case.
The appropriate indicators to assess the correctness of the behavioral decision depend on the driving maneuver of the automated vehicle in the respective use case. When the automated vehicle decelerates, the correctness of the behavioral decision can be measured by the minimal speed of the interaction partner during the interaction. The eHMI can be considered effective if the interaction partners reduce their speed to a significantly lower extent with an eHMI than without an eHMI (relative criterion). No or only slight speed reductions would demonstrate that the eHMI helped interaction partners to anticipate the behavior of the automated vehicle before it could be observed. When the automated vehicle keeps a constant speed or accelerates, the interaction partner must reduce his or her speed or wait to prevent a safety-critical situation. Continued driving or walking represents an incorrect behavioral decision. In this case, however, the correctness of the behavioral decision should be assessed by an absolute criterion with a pass-fail logic. The relevant criterion is the resulting minimum distance between the automated vehicle and the interaction partner. A minimum distance that falls below one meter can be considered as a safety-critical distance. Following the guidelines of the RESPONSE Code of Practice [46], 20 of 20 participants need to pass the defined criterion to support the assumption that 85% of the population would also pass the criterion.
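The two absolute effectiveness criteria can be checked as sketched below. The function names and default thresholds are illustrative assumptions that mirror the values proposed above (85% correct answers, a critical distance of one meter). The third function reflects one common binomial reading of the 20-of-20 rule, offered here as an interpretation rather than as the wording of [46]: if the true pass rate in the population were only 85%, observing 20 passes in 20 independent participants would be unlikely.

```python
def comprehension_passed(answers_correct, threshold=0.85):
    """Absolute criterion: share of correct answers in one use case.

    answers_correct: one boolean per participant (experimenter's coding).
    """
    if not answers_correct:
        raise ValueError("no answers recorded")
    return sum(answers_correct) / len(answers_correct) >= threshold


def min_distance_passed(min_distances_m, critical_m=1.0):
    """Pass-fail logic: no participant may fall below the critical distance."""
    return all(d >= critical_m for d in min_distances_m)


def prob_all_pass(true_pass_rate, n=20):
    """Probability that all n independent participants pass the criterion."""
    return true_pass_rate ** n


# If only 85% of the population passed, 20 of 20 passes would occur with a
# probability of about 0.039, i.e., below a conventional 5% error level.
p_20_of_20 = prob_all_pass(0.85, n=20)
```

For example, `min_distance_passed([2.4, 0.8, 3.1])` returns `False` because a single participant with a minimum distance below one meter is sufficient to fail the criterion.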

Parameters and Criteria to Prove the Efficiency of an eHMI
To measure the efficiency of an eHMI, we propose the parameters mental workload, time to cross, and visual attention. Mental workload can be assessed by a self-reported measure. After each interaction, the participant answers the question "How high was your mental workload during the interaction with the automated vehicle?" on a 7-point Likert scale ranging from very low to very high. Using a relative criterion, the mental workload should be significantly lower with than without an eHMI. To measure if the eHMI supported the efficiency of the interaction in a timely manner, the time between the first visual contact with the automated vehicle and the crossing of the virtual crossing point (see Figure 3) can be compared with and without the eHMI. The time to cross should be significantly shorter with than without the eHMI (relative criterion). To determine whether the eHMI improved the efficiency of the interaction with regard to the required visual attention, the proportion of visual attention towards the automated vehicle during the interaction should be significantly lower with than without the eHMI (relative criterion). Visual attention can be measured by eye tracking, head tracking, or by video coding.
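The relative criteria above require a within-subject comparison of each participant's values with and without the eHMI. The test procedure does not prescribe a particular statistical test, so the following is only one possible sketch: an exact one-sided sign-flip permutation test on the paired differences. The function names, the significance level of 0.05, and the workload ratings are assumptions made for illustration.

```python
from itertools import product


def paired_permutation_p(with_ehmi, without_ehmi):
    """Exact one-sided sign-flip test for paired data.

    Alternative hypothesis: values are lower with the eHMI than without.
    Enumerates all 2**n sign patterns, so it is practical for small samples
    (about one million patterns for n = 20).
    """
    if len(with_ehmi) != len(without_ehmi):
        raise ValueError("both conditions need one value per participant")
    diffs = [w - wo for w, wo in zip(with_ehmi, without_ehmi)]
    observed = sum(diffs)
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) <= observed:
            as_extreme += 1
    return as_extreme / total


def relative_criterion_passed(with_ehmi, without_ehmi, alpha=0.05):
    """Pass if the values are significantly lower with the eHMI (assumed alpha)."""
    return paired_permutation_p(with_ehmi, without_ehmi) < alpha


# Invented 7-point workload ratings of eight participants.
workload_with = [2, 3, 2, 4, 3, 2, 3, 2]
workload_without = [4, 5, 3, 6, 4, 4, 5, 3]
p_value = paired_permutation_p(workload_with, workload_without)  # 1/256
```

The same function can be applied to time to cross or to the proportion of visual attention towards the automated vehicle, because all three efficiency parameters are expected to be lower with than without the eHMI.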

Parameters and Criteria to Prove the Satisfaction with an eHMI
The satisfaction with the eHMI can be determined by the parameters satisfaction, attitude toward use, behavioral intention, and preference. All parameters are measured by items after participants have encountered all use cases with an eHMI. Table 2 includes a list of proposed items and the respective scales. All parameters use an absolute criterion. With regard to satisfaction, attitude toward use, and behavioral intention, at least 85% of all participants must choose a positive judgement (ratings between 5 and 7 on a 7-point Likert scale). To assess the preference, participants need to decide whether they would prefer interactions with automated vehicles with or without an eHMI in the future. To pass the relative criterion, a significantly higher proportion of participants must prefer future interaction with an eHMI to interactions without an eHMI.

Table 2. Proposed items and scales.
Satisfaction: [...] (4), very satisfied (7). Self-formulated.
Attitude toward use: "The interaction with the system is a wise idea."
Behavioral intention: "Given that I had access to such signals when interacting with automated vehicles, I predict that I would use them."
Preference: "In the future, would you prefer to interact with automated vehicles with or without signals?" Binary scale: with; without. Self-formulated.
1 Item adapted from [47].
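The satisfaction criteria can be evaluated as sketched below. The 85% threshold and the rating range of 5 to 7 follow the text above. The exact one-sided binomial test against an even split is an assumed choice for the preference item, since no specific test is prescribed, and the function names and example data are hypothetical.

```python
from math import comb


def positive_share_passed(ratings, threshold=0.85):
    """At least 85% of participants must give a rating between 5 and 7."""
    if not ratings:
        raise ValueError("no ratings recorded")
    positive = sum(1 for r in ratings if 5 <= r <= 7)
    return positive / len(ratings) >= threshold


def preference_p_value(n_with, n_total):
    """Exact one-sided binomial test against a 50/50 split.

    Returns the probability of observing n_with or more "with eHMI"
    choices among n_total participants if both options were equally likely.
    """
    favorable = sum(comb(n_total, k) for k in range(n_with, n_total + 1))
    return favorable / 2 ** n_total


# Invented data: 18 of 20 positive ratings, 16 of 20 prefer the eHMI.
ratings = [6, 7, 5, 6, 6, 5, 7, 6, 5, 6, 7, 6, 5, 5, 6, 7, 6, 5, 4, 3]
satisfaction_ok = positive_share_passed(ratings)
p_preference = preference_p_value(16, 20)  # about 0.0059
```

With these invented numbers, the satisfaction item passes (90% positive ratings), and a preference of 16 out of 20 participants would be significant at a conventional 5% level.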
The proposed requirements, parameters, and criteria contribute to the standardization of test procedures for evaluating the usability of eHMIs. Together with the definition of use cases, these standardized requirements form the basis for reliable eHMI evaluations and allow for meaningful comparisons between different eHMI variants and the results of different studies. Overall, this contribution will support the definition of design requirements for optimal interface specifications. The selection of the parameters can be adapted to the respective research questions and selected use cases.

Test Protocol
To evaluate the usability of an eHMI, it is important that users interact with the system in a standardized manner. We developed a test protocol for the empirical evaluation of eHMIs with a user study. The test protocol provides a proper experimental design to systematically investigate the usability of eHMIs. The objective is to prove whether a certain eHMI meets the usability requirements defined in Section 2.2. For this purpose, the test protocol defines a methodological procedure to observe and measure users' behavior and subjective evaluations during specified use cases and experimental conditions. The following sections elucidate the methodological details of the test protocol and underlying considerations.

Test Environment
The test environment must allow for controlled, standardized, and economic testing in a safe environment. At the same time, participants should encounter realistic scenarios to guarantee external validity. Furthermore, it is important that the parameters defined in Section 2.2 can be measured. Thus, the test environment must enable behavioral measurements, the observation of participants' behavior, and the communication between experimenter and participants for interim questions. Additionally, a realistic implementation of an eHMI is important. Prior research mainly used methods such as VR pedestrian simulators with head-mounted displays [7], desktop computers to demonstrate photos or videos [23], or the Wizard of Oz technique [14]. These test environments often do not enable the dynamic development of interactions. This leads to limitations of external validity, limited use case selection, and limited possibilities to measure behavioral data. We recommend the use of high-fidelity driving simulators to investigate interactions with motorized interaction partners (cars, trucks, powered two-wheelers) or VRUs (cyclists). The chosen simulator should include a realistic mock-up; active intervention options for braking, accelerating, and steering; and the possibility to implement the eHMI. To investigate interactions with pedestrians, VR pedestrian simulators remain the most suitable test environment. However, it is important that the pedestrian simulator provides enough physical space to enable dynamic interactions and possibilities to measure dynamic pedestrian behavior, e.g., by using a motion suit [6].

Procedure and Instruction
The procedure of the test protocol is shown in Figure 5. The instruction informs participants that the study investigates interactions between automated vehicles and manual drivers/cyclists/pedestrians. They are told that automated driving systems perform the entire dynamic driving task, at least in a specific operational design domain. Thus, the car driver can perform tasks other than driving. Furthermore, participants are informed that the experimental drive will take place on a simulated test track without right of way rules. The latter information is very important to ensure ambiguous interaction situations. The instruction at the beginning of the study does not include any information about the eHMI. After a short familiarization with the respective simulator (about 5 min) without any interactions with an automated vehicle, participants go through a learning period. During this period, they already encounter all use cases in which they interact with automated vehicles that use the tested eHMI. The learning period serves as the opportunity to learn to associate the eHMI signals with the subsequently executed driving maneuver of the automated vehicle. After the learning period, a short interview is conducted. The experimenter asks the following questions:

1. Did you notice anything while interacting with the automated vehicle?
2. Did you see the signals of the automated vehicle? Please describe the signals.
3. What was the meaning of the signals?

The participants' answers to these questions indicate the perceptibility and visibility of the eHMI signals. Furthermore, the questions serve to assess a first, global understanding of the eHMI. Independent of the answers of the respective participant, the experimenter explains at the end of the interview that the study aims to investigate signals that automated vehicles use to communicate with other road users. This information is important to achieve a common basis for all participants for the subsequent test blocks. The experimenter emphasizes that he or she cannot give any advice or help during the experimental drive, as the objective is to investigate whether the signals are comprehensible and helpful.
Thereupon, participants either first experience the test block with the eHMI (Test Block 1) or without the eHMI (Test Block 2). The sequence of the test blocks should be counterbalanced to control for transition and learning effects. Test Block 1 consists of three parts that should be encountered in the same recommended sequence (see Figure 5). In Test Block 1a, participants encounter all use cases with the eHMI while behavioral data (driving data and visual attention) are constantly recorded. Additionally, participants verbally indicate their mental workload after each interaction (see Section 2.2). The scale to measure mental workload should be displayed in the simulator where participants can see it. In Test Block 1b, the occlusion test block serves to measure system comprehension. For this purpose, participants experience all use cases once again. With the occlusion method, the simulation screen is blanked during each interaction at predefined points in time. This method was adapted from [48] and intends to achieve an open outcome of the situation. The screen should be blanked when the eHMI has already signaled the subsequent intention (or communication content in general) but before the automated vehicle has started to execute the signaled maneuver. After a few seconds (e.g., 5 s), the screen shows the last scene again, with the automated vehicle removed in the meantime. To prevent simulation sickness, it is recommended to automatically brake the participant's vehicle to a standstill while the screen is blanked. The suggested open-ended question "What will the automated vehicle do next?" can be adapted to the communication content of the tested eHMI. After the occlusion test block, participants answer a survey that includes different items to measure satisfaction with the eHMI (see Table 2, except for the preference item).
In Test Block 2, participants encounter all use cases without the eHMI while behavioral data are recorded and they indicate their mental workload after each interaction. At the end of the study, participants finally evaluate their preference for future interactions with automated vehicles with or without an eHMI. After each test block, participants have the opportunity to take a break. At the end of the experiment, the experimenter thoroughly debriefs the participant.
To control for transition effects between the different use cases, it is recommended to permute the order of the use cases into three different sequences. Thus, the use cases of each test block (1a, 1b, and 2) are encountered in different sequences (A, B, and C). Furthermore, a certain test block should not be experienced in the same sequence by all participants (e.g., it should be avoided that every participant experiences Test Block 2 in sequence C). Therefore, the different sequences of use cases should be additionally counterbalanced between the three test blocks. The sequence of the use cases in the learning period can be the same for all participants. In conclusion, the outlined considerations require an equal division of the participants into six different groups. Table 3 shows an exemplary experimental design with six different experimental groups that differ according to the sequence of Test Blocks 1 and 2 and the sequence of use cases in the different test blocks.
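One way to generate such a design is sketched below. The arrangement is an assumption for illustration and is not necessarily identical to Table 3: two block orders (Test Block 1 first or Test Block 2 first) are crossed with three cyclic rotations of the use case sequences A, B, and C, which yields six groups. The function names are hypothetical.

```python
from itertools import product

BLOCKS = ("1a", "1b", "2")
SEQUENCES = ("A", "B", "C")
# Test Block 1 (parts 1a and 1b) is experienced either before or after Block 2.
BLOCK_ORDERS = (("1a", "1b", "2"), ("2", "1a", "1b"))


def build_groups():
    """Cross two block orders with three cyclic rotations of the sequences."""
    groups = []
    for block_order, shift in product(BLOCK_ORDERS, range(len(SEQUENCES))):
        # Latin-square rotation: every block gets a different sequence.
        sequences = {
            block: SEQUENCES[(i + shift) % len(SEQUENCES)]
            for i, block in enumerate(BLOCKS)
        }
        groups.append({"block_order": block_order, "sequences": sequences})
    return groups


def assign_group(participant_index, groups):
    """Assign participants to the groups in rotation (0-based index)."""
    return groups[participant_index % len(groups)]


groups = build_groups()
first = assign_group(0, groups)
```

In this sketch, each test block is paired with each sequence in exactly two of the six groups, and no group experiences two test blocks in the same sequence. Note that an exactly equal division into six groups requires a sample size that is a multiple of six (e.g., 24 participants).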

Sample
To deduce reliable conclusions from the experimental data, the sample size should be sufficiently large. In reference to RESPONSE [46], at least 20 test persons should take part in the study. The target population of persons who will interact with automated vehicles in the future is very broad. Accordingly, people of all ages, nationalities, educational levels, body heights, and so forth should be eligible for studies that test the effects of eHMIs. To achieve a representative age distribution, NHTSA [43] proposed different age groups of n = 5 each, 18-24, 25-39, 40-54, and older than 54 years. Beyond these age groups, it is important to examine the effects of eHMIs on children's behavior and comprehension [21]. Dependent on the interaction partner under investigation, participants may need to fulfill further specific prerequisites. For example, participants in a bicycle simulator study should ride a bike on a regular basis, and participants in a driving simulator study should hold a driver's license.

Discussion
Due to a great variety of methodological approaches and methodological limitations, the current state of research on the usability of eHMIs does not allow general conclusions to be drawn. The standardization of test procedures is, thus, a fundamental prerequisite to effectively evaluate and compare different eHMI design variants. Therefore, the aim of the present article was to outline a standardized test procedure that allows for the systematic investigation of the usability of eHMIs. We have proposed a methodological framework that consists of a method to deduce relevant use cases, a definition of specific usability requirements and appropriate parameters, and a test protocol for the empirical evaluation of an eHMI.
The definition of relevant use cases provides the basis of the test procedure to ensure meaningful and comparable results. To make reliable conclusions, the usability of an eHMI must be proved in use cases previously defined as relevant. Prior studies on eHMIs have often used only one randomly selected use case. The proposed multi-stage gradual methodological approach presented in this article claims to consider all theoretically possible use cases of an eHMI. Using a variety of theoretical and practical considerations, the approach finally results in a set of use cases that are relevant to evaluate the usability of an eHMI. The intersection scenario represents the use case that has been studied most often in previous work on eHMIs [1,3-5,7,8,11-23]. To the best of our knowledge, there has only been one study that used a narrow area as a use case of an eHMI so far [49], and there has been no study that has examined a merging scenario. Thus, the approach to define use cases provides new perspectives for future research on eHMIs. Researchers can easily apply the proposed procedure to select use cases for the eHMI and automated driving system under investigation. All stages and filters before Filter F can be taken as default. Therefore, the selection process can be entered at Filter F. At this point, users can select those use cases that are relevant for the eHMI and automated driving system at hand. A potential limitation of the presented methodological approach to select use cases is that it only considers use cases in which automated vehicles interact with one non-automated road user. In principle, an eHMI that exclusively communicates information about the automated vehicle, such as its status and intentions, should always have the same usability, independent of the number of non-automated road users with which it is currently interacting.
With this content of communication, it is not relevant whether only one pedestrian or three pedestrians and two cyclists need to understand the meaning of the eHMI signal and, thus, make decisions about their subsequent behavior. However, if an eHMI directly addresses its message to a specific road user, interactions with more than one non-automated road user quickly become very complex and require an extended approach to deduce use cases. For example, many previous studies have examined eHMI signals that tell pedestrians to "walk," "go ahead," or "don't walk" [8,16,22], that project green arrows [8] or crosswalks [17] on the road surface in front of the vehicle, or that show a green pedestrian in the windscreen [22]. If a traffic participant feels addressed by such an eHMI signal that was actually directed to another road user, the situation can become very critical. Therefore, we strongly recommend against using eHMI signals that ask a particular road user to take any specific action. As a result, the methodological approach presented in this paper provides an appropriate tool to deduce use cases for eHMIs that communicate information about the automated vehicle itself rather than communicating requests for action to other road users.
The definition of evaluation requirements constitutes an additional prerequisite to standardize the evaluation process of eHMIs. Following the ISO definition of usability [35], we derived three requirements: An eHMI must render the communication of automated vehicles with non-automated road-users effective, efficient, and satisfying. By defining specific parameters and criteria for each usability requirement, the test procedure can differentiate between eHMIs that fulfill or do not fulfill these requirements. Further work is necessary to evaluate the discriminatory power of the proposed parameters. It might be possible that some parameters can better differentiate between eHMIs that meet or do not meet the requirements than others. With increasing experience based on future empirical studies, the specific measurement methods of the parameters can be adapted and extended, e.g., the selection of appropriate items to measure the satisfaction parameters. The parameters could be supplemented by further parameters and the criteria could be adapted if necessary. For example, following the controllability guidelines of the RESPONSE code of practice [46], it is also justified to aim at a system comprehension rate of 100%. The guideline requires that 20 of 20 participants pass the predefined criterion and give the correct answer. However, it must be emphasized that the proposed requirements, parameters, and criteria focus on the usability testing of eHMIs. To prove the controllability of eHMIs, the test procedure needs to be adapted. Nevertheless, part of our test procedure already addresses controllability testing, as the criterion for the minimal distance to the automated vehicle has a pass-fail logic and does not even allow for one fail event in 20 subjects.
The test protocol provides a proper experimental design to systematically evaluate eHMI variants with user studies in a standardized way. The test protocol provides several advantages. First, results of studies that are conducted in accordance with the test protocol allow for reliable conclusions regarding whether the tested eHMI can fulfill the defined usability requirements. Second, the results of different studies that followed the test protocol allow for comparisons between the tested interface designs. Thus, the test protocol constitutes a basis to derive optimal interface specifications based on comparisons of different studies. Another major advantage of the test protocol is that it enables the measurement of an eHMI's usability without confounding factors. As there are no right of way rules and the sequence of the test blocks with and without eHMIs is counterbalanced, different behavioral decisions of the interaction partners in the different test blocks can be explained by the usability of the tested eHMI. Similarly, the occlusion method ensures that comprehension measurements are also essentially based on the comprehensibility of the eHMI. To compare two or more eHMI variants with each other, the test protocol can be adapted and extended. Test Block 1 can be repeated with an additional eHMI variant with the same group of subjects as a repeated-measures design. However, it is very important to always compare participants' behavior with an eHMI with their behavior during interactions without an eHMI in Test Block 2. Moreover, the inclusion of a further test block requires the permutation of the three test blocks and a random distribution of the participants to the resulting sequences. Alternatively, the test protocol allows for the comparison of different eHMI variants that were examined in different studies with different samples. As prerequisites, the samples must be comparable and the studies must select the same use cases. 
The next step is the application of the test procedure for the usability evaluation of different eHMI design variants and automated driving systems with different specifications. With increasing experience, the method can be iteratively refined and improved. In turn, the standardized evaluation procedure will become a valuable tool for the scientific and technical community. The standardized test procedure can serve as a basis to establish best practices in the field of communication between automated vehicles and non-automated road users.