1. Introduction
As automated vehicles (AVs) have become an integral part of modern traffic systems, their interaction with other road users has emerged as a pivotal area of research and development. Unlike human drivers, AVs can rely neither on implicit interactional cues (e.g., gaze and facial expression) nor on explicit, conventionalized non-verbal signals, such as hand gestures, to make their intentions recognizable. This absence of non-verbal cues presents significant challenges in fostering trust and predictability during interactions with pedestrians, cyclists, and other drivers [1,2].
To address these challenges, extensive research has been conducted on external human–machine interfaces (eHMIs) for AVs [3,4]. The most common type of eHMI is visual, encompassing a wide range of designs and modalities. These may include intent signals that indicate whether the vehicle intends to stop or proceed [5], often displayed through symbols or icons on dedicated screens. Other approaches involve anthropomorphic features such as eyes [6,7] or gaze-like indicators that convey recognition of vulnerable road users (VRUs) [8,9,10], LED strips [11,12,13], as well as ground projections [14,15,16], textual messages [4,17,18], or symbolic cues communicating the vehicle’s next action or advising pedestrians on how to behave [19,20]. The visual elements of these interfaces are typically positioned on the sides of the vehicle, on the windshield [5], or on the hood or grille area [4]. Ref. [12] compared these positions but did not find any position superior for decision making.
In addition to visual modalities, auditory eHMIs have also been explored. These systems can provide spatial cues to indicate the vehicle’s position or direction of movement [21], as well as verbal or non-verbal alerts. Auditory cues have also been found to enhance interactions, with engine sounds increasing VRU awareness and beeping sounds serving as alerts to other road users of approaching shuttles [22]. Since 2021, electric vehicles have been required to emit artificial sounds that signal their presence, position, and speed to enhance pedestrian safety [23]. Auditory cues have therefore become more common.
Several empirical studies have examined how road users perceive and interpret eHMI communication. Ref. [24] explored the feasibility of using eye tracking to assess the effectiveness of eHMIs on AVs in a field study, and also examined how pedestrians visually perceive and interpret communication cues from a vehicle. One of their conclusions was that many individuals seek eye contact with a driver. Ref. [2] investigated how eHMIs can facilitate interactions between AVs and pedestrians by conveying vehicle intent, and highlighted the critical role of communication in ensuring mutual understanding of each other’s intent. Ref. [25] explored VRUs’ experiences of sharing the road with slow automated shuttles and found, through a survey, that participants wished to receive clearer information about the vehicle’s actions (such as whether it was turning or stopping) as well as confirmation that they had been detected. Ref. [26] analyzed interface designs across three long-term automated shuttle projects and found that while eHMIs can improve subjective perceptions of safety, their objective impact is limited by the reduced ecological validity of both simulated and field studies, with simple light-based indicators proving most robust across contexts.
The most frequently studied scenario concerns facilitating interactions between AVs and VRUs, particularly pedestrians, who need to decide whether it is safe to cross in front of a vehicle whose intentions may otherwise be ambiguous [8,11,27,28].
Despite extensive research, there is currently no consensus on which eHMI design is most effective [16,29]. One limitation in the field is that most experiments rely on artificial setups or on participants with little or no real-world experience of AVs. As highlighted by [30], investigating eHMIs on vehicles already in public operation, with participants who are familiar with the traffic environment, provides valuable insights into how users interact with systems in realistic and familiar settings, thereby enhancing the relevance and reliability of the research.
An opportunity for such research exists in Linköping, Sweden, where automated shuttles have been operating on the campus of Linköping University for the past five years. These shuttles have become a regular feature of the campus traffic environment, allowing for naturalistic observations of how pedestrians and cyclists interact with them. Since their deployment in 2019, a substantial body of research focusing on such vehicles has emerged, both globally and locally.
Video-based studies have examined how cyclists and pedestrians interpret the intents of shuttles [31] as well as long-term interactions between cyclists and automated shuttles [32]. For instance, ref. [33] analyzed cyclists’ behaviors when automated shuttles operated in bike lanes on the Linköping campus, revealing that cyclists often adjusted their trajectories and opted to use sidewalks in the presence of shuttles. Similarly, ref. [31] analyzed video recordings to examine how cyclists interact with the automated shuttles in everyday traffic. The study revealed that while cyclists expected the shuttles to adjust dynamically, for instance by slowing down to allow overtaking or by deviating from their path, the vehicles instead responded with abrupt emergency braking. This mismatch exposed specific gaps in coordination, namely the shuttles’ limited capacity for continuous adjustment and their inability to communicate constraints on their movements.
Together, these findings underscore the importance of studying external communication systems in realistic, contextually rich environments where automated shuttles are already integrated into everyday traffic.
The present exploratory study therefore aims to examine how four divergent eHMI design concepts are perceived by road users outside automated shuttles, with the goal of enhancing communication and thereby improving traffic flow. For this purpose, a mix of individuals, both familiar and unfamiliar with the vehicles, was invited to evaluate the design concepts. In contrast to typical eHMI studies, this study was based on a digital twin of an established real-world shuttle service, which the participants also encountered on campus. A virtual reality (VR) simulation based on the real-world shuttle model was used for experiencing the different eHMI designs.
3. Methods
This study employed an exploratory within-subjects design, in which each participant experienced all eHMI systems as well as the shuttle without an eHMI. Below, we describe the participants and the procedure, which uses the materials described previously. The study was approved by the Swedish Ethical Review Authority.
3.1. Participants
Participants who expressed interest in the study were selected to ensure diversity across all participant groups. This included adults of various ages, cyclists with different levels of experience, and individuals with varying familiarity with the shuttles.
The study included 28 participants, comprising 18 men and 10 women. Previous research [45] has not suggested any gender-related differences in relation to experience of the shuttles. Consequently, participant selection was based on prior experience with shuttles rather than on a balanced gender distribution. Half of the participants (n = 14) had prior knowledge of the shuttles through studying or working in the area or living in Vallastaden, while the other half had not. The participants were aged between 18 and 72 years (M = 40.5, SD = 15.7). All individuals over the age of 18 who could ride a bicycle were eligible to participate. Recruitment was conducted through a Facebook advertisement. Selected participants received a registration link with detailed information about the purpose of the study and how to book a three-hour time slot.
The experiments were carried out every weekday over a three-week period. Participants included high school and university students, employees from Linköping and a neighboring town, as well as retirees. Individuals with visual impairments were not excluded: the study included participants who wore glasses as well as those predisposed to nausea. The only individuals excluded from participation were those who, in the recruitment questionnaire, reported having epilepsy or indicated that they were unable to wear a headset during the study.
3.2. Procedure
The study was carried out in situ at the VTI research institute. Participants received an introduction to the shuttle’s functionality. To ensure that all participants had a baseline experience of the shuttle’s current functionality, they were subjected to two situations while cycling: (A) overtaking the shuttle and (B) meeting the shuttle.
Each situation was practiced both on the campus area and in the simulator (see Figure 4). The order of these experiences was balanced, with half of the participants experiencing the real shuttle first and the other half starting with the simulator. After each experience, participants were asked to evaluate their impressions of the shuttle in its current state, without any eHMI.
After the baseline experience of the shuttle, the participants were exposed to the four different eHMI design concepts while standing still and observing the shuttle in the VR simulator. Immediately before participants experienced each eHMI system, a test leader briefly explained its purpose. To minimize order effects, a fully counterbalanced sequence design was used, based on all 24 possible permutations of the four eHMIs.
The 28 participants were assigned these sequences so that each unique order was used at least once, and four randomly selected sequences were repeated for the remaining participants. Participants were allowed to revisit any system if they felt they had missed important details and were encouraged to ask questions regarding the system’s functionality. They were also given the opportunity to freely examine the eHMI systems, including inspecting specific features or details they regarded as important. Most participants did not feel the need to revisit any system, and the total time in the simulator watching the eHMI concepts was usually between 5 and 8 min.
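The assignment scheme described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function name `assign_orders`, the fixed random seed, and the final shuffling of the schedule are assumptions made for the example.

```python
import itertools
import random

# The four eHMI concepts evaluated in the study.
EHMIS = ["Purple Light", "Eyes", "Auditory Alert", "Text"]


def assign_orders(n_participants, conditions, seed=0):
    """Return one presentation order per participant.

    Every permutation is used equally often as far as possible; any
    leftover participants receive randomly chosen, distinct permutations.
    """
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    orders = list(itertools.permutations(conditions))  # 4! = 24 orders
    full_rounds, extra = divmod(n_participants, len(orders))
    schedule = orders * full_rounds + rng.sample(orders, extra)
    rng.shuffle(schedule)  # assumption: participants get orders in random sequence
    return schedule


schedule = assign_orders(28, EHMIS)
for participant, order in enumerate(schedule[:3], start=1):
    print(participant, " -> ".join(order))
```

With 28 participants and 24 permutations, every order appears once and four orders appear twice, which matches the description in the text.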
The XR-4 headset (Varjo, Helsinki, Finland), equipped with built-in speakers, was used in the simulator for all participants except six, who used the Varjo XR-3 because the XR-4 was not functioning at the time. As the XR-3 headset lacks integrated speakers, neck headphones were used instead to provide a comparable auditory experience. In all other aspects relevant to this study, the headsets were equivalent.
All evaluations were conducted using the tool described in Section 2.3. After exposure to the various eHMI design concepts, participants completed questionnaires addressing all eHMI systems. The questionnaires comprised both qualitative items, allowing for open-ended responses, and quantitative items (see Section 2.3). As noted above, the eHMI concepts were presented in a balanced order, and the questionnaires were completed in the same sequence as the concepts had been observed. Finally, participants responded to five questions on 7-point Likert scales regarding what they considered important when designing an eHMI system for automated shuttles. All participants were informed that honest feedback was sought and that there were no right or wrong responses. All responses were collected in Swedish and translated into English after analysis.
4. Results
One participant missed the questions about their overall impression of each eHMI, how easy each eHMI was to understand, and whether they would like to have the eHMI on an automated shuttle, as well as the free-text responses. Of the remaining 27 participants, 25 found the Text and 24 the Auditory Alert easy to understand. In comparison, only 6 participants rated the Purple Light as easy to understand, while 11 found the Eyes comprehensible. In total, 26 participants answered that they would like to have the Text-eHMI, 15 answered the same for the Auditory Alert, and 8 and 7 participants did so for Purple Light and Eyes, respectively.
A Friedman test showed a significant difference between the average ratings of the clearly positive descriptors (cooperative, communicative, attentive, and predictable) across the five conditions in which the bus was presented (No eHMI and the four eHMIs), χ2(4) = 37.1, p < 0.001. The degree of agreement among participants was moderate, as indicated by Kendall’s W = 0.33. Post hoc comparisons using the Durbin–Conover test with Bonferroni correction showed the following significantly higher ratings on the positive indicators: Auditory Alert over No eHMI (p = 0.01); Text over No eHMI (p = 0.01); Auditory Alert over Purple Light (p = 0.01); Text over Purple Light (p = 0.01); Auditory Alert over Eyes (p = 0.01); Text over Eyes (p = 0.01).
A Friedman test also showed a significant difference between the average ratings of the clearly negative descriptors (dangerous, obstructive, and irritating) across the five conditions, χ2(4) = 22.3, p < 0.001, but with a lower degree of agreement among participants (Kendall’s W = 0.20). Post hoc comparisons using the Durbin–Conover test with Bonferroni correction showed the following significantly higher ratings on the negative indicators: No eHMI over Text (p = 0.03); Purple Light over Text (p = 0.01); Eyes over Text (p = 0.01); Auditory Alert over Text (p = 0.01).
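The relation between the Friedman statistic and Kendall’s W reported above can be checked with a short sketch. The function names are hypothetical, the Friedman statistic is computed here without tie correction, and the sample size of 28 is an assumption, since the text does not state how many participants contributed to these ratings. The ratings in the toy example are invented for illustration only.

```python
def average_ranks(values):
    """Rank values from 1 to k, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(ratings):
    """Friedman statistic (no tie correction); rows are participants."""
    n, k = len(ratings), len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)


def kendalls_w(chi2, n, k):
    """Kendall's W derived from the Friedman statistic: W = chi2 / (n (k - 1))."""
    return chi2 / (n * (k - 1))


# Toy data (invented): three raters who order three conditions identically.
toy = [[1, 2, 3], [2, 4, 5], [1, 3, 4]]
toy_w = kendalls_w(friedman_chi2(toy), n=3, k=3)

# Reported statistics, assuming n = 28 participants and k = 5 conditions.
w_positive = kendalls_w(37.1, n=28, k=5)
w_negative = kendalls_w(22.3, n=28, k=5)
print(round(toy_w, 2), round(w_positive, 2), round(w_negative, 2))
```

Under the stated assumption, both reported W values follow directly from the reported χ2 values, and perfect agreement in the toy data gives W = 1.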
To summarize, when rating positive indicators, the Auditory Alert and Text were preferred over the No eHMI condition, as well as over Purple Light and Eyes. For negative indicators, only Text received significantly lower negative ratings compared to the other eHMIs and the No eHMI condition. The Auditory Alert, in contrast, received negative ratings similar to those of the other eHMIs and the No eHMI condition.
Participants were also asked to rate their overall impression of the eHMI (excluding the No eHMI condition; n = 27 due to missing data from one participant). Text (M = 4.56) and Auditory Alert (M = 4.11) received the most positive impressions from participants, whereas Eyes (M = 3.15) and Purple Light (M = 3.07) were rated lower. See Figure 5, which shows violin plots illustrating the distribution of responses for overall impressions of the eHMIs (1 = negative impression, 5 = positive impression). Black squares indicate the mean values.
These results are consistent with the Durbin–Conover post hoc analyses reported above, showing that Text is the most preferred eHMI, that Auditory Alert is also preferred but with more mixed results than Text, and that preferences for Purple Light and Eyes are clearly mixed among participants. Self-rated experience of riding bicycles, overall and in urban areas, showed no significant relationship to the overall impression of the eHMIs. Spearman’s rank-order correlation showed a significant negative correlation between participants’ self-rated experience with the self-driving buses and their overall impression of the Auditory Alert, rs(25) = −0.397, p = 0.041. This indicates that participants with more experience of the self-driving buses tended to rate the Auditory Alert slightly lower, which is in line with the positive but somewhat mixed impressions of the Auditory Alert reported above. Participants’ self-rated experience with the self-driving buses showed no significant correlation with their impressions of the other eHMIs. This is further explored in the qualitative analysis in Section 4.3.
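As a rough plausibility check of the reported correlation, a Spearman coefficient can be converted to an approximate t statistic with n − 2 degrees of freedom. The sketch below assumes n = 27 (implied by the 25 degrees of freedom) and uses the standard t approximation; the function name is hypothetical, and this is not necessarily how the p-value was obtained in the study.

```python
import math


def spearman_t(r_s, n):
    """Approximate t statistic and degrees of freedom for a Spearman coefficient."""
    df = n - 2
    t = r_s * math.sqrt(df / (1 - r_s ** 2))
    return t, df


t_stat, df = spearman_t(-0.397, 27)
print(f"t({df}) = {t_stat:.2f}")
```

The resulting |t| of about 2.16 lies between the two-sided critical values of the t distribution with 25 degrees of freedom for p = 0.05 (about 2.06) and p = 0.02 (about 2.49), which is consistent with the reported p = 0.041.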
Figure 6 presents the results of the Kansei Engineering-inspired evaluation conducted as part of this study. The graph depicts the average values for each of the tested eHMI concepts, with each colored line representing a unique “emotional fingerprint” for its respective concept. Several notable conclusions can be drawn from the graph.
Two concepts stand out prominently: the Text-eHMI and the Auditory Alert. To compare how participants experienced the different eHMIs relative to the bus without an eHMI, composite average scores were calculated for the clearly positive (communicative, cooperative, attentive, and predictable) and clearly negative (dangerous, obstructive, and irritating) attitudes to the bus.
Regarding the positive attitudes, as presented above, both the Text-eHMI and the Auditory Alert evoked more positive attitudes than the bus without an eHMI. The emotional fingerprint (Figure 6) suggests that communicativeness mostly accounted for the differences in positive attitudes towards the eHMIs in comparison to the bus without an eHMI. A Friedman test showed that communicativeness was rated differently across the conditions, χ2(4) = 44.9, p < 0.001. Post hoc comparisons using the Durbin–Conover test with Bonferroni correction showed significantly higher ratings of communicativeness for Auditory Alert and Text-eHMI in pairwise comparison with all other conditions (p = 0.01 throughout).
As presented previously, Text-eHMI was the only eHMI that evoked fewer negative attitudes than the bus without an eHMI. Overall, negative attitudes towards the bus remained relatively low regardless of which eHMI, if any, was present. However, a Friedman test showed that how dangerous the bus was perceived to be differed between the conditions, χ2(4) = 12.8, p = 0.012. Specifically, a Bonferroni-corrected Durbin–Conover test showed that the shuttle with Purple Light was experienced as more dangerous than the shuttle with Text-eHMI (p = 0.02). Predictability was also rated significantly differently across the conditions, χ2(4) = 23.5, p < 0.001. In particular, Eyes was experienced as less predictable than No eHMI (p = 0.01) and Text-eHMI (p = 0.01), and Purple Light as less predictable than Text-eHMI (p = 0.01).
Further analysis of the collected data using Principal Component Analysis (PCA) with Varimax rotation revealed three distinct clusters of descriptors: Predictability, Endangerment, and Practicality. PCA is a statistical technique that reduces data complexity by identifying underlying components that explain the most variance. Varimax rotation enhances interpretability by rotating the components so that each descriptor loads strongly on as few components as possible, making clusters more distinct. In a Kansei Engineering context, PCA is primarily used to detect trends related to the above-mentioned dimensions of Osgood’s semantic space. For this purpose, a relatively small number of user responses is typically sufficient [43,46]. This clustering provides deeper insights into participants’ perceptions. Notably, predictability plays a crucial role in shaping trust: participants tend to perceive the automated bus as safe and reliable when they also find it predictable.
A closer examination of the clusters (see Figure 7) reveals that the descriptors “obstructive”, “dangerous”, and “irritating” form one group, directly opposing “cooperative” along the same principal component. This suggests a dichotomy in perception: participants tend to view the bus as either dangerous or cooperative, with little overlap. In contrast, “predictable” and “advanced” are positioned at a 90-degree angle to this dimension, indicating that perceptions of predictability are independent of the danger-cooperation axis. Meanwhile, descriptors such as “communicative” and “attentive” form a third cluster positioned at a 45-degree angle, suggesting a semi-dependent relationship with both of the previously mentioned dimensions.
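One common way to read such a loading plot is that the cosine of the angle between two descriptor vectors approximates the correlation between the descriptors, provided both are well represented by the plotted components. The sketch below applies that reading to the angles mentioned above; the function name is hypothetical, and this is an interpretive aid rather than the analysis performed in the study.

```python
import math


def implied_correlation(angle_deg):
    """Cosine of the angle between two descriptor vectors in a loading plot."""
    return math.cos(math.radians(angle_deg))


# Angles taken from the description of the clusters: semi-dependent (45),
# independent (90), and directly opposing (180) descriptors.
for angle in (45, 90, 180):
    print(f"{angle:>3} degrees -> correlation of about {implied_correlation(angle):+.2f}")
```

Under this reading, opposing descriptors correspond to a correlation near −1, perpendicular descriptors to a correlation near 0, and descriptors at 45 degrees to a moderate positive correlation of about 0.71.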
This reinforces the importance of predictability in fostering a sense of safety and trust in automated vehicle interfaces. By illustrating how perceptions of danger, functionality, and communication shape user experience, this analysis provides a more nuanced understanding of public acceptance of automated transport.
The following section presents a summary of 27 participants’ free-text responses for each eHMI, analyzed using what [47] call codebook thematic analysis to identify recurring patterns and themes for the respective eHMI.
4.1. Purple Light
Many respondents reported that the lights were weak and difficult to see, particularly as they were presented in daylight. The lights were not clearly visible, either at a distance or at close range. “Difficult to see during the day; I would have liked to see the same test in an evening/night environment”. Several participants found the meaning conveyed by the lights to be unclear and not intuitive, as illustrated by the following quotations: “I do not understand what the purple light means”. “I did not think it was obvious what the purple lights were supposed to convey. It did not feel intuitive”.
4.2. Eyes
Participants thought that cyclists might have difficulty seeing the system’s “eyes”, particularly when approaching from behind, due to poor positioning and small size. “I couldn’t see the eyes as a cyclist”. Another opinion was that “It was hard to understand initially but might improve with experience”. Some found the system useful for conveying emotional feedback, but it required cyclists to adapt their behavior, which was not always clear or relevant. “It gave feedback on how the bus perceived the cyclist”. Some participants laughed and called it cute or sweet when it cried. It should also be taken into account that, in a university environment predominantly populated by students, some participants might deliberately attempt to trigger the ‘crying eyes’ display for amusement. Such behavior could have the opposite effect of that intended, leading to unnecessary braking events that may be uncomfortable for both the safety operator and the passengers. This was pointed out by two different participants while observing the eHMI.
4.3. Auditory Alert
The system was thought to be reassuring for cyclists and pedestrians, as auditory communication might help raise awareness of nearby automated shuttles. One respondent highlighted that it makes people realize when they are getting too close, describing the sound as “clear and quite pleasant”. The system would need calibration to ensure the sound is noticeable without being disruptive, and participants raised concerns about its effectiveness in noisy settings and its potential to irritate nearby individuals. One participant pointed out that the alert was too late to be effective. “The sound came too late to be useful”. Some participants worried about the annoyance factor over time, and the possibility that cyclists would not hear the alert because of headphones was also noted.
4.4. Text
Of the four concepts presented, participants found the Text-eHMI most useful, pointing out that such a system should be clear and easily recognizable, similar to the standard systems on regular buses. It should provide information about the shuttle’s destination and, ideally, include time indicators, which would enhance clarity. Familiarity with the area helps, but time indicators are still useful. Participants also recommended signals about changes in speed or upcoming actions, so that passengers can understand the shuttle’s manoeuvres.
Additionally, the participants recommended that audio and text information be provided simultaneously to ensure that all users can access the necessary details. In general, the system’s interface resembles that of regular buses, which increases comfort and trust in the vehicle.
In the final question, where participants were asked what they considered important in the design of an interaction system for automated shuttles (see Figure 8), they responded that the system needs to be easy to understand, simple, clear, and predictable. In contrast, qualities such as being creative, innovative, exciting, and motivating were ranked lower. This is in line with the previous responses and favors the Auditory Alert and Text-eHMIs, as discussed further in Section 5.
5. Discussion
The participants favored eHMI systems that were familiar and resembled features they had previously encountered in traffic. This preference may partly stem from their ease of recognition, which enabled participants to understand the system’s planned movements effortlessly. The system’s clear and straightforward functionality, such as displaying the bus’s destination through text, is another probable explanation. The analyses showed that Text was clearly preferred, both in comparison to the other eHMIs and to the bus without an eHMI. This aligns with the findings of [17], who also reported higher preference and comprehensibility for text-based eHMI designs. Auditory Alert was also rated more positively than the Purple Light, Eyes, and No eHMI conditions, but did not differ from them in negative impressions. Participants with more experience of the self-driving buses reported a slightly lower overall impression of the Auditory Alert. Qualitative analysis of the participants’ responses showed that the Auditory Alert has room for improvement in both the sound itself and its triggering mechanism. It was experienced as attentive but also irritating. Participants expressed a desire to hear the sound earlier, which would provide a preventive warning before they approached the bus closely enough for it to stop. This highlights the need for further exploration of sound concepts as eHMIs, particularly in enhancing their effectiveness and intuitiveness. Nevertheless, auditory communication as an eHMI shows potential, in line with [21], even though the specific sound used in this study may not have been optimal.
The emotional fingerprint and statistical analysis showed that both Text and Auditory Alert were clearly perceived as the most communicative eHMIs. It is important to acknowledge that the Text- and Auditory Alert-eHMIs were conceptually different from each other, for instance in terms of perceptual modality and functional characteristics. This means that they are not necessarily competitors for the best future eHMI. They could potentially be used simultaneously, although this was not investigated in this study.
In our study, the eyes were difficult to interpret. It was unclear where they were looking, and their low level of detail made it challenging to recognize them as eyes. This likely contributed to the absence of the “cute” or empathetic effect that anthropomorphic eyes can provide [
7]. Previous research has shown that eyes on vehicles can serve different communicative functions: [
9] used eyes to establish eye contact with pedestrians, creating a sense of being seen, while [
6] used eyes to indicate the intended driving direction of the vehicle.
What participants reported they would like to see or not see on AVs does not necessarily represent the best alternatives. This type of study cannot replace longitudinal studies that examine effects over time, and where people encounter shuttles equipped with an eHMI in real traffic. However, it can provide an indication of the types of eHMIs that should be explored further. The participants found the systems they were familiar with to be more useful and easier to understand. It remains unclear whether this is due to the more innovative eHMI systems having a function that is too abstract or simply because the participants were not accustomed to them. Several participants also highlighted the importance of turn signals, a feature not included in the scenarios. For instance, one participant stated that if the shuttle had used its turn signal before leaving the stop and entering the bike lane, its behavior would have resembled that of a “regular bus”, making it easier for individuals to understand how to react appropriately.
We acknowledge that the sample size of 28 participants is relatively small. However, such sample sizes are not uncommon in previous Kansei Engineering studies. While the data collected may not yield statistically robust results, it is often sufficient to support meaningful insights during early-stage product development. The eHMI concepts evaluated in this study fall within that category, making the findings relevant and informative despite the limited sample size.
In this study, all participants experienced interactions with the shuttle while cycling in the real world, but the eHMI systems were observed exclusively with participants in their role as pedestrians in the VR environment. The systems were partially designed to function for other road users, such as drivers and bus operators who share the road with the shuttles. However, the systems are not necessarily intended to communicate with operators of larger vehicles. For instance, sound effects may be challenging to detect from within a noisy vehicle. Similarly, the eyes may be difficult to notice once the vehicle has already passed. Additionally, projected purple marking on the ground may be particularly hard to see for drivers of vehicles with bright headlights.
The purple light projection drew criticism and was, for instance, rated as more dangerous than the Text-eHMI. Visibility would likely have been better if the scenario had not taken place in daylight, as the projected light on the ground turned out to be difficult to discern in daylight conditions. However, it is not practical to employ different communication strategies depending on weather and light conditions. The purple light indicates the area covered by the bus’s sensors, allowing other road users to avoid entering that zone. The higher the speed, the larger the programmed safety area. The light becomes progressively weaker as the safety zone increases in size. At the speed at which the bus travels along the bicycle lane (9 km/h), the sensor range in front of the bus is approximately 2 m. No lighting solution has yet been identified that can effectively illuminate this distance in daylight. At 13 km/h, the sensor range extends to approximately 4 m. According to [48], visible light consists of wavelengths with varying detectability. Purple has the shortest wavelength and is the most difficult to perceive, particularly when competing with other light sources. This study therefore cannot establish whether light projection is a suitable eHMI, but it does indicate that it is a challenging eHMI to implement. Ref. [14] reported similar findings. They found that projected light was difficult to perceive, particularly from a distance, and suggested that it could potentially be more distracting than helpful. The projection drew attention away from assessing the vehicle’s speed and the surrounding traffic situation. These findings support our results, indicating that projected light as an eHMI not only faces practical visibility limitations but may also introduce cognitive distractions that could reduce safety rather than enhance it.
It is essential to establish a standard for potential eHMIs. For instance, if an eHMI with ground projection were to be used, one might reasonably assume that a vehicle without such a projection does not require maintaining a safe distance. However, this does not mean the vehicle is inherently safe. Therefore, it may be more prudent first to rely on other communication methods to manage interactions between VRUs and AVs. Several participants in the study highlighted that indicators and brake lights should be more visible and could serve as effective communication tools. Predictability, as noted from the PCA, was also important for people to perceive the shuttle as safe and trustworthy. Hence, increasing the predictability for VRUs to understand what to expect from the shuttle in the next few moments is crucial and would enhance interaction.
To determine whether a concept functions effectively in practice, longitudinal studies on the use of eHMIs are necessary. These studies are crucial for understanding whether the concept works, whether there is a learning curve, and how it facilitates interactions. However, such studies require a well-designed eHMI to ensure that clear insights can be obtained. The timeline for this development remains uncertain. Furthermore, under current laws and regulations in Sweden, it is not possible to test lighting or dynamic eHMIs in traffic environments [49].
Overall, functionality was rated higher than creativity by the participants. This is important to note since eHMI solutions should work for all the intended users, and an external HMI that functions for people who have difficulties learning new things would probably also be applicable to a larger group of people.