How Do eHMIs Affect Pedestrians’ Crossing Behavior? A Study Using a Head-Mounted Display Combined with a Motion Suit

Abstract: In future traffic, automated vehicles may be equipped with external human-machine interfaces (eHMIs) that can communicate with pedestrians. Previous research suggests that, during first encounters, pedestrians regard text-based eHMIs as clearer than light-based eHMIs. However, in much of the previous research, pedestrians were asked to imagine crossing the road, and unable or not allowed to do so. We investigated the effects of eHMIs on participants’ crossing behavior. Twenty-four participants were immersed in a virtual urban environment using a head-mounted display coupled to a motion-tracking suit. We manipulated the approaching vehicles’ behavior (yielding, nonyielding) and eHMI type (None, Text, Front Brake Lights). Participants could cross the road whenever they felt safe enough to do so. The results showed that forward walking velocities, as recorded at the pelvis, were, on average, higher when an eHMI was present compared to no eHMI if the vehicle yielded. In nonyielding conditions, participants showed a slight forward motion and refrained from crossing. An analysis of participants’ thorax angle indicated rotation towards the approaching vehicles and subsequent rotation towards the crossing path. It is concluded that results obtained via a setup in which participants can cross the road are similar to results from survey studies, with eHMIs yielding a higher crossing intention compared to no eHMI. The motion suit allows investigating pedestrian behaviors related to bodily attention and hesitation.


Introduction
Worldwide, 22% of traffic fatalities concern pedestrians [1], and over 90% of car accidents are attributable to driver error [2]. Automated vehicles (AVs) could reduce the number of accidents substantially. However, the implementation of AVs will happen gradually over several decades (e.g., [3]), which means that AVs and conventional vehicles will likely share the same roads.
In this type of mixed traffic, it may be unclear to pedestrians and other road users whether an approaching vehicle is driven manually or driving automatically. Pedestrians sometimes rely on cues from the driver, such as hand gestures and eye contact [4,5], and the absence thereof could debilitate pedestrian safety. External communication devices (i.e., external human-machine interfaces, eHMIs) on AVs might be a suitable replacement for the communication signals of drivers to pedestrians and aid pedestrians in perceiving the intention of the vehicle.
The literature on AV communication and the efficacy of eHMIs is rapidly expanding. Researchers have investigated the efficacy of eHMIs as communicative replacements for humans by means of computer surveys, lab experiments, and field tests.
Fridman et al. [6] conducted an online survey in which they presented 30 eHMI concepts and asked respondents whether they thought it was safe to cross the road based on the information provided by the eHMI. They found that text displays stating "Walk" or "Don't Walk" were considered clear, whereas green or red headlights were regarded as relatively ambiguous. Chang et al. [7] conducted a survey in which they tested five types of eHMIs, including a text display, lights, and a projection. In this study as well, participants found a text display to be the easiest to interpret. In Ackermann et al. [8], participants rated an LED light strip as highly unambiguous compared to eHMIs in the form of a display or projection that provided textual or symbolic advice. An online survey conducted by Bazilinskyy et al. [9] showed that participants tended to apply an egocentric perspective, in the sense that an eHMI with the text "Walk" or "Don't walk" was regarded as clearer and more persuasive than "Will Stop" or "Won't stop". Other online or lab-based surveys about the effectiveness of eHMIs have been conducted by Deb et al. [10], Hagenzieker et al. [11], Dey et al. [12], and Zhang et al. [13]. Typically, in surveys, the participant is shown a picture, an animation, or a recorded video from an on-road setting after which questions are asked, such as whether the participant would feel safe to cross in front of the car. An advantage of surveys is that they allow for a high number of repetitions with variations of eHMIs. A disadvantage, however, is that they require participants to imagine how they would act or feel, an approach that may have limited validity.
Others have used lab setups that immerse the participant in a traffic scenario [14][15][16][17]. De Clercq et al. [18] investigated the effect of eHMIs on the crossing intentions of pedestrians in a virtual reality environment presented via a head-mounted display (HMD). The participants stood on a curb and watched a platoon of oncoming AVs, which were devoid of or equipped with one of four eHMIs. The presence of an eHMI, indicating whether the AV would stop or not, significantly increased participants' perceived safety compared to a situation in which an eHMI was absent. A text-based eHMI was found to be the clearest overall. Using a similar setup, Ackermans [19] found that an eHMI consisting of a light animation made participants feel safer to cross compared to no eHMI. Weber et al. [20], also using an HMD, found benefits of eHMIs in terms of correct recognition rate of the vehicle's intention as well as response times. Due to their high visual field of view, HMDs offer a higher level of perceptual fidelity than survey studies. In De Clercq et al. [18] and Ackermans [19], participants were tasked to press a button when they would intend to cross or recognize the approaching vehicle's intention. In the on-road study by Walker et al. [17], participants used a physical box with a slider to indicate their willingness to cross, whereas in a simulator study by Mahadevan et al. [14], a virtual slider was used. The button and slider approach both allow performing analyses on how the crossing intention varies as a function of the distance between the pedestrian and the approaching vehicle. Although this method yields insights that questionnaires are unable to offer, the participants' behavior is measured in a subjective binary (i.e., button press/not pressed) or continuous manner (e.g., across the range of the slider), rather than an objective manner (e.g., forward gait).
Yet another method of evaluating eHMIs is to perform a field test on an actual road that is closed off from traffic. Clamann et al. [21] let participants stand on a curb while a van with eHMI was approaching. The authors extracted measures such as the moment participants turned their face to the vehicle, and the moment participants began to cross the road. The results showed no significant effect of the type and presence of the eHMI. Similar results were obtained by Palmeiro et al. [22]. They found that the presence of a sign "self-driving" did not significantly affect participants' critical gap times, that is, the last moment participants felt safe to cross. A limitation of both Clamann et al. [21] and Palmeiro et al. [22] is that participants were not permitted to step onto the road because of ethical and safety reasons; in the former study participants were physically constrained by a rope around their waist, whereas in the latter participants were asked to step back when they did not feel safe to cross anymore. Furthermore, field studies like these face a challenge of repeatability, as there tend to be fluctuations in the speed of the oncoming car, as well as in weather conditions [22]. Additionally, there are coding/timing challenges because pedestrian behavior is extracted from video recordings and needs to be synchronized with the AV's GPS signal.
Information 2019, 10, 386
In an overview paper, Cefkin et al. [23] described multiple research methods to examine eHMIs for highly automated vehicles used in the Renault-Nissan-Mitsubishi Alliance Innovation Lab. Among the methods were observations of stop intersections and on-road tests using a Wizard-of-Oz AV in public environments with naïve pedestrians (see also [24]). Although these approaches arguably have the highest possible level of fidelity, as pedestrians are exposed to AVs in a naturalistic manner, they too are affected by several limitations. One disadvantage is that the state of the pedestrian and AV (e.g., speed, distance) are unknown, and data need to be obtained from annotations of video recordings or interviews after the encounter. Furthermore, results vary depending on traffic conditions. Cefkin et al. [23] pointed out that in one of their studies, pedestrian traffic was limited and, therefore, only a small number of AV-pedestrian encounters could be recorded.
Summarizing, previous research on the efficacy of eHMIs has various strengths and weaknesses. One weakness is that researchers had to rely on imagined rather than actual crossing behavior. In this respect, the works of Deb et al. [10], Feldstein et al. [25], Lee et al. [26], and Schmidt et al. [16] provide promising leads. In Deb et al. [10], participants walked with a head-mounted display while encountering different types of eHMIs (e.g., an upraised hand, a colored beacon, an image of a pedestrian). Based on analyses of video recordings of the participants' walking behavior, Deb et al. [10] classified the participants' behaviors as hesitation, confusion, and stopping. A similar approach was used by Lee et al. [26] and by Schmidt et al. [16]. In Lee et al. [26], participants wore an HMD, and the dependent variable was whether participants crossed before the AV started to decelerate, during deceleration of the AV, or when the AV had stopped. However, Deb et al. [10], Lee et al. [26] and Schmidt et al. [16] did not measure the pedestrians' motion in a quantitative manner, for example, in terms of walking speed as a function of elapsed time.
We investigated whether results from survey-based and HMD-based methods replicate when using a more realistic experimental setup in which participants could cross the road. We used two eHMI concepts from De Clercq et al. [18] which corresponded to the extremes of ambiguity. That is, we selected a textual eHMI as research shows that text (i.e., "walk"/"don't walk") is generally regarded as clear and unambiguous (see, e.g., [6]). Furthermore, we selected front brake lights, a concept which is regarded as ambiguous because it may be unclear to the pedestrian whether he or she should apply an egocentric perspective (i.e., a green light on the AV means that the pedestrian can cross) or an exocentric perspective (i.e., a green light on the AV means that the vehicle will continue driving) (e.g., [9,13]).
We conducted the experiments in a virtual environment where the participants were immersed using an HMD and where a schematic representation of the participants' body (i.e., an 'avatar') was present, generated via a motion suit. This type of simulation resembles the setup of Feldstein et al. [25,27], who developed a pedestrian simulator where participants could move around freely and motions were recorded through a marker-based system. Furthermore, in Feldstein et al. [25,27], a schematic representation of the body was present in the virtual environment. We sought to examine how the eHMIs affect participants' crossing behavior as measured using the motion suit signals. We expected that the text-based eHMI would be regarded as more persuasive than the front-brake light eHMI, which, in turn, was expected to evoke higher pedestrian responsiveness than no eHMI at all. In addition to this replicative aim, we extracted information about body posture while crossing. Finally, we examined whether our setup, in which participants were able to step onto the road, would yield a higher level of self-reported realism compared to a similar setup in which participants remain static in the environment.

Participants
Twenty-four participants (six females, 18 males) with a mean age of 25.4 years (SD = 2.5, min = 21, max = 30) partook in the study. Participants were recruited among students, PhD candidates, and postdocs at the faculty of Mechanical, Maritime and Materials Engineering of the TU Delft. In response to the question how often the participant commuted to work or school by foot in the last 12 months, five respondents reported "never", six reported "less than once a month", four reported "once a month to once a week", four reported "1 to 3 days a week", one reported "4 to 6 days a week", and four reported "daily". Participants' nationalities were as follows: 15 Dutch, two German, two Chinese, one British and Turkish, one American, one Spanish, one Indian, and one Ukrainian. No incentive was offered and people were allowed to participate regardless of the driving side in their country of origin. All participants were living in the Netherlands. The study was approved by the Human Research Ethics Committee of the TU Delft, and each participant provided written informed consent before the start of the experiment.

Experimental Design
Participants were immersed in a virtual environment, similar to the one used in De Clercq et al. [18], via a head-mounted display. The participant was standing on a curb in front of a zebra crossing at a two-way urban road, as shown in Figure 1. In each trial, a platoon containing five cars would come driving around a corner to the far left of the participant, pass the participant, and then turn left around a corner to the right of the participant. The third car in the platoon was the stimulus vehicle. Two types of vehicles were used, namely a Smart Fortwo (small vehicle) and a Ford F150 (large vehicle). The occurrence of these types was randomized. The design of the research was within-subject, consisting of three independent variables. The first independent variable was the type of eHMI placed on the front of the stimulus vehicle, consisting of three levels: (1) No eHMI, (2) Front Brake Lights (FBL), and (3) Text eHMI, as depicted in Table 1. The eHMIs were the same as in the experiment of De Clercq et al. [18]. In our experiment, the presence of eHMIs on all other cars (i.e., the non-stimulus cars) was randomized.
The distance between the front of the stimulus vehicle and the back of the second vehicle was the second independent variable, which was varied between 20, 30, and 40 meters. The distances between the other cars (i.e., first and second, third and fourth, fourth and fifth) were approximately 13 meters.
The yielding behavior of the cars was the third independent variable. The cars had a speed of 50 km/h while approaching the participant. In the yielding conditions, the stimulus car would start braking with a deceleration of 3.5 m/s² at approximately 30 meters from the zebra crossing and halt 4 meters before the zebra crossing, as can be seen in Figure 2. If an eHMI was present on the stimulus vehicle, it would change state upon braking. The fourth and the fifth car would start yielding at 40 and 50 meters from the zebra crossing, respectively, and come to a stop a few meters from one another behind the third car. After standing still for 5 seconds, the cars would pick up speed again, drive past the participant, and turn left around the corner.
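The braking parameters stated above can be cross-checked with a short calculation. The sketch below assumes constant deceleration from 50 km/h (the source does not state the exact braking profile used in the simulation); the function and variable names are illustrative only.

```python
# Constant-deceleration check of the yielding manoeuvre (assumed profile).
V0_KMH = 50.0      # approach speed from the text
DECEL = 3.5        # deceleration in m/s^2 from the text
STOP_GAP_M = 4.0   # car halts this far before the zebra crossing


def braking_profile(v0_kmh=V0_KMH, decel=DECEL):
    """Return initial speed (m/s), time to stop (s), and stopping distance (m)."""
    v0 = v0_kmh / 3.6                   # km/h -> m/s
    stop_time = v0 / decel              # t = v0 / a
    stop_dist = v0 ** 2 / (2 * decel)   # d = v0^2 / (2a)
    return v0, stop_time, stop_dist


v0, stop_time, stop_dist = braking_profile()
# Distance from the zebra crossing at which braking would have to begin
brake_onset = stop_dist + STOP_GAP_M
print(f"v0 = {v0:.2f} m/s, stop after {stop_time:.2f} s and {stop_dist:.1f} m")
print(f"implied brake onset at {brake_onset:.1f} m from the crossing")
```

Under this assumption the car needs about 27.6 m and just under 4 s to stop, which implies a brake onset near 31.6 m from the crossing, in line with the "approximately 30 meters" reported above.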

Table 1 (vehicle images not reproduced here). Row label: No eHMI, yielding and nonyielding.
Note: For the No eHMI (Baseline) condition, the vehicles looked the same in their yielding and nonyielding state.
Each participant completed a total of 18 trials. Each trial consisted of a unique combination of the independent variables (3 eHMI conditions × 3 gap distances × 2 yielding conditions). The order in which the combinations were presented to the participants was randomized and unique for each participant. A video of trials where cars did and did not yield is available through the link in the Supplementary Materials. The experiment lasted about 20 minutes per participant.
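The full-factorial design described above can be sketched as follows. This is an illustration of the 3 × 3 × 2 structure, not the script used in the experiment; the level labels, the function name, and seeding by participant number are assumptions.

```python
import itertools
import random

EHMI_TYPES = ("None", "FBL", "Text")   # eHMI conditions
GAP_DISTANCES_M = (20, 30, 40)         # gap between second and third car
YIELDING = (True, False)               # stimulus vehicle yields or not


def trial_order(participant_id):
    """Return all 18 factor combinations in a participant-specific random order."""
    trials = list(itertools.product(EHMI_TYPES, GAP_DISTANCES_M, YIELDING))
    # Hypothetical choice: seed with the participant number for reproducibility
    random.Random(participant_id).shuffle(trials)
    return trials


for ehmi, gap, yields in trial_order(participant_id=1)[:3]:
    print(ehmi, gap, "yielding" if yields else "nonyielding")
```

Seeding per participant does not strictly guarantee a unique order for every participant, but with 18! possible permutations a collision among 24 participants is extremely unlikely; an explicit uniqueness check could be added if required.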

Participant's Task
Participants were instructed to cross the road once they felt it was safe enough to do so. They were informed that the first two cars in each of the platoons would never yield and were told not to cross in front of these two cars. Hence, the only crossing opportunity for participants was between the second car in the platoon and the stimulus vehicle or after all cars had passed. Participants were not instructed or trained about the meaning of the eHMIs. Due to space limitations of the physical environment, participants were instructed to walk to the third zebra stripe in the virtual environment. If standing on the third zebra stripe in the virtual environment, participants had one meter of unobstructed space as a safety margin around them in the physical environment.

Materials and Equipment
The experiment was run on a desktop computer with an Intel Core i7-6700 CPU (@3.4 GHz) processor, 16 GB RAM, MSI H110M Pro-D (MS-7996) motherboard, NVIDIA GeForce GTX 1070 4 GB graphics card, and a Windows 10 Pro 64-bit operating system. The participants' motion was recorded using a wireless set-up of the Xsens Link Motion Tracking Device (Enschede, The Netherlands) in combination with version 0.3b of MVN Analyze [28]. The recorded accelerations of the head, thorax, pelvis, and extremities were integrated to estimate full-body motion. The motion data were transferred from the MVN software to an avatar in the virtual environment that was built using Unity version 5.5.0f3 64-bit. The scripts and avatar used in Unity were developed by Xsens and obtained via the Unity Asset Store. The wireless transmitting device of the Xsens sent its data via an Asus RT-AC68U router to the desktop. An Oculus Rift CV1 was used to visually and audibly immerse the participant in the virtual environment. A 1-meter extension of the HDMI and USB cables of the Oculus Rift was made using a DeLOCK 1.4 HDMI and a DeLOCK USB 3.0 extension cable.

Procedure
Participants provided written informed consent before the start of the experiment. After being briefed about the goal of the experiment, namely "to investigate whether crossing intentions of human pedestrians can be detected from body motion", the participants completed a questionnaire containing demographic questions and statements about pedestrians and motorists from Papadimitriou et al. [29]. Following the questionnaire, participants were familiarized with the Oculus Rift and Xsens. After the Xsens suit had been fitted to the participant and successfully calibrated, participants put on the Oculus Rift and were allowed to familiarize themselves with the virtual environment for a few minutes.
After the familiarization, the experiment was initiated. After each trial, participants verbally indicated their discomfort using the single-item misery scale (MISC). They also indicated their feeling of fear and their ability to predict the behavior of the oncoming cars on a scale from 1 (i.e., strongly disagree) to 10 (i.e., strongly agree). If participants indicated a MISC rating of 4 or higher, the experiment would be paused or aborted. Once all trials were completed, a final questionnaire was administered to measure the fidelity of the experimental environment through the use of the Virtual Reality Presence Questionnaire (VRPQ) of Witmer et al. [30].

Dependent Variables
A total of six objective and subjective measures were analyzed. The first objective dependent variable was the participants' forward (i.e., towards the zebra crossing) gait velocity as a function of elapsed time. The forward gait velocity was extracted from the pelvic sensor of the Xsens. Studies on affective body language show that stimuli of negative valence reduce gait velocity compared to when participants were confronted with stimuli of positive valence [31,32]. We used gait velocity as an index of safety perception, where we assumed that situations that were perceived by participants as ambiguous or unsafe would result in lower average gait velocity compared to situations that were perceived as unambiguous or safe.
Furthermore, using the position data from the same pelvic sensor, we computed the second dependent variable, namely the time at which the participants left the curb (Moment of Leaving Curb; MLC). In line with the above, we expected participants to leave the curb earlier when they perceived a situation that was unambiguous/safe compared to situations that were perceived to be ambiguous/unsafe.
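One way to obtain such a moment from a position trace is to find the first sample at which the pelvis passes the curb edge. The sketch below is an assumption about how this could be done, not the authors' procedure; the curb-edge coordinate, the linear interpolation between samples, and all names are hypothetical.

```python
import numpy as np


def moment_of_leaving_curb(t, forward_pos, curb_edge):
    """Time at which forward position first exceeds the curb edge, or None.

    t           : sample times in seconds
    forward_pos : pelvis position along the crossing axis in meters
    curb_edge   : position of the curb edge on the same axis (hypothetical value)
    """
    t = np.asarray(t, dtype=float)
    forward_pos = np.asarray(forward_pos, dtype=float)
    beyond = np.flatnonzero(forward_pos > curb_edge)
    if beyond.size == 0:
        return None                      # participant never left the curb
    i = beyond[0]
    if i == 0:
        return float(t[0])               # already beyond the curb at the start
    # Linear interpolation between the last sample before and first sample after
    x0, x1 = forward_pos[i - 1], forward_pos[i]
    frac = (curb_edge - x0) / (x1 - x0)
    return float(t[i - 1] + frac * (t[i] - t[i - 1]))


print(moment_of_leaving_curb([0, 1, 2, 3], [0.0, 0.0, 1.0, 2.0], curb_edge=0.5))
```

Returning None for trials without a crossing keeps nonyielding trials, in which participants stayed on the curb, distinguishable from trials with an early crossing.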
We derived the participants' thorax angles from the T8 sensor of the Xsens as the third dependent variable. The thorax angle is expressed relative to the axis towards the zebra crossing. We expected participants to rotate their upper body earlier (i.e., to initiate forward motion) when confronted with a situation that was safer.
After each trial, the experimenter inquired about the participant's wellbeing through a Misery Scale (MISC) rating. Participants reported their MISC ratings by naming an integer value between 1 and 12. The value 1 reflected that participants experienced no problems, 2 slight discomfort, 3 and 4 slight and mild nausea, and 5 and higher indicated more severe symptoms of sickness.
Furthermore, participants responded to the inquiry about whether they experienced a feeling of fear when considering crossing the road by stating an integer value between 1 and 10. The value 1 represented "strongly disagree" and 10 "strongly agree".
Lastly, participants responded to the inquiry whether it was difficult to predict the behavior of the oncoming vehicles by reporting an integer between 1 and 10. The value 1 represented "strongly disagree" and 10 "strongly agree".

Data Reduction
The motion data of the participants were recorded both in MVN Analyze and Unity. The MVN motion data were recorded at a frequency of 240 Hz, while the Unity recording rate varied with the rendering speed during the trial and was, on average, above 40 Hz for each participant. The MVN data were filtered before they were exported to .mvnx format using the HD processor of MVN Analyze. In order to synchronize the position of the participants in the virtual environment recorded in Unity to their position recorded using MVN Analyze, the Unity motion data were interpolated to a frequency of 240 Hz. Next, the two datasets were cross-correlated to compensate for any time delay. After the cross-correlation, the MVN data were low-pass filtered with a zero-phase 10th order Butterworth filter using a cut-off frequency of 8 Hz. According to Schreven et al. [33], the optimal cut-off frequency for filtering human motion data is about 8 Hz. By means of the two filters (i.e., the one during exporting and the one in Matlab), we ensured that the signal was not contaminated with high-frequency sensor noise, while still capturing rapid limb motions of the participant.
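The three processing steps described above (resampling, delay estimation, low-pass filtering) can be sketched in Python with NumPy and SciPy. This is an illustration rather than the authors' Matlab code: the function names, the use of linear interpolation, and the choice to realize the zero-phase 10th-order filter as a 5th-order Butterworth applied forward and backward are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 240.0  # MVN sampling rate in Hz (from the text)


def resample_to_uniform(t, x, fs=FS):
    """Linearly interpolate an irregularly sampled signal onto a uniform grid."""
    t_uniform = np.arange(t[0], t[-1], 1.0 / fs)
    return t_uniform, np.interp(t_uniform, t, x)


def estimate_lag(ref, other):
    """Lag in samples by which `other` trails `ref` (peak of the cross-correlation)."""
    ref = ref - ref.mean()
    other = other - other.mean()
    xcorr = np.correlate(other, ref, mode="full")
    # Index len(ref) - 1 of the full cross-correlation corresponds to zero lag
    return int(np.argmax(xcorr)) - (len(ref) - 1)


def lowpass_zero_phase(x, fs=FS, cutoff=8.0, order=5):
    """Forward-backward Butterworth low-pass filter.

    Assumption: a 5th-order design run in both directions, which gives zero
    phase shift and an effective 10th-order magnitude response.
    """
    sos = butter(order, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, x)
```

In use, the Unity position trace would first be passed through `resample_to_uniform`, then `estimate_lag` would give the sample shift relative to the MVN trace, and finally `lowpass_zero_phase` would be applied to the MVN data. Second-order sections are used here because they are numerically more robust than transfer-function coefficients for higher filter orders.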
We tested the effect of the presence of eHMIs on the participants' forward gait velocity, MLC, and their thorax angle. We performed paired-sample t-tests to compare the forward gait velocities between the eHMI conditions at every time sample, as well as the thorax angles at every time sample. This approach was inspired by Manhattan plots in molecular genetics research [34]. In a Manhattan plot, the x-axis shows the location on a chromosome and the y-axis depicts the common logarithm of the p-value. For the statistical tests depicted in the Manhattan plot, we used a significance level of 0.005 [35].
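A per-sample paired t-test of this kind can be sketched as follows, assuming the velocity curves of two conditions are stored as arrays with one row per participant and one column per time sample. The array layout, the function names, and the summary as the longest uninterrupted significant period are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ttest_rel

ALPHA = 0.005  # significance level from the text
FS = 240.0     # sampling rate in Hz from the text


def per_sample_paired_ttest(cond_a, cond_b):
    """Paired t-test at every time sample.

    cond_a, cond_b : arrays of shape (n_participants, n_samples), rows paired.
    Returns one p-value per time sample.
    """
    return ttest_rel(cond_a, cond_b, axis=0).pvalue


def longest_significant_period(p_values, alpha=ALPHA, fs=FS):
    """Duration in seconds of the longest run of consecutive samples with p < alpha."""
    longest = current = 0
    for significant in p_values < alpha:
        current = current + 1 if significant else 0
        longest = max(longest, current)
    return longest / fs
```

The common logarithm of the returned p-values can then be plotted against time to obtain the Manhattan-style panel, and `longest_significant_period` gives a duration comparable to the significant periods reported in the results.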
Despite the stringent alpha value used in the Manhattan plots, the results from t-tests per time sample should be interpreted with some caution due to a risk of false positives. Therefore, we also performed paired t-tests for a single key performance indicator: the moment participants left the curb (MLC). Here, we applied a Bonferroni-corrected significance level of 0.05/3 = 0.017.
Additionally, we tested the effect of the eHMIs on the subjective responses of the participants per condition through paired-sample t-tests and investigated whether learning behavior occurred through linear tests of within-subject contrast. Lastly, we compared the subjective responses of our participants to the VRPQ to the responses of the participants in De Clercq et al. [18] through two-sample t-tests to investigate potential differences in subjective presence between their experimental methodology and ours.
Because the results of t-tests may be affected by outliers, we repeated the analysis using non-parametric signed-rank tests. The results for these tests, which did not alter our conclusions compared to the t-tests, can be found in the Supplementary Materials.

Data Quality Assessment
The condition "40 meters, yielding, No eHMI" was presented to each participant as the condition "30 meters, yielding, No eHMI" due to an error in our script. Consequently, no comparison could be made for conditions "40 meters, yielding, Text eHMI" and "40 meters, yielding, FBL". The data of condition "40 meters, yielding, No eHMI" were, therefore, removed from the analysis. Furthermore, in 23 of the total of 216 trials in which the cars did not yield it was not possible to correlate the data recorded in MVN Analyze to the data recorded in Unity because the participants hardly moved during those trials.
Per yielding condition, and for each inter-vehicle distance and eHMI type, we examined whether participants left the curb too early (i.e., before the second car had passed). All participants left the curb during yielding conditions. However, if participants left the curb too early for a particular combination of inter-vehicle distance and eHMI type, their data were excluded for every eHMI of that yielding condition and inter-vehicle distance. For example, when participants crossed too early during the "20 meters, yielding, No eHMI" condition, their data for "20 meters, yielding, Text" and "20 meters, yielding, FBL" were also excluded. The number of participants included in the analysis therefore differed, as indicated in the figure legends.
Figure 3 shows the mean forward gait velocities of participants during the conditions "20 meters, yielding" and "20 meters, nonyielding". It can be seen that participants initiated forward motion before the second car had passed and the third car started braking. If the car yielded, the presence of an eHMI on the third vehicle stimulated pedestrians to start crossing as soon as the second car was passing, whereas in the condition where no eHMI was present they waited longer. A significant difference was observed for a duration of 1.03 seconds between the Text eHMI and No eHMI. Furthermore, a significant difference was found for 0.45 seconds between FBL and No eHMI. For the nonyielding conditions, no significant differences were found between the forward velocities in the presence or absence of eHMIs. In case the cars did not yield, participants' forward velocities increased, similar to the yielding condition before the second car had passed in front of them. However, once the second car passed, the average forward velocities decreased to about zero, that is, participants halted their forward motion.

nonyielding". Bottom: p-values from paired-sample t-tests. None = No eHMI, Text = Text eHMI, FBL = Front Brake Lights. t = 0 is when the third vehicle in the platoon started braking in the yielding conditions or was at the same point of braking in the nonyielding conditions.

Thirty Meters Condition
Similar to the 20 meters condition, participants started walking before the second car had passed and the third car started braking, as shown in Figure 4. However, once the second car had passed and the third car advanced further, participants' mean forward velocity in the No eHMI and FBL yielding conditions did not increase further, whereas in the Text yielding condition their forward speed did increase. For the nonyielding conditions, participants' velocity decreased once the second car had passed, which again reflects participants' inability to cross. For the yielding condition, significant differences were observed for a 0.30-second period between the Text eHMI and FBL. For the nonyielding conditions, no significant differences were observed in forward velocity.
A depiction of separate forward velocity curves for each participant for both the 20 meters and 30 meters yielding condition can be found in Figure S5 and Figure S6.
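The durations of significance reported in this section (e.g., 1.03 s and 0.30 s) stem from paired-sample t-tests evaluated along the time axis. The sketch below shows one way such a duration could be computed. It is an illustration under stated assumptions, not the authors' analysis script: the sampling interval `dt`, the critical value `t_crit`, and the choice to report the longest contiguous significant run are all assumptions, and the velocity values are invented.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired-sample t statistic for two equal-length sequences."""
    d = [x - y for x, y in zip(a, b)]
    sd = stdev(d)
    if sd == 0:
        # Degenerate case: identical differences across participants.
        return 0.0 if mean(d) == 0 else math.copysign(math.inf, mean(d))
    return mean(d) / (sd / math.sqrt(len(d)))

def significant_duration(cond_a, cond_b, dt, t_crit):
    """Longest contiguous span (seconds) in which |t| exceeds t_crit.

    cond_a, cond_b: one entry per time sample, each entry holding the
    per-participant forward velocities for that sample."""
    longest = run = 0
    for a, b in zip(cond_a, cond_b):
        if abs(paired_t(a, b)) > t_crit:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest * dt

# Invented velocities (m/s) for 4 participants at 4 time samples.
text_ehmi = [[0.1, 0.2, 0.1, 0.2], [0.5, 0.6, 0.55, 0.65],
             [0.9, 1.0, 0.95, 1.05], [1.2, 1.2, 1.3, 1.3]]
no_ehmi = [[0.1, 0.1, 0.2, 0.2], [0.1, 0.1, 0.2, 0.2],
           [0.2, 0.2, 0.3, 0.3], [1.2, 1.3, 1.2, 1.3]]

# 3.182 is the two-sided 5% critical value of t for 3 degrees of freedom.
duration = significant_duration(text_ehmi, no_ehmi, dt=0.1, t_crit=3.182)
print(duration)
```

With 24 participants and no exclusions, the degrees of freedom would be 23, so the critical value would have to be adjusted accordingly.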

Moment Leaving Curb (MLC)
In the condition "20 meters, yielding", significant differences in MLC were observed between when no eHMI was present and when either a Text eHMI or FBL was present. For the condition "30 meters, yielding", no significant differences were found between the three eHMI conditions, as can be seen in Table 2.

Thorax Angle
The mean thorax angles relative to the x-axis (i.e., the crossing direction of the zebra) are depicted in Figure 5 ("20 meters, yielding" and "20 meters, nonyielding" conditions) and Figure 6 ("30 meters, yielding" and "30 meters, nonyielding" conditions). Initially, participants, on average, had their thorax rotated towards the approaching cars on the left. Participants started to rotate their upper body toward the zebra crossing before the second car had passed when an eHMI was present on the third car. In the nonyielding conditions, participants also slightly rotated their upper body before the second car had passed, but refrained from further rotation. No significant differences between the eHMI conditions were observed.
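As an illustration of how a thorax angle relative to the x-axis can be obtained, the sketch below converts a horizontal forward-pointing vector of the thorax into an angle. The vector representation and the sign convention (positive angles towards the side from which the cars approach) are assumptions made here; the source does not specify how the motion-capture software exports segment orientation.

```python
import math

def thorax_angle_deg(forward_x, forward_y):
    """Angle in degrees between the thorax forward vector (horizontal plane)
    and the x-axis, i.e., the crossing direction. 0 means facing straight
    across the road; positive values mean rotated towards +y (assumed here
    to be the side of the approaching cars)."""
    return math.degrees(math.atan2(forward_y, forward_x))

def mean_angle_deg(angles):
    """Arithmetic mean across participants; adequate as long as the angles
    stay well away from the +/-180 degree wrap-around."""
    return sum(angles) / len(angles)

facing_crossing = thorax_angle_deg(1.0, 0.0)   # looking along the zebra
facing_cars = thorax_angle_deg(0.0, 1.0)       # rotated fully sideways
group_mean = mean_angle_deg([facing_crossing, facing_cars])
print(facing_crossing, facing_cars, group_mean)
```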

Self-Reported Predictability of Car Behavior
Participants often stated that they experienced no fear at all, either because they did not cross in the nonyielding conditions or because the car stopped in the yielding conditions. Accordingly, no meaningful comparisons between the eHMI conditions could be made for the fear responses. Table 3 shows the means and standard deviations of the participants' difficulty to predict the behavior of the oncoming cars. Participants found it more difficult to predict the car behavior for the No eHMI condition compared to when a Text eHMI was present. Significant differences between Text and No eHMI were observed for the 20 and 30 meters yielding conditions and the 30 meters nonyielding condition. Significant differences between FBL and Text were found for the nonyielding conditions only.
Table 3. Descriptive statistics and results from paired-sample t-tests of the comparison between the subjective responses of participants' difficulty to predict the behavior of oncoming vehicles when an eHMI was either present or absent.

Figure 7 shows the participants' mean responses to the three statements over the course of the experiment. A downward trend is visible for the participants' difficulty to predict the behavior of oncoming cars as well as their feeling of fear when considering crossing the road. A slight upward trend is visible for participants' self-reported MISC rating. A linear test of within-subject contrasts showed the following: participants' difficulty to predict the behavior of the oncoming vehicle: F(1,23) …
Lastly, we compared our VRPQ responses to the results from [18], to investigate whether the implementation of a motion suit and virtual avatar enhanced participants' virtual immersion. Table 4 shows the mean (SD) response rates of the participants of this study and that of De Clercq et al. [18] to the factors of the VRPQ. No significant differences were found between responses in our study and that of De Clercq et al. [18] for any of the four presence factors. A detailed overview of the responses of our participants to each question of the VRPQ can be found in the Supplementary Materials (Table S3). Here, item-specific effects can be observed: compared to [18], our study scored relatively high regarding 'able to examine objects', 'sense of moving around', and 'lack of interference of control devices' but relatively low on 'identifying sounds', 'visual display quality', 'experienced delay', and 'time to adjust to the environment'.
Table 4. Subjective evaluation of our study compared to De Clercq et al. [18]. Results from two-sample t-tests, including descriptive statistics.

Discussion
Previous literature has shown that participants consider a text-based eHMI to be less ambiguous than front brake lights (e.g., [6,9,18]). In these prior surveys and virtual reality studies, participants were not able to cross the road. Herein, we investigated the effect of eHMIs on participants' crossing behavior by visually and audibly immersing participants in a virtual environment where a schematic representation of the participant's body was present.
We hypothesized that the Text eHMI would be more persuasive than front brake lights (FBLs), and FBLs more persuasive than baseline with No eHMI. Our survey results in Table 3 are consistent with the existing literature, with No eHMI yielding the highest difficulty ratings followed by FBL and then Text. After a Bonferroni correction, the differences between eHMI conditions were significant in 5 out of 12 cases. The objective results, operationalized as forward velocities of the pedestrians, are consistent with these self-reports, with the Text eHMI yielding the highest forward velocities followed by FBL and No eHMI. Of course, these findings only apply to first-time exposures to the eHMIs. It is likely that after training or repeated exposure, participants will get to know the meaning of the front brake lights (see also [18]). The meaning of a front brake light may be confusing as participants may not know whether they should apply an egocentric or allocentric perspective [9]. An additional explanation for the efficacy of the Text eHMI is that it was larger and more salient than the FBL. A similar effect of stimulus size can be found in Ackermann et al. [8], where participants rated large street projections in front of the vehicle as more recognizable than a relatively small text display on the grill.
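For readers unfamiliar with the Bonferroni correction mentioned above, the following sketch shows the principle: with m comparisons, each p-value is compared against alpha / m instead of alpha. The family-wise alpha of 0.05 is an assumption (the significance level is not restated in this section), and the p-values in the example are invented placeholders, not the values behind Table 3.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction.

    Each p-value is compared against alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Invented p-values for 12 comparisons; threshold becomes 0.05 / 12 (about 0.0042).
example_p = [0.001, 0.004, 0.005, 0.03] + [0.5] * 8
flags = bonferroni_significant(example_p)
print(sum(flags), "of", len(flags), "comparisons remain significant")
```

In this toy example, p = 0.005 and p = 0.03 would count as significant at an uncorrected 0.05 level but not after correction, which illustrates why only part of the comparisons survive.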
Significant differences in forward velocity between Text/FBL and No eHMI were found for yielding vehicles in the 20-meter gap condition. The higher average forward velocity for the eHMI conditions can be largely explained by the fact that participants started crossing sooner when an eHMI was present (see Table 2). This effect can also be seen from the graphs depicting the forward velocities for each participant (see Figures S5 and S6 in the Supplementary Materials). The effects were strongest for the 20-meter inter-vehicle distance, presumably because crossing through a 20-meter gap was too dangerous without an indication from an eHMI that the vehicle would stop. Crossing through a 30-meter gap, on the other hand, was feasible without an eHMI. Thus, the effect of an eHMI was relatively small in the 30-meter condition because some participants started to cross directly after the second car had passed. These findings indicate that a motion suit allows for extracting patterns that are not evident from self-reports.
In nonyielding conditions, no differences in forward velocity between eHMIs are to be expected because participants were unable to cross the road. Participants showed a slight forward motion in the case of nonyielding vehicles, which may point to hesitative behavior. We found clear effects on thorax motion as a function of elapsed time, with participants rotating their upper body towards the target cars, and straight ahead if they were crossing. However, we did not observe significant differences in thorax angle between the eHMI conditions. This lack of significant effect could be explained by the fact that thorax angle is subject to more inter- and intra-individual variability as compared to forward velocity.
An innovation of our setup was that participants could not only walk through the virtual environment but could also see a dynamic representation of their body while having their movements recorded. Although a similar principle has been presented by Doric et al. [36] and Feldstein et al. [25,27], no motion data were presented in those studies to investigate the effects of eHMIs on pedestrian crossing behavior. The implementation of an avatar in our study was expected to yield a compelling sense of presence. Petkova and Ehrsson [37], for example, found that some participants experienced a full-body ownership illusion in virtual reality. However, we did not find significant improvements in participants' subjective experience by utilizing a motion-tracking suit and implementing a virtual representation of the participants' bodies compared to a prior study without a motion suit by De Clercq et al. [18]. Although this comparison should be interpreted with caution, as [18] was conducted a year before the present study at a different university and used a fundamentally different measurement of willingness to cross, it does suggest that providing a person with a virtual body does not strongly enhance experienced fidelity. Although participants in our study reported high scores for being in control and able to move around, relatively low scores were obtained for quickly adjusting to the environment, visual display quality, and being able to identify sounds. Thus, it seems that the motion suit offers benefits for presence but at the same time may cause some usability issues. The lack of overall improvement in presence could be explained by the fact that the avatar was a robot-looking, genderless avatar. Another factor of importance could be tactile feedback. Petkova and Ehrsson [37] found that the full-body illusion was evoked only when participants were stimulated by synchronous tactile-visual feedback.
It is possible that, if stepping down the sidewalk onto the road could be felt by participants, this would yield a more compelling sense of presence compared to the current setup in which participants walked on the flat lab floor. Lee et al. [38] presented a cave-like simulator, argued to be the largest 4K-resolution pedestrian simulator in the world, to be used for eHMI research. Such a solution is also expected to give a high sense of presence, as participants can walk in a virtual world while being able to see their body without the need for a head-mounted display (although glasses for stereoscopic vision can be used as an option).
In conclusion, we confirmed that eHMIs influence pedestrians' actual crossing behavior compared to a baseline condition. The usage of a motion suit allows researchers to investigate subtle interaction patterns such as body angles and hesitative behaviors. Nuanced conclusions can be derived from such recordings compared to the discrete or binary information from survey studies (e.g., [6,9]) and virtual reality studies (e.g., [18,19]).
The present results allow for critical thinking about the value of high-fidelity setups for evaluating eHMIs. If a researcher's goal is merely to evaluate which type of eHMI is clearest, then an immersive virtual reality setup such as the present one may not be needed. Eisma et al. [39] found that asking people on a scale from 0 to 10 whether the eHMI is clear yielded results that correlated nearly perfectly (r = 0.99) with objective results measured using a response key. In turn, the present study found that pedestrians' forward velocity gave results that are similar to previous research using a response button [18]. However, if one's goal is to evaluate how people cross, then a high-fidelity setup such as the present one can be valuable.
For future research, we see merit in utilizing a motion-tracking suit in more complex traffic scenarios involving pedestrian-vehicle interaction. For example, it would be worthwhile to test the effectiveness of eHMIs in situations where participants have to distribute their attention, such as situations that involve bidirectional traffic flows, other pedestrians, and a mix of autonomous and conventionally driven vehicles. Additionally, the 3D recording of pedestrians in crossing situations could be beneficial to perception and modeling research (e.g., [40,41]), where the goal is to have self-driving vehicles detect the posture of pedestrians and infer their crossing intention.