Efficient Paradigm to Measure Street-Crossing Onset Time of Pedestrians in Video-Based Interactions with Vehicles

Abstract: With self-driving vehicles (SDVs), pedestrians can no longer rely on a human driver. Previous research suggests that pedestrians may benefit from an external Human–Machine Interface (eHMI) displaying information to surrounding traffic participants. This paper introduces a natural methodology to compare eHMI concepts from a pedestrian's viewpoint. To measure eHMI effects on traffic flow, previous video-based studies instructed participants to indicate their crossing decision with interfering data collection devices, such as pressing a button or moving a slider. We developed a quantifiable concept that allows participants to naturally step off a sidewalk to cross the street. Hidden force-sensitive resistor sensors recorded their crossing onset time (COT) in response to real-life videos of approaching vehicles in an immersive crosswalk simulation environment. We validated our method with an initial study of N = 34 pedestrians by showing (1) that it is able to detect significant eHMI effects on COT as well as on subjective measures of perceived safety and user experience. The approach is further validated by (2) replicating the findings of a test track study and (3) participants' reports that it felt natural to take a step forward to indicate their street crossing decision. We discuss the benefits and limitations of our method with regard to related approaches.


Introduction
Highly (SAE Level 4) and fully (SAE Level 5) automated vehicles no longer require a driver [1]. With self-driving vehicles (SDVs) and human road users sharing the road, a "mixed traffic" transition period will emerge, requiring pedestrians to interact with both SDVs and conventional vehicles (CVs) [2]. The related complexity could negatively affect pedestrian safety [2]. In today's traffic, pedestrians rely on an elaborate set of cues and communication strategies to decide whether it is safe to cross when a CV approaches, including vehicle speed [3–5], distance of the vehicle [6], and eye contact with the driver [5,7]. While pedestrians can rely on traffic lights at signalized crossings, right of way can be ambiguous at unsignalized crossings, where human drivers frequently fail to yield to pedestrians. As a consequence, pedestrians are more risk-averse and seek more eye contact with the driver at unsignalized crossings [8,9]. As a substitute for communicating with a human driver, equipping SDVs with an external Human-Machine Interface (eHMI) has been proposed to provide information to surrounding traffic participants [10]. An eHMI may be particularly important to reduce pedestrians' uncertainty at ambiguous crossings [11]. Preceding studies showed that pedestrians feel uncomfortable when encountering a driverless vehicle [12–14]. Limiting the scope to pedestrians' crossing decisions, previous research shows that the presence of an eHMI has positive effects on perceived safety [12,13,15–18], calmness [18], trust [12], comfort [19,20], user experience [12], and crossing decisions [13,17,21–23]. It can be argued that the necessity of an eHMI is demonstrated, but the type of information and the means of conveying this information need to be further examined to reach the goal of a standardized eHMI.
While subjective measures such as pedestrians' perceived safety can be assessed with a questionnaire after each trial (e.g., [12,13,24]), the assessment of eHMIs' effect on traffic flow poses a challenge. Traffic flow is an objective measure that can be quantified. The sooner a pedestrian initiates street crossing, the less time they have to wait on the curb and the less time an approaching vehicle has to remain stopped, resulting in faster traffic for both the pedestrian and the approaching vehicle. In addition to the improved time efficiency associated with a smooth flow of traffic, there are also environmental benefits such as decreased emissions and fuel consumption [25].
In the following, we will give an overview of the preceding applied methods to measure pedestrians' street crossing decision. Then, we will explain the motivation for our method.

Previously Applied Research Methods to Capture Pedestrian Crossing Decisions
In the following, we provide an overview of methodologies applied in preceding studies to capture pedestrians' street crossing decisions, discussing their benefits and limitations.
One approach is to capture the decision-making process of street crossing as a function of the distance between the pedestrian and the approaching vehicle (e.g., [26–28]). For example, in a field study by Walker et al. [26], participants were instructed to continuously express how safe they felt to cross the road, from 0 ("not at all willing to cross") to 100 ("totally willing to cross"), on a hand-held input device while a vehicle approached. While this approach is promising for a better understanding of the factors underlying a street crossing decision, we believe that it is not suitable to capture traffic flow. Pedestrians' street crossing is an actual behavior with a binary character: a pedestrian is either waiting on the curb or crossing the street. Thus, traffic flow cannot be measured on a continuous willingness scale.
A further approach is to measure the binary crossing decision (yes/no), i.e., whether pedestrians would be willing to cross the street in front of an approaching vehicle (e.g., [15,24,29,30]). For example, Song et al. [15] conducted an online survey in which pedestrians watched videos of a vehicle approaching from an ego-perspective and had to decide after each trial whether they want to cross (pressing the space key) or let the vehicle pass (not pressing the space key). We argue that this approach does not give any indication regarding traffic flow, since it fails to produce a relationship with the point of time participants would initiate street crossing.
Another approach is to compute crossing onset time (COT) by capturing the time at which a pedestrian decides to cross in relation to the vehicle's action. We argue that this is the only approach that can draw conclusions about eHMI effects on traffic flow. Regarding COT, the preceding methods can be divided into unnatural approaches, which require participants to indicate their decision to cross explicitly by pressing a button [11,17,22,23] or raising their hand [31], and natural approaches, which allow participants to indicate their decision to cross with the actual behavior of taking a step forward [12,21,32,33]. We believe that methods requiring participants to imagine how they would act or feel make their decision explicit, which might limit their validity. We argue that, in terms of ecological validity, the natural behavior of stepping forward constitutes the best approach to measure COT. For example, in a test track study by Faas et al. [12], pedestrians watched an approaching vehicle coming to a stop at an intersection and had to cross the street as soon as they felt safe to do so. The vehicle encounters were video recorded for later analysis to estimate the time gap between the vehicle coming to a stop and the pedestrians' COT. Street crossing can be seen as an unreflective skillful action [34]. When crossing a street, pedestrians often act adequately, yet without deliberation. Street-crossing decisions are not guided by explicit reasoning, but constitute a form of embodied intelligence or cognition. Bodily processes or so-called "gut feelings" might be of enormous importance for street crossing decision making [35]. It can be argued that pedestrians make their decision to cross unconsciously as soon as they feel that it is safe to cross, which is usually as soon as they are sure that the vehicle intends to yield to them. 
Their embodied nature makes individuals' street crossing decisions sensitive to aspects of the situation [34], such as the presence of a visible driver or an eHMI. However, to date, only a few test track studies [12,33] and VR studies [21,32] have assessed COT by allowing pedestrians to take a step forward.

Proposed Concept to Capture Street Crossing Onset Time (COT)
In this paper, we propose a parsimonious, safe, and reproducible paradigm for video-based lab studies that can capture COT in a natural way to test the efficacy of eHMI concepts for SDV and pedestrian interaction. We present a method in which participants indicate their COT by actually stepping off a "sidewalk" onto a "crosswalk". We conducted the experiment in a lab environment where participants were immersed using two large TV screens for a panoramic street view. With adhesive tape, we sketched a sidewalk and a crosswalk onto the floor. Under the sidewalk, we hid two force-sensitive sensors to capture COT. When the participant stepped onto the sidewalk, the videos were triggered and the COT timer was started. The COT was recorded when the participant stepped off the sidewalk to enter the crosswalk, with force-sensitive resistor sensors making data analysis time-efficient.
For the experiment, we contrasted three eHMI variants (no eHMI, status eHMI, status+intent eHMI) to address the research question of which information an eHMI should communicate. We used two light-based eHMI concepts adapted from Faas et al. [12]. The status eHMI is a steady blue-green light indicating the automated driving mode, as recommended by the SAE [36]. For the status+intent eHMI, an additional slowly flashing blue-green light (adapted from [37]) indicated the SDV's intent to yield as soon as the vehicle was braking, thus resembling the frontal brake light concept of previous eHMI studies [13,18,24,38]. We compared the encounters with a driverless SDV to encounters with a CV steered by a driver. We collected data at three measurement points to study the stability of eHMI effects. The results of the study are published in Faas et al. [39]. The study showed that pedestrians benefit from an eHMI communicating SDVs' status, and that additionally communicating SDVs' intent adds further value. These eHMI effects persist (acceptance, user experience) or even increase (COT, perceived safety, trust, learnability, reliance) over time.
The present paper focuses on the description and validation of the applied research method. For the present paper, we specifically re-evaluated the data of the first measuring time of the longitudinal study of Faas et al. [39], since we argue that our method is able to compare the efficacy of eHMI variants with one measuring time only. Furthermore, the present paper includes additional procedures that were not reported in Faas et al. [39] to validate the applied research method. To this end, we compared participants' responses in the lab study of Faas et al. [39] with participants' responses in the test track study of Faas et al. [12] to investigate potential differences attributed to the applied experimental methodology. Additionally, we analyzed participants' self-reported naturalism in the study setup. In this paper, we provide a detailed description of our method to allow others to adopt it. We validate our method by showing that it is able to detect significant eHMI effects on COT (and thus traffic flow) and subjective measures of perceived safety and user experience. Our approach is further validated by replicating findings of a test track study. Finally, participants reported that it felt natural to take a step forward to indicate that they would cross the street. We conclude that our paradigm allows relative comparisons of eHMI variants.

Participants
Thirty-four pedestrians (19 male, 15 female) in the age range of 22 to 69 years (M = 41.5, SD = 15.8 years) took part in the study. A third-party agency recruited the participants. For screening, potential participants specified which modes of transportation they use during a typical work week by distributing 100% among driving, public transit, biking, walking, and other. Those who allocated at least 20% to walking received an invitation to participate in the study. All participants were living in the San Francisco Bay Area, CA, USA. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the RD Ethical Clearing Committee of Daimler AG. Figure 1 gives an overview of the study procedure, including the independent variable, that is, vehicle type.
Figure 1. The driverless self-driving vehicle (SDV) (automated mode) is equipped with no eHMI, status eHMI or status+intent eHMI (test conditions 1, 2, 3). Both the SDV steered by a driver (conventional mode) and the conventional vehicle (CV) are either yielding (test conditions 4, 5) or non-yielding (filler test conditions 4b, 5b). In a randomized order, each participant experienced all seven test conditions once for habituation (wave 1) and once for data collection (wave 2). The bottom row represents the dependent variables (DVs) assessed in wave 2. The crossing onset time (COT) data were recorded by an Arduino Uno through the logs of two force-sensitive resistor sensors. For the subjective measures, participants filled in questionnaires after each trial. While perceived safety was measured for all yielding vehicle trials, we applied the user experience scales only after trials with a driverless SDV.

Independent Variable
Three eHMI test conditions were contrasted without a driver, i.e., self-driving (Figure 2):
1. Driverless SDV without eHMI: there is no indication whether the vehicle is in automated mode, i.e., self-driving, or conventional mode, i.e., steered by a driver;
2. Driverless SDV with status eHMI: steady blue-green lights on each fake Lidar sensor indicate that the vehicle is in automated mode. The design follows the recommended practice of the SAE [36];
3. Driverless SDV with status+intent eHMI: in addition to the "status" message, the "intent" signal was turned on when the approaching car started to brake, thus resembling the frontal brake light concept of previous eHMI studies [13,18,24,38]. To communicate the SDV's intent to yield, a light above the windshield flashed at a frequency of 0.5 Hz, with a sinusoidal cycle from 30% to 100% light intensity. The design follows the recommendation of Faas et al. [37]. The video of the status+intent eHMI test condition is available through the link in the Supplementary Materials.
The driverless SDV was shown to always yield to pedestrians. We chose a driverless setup to resemble a future automated vehicle on its way to pick up a passenger. Furthermore, to realize a mixed traffic environment, we incorporated encounters with vehicles steered by a visible driver. The human-driven vehicles (Figure 3) were either yielding (test conditions 4, 5) or non-yielding (filler test conditions 4b, 5b):
4. SDV steered by a driver: yielding;
4b. SDV steered by a driver: non-yielding (filler test condition);
5. CV steered by a driver: yielding;
5b. CV steered by a driver: non-yielding (filler test condition).
This study was designed to examine participants' responses when the car was yielding. Thus, the responses to the non-yielding vehicles were not analyzed (test conditions 4b, 5b). The non-yielding vehicles were included to ensure that participants would not habituate to all cars stopping for them, which might lower their attention and COT. We deliberately chose not to include any non-yielding driverless SDV encounters. While it can be argued that human drivers differ in their driving style, vehicle automation is programmed to adhere to traffic laws, thus always yielding at a pedestrian crossing.
Figure 3. To provide a mixed-traffic environment, a visible driver steered either the self-driving vehicle (SDV; top row) or a conventional vehicle (CV; bottom row). The vehicles were either (a) yielding to let the pedestrian cross first (test conditions 4, 5); or (b) non-yielding (test conditions 4b, 5b), so that the pedestrian had to wait for the vehicle to go first and could then safely cross the empty street.

Materials and Equipment
The experiment took place at the lab facilities of Mercedes-Benz Research and Development North America in Sunnyvale, CA, USA. We immersed participants with two large TV screens (25.5 inches (width) by 44 inches (length)) displaying the real-life video clips. The TV screens were set up at an angle of 60 degrees to create a panoramic view. With adhesive tape and a mat, we sketched a "sidewalk" and a "crosswalk" onto the floor (Figure 4). Under the "sidewalk", we fixed two force-sensitive resistor sensors with dimensions of 44.45 × 44.45 mm (1.75 × 1.75 in). On the "sidewalk", we sketched two footprints at the same position as the force-sensitive resistor sensors. The analog-to-digital converter of an Arduino Uno was used to read the variable resistance of the force-sensitive resistor sensors. A 1 kΩ resistor was used to create a voltage divider. The software Arduino IDE (version 1.8.9) was used to code the data. A timer was added to display the elapsed time. When participants stepped onto the footprints (thereby applying force to both sensors), the COT timer started and the video clips were triggered, starting with a three-second countdown. To provoke natural behavior, the participants' task was to cross the street when they felt safe to do so by entering the "crosswalk". When participants stepped off the "sidewalk" (thereby removing force from either sensor), the COT timer stopped. For the real-life videos with the SDV, we created a Wizard-of-Oz setup [40]. On the roof of a silver Mercedes-Benz S-Class (Series W222), we mounted fake Lidar sensors similar to those of SDVs currently test-driving on public roads (e.g., [41,42]) as a reminder of the vehicle's ability to drive in automated mode (see [43]). On the fake sensors, we attached LED light strips to simulate the eHMIs. To create the illusion of a driverless vehicle (test conditions 1, 2, 3), the driver controlling the vehicle wore a seat costume (adapted from [14]). 
For the videos in conventional driving mode (test conditions 4, 4b), the driver steering the vehicle was visible. For the videos with the CV and a visible driver (test conditions 5, 5b), we used three silver sedan models, namely a Chevrolet Impala, a Dodge Charger, and a Kia Optima. The occurrence of these models was randomized. All videos were cropped to a length of 15 s. Five observers who were not associated with the study checked the videos to ensure that they all displayed the same driving behavior.
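The timer logic described above (start when force is on both sensors, stop when force is removed from either sensor) can be sketched offline as follows. This is a minimal Python illustration, not the Arduino code used in the study; the sample format, the function name `detect_trial`, the threshold value, and the assumption that a higher ADC reading means more force are all hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical threshold on the Arduino Uno's 10-bit ADC reading (0-1023).
# The value actually used in the study is not reported.
PRESS_THRESHOLD = 200

# One logged sample: (time in s, left sensor reading, right sensor reading).
Sample = Tuple[float, int, int]


def detect_trial(samples: Iterable[Sample],
                 threshold: int = PRESS_THRESHOLD
                 ) -> Tuple[Optional[float], Optional[float]]:
    """Return (start_time, stop_time) of one trial.

    The timer starts at the first sample with force on BOTH sensors and
    stops at the first later sample with force removed from EITHER sensor.
    Assumes that a higher reading means more force on the sensor.
    """
    start = None
    for t, left, right in samples:
        both_loaded = left >= threshold and right >= threshold
        if start is None:
            if both_loaded:
                start = t
        elif not both_loaded:
            return start, t
    return start, None


# Made-up log of one trial, for illustration only.
log = [
    (0.0, 0, 0),      # nobody on the mat
    (0.5, 650, 10),   # first foot placed
    (1.0, 640, 700),  # both feet on the footprints: timer starts
    (9.5, 630, 690),
    (10.0, 20, 710),  # one foot lifted: timer stops
]
start, stop = detect_trial(log)
print(start, stop, stop - start)  # 1.0 10.0 9.0
```

Stopping on the first unloaded sensor, rather than on both, mirrors the description that removing force from either sensor ends the trial, so the measurement does not depend on which foot leads.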

Real-World Crossing Scenario
For the traffic scenario, we chose an intersection that requires pedestrians to cross an expressway exit lane while a vehicle approaches. The crossing has no traffic lights, but the request "YIELD" is written onto the street. In a preceding workshop, this traffic scenario was identified to be ambiguous for pedestrians. Workshop participants reported that, while the law states designated priority to pedestrians, the norm is that some approaching vehicles do not stop. In ambiguous traffic scenarios, communication strategies with the driver become especially prominent [9].
The video clips were recorded on a sunny day on a public highway. The camera perspective was from the viewpoint of a pedestrian standing on the sidewalk waiting to cross the road (see Figures 2 and 3). Specifically, the approaching vehicle was exiting Central Expressway to enter North Mary Avenue in Sunnyvale, CA, USA. Figure 5 shows the traffic scenario from a bird's-eye view.

Video Flow
The experiment employed seven test conditions with yielding (test conditions 1, 2, 3, 4, 5) and non-yielding (test conditions 4b, 5b) vehicles in a within-subjects test design. Test conditions were randomized according to a Latin Square. Table 1 shows an overview of the video flow. The left TV screen showed the street with the approaching vehicles and the right screen showed the crosswalk. To allow time for participants to focus their attention back to the TV screens, each test condition started with a 3 s countdown on the left screen. Then, the video of the corresponding test condition was triggered. In each video, a vehicle approaches with a constant speed of 25 mph.
For the yielding videos (test conditions 1, 2, 3, 4, 5), the vehicle approached with a constant speed for 3 s, decelerated to come to a stop at the intersection for 8 s, and waited for the pedestrian to cross for 4 s. After participants stepped off the "sidewalk", we provided visual feedback on their crossing decision through a street crossing video from an ego perspective on the right screen. On the left screen, the vehicle was waiting for the pedestrian to cross.
For the non-yielding videos (test conditions 4b, 5b), the vehicle slightly decelerated to make a right turn, but did not yield to the pedestrian. If participants waited for the car to pass before entering the street, a video was triggered on the right screen showing a street crossing from an ego perspective, while on the left screen the road was empty. If participants entered the crosswalk while the vehicle was still approaching, a red screen with the message "not safe to cross!" (left screen) and a video of a passing car (right screen) were triggered. In this case, the test condition was repeated.
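The Latin-square randomization of the seven test conditions mentioned above can be illustrated with a short sketch. The paper does not state which Latin square construction was used, so the cyclic construction below, the function names, and the assignment of participants to rows are assumptions for illustration only.

```python
from typing import List

# Test condition labels as used in the text.
CONDITIONS = ["1", "2", "3", "4", "4b", "5", "5b"]


def cyclic_latin_square(items: List[str]) -> List[List[str]]:
    """Build an n x n Latin square by cyclically shifting the item list.

    Every item appears exactly once in each row (presentation order) and
    exactly once in each column (serial position).
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_participant(participant_index: int,
                          items: List[str] = CONDITIONS) -> List[str]:
    """Assign rows of the square to participants in rotation (assumption)."""
    square = cyclic_latin_square(items)
    return square[participant_index % len(items)]


square = cyclic_latin_square(CONDITIONS)
for row in square:
    print(" ".join(row))
```

A cyclic square balances serial position but not which condition precedes which; a design that also balances first-order carry-over effects would need a different construction.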

Procedure and Participants' Task
Prior to the experiment, participants provided written informed consent and completed a demographic questionnaire. Then, participants were introduced to the definition of high driving automation (SAE Level 4). Participants were told that the SDV they will encounter "has both an automated and a manual driving mode. The vehicle can thus either be self-driving or be controlled manually by a driver." Next, the three eHMI concepts were explained to participants. Subsequently, participants' understanding of the eHMI concepts was tested by asking them "What does the light signal indicate?" Next, the experimenter familiarized participants with the study setup by going through the participants' task. Participants were shown an example scenario with the status+intent eHMI (test condition 3). First, they were asked to imagine that the mat is a "sidewalk". Then, they were told how to start a trial, namely: "The next slide lets you know that at this time, you can step on the sidewalk to begin the scenario. When you step on the sidewalk, please make sure your feet are aligned with the footprints. Once both feet are on these footprints, the scenario will begin." Participants were told that in each scenario a vehicle will be approaching, but not all vehicles are going to yield. The participants' task was "to safely cross the road at an intersection as a pedestrian while different vehicles approach. As soon as you feel safe to cross, please do so. You must cross for all scenarios. To cross, just step off the sidewalk as if you're going to enter the crosswalk." Thus, with each trial, participants indicated their COT by stepping off the "sidewalk" to enter the "crosswalk" (see Table 1). The field of view was panoramic, such that participants had to turn their head to the left to observe the approaching vehicle and step forward to initiate street crossing.
[Table 1: overview of the video flow, with columns "Participants' Task", "Left Screen", "Right Screen"; recoverable entry under "Participants' Task": participant is ready for the next trial and asked to step on the "sidewalk".]
Subsequently, the room's light was dimmed to improve contrast so that the participant could see the contents of the TV screens clearly. Participants encountered two waves of seven trials each, with vehicles that yielded to the pedestrian in five trials (test conditions 1, 2, 3, 4, 5) and non-yielding vehicles in two trials (test conditions 4b, 5b). Participants experienced the first wave for habituation. After habituation, the second wave followed for data acquisition. We assessed participants' COTs and subjective measures for all yielding vehicle trials. The crossing onset data were recorded by an Arduino Uno. After each trial, participants filled in a questionnaire to indicate subjective measures of perceived safety and user experience (see Figure 1).
After all trials, participants were asked to rate the naturalism of our paradigm. We informed participants that the encountered vehicle had not been driving automated at any time. Total testing time was about 30 min per participant.

Dependent Variables
In this paper, we report the following objective measure:
• Crossing Onset Time (COT): After each yielding vehicle trial (test conditions 1, 2, 3, 4, 5), we determined COT. COT indicates the time in seconds between the vehicle starting to yield and the pedestrian stepping off the "sidewalk". Hence, to calculate the COT, we subtracted the time between the pedestrian entering the "sidewalk" and the vehicle starting to yield (3 s countdown + 3 s vehicle approaching at constant speed) from the recorded timer value. We used COT as an index of traffic flow. Shorter times indicate an earlier crossing decision. The earlier pedestrians cross when it is safe to do so, the more efficiently the traffic flows. We excluded extreme cases from data analysis, defined as values more than three times the interquartile range (IQR) above the upper quartile or below the lower quartile (2 values of N = 1 participant excluded).
Furthermore, we report the following subjective measures, all measured on a scale from −3 (very negative) to +3 (very positive):
• Perceived Safety: After each yielding vehicle trial (test conditions 1, 2, 3, 4, 5), participants reported their perceived safety with four items (based on [44]) with semantic differentials answered on a 7-point scale ranging from −3 to +3 ("anxious-relaxed", "agitated-calm", "unsafe-safe", "timid-confident"). Reliability was excellent, with Cronbach's α = 0.90 to 0.96;
• User Experience (UX) Qualities: After each driverless SDV trial (test conditions 1, 2, 3), participants completed the short version of the User Experience Questionnaire (UEQ-S) [45]. The scale consists of two dimensions: pragmatic quality (PQ) and hedonic quality (HQ). Participants reported their user experience with semantic differentials ranging from −3 (negative) to +3 (positive). The reliability of all subscales was good to excellent, with Cronbach's α = 0.80 to 0.94;
• Naturalism: In the post-experiment interview, participants rated the items "How immersive was the study setup?" and "How natural was it to take a step forward to indicate that you would cross the street?" (based on [33]) on a scale from −3 ("not at all") to +3 ("extremely").
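The COT calculation and the exclusion rule for extreme cases described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the example values are made up, and the quartile definition (Python's default "exclusive" method) may differ from the one used by the statistics software in the study.

```python
import statistics
from typing import List

COUNTDOWN_S = 3.0  # countdown shown before the video starts
APPROACH_S = 3.0   # vehicle approaches at constant speed before it yields


def crossing_onset_time(raw_timer_s: float) -> float:
    """COT relative to the moment the vehicle starts to yield.

    raw_timer_s is the time between stepping onto and off the "sidewalk".
    """
    return raw_timer_s - (COUNTDOWN_S + APPROACH_S)


def exclude_extremes(values: List[float], k: float = 3.0) -> List[float]:
    """Drop values more than k * IQR above Q3 or below Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Made-up raw timer values (s) for one test condition.
raw = [8.0, 9.0, 9.5, 10.0, 10.5, 11.0, 17.5]
cots = [crossing_onset_time(r) for r in raw]
print(cots)                    # [2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 11.5]
print(exclude_extremes(cots))  # [2.0, 3.0, 3.5, 4.0, 4.5, 5.0]
```

In this toy data set, Q1 = 3.0 and Q3 = 5.0, so the upper fence lies at 5.0 + 3 × 2.0 = 11.0 s and only the value of 11.5 s is excluded.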

Data Analysis
We used repeated measures ANOVAs to test the effect of vehicle type (test conditions 1, 2, 3, 4, 5) on COT and perceived safety. As an additional analysis, we performed cluster analyses to categorize the participating pedestrians into groups according to their COT obtained for each yielding test condition. To classify pedestrians into groups, we used Ward's method in combination with squared Euclidean distances (see [46,47]). As a hierarchical procedure, Ward's method successively merges cases into clusters such that each merge produces the smallest possible increase in within-cluster variance.
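To make the merging criterion concrete, the following sketch implements a naive version of Ward's method in plain Python and records the increase in the within-cluster sum of squared errors (SSE) at each fusion stage. It is an illustration only, not the software procedure used in the study; the function names and the toy data are assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]  # e.g., one participant's COTs across conditions


def centroid(points: List[Point]) -> List[float]:
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a: List[Point], b: List[Point]) -> float:
    """Increase in within-cluster SSE if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clustering(points: List[Point]) -> List[Tuple[int, float]]:
    """Merge clusters until one remains.

    Returns one (clusters remaining after merge, SSE increase) per stage.
    """
    clusters = [[p] for p in points]
    stages = []
    while len(clusters) > 1:
        # Pick the pair whose merge increases the SSE the least.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        cost = ward_cost(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        stages.append((len(clusters), cost))
    return stages


# Made-up one-dimensional toy data with two obvious groups.
stages = ward_clustering([(0.0,), (1.0,), (10.0,), (11.0,)])
print(stages)  # [(3, 0.5), (2, 0.5), (1, 100.0)]
```

The jump in the SSE increase at the last stage (from 0.5 to 100.0) is the kind of discontinuity that an elbow-type plot of fusion stages makes visible: here it would suggest stopping at two clusters.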

Next, we used repeated measures ANOVAs to test the effects of eHMI type (test conditions 1, 2, 3) on UX qualities (HQ and PQ).
Finally, we compared the subjective responses to the PQ scale of our participants and the participants in the test track study of Faas et al. [12] to investigate potential differences attributed to the applied experimental methodology. For this purpose, we used the data of the no eHMI, status eHMI, and status+intent eHMI test conditions that were assessed with N = 30 participants at an intersection traffic scenario on a test track in Immendingen, Germany. We believe that this comparison is valuable, although the experiments differ regarding participants' nationality (U.S. vs. German) and traffic scenario (exit lane vs. four-way intersection). The study participants of this lab study and the test track study did not differ regarding age, t(57) = −0.37, p = 0.714, or gender, χ²(1) = 0.04, p = 0.838. We chose the PQ scale for the following comparison, since it is the only standardized questionnaire that has been applied in both studies. We used two-sample t-tests to investigate whether pedestrians' subjective PQ ratings of the three eHMI variants (no eHMI, status eHMI, status+intent eHMI) differ between the experimental methodologies (lab study vs. test track study).
For all ANOVAs, the data were checked for sphericity using Mauchly's test, and, where sphericity was violated, Greenhouse-Geisser and Huynh-Feldt corrections were applied (as recommended by [48]). Where needed, we used Bonferroni-corrected post-hoc t-tests.
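As an illustration of the Bonferroni correction mentioned above, the sketch below enumerates the pairwise comparisons among the five yielding test conditions and adjusts each p-value by the number of comparisons. The adjustment by multiplication (capped at 1) is one common way to report Bonferroni-corrected p-values and is an assumption here, as are the function names and the made-up p-values.

```python
from itertools import combinations
from typing import Dict, List, Tuple

CONDITIONS = ["no eHMI", "status eHMI", "status+intent eHMI",
              "SDV with driver", "CV with driver"]

Pair = Tuple[str, str]


def all_pairs(conditions: List[str]) -> List[Pair]:
    """All unordered pairs of conditions for post-hoc comparisons."""
    return list(combinations(conditions, 2))


def bonferroni(p_values: Dict[Pair, float]) -> Dict[Pair, float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


pairs = all_pairs(CONDITIONS)
print(len(pairs))  # 10

# Made-up uncorrected p-values, for illustration only.
raw = {pair: 0.5 for pair in pairs}
raw[("no eHMI", "status+intent eHMI")] = 0.00005
adjusted = bonferroni(raw)
print(adjusted[("status eHMI", "CV with driver")])  # 1.0
```

With five conditions there are ten pairwise comparisons, so any uncorrected p-value of 0.1 or larger is reported as 1.0 after this adjustment.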
To account for pedestrians' individual crossing strategies [12], we performed cluster analyses, classifying pedestrians into groups according to their COT obtained for each yielding test condition. A dendrogram graphically illustrates the formation of clusters at the individual fusion stages (Figure 7a). To determine the number of clusters into which pedestrians can be meaningfully grouped, we computed a structogram (Figure 7b). The structogram graphically illustrates that the fourth cluster contributes significantly less to the variance than the first three clusters. Because of the considerable drop in the Sum of Squared Errors (ΔSSE), it seems reasonable to assume a solution with three clusters. Figure 8 shows the individual COT for each participant sorted by the three clusters derived from the cluster analyses. Visual inspection suggests the following description of the three clusters: The first cluster (N = 7) includes early crossers who cross before the vehicle comes to a stop and are strongly influenced by the test conditions, particularly by the presence of a status+intent eHMI. The second cluster (N = 20) describes intermediate crossers who initiate crossing at about the same time as the vehicle comes to a stop. They are slightly influenced by the test conditions and constitute the biggest cluster. The third cluster (N = 7) includes late crossers who wait for the vehicle to come to a stop before crossing the street. These late crossers are slightly influenced by the test conditions. In summary, pedestrians initiated street crossing soonest with a status+intent eHMI. Compared to a CV or SDV steered by a driver, pedestrians initiated crossing at the same time if the driverless SDV was not equipped with an eHMI and sooner if it was equipped with an eHMI displaying the SDV's status and intent (see also: Faas et al. [39]). 
The significant effect of status+intent eHMI seems to be carried by a cluster of pedestrians, who are likewise characterized by a tendency to cross the street early, also with human-driven vehicles.

Perceived Safety
On perceived safety, the one-way repeated measures ANOVA found a significant effect of vehicle type, F(2.59, 85.56) = 8.65, p < 0.001, ηp² = 0.21. Figure 9 shows the mean scores. [...] and, thus, also safer than without eHMI, p < 0.001, 95% CI [0.69, 2.51], yielding the following pattern: status+intent eHMI > status eHMI > no eHMI. Regarding human-driven vehicles, participants felt equally safe with an SDV steered by a driver (M = 1.06, SD = 1.46) and a CV steered by a driver (M = 1.06, SD = 1.51), p = 1.000. Compared to an SDV steered by a driver or a CV steered by a driver, participants felt less safe encountering a driverless SDV without eHMI, all ps < 0.01. However, if the driverless SDV was equipped with a status eHMI or a status+intent eHMI, participants felt as safe as with a human-driven vehicle, all ps > 0.05.
In summary, pedestrians felt safest with a status+intent eHMI. With any eHMI, pedestrians felt as safe as with human-driven vehicles. However, if the driverless SDV was not equipped with an eHMI, pedestrians felt less safe than with human-driven vehicles (see also: Faas et al. [39]).
Figure 9. Mean perceived safety scores for all yielding test conditions. Error bars: ±1 SE.
Based on Hinderks et al. [49], the UX scores can be interpreted as bad (PQ) and below average (HQ) for no eHMI, below average (PQ) and good (HQ) for the status eHMI and excellent (PQ, HQ) for the status+intent eHMI (see also: Faas et al. [39]).

Comparison of Participants' PQ Ratings in This Lab Study and a Test Track Study
We compared the PQ responses of this lab study to the PQ results of the test track study of Faas et al. [12] to investigate whether the different experimental methodologies lead to different results. We used two-sample t-tests to investigate whether pedestrians' PQ ratings of the three eHMI variants (no eHMI, status eHMI, status+intent eHMI) differ between the experimental methodologies (this lab study vs. test track study of Faas et al. [12]). The assumption of equal variances (Levene's test) was not violated for any t-test. Table 2 and Figure 11 show the results. For no eHMI, participants' PQ ratings were significantly lower in this lab study compared to the test track study of Faas et al. [12], t(62) = −2.10, p = 0.040, r = 0.26. However, both mean scores lead to the same interpretation of a bad user experience according to the benchmarks of Hinderks et al. [49]. Similarly, for the status eHMI there was a trend indicating that participants' PQ ratings were lower in this lab study compared to the test track study of Faas et al. [12], t(62) = −1.71, p = 0.092, r = 0.21. For the status+intent eHMI, we found no significant difference between the studies, p = 0.822.
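For readers who want to check such t-test summaries, the following sketch derives the effect size r and a two-sided p-value from a t statistic and its degrees of freedom. It assumes the common conversion r = sqrt(t² / (t² + df)), with which the reported r values are consistent, and it integrates the t density numerically with Simpson's rule so that only the standard library is needed. All function names are hypothetical.

```python
import math


def effect_size_r(t, df):
    """Effect size r from a t statistic, assuming r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability that |T| <= |t|
    return 1 - central_mass


for t in (-2.10, -1.71):
    print(t, round(effect_size_r(t, 62), 2), round(two_sided_p(t, 62), 3))
```

In practice a statistics package would be used instead of hand-rolled integration; the sketch only makes the relationship between t, df, r, and p explicit.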

Self-Reported Naturalism
After all trials, participants rated the naturalism of the experiment on a scale from −3 ("not at all") to +3 ("extremely"). The mean score for the question "How immersive was the study setup?" was M = 0.62 (SD = 1.37), suggesting fair immersion. The mean score for the question "How natural was it to take a step forward to indicate that you would cross the street?" was M = 1.82 (SD = 1.03), suggesting good validity.

Discussion
This paper presents an innovative method to study SDV-pedestrian interactions in a safe, reproducible, and natural manner for video-based eHMI studies. We developed a cost-efficient concept that allows participants to show natural behavior (i.e., entering a street). Participants make an actual street-crossing decision; that is, they are instructed to take a step off a sketched "sidewalk" to enter a sketched "crosswalk", which allows us to measure COT as a means to assess traffic flow. In the following, we discuss how the eHMI effects, which have been brought to light by our approach, validate its application. Furthermore, we discuss our method with regard to related approaches as well as the limitations and further improvements of our methodology.

Validation
We showed that our method is able to detect statistically significant eHMI effects that are comparable to a real-life study on a test track, and further displays a good level of self-reported naturalism.
The results of the eHMI study, yielding significant and meaningful results, validate the use of our approach. We found that, compared to human-driven vehicles, pedestrians feel less safe encountering a driverless SDV if it has no eHMI. However, pedestrians feel as safe if the driverless SDV is equipped with an eHMI displaying its status and, optionally, its intent. When comparing the eHMI variants, all subjective measures (perceived safety, HQ, PQ) revealed the same pattern: status+intent eHMI > status eHMI > no eHMI. On COT, we found that pedestrians make earlier (thus more efficient) crossing decisions with a status+intent eHMI than with no eHMI. The significant effect of the status+intent eHMI seems to be carried by a cluster of participants, suggesting individual crossing strategies among pedestrians (comparable to different lane changing strategies among drivers, see for example, [50]). Thus, providing pedestrians with information on SDVs' automated status and imminent intent supports a feeling of safety and HQ. Pedestrians perceive an eHMI to be useful information (PQ), supporting them in their decision to cross the road as observed in earlier COTs (for a textual discussion, see Faas et al. [39]).
The approach is further validated by the fact that the study outcomes confirm previous research showing eHMI effects on perceived safety [12,13,[15][16][17][18] and crossing onset [13,17,[21][22][23], suggesting that our method is as suitable as other approaches to detect eHMI effects. This becomes particularly clear as our method replicates the findings of a test track study by Faas et al. [12]. Both studies compared the effect of light-based eHMI concepts on PQ in an ambiguous crossing scenario. Both studies revealed the same significant pattern regarding pedestrians' rating of PQ: status+intent eHMI > status eHMI > no eHMI. Thus, both studies showed that communicating an SDV's intent adds further benefit for pedestrians over just displaying the automated status. However, in the current lab study (Faas et al. [39]) pedestrians rated the no-eHMI test condition as significantly worse, and the status eHMI test condition as slightly worse, than participants of the test track study (Faas et al. [12]). We believe that the worse ratings emerged because, in the lab study, a vehicle without an eHMI could mean a real disadvantage, potentially representing a non-yielding vehicle. In contrast, in the test track study (Faas et al. [12]) all vehicles yielded, so the participants' safety was guaranteed. Further, a lab study is more controlled than a test track study. Thus, while showing the same pattern of eHMI ratings (status+intent eHMI > status eHMI > no eHMI), the lab study produced more variance in participants' ratings, leading to a more differentiated evaluation of the eHMI variants.
Finally, participants reported that it felt natural to take a step forward to indicate their street-crossing decisions (M = 1.82 on a scale from −3 to +3), suggesting good validity.

Benefits with Regard to Related Approaches
The main benefit of our method is that it assesses COT naturally, in a parsimonious, reproducible, and safe manner.
Most previous approaches assessed crossing decisions in an unnatural manner, instructing participants to indicate their decision by pressing a button [13,15,17,22,23,29,30], moving a slider [26][27][28], or raising their hand [31]. Those approaches make the participants' crossing decisions explicit, creating an intermediary step that may affect their behavior. Participants have to transfer their implicit crossing decision to an explicit motor decision with their hand. Furthermore, participants may have to look at the button or slider, so they cannot observe the approaching vehicle at all times. For example, in the study of Walker et al. [26], 29% of the participants reported that they were not able to use the slider naturally and thus could not validly indicate their feeling of safety. Since street-crossing can be seen as an unreflective skillful action, which is a form of embodied intelligence or cognition [34,35], we argue that COT should be measured in a natural way, by actually stepping off a sidewalk onto a crosswalk. Our approach allows participants to show natural street-crossing behavior (i.e., entering a street) if they feel safe to cross. Thus, with our method, participants are closer to the processes that take place in real-world traffic situations, which improves ecological validity.
Only a few test track studies [12,33] and VR studies [21,32] allowed participants to indicate their decision to cross in a natural manner via the actual behavior of taking a step forward. However, test track and VR studies require high-priced apparatus and materials as well as time-consuming data analysis. For example, the required resources for an eHMI study on a test track include a test track location, a real vehicle, a light setup (e.g., LED stripes), and a driver steering the vehicle, possibly in a seat costume. These resources are required for several days. For later analysis, videos of each vehicle encounter need to be visually analyzed to extract the crossing onset measure (e.g., [12,33,37]). Similarly, to conduct and analyze VR studies, researchers need technologically advanced software and hardware (for an overview, see [51]). Participants might suffer simulation sickness [52]. Compared to previous studies on a test track or in VR, our approach requires only a few materials. Video-based studies are cost-efficient in comparison. The materials required for our approach include two TV screens, adhesive tape, two force-sensitive resistor sensors, an Arduino Uno analog-to-digital converter, and a laptop with the software Arduino IDE. For our real-world eHMI video clips, we needed a vehicle, fake Lidar sensors with LED light stripes, and a seat costume to create the illusion of a driverless vehicle. If researchers do not have access to those materials, future studies could use animated videos instead, just as VR studies do (e.g., [17,21,32]). An advantage of animated videos is that they allow researchers to have absolute control of any variable they might want to manipulate. However, their physical accuracy is lower than that of real-life videos [53]. Data analysis with our approach is also time-efficient, as the Arduino Uno records COT in real time.
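To illustrate why the analysis step is lightweight, the sketch below shows one way a crossing onset could be extracted from logged sensor samples. Everything about it is an assumption made for illustration: the log format (milliseconds since video start, raw reading from the Arduino Uno's 10-bit analog input, 0 to 1023), the idea that the sensor sits under the participant's standing position so that the reading drops when the foot lifts, and the threshold and debounce values. It is not the authors' actual implementation.

```python
def crossing_onset_ms(samples, threshold=200, hold=3):
    """Return the timestamp that starts the first run of `hold` consecutive
    readings below `threshold`, or None if no such run exists.

    samples: iterable of (time_ms, reading) pairs in chronological order.
    threshold, hold: hypothetical tuning values, not taken from the study.
    """
    run_start = None
    run_length = 0
    for time_ms, reading in samples:
        if reading < threshold:
            if run_length == 0:
                run_start = time_ms
            run_length += 1
            if run_length >= hold:
                return run_start
        else:
            # A single low reading followed by pressure is treated as noise.
            run_length = 0
    return None


# Hypothetical log: a brief dip at 100 ms is ignored; the foot lifts at 250 ms.
log = [(0, 810), (50, 805), (100, 150), (150, 790), (200, 640),
       (250, 180), (300, 90), (350, 40), (400, 35)]
onset = crossing_onset_ms(log)
print(onset)  # 250
```

Requiring several consecutive low readings is a simple debounce: it keeps a brief weight shift from being counted as a crossing onset, at the cost of confirming the onset a few samples late.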
Furthermore, video-based studies allow for flexibility and variety in eHMI test conditions. Researchers need to record only one video of an approaching vehicle and can use animations to create eHMI variants. Because every participant sees the same recorded material, the study is reproducible.
Lastly, one advantage of video studies is the possibility of incorporating non-yielding vehicle encounters while ensuring participants' safety. In contrast, test track studies need to meet high ethical standards and safety provisions, limiting their representativeness for complex urban traffic scenarios. For example, to guarantee participants' safety, non-yielding vehicle encounters should not be incorporated. Our approach allows participants to experience safety critical situations without actually endangering them. Although non-yielding vehicle encounters are not of research interest, they prevent participants from habituating to all cars stopping for them, which might lower their attention and, thus, the validity of the study.

Limitations and Recommendations
While our approach is promising, we acknowledge that there are limitations that require further attention. The first one refers to the absence of a real safety risk. The fact that participants cannot be harmed ensures participants' safety, but it also limits the realism of our approach. Since pedestrians do not have to fear any real risks from non-yielding vehicles, they might behave in a riskier manner than in normal traffic. The second limitation refers to participants' fair evaluation of the approach's immersiveness (M = 0.62 on a scale from −3 to +3), which might be rooted in the participants' constrained field of view. While real-life videos from the perspective of a pedestrian exhibit a high level of physical accuracy, their operationalization is not as good as experiencing a traffic situation in a real environment [53]. Thus, our method is suitable for relative comparisons (i.e., detecting differences between eHMI concepts) but not for establishing the true value of COT for a certain eHMI concept. However, this limitation applies to all research studies that use simulation. To make the setup more realistic, future studies could set up the "sidewalk" with a real curb so that participants need to take a step down onto the "crosswalk", in contrast to the current setup with a flat lab floor (suggestion made by Kooijman et al. [21]). Moreover, the use of VR glasses instead of TV screens may increase the participants' degree of immersion. Despite these limitations, our approach proved sensitive enough to detect eHMI effects on pedestrians' COT, perceived safety, and user experience.

Conclusions
This paper introduces a novel paradigm to study SDV-pedestrian interaction that is relatively easy to implement and strikes a balance between a natural and a parsimonious study setup. We propose the use of two TV screens and a simulated sidewalk with hidden force-sensitive resistor sensors as the input device. We believe that street-crossing behavior should be captured by the actual action of stepping off a sidewalk onto a street. We propose that the study design shows clear advantages, as opposed to an artificial design with participants watching videos on a screen in a sitting position and/or indicating their crossing decision with a button or slider. We believe that this experimental design can be valuable and effective for future video studies examining vehicle-pedestrian interaction.
Within the presented approach, it was possible to demonstrate the need for an eHMI for the communication between SDVs and pedestrians in an ambiguous traffic scenario. The eHMI concepts revealed significant differences in terms of COT, perceived safety, and user experience (for a textual discussion, see Faas et al. [39]). Further, we validated our method's efficacy by showing that its results are not only comparable to, but also more differentiated than, the results produced by a test track approach. Furthermore, our method displays a good level of self-reported naturalism. Thus, the presented method is validated as a suitable tool to make relative comparisons between eHMI concepts. We conclude that the method can be applied in future studies comparing eHMI concepts from a pedestrian's point of view.