How Can Autonomous Vehicles Convey Emotions to Pedestrians? A Review of Emotionally Expressive Non-Humanoid Robots

In recent years, researchers and manufacturers have started to investigate ways to enable autonomous vehicles (AVs) to interact with nearby pedestrians to compensate for the absence of human drivers. The majority of these efforts focus on external human-machine interfaces (eHMIs) that use different modalities, such as light patterns or on-road projections, to communicate the AV's intent and awareness. In this paper, we investigate the potential role of affective interfaces that convey emotions via eHMIs. To date, little is known about the role affective interfaces can play in supporting AV-pedestrian interaction. However, emotions have been employed in many smaller social robots, from domestic companions to outdoor aerial robots in the form of drones. To develop a foundation for affective AV-pedestrian interfaces, we reviewed the emotional expressions of non-humanoid robots in 25 articles published between 2011 and 2021. Based on findings from the review, we present a set of considerations for designing affective AV-pedestrian interfaces and highlight avenues for investigating these opportunities in future studies.


Introduction
The introduction of autonomy in vehicles promises to increase the level of convenience and comfort for riders [1]. However, the absence of human drivers in fully autonomous systems creates an interaction void where traditional communication strategies between drivers and other road users, such as eye contact and body gestures, used to take place [2]. The responsibility for communicating internal states and intentions in typically short, dynamic traffic scenarios is, to a large extent if not entirely, delegated from drivers to the self-moving vehicles themselves.
Researchers and manufacturers have been dedicated to developing additional channels that assist autonomous vehicles (AVs) in conveying their intent and awareness to surrounding road users, especially to pedestrians, who are considered one of the most vulnerable yet most frequent interaction subjects [2]. Existing means of supporting safe and intuitive AV-pedestrian interaction are varied, ranging from display technologies such as LED lighting patterns [3,4] and on-road projections [5,6] to anthropomorphic features such as moving eyes that follow the pedestrian's position [7,8] or a smiling expression displayed on the front of the AV [6,8].
While existing solutions have been validated to compensate for the lack of driver-pedestrian communication in different ways, emotions, a vital dimension of human-human interaction [9], have thus far been mostly disregarded in this growing area of research. Unlike pragmatic channels, emotions play a unique role in affecting perception, empathy, decision-making, and social interactions [10]. In fact, imbuing social robots with emotions is not rare in human-robot interaction (HRI). Prior work describes the ability to convey emotions as one of the indicators of socially interactive robots [11]. Previously designed robots that are capable of expressing emotions vary in functionality and appearance, as well as in how they articulate emotions in interactional contexts. We reviewed a sub-domain of these robots, namely emotionally expressive robots that have a non-humanoid form, aiming to translate the design paradigms for their emotional expressions into cues that AVs may employ in the affective dimension, as AVs are in essence self-moving, non-humanoid social robots.
The contribution of this paper is twofold. First, we systematically review 25 articles focusing on designing and evaluating emotional expressions for non-humanoid robots published in the past ten years. We summarise emotion models, output modalities, and evaluation measures, as well as how users perceived the emotional expressions in these studies. Second, based on the findings from the review, we propose a set of considerations for designing affective AV-pedestrian interfaces that add emotions as an additional communication dimension. Our findings contribute to enhancing AV-pedestrian interaction and increasing social acceptance of the deployment of AVs in urban environments.

Background

Emotion in Social Robotics
Social robots have been utilised in many domains, including healthcare, education, domestic environments, and public spaces [12]. To bring interactions with humans to a more natural and engaging level [13-15], many of these robots have been integrated with the capability of displaying emotions. Eyssel et al. [16] showed that a robot expressing emotional states could make people feel closer to the robot, perceive it as having anthropomorphic traits and intentionality, and experience a more pleasant HRI process. Besides facilitating empathic inferences about internal states and intentions [17], emotions expressed by robots can elicit affective responses from humans [18-20] and impact the immediate environment where the interaction is taking place [21,22]. With the increased perceived sociability [19], emotionally expressive robots can effectively establish closer bonds with humans and, more importantly, increase public trust and acceptance [9,20,22,23] of their deployment in our daily lives.

Humanoid Robots vs. Non-Humanoid Robots
Many humanoid robots, such as NAO [24] and Pepper [25], can employ readily available anthropomorphic features like body movement, facial expression, or speech to exhibit emotions. While such physical embodiment may increase the recognition level of displayed emotions [17,21], a large class of social robots are designed to be less- or non-anthropomorphic in order to match the functional requirements of their designated tasks [14,21,26,27]. For example, rescue robots are usually small and tank-like [28], and domestic vacuum robots like Roomba tend to be puck-shaped so they can fit under couches [21]. Beyond the utilitarian aspect, implementing emotions through the anthropomorphic features of humanoid robots can not only be expensive and technically complex [14,27,29] but also needs to fulfil users' expectations of their level of anthropomorphism [17,30] while minimising the feeling of creepiness known as the "uncanny valley" effect [19,30]. Consequently, an increasing body of literature has focused on designing emotional expressions that allow non-humanoid robots to display affect. On the one hand, non-humanoid robots are anatomically unable to express emotions the way humans do; on the other hand, this reduces stereotypes of how emotions should be displayed and thus broadens the range of modalities across visual, auditory, and haptic channels that these robots may employ. In the context of this paper, we are interested in understanding which modalities have been used to encode emotions in non-humanoid robots and how well the resulting emotional expressions are identified and perceived by users.

Current AV-Pedestrian Interaction
With human drivers replaced by autonomous control systems, one important challenge for the social acceptance of AVs is to communicate their intent and awareness to nearby pedestrians and other vulnerable road users (VRUs) [2,6]. A major direction in exploring effective communication strategies is the development of external human-machine interfaces (eHMIs) [2,6,31]. Current solutions for AV-pedestrian eHMIs are manifold. For example, vehicle-mounted LED lighting patterns have been utilised to indicate vehicle modes, awareness of a nearby pedestrian, or the intention to yield or move [2-4]. Studies have also investigated the use of eHMIs to convey messages by displaying pictograms [32] and text [33,34]. On-road projections have been investigated as a way to leverage traffic metaphors like crosswalks or stop signs [5,8,35]. In one implementation, the road infrastructure was updated to collect data from AVs and convey the information to pedestrians via a smart road [36]. Some researchers have also experimented with anthropomorphic features to restore current driver-pedestrian interaction patterns. An example of this is the implementation of moving eyes that follow the position of pedestrians at crosswalks [7,8]. Other studies tried a printed hand that waved to indicate yielding [4] and a smile sign informing pedestrians that it is safe to cross [6,8].
These studies focused on communicating intent and awareness through operational cues, akin to traditional street signage, which is designed to evoke immediate, and sometimes emotional, responses from users and to coordinate actions among interaction subjects. However, enabling AVs to express emotions as a communication strategy has not yet been addressed as a primary focus of research. This gap was also identified in a recently published design space for the external communication of AVs [37]. The authors pointed out the need for "affective messages" (i.e. messages related to emotions), since such messages are highly important in interpersonal communication even if they do not necessarily carry meaning [37].

Why Ascribe Emotions to AVs
The mass deployment of AVs in daily traffic environments could be hindered by their social acceptance, which relates not only to technical aspects but to societal factors as well [6]. Using eHMIs to communicate intent and awareness reduces public scepticism by improving pedestrians' understanding of the machine's decisions and maneuvers, thereby fostering safe interaction. Yet, overcoming people's psychological barriers towards AVs' deployment is not easy, and various forms of discrimination against AVs continue to be witnessed. For example, local residents harassed and attacked (e.g. threw rocks at) Waymo's self-driving cars during a public trial because they felt uncomfortable or scared around them [38]. Volvo's driverless cars were reported to be "easy prey" on the road, bullied by other drivers who slammed on their brakes or drove aggressively to force them into submission [39]. Despite being among the most vulnerable road users, pedestrians were also found to take advantage of AVs' rule-abiding nature by crossing the road with impunity once they discovered that the cars were self-driving [40,41]. Though intelligent agents, AVs are often considered mindless machines following programmed rules or even, more generally, a piece of "creepy" technology that breaks the status quo that people are comfortable with [42].
Following findings from studies of social robots in other domains, a promising avenue to address some of these issues is to equip AVs with social traits, such as the ability to express emotions. This has the potential to shift people's perception of AVs from purely algorithmically driven agents towards intelligent social actors. Indeed, concepts for increasing AVs' sociability have surfaced in recent years. In 2014, Google announced a driverless car prototype that was intentionally designed like adorable "Marshmallow Bumper Bots", with headlights like wide eyes and a front camera like a button nose, aiming to resemble a living being or a friend [42,43]. Taking an even greater leap, in August 2021, Honda released a lively AV bot, conceptualised for the year 2040, that serves as both transportation and smart companion [44]. It has a large frontal face with animated emotional facial expressions and fenders with covered wheels like pet animal legs, radiating "the cute character of a playful puppy" [44]. These efforts strive to make AVs likable social agents and improve their acceptance by evoking people's empathy. In line with these concepts, increasing AVs' emotional expressiveness is likely to enrich their social characteristics [11,19,45] and improve their acceptability [9,19].
Apart from influencing perception and empathic understanding, emotion in HRI is known for its function of regulating and guiding human decision-making and behaviour, biasing the interaction process away from negative or harmful results [10]. For example, a majority of in-car affective human-machine interfaces (HMIs) use expressive cues such as emotional music, ambient light, or empathic speech to regulate drivers into an emotionally balanced state and thus promote safe driving behaviour [46]. Similarly, AVs expressing emotions through external affective cues may help regulate the traffic climate [47], especially when other road users disagree with AVs' decisions (e.g. road rage towards the AV's maneuver [39]). Close to the strategy of using emotions or affects, some existing AV-pedestrian eHMIs have attempted to convey courtesy through textual messages like "Thank You", "You're Welcome" [33], and "Please" [48], facilitating cooperation between AVs and pedestrians. Hence, along with adjusting people's preconceptions, AVs' expressiveness should help regulate and guide the interaction process and eventually contribute to their functioning. This approach follows Picard's definition of affective computing, which suggests that it is not about making machines look "more emotional" but about making them more effective [10].
The rich literature in affective robotics indicates that emotion encoding in AV-pedestrian interaction is possible. Indeed, humans have an innate tendency to attribute liveliness, emotions, intelligence, and other social characteristics to moving objects [13,21-23,29,49]. A close example to AVs are drones, also referred to as unmanned aerial vehicles (UAVs): small plane-like flying robots that exhibit rich kinematics and functionalities [9]. As personal drones become more popular and ubiquitous in our daily lives [9,19], emotion encoding in human-drone interaction (HDI) has also been studied in recent years [9,19,50], especially for the purpose of increasing drones' acceptability [9,19]. Through a systematic review of how drones and other non-humanoid social robots have displayed emotions to the people around them, we aim to identify plausible emotions or affects as well as possible output modalities that could be considered for designing affective AV-pedestrian interfaces.

Method
To gain an understanding of current emotionally expressive non-humanoid robots, we reviewed relevant articles from 2011 to 2021 using a systematic search strategy.

Database Selection
To identify the most relevant publishers, we first queried Google Scholar as it covers a broad search across various sources of publications. We used the Publish or Perish software (version 7) [51] to query Google Scholar and extract the search results into a CSV file, as Google Scholar does not provide the functionality to download the results in bulk. We searched for "emotion" AND "robot" from 2011 to 2021. We extracted the 1,000 most relevant results, which was the maximum number of results available for retrieval. We then counted the number of results per publisher, resulting in four top publishers (IEEE = 291, Springer = 176, Elsevier = 72, ACM = 69) followed by Google Patents (31), MDPI (25), and SAGE (22), while all others were below 20.
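As an illustration of this counting step, the sketch below tallies results per publisher from the exported CSV. The file name and the "Publisher" column are assumptions about the Publish or Perish export format rather than documented fields, so they may need to be adjusted.

```python
import csv
from collections import Counter

# Tally Google Scholar results per publisher from the Publish or Perish export.
# "scholar_results.csv" and the "Publisher" column name are assumptions about
# the export format; adjust them to match the actual file.
with open("scholar_results.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

publisher_counts = Counter(row.get("Publisher", "").strip() for row in rows)

# Print publishers in descending order of result count.
for publisher, count in publisher_counts.most_common():
    if publisher:
        print(f"{publisher}: {count}")
```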
We then searched within the databases corresponding to the top four publishers, i.e. IEEE Xplore Digital Library, SpringerLink, ScienceDirect (for Elsevier), and ACM Digital Library, using the same query string and time frame as in the first step. We checked the number of results and excluded SpringerLink, as it yielded over 10,000 results under the "Article" and "Chapter and Conference Paper" content types, which would have made a subsequent review infeasible. This left us with three databases for the detailed article search and analysis (ACM = 1,427, IEEE = 1,103, ScienceDirect = 3,329).

Keyword Search Procedure
To identify potential article candidates for the review, we conducted a keyword search within each of the selected databases. Three main keywords were used: "emotion", "robot", and "non-humanoid". We also included synonyms that are commonly used to describe a non-humanoid appearance for robots: "non-anthropomorphic", "appearance-constrained", and "appearance constrained". We combined these keywords using AND/OR operators and utilised the advanced search feature in each database. We selected a time frame of the last ten years because (1) 87% of the total results fell within the last ten years, and (2) we were interested in understanding recent trends in this growing discipline. The time frame for the final search was from 01/01/2011 to 16/07/2021. The search yielded a total of 225 results, including research articles, posters, books, and other kinds of publications, across the three databases (ACM = 77, IEEE = 77, ScienceDirect = 71).
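For reference, the snippet below assembles the combined boolean query from these keywords. Each database's advanced search uses its own field syntax, so this is only the generic conceptual form assumed here, not the exact string submitted to any particular database.

```python
# Generic boolean form of the search query; each database's advanced search
# uses its own field syntax, so this string is only a conceptual template.
core_terms = ["emotion", "robot"]
appearance_terms = [
    "non-humanoid",
    "non-anthropomorphic",
    "appearance-constrained",
    "appearance constrained",
]

query = (
    " AND ".join(f'"{t}"' for t in core_terms)
    + " AND ("
    + " OR ".join(f'"{t}"' for t in appearance_terms)
    + ")"
)
print(query)
# "emotion" AND "robot" AND ("non-humanoid" OR "non-anthropomorphic" OR ...)
```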

Article Selection
We chose articles published in conference proceedings and journals that (1) were written in English, (2) used a non-humanoid robot, (3) designed emotional expressions for the robot, and (4) evaluated the emotional expressions with empirical user studies and presented the evaluation results. In a first step, the lead author screened the articles. This process involved reading each article's title, abstract, and full text to see if it met the selection criteria. In this process, we removed duplicated articles, as some articles were published jointly by different publishers. We further excluded articles that proposed approaches for robots to sense or recognise users' emotions rather than express emotions, as well as articles that included a robot capable of displaying emotions but offered little information on how the emotional expressions were designed. The screening process resulted in the final collection of 25 articles included in our review. As a second step, two of the authors reviewed and discussed the results, mapping them out on a digital whiteboard. No further changes were made to the article selection in that step.

Research Questions
The review presented in this paper is based on the following research questions.

• What emotions are commonly expressed by non-humanoid robots?
• How are the emotions displayed?
• What measures are used to evaluate the emotional expressions?
• What are the user perceptions of the emotional expressions?

Review of Emotionally Expressive Non-Humanoid Robots
This section presents the systematic review of the 25 articles in response to the research questions listed above.

Overview
An overview of the reviewed articles is provided in Table 1. The non-humanoid robots used in these articles varied in morphologies and functionalities (Figure 1). Eleven (44%) articles adopted readily available robots, either developed in previous work [13-15,22,52] or commercially available [9,18,20,28,50,53], while another thirteen (52%) articles designed and prototyped robots for the specific purpose of investigating emotional expressions [17,19,21,23,27,29,45,49,54-58]. The remaining article implemented the design of another robot using Lego robot parts [26]. Two articles developed animations instead of using a physical embodiment to simulate robots (a car seat in [23] and a drone in [19]). Regardless of whether the robots had a designated functionality per se, four robots had a perceivable utilitarian function during the evaluation of emotions, including two voice assistants [20,52], one drone [9], and one car seat [23]. Another two robots served as companions during the studies reported in the articles [18,45]. The robots were ascribed a set of distinct expressive behaviours corresponding to specific emotions. These expressive behaviours were encoded through a variety of output modalities, supporting inferences about the robots' internal states and broadening the range of feasible cues applicable to designing affective interfaces. Twenty-three (92%) articles investigated the encoding of multiple emotions and evaluated the effectiveness of one or more modalities in displaying them, while the remaining two articles tested the perception and impact of a single emotion [53,56]. To support a more interactive evaluation process, the robots in eight (32%) articles were able to express emotions in response to the behaviours of participants [13,15,18,20,21,45,52,56].
The purpose of robotic emotion design in fifteen (60%) articles was to explore the use of one or multiple modalities. Eight of these articles aimed to address the feasibility or effectiveness of the chosen modality or modalities in developing the emotional expressions [17,19,21,27,28,50,54,55], while the other seven focused on providing design strategies for using the modality or modalities to encode emotions [9,14,23,26,49,57,58]. The remaining ten (40%) articles had a main purpose other than evaluating modalities. Six of them examined the influence of emotional expressions on user cognition [52,53] or behaviour [18,29,45,56]. Two articles evaluated the efficacy of their proposed approach for the robot to express emotions dynamically [13,15]. Of the remaining two articles, one [20] aimed to understand user preference for the robot's personality traits, and the other [22] explored various contextual factors influencing user perception of the emotional expressions.

Emotion Models
To understand what emotions are commonly expressed by non-humanoid robots, this section presents the emotion models that guided the selection of emotions in the reviewed articles. We identified three emotion models, which are used to structure this section.
Two streams of models are based on previous literature, namely categorical models and dimensional models [13,22,26,49]. We further added emotional personas as a third model, since we found that some articles developed emotional personas for the robots and selected emotions in accordance with the personas to articulate the corresponding personalities.

Categorical Models
The most widely employed categorical emotion model is Ekman's six basic emotions [59]: anger, disgust, fear, happiness, sadness, and surprise. These emotions are regarded as essential for human-human communication, easy to understand, and widely recognisable across cultural backgrounds [13,17,19,22]. Besides using a validated psychological model, two studies referred to cultural conventions. Hieida et al. [50] derived anger, joy, pleasure, and sadness from a popular Japanese idiom, ki-do-ai-raku, while Cauchard et al. [9] chose emotional states (brave, dopey, grumpy, happy, sad, scared, shy, and sleepy) from personalities found in Walt Disney's Seven Dwarfs and Peyo's Smurfs.

Dimensional Models
Some studies argued that discrete emotions were unable to cover the full space of emotions, since human emotions comprise not only basic emotions but also subtle variations within each category [26,58]. For instance, joyous, content, and jubilant describe different levels of happiness [13]. Consequently, dimensional approaches were utilised in nine (36%) articles, including three articles [13,28,49] that conducted multiple studies using both categorical and dimensional emotion models. The two most commonly adopted dimensional models are Russell's circumplex model [60] and Mehrabian's model [61]. Russell's circumplex model positions emotions in a two-dimensional space of valence and arousal, where valence refers to the positive or negative connotation of the emotion [22] and arousal refers to its intensity. Mehrabian's model distributes emotions in a three-dimensional space: pleasure-arousal-dominance (PAD), also known as valence-arousal-dominance, in which the dominance dimension measures the controlling or submissive nature of the emotion [19]. For example, although both anger and fear are negative emotions, the former is perceived as dominant while the latter is submissive.
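To make the dimensional view concrete, the sketch below places a few emotions in a PAD space and then projects them onto the two-dimensional valence-arousal view. The numeric coordinates are illustrative approximations chosen for this example, not values reported in the reviewed articles or in the original models.

```python
from dataclasses import dataclass

@dataclass
class PAD:
    """Pleasure (valence), arousal, and dominance, each in [-1, 1]."""
    pleasure: float
    arousal: float
    dominance: float

# Illustrative placements only; the exact coordinates are assumptions,
# not values taken from Russell's or Mehrabian's original models.
EMOTIONS = {
    "happiness": PAD(pleasure=0.8, arousal=0.5, dominance=0.4),
    "sadness":   PAD(pleasure=-0.6, arousal=-0.4, dominance=-0.3),
    "anger":     PAD(pleasure=-0.5, arousal=0.7, dominance=0.6),   # dominant
    "fear":      PAD(pleasure=-0.6, arousal=0.6, dominance=-0.6),  # submissive
}

# Dropping the dominance axis yields the two-dimensional circumplex view.
for name, pad in EMOTIONS.items():
    print(f"{name}: valence={pad.pleasure:+.1f}, arousal={pad.arousal:+.1f}")
```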

Emotional Personas
Instead of being presented with emotions directly, participants in two articles [9,20] were faced with stereotyped emotional personas through which typical emotions were expressed and discerned. Cauchard et al. [9] integrated selected emotional traits into three representative personas for drones (adventurer, anti-social, and exhausted); for example, the behaviours of an adventurer drone showed a combination of happiness and bravery. In the work of Whittaker et al. [20], a voice-assisted home robot was assigned three distinct personas (buddy, butler, and sidekick), which were derived from the well-known "Big Five" personality traits [62] and differed in the emotions perceived via speech, intonation, colour, and movement when responding to people's commands.

Output Modalities
To answer how the emotions are displayed, this section presents the variety of output modalities across visual, auditory, and haptic channels that the reviewed articles used to create affective interfaces for non-humanoid robots to manifest emotions. We classified the modalities into sensory categories according to how they were sensed by users during evaluation.

Visual Modalities
Twenty-three (92%) articles utilised visual modalities, including movement, colour, and facial expression. Many of the robots presented in the articles used a combination of visual modalities.
The most employed modality was movement, which was found in twenty-two articles.
Six articles used movement to encode emotions based on suggestions in prior work [18,20,22,27,49,55]. For instance, for a robot placed in a natural play scenario with young children, Boccanfuso et al. [18] used movements that had previously been reported to be indicative of emotions in children. Tan et al. [49] designed shape-changing movements by reviewing biological motion studies that had demonstrated relations between emotions and shape-changing parameters such as velocity and orientation. In addition to referencing previous findings directly, four articles created emotional movements based on validated knowledge systems, including Laban's movement framework [63], adopted in three articles [23,26,50], and the Interaction Vocabulary [64], used in one article [9]. Furthermore, metaphorical mappings were utilised in four articles [14,17,28,52]. For example, Löffler et al. [17] used conceptual metaphors such as "joy is up and active" and "anger is hot fluid in a container" to develop movement patterns, and Shi et al. [52] designed emotional movements for text boxes in a smartphone-based voice assistant using affective human body expressions. Besides prescribing movement-emotion mappings in advance, four articles explored how users themselves specified the relationships between emotions and movements [21,28,54,57]. Despite the non-humanoid form of the robots, six articles designed emotional body expressions [13,14,26,45,50,56]. Except for the body expressions for drones in [50], which remained in the realm of mechanical movements, the other five articles followed anthropomorphic or zoomorphic behaviours, such as stretching the "neck" of a lamp-like robot to show curiosity [45] or borrowing emotional behavioural patterns from dogs [14].
Coloured lights were used in seven articles. Three articles [17,27,28] mapped one emotion to one colour, e.g. anger to red, sadness to blue, and calm to white. Among them, the two articles by Song and Yamada [27,28] generated the encoding based on prior research, while the work of Löffler et al. [17] referred to conceptual metaphors like "joy is light and warm" and "fear is darkness". Additionally, four articles [18,20,22,56] varied multiple properties of coloured lights. For generating the mappings between multiple colour properties and emotions, Whittaker et al. [20] conducted an online user study to decide colour patterns for different emotional personas, while the other three articles combined multiple visual properties of coloured lights to reflect single emotions. For example, Hoggenmueller et al. [22] developed colour patterns with animation effects, e.g. blurring and fading green and yellow colours to show disgust, and Boccanfuso et al. [18] used bright and colourful lights with high intensity to convey happiness.
Facial expressions were also employed: Peng et al. [29] designed animated faces with various shapes and colours for their small theatre robots and identified the mappings with emotions using online questionnaires, while the animated eyes on a phone mounted on the robot in [53] simply showed paying attention and conveyed an apologetic state in a scolding scenario.

Auditory Modalities
Auditory modalities included non-linguistic utterances (NLUs), music, and vocalisations. In both articles by Frederiksen and Stoy [53,56], NLUs were used to show single emotions (e.g. alerting audio signals to express fear [53] and augmented naturally occurring sounds of the robot to convey remorse [56]).
Two articles utilised music that had previously been validated to evoke emotions [15,18]. Ritschel et al. [15] used previously proposed melodies to show emotions and intentions and personalised the timbre dynamically according to user preferences, while Boccanfuso et al. [18] produced a set of synthesised music to enhance the robot's emotional expressions in the play environment with young children (e.g. a happy state was conveyed with a piece of music with frequent and smooth changes in a moderate to high pitch).
Vocalisations were found in two articles [18,20]. Whittaker et al. [20] implemented humanoid speech and intonation to articulate the three distinct personas of the voice-assisted home robot, while Boccanfuso et al. [18] added non-linguistic child vocalisations, such as giggling and crying, to augment the main sound cue (i.e. music).

Haptic Modalities
Haptic modalities, including haptic movements and textures, were used in four (16%) articles. In one of those articles, the robot was covered in naturalistic fur to mimic furry animals and invite user touch, but only to assist in the evaluation of the primary modality (i.e. breathing behaviours) [54]. The other three articles used haptics as main cues [55,57,58]. Sato et al. [58] investigated how users mapped combinations of haptic movements (e.g. tapping rapidly or slowly) and textures (e.g. aluminium, clay, etc.) to a list of discrete emotions. Chase and Follmer [55] combined haptic movements with visual movements to test the perceived pleasure-arousal-dominance (PAD) of properties like stiffness and jitter. Kim and Follmer [57] assessed perceived PAD in a swarm of small haptic devices by changing parameters including the number of robots, force types, frequency, and amplitude.

Evaluation Measures
All of the reviewed articles included user evaluations to assess the quality and impact of the emotional expressions. This section discusses the evaluation measures in terms of use scenarios, experimental tasks, and evaluated aspects.

Use Scenarios
Eleven (44%) articles created use scenarios for the robots displaying emotions in the evaluation. Three of the articles embedded the emotional expressions into the robots' tasks [9,20,52]. For instance, the drone in [9] displayed emotional profiles during different flying tasks, and the two voice assistants in [20,52] conveyed emotional states during tasks activated by users' spoken commands, e.g. setting a reminder [52] and playing music [20]. The car seat in [23] showed expressive movements when greeting its driver. Peng et al. [29] designed plots for the robot actors in the robot theatre to show contextual emotions. The conversation companion robot in [45] responded to humans' vocalics during a conflict conversation between couples, and the robot in [56] was placed in a scolding scenario in order to convey remorse. Two articles intentionally designed triggers for the emotions to help with users' understanding, such as a scenario where the robot showed positive emotions after finishing its task successfully [26] and a high-volume explosion sound to evoke the robot's fearful reactions [53]. Using a more natural context, Boccanfuso et al. [18] placed the robot in an unstructured play environment with young children to elicit their affective responses. Though the robot in [13] expressed emotions without context in the first two studies, it responded to users' speech in the third one. Apart from these eleven articles, the robots' emotional expressions in the other articles were devoid of any use case, presumably to evaluate the designs without bias [21,22].

Evaluated Aspects
Twelve articles (48%) gauged the recognition level of the emotional expressions, that is, the rate at which an emotion was correctly recognised through its expression. In nine articles, participants matched presented expressions with a list of emotion names [9,14,17,19,23,26,27,49,58], whereas the other three articles asked participants to rate on scales how much they thought the presented expressions matched the prescribed emotions [13,15,22].
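As a concrete illustration of this measure, the snippet below computes a per-emotion recognition rate from forced-choice matching responses. The response data is invented purely for the example and does not come from any of the reviewed studies.

```python
from collections import defaultdict

# Hypothetical forced-choice responses: (intended emotion, emotion selected by
# the participant). The data is invented purely to illustrate the measure.
responses = [
    ("happiness", "happiness"),
    ("happiness", "surprise"),
    ("sadness", "sadness"),
    ("sadness", "fear"),
    ("anger", "anger"),
    ("anger", "anger"),
]

correct = defaultdict(int)
total = defaultdict(int)
for intended, selected in responses:
    total[intended] += 1
    correct[intended] += intended == selected  # bool counts as 0 or 1

for emotion in total:
    rate = correct[emotion] / total[emotion]
    print(f"{emotion}: {rate:.0%} recognised correctly")
```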
Two articles [21,50] asked participants to rate each emotional expression on scales of pairs of opposite adjectives, like "tired vs. energetic" and "hasty vs. leisurely". Other traits, such as emotional and cognitive engagement [52], perceived urgency [57], and the impact of emotions on the surroundings [19,56], were also measured. Additionally, some articles assessed the level of anthropomorphism of their robot. For instance, two articles employed HRI metrics for gauging the perceived anthropomorphism, likeability, and safety of the robots [55,57]. Similarly, social human character traits like friendliness, cooperativeness, and sociability, as well as participants' comfort level with the robot, were assessed in one of the articles [45].
Three articles analysed participants' behaviours during their interaction with the robots using video coding [18,45] or direct observation [21]. Through the analysis of video coding data, Boccanfuso et al. [18] characterised different play patterns and affective responses of young children, and Hoffman et al. [45] collected verbal references to the robot during simulated couple conflicts. In the direct observation procedure in [21], a think-aloud protocol was used for participants to express feelings and thoughts during their interaction with the moving robot. Moreover, nine articles presented and discussed participants' comments on their experience with the emotionally expressive robot, drawn either from interviews or from open-ended questions in questionnaires [9,15,19-22,29,45,54].

User Perceptions
To discuss user perceptions of the emotional expressions in the reviewed articles, we summarise commonly reported findings, including the recognition level of emotions and other important aspects associated with user perceptions.

Recognition of Emotional Expressions
Basic emotions that are relatively obvious and universal [9,49] were best recognised by participants, including happiness [9,14,19,22,26,29], sadness [13,15,19,22], and anger [22,29]. However, sadness was at the same time found to be the least recognisable emotion in three articles [14,26,29]. Two other negatively valenced emotions, fear [13,19] and disgust [19,49], were also among the most difficult to identify. This is in line with research in psychology showing that some emotions are more easily recognised than others [19] and that negative emotions tend to be recognised more slowly [19] and are less consistently interpreted correctly [23] compared to positive emotions. Furthermore, emotions like surprise, coolness, and affirmation, which are believed to be more abstract and to require more context to interpret [22,57], were also rated low in terms of recognition [15,22,23].
When looking at how emotions were combined with modalities, we found that the interpretation of emotional expressions mostly relied on intuitive comprehension of the emotion-expression relationship. For example, in most studies where movement was used, participants were likely to associate high speed or high frequency with high-arousal emotions such as excitement or anger [9,21,22,26,27,49,57], while low speed or avoidance behaviour was usually interpreted as a less intense emotion like sadness or fear [22,26,28]. Similarly, participants were able to understand intuitive, conventional mappings for other modalities. For instance, a falling sound or slow tempo was usually interpreted as conveying sadness [15,17,27], and bright, fast-changing colours were commonly associated with joy [17,20,22].

Sociability
Many articles reported that participants attributed liveliness, internal states, and sociability to the emotionally expressive robots. For example, participants in one article [54] referred to the emotional furry robot as having "a rich inner life" and said it reminded them of pets. In the storytelling session in another article [29], children interpreted motivations, intentions, and emotions from the performance of the robots in the robot theatre. In [45], the companion robot was perceived as friendly, warm, and capable of forming social bonds and attachments. The drone with facial expressions in [19] was described as an agent with autonomy, consciousness, and cognitive and behavioural abilities.
Nevertheless, disengagement was found when robots displayed certain emotional expressions. Harris and Sharlin [21] reported that nearly half of their participants showed boredom when presented with slow and repetitive movements. For the voice-assisted home robot in [20], the "sidekick" persona, which had a low-amplitude voice and used slow movements, also failed to engage some participants emotionally. In addition, Boccanfuso et al. [18] suggested that negative emotions that elicited frustration and annoyance might cause disengagement in children. In their investigation of the emotional and cognitive engagement elicited by an emotionally expressive voice assistant, Shi et al. [52] found that emotions with positive valence and high arousal might help robots establish emotional connections with humans.

Contexts
Context was regarded as an important factor influencing users' interpretation of emotional expressions. In [26], the recognition rate of emotions improved dramatically when emotions were displayed within an appropriate context compared to being displayed alone.
Tan et al. [49] also suggested that adding a use scenario could help users identify and disambiguate emotions. In order to interpret relatively more abstract emotions such as fear and surprise, participants in [57] tended to combine the haptic expressions with other factors, such as motion paths and the contact locations of the haptic stimuli, to obtain more contextual information. Hoggenmueller et al. [22] discussed a range of contextual aspects that impacted users' comprehension, including the spatiotemporal context, the interactional context, as well as contexts related to users' backgrounds.
Additionally, participants were found to create narratives for the emotional expressions in several articles. For instance, participants in [19] tended to develop stories to make sense of the emotions, such as speculating about the cause of fear and surprise. Children who watched the theatre play performed by multiple robots in [29] generated conjectures about the robots' social relationships according to the display of emotions. After analysing participants' verbal descriptions of the expressive behaviours of the robot, Bucci et al. [54] concluded that narratives made about the motivation and situation of the robot could heavily influence the perception of emotional expressions.

Considerations
The reviewed studies provide rich evidence for designing affective interfaces that allow non-humanoid robots to communicate emotions. This serves as a foundation and offers guidance for adding an emotional dimension to AV-pedestrian interfaces. In this regard, we aim to provide preliminary considerations around core elements for designing affective AV-pedestrian interfaces. First, we draw on findings from the review to provide a set of considerations for designing emotional expressions for AVs as social robots. Then, based on both the review and current AV-pedestrian communication strategies, we present a set of considerations that take a broader range of factors into account for building affective AV-pedestrian interfaces.

Considerations for Designing Emotional Expressions
Based on findings from the review of emotional expressions of non-humanoid robots, we propose five considerations for imbuing AVs with emotions, covering both what emotions to communicate and how to communicate them.
Include Basic Emotions: More than half of the reviewed articles employed basic emotions, mostly derived from Ekman's six emotions [59]. In particular, the user evaluations in these articles showed that happiness, sadness, and anger were most easily recognised [9,13-15,19,22,26,29]. In general, such emotions are regarded as easy to understand, recognisable across different cultural backgrounds, and important for intuitive human-robot interaction [13,17,19,22]. Hence, we suggest that basic emotions or basic emotion models should be considered when deciding what emotions to attribute to AVs.
Use Negative Emotions for a Reason: In some articles where negative emotions were triggered for a reason, these emotions made important contributions to affecting user behaviours, e.g. defusing conflict situations when showing fear [45] or remorse [53], and evoking sympathetic behaviours when displaying sadness [18]. Indeed, robot emotions can cause humans to mirror the emotional state of the robot [19,22,53] or to reflect on their own behaviours [19,45,56]. However, when negative emotions were displayed without reasons, users speculated about the cause of the emotions [19], involuntarily interpreted them as positively valenced [22], or at times even became concerned about their own safety in the face of aggressive emotions [21,23,55]. Therefore, providing reasons for the display of negative emotions can be essential for the intended user interpretation and the subsequent influence on user behaviours.
Provide Contexts for Abstract Emotions: Some reviewed articles showed that emotions such as surprise, disgust, coolness, and affirmation, though some are included in Ekman's six basic emotions, are more abstract in connotation than universally recognisable emotions (e.g. happiness and sadness) and thus require more context for users to interpret them as intended [15,22,23,57]. When displayed without context, these emotions can appear ambiguous [49], and their interpretation can be more strongly biased by the user's cultural background and previous experiences [22]. Hence, the expected user perception of abstract emotions in AVs can hardly be isolated from an appropriate context, a context related to, but not limited to, the task that the AV is performing [26], the immediate surroundings [22], and cultural norms [30].
Combine Multiple Modalities: Several studies compared the recognition rate of emotions when using a single modality versus multiple modalities [17,27,28]. Results showed that people recognised the multi-modal expressions more easily and were more confident in their judgement.
Indeed, when multiple modalities are combined to convey a certain emotion, they tend to serve as amplifiers of each other [28] and reassure people in their interpretation [17,27]. Even in some studies where only a single modality was tested, some participants still referred to partial cues other than the primary modality to support their inferences [19,57]. Hence, using multiple modalities can help clarify the emotional state of the AV and increase users' confidence in making fast and safe decisions accordingly.
Employ Intuitive Encoding: The review showed that emotions were best interpreted when the encoding followed a conventional and intuitive assignment of expression-emotion relationships, such as using colourful lights or uplifting music for happiness [15,17,18,20,22] and slow movements or dark colours for sadness [17,22,27,28]. Nonetheless, such "intuitive" mappings can be culture-specific [17,28,29,56]; for instance, some studies point to cultural differences in mapping colour to emotion [29]. At the same time, there are also conceptual models or universal associations that depend very little on culture or can be found in many languages [17,29]. Overall, employing encoding that is intuitive to users is important in AV-pedestrian interaction, as such interaction requires immediate decisions in dynamic traffic situations. Similar concerns can be found in existing AV-pedestrian eHMIs. For example, a participant in [4] commented that, during the crossing scenario, they had to frequently look at a sheet specifying the mapping between multiple LED colours and vehicle states, and they further pointed out that it was unrealistic to carry such a sheet in real life. Thus, encoding rules that require a learning process should be considered with care.
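To make the two preceding considerations tangible, the sketch below pairs a few emotions with multi-modal cues that follow the intuitive conventions reported in the review (bright, fast cues for happiness; slow, dark cues for sadness). The specific cue names and parameter choices are illustrative assumptions for an AV eHMI, not designs evaluated in any of the reviewed studies.

```python
from dataclasses import dataclass

@dataclass
class ExpressionCue:
    """A multi-modal expressive cue an AV eHMI could display (illustrative)."""
    light_colour: str      # LED strip colour
    light_animation: str   # e.g. pulse, fade
    movement: str          # vehicle or display movement quality
    sound: str             # auditory cue

# Illustrative mappings following the intuitive conventions reported in the
# review; the concrete values are assumptions, not evaluated designs.
AV_EMOTION_CUES = {
    "happiness": ExpressionCue("warm yellow", "fast pulse", "smooth, slight bounce", "uplifting chime"),
    "sadness":   ExpressionCue("dim blue", "slow fade", "slow, hesitant", "falling tone"),
    "fear":      ExpressionCue("dark purple", "flicker", "retreating, avoidance", "low tremolo"),
}

def describe(emotion: str) -> str:
    cue = AV_EMOTION_CUES[emotion]
    return (f"{emotion}: light={cue.light_colour} ({cue.light_animation}), "
            f"movement={cue.movement}, sound={cue.sound}")

print(describe("happiness"))
```

Combining the channels in one record mirrors the finding that redundant cues act as amplifiers of each other and make interpretation faster and more confident.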

Considerations for Building Affective AV-Pedestrian Interfaces
Drawing on the review and taking into account current AV-pedestrian communication strategies, this section presents a set of considerations around a broader range of factors that may contribute to designing affective AV-pedestrian interfaces.
Align with AV's Primary Functionality: AVs' primary function (i.e. transportation) is likely to influence people's interpretation of their social traits during the interaction. In the review, only four robots had a perceivable primary function during user evaluation [9,20,23,52].
However, as shown in [21] and [22], when interacting with a robot, people are likely to speculate about the functionality or "purpose" of the robot and interpret its emotions accordingly. This suggests that people expect the emotional expressions of a robot to be coherent with its functionality. In the example of the voice-assisted home robot [20], people preferred the smart helper to show conscientiousness and agreeableness through its expressive cues. Therefore, the affective interface should account for its interplay with the AV's major utilitarian purpose, i.e. serve as a secondary function [22] that facilitates the operation of the primary one.
Understand Pedestrians' Expectations: Pedestrians' expectations of AVs' emotions can differ from those of passengers sitting inside the vehicle or of other road users. Various interaction contexts have been identified in AV-pedestrian interaction, such as interaction contents, road types, other vehicles or road users, etc. [67]. If AVs are to use emotions to support the interaction with pedestrians, it is important to consider the legitimacy of the emotions in those interaction contexts from the perspective of pedestrians, that is, to understand what emotions pedestrians expect to see. Similar considerations have been reported in human-drone interaction (HDI). For instance, in the design of emotional drones, Cauchard et al. [9] left aside emotions that did not seem applicable to HDI, such as disgust. Herdel et al. [19] also raised the concern of whether users would envision certain emotions (e.g. fear) appearing in drones. Besides, emotional patterns in existing driver-pedestrian interaction may provide a vital reference for conjecturing pedestrians' expectations of AVs' emotions, as it is essentially the driver's role that the autonomous system takes over when interacting with pedestrians socially.
Refer to Existing eHMIs: Although there is no empirical evidence yet for which modality is effective for an affective AV-pedestrian interface, efforts made in current eHMIs for communicating AVs' intent and awareness offer various solutions, such as vehicle-mounted displays [3,4,7,32,34], on-road projections [5,8,35], smart-road interfaces [36], and wearable devices [31]. These interfaces provide a range of feasible modalities across visual, auditory, and haptic channels to support the potential communication of emotions. Nevertheless, many previous studies show that, regardless of the presence of eHMIs, pedestrians still rely greatly on changes in the movement of AVs (e.g. speed) [2,34,68-70]. Thus, an affective interface may favour movement cues to encode emotions (e.g. movement-related "gestures" [31]), or use emotions to amplify the intention of movements (e.g. displaying a happy face to show friendliness while the vehicle is yielding). Overall, designers should refer to existing AV-pedestrian eHMIs and their corresponding findings to understand the usability of different types of interfaces when designing for the affective dimension.
Make It a Reciprocal Process: Current eHMIs for conveying intent and awareness to pedestrians are mostly designed in a proactive manner. For example, in a crossing scenario, most visual displays and on-road projections provide information about the deceleration progress of the vehicle [6,34,35] or inform pedestrians when it is safe to cross [5,32]. However, it is also important for AVs to be responsive to pedestrians' intentions [68]. A recent study by Epke et al. [68] used human gestures and eHMIs to form bi-directional communication between pedestrians and AVs. The study found that participants preferred the case where an approaching AV acted (i.e. displayed "I SEE YOU" or yielded) in accordance with pedestrians' hand gestures. More importantly, the communication of emotions is itself a reciprocal process in which emotional responses can be evoked between the two subjects [19]. Hence, AVs should have the capability to express emotions not only proactively but also in response to the behaviours of pedestrians or contingencies in their surroundings.
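A minimal sketch of such a reciprocal loop is given below: the AV's affective display is chosen in response to a detected pedestrian behaviour rather than emitted proactively. The gesture labels, display names, and their pairings are hypothetical and serve only to illustrate the reactive structure, not a validated interaction design.

```python
# Minimal sketch of a reciprocal affective loop: the AV selects an emotional
# display in response to a detected pedestrian behaviour. Gesture labels,
# display names, and the pairings themselves are hypothetical illustrations.
def affective_response(pedestrian_gesture: str, av_is_yielding: bool) -> str:
    if pedestrian_gesture == "hand_raised" and av_is_yielding:
        # Acknowledge the crossing request and signal friendliness while yielding.
        return "display_happy_yielding_cue"
    if pedestrian_gesture == "hesitating":
        # Reassure a hesitant pedestrian that they have been seen.
        return "display_calm_awareness_cue"
    if pedestrian_gesture == "crossing_against_signal" and not av_is_yielding:
        # A clearly negative display needs a visible reason (see the earlier
        # consideration on using negative emotions for a reason).
        return "display_concerned_warning_cue"
    return "display_neutral_state"

print(affective_response("hand_raised", av_is_yielding=True))
```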

Table 1. Articles in the Review of Emotionally Expressive Non-Humanoid Robots.