Modelling User Preference for Embodied Artificial Intelligence and Appearance in Realistic Humanoid Robots

Abstract: Realistic humanoid robots (RHRs) with embodied artificial intelligence (EAI) have numerous applications in society, as the human face is the most natural interface for communication and the human body the most effective form for traversing the man-made areas of the planet. Thus, developing RHRs with high degrees of human-likeness provides a life-like vessel for humans to physically and naturally interact with technology in a manner unattainable by any other form of non-biological human emulation. This study outlines a human-robot interaction (HRI) experiment employing two automated RHRs with contrasting appearances and personalities. The selective sample group employed in this study comprises 20 individuals, categorised by age and gender for a diverse statistical analysis. Galvanic skin response, facial expression analysis, and AI analytics permitted cross-analysis of biometric and AI data with participant testimonies to reify the results. This study concludes that younger test subjects preferred HRI with a younger-looking RHR and the more senior age group with an older-looking RHR. Moreover, the female test group preferred HRI with an RHR of younger appearance and male subjects with an older-looking RHR. This research is useful for modelling the appearance and personality of RHRs with EAI for specific jobs such as care for the elderly and social companions for the young, isolated, and vulnerable.


Introduction
Numerous scholars suggest that emotionally responsive artificial intelligence (EmoAI) in human-robot interaction (HRI) reduces negative perceptual feedback as people feel an affinity towards realistic humanoid robots (RHRs) that can simulate empathy [1][2][3][4][5][6][7][8]. The EmoAI approach is founded on behaviours in human sociology as communication, personality, and comprehension help promote understanding and empathy during human-human interaction [9][10][11][12]. Thus, people empathise more with RHRs than non-anthropomorphic robots as humans feel an innate association with machines that look human, owing to the psychological drive to socialise and form relationships with other humans [13].
In support, RHR Sophia's ability to discuss its visual and functional limitations with humans helps people empathise with the RHR's preternatural qualities [14,15]. This design consideration is significant to the progression of RHRs as research in robotic AI has predominantly focused on simulating human cognition and continually neglects the significance of EmoAI in promoting natural HRI, which heightens the potentiality of adverse feedback owing to emotionless robotic AI [16]. Masahiro Mori's [17] uncanny valley (UV) hypothesis accounts for the negative psychological stimulus propagated by RHRs upon observation, as the more human-like the RHRs appear, the higher the potential for humans to feel repulsed by their appearance.
Quality aesthetics is a foundation for reducing the UV in RHR design [18,19]. However, although the aesthetics approach is a viable first stage for increasing the authenticity of an RHR's appearance, it neglects the importance of naturalistic functionality and movement. Per the UV, poor functionality undermines good aesthetical design in the same way that quality aesthetics become secondary to inadequate functionality [20,21]. This condition is significant as RHRs are developed in different configurations. For example, waist-up models such as the Robot 'C' series by Promo-bot Russia, launched in 2020, are designed as front desk assistants and do not require lower body functions. However, human-likeness also applies to other facets of RHR design such as movement. For instance, although Atlas by Boston Dynamics bears little aesthetical human resemblance, the robot has highly human-like dexterity and movement. Thus, the term 'realistic' is applicable to different facets of RHR engineering outside of appearance, that is, realistic movement, speech, and AI.
Therefore, the consideration of both appearance and functionality is crucial in developing greater human-like RHRs and reducing the UV [22]. However, cultural background influences the UV, as studies conducted by Eastern scholars often report lower levels of the UV than those in Western cultures [23][24][25][26][27][28]. Furthermore, many scholars argue that children are less susceptible to the UV as they are naturally more curious and accepting of RHRs than adults owing to a lack of media influence and risk perception [29][30][31][32][33]. Thus, exposing children to RHRs at a young age is a methodology for ethically reducing the UV as it builds a foundation of understanding before media influence [34,35].
Following these findings, developing emotionally responsive RHRs with higher degrees of visual and functional human-likeness has the potentiality to increase affinity and reduce the UV, which may prove essential in enculturating RHRs into society and developing RHRs with higher degrees of human likeness. Nevertheless, little practical research evaluating user preference and the influence of personality and appearance in automated RHRs with embodied artificial intelligence (EAI) exists in HRI. Therefore, this study explores a significant gap in current HRI, EAI, and RHR research and critically investigates this area by outlining the processes utilised in developing EAI personalities for RHRs, substantiated by the results of an HRI experiment measuring the influence of appearance and personality type in HRI.

Embodied Artificial Intelligence and Emotional Artificial Intelligence
Until recently, the primary focus of AI research was on creating algorithms that can compute data that humans are incapable of calculating [36]. However, a shift in AI research and development towards emotionally responsive systems with human-like personalities has become a critical factor in developing AI systems that interact naturally with humans [37][38][39]. Thus, EmoAI applications are better suited for situations that involve human communication and sensitivity compared with traditional forms of robotic AI [40][41][42]. For example, EmoAI systems such as 'Cogito' monitor emotional cues in the user's speech during telephone conversations to respond to users naturally and make human-computer interaction (HCI) more engaging [43]. Comparatively, an EmoAI system named 'Surrportiv' implements natural language processing (NLP) to converse with users. It adjusts the tonality and empathy of the speech synthesis output to help tackle situations such as user stress and anxiety [44].
A similar EmoAI support system named 'Butterfly' utilised in business environments monitors staff stress levels, collaboration, creativity, and conflict between co-workers. The system offers help and guidance to negotiate stressful work situations by sending employees text messages [45]. However, restricting the interactive capabilities of an EAI in HCI to a single stimulus such as speech or text neglects many fundamental communicative processes observed in natural human-human interaction such as facial expressions (FEs), attention, gesturing, and eye contact [46]. Thus, embodying AI within a physical, human-like form capable of these multimodal communicative processes addresses this limitation. This proposition is essential for RHR development as market predictions indicate a substantial rise in the manufacturing, sales, and development of RHRs with EAI over the next five years [71].
For example, industrial robots have the potential to generate feelings of anxiety in the workplace as they cannot provide the emotional support or understanding of human co-workers [92][93][94]. Therefore, a shift towards emotional industrial robots such as 'Sawyer' and 'Baxter' that emulate human FEs is reducing the anxiety between workers and machines in factory environments [95]. These ethical factors are crucial in the sociological integration of humans and RHRs as, according to various studies [96,97], assimilation is a critical aspect in human-robot integration because, the fewer the distinctions between humans and machines, the higher the potential for acceptance [98,99].

AI and Natural Language Processing System Design
The RHRs developed in this study are named Baudi and Euclid and are discussed in press releases [100][101][102]. The RHRs implement Amazon Lex Deep Learning (DL) AI and Amazon Polly speech synthesis (SS) software to converse with people naturally. The personalities and interests of the RHRs reflect their age (appearance and SS) to measure how participants respond to different personality types during HRI. Thus, Baudi's interests are music, food, and travel, and Euclid's interests are in art, poetry, and literature. The DL algorithm in Amazon Lex permits operators to access Amazon's large cloud-based databank of information relating to these topics and add additional information to the data sets, depicted in Figure 2. The conversational AI systems function using a series of intents, which are actions the user wants to perform; sample utterances, which are examples of things people may say to convey the intent; slots and containers, which are data banks of information relating to specific words; and topics and fulfilments, which provide an appropriate answer(s) using the information provided by the user.
Moreover, Amazon Lex is a commercial product designed for business, and the UI is mostly inaccessible compared with other open-source AI applications. This configuration makes developing conversational AI for general-purpose communication problematic as the slots defined by the program are mainly for specific types of business such as restaurants, music shops, and book shops. Therefore, mapping together these functions to create a personality depends on selecting and adapting slots and instances, linking them together, and building an EAI personality type and interests around those features. Furthermore, Amazon Lex limits the maximum number of intents to one hundred per chatbot. This limitation is also problematic as it reduces the scope of potential questioning and has a high potential to produce incorrect or repetitive responses. However, this issue is negotiable by increasing the number of slots, sample utterances, and fulfilments per inquiry.
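To make the intent/utterance/slot/fulfilment structure described above concrete, the following is a minimal sketch of how one such intent could be modelled. The intent name, utterances, slot type, and fulfilment label are hypothetical illustrations, not the study's actual Amazon Lex bot definition:

```python
# Hypothetical sketch of an Amazon Lex-style intent definition.
# All names and values here are illustrative assumptions; they do not
# reproduce the bot configuration used for Baudi or Euclid.
music_intent = {
    "intentName": "AskAboutMusic",            # action the user wants to perform
    "sampleUtterances": [                     # example phrasings of the intent
        "What music do you like",
        "Tell me about your favourite {Genre} bands",
    ],
    "slots": [                                # data the bot collects before answering
        {"name": "Genre", "slotType": "MusicGenres",
         "prompt": "Which genre are you interested in?"},
    ],
    "fulfilment": "LookupFavouriteArtists",   # response logic once slots are filled
}

def count_slots(intent):
    """Return the number of slots an intent must fill before fulfilment."""
    return len(intent["slots"])
```

Broadening each intent with more sample utterances and slots, as the study describes, widens the phrasings a single intent can absorb without consuming more of the one-hundred-intent budget.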
Thus, configuring a separate chatbot for different elements of the robot's personality, emotions, and interests provides a broader range of interaction during HRI. This approach is crucial in designing an EAI personality as Amazon Lex is restrictive owing to its closed-software status and UI design. However, the quality and accuracy of the DL AI system are highly effective for developing conversational AI for HRI. Furthermore, the Amazon Lex AI system integrates automatic speech recognition (ASR), natural language understanding (NLU), dialogue manager (DM), natural language generation (NLG), and SS into a unified system. These components are of significantly greater quality than many open-source third-party applications and function seamlessly on a cloud-based system, which reduces system delay and system load.
Finally, Amazon Lex is compatible with Amazon Rekognition, a machine learning (ML) robotic vision application, to enhance the sensory capabilities of the EAI system for RHR design. However, Amazon Rekognition is expensive and difficult to calibrate and is outside the scope of resources for this study, but may provide a useful enhancement for future studies. Therefore, although Amazon Lex has accessibility issues, the system speed, extensive databases, intuitive DL, cloud-based operating system, highly human-like SS models, and optional machine vision libraries supersede similar conversational AI frameworks such as Google's Dialogflow and IBM Watson.

Human-Robot Interaction Experiment
The objective of the HRI experiment is to measure how participants perceive the authenticity, appearance, likeability, and personality of the RHRs. This research is significant to the field of HRI as there is a lack of up-to-date practical study into user preference implementing automated RHRs. The experiment starts with the participant seated in front of a raised table between two RHRs shown in Figure 3 and two computer screens.
A galvanic skin response (GSR) electrode is attached to the middle and index fingers of the right hand of test subjects to measure skin conductance. The AffdexMe facial expression analysis (FEA) system in the eye of Baudi and a webcam positioned beside Euclid detect and track the test subject's FEs and rate of attention. A second camera, positioned to record the test procedure from a wide angle, monitors the test environment for post-experiment analysis.
The RHRs activate, and the heads and eyes automatically turn towards the participants using the face tracking system developed in Arduino, Microsoft Kinect, and Microsoft Studio. The Pololu microcontrollers activate, and the RHRs start to blink, raise eyebrows, and make subtle cheek movements. The participants press the start button on the handheld controllers and say 'Hello' to initiate the AI system. The first stage of the practical HRI experiment divides into four 5-min procedures founded on the evaluation time of the 1950 Turing test, shown in Table 1. The aim of Sections 1 and 3 of the first stage of the HRI experiment is to assess the performance of the EAI systems and how participants physiologically and psychologically respond to the RHRs using the biometric sensors. The participants speak to the robots using their voice, and when finished, they press a button on a hand-operated controller to notify the AI system that the conversation has finished; there is then a short pause to allow the AI program time to respond.
The manual handheld control system provided more cohesive data input than implementing time/speech-controlled ASR, as stammering, pauses, mumbling, and incomplete sentences had a notable impact on generating incorrect and 'please repeat' responses. This approach is vital in this study as gaining utilisable and concise data is key for assessing and enhancing the EAI personalities.
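The paper does not publish its servo control code, but the Pololu Maestro family of servo controllers (a common choice for animatronic FEs such as the blinks and eyebrow raises described above) accepts a simple serial 'set target' command. The sketch below encodes that command; the channel assignments and pulse widths are illustrative assumptions, not the study's actual wiring:

```python
def maestro_set_target(channel: int, target_us: float) -> bytes:
    """Encode a Pololu Maestro 'set target' command (compact protocol).

    The Maestro expects the servo pulse width in quarter-microseconds,
    split across two 7-bit data bytes after the 0x84 command byte.
    """
    quarter_us = int(target_us * 4)
    return bytes([0x84, channel, quarter_us & 0x7F, (quarter_us >> 7) & 0x7F])

# Hypothetical channel map for one RHR head (illustrative only).
EYELID_CHANNEL = 0

def blink_commands():
    """Command pair for a single blink: close the eyelids, then reopen them."""
    return [maestro_set_target(EYELID_CHANNEL, 1000),   # eyelids closed
            maestro_set_target(EYELID_CHANNEL, 1500)]   # eyelids neutral
```

In practice, each returned byte string would be written to the controller's serial port; randomising the interval between such command pairs yields the subtle, non-repeating facial motion described above.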
Sections 2 and 4 are the gamification elements of the HRI experiment and require the participants to engage with the robots on a much more interactive level. The game implemented in this study is called the animal guessing game and participants are requested to ask the RHRs a series of questions pertaining to the identity of the animal the RHRs are 'thinking' of; for example, if Euclid is thinking of a giraffe, the participants may ask the robot questions such as, 'does it have four legs', 'does it live in the sea', or 'can it fly', and the robot responds accordingly with yes or no answers until the participant can correctly identify the animal. Throughout the first stages of the experiment, the correct, incorrect responses, and 'please repeat' AI responses are recorded in Amazon Lex statistics; these data are then used to improve the robots' AI after each interaction.
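The question-and-answer loop of the animal guessing game can be sketched as follows. The animal set and attribute phrasings are illustrative stand-ins, since the study does not publish its actual Amazon Lex slot data:

```python
import random

# Illustrative knowledge base; the study's actual animal set is not published.
ANIMALS = {
    "giraffe": {"has four legs": True, "lives in the sea": False, "can fly": False},
    "dolphin": {"has four legs": False, "lives in the sea": True, "can fly": False},
    "eagle":   {"has four legs": False, "lives in the sea": False, "can fly": True},
}

class AnimalGame:
    """Yes/no animal guessing game in the style described above."""

    def __init__(self, animal=None):
        # The robot 'thinks' of an animal, either fixed or chosen at random.
        self.animal = animal or random.choice(list(ANIMALS))

    def ask(self, question):
        """Answer a participant's attribute question with 'yes' or 'no'."""
        return "yes" if ANIMALS[self.animal].get(question, False) else "no"

    def guess(self, name):
        """True when the participant correctly identifies the animal."""
        return name == self.animal
```

A real deployment would route each participant utterance through ASR and intent matching before reaching this game logic, with mismatched utterances logged as the 'please repeat' responses recorded in the Amazon Lex statistics.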
The second stage of the experiment is a computerised questionnaire in which participants discuss the HRI experience and personal preference and evaluate the appearance, movement, AI, and speech of the robots. During the questionnaire stage, the robots continue to track the head movements of the participant, perform FEs using the Pololu, and speak to participants if requested to aid in answering the questionnaire. This novel interactive evaluation approach permits test subjects to engage with the RHRs while filling in the survey, compared with detaching the participant survey evaluation from the HRI test. Numerous studies across HRI implement mixed methodological approaches to help discern trends in large data sets and support and question participant taxonomies [3,103-105].
In accordance, this study employs a mixed methodology approach to data analysis, using both quantitative and qualitative methods. This study implements a GSR system and two camera-based FEA applications to measure the participant's emotional responsivity on a fluctuating scale during the HRI examination, which exports into graphs and data fields (.CSV, .TXT, and .PDF). These data are quantitative and reinforced using qualitative inquiries of the HRI survey to verify outcomes. The results of quantitative dichotomous questions are statistically analysed by employing Cronbach's Alpha to examine internal consistency, where lower than 0.60 is unacceptable, between 0.60 and 0.70 is undesirable, between 0.70 and 0.80 is respectable, and >0.90 is excellent [106]. Standard deviation and the margin of error are also factored: 0.0 exact comparison, <1.20 acceptable, and >2.0 unacceptable [107], shown in Supplementary Materials Tables S1-S5.
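The internal-consistency statistic referenced above is computable directly from the questionnaire item scores. This is a standard pure-Python rendering of Cronbach's alpha (the study does not specify which analysis software it used):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for k questionnaire items.

    items: list of k equal-length lists, one per item, each holding
    the scores given by every participant. Uses sample variance (ddof=1).
    """
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    k = len(items)
    total_scores = [sum(col) for col in zip(*items)]   # per-participant totals
    item_variance = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_variance / var(total_scores))
```

On the scale cited above, a returned value below 0.60 would be flagged as unacceptable and one in the 0.70-0.80 band as respectable.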

Population Sample, Recruitment, and Participant Restrictions
The population sample size for the primary survey is 20 test subjects, based on the 20 human interrogators implemented in Ishiguro's total Turing test (TTT) [108] for androids, Harnad's TTT [109], Harnad and Scherzer's robot Turing test (RTT) [110], and Schweizer's truly total Turing test (TTTT) [111] for RHRs. The population sample sizes of earlier Turing test formats are incompatible with this study as those approaches do not evaluate appearance, movement, speech, or AI. Similarly, the sample size of the most compatible and recent research [2] proved unsuitable for this study as the experiment neglects AI, robotic vision, tracking, speech synthesis, and biometric data gathering. Although the population sample for this study is relatively small, the 50-question questionnaire and biometric devices collect large volumes of data to yield confirmable results, in line with Gay's [112] recommendations on sample sizes.
Participants were required to have 20/20 vision or corrected vision to evaluate the authenticity of the RHRs and be aged over 18 years to satisfy the terms of the ethics agreement. Recruitment was open to all individuals of different backgrounds, genders, cultures, abilities, and nationalities. However, the ability to speak fluent English is a condition of this examination for the NLP to function accurately. Per the limitations of the Turing test, a selective recruitment and screening procedure ensured participants had sufficient background knowledge of the field of robotics and AI for effective analysis in mixed method data research [113]. Applications from students and professionals within robotics, AI, and engineering were given preference over applicants in lesser-related areas. A short discussion over e-mail regarding the participants' background, interests, and prior experience of robotics and AI was conducted before acceptance.

Participants Profiles
Per the ethical requirements for approval, all test subjects were over 18 years of age. The total number of experiments conducted was 21; this is owing to one participant not completing the HRI test because of a strong regional accent, which was unreadable by the NLP system. The subject was made aware of this issue and continued using type-written responses. However, this approach did not generate speech synthesis responses from the RHRs, which is vital for the HRI experiment, as Baudi's speech synthesis accent is slow and low in tone and Euclid's intonation is higher with a faster speech rate. Thus, as the test subject was unable to evaluate the RHRs' speech synthesis during HRI, the results were withdrawn from the dataset.
The number of completed experiments is 20, equivalent to the sample of the TTT and TTTT; this includes GSR, FEA, and a complete HRI questionnaire. Table 2 indicates the background of each test subject, gender, age, and previous experience of those taking part in HRI research. As there are no similar HRI studies that implement two automated RHRs with conversational AI, all data are extracted, grouped, and analysed to a high degree for future research. The gender ratio for this study is 11 (55%) males and 9 (45%) females; as there is one more male test subject than female, participant responses and results that are examined by gender group require significant differences in outcomes to be considered viable. The age ranges group into two equitable series of 18-27 (10: 50%) and 28-56 (10: 50%) to provide a comparative analysis between a higher and lower age group. However, as there is a greater diverse age range in the 28-56-year-old group, only outcomes with significant differences in results are considered viable. The analysis of the population sample indicated a high percentage of test subjects of Western ethnicities with English as a native language 18 (90%) compared with other ethnic backgrounds with English as a second language (10%). Therefore, ethnic and cultural diversity is an issue in this data set. However, the study was open to people of all cultures and backgrounds, and no preference was given to individuals from specific cultures or ethnicities. Ninety-five percent of participants indicated no previous experience in taking part in HRI experiments.
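The group proportions reported above follow directly from the participant counts; a small sketch for reproducing them:

```python
def group_shares(counts):
    """Percentage share of each demographic group, rounded to whole percent."""
    n = sum(counts.values())
    return {group: round(100 * c / n) for group, c in counts.items()}

# Figures taken from the study's sample description.
gender = group_shares({"male": 11, "female": 9})     # {'male': 55, 'female': 45}
age = group_shares({"18-27": 10, "28-56": 10})       # {'18-27': 50, '28-56': 50}
language = group_shares({"native English": 18, "English as second language": 2})
```

The same helper applies to the expertise breakdown reported below, which is why the one-percentage-point rounding slips there are easy to spot.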
This result is not surprising per the accessibility to RHRs and the narrow recruitment factors of HRI studies. However, one participant (5%) indicated previous experience in HRI within higher education. All participants underwent a background check, as outlined in the Turing test recruitment procedure. During recruitment, potential participants were asked to disclose their current/previous level of study and background/current experience to establish their expertise in robotics, AI, and relevant fields for eligibility in the HRI experiment.
Thus, 6/20 (30%)-1:PRO (professional), 1:PG (post-grad), 4:UG (undergrad)-participants had previous or current experience in the field of robotics and AI; this percentage is high owing to the related nature of the study and the targeted online recruitment of AI and robotics students and staff members from the university. Further, 5/20 (25%)-2:UG, 3:PG-subjects had experience in areas relating to computing and AI; this includes computer programming, application design, and AI games programming. Further, 5/20 (25%)-4:PRO, 1:UG-participants had experience in engineering and electronics with knowledge of robotic system design. Moreover, 2/20 (10%)-1:PRO, 1:UG-individuals work or study in film and media, with experience in animation and programming.
Two out of twenty (10%)-2:PRO-subjects are professionals in areas relating to human psychology and physiology, and both participants indicated an interest in and personal study of AI. A further 8/20 (40%) of participants are current university students studying undergraduate degrees, and 4/20 (20%) of subjects are undertaking postgraduate degrees at either Master's or Ph.D. level. Lastly, 8/20 (40%) of candidates are professionals within their field with significant personal and industrial experience in AI, robotics, and related areas, as shown in Table 2.
Questionnaire categories: The RHR and EAI user preference and design questionnaire consists of 10 open-ended questions followed by 15 quantitative questions measured on five-point Likert scales covering likeability, human-likeness, and competence, shown in Table 3, based on previous studies [105,114,115]. The qualitative and quantitative HRI questionnaire results are comparatively analysed against the GSR, FEA, and AI analytic data for a comprehensive cross-analysis of all data fields.

HRI Questionnaire
Question 1. Which robot do you think looks most human-like, Baudi or Euclid?
A total of 18/20 (90%) cited Euclid as the more human-like RHR. Of this dataset, 14/18 mentioned that Euclid's skin appeared more human-like, 3/18 conferred Euclid's mouth as the most human-like, and 1/18 cited Euclid's eyes as more aesthetically realistic than Baudi's. Moreover, 1/20 (5%) of results explained that Baudi's hair implants made the RHR look human-like. One out of twenty (5%) of test subjects cited Baudi's skin tone as more human-like than Euclid's. A key theme in the results of Q1 is the aesthetical quality of the silicone skins, which many test subjects testified as having a significant impact on the RHRs' visual authenticity. Both silicone skins were commissioned from the same artist to minimise visible irregularities between the RHRs and aesthetical and material quality. Euclid's skin wrinkles were a recurring theme in participant responses.
RHRs are typically young in appearance and designed without skin imperfections or blemishes [33,100], which is not an accurate representation of the natural tonal variance and ageing of human skin. This aesthetical consideration is significant to RHR design as, although Baudi's skin is not flawless, the RHR has no skin wrinkles and fewer skin imperfections. Therefore, as Euclid is much older in appearance with a greater range of facial defects, the RHR's appearance further blurs the line between human beings and RHRs. More male (M) subjects cited Euclid as human-like than females (F)-Euclid: F8, M10 and Baudi: F1, M1. There was little difference between the lower age group (L = ages 18-27) and the higher age group (H = ages 28-56)-Euclid: L9, H9 and Baudi: L1, H1.
Question 2. Which robotic voice did you prefer/understand better, Baudi or Euclid?
In total, 13/20 (65%) of test subjects preferred the voice of Baudi. Of that dataset, 8/13 explained that Baudi's speech was slower and easier to understand than Euclid's and 5/13 stipulated that Baudi's accent was easier to understand than Euclid's. Moreover, 7/20 (35%) cited Euclid's speech synthesis as the most understandable. Of those results, 3/7 cited the pitch and tone of Euclid's voice as the most human-sounding. A key theme in the HRI results concerned the American accents of the robots, as the speech synthesis program implemented in this study only outputs American accents. This configuration appeared to impact natural speech understanding for non-American listeners. Reactions to the accents were mixed: 2/20 stated the American accents were off-putting, and 4/20 suggested that they liked them.
However, 4/20 of test subjects explained that the American accents were difficult to understand at times. A higher number of male test subjects preferred the speech synthesis of Baudi owing to the speech rate, depth, and tone: M9, F4 compared with Euclid: M2, F5. Age-related statistics indicated that the lower age group preferred the speech synthesis accent of Baudi: L9, H4 compared with Euclid: L1, H6.
Question 3. Did you feel empathy/emotion towards Euclid or Baudi?
Here, 8/20 (40%) explained that they did not feel empathy or emotions towards the RHRs because the appearance, movement, and AI did not feel entirely human. However, out of that dataset, 3/8 advocated that, although they did not feel any emotion, they felt relaxed, happy, and calm during HRI with the RHRs, counter to the UV. Moreover, 5/20 (25%) cited that they felt no emotion or empathy because the robots were just heads without a body, which made them feel inhuman.
This outcome is justifiable with the current configurations of the RHRs, as time limitations prohibited the design, development, and production of a robotic body, and implementing a static mannequin-type body was difficult owing to spatial constraints. Further, 5/20 (25%) of test subjects suggested that they felt empathy towards Baudi owing to his unnatural appearance, which resembled human disability. A further 2/5 explained that Baudi's eyes played a role in instigating an emotional response. Moreover, 2/20 (10%) felt that Euclid's realistic appearance made them feel awkward and intimidated, which coincides with the UV. However, this sample is not sufficient in size to support the existence of the uncanny valley.
A higher number of female subjects felt an emotional response towards the RHRs compared with male subjects-F6, M1-and a greater number of males indicated no emotional responsivity-F3, M10. Out of that dataset, F5 cited an emotional response when interacting with Baudi and F1, M1 cited an emotional response during HRI with Euclid. Similarly, there was a notable difference in the age-related statistical analysis. L8, H5 cited no emotional response and L2, H5 suggested an emotional interaction.
Out of that dataset, L2, H3 mentioned feeling an emotional response when interacting with Baudi and L0, H2 with Euclid.
Question 4. How did you feel when Baudi made eye contact with you?
Here, 8/20 (40%) of test subjects stated that they felt nothing resembling an emotional response when Baudi made eye contact with them. Of this dataset, 5/8 indicated that they did not feel the same way as when making eye contact with other humans. A further 3/8 suggested that the eye contact interaction was not consistent to make it feel like the RHR was looking at them naturally.
Moreover, 12/20 (60%) stated that they felt an emotional response when making eye contact with Baudi, and 7/12 suggested that it felt as if the robot was looking back at them. Of these results, 3/12 cited Baudi's eye aesthetics as making the eyes appear creepy. Another 2/12 explained that the pupil dilation function made the eyes look humanlike. Gender statistical analysis indicated a higher percentage of female participants felt an emotional response to Baudi's eyes than male test subjects-F7, M1 compared with no emotional response F2, M10. However, the results of the age-related statistical analysis indicated little correlation between an emotional response L3, H5 and no emotion L7, H5.
Question 5. How did you feel when Euclid made eye contact with you?
Here, 13/20 (65%) suggested that they did not feel anything when Euclid made eye contact with them. A further 8/13 argued that making eye contact with Euclid was like looking into the glass eyes of a doll or the eyes of a painting. Moreover, 3/13 suggested that Euclid's eyes were not real enough to instigate an emotional reaction. Another 2/13 gave no reasoning other than to state that they did not feel anything relating to an emotional response. A further 7/20 (35%) suggested an emotional response to making eye contact with Euclid. Another 4/7 cited that they felt like the robot was looking back at them, or another human was looking at them through a camera. Another 2/7 advocated that Euclid had dead eyes, which made them feel unnerved. Finally, 1/7 explained that Euclid maintained constant eye contact with them, which made them feel unsettled. Gender statistical analysis suggested no correlation between male and female test subjects-no emotional response: F5, M8 and emotional response: F4, M3. Similarly, the results of the age-related statistical analysis indicated no discernible relationships between no emotional response L6, H7 and emotional response L4, H3.
Question 6. Which robot head do you think moved most naturally, Euclid or Baudi?
Here, 16/20 (80%) of candidates cited Euclid's head as moving the most naturally. Of this dataset, 9/16 explained that they felt a greater natural interaction when communicating with Euclid compared to Baudi. A further 4/16 stipulated that Euclid exhibited a more extensive range of FEs.
Moreover, 2/16 mentioned Euclid's mouth as a decisive factor in the RHRs naturalistic movement, and 1/16 of outcomes suggested that Euclid's head moved more naturally. Another 4/20 (20%) explained that they felt Baudi exhibited a more extensive range of natural movement. A further 3/4 of test subjects advocated that Baudi exhibited a broader range of FEs. Finally, 1/4 suggested that Baudi made consistent eye contact with them.
These outcomes suggest that Euclid functioned with a higher range of movement than Baudi. However, both robots are of the same build, implement the same tracking and face detection applications, and operate using a single Kinect system paired between the two robots to minimise interference. Therefore, there should be few distinctions between the functionality of the RHRs apart from the eye and mouth components. Thus, it is probable that the appearance and skin movement of the RHRs influenced how test subjects perceived the movement of the RHRs, as both robots instigate eyebrow raises and cheek raises at random intervals using the Pololu microcontroller. Thus, it is probable that, during individual sessions, one robot expressed a more extensive range of these functions owing to the randomisation of the FEs. This configuration may account for the differences in emotive HRI as the RHRs function differently with a wide range of variables during each session. These outcomes support the findings of the literature review that suggest poor aesthetical quality supersedes quality functionality, and vice versa [20,21]. Gender analysis indicated that more male test subjects cited Euclid's movement (M10, F6) as more naturalistic than Baudi's (F3, M1). However, age-related statistics inferred little discernible difference between the naturalistic movement of Euclid (L8, H8) and Baudi (L2, H2).
Question 7. Which robotic head expressed the greatest range of facial expressions?
Here, 11/20 (55%) advocated that Euclid performed the broadest range of FEs. Another 4/11 suggested that Euclid's eyebrow raises and cheek lifts were prominent, 2/11 explained that Euclid appeared to respond to questions with FEs, 2/11 argued that Euclid's skin wrinkles and skin stretching made FEs appear more realistic, and 3/11 stipulated that Euclid portrayed a wider range of emotive FEs than Baudi. A further 8/20 (40%) suggested that Baudi displayed the broadest range of FEs. Another 5/8 argued that Baudi appeared to synchronise FEs with AI responses more coherently. Another 3/8 claimed that Baudi performed the widest range of FEs. Finally, 1/20 (5%) of test subjects advocated that they could not distinguish which RHR displayed the broader range of FEs.
These outcomes suggest that natural skin aesthetics play a significant role in displaying FEs in RHRs. This approach is vital in developing more human-like RHRs, as typical methods in RHR design neglect skin blemishes and wrinkles because these features do not fit the paradigm of cutting-edge technologies. Gender-based analysis suggested that a greater number of male participants thought Euclid expressed the broadest range of FEs (F2, M9), while F6, M2 stated that Baudi displayed the most FEs (F1 unclassified). Age-related analysis indicated that more subjects in the higher age group thought Euclid displayed the most FEs (EL: 3, EH: 8; BL: 6, BH: 2; L1 unclassified).
These results correspond with existing literature on FE recognition in human psychology. For example, a comprehensive study measuring FE recognition [117] discovered that older adults focus on the mouth area when gauging emotive FEs, whereas younger individuals concentrate on the eyes. This consideration is significant as the robotic mouths are more articulated and expressive than the robotic eyes, per the results of the literature review (practical versus virtual HRI for evaluating the UV). Thus, 'own age advantage' may account for test subjects in the higher age group recognising more FEs during HRI, as their attention is on the mouths of the RHRs rather than the eyes.
Question 8. Which AI did you prefer communicating with during the conversation?
Here, 13/20 (65%) preferred to interact with Euclid's AI. Of this dataset, 4/13 indicated that they felt Euclid had more relatable interests and hobbies than Baudi. Another 7/13 cited that they felt Euclid gave more accurate and fully formed responses than Baudi, and 2/13 advocated that Euclid's personality was more engaging than Baudi's. A further 7/20 (35%) preferred Baudi's AI system: 3/7 of that dataset cited that Baudi gave more accurate and full responses, and 2/7 stated that they had more in common with Baudi's interests. Moreover, 2/7 explained that they felt Baudi was the more engaging RHR. During HRI, many individuals attempted to deceive the RHRs into giving incorrect responses by asking questions outside the scope of the AI system. For example, one participant asked Baudi, "How many hairs do you have on your head?", which produced an incorrect response of "My eyebrows are made of human hair". More male test subjects (M8, F5) preferred to converse with Euclid, while M3, F4 preferred Baudi; the higher age group favoured Euclid's AI and the lower age group preferred Baudi's (EL: 4, EH: 9; BL: 6, BH: 1).
Question 9. Which AI system did you prefer during the guessing game?
Here, 11/20 (55%) preferred Baudi's conversational AI during the guessing game session. Another 8/11 cited Baudi as the more competent RHR. A further 2/8 of candidates advocated that Baudi had a broader knowledge base, and 1/8 stated that Baudi was more engaging. Moreover, 9/20 (45%) cited that they preferred to interact with Euclid during the guessing game session. Finally, 5/9 argued that Euclid produced more correct answers, 3/9 stated that Euclid had a wider knowledge base, and 1/9 explained that Euclid was more naturalistic during HRI.
As with the previous test, several subjects asked unusual or irrelevant questions to deceive the AI system. Male subjects preferred to interact with Baudi (M8, F3) and females with Euclid (M3, F6).
However, there was little notable difference in the age-related factors (Baudi: L5, H6; Euclid: L5, H4).
Question 10. Did you prefer communicating with an older or younger-looking robot?
Here, 12/20 (60%) preferred to communicate with Euclid. Of this dataset, 5/12 explained that they felt more comfortable interacting with an older-looking robot because they felt trust towards an RHR with an aged appearance. Moreover, 2/12 suggested they preferred to interact with an older-looking RHR as it was less intimidating than a younger model. A further 3/12 suggested that Euclid reminded them of older relatives, which made HRI less intimidating. Another 2/12 stated that, as they felt Euclid looked more human-like than Baudi, Euclid's highly realistic appearance made communication more naturalistic. Another 8/20 (40%) advocated that they preferred to communicate with Baudi, and 5/8 explained that Baudi appeared less threatening because his appearance was softer.
Moreover, 2/8 indicated that they felt Baudi was more engaging than Euclid, and 1/8 suggested that, as they interact more with younger people daily, they felt a greater connection with a young-looking RHR. These results indicate that participant outcomes were closely divided between communication with a younger-looking RHR and an older-looking RHR. This finding is intriguing as it correlates with the age-related statistics, which indicate that the higher age group favoured interaction with Euclid (H8, L4) and the lower age group preferred HRI with Baudi (H2, L6). This outcome is vital to this study as the results suggest that younger test subjects felt a greater connection with a younger-looking RHR, and the older age group felt a greater connection with an older-looking RHR. Moreover, although Baudi is designed to look younger than Euclid, neither RHR is assigned a specific age, meaning participants were left to decide how old the robots appeared to them. Interestingly, several subjects interpreted the appearance of the RHRs as threatening or intimidating, depending on how old they looked. This result may relate to how young people perceive older adults, and older people perceive younger individuals, in society. Gender factors did not influence decision making: F5, M7 preferred to interact with a robot with an elderly appearance, and F4, M4 preferred HRI with a younger-looking robot.

Comparative Analysis of Qualitative and Quantitative Results (All Test Subjects)
Series 1A. Likeability: (Q11, Q14, Q17, Q20, and Q23). Baudi rated as more likeable than Euclid in 4/5 questions (Euclid: 1/5). Internal consistency was high in this subset (α = 0.8), indicating consistent participant responses. These results confirm the outcomes of Q3, Q6, and Q7, in which candidates cited a more significant emotional response to Baudi during HRI. However, they contrast with the results of Q10, where test subjects preferred to interact with an older-looking RHR (Euclid) over a younger-looking RHR (Baudi). Collectively, these outcomes conclude that participants found Baudi more likeable than Euclid.
Series 2A. Human-likeness: (Q13, Q15, Q18, Q21, and Q25). Test subjects rated Euclid as more human-like than Baudi in 4/5 questions (Baudi: 1/5). Coefficient factors were high in the subset (α = 0.9), suggesting a high level of consistency in participant outcomes. However, 1/5 of outcomes (Q15) received an equivalent rating on the five-point Likert scale (3). These results validate the findings of Q1 of the HRI questionnaire, where test subjects cited Euclid as appearing, functioning, and moving more human-like than Baudi. Therefore, these outcomes collectively conclude that test subjects found Euclid more human-like than Baudi.
Series 3A. Competency: (Q12, Q16, Q19, Q22, and Q24). Test subjects rated Euclid as more competent than Baudi in 3/5 questions (Baudi: 2/5). However, 2/3 of these (Q19 and Q24) rated equivalent (3) on the five-point Likert scale. Internal consistency was acceptable (α = 0.7), suggesting some variance in participants' responses. These results align with the findings of Q6–Q8, which cite Euclid as moving and responding to participants more competently than Baudi. However, they contrast with the outcomes of Q2 and Q13, which suggested Baudi's AI functioned with a higher degree of competency during the guessing game session and greater clarity in speech synthesis during verbal HRI. These results collectively confirm that participants rated Euclid as the more competent RHR during HRI.
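The internal consistency figures quoted for each Series can be reproduced with a standard Cronbach's α calculation. The sketch below is illustrative only: the Likert scores are invented, not the study's data, and `cronbach_alpha` is a hypothetical helper rather than the authors' analysis code.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for k questionnaire items.

    items: list of k lists; each inner list holds one item's
    Likert scores across all participants (equal lengths).
    """
    k = len(items)
    item_var_sum = sum(pvariance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-participant totals
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Invented five-point Likert scores: five questions, five participants
q_scores = [
    [4, 5, 4, 4, 5],
    [4, 4, 5, 4, 5],
    [3, 4, 4, 4, 5],
    [4, 5, 5, 4, 4],
    [4, 4, 4, 5, 5],
]
print(round(cronbach_alpha(q_scores), 2))
```

Values around 0.7 and above are conventionally read as acceptable internal consistency, which is how the α figures in Series 1A–3A are interpreted here.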
These findings correlate with the results of Q3, emotions during HRI (E: L2, H3; B: L0, H2), and Q6–Q7, eye contact interaction (Euclid: L6, H7; Baudi: L3, H5). In Q14, the lower age group preferred HRI with Baudi and the higher age group with Euclid (E: H8, L4; B: H2, L6). Therefore, these results collectively substantiate and validate the findings that the younger age group preferred HRI with Baudi, and the more mature age group preferred HRI with Euclid.
Series 2C. Human-likeness: (Q13, Q15, Q18, Q21, and Q25). Euclid rated L: 4/5 and H: 3/5, and Baudi L: 0/5 and H: 1/5. Coefficient factors were high in the dataset (α = 0.9), suggesting a high level of consistency in participant responses. These results correspond with the outcomes of Q1, where both age groups cited Euclid as the more human-like RHR (E: L9, H9; B: L1, H1). In Q4, both age groups suggested Euclid's eye movement as the more aesthetically realistic (E: H8, L6; B: H2, L2; L1 unclassified).
However, these results conflict with the results of Q5, as the higher age group preferred the eye movement of Euclid (E: H8, L4; B: H2, L6). These results collectively conclude that both age groups rated Euclid as the most human-like RHR.

Analysis of FEA and GSR Biometric Data Feeds
The following section examines the FEA and GSR biometric data feeds extracted from test subjects during the HRI experiment. The FEA system reports the following FE channels on a 0–100% scale:

1. Rest (0–100%): the default position of the FEA system.
2. Frown (0–100%): measures negative FEs by examining the position of the lips, cheeks, and eyebrows.
3. Smile (0–100%): as with the frown function, the system measures the correlation and position of the lips, cheeks, and eyebrows to determine whether, and to what extent, the test subject expresses a positive facial expression.
4. Disengage (0–100%): engages when the system is unable to track the user.
5. Attention (0–100%): measures the frequency of the test subject's eye positions relative to the camera position to monitor attention rates.
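To make the five channels concrete, the sketch below shows one possible representation of a single FEA sample. The `FEAFrame` structure and its field names are hypothetical, not the vendor's actual schema.

```python
from dataclasses import dataclass

@dataclass
class FEAFrame:
    """One sample from the facial-expression-analysis feed.

    All channels are percentages (0-100), following the five
    states listed above. Illustrative structure only.
    """
    rest: float
    frown: float
    smile: float
    disengage: float
    attention: float

    def dominant_state(self):
        """Return the expression channel with the highest reading."""
        channels = {"rest": self.rest, "frown": self.frown, "smile": self.smile}
        return max(channels, key=channels.get)

# Invented sample values in line with the averages reported below
frame = FEAFrame(rest=18.0, frown=7.0, smile=9.0, disengage=2.0, attention=72.0)
print(frame.dominant_state())
```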
The GSR data combined with the FEA data provide an overview of the type and level of stress experienced by the test subjects at any given moment. This combinatory approach permits the verification of positive and negative emotional states across the GSR and FEA data, eliminating interference from other facial movements such as speech, gesturing, and facial tics that may register as a positive or negative facial expression in the FEA application. This method acts as a low-pass filter and permits more accurate measurement of the frequency, duration, and range of emotive FEs for comparative analysis between the AI results and the findings of the HRI questionnaire.
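The cross-verification idea above can be sketched as follows, assuming hypothetical event/sample formats and illustrative window and threshold values: an FEA expression event is kept only when the GSR shows elevated activity nearby in time, discarding expression-like artefacts (speech, tics) that lack an arousal response.

```python
def verified_events(fea_events, gsr_samples, window=2.0, threshold=3.0):
    """Keep only FEA events backed by a concurrent GSR response.

    fea_events: list of (timestamp_s, label) tuples, e.g. (12.4, "smile")
    gsr_samples: list of (timestamp_s, microsiemens) tuples
    An event survives if any GSR reading within +/-window seconds
    exceeds the threshold. Window and threshold are illustrative.
    """
    out = []
    for t, label in fea_events:
        backed = any(
            abs(ts - t) <= window and value > threshold
            for ts, value in gsr_samples
        )
        if backed:
            out.append((t, label))
    return out

# Invented feeds: the frown at 25 s has no matching GSR spike and is dropped
fea = [(10.0, "smile"), (25.0, "frown"), (40.0, "smile")]
gsr = [(9.5, 4.2), (25.5, 1.1), (41.0, 5.0)]
print(verified_events(fea, gsr))
```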

All Test Subjects: 5-Min Conversation. FEA and GSR Data Analysis
The results of the FEA biometric data analysis indicated that test subjects spent more time in the rested state when interacting with Euclid (Avg 18%) than with Baudi (Avg 15%), suggesting participants held a neutral facial position for longer when interacting with Euclid. Interestingly, although the FEA differences were marginal, internal consistency was low across the dataset (α = 0.04 to 0.5), suggesting very high levels of variance. Therefore, per the FE data analysis, although test subjects expressed a wide range of FEs during the 5-min AI conversation experiment, these did not significantly change in frequency or intensity between the RHRs. However, the GSR data analysis indicated higher levels of stress during HRI with Euclid (Total Avg, 17 µS). The GSR results contradict the FE results; therefore, it is likely that, although female test subjects displayed less FE than males, female subjects responded with higher electrodermal 'stress' activity. The FEA test results support the outcomes of Series 1B (likeability) and Series 3B (competency), and, per the UV, the GSR results support the findings of Series 2B (human-likeness), as both gender groups rated Euclid as human-like at a rate of 4/5.

All Test Subjects, 5-Min Guessing Game, Biometric Data Analysis
The FEA results suggest that candidates spent more time in a neutral FE state when interacting with Baudi during the 5-min guessing game session (B: 13%, E: 14%). The FEA results correspond with Series 1B (likeability) and Series 3B (competency), as male subjects rated the RHRs higher than females. However, the FEA data conflict with the outcomes of Series 2B (human-likeness), which cited both gender groups as experiencing similar emotions during HRI. GSR data analysis reinforces the results of the FEA data and questionnaire, as male subjects discharged higher levels of electrodermal activity than female participants (M: Total Avg, 15 µS). The average level of attention was higher in the lower age group (L: Avg 73%, H: Avg 71%), and the higher age group exhibited more disengaging behaviour (L: 19, Avg 2; H: 22, Avg 2), suggesting similar levels of accuracy in both age groups. Coefficients were very low across the dataset (α = −0.05 to 0.6), indicating a high level of variance in the FEA test results. These outcomes support the findings of Series 1C (likeability), where the higher age group rated a more positive HRI experience with Euclid. Furthermore, the FEA outcomes verify the results of Series 2C (human-likeness), where the lower age group rated Euclid as more human-like, and Series 3C (competency), where both age groups cited similar positive and negative emotions during HRI.
The results of the GSR data analysis indicated greater levels of electrodermal activity in response to positive and negative stimuli in the higher age group: L: Total Avg, 22.5 µS (Neg stimuli: Avg 4.4 µS/Pos stimuli: Avg 5.1 µS) and H: Total Avg, 13.2 µS (Neg stimuli: Avg 6.2 µS/Pos stimuli: Avg 3.4 µS). The GSR outcomes add validity to the findings of the FEA data analysis as they suggest that the lower age group exhibited greater negative emotions during HRI with Euclid compared with the senior group, as shown in Figure 8. The FEA results indicate that the lower age group spent more time in a rested FE state (L: 17%, H: 14%). Negative FEA intensity was proximal in both age groups; however, negative FEA frequency was marginally greater in the higher age group: L: (Avg INTST 7%, Fq: 101, Avg: 10), H: (Avg INTST 7%, Fq: 112, Avg: 11). These results suggest that both age groups experienced similar negative emotions during HRI with Baudi. Conversely, positive FEA was significantly greater in the higher age group, H: (Avg INTST 9%, Fq: 103, Avg: 10), L: (Avg INTST 8%, Fq: 97, Avg: 10), indicating that the higher age group expressed more positive emotional HRI with Baudi.
The average attention level was higher in the lower age group (L: Avg 73%, H: Avg 71%), and the higher age group exhibited significantly more disengaging behaviours (H: 33, Avg 3; L: 19, Avg 2). These results suggest greater concentration during HRI, and higher accuracy in the results, of the lower age group. Coefficient factors were relatively low in the dataset (α = 0.2 to 0.6), suggesting a high level of variability in the FEA test results. GSR data analysis indicates that, on average, the lower age group exhibited greater electrodermal activity than the higher age group: L: Total Avg, 16.8 µS (Neg stimuli: Avg 2.4 µS/Pos stimuli: Avg 4.1 µS) and H: Total Avg, 7.9 µS (Neg stimuli: Avg 3.1 µS/Pos stimuli: Avg 2.5 µS). However, both age groups discharged similar levels of negative electrodermal activity, and the higher age group exhibited greater positive GSR readings, which support the outcomes of the FEA data analysis, as shown in Figure 9. These results collectively support the findings of Series 2C (human-likeness) and Series 3C (competency), as test subjects cited similar ratings and positive and negative emotions during HRI with the RHRs. More significantly, they support the outcome of Series 1C (likeability), which suggests that the lower age group preferred HRI with a younger-looking RHR.
However, male subjects exhibited significantly more disengaging behaviour than female subjects (M: 32, Avg 3; F: 15, Avg 2), suggesting greater accuracy and attention in the results of female subjects. The average attention rates support these outcomes, as female participants recorded higher levels of concentration than males (M: Avg 71%, F: Avg 72%). GSR data analysis indicated that female test subjects demonstrated higher levels of positive and negative electrodermal activity than male candidates: Males: Total Avg, 9 µS (Neg stimuli: Avg 3.6 µS/Pos stimuli: Avg 1.5 µS) and Females: Total Avg, 14 µS
(Neg stimuli: Avg 3.1 µS/Pos stimuli: Avg 2.4 µS), as shown in Figure 10. The FEA and GSR results add validity to the results of the 5-min topical conversation session with Euclid, where male subjects expressed greater emotive FE and female participants expressed greater emotional GSR during HRI. These results support the findings of Series 2B (human-likeness), where both gender groups rated comparable feelings and emotions during HRI (Figure 11). Interestingly, these results correlate with Baudi's FEA and GSR results during the 5-min topical conversation session. Therefore, it is highly likely that male test subjects expressed greater positive and negative emotive FE, and female participants dispersed higher levels of positive and negative electrodermal activity, during HRI with Baudi. These outcomes are significant as the results cite notable differences between the gender groups in FEA and GSR data, which adds greater validity to the results of Series 3B (competency) and Series 3C, where the higher age group rated Euclid as more human-like (per the UV).
Internal consistency was low, ranging between α = 0.01 and 0.5, indicating high variability in the FEA data. The average attention rate was greater in the higher age group (L: Avg 69%, H: Avg 73%). However, the lower age group exhibited more disengaging behaviours (L: 29, Avg 3; H: 20, Avg 2). These results suggest higher attention levels in the higher age group and greater distraction in the lower age group. The FEA results contrast with the 5-min topical conversation session AI data. A probable cause for this shift is the parameters of the gamification session, which requires concentration and strategy, compared with the 5-min topical conversation session, which is relaxed and unstructured. GSR readings indicate that the lower age group exerted higher levels of positive electrodermal activity than the higher age group, and negative GSR was also higher in the lower age group (H: Total Avg, 15 µS). However, the lower age group exhibited higher GSR and FEA readings in the gamification session than in the topical conversation component. These results suggest the lower age group experienced a more immersive and visceral HRI during the guessing game session compared with the more mature age group, which exerted higher biometric feedback during the verbal communication component. These outcomes are explored and comparatively analysed against the Amazon Web Services (AWS) AI data in the following section.

Topical Conversational and Guessing Game AI Data Analysis
The following section examines the results of the 5-min topical conversation and the guessing game AI sessions recorded in AWS during HRI. The results are categorised and cross-analysed with the HRI questionnaire results and FEA and GSR data to validate findings.

Euclid and Baudi: All Test Subjects, Topical Conversational AI Data
Euclid answered 283/545 (52%) questions correctly (Figure 14), and Baudi achieved a lower rating of 241/497 (48%) (Figure 15). Euclid produced incorrect responses at a rate of 158/545 (28%) and Baudi 153/497 (31%), and Euclid executed the 'please repeat' command at a rate of 144/545 (20%) compared with Baudi's 105/497 (21%). These results do not tally with the findings of Q8, as subjects cited Euclid as providing more correct responses (E: 7/13, B: 3/7). However, the AI test results support the outcomes of Series 3A, as subjects rated Euclid as the more competent RHR (incorrect and repeat responses), and the FEA and GSR data, as subjects registered higher levels of positive biofeedback during HRI with Baudi.
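The percentage breakdowns reported here follow directly from the raw response counts. The helper below is illustrative; the counts passed in are invented round numbers, not the study's totals.

```python
def response_rates(correct, incorrect, repeat):
    """Percentage breakdown of AI responses (correct, incorrect,
    'please repeat') from raw per-session counts."""
    total = correct + incorrect + repeat
    return {
        "correct": round(100 * correct / total),
        "incorrect": round(100 * incorrect / total),
        "repeat": round(100 * repeat / total),
    }

# Hypothetical counts for a single session
print(response_rates(50, 30, 20))
```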

Euclid and Baudi: Males and Females, Topical Conversation AI Data
Male test subjects achieved a greater number of correct AI responses than females during HRI with Euclid, EM: 146 (47%) and EF: 136 (55%), depicted in Figure 16, and similarly with Baudi, BM: 133 (45%) and BF: 108 (54%), in Figure 17. The AI test results support the outcomes of Q9, as more male test subjects (M8, F5) preferred to converse with Euclid, and similarly in the results of Baudi (M3, F4), as Euclid achieved greater accuracy (E: 283, B: 241). However, males scored a higher rate of incorrect responses than females, EM: 96 (31%), EF: 62 (25%) and BM: 88 (30%), BF: 65 (32%), and in instigating the 'please repeat' response, EM: 66 (22%), EF: 48 (20%) and BM: 72 (25%), BF: 33 (16%), which suggests male test subjects asked a higher number of questions outside the scope of the AI system. These results support the findings of Series 3B (competency), as male candidates rated Euclid as the more competent RHR, and correlate with the gender-based FEA and GSR biometric data results during the topical conversation session (Figures 18 and 19). However, the AWS AI findings contradict the results of Series 3C (competency), as the higher age group rated Euclid as the most competent RHR and the lower age group cited Baudi as the more competent RHR. Nevertheless, these outcomes support the findings of the biometric data analysis, as the higher age group exhibited greater FEA, GSR, and attention during the topical conversation session.

Euclid and Baudi: All Test Subjects, Guessing Game AI Data
The objective of the AI guessing game was to engage subjects with a more challenging mode of HRI to examine how the DL AI functioned at a faster rate of processing. The AI results indicate that Baudi achieved a marginally higher accuracy rating than Euclid, B: 705/1235 (57%) and E: 685/1223 (56%). However, Baudi generated a higher sum of incorrect answers than Euclid, B: 257/1235 (21%) and E: 245/1223 (20%), and Euclid generated more 'please repeat' replies, E: 292 (24%) and B: 273 (22%), as shown in Figures 20 and 21. These results support the findings of Series 3A (competency), as test subjects rated the RHRs as competent, E: 3/5 and B: 2/5 (with 2/5 equivalent ratings of 3). Significantly, the results of Q9 (guessing game) conflict with the findings of Series 3A, yet validate the AWS AI data. Thus, Euclid rated as more competent in movement and Baudi in AI interaction during the guessing game session. Furthermore, the biometric data analysis validates the AI results, as Baudi rated higher in positive FEA/GSR than Euclid during this session. These AI test scores support the outcomes of Q9, as M8, F3 preferred to interact with Baudi and M3, F6 with Euclid, and validate the outcomes of Series 3B, as male subjects rated Euclid as the more competent RHR. Furthermore, the results of Series 3B (competency) cited high variability in participant responses, which parallels the frequencies of incorrect and 'please repeat' responses indicated in AWS AI analytics (Figures 24 and 25). However, there was little statistical difference in the age-related results of Q9 (Baudi: L5, H6; Euclid: L5, H4). These results conflict with the outcomes of Series 3C (competency) owing to the high variance in participant outcomes. However, the biometric data analysis indicated higher levels of positive feedback in the lower age group than in the more senior age group, which aligns with the AI response data.
Significantly, the level of attention was greater in the higher age group during the guessing game session, which validates the AWS AI data. Nevertheless, these outcomes are difficult to substantiate owing to the high level of variance between the results of the HRI questionnaire, FEA, GSR, and AWS AI data.

Conclusions
The results of this study show that the younger age group preferred HRI with a younger-looking RHR and the more senior age group preferred HRI with an older-looking RHR, reinforced by the FEA/GSR data. These results provide a strong foundation for modelling age-appropriate RHRs in fields such as care for the elderly (older-looking RHRs) and social companionship for the young, vulnerable, and isolated (younger-looking RHRs). Furthermore, test subjects stated that they felt greater trust towards an older-looking RHR compared with a more youthful RHR, as it reminded them of the trust and care of older relatives. Similarly, the gender-based analysis indicated that male test subjects preferred HRI with Euclid (older in appearance) and females with Baudi (younger in appearance). Although there is little scientific evidence to substantiate the reasoning behind this outcome, it is an interesting finding that requires further study.
Upon review of the AWS AI data, male subjects asked more questions outside the scope of the AI system compared with female test subjects. This anomaly was observed during HRI and identified in the outcomes of the questionnaire when test subjects purposefully attempted to deceive the AI system into giving incorrect responses. Thus, more male subjects prompted the RHRs into giving incorrect responses compared with female participants. This outcome coincides with the FEA data analysis as male subjects expressed significantly greater disengaging behaviours and reduced attention levels than female participants. Thus, it is plausible that, as the RHRs are modelled on the male form, male participants felt a need to test the capacity and abilities of the system more than female subjects. This hypothesis may account for the irregularities between the HRI questionnaire and biometric data results as, in the survey, many male subjects cited feeling no emotions during HRI. However, these outcomes contradicted the FEA data that firmly showed male subjects exerted significantly higher levels of positive and negative FE biometric feedback than females.
Finally, another possible method of evaluating RHRs is to compare two identical systems side by side. This method would reduce influential factors such as different ages and gender and narrow the scope of evaluation to specific variables such as static and dilating pupils. However, creating identical RHRs is incredibly difficult owing to the variability in materiality, aesthetics, and application of hand-made silicone skin onto the underlying exoskeleton. This visual irregularity is observable in the RHR 'Sophia', as new generations of the RHR look distinctly different from the first owing to the application and reapplication of the silicone skin.
Nevertheless, as in this study, comparing RHRs with different appearances, genders, and personalities permits the investigation of gender and age in terms of user preference, which yielded highly interesting and intriguing results for modelling user preference in EAI and appearance in future RHR design.

Employing FEA and GSR to Support Questionnaire Results
The combinatory approach to biometric data gathering, camera feed analysis, and AI analytics provided critical insights into the behaviours and emotions of test subjects not cited in the HRI questionnaire. For example, male participants expressed significantly greater positive and negative emotive FE than females during HRI, yet the majority of male subjects cited feeling no emotions in the HRI questionnaire. Similarly, the lower age group expressed greater levels of negative FEA and GSR during the 5-min topical conversation and guessing game AI sessions, which correlates with the AI test data, as the lower age group achieved a lower number of correct answers and a higher number of incorrect and 'please repeat' responses during HRI. However, these results conflict with the questionnaire outcomes, as there were few discernible differences in participant results and male subjects stated feeling little to no emotion during HRI. Furthermore, the implementation of the GSR and FEA systems did not restrict the participants' freedom of movement or accessibility to the RHRs.
This outcome is significant for future HRI studies as these non-invasive methods of biometric data gathering permitted test subjects to interact and communicate with the RHRs naturally, without any obstruction or interference from the biometric sensors. Employing a mixed-method biometric data gathering approach permitted a more precise analysis of the data feeds, as high and low levels of FEA were confirmable by the GSR data, and a drop/loss of GSR signal was verifiable with the FEA data and camera feed. This method provided consistent and valid biometric data for analysis, as anomalies in the data feed could be attributed either to extremely high or low levels of emotional stimulus or to issues with the connectivity of the biometric systems. Finally, an issue with the GSR and FEA mixed-method approach was the delay in data processing, as the GSR lagged behind the FEA, which resulted in a time offset of approximately ±5 s between the feeds. However, this issue could be mitigated by examining the Hz frequencies of the devices to determine the processing rate and electrode sensitivity of the systems, helping synchronise the data feeds. Moreover, the Esense GSR application provides the precise offset of the stimulus to the biometric readings, in seconds and milliseconds, which also helped precisely align the two datasets for comparative analysis.
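The timestamp correction described above can be sketched as a simple shift of one feed's clock, assuming the lag has already been read from the GSR application; the sample data and offset below are hypothetical.

```python
def align_feeds(gsr, offset_s):
    """Shift GSR timestamps by the measured lag so they line up
    with the FEA feed.

    gsr: list of (timestamp_s, value) pairs.
    offset_s: the lag reported by the GSR application
    (positive = GSR running behind the FEA clock).
    """
    return [(t - offset_s, v) for t, v in gsr]

# Hypothetical feed lagging ~5 s behind the FEA clock
gsr_raw = [(15.0, 2.1), (16.0, 2.4), (17.0, 3.0)]
print(align_feeds(gsr_raw, 5.0))
```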

Ethical and Broader Issues in RHR Design and Application in Society
The outcomes of this study provided critical data regarding how the test subjects perceive the future of RHRs in society. One key area that test subjects highlighted as a suitable application for RHRs with EAI was the care and support industry, as the RHRs provided useful and entertaining content in a manner proximal to human-human communication. In support, 45% of participants stated that, as the RHRs cannot make personal judgements or become offended, HRI was more relaxed, honest, and free-flowing than human-human communication. This outcome correlates with previous studies [118][119][120] and the results of the HRI age-related statistical analysis, which suggested younger test subjects preferred HRI with a younger-looking RHR and the more mature age group with an older looking RHR. This study covered two key areas of RHR and EAI ethics relating to robot rights and the fallibility of human perception in RHR design. Firstly, 75% of candidates stated that RHRs should be treated as machines, regardless of human-likeness and intelligence. However, this approach is susceptible to a fundamental flaw in human perception, because, if an RHR can authentically emulate a human in a manner proximal to the human condition in real-world conditions, then there is no definitive way of determining if the RHR is human or robot without an internal examination.
Finally, 45% of participants stated that EmoAI would be useful in human society. Interestingly, more male subjects cited applications of emotional AI, which correlates with the outcomes of the FEA data, as male subjects exhibited more emotional stress than females. Thus, it is likely that male subjects felt that they could express emotions freely during HRI compared with recounting them in the HRI questionnaire. Similarly, more subjects in the higher age group cited the use of EmoAI in society. These results interlink with the biometric, AWS AI data analytics, and HRI questionnaire results, as the higher age group exhibited greater levels of positive and negative emotional stimulus during HRI.