Questionnaires to Measure Acceptability of Social Robots: A Critical Review

Understanding user perceptions is particularly important in developing social robots, which tend to have a high degree of interaction with humans. However, psychometric measures of robot acceptability have only recently started to become available. The present critical review outlines the psychometrically validated questionnaires to measure social acceptability factors related to social robots. Using an iterative search strategy, articles were identified that reported on the development of such questionnaires as well as information about their psychometric properties. Six questionnaires were identified that provide researchers with options varying in length, content, and factor structure. Two of these questionnaires inquire about attitudes and anxieties related to robots, while two others capture a larger range of attitudes that extends to positive and neutral aspects as well. One of the questionnaires reviewed here was specific to inquiring about ethical issues related to the use of social robots for therapy with children with autism, and the last one was designed to provide an assessment of expectations of participants prior to interacting with a robot. Overall, the use of robot acceptability measures is still relatively new, and further psychometric work is necessary to provide confidence in the validity and reliability of these scales.


Introduction
With rapid advancements in robot technology, the functions of robots in society are continuously expanding and diversifying. Within robotics, the field of social robotics has been receiving increasing attention [1]. A social robot has been defined as "an autonomous or semi-autonomous robot that interacts and communicates with humans by following the behavioral norms expected by the people with whom the robot is intended to interact" [2] (p. 592). The first social robots were made for entertainment purposes such as robotic toys [3], but later models increasingly permitted reciprocal interaction, such as response by the robot to the user's emotional state [4]. In other instances, robots can acquire a social function by extending their primary design purpose, such as through modifications to an industrial robot, to play an interactive game [5].
Because of the close nature of human-robot interaction in social robotics, understanding human user perspectives is a vital element, right from the early stages of robot development [6]. User studies have investigated the effects of a broad range of specific factors on user acceptance, such as different robot morphologies (e.g., humanoid, animalian, or mechanistic) [7], variation in affective facial expressions [8], cultural distance and language [9,10], and communicative non-verbal behaviors of robots. Apart from psychometric properties, the theoretical rationale as well as the development of items will be reviewed for each questionnaire.

Methods
The literature search was conducted between July and September 2019 using the databases Google Scholar, Scopus, and IEEE Xplore. The first two were chosen for their broad coverage, as our objective was to gather relevant articles from a variety of research fields, and IEEE Xplore was selected due to its specificity for robotics. Because of the broad exploratory nature of the review, the literature search was conducted iteratively, starting with a variety of different search terms (e.g., "social robot" AND "questionnaire") and gradually narrowing down the search according to relevance of search hits. As acceptance and acceptability were often used interchangeably in the literature, both terms were used in the searches. Other terms included attitude and perception. Additional articles were sourced through searches of the reference lists of articles that had investigated acceptability of robots, through other literature reviews, or by searching for articles that had cited key sources. As the focus was to identify questionnaires, we searched the method sections of articles for references to the original questionnaires, as well as evidence for their psychometric properties. Due to the iterative and diverse nature of our search strategy, our review was more similar to a scoping review than a systematic review [34]. However, since our review did not aim to develop a typology nor did it gather a stakeholder perspective on the results as commonly done for a scoping review [35], our review is most accurately defined as a critical review [34]. All searches were conducted in English.
To maintain the search within the scope of this review, the following exclusion criteria were applied: Firstly, studies were not included if insufficient information was provided about the questionnaire's content or psychometric properties. This includes the Attitudes toward Social Robots (ASOR-5) scale [36], which has already been used in research studies, but for which information about psychometric properties is still forthcoming [37]. Secondly, only questionnaires that were specific to robots were included. General measures of psychological constructs, such as a general credibility scale [38] or questionnaires based on the TAM [23,24], were thus not included. This also applies to later versions of acceptance models such as the Unified Theory of Acceptance and Use of Technology (UTAUT) model [39]. Thirdly, questionnaires were excluded if they involved subjective evaluations of physical and social attractiveness or likeability and were thus related to specific encounters with robots [40][41][42]. This includes the highly cited Godspeed questionnaire [43], which meets the definition of social acceptance but not acceptability. Table 1 lists, in order of number of citations, the robot social acceptability questionnaires identified in the present search. The most highly cited questionnaire is the Negative Attitudes toward Robots Scale (NARS) [44]. This questionnaire was published in 2006, so a longer time window was available to cite it than for the Frankenstein Syndrome Questionnaire (FSQ) [45], published in 2012, and the remaining four questionnaires, published from 2015 onwards. The six questionnaires listed in Table 1 offer variety in terms of content, length, and subscale structure, and they will each be described separately in the following sections.

Negative Attitudes towards Robots Scale (NARS)
The NARS [44] was first developed in Japanese and presents items on a five-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = undecided, 4 = agree, 5 = strongly agree). The original item pool during the development phase contained 33 items and was based on a theoretical model as well as questionnaires about communication apprehension and computer anxiety. During iterative psychometric testing involving analysis of internal consistency, factorial validity, test-retest reliability, and construct validity with other measures of anxiety, the number of items was gradually reduced to 14.
These analyses were conducted with separate samples of university students in Japan. Except for three items that need to be reverse coded, higher item scores represent a higher extent of negative attitudes.
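The scoring convention described above can be sketched in code. The snippet below is purely illustrative and not part of the published NARS materials; in particular, the indices used for the reverse-coded items are placeholders, as the actual three items must be taken from the published scale.

```python
# Hypothetical indices for the three reverse-coded items; the real
# item numbers must be taken from the published NARS [44].
REVERSE_ITEMS = {1, 2, 3}

def score_nars(responses):
    """responses: dict mapping item number (1-14) to a rating of 1-5.

    Returns a dict of scored items where higher values consistently
    represent more negative attitudes.
    """
    scored = {}
    for item, rating in responses.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"item {item}: rating {rating} out of range")
        # On a 5-point Likert scale, reverse coding maps 1<->5 and 2<->4.
        scored[item] = 6 - rating if item in REVERSE_ITEMS else rating
    return scored
```

After this step, subscale scores can be formed by summing or averaging the recoded items belonging to each factor.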
After conducting a confirmatory factor analysis, the authors [44] concluded that a three-factor solution was most suitable, with one subscale called Negative Attitudes toward Situations of Interaction with Robots (six items; e.g., "I would feel uneasy if I was given a job where I had to use robots"), another one Negative Attitudes toward Social Influence of Robots (five items; e.g., "I would feel uneasy if robots really had emotions"), and the last one Negative Attitudes toward Emotions in Interaction with Robots (three items; e.g., "I would feel relaxed talking with robots"). Note that all three reverse coded items cluster into the last subscale. Test-retest reliability for the first two subscales was high (>0.70), but only moderate for the three-item subscale (0.54). The first two subscales were moderately (0.41) correlated with each other, but correlation coefficients with the third subscale were below 0.20. The authors originally intended to develop a scale to measure robot anxiety [49] and thus also included measures of state and trait anxiety for assessment of construct validity. However, correlations between NARS subscales and measures of anxiety were either not statistically significant or only very small [44], and the authors thus developed a separate questionnaire to measure robot anxiety [49] and communicated the NARS as measuring negative attitudes about robots.
Although the NARS has been used in different language versions and with the same factor structure to make cross-country comparisons of robot anxiety [50], later validation work challenged the cross-cultural validity of the original format of the scale. For example, in a study with a sample of British university students, Syrdal and colleagues [51] deleted three items due to low reliability. While a principal components analysis still yielded a three-factor solution, the grouping of items was substantially different, and the authors named the three subscales Future/Social Influence, Relational Attitudes, and Actual Interactions and Situations. Studies validating other language versions also reported substantial deviations from the original structure. For the Portuguese version, Piçarra and colleagues [52] deleted two of the three items that had been deleted for the British version and concluded that a two-factor solution (Negative Attitudes toward Robots with Human Traits and Negative Attitudes toward Interactions with Robots) was most adequate. The Polish version [53] contained the same items as the Portuguese version, but with a slightly different allocation of items across the two factors. An exception to the lack of evidence of cross-cultural validity is the Turkish version, where the authors [54], using confirmatory factor analysis, found sufficient evidence of the adequacy of the original three-factor solution with all 14 items.

Robotic Social Attributes Scale (RoSAS)
The aim of developing the RoSAS was "to offer a means to assess the central attributes implicated in human perception of robots and, ultimately, to provide the robotic community with a tool to determine how perceived attributes affect the quality of interaction with robots" [46] (p. 254). The questionnaire was heavily influenced by the Godspeed questionnaire [43], which asks participants to rate robots on a semantic differential scale, such as "Fake" at one end of the five-point scale and "Natural" at the other.
The developers of the RoSAS [46] aimed to create a scale that, unlike the Godspeed questionnaire, is not related to a specific image or video of a robot. Instead, the RoSAS requests participants to provide ratings with the following instruction: "Using the scale provided, what is your impression of the category robots?" Each item is presented with a nine-point Likert scale, with 1 labeled definitely not associated and 9 as definitely associated.
After combining items from the Godspeed questionnaire and generating additional ones based on a review of the literature on social cognition, the RoSAS developers [46] used several separate samples of participants recruited online to test an initial set of 83 candidate items. An exploratory factor analysis revealed a suitable solution with 18 items loading onto three factors labeled warmth, competence, and discomfort. The correlation between these factors was fairly low, ranging from 0.18 to 0.34. In a final study, the scale developers [46] tested some hypotheses of differences related to robot appearance that were varied across the dimension machine-like to human-like and female to male.
The present literature search identified only one additional study [55] that analyzed the psychometric properties of the RoSAS, and it did so with the purpose of testing the validity of the scale when applied to an actual physical encounter with a robot rather than inquiring about the participant's general attitudes to robots. After participants engaged in a number of experimental trials that required them to hand over objects to a robot with a gripper, they were asked to complete the RoSAS. Factor analysis appears to have been conducted separately for each subscale, confirming its unidimensionality. Additionally, internal consistency for each subscale was found to be high (Cronbach's alpha above 0.80). However, it needs to be noted that the sample size was only 22, and these results therefore need to be interpreted with caution.

Ethical Acceptability Scale
The purpose of the Ethical Acceptability Scale [47] is to permit assessment of ethical issues in the use of robot-enhanced therapy with children with autism. Twelve items were developed by a team of ethicists, psychologists, therapists, and engineers. About half of the items are directly focused on the use of robots for therapy with children with autism (e.g., "It is ethically acceptable that social robots are used in therapy for children with autism"), and others are worded in general terms (e.g., "It is ethically acceptable to make social robots look like humans"). All items are rated on a five-point Likert scale (1 = strongly disagree, 2 = disagree, 3 = neither agree nor disagree, 4 = agree, 5 = strongly agree). Even though the questionnaire was designed to gauge general attitudes about ethical acceptability, participants in the original development study [47] were shown a brief video clip showing a variety of social robots with different physical appearances. The purpose was to ensure that participants made their ratings specifically in regard to social robots and not any other type of robot they might have pictured.
While the initial version of the questionnaire was in English, the developers also translated it into Dutch and Romanian to allow collection of data in their multi-country study involving participants in Belgium, the Netherlands, Romania, the United Kingdom, and the United States. For the purposes of validating the questionnaire, the developers pooled together data from all countries (n = 394). Based on the results of a principal components analysis, the scale developers [47] concluded that a three-factor solution was most appropriate. The first subscale was called Ethical Acceptability for Use (five items), the second one Ethical Acceptability of Human-like Interaction (four items), and the last one Ethical Acceptability of Non-human Appearance (three items). The internal consistency reliability for the subscales was good (Cronbach's alpha 0.86, 0.72, and 0.76, respectively). No further published reports of the psychometric properties of the Ethical Acceptability Scale were found.

Technology-Specific Expectations Scale (TSES)
The TSES [19] was developed to permit assessment of expectations by users prior to encountering and interacting with a robot. The authors thus clearly presented the scale as a measure of acceptability and contrasted it with acceptance or satisfaction after a robot encounter, for which they also proposed a scale, the Technology-Specific Satisfaction Scale (TSSS). The conceptual foundation underlying this work was the Expectation-Confirmation Theory, according to which consumers' satisfaction is determined by the extent to which their initial expectation of a product is later confirmed by their experience with it. These expectations thus form a reference point that affects later evaluative judgments about the experience of the product. The TSES was designed to provide a measure of such baseline expectations but specific to intended interactions with a particular robot, which may be based on unrealistic preconceived ideas from exposure to science-fiction culture [19].
The development and testing of the TSES occurred in the context of an upcoming introduction of an autonomous empathic robotic tutor to school children aged 14 to 16 years. The purpose of the robotic tutor was to teach topics related to sustainable development. Apart from the foundation of the scale development on Expectation-Confirmation Theory, the authors of the TSES [19] also stated that item generation was inspired by an existing questionnaire on beliefs and attitudes towards information technology usage [56]. The TSES presents ten questions on a five-point Likert scale (1 = Very low expectation, 2 = Low expectation, 3 = Neutral, 4 = High expectation, 5 = Very high expectation), with five items allocated to each of two subscales: Capabilities (e.g., "I think I will be able to interact with the robot") and Fictional View (e.g., "I think the robot will have superhuman capacities"). The phrase "the robot" is meant to be replaced by the name of the robot about to be introduced. The TSSS matches the TSES in terms of item content and structure, but differs in tense (past as opposed to future) as well as in the use of the word satisfaction in the Likert-scale response options as opposed to expectation. Trialing the TSES with a sample of 56 children, the authors [19] reported Cronbach's alpha of 0.77 for Capabilities and 0.75 for Fictional View. No further psychometric information was reported in this study, nor in any subsequent papers that cited it.

Frankenstein Syndrome Questionnaire (FSQ)
The FSQ was described by scale developers [45] as "a psychological tool specific for measuring acceptance of humanoid robots including expectations and anxieties toward this technology in the general public" (p. 242). In their rationale for developing the scale, Nomura and colleagues [45] referred to the concept of the Frankenstein Syndrome [57] that postulates a historical tendency for Western cultures to be more fearful of humanoid robots than Eastern cultures like Japan. In a pilot study, the scale developers collected open-ended feedback from 204 Japanese and 130 UK university students about the spread of humanoid robots and their perceived future role in society. Combining themes from these answers as well as content from a questionnaire about genetically modified food products, 30 items were generated for an initial English-language version of the FSQ, which was directly followed by the production of a Japanese version.
Questions are presented in a Likert-scale format with seven response options (1 = Strongly disagree, 2 = Disagree, 3 = Disagree a little, 4 = Not decidable, 5 = Agree a little, 6 = Agree, 7 = Strongly agree). When testing the Japanese version of the FSQ with a sample of 1,000 participants recruited online, instructions explained what humanoid robots are and were followed by six images of various example robots [45]. None of these robots were shown to be engaging in any task, and no further description was provided. Data were analyzed using exploratory factor analysis, which revealed a four-factor solution: General Anxiety Toward Humanoid Robots (13 items, e.g., "The development of humanoid robots is blasphemy against nature"), Apprehension Toward Social Risks of Humanoid Robots (5 items, e.g., "If humanoid robots cause accidents or trouble, persons and organizations related to development of them should give sufficient compensation to the victims"), Trustworthiness for Developers of Humanoid Robots (4 items, e.g., "I can trust persons and organizations related to development of humanoid robots"), and Expectation for Humanoid Robots in Daily Life (5 items, e.g., "Humanoid robots can create new forms of interactions both between humans and between humans and machines"). Three items were excluded due to results from an item analysis or low factor loadings. Cronbach's alpha values were marginally acceptable to good for the three subscales with fewer items (0.69 to 0.71) and excellent (0.91) for the 13-item subscale General Anxiety Toward Humanoid Robots. The correlations between subscale scores were varied, including little to no correlation as well as moderately high correlations between General Anxiety Toward Humanoid Robots and Apprehension Toward Social Risks of Humanoid Robots (r = 0.40) and between Trustworthiness for Developers of Humanoid Robots and Expectation for Humanoid Robots in Daily Life (r = 0.51).
Also note that higher scores indicate higher anxiety on two of the subscales but lower anxiety on the other two. Lastly, the results also indicated that older participants, as well as those who had previously experienced humanoid robots, generally expressed lower anxiety.
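The Cronbach's alpha values reported throughout this review follow the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items in the subscale. The following is a minimal sketch of that computation using made-up responses, not any dataset from the studies reviewed:

```python
def cronbach_alpha(item_scores):
    """item_scores: one list per item, with respondent ratings
    aligned across the inner lists."""
    k = len(item_scores)
    n = len(item_scores[0])

    def variance(xs):
        # Sample variance (divides by n - 1).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_var = sum(variance(item) for item in item_scores)
    # Each respondent's total score across the k items.
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))
```

Two perfectly redundant items yield an alpha of 1.0, while items that covary only partially (as in real subscales) yield the intermediate values reported above.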
The developers of the FSQ [45] described the above factor structure as tentative, thus suggesting that further research confirm the psychometric properties of the scale. In a follow-up study [58], the research team involved in the development of the original scale conducted further psychometric analyses using a pooled dataset of Western and Japanese respondents who had been recruited online, as well as another dataset of Japanese respondents only as a test of generalizability. Based on the results from an exploratory factor analysis, the authors proposed a five-factor solution, with factors named General Negative Attitudes Towards Robots (10 items), General Positive Attitudes Towards Robots (9 items), Principal Objections to Humanoid Robots (5 items), Trust in Robot Creators (4 items), and Interpersonal Fears (2 items). One of the items was discarded due to low reliability in one of the sub-samples. Cronbach's alpha was acceptable for all of these subscales.
In a further study [59], the FSQ development team investigated to what extent the factor structure may, among other factors, differ by age and thus collected a sample of 100 UK and 100 Japanese respondents, with each sub-sample consisting of equal numbers of respondents in their 20s and 50s. Their exploratory factor analysis revealed that a four-factor solution was slightly preferable to a five-factor model. Seven items were discarded due to cross-loadings or low reliability. Because one of the factors would have had only two items, this factor was also discarded, leaving the remaining 21 items grouped into three factors: Negative Attitudes Towards Robots (9 items), Expectation for Humanoid Robots (9 items), and Root Anxiety Toward Humanoid Robots (3 items). All subscales had Cronbach's alpha values above 0.85. For the first and third subscales, a higher score indicates an increased level of anxiety. For the whole sample, the first two subscales were not correlated, the correlation between the first and third subscales was moderately high (r = 0.47), and the second and third subscales had a small negative correlation (r = −0.21). The first and third FSQ subscales were moderately to highly correlated with two of the NARS subscales [44], and the second FSQ subscale presented with a moderate to high negative correlation with the other NARS subscale.

Multi-Dimensional Robot Attitude Scale
The developers of the Multi-Dimensional Robot Attitude Scale [48] argued that previous scales such as the NARS [44] focused only on negative attitudes towards robots and thus identified the need to create a measure that captures a wider range of attitudinal aspects. To generate items for the Multi-Dimensional Robot Attitude Scale, the developers recruited 83 Japanese adults and presented them with audiovisual material that introduced four different robots. Participants then provided verbal feedback in response to specific questions about their perception of each of these robots as well as open-ended feedback about their opinions about robots in general. Subsequently, participants were also interviewed as a group and asked about their experiences while answering these questions. All of these results, combined with a review of the relevant literature, provided the basis for the scale developers to generate 125 candidate items. These items presented sentences to which respondents were asked to indicate the extent of their agreement using a seven-point Likert scale ranging from −3 (not at all) to 3 (very much). Although the developers of the scale [48] referred to measurement of attitudes to domestic robots, the scale was still included in the present review due to the content of the items, which varied from being general in nature to being specific to social support.
The subsequent stage in the development of the Multi-Dimensional Robot Attitude Scale involved a cross-national sample (Mainland China n = 126, Japan n = 175, and Taiwan n = 130) where the candidate items were presented in the respective languages. Exploratory factor analysis identified a 12-factor structure, which was deemed to be robust across samples. To reduce the length of the scale, two to seven items were chosen from each factor, which tended to be either those items with a high factor loading or those that were deemed to maintain distinctiveness and clarity of the intended concepts. In this final 49-item version, item mean scores are calculated for each of the following 12 subscales: Familiarity (e.g., "If a robot was introduced to my home, I would feel like I have a new family member"), Interest (e.g., "I would want to boast that I have a robot in my home"), Negative Attitude (e.g., "It would be a pity to have a robot in my home"), Self-efficacy (e.g., "I have enough skills to use a robot"), Appearance (e.g., "I think the robot design should be cute"), Utility (e.g., "Robots are practical"), Cost (e.g., "I think robots are heavy"), Variety (e.g., "I think robots should make various sounds"), Control (e.g., "I think a robot could recognize me and respond to me"), Social support (e.g., "I expect my family or friends to teach me how to use a robot"), Operation (e.g., "Robots can be used by remote control"), and Environmental Fit (e.g., "I worry that robots are suitable for the state (layout of the furniture and other things) of my room now"). Apart from the subscales Cost and Control (Cronbach's alpha 0.56 and 0.64, respectively), all other alpha values were above 0.70, indicating adequate internal consistency reliability. No further published reports of the psychometric properties of the Multi-Dimensional Robot Attitude Scale could be found.
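The per-subscale item-mean scoring described above can be sketched as follows. The item-to-subscale allocation shown is purely hypothetical (only two of the twelve subscales, with made-up item numbers), since the actual 49-item mapping must be taken from the published scale [48].

```python
# Placeholder mapping from subscale name to item numbers; the real
# allocation of the 49 items must come from the published scale.
SUBSCALES = {
    "Familiarity": [1, 2, 3],  # hypothetical item numbers
    "Interest": [4, 5],
}

def subscale_means(responses):
    """responses: dict mapping item number to a rating from -3 to 3.

    Returns the item mean score for each subscale, as prescribed
    for the final 49-item version.
    """
    return {
        name: sum(responses[i] for i in items) / len(items)
        for name, items in SUBSCALES.items()
    }
```

Because scores are means rather than sums, subscales with different numbers of items remain directly comparable on the original −3 to 3 response metric.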

Discussion
The present review identified six scales that met the criteria of being measures of robot acceptability and having information available about their psychometric properties. Apart from the NARS [44], for which extensive psychometric information was published in 2006, all other scales have emerged only recently, within the window of 2012 to 2017. As a result, data on psychometric properties and cross-cultural generalizability are still only starting to become available. For the NARS [44] and FSQ [45], a convincingly robust factor structure remains to be identified, as follow-up studies using exploratory factor analysis did not fully agree with the results from the original scale development studies. For the RoSAS [46], only one additional study reported results from psychometric analyses [55], and the conclusions from this study were heavily limited by the small sample size. For the remaining acceptability scales identified in this review, the Ethical Acceptability Scale [47], the TSES [19], and the Multi-Dimensional Robot Attitude Scale [48], no further studies tested their psychometric properties.
In terms of item content and intended purpose, the six scales reviewed here offer researchers a fair degree of variety. While the NARS and FSQ are both inquiring about attitudes and anxieties about robots, the FSQ covers broader societal implications, and the NARS is more specifically focused on aspects directly related to interactions with robots. The Multi-Dimensional Robot Attitude Scale permits an assessment of a larger range of attitudes, including positive or neutral aspects, and the RoSAS is focused on more fundamental associations with robots due to its use of a semantic differential scale instead of agreement ratings with statements. The TSES is a useful tool to assess expectations prior to encountering a robot, particularly for the purpose of exploring the effects of these expectations on satisfaction after having interacted with the robot. Of all six questionnaires listed in Table 1, the Ethical Acceptability Scale is the most specific by inquiring about ethical issues related to the use of social robots in therapy for children with autism.
Although all six psychometric scales reviewed here met the definition of a measure of acceptability, the distinction between acceptability and acceptance is frequently blurred in the actual usage of the scales and the instructions provided to participants. For example, the original development study of the RoSAS [46] did not provide respondents with any images, description, or video material of robots, and the scale was intended to be used without reference to specific robots. However, some of the subsequent use of the RoSAS also included experimental studies that manipulated certain aspects related to the nature of participant interactions with robots [60,61] and thus used the scale to assess post-intervention outcomes. Similarly, the NARS has also been used in a variety of ways, ranging from investigation of acceptability independent of specific encounters and exploration of the relationship between aspects of human-robot interaction and acceptance, to pre- and post-experimental assessment of attitudes to robots [62,63]. The use of images or similar material as part of questionnaire instructions is not necessarily a criterion that differentiates acceptability from acceptance measures. Robot acceptability studies do not need to be limited to assessment of very broad attitudes to robots such as in some cross-cultural work [50], but can also be linked to specific experiences, for example, when comparing responses from a general sample with those obtained from an online community of users of the popular social robot Aibo [64]. People's attitudes to robots are also directly influenced by the media [65,66] and prior personal exposure to robots [67]. All of these factors contribute to the participants' overall evaluation of robots and are often difficult to assess or control. The use of images in instructions may thus contain the effects of between-sample differences in histories and contexts of exposure to specific sub-types of robots.
That way, researchers can be sure that participants provide their acceptability ratings in reference to a similarly defined robot as opposed to how the participant defines and imagines their reference point, which would introduce other sources of variability. As shown by previous research [68], attitudes may differ substantially depending on the imagined use of the robot, such as whether it is for household chores, as a learning instructor, or a pet.
The role of instructions also includes relating the robot acceptability questionnaire to the specific context of interest-in the case of the present review, social robotics. While the Ethical Acceptability Scale [47] reviewed here explicitly referred to social robots in their questions, the others keep the questions general or, as in the case of the TSES [19], require the name of a specific robot to be mentioned so that the scores can later be compared directly to a post-exposure acceptance rating. Apart from the Ethical Acceptability Scale, therefore, other scales can be applied to robots in general or any particular type, depending on the specific wording of questionnaire instructions used.
The following limitations need to be acknowledged. To contain the scope of the present review, only acceptability questionnaires were reviewed if they had been developed and validated in the context of social robotics. The delineation between different types of robots can be diffuse, and there is the possibility that some potentially relevant questionnaires have not been included. Specific questionnaires are available for the use of robots in other situations such as healthcare settings, and such measures have also tended to emerge relatively recently and have relatively little evidence of psychometric properties [69]. Just like in therapy for children with autism, where educational aims often overlap with the need to teach social skills [70,71], the application of robotics in healthcare settings generally involves a range of functions such as physical assistance, safety/monitoring, and social companionship for older adults [72]. As the field grows and more studies become available, a future review of robot acceptability questionnaires could thus focus on socially assistive robots [73] as opposed to social robots more broadly.

Conclusions
A variety of psychometrically validated scales are available to measure the acceptability of social robots, both for specific applications as well as for assessment of more general attitudes. The present review distinguished between acceptability and acceptance, where the former refers to evaluations of robots prior to encountering specific robots and the latter to evaluations after such interactions. The literature does not consistently make this distinction, and the acceptability questionnaires reviewed here have occasionally been used for both purposes. Early work on robot acceptability, in particular, was largely focused on investigations of cross-cultural differences. Further work on robot acceptability could thus explore a larger range of topics mirroring the work of acceptance studies, which, for example, investigated the acceptance of robots by future professionals who are likely to be tasked with implementing robots in health and educational contexts [74,75]. As the field is developing, more detailed information about the psychometric performance of robot acceptability questionnaires is required. Currently, evidence of factor structure and other psychometric indicators for acceptability measures is either very limited or inconsistent. Future work will be required to test the reliability and validity of these scales using more advanced methods such as confirmatory factor analysis or Rasch analysis.