This subsection shows the results from the user study, conducted to evaluate human perceptions of the proposed scores. Our results in general demonstrate the proficiency of the proposed scores, particularly under the predictability constraint of the trust region. However, the limited participant sample size restricts the generalisability of the conclusions, and the experiments could benefit from the recruitment of more participants in the future.
6.1.1. Experiments with Live Demonstrations
For the experiments with trajectories generated live, we conducted a user study focused on evaluating the trustability score, covering both the experiments on the simulator and those on the real robot. The participants watched pairs of trajectories generated in real time and then judged which one appeared to be more trustable.
The results in Table 2 show, for each participant, in how many of the trajectory pairs they correctly identified the more trustable trajectory. The last column records the participant’s own definition of trustability, given before they were informed of our definition in the second round. The rest of the tables in Section 6.1 share a similar structure.
The results show that the participants who initially did not align with our definition of trustability experienced a significant improvement in their ability to identify the more trustable trajectory once they were informed of our definition. This is evident in the increased number of correctly identified trajectories, as well as the decrease in instances where participants were unable to distinguish between the two trajectories (reported as N/A). These findings suggest that our encoding of the robot’s benevolence within the trustability score aligns well with human perception, by effectively and proficiently capturing this important social quality.
Moreover, participant 1 also reported that they determined the more trustable trajectory in round 2 based on benevolent movement cues such as the initial downward dip of the end effector. This subtle acknowledgement of the human’s presence suggests that trajectories with higher trustability scores may implicitly convey social signals of benevolence, highlighting the score’s potential to capture such cues.
However, the poor performance throughout the first round of most experiments also demonstrates that it was not always intuitive for the participants to associate benevolence with trustability as a core component. This finding is reinforced by the fact that only one of the four participants judged trustability in the first round according to our definition, namely benevolence. Participants 1 and 4 also reported that they viewed competence and integrity as important factors in determining perceived trust in the context of the trust model [12], as they emphasised the importance of the robot displaying clear intent towards the goal. Participant 2 shared a similar view that the other components of trust are significant, as their preference for more predictable trajectories signalled a desire for more consistent and reliable behaviour.
As previously discussed, the time-limited sampling approach to generating trajectories can produce trajectories that are intended to be trustable but are not truly trustable, that is, their trustability score falls below the trustability score threshold of 0.3. In practice, the participants were mostly comparing two highly predictable trajectories to determine which one was more trustable. This was especially the case for the experiments conducted on the real robot, where computation is slower, resulting in a lower chance of a truly trustable trajectory being generated. Overall, this is reflected in many of the studies, where the two trajectories in a pair were too similar to each other. This problem could be resolved by using pre-recorded demonstrations, which are discussed next.
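To make the effect of the time limit concrete, the following is a minimal sketch of a time-limited sampling loop. It is not the planner used in this work: the function name, the `sample` and `score` callables, and the injectable `clock` are all hypothetical, and treating a score equal to the threshold as "truly trustable" is an assumption. Only the threshold value of 0.3 comes from the text.

```python
import time
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Threshold quoted in the text; scores below it are not "truly trustable".
TRUSTABILITY_THRESHOLD = 0.3


def sample_best_within_budget(
    sample: Callable[[], T],
    score: Callable[[T], float],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[T], float, bool]:
    """Keep the highest-scoring sample drawn before the time budget runs out.

    Returns (best sample, its score, whether it reaches the threshold).
    All names here are illustrative assumptions, not the authors' API.
    """
    deadline = clock() + budget_s
    best: Optional[T] = None
    best_score = float("-inf")
    while clock() < deadline:
        candidate = sample()
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    # The loop returns whatever it found, even if nothing reached the threshold.
    truly_trustable = best is not None and best_score >= TRUSTABILITY_THRESHOLD
    return best, best_score, truly_trustable
```

Under this sketch, a slower platform draws fewer samples within the same budget, so the best trajectory found is less likely to reach the threshold, which matches the behaviour described for the real robot.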
6.1.2. Experiments with Pre-Recorded Demonstrations
As mentioned previously, by using pre-recorded videos of trajectories in the experiments, the intended trustable/sociable trajectories can be guaranteed to exceed the defined thresholds for their respective scores. Since we have greater control over the trajectories shown to the participants, we could fall back to the default acceptable trajectory generation algorithm, which generates trustable/sociable trajectories with scores close to the maximum values, as determined from the planners analysis. This allowed the participants to be more confident in their decisions, resulting in very few reports that the participants could not distinguish between the pairs of trajectories (reported as N/A), as the two trajectories within each pair are now very distinct.
The results below (Table 3 and Table 4) from the trustability and sociability experiments reveal a new problem: optimising solely for the trustability or the sociability score resulted in unpredictable trajectories, from which the participants could not understand any intent from the robot, let alone trustable or sociable intent. This is demonstrated by the poor results from both experiments. We hence conducted experiments with the trust region of predictability applied, in which we showed the participants trustably and socially predictable trajectories. The results from these experiments show a great improvement in the humans’ ability to recognise the trustable/sociable trajectories.
Table 3.
A summary of the findings from the trustability experiment (pre-recorded demonstrations).
| Participant | Round 1 (Participant’s Own Definition) | Round 2 (Participant Given Definition) | Participant’s Own Definition of Trustability in the First Round |
|---|---|---|---|
| P5 | 1/5 correct | 0/5 correct | Safety |
| P6 | 0/5 correct | 0/5 correct | Predictability/Efficiency |
| P7 | 3/5 correct | 5/5 correct | Predictability |
| P8 | 0/5 correct | 2/5 correct | Predictability |
These results were initially surprising: even though we had significantly increased the trustability score of the intended trustable trajectory compared to the ones in the live demonstrations, the participants performed worse. Most participants struggled to identify the trustable trajectory in the first round, and this was mostly replicated in the second round, even after they were given our definition of trustability.
However, the way most participants perceived trustability (in the first round) reveals the reasoning behind the poor performance. Even though the trustable trajectories had a high trustability score (on average around 0.48), their predictability′ scores were very low (on average around 0.3). The participants in general reported that they struggled to understand the robot’s intent from these highly unpredictable trajectories, and thus chose the highly predictable trajectory as the one they could trust. Although the excessive movements in the unpredictable trajectories provided more opportunity to signal trustable intent (resulting in a higher trustability score), they also crucially hindered the perception of trust, because they gave clear signals of the robot’s incompetence, and competence is another key component in the trust model [15]. This explains the disappointing performance in the second round, even after the participants were told our definition of trustability.
We can also now understand why the participants performed worse in this experiment than in the one with live demonstrations. Even though the pairs of trajectories were very similar in the live demonstrations, and there rarely existed a trajectory that was truly trustable, both trajectories within each pair were highly predictable. As a result, the trust region [28] was not violated, and the participants could sometimes interpret trustable intent from certain trajectories, even if that intent was minor. This motivates using trustably predictable trajectories for generating trustable motion, as these would convey trustable intent that could be interpreted.
Table 4.
A summary of the findings from the sociability experiment (pre-recorded demonstrations).
| Participant | Round 1 (Participant’s Own Definition) | Round 2 (Participant Given Definition) | Participant’s Own Definition of Sociability in the First Round |
|---|---|---|---|
| P9 | 0/5 correct | 0/5 correct | Humanness/Predictability |
| P10 | 0/5 correct | 0/5 correct | Predictability |
| P11 | 1/5 correct | 0/5 correct | Humanness |
| P12 | 0/5 correct | 0/5 correct | Humanness/Predictability |
The sociable trajectories suffered from a similar problem: the sociability scores were very high (at around 0.65) while the predictability′ scores were very low (at around 0.4). Predictability seems to be the crucial limiting factor in the motion’s socialness, as the participants performed even worse in the sociability experiment than in the trustability experiment, with only one sociable trajectory correctly identified in total across all four participants. This time, the participants also noted a movement present in all sociable trajectories that caused them to judge these trajectories as not sociable. These trajectories all involved the end effector rotating more than 360 degrees during motion, which most participants found not only unpredictable and difficult to understand, but also physically impossible for a human to replicate, and thus inhuman. Although in the context of the sociability score these movements provided more chances to convey sociable intent, by directing the general motion away from the human observer and towards the goal, they nonetheless heavily violated the perceived socialness by exhibiting unnatural behaviour that confused the human observers.
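The totals referred to above can be checked directly from Table 3 and Table 4. The short tally below is only a convenience sketch: the per-participant counts are copied from the two tables, while the dictionary layout and the function name `round_totals` are illustrative assumptions.

```python
from typing import Dict, Tuple

PAIRS_PER_ROUND = 5  # each participant judged five pairs per round

# (round 1 correct, round 2 correct), copied from Table 3 and Table 4.
TABLE_3_TRUSTABILITY: Dict[str, Tuple[int, int]] = {
    "P5": (1, 0), "P6": (0, 0), "P7": (3, 5), "P8": (0, 2),
}
TABLE_4_SOCIABILITY: Dict[str, Tuple[int, int]] = {
    "P9": (0, 0), "P10": (0, 0), "P11": (1, 0), "P12": (0, 0),
}


def round_totals(table: Dict[str, Tuple[int, int]]) -> Tuple[int, int, int]:
    """Return (round 1 correct, round 2 correct, pairs judged per round)."""
    round_1 = sum(first for first, _ in table.values())
    round_2 = sum(second for _, second in table.values())
    return round_1, round_2, PAIRS_PER_ROUND * len(table)


print(round_totals(TABLE_3_TRUSTABILITY))  # (4, 7, 20)
print(round_totals(TABLE_4_SOCIABILITY))   # (1, 0, 20)
```

In the sociability experiment this gives a single correct identification out of 20 pairs in the first round and none in the second, which is the total of one noted above.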
The participants also reported that the definition of sociability was complicated and not intuitive, meaning that even after being informed of our definition, they still struggled to clearly understand which trajectory we intended to be social. This is reflected in the lack of improvement in the second round across all the participants. We do not consider this a major concern, as our main goal is to propose scores that implicitly capture social characteristics, not definitions that intuitively describe how we capture these characteristics.
All of this also motivates employing socially predictable trajectories for generating sociable motion that humans could interpret more intuitively.
The results shown in Table 5 demonstrate that the participants were much more competent in choosing the trustable trajectory when it was trustably predictable, as opposed to only being trustable. As in the experiment conducted on live demonstrations, the participants generally performed better after being informed of our definition of trustability, which demonstrates that trustability does encode the described characteristics well. Since there were no reports of N/A, we can also conclude that the participants were more confident in identifying trustably predictable trajectories.
In general, the participants noted predictability to be an important factor in determining perceived trust. This justifies our approach of applying the trust region of predictability to enforce a hard constraint on how predictable a trajectory must first be, in order to encode the importance of competence and integrity.
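The idea of a hard predictability constraint can be sketched as a two-step selection: first discard candidates that are not predictable enough, then pick the most trustable of the remainder. This is only an illustration under stated assumptions, not the trust region formulation used in this work. The `Candidate` class, the function name, the bound of 0.6 and all example scores except the reported averages of 0.48 (trustability) and 0.3 (predictability′) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """A candidate trajectory with its two scores (illustrative container)."""
    name: str
    predictability: float
    trustability: float


def select_within_trust_region(
    candidates: Iterable[Candidate], min_predictability: float
) -> Optional[Candidate]:
    """Apply predictability as a hard constraint, then maximise trustability."""
    feasible = [c for c in candidates if c.predictability >= min_predictability]
    if not feasible:
        return None  # no candidate is predictable enough
    return max(feasible, key=lambda c: c.trustability)


candidates = [
    # Scores matching the averages reported for the trustable-only trajectories.
    Candidate("trustable_only", predictability=0.30, trustability=0.48),
    # Hypothetical scores, chosen purely for illustration.
    Candidate("trustably_predictable", predictability=0.80, trustability=0.35),
    Candidate("predictable_only", predictability=0.95, trustability=0.10),
]
# The bound 0.6 is a hypothetical value, not one taken from this work.
chosen = select_within_trust_region(candidates, min_predictability=0.6)
print(chosen.name)  # trustably_predictable
```

With the constraint in place, the highest-scoring but unpredictable candidate is excluded before trustability is compared, which reflects the ordering of priorities the participants described.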
The results in Table 6 reveal very significant improvements over the earlier sociability experiment.
Since most participants were able to reliably and intuitively identify the sociable trajectory in the first round of the experiment using their own interpretation, this shows that socially predictable trajectories are proficient in encoding how humans naturally perceive socialness in robot motion.
Although different participants gave different reasons for their choice, the fact that they were all able to confidently choose the socially predictable trajectory suggests that these trajectories exhibit social behaviour that is broadly recognised in interactions. This also highlights the significance of our contribution, as the definition of sociability has been shown to be very subjective and can even be interpreted in opposing ways, with some participants associating it with more human-like behaviour and others with less human-like behaviour. Nonetheless, our proposal of socially predictable trajectories could be widely accepted by humans with diverse views of sociability.