1. Introduction
Increasing automation and the rapidly growing use of robots in industrial as well as social settings result in a stronger need for research on collaboration between humans and robots. Key factors for a safe and successful combination of human and robot abilities include acceptance of and trust in the robot, as well as the absence of negative emotions towards it [1]. In order to prevent physical and psychological harm to humans, it is essential to reduce these negative emotions and to increase trust and acceptance [2,3].
Research has already shown that the introduction of collaborative robots into the workplace can lead to multiple negative emotions among workers, ranging from fear of losing their jobs [2] to increased stress levels while working with the robot, resulting in a reduction in workers’ mental health [4,5]. Moreover, incorrect behavior of industrial robots can increase mental stress in test subjects, who subsequently change their behavior when working with the robot, increasing the risk of errors [6].
In addition to methods for subjectively measuring trust, acceptance, fear, and emotions, there are also objective measures that can supplement the subjective results. The psychological processes that arise when experiencing a situation are often accompanied by physical reactions [7]. This fact can be used to make emotional, cognitive, and physical strain measurable and thus more objective. This also applies to the strain experienced during human–robot interaction, which makes it an interesting topic for this field.
These psychophysiological measures are characterized by several properties that support their use in the analysis of situation-related stress. On the one hand, psychophysiological measures are non-invasive; on the other hand, in most cases they cannot be intentionally influenced by humans [8]. Due to the high complexity of psychological processes, a single psychophysiological measure is usually not sufficient to provide an in-depth picture [7]. Therefore, several of these measures should be combined and supplemented with subjective and technical data whenever possible; a multidimensional approach is advisable. The collection of psychophysiological data, such as electrodermal activity and electrocardiography, is well suited for recording the aforementioned stress levels and potentially negative emotional states of humans during interactions with an industrial robot [9,10].
Virtual and augmented reality simulations have emerged as promising tools for training operators in collaborative tasks, potentially reducing anxiety before working with real robots [11,12]. Nikolaidis et al. [13] showed that so-called cross-training, in which humans and robots swap roles, tended to increase trust in the robot. Furthermore, Palmarini et al. [12] used augmented reality (AR) technology to develop an interface that successfully contributed to increasing trust in human–robot collaboration (HRC). Additionally, immersive virtual training has been suggested as a way of safely and harmlessly investigating human behavior when working with robots in high-risk situations [14].
However, it remains partially unclear whether subjective feelings and psychophysiological reactions observed during virtual training accurately reflect those experienced in real-world human–robot interactions. Addressing this knowledge gap is crucial for designing effective training protocols that build trust and ensure safety. Therefore, the aim of this exploratory study was to examine the emotional and psychophysiological reactions of individuals while performing collaborative tasks with an industrial robot in both real and virtual working environments. The following research questions were addressed:
Are there differences regarding subjective measures [(I) perceived valence, arousal, and dominance; (II) usefulness and satisfaction; (III) trust; (IV) mental demand, performance, and effort] during the human–robot interaction of men and women at three levels of complexity (low, medium, and high) in real and virtual working environments?
Are there differences regarding objective measures [(V) cardiovascular and (VI) electrodermal activity] during the human–robot interaction of men and women at three levels of complexity (low, medium, and high) in real and virtual working environments?
2. Materials and Methods
2.1. Study Participants
A total of 46 participants (23 female), aged between 20 and 58 years (M = 26.63, SD = 7.68), took part in the study; 16 participants (35%) had already worked with a robot. Furthermore, they reported having little to very much VR experience (M = 3.07, SD = 0.95; 5-point scale ranging from 1 = no VR experience to 5 = very much VR experience). Additionally, all participants were asked to assess their general attitude towards working with industrial robots. Overall, they assessed it as partly good to very good (M = 4.35, SD = 0.53; 5-point scale ranging from 1 = very poor to 5 = very good). All participants provided their informed consent at the beginning of the study and participated voluntarily. Ethical approval for this study was obtained from the ethics committee of Furtwangen University (approval number: 22-030).
2.2. Study Design
A mixed design was used for the exploratory study. The independent variables included gender (IV1; men, women) and training method (IV2; real environment, virtual environment). Furthermore, the participants were required to perform three different interaction tasks (IV3; repeated-measures factor) within each training condition, with three different levels of complexity (low, medium, high). The tasks were characterized by a sequential progression in complexity, and the participants therefore completed them in ascending order. This procedure was intended to simulate real training conditions, facilitating familiarization with the task and simulating employee training for human–robot collaboration in an industrial context. Regarding the study design and the measured dependent variables, the presented study was based on previous experiments of the research group (e.g., Refs. [15,16,17]).
Furthermore, it is important to note that the distribution of gender and training method was nearly balanced, with 12 women and 11 men in the real training condition and 12 men and 11 women in the VR training condition, χ²(1) < 0.01, p = 1.000. The previous working experience with robots was also statistically equally distributed between the two training groups, χ²(1) = 0.10, p = 0.757. The same can be reported for previous VR experience, t(44) = −0.15, p = 0.879, as well as for the general attitude towards working with industrial robots, t(44) = 1.13, p = 0.267.
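Balance checks of this kind can be reproduced with standard tools. A minimal sketch in Python using SciPy is shown below; the 2 × 2 gender counts are taken from the text, while the rating vectors for the t-test are hypothetical placeholders, since the raw data are not part of this paper:

```python
import numpy as np
from scipy import stats

# 2 x 2 contingency table: rows = training condition (real, VR),
# columns = gender (women, men); counts as reported in the text.
gender_by_group = np.array([[12, 11],
                            [11, 12]])

# Yates-corrected chi-square test of independence (the common default
# for 2 x 2 tables, matching the reported chi2(1) < 0.01, p = 1.000).
chi2, p, dof, expected = stats.chi2_contingency(gender_by_group)

# Independent-samples t-test, e.g., for prior VR experience between groups
# (the ratings below are hypothetical, for illustration only).
vr_exp_real = [3, 4, 2, 3, 3, 4]
vr_exp_vr = [3, 3, 2, 4, 3, 3]
t, p_t = stats.ttest_ind(vr_exp_real, vr_exp_vr)
```

With the reported counts, the corrected chi-square statistic is effectively zero and p equals 1, reflecting the near-perfect balance of the two groups.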
The following dependent variables (DVs) were used. Subjective measures included valence, arousal, and dominance, measured using the self-assessment manikin [18], as well as usefulness and satisfaction, assessed using the acceptance scale of van der Laan et al. [19] (both ranging from −2 to +2). Furthermore, trust was measured using the subjective trust score for scenarios of human–robot interaction presented by Wagner-Hartl et al. [20] (5-point scale ranging from 1 = strongly disagree to 5 = strongly agree), and the three scales of mental demand, performance, and effort from the NASA Task Load Index (NASA-TLX) [21,22] were also used. Objective measures included cardiovascular activity, using heart rate (HR) and heart rate variability (HRV RMSSD; root mean square of successive differences), and electrodermal activity, analyzing skin conductance level (SCL), sum amplitude, non-specific skin conductance responses (NS.SCR), and mean sum amplitude (sum amplitude/NS.SCR). All psychophysiological parameters used were baseline corrected. The psychophysiological measures were recorded using movisens EcgMove4 [23] and EdaMove4 [24] (thenar/hypothenar, non-dominant hand) devices.
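The cardiovascular parameters can be derived directly from the RR interval series that ECG devices of this kind export. The following is a minimal sketch of HR and RMSSD in Python, not the movisens implementation:

```python
import numpy as np

def heart_rate_bpm(rr_ms):
    """Mean heart rate in beats per minute from RR intervals in milliseconds."""
    return 60000.0 / float(np.mean(rr_ms))

def rmssd(rr_ms):
    """HRV RMSSD: root mean square of successive RR interval differences (ms)."""
    diffs = np.diff(np.asarray(rr_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))
```

For example, the RR series [800, 810, 790, 805] ms yields a mean heart rate of about 74.9 bpm and an RMSSD of about 15.5 ms.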
2.3. Materials and Procedure
Physical materials included two 3D-printed workpieces (100 mm × 50 mm × 10 mm), shown in Figure 1. The first workpiece (left side) featured embedded M6 threaded inserts on both sides, while the second workpiece (right side) comprised two M5 threaded inserts on the left, as visible in Figure 1. Custom 3D-printed grippers with interchangeable 11 mm and 13 mm socket attachments were designed for handling the screws used. Workpieces were secured to the HORST600 collaborative robot [25] using a 3D-printed fixture.
Figure 2 shows the setup of the real working environment. The HORST600 robot [25] used for the completion of the interactive tasks is visible on the right, with the workpieces on the table on the left side. A virtual replica of the workspace served as the virtual environment, with real-time robot data streamed via a Modbus TCP interface and replayed to ensure exactly the same motion of the virtual and physical robot. Special care was taken to create comparable conditions regarding the design of the virtual and the real environment. The workpiece and robot models were created in Blender, and the HTC Vive Pro HMD [26] provided immersive VR interaction. The hand controllers of the HTC Vive Pro were used for handling objects in the virtual environment. All robot interactions were recorded via a camera and logged in Excel.
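The Modbus TCP mirroring described above ultimately reduces to periodically reading a block of registers from the robot controller and converting the raw values to joint angles for the virtual model. The register encoding and scale factor below are assumptions for illustration, not taken from the HORST600 documentation:

```python
def decode_joint_angles(registers, scale=100.0):
    """Convert raw 16-bit Modbus register values to signed joint angles
    in degrees. Assumes two's-complement encoding and a hypothetical
    resolution of 0.01 degrees per count."""
    angles = []
    for raw in registers:
        if raw >= 0x8000:        # reinterpret as a negative 16-bit value
            raw -= 0x10000
        angles.append(raw / scale)
    return angles

# In the streaming loop, a Modbus TCP client would read one register per
# joint in each cycle and feed the decoded angles to the VR robot model.
```

With this assumed layout, a raw value of 9000 decodes to 90.0° and a value of 65036 (a negative 16-bit number) decodes to −5.0°.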
Upon arrival, participants were welcomed and introduced to the study (see Figure 3). After providing their informed consent, participants were asked to fill out a sociodemographic questionnaire. After the physiological sensors were mounted, a baseline of the psychophysiological measures was recorded while participants sat quietly facing a fixation cross on the wall in front of them. The baseline measurement had a duration of three minutes. After finishing the baseline measurement, the participants were instructed to complete the interaction task together with the robot. For this purpose, they were randomly assigned to either the VR or the real environment group.
Independent of group (virtual or real environment), the interaction tasks consisted of three experimental subtasks with increasing complexity (low, medium, high). The subtasks and their execution were identical under both the real and virtual environment conditions (see also Section 2.2) to ensure comparability between settings. The subtasks are shown in Table 1. Each subtask paired a human action with an automated robotic action. After each subtask, a subjective assessment with the questionnaires described in Section 2.2 was conducted. Additionally, prior to the second and third tasks, a short resting measurement of 90 s was conducted to assess recovery and transitional physiological states. The execution of each task took about 60 s (no difference between real and virtual environment), and the average time needed to answer the questionnaires after the tasks was 3 min. The total duration of the study was around 50 min.
Subjective data were collected via Unipark surveys, capturing sociodemographic data and participant perceptions as stated. As mentioned previously (see Section 2.2), the psychophysiological measures were recorded via ECG and EDA using movisens EcgMove4 [23] and EdaMove4 [24] devices. Initial quality checks of the recordings were performed immediately after each session, with any artefacts noted in the Excel log. Raw ECG and EDA data were processed using the movisens DataAnalyzer and custom software. Time windows were defined by the manually logged start and end times of each task. The baseline correction used the middle 90 s of the three-minute rest period; this window was chosen to avoid possible onset and offset effects within the baseline measurement. Subjective questionnaires were scored according to their respective published procedures. Any artefacts within the data were identified during processing and excluded before analysis.
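The windowing logic for the baseline (the middle 90 s of a 180 s rest recording) can be sketched as follows; the sampling rate and the simple difference-based correction are illustrative assumptions, not the exact DataAnalyzer procedure:

```python
import numpy as np

def middle_window_mean(samples, fs_hz, total_s=180.0, window_s=90.0):
    """Mean over the middle `window_s` seconds of a `total_s`-second rest
    recording, skipping the edges to avoid onset/offset effects."""
    start = int(round((total_s - window_s) / 2.0 * fs_hz))
    stop = start + int(round(window_s * fs_hz))
    return float(np.mean(samples[start:stop]))

def baseline_correct(task_mean, baseline_mean):
    """Difference-based baseline correction of a task-level parameter."""
    return task_mean - baseline_mean
```

For a 180 s recording at 4 Hz (720 samples), the function averages samples 180 through 539, i.e., seconds 45 to 135 of the rest period.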
2.4. Statistical Analyses
IBM SPSS Statistics (Version 30) and JASP were used to calculate the results. The statistical analyses were based on a significance level of 5%. Due to the exploratory approach of the study [27], tendencies towards significance were also analyzed, based on a significance level of 10%.
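This two-tier reporting convention (5% for significance, 10% for tendencies) can be stated compactly; the function below merely illustrates the decision rule:

```python
def classify_p(p):
    """Exploratory reporting convention: 'significant' below the 5% level,
    'tendency' (towards significance) between 5% and 10%."""
    if p < 0.05:
        return "significant"
    if p < 0.10:
        return "tendency"
    return "not significant"
```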
4. Discussion
The aim of the study was to examine the emotional and psychophysiological reactions of individuals performing collaborative tasks with an industrial robot in both real and virtual working environments. The underlying idea was to base the tasks on potential employee trainings for human–robot interaction within an industrial setting. Therefore, three different levels of task complexity (low, medium, high) were performed within a real or a virtual (VR) working environment that duplicated the real working environment in a virtual scenario. The analyzed levels of task complexity (training levels) built on each other and were always completed by the participants in ascending order. Their implementation followed frequently mentioned participant suggestions from previous research of the research group regarding requirements for VR trainings of human–robot interaction [17] and former findings regarding task complexity and acceptance of human–robot interaction [20].
To answer the research questions concerning differences with regard to the subjective measures, the results reveal differences in various analyzed variables. The working environment seems to have a significant effect on the perceived valence during human–robot interaction: the interaction with the robot was perceived as significantly more pleasant within the VR environment than within the real environment. Additionally, the VR training environment was perceived as significantly more useful than the real training environment. These results are in line with those of previous research [14,17]. Furthermore, they underline the importance of XR as a promising approach for the development of future trainings (e.g., [11,12,17]).
On the other hand, the participants of the VR group reported a significantly higher effort required to accomplish the level of performance during the training interaction with the robot than the group that performed the tasks in the real working environment. One explanation for this could be that virtual reality is not yet strongly integrated into everyday life, so the use of the controllers may still be unfamiliar; this may have influenced the observed results. Interestingly, the electrodermal responses of the participants pointed in the other direction: the sum amplitude was tendentially significantly higher during human–robot interaction in the real than in the VR environment, and also tendentially higher for women within these conditions. Following Boucsein and Backs [8], this can be interpreted such that a moderate increase in the amplitude of non-specific electrodermal responses is associated with increased cognitive activity (preparatory activation) and a marked increase with negatively toned emotions and emotional strain (affect arousal). One explanation could be that, for the real working environment, the highest task complexity was additionally assessed as significantly more mentally demanding than the lowest and medium levels of complexity. This result is in line with the results for the objective measure heart rate, which was significantly higher during the performance of tasks with medium and highest complexity than during the performance of the task with the lowest complexity. Following Boucsein and Backs [8], a (moderate) increase in heart rate can be interpreted as mental strain (preparatory activation). Furthermore, differences in task complexity were also visible in the electrodermal responses of the participants, especially between the levels of low and high complexity. Again, following Boucsein and Backs [8], an increase in the skin conductance level (SCL) can be interpreted as strain and high general arousal. Furthermore, women showed a significantly higher sum amplitude during the highest task complexity than during the lowest task complexity. In summary, the comparison of the subjective data and the objective psychophysiological measures indicates that the implementation of different levels of task complexity training was successful within the presented study.
Moreover, regarding the subjective measures, only one gender difference was shown. For perceived trust, the results suggest that women assessed tasks with the highest level of complexity significantly higher than tasks with the lowest level of complexity. Future research should take this into account, for example, by evaluating whether this could be an effect of habituation, or whether it reflects a perceived feeling, i.e., “if the robot can do this difficult task, then I can trust it”.
Like every study, this one exhibits some limitations. Elevated room temperatures caused the ECG and EDA sensors to fail or lose adhesion, compromising some psychophysiological recordings. Inadequate ventilation during hot summer conditions introduced artifacts into the EDA data, suggesting that future studies should ensure better climate control and more robust equipment. In the virtual scenario, HMD connection problems and alignment failures with the workpieces disrupted some trials. Future research should solve the problems that occurred in the VR environment and also include a larger sample of participants. The intended complexity gradient in the interaction tasks was only partially manifested in the subjective and physiological measures, leaving potential order and learning effects unexamined. Future studies should take this into account and expand this research.
Furthermore, future research should also investigate how effectively the training phase reduces participants’ anxiety and builds trust in the robot, with particular focus on the higher acceptance observed after virtual training. A larger, systematic, and adaptive training could be developed to specifically target negative emotions and strengthen operator confidence, leveraging the psychophysiological findings that virtual reality elicits responses similar to those in real-world interactions. Furthermore, subsequent studies should consider ergonomic factors, incorporating elements such as motion sensors and ergonomically oriented analysis. This would facilitate the investigation of potential differences in these parameters between virtual reality and real-world environments.