Article

MultiPhysio-HRC: A Multimodal Physiological Signals Dataset for Industrial Human–Robot Collaboration

Andrea Bussolan, Stefano Baraldo, Oliver Avram, Pablo Urcola, Luis Montesano, Luca Maria Gambardella and Anna Valente
1 ARM-Lab, Scuola Universitaria Professionale della Svizzera Italiana, 6900 Lugano, Switzerland
2 Bitbrain, 50006 Zaragoza, Spain
3 Departamento de Informática e Ingeniería de Sistemas, Universidad de Zaragoza, 50009 Zaragoza, Spain
4 Faculty of Informatics, Università della Svizzera Italiana, 6900 Lugano, Switzerland
* Author to whom correspondence should be addressed.
Robotics 2025, 14(12), 184; https://doi.org/10.3390/robotics14120184
Submission received: 31 October 2025 / Revised: 28 November 2025 / Accepted: 3 December 2025 / Published: 5 December 2025
(This article belongs to the Special Issue Human–Robot Collaboration in Industry 5.0)

Abstract

Human–robot collaboration (HRC) is a key focus of Industry 5.0, aiming to enhance worker productivity while ensuring well-being. The ability to perceive human psycho-physical states, such as stress and cognitive load, is crucial for adaptive and human-aware robotics. This paper introduces MultiPhysio-HRC, a multimodal dataset containing physiological, audio, and facial data collected during real-world HRC scenarios. The dataset includes electroencephalography (EEG), electrocardiography (ECG), electrodermal activity (EDA), respiration (RESP), electromyography (EMG), voice recordings, and facial action units. The dataset integrates controlled cognitive tasks, immersive virtual reality experiences, and industrial disassembly activities performed manually and with robotic assistance, to capture a holistic view of the participants’ mental states. Rich ground truth annotations were obtained using validated psychological self-assessment questionnaires. Baseline models were evaluated for stress and cognitive load classification, demonstrating the dataset’s potential for affective computing and human-aware robotics research. MultiPhysio-HRC is publicly available to support research in human-centered automation, workplace well-being, and intelligent robotic systems.

1. Introduction

In the field of Human–Robot Collaboration (HRC), physiological signals are attracting increasing interest thanks to their potential to capture human states such as stress, cognitive load, and fatigue [1]. In the human-centric view promoted by Industry 5.0, industrial workplaces should aim at striking a balance between worker productivity and well-being [2]. This includes conceiving robotic systems that can not only perform physical tasks in support of human workers but also change their behavior depending on the psycho-physical state of operators, coupled with context information. This approach of deliberative robotics [3] cannot unleash its full potential unless the human psycho-physical state can be perceived by the robot. This idea is the core goal of the Fluently project, which aims to enhance human–robot collaboration by enabling robots to adapt their behavior based on the psycho-physical state of human operators.
To develop robotic systems capable of adapting to human states, it is essential to build machine learning models that can reliably infer the mental state from physiological and behavioral signals. However, training such models requires datasets that not only include a diverse range of conditions but also reflect real-world industrial settings. Many existing datasets focus on a limited subset of modalities and are rarely collected outside of controlled laboratory conditions, limiting their applicability to HRC scenarios [4,5].
The goal of this work is to provide a comprehensive multimodal dataset that enables the study and prediction of human mental states in realistic human–robot collaboration settings. To achieve this, we introduce MultiPhysio-HRC, the first publicly available dataset that combines real-world, industrial-like HRC scenarios with a rich set of synchronized physiological and behavioral signals. The dataset integrates facial features, audio, and five physiological modalities—Electroencephalography (EEG), Electrocardiogram (ECG), electrodermal activity (EDA), respiration (RESP), and electromyography (EMG)—offering a complete multimodal representation of stress, cognitive load, and emotional state. It includes a diverse set of tasks designed to elicit different mental conditions. The chosen tasks range from controlled cognitive tests to immersive VR experiences and physically integrated industrial procedures. Furthermore, validated psychological questionnaires collected at multiple stages of the protocol provide fine-grained ground-truth labels. Together, these elements constitute the core contributions of this paper and establish MultiPhysio-HRC as a foundational resource for advancing mental state estimation in human–robot collaboration. The dataset is publicly available at https://automation-robotics-machines.github.io/MultiPhysio-HRC.github.io/ (accessed on 2 December 2025).
The remainder of this paper is organized as follows: Section 2 reviews related datasets with similar modality combinations; Section 3 explains the experimental protocol for data collection, describing tasks and data; Section 4 details the processing pipelines for filtering and feature extraction; Section 5 presents the results achieved using traditional models, while Section 6 discusses them. Section 7 concludes the work by presenting final remarks and future directions.

2. Related Works

The field of Affective Computing has a long history of utilizing public datasets for emotion and mental state recognition through diverse experimental setups and various combinations of physiological and behavioral data.
Physiological monitoring has also become central in workload estimation, since these measures provide a continuous and objective description of how the human body responds to cognitive and physical demands. Unlike self-reported questionnaires, physiological signals reflect real-time changes in the autonomic and central nervous systems that accompany variations in mental effort and stress [6]. Workload is a multidimensional construct with intertwined cognitive and physical components, and signals such as ECG, EDA, EEG, EMG, and respiration offer reliable quantitative indicators of these internal processes.
One of the first publicly available datasets was published in [7]. This dataset features ECG, EDA, RESP, and EMG data on driver stress during real-world driving tasks. The WESAD dataset [8] is a multimodal dataset aimed at stress and affect detection using wearable sensors. It includes physiological and motion data from 15 participants recorded via both wrist-worn (Empatica E4) and chest-worn (RespiBAN) devices. Sensor modalities include ECG, EDA, EMG, respiration, temperature, and acceleration. Participants were exposed to neutral, stress (via the Trier Social Stress Test), and amusement conditions. Ground truth was collected using PANAS, SAM, STAI, and SSSQ questionnaires. The dataset enables benchmarking of affective state classification with a focus on wearable technology. The DREAMER dataset [9] focuses on emotion recognition in response to audiovisual stimuli. It consists of EEG and ECG signals from 23 subjects exposed to 18 short emotional video clips. After each clip, participants self-assessed their emotional state in terms of valence, arousal, and dominance using the SAM (Self-Assessment Manikins) scale. The recordings were collected using low-cost, wireless devices, making the dataset particularly suitable for developing lightweight emotion recognition systems. In [10], AVCAffe, a large-scale audio-visual dataset that studies cognitive load and affect in remote work scenarios, is presented. This dataset includes data from 106 participants performing seven tasks via video conferencing. Tasks included open discussions and collaborative decision-making exercises, designed to elicit varying levels of cognitive load. AVCAffe includes annotations for arousal, valence, and cognitive load attributes. StressID [11] is a comprehensive multimodal dataset specifically designed for stress identification, containing synchronized recordings of facial expressions, audio, and physiological signals (ECG, EDA, respiration) from 65 participants. The dataset features annotated data collected during 11 tasks, including guided breathing, emotional video clips, cognitive tasks, and public speaking scenarios.
However, the number of public datasets focusing on the physiological response of individuals during real-world HRC tasks is extremely limited. The SenseCobot dataset stands out as a structured effort to investigate operator stress during collaborative robot programming tasks [12,13]. In this study, users were trained to program a UR10e cobot in a simulated industrial setup. The authors collected EEG, ECG, GSR, and facial expressions as input data and used NASA-TLX as ground truth labels. The SenseCobot dataset lacks exposure to complex, task-integrated HRC contexts such as physical collaboration or time-constrained industrial procedures. In contrast, the MultiPhysio-HRC dataset addresses this gap by incorporating a broader range of scenarios, including manual and robot-assisted battery disassembly, cognitive load induction through psychological tests (e.g., Stroop, N-back), and immersive virtual reality tasks. Moreover, MultiPhysio-HRC features a richer set of modalities (EEG, ECG, EDA, EMG, respiration (RESP), facial action units, and audio features), together with detailed ground truth from validated self-assessment questionnaires (STAI-Y1, NASA-TLX, SAM, and NARS), enabling a more holistic assessment of stress, cognitive load, and emotional state in realistic industrial HRC settings. A summary of the most relevant public datasets, including their modalities, purpose, ground truth, and limitations, is provided in Table 1.
Existing datasets in affective computing and workload estimation mainly focus on controlled laboratory protocols, audiovisual emotion elicitation, remote collaborative tasks, or simplified forms of human–robot interaction such as robot programming. These efforts have demonstrated the value of physiological and behavioral signals for studying stress, emotion, and cognitive demand, but they rarely capture the complexity of real industrial environments, where cognitive, physical, and temporal pressures interact continuously. As highlighted in recent reviews, multimodal datasets collected in ecologically valid, operational HRC scenarios are largely missing, limiting the development of models that can generalize beyond controlled settings and support human-aware robotic systems in practice. The MultiPhysio-HRC dataset directly addresses this research gap by providing synchronized physiological, facial, and audio data acquired across controlled cognitive tasks, immersive VR experiences, manual industrial operations, and physically integrated human–robot disassembly. This design combines the experimental rigor of laboratory tests with the validity of real HRC workflows, laying the foundation for future models capable of estimating mental states in realistic industrial contexts. By integrating heterogeneous task conditions and multiple forms of ground truth, the dataset supports a holistic assessment of mental state that captures its cognitive, affective, and physiological dimensions.

3. MultiPhysio-HRC

3.1. Experimental Protocol

The data collection campaign was designed to build a multimodal and multi-scenario dataset for mental state assessment, integrating psychological, physiological, and behavioral data. The protocol designed for this dataset acquisition is inspired by the work presented in [14]. The protocol spans two days of activities, focusing on varying stress levels and operational conditions, including human–robot collaboration and manual tasks. A schematic representation of the overall protocol is shown in Figure 1.
The protocol integrates tasks designed to selectively elicit cognitive workload together with operational scenarios involving overlapping physical and cognitive demands. Controlled tasks allow us to isolate mental workload markers in a reproducible way, as commonly adopted in physiological monitoring studies, while manual and robot-assisted disassembly replicate the complex psycho-physical load that characterizes real industrial collaboration [1]. This combined design responds to the gap identified in the recent HRC ergonomics literature, which calls for more ecologically valid datasets collected during realistic human–robot operations [15], and enables a direct comparison between mental states elicited in isolation and the more complex states arising during HRC.

3.1.1. Day 1—Baseline and Stress Induction

Participants began with a resting period to establish baseline physiological measures. Following this, they were asked to perform activities including cognitive load tests, breathing exercises, and VR games. In detail:
1. Rest: The participant sits comfortably for two minutes and is invited to relax without specific instructions.
2. Cognitive tasks. The participant sits in front of a computer screen, using a keyboard and mouse to interact with different games aimed at increasing their cognitive load and eliciting psychological stress. The selected tasks are:
(a) Stroop Color Word Test (SCWT) [16] (three minutes). Color names (e.g., “RED”) appear in different colors. The participant must press the keyboard key corresponding to the color of the displayed letters (e.g., “B” if the word “RED” is written in blue characters). The task was performed with two difficulty levels: one second and half a second to answer.
(b) N-Back task [17] (two minutes). A single letter is shown on the screen every two seconds. The participant must press a key whenever the letter matches the N-th previous letter.
(c) Mental Arithmetic Task (two minutes). The participant must perform a mental calculation within three seconds and press an arrow key to select the correct answer among four possibilities.
(d) Hanoi Tower [18]. The participant must rebuild the tower in another bin without placing a larger block on a smaller one. There was no time constraint on this task.
(e) Breathing exercise (two minutes). A voice-guided controlled breathing exercise.
The order of these tasks was randomized for each participant. A representation of the displayed screen is shown in Figure 2. During the execution of these tasks (except the Hanoi Tower and the breathing exercise), a ticking clock sound was played to induce a sense of time pressure, and a buzzer sound was played after mistakes to increase psychological stress.
3. VR games. Finally, participants performed immersive tasks in virtual reality environments such as Richie’s Plank Experience (https://store.steampowered.com/app/517160/Richies_Plank_Experience/, accessed on 2 December 2025) to elicit a high-intensity psycho-physical state. In this game, participants had to walk on a plank suspended on top of a building.
After each one of these tasks, the ground truth questionnaires were administered (see Section 3.4).

3.1.2. Day 2—Manual and Robot-Assisted Tasks

The second day was dedicated to a battery disassembly task (described in Section 3.2), designed to compare the experience of fully manual work with HRC. In detail, the second day was structured in the following phases:
  • Rest. The participant sits comfortably for five minutes and is invited to relax without specific instructions.
  • Manual disassembly. The participant uses bare hands or simple tools to partially disassemble an e-bike battery pack.
  • Collaborative disassembly. The participant is given instructions about how to interact with the robot by voice commands. Then, they perform the same disassembly by asking the cobot to perform support or parallel operations. The voice commands are not only used to give instructions to the robot naturally, but also provide opportunities to collect voice data and observe human–robot dynamics under operational conditions.
Each task (manual and robot-assisted) was repeated up to five times to elicit fatigue. After each one of these tasks, the ground truth data was collected.

3.2. Task and Robotic Cell Description

The industrial task described in Section 3.1.2 involves e-bike battery disassembly, a task selected due to its fundamental importance for fostering sustainable industrial practices. Participants performed both manual and collaborative disassembly of various battery models, with procedures designed to adhere to real-world conditions safely. For safety reasons, the original battery cells were replaced with aluminum cylinders of the same shape and dimensions, eliminating soldering materials and hazardous components.
During manual disassembly, the operator opened the battery cover, removed the Battery Management System (BMS), detached the cables, unscrewed the battery components, removed the soldering, and extracted the batteries. In the collaborative disassembly phase, given the difficulty associated with opening the battery casing, this step was conducted collaboratively: the robot pressed against the battery cover to stabilize it, while the human operator loosened the fixturing. Subsequently, while the operator disassembled the BMS, the robot simultaneously unscrewed other battery components. Once the operator finished disassembling the BMS, the human and robot cooperatively unscrewed the remaining components. In Figure 3, the complete set of steps of the collaborative disassembly is represented.
A Fanuc CRX-20 (https://www.fanuc.eu/eu-en/product/robot/crx-20ial, accessed on 2 December 2025) collaborative robot was used for this task. To ensure operator safety, the Fanuc CRX-20 features built-in safety mechanisms, including force and contact sensors, enabling the robot to detect and respond to unexpected physical interactions. The robotic cell used for the data acquisition is shown in Figure 4. The robot was equipped with voice control capabilities, allowing the operator to issue verbal instructions for specific commands. The pipeline consists of an Automatic Speech Recognition (ASR) module and a Natural Language Understanding (NLU) module, which together translate the spoken commands into robot instructions. This pipeline is presented in [19]. After receiving the instructions, IPyHOP [20], a Hierarchical Task Network (HTN) planner, decomposed the high-level command into a sequence of atomic robotic actions. When required, the robot automatically switched tools to execute these actions effectively. The motion trajectories for the robot were computed using the Pilz industrial motion planner from MoveIt2 [21], ensuring precise and safe manipulation.
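To make the control flow concrete, the following minimal Python sketch mirrors the described pipeline (speech, then intent, then HTN plan, then tool change and motion execution). All functions and data structures are illustrative stubs and do not correspond to the actual Fluently, IPyHOP, or MoveIt2 interfaces.

```python
from dataclasses import dataclass
from typing import List

# Illustrative stand-ins for the ASR -> NLU -> HTN -> motion pipeline described above.
# These stubs are hypothetical and do not reflect the real project APIs.

@dataclass
class AtomicAction:
    name: str
    tool: str

def transcribe(audio_path: str) -> str:
    # Placeholder for the Automatic Speech Recognition module.
    return "unscrew the battery cover"

def parse_command(text: str) -> str:
    # Placeholder for the Natural Language Understanding module,
    # mapping free-form speech to a symbolic high-level task.
    return "unscrew_cover" if "unscrew" in text else "hold_part"

def decompose_task(task: str) -> List[AtomicAction]:
    # Placeholder for the HTN planner, which expands a high-level
    # command into a sequence of atomic robot actions.
    plans = {
        "unscrew_cover": [AtomicAction("approach_screw", "screwdriver"),
                          AtomicAction("unscrew", "screwdriver")],
        "hold_part": [AtomicAction("press_cover", "gripper")],
    }
    return plans[task]

def execute(actions: List[AtomicAction]) -> None:
    current_tool = None
    for action in actions:
        if action.tool != current_tool:
            print(f"[robot] switching tool to {action.tool}")
            current_tool = action.tool
        # A real system would plan a trajectory here (e.g., a Pilz-style planner).
        print(f"[robot] executing {action.name}")

if __name__ == "__main__":
    execute(decompose_task(parse_command(transcribe("command.wav"))))
```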

3.3. Participants

In total, 55 subjects participated in the first day of the data collection. The sample mean age is 27.98 ± 10.22 years; 48 subjects were male and 7 were female. Of the 55, 42 also participated in the second day. Most subjects were recruited from the authors’ research facility, while the others accepted an external invitation. Participant backgrounds range from undergraduate engineering students to researchers, including professionals in other fields.

3.4. Ground Truth

Throughout the experiment, ground truth data were collected by administering multiple self-assessment questionnaires. After each task described in Section 3.1, the subjects were asked to answer three questionnaires:
  • The State–Trait Anxiety Inventory, Form Y-1 (STAI-Y1) [22] consists of 20 questions that measure the subjective feeling of apprehension and worry, and it is often used as a stress measure.
  • The NASA Task Load Index (NASA-TLX) [23] measures self-reported workload and comprises six metrics (mental demand, physical demand, temporal demand, performance, effort, and frustration level).
  • The Self-Assessment Manikin (SAM) [24] assesses participant valence, arousal, and dominance levels. The scale used in this dataset is from one to five.
Moreover, at the beginning of the first part of the experiment, participants were asked to complete the Negative Attitudes Toward Robots Scale (NARS) [25] questionnaire to identify their attitude toward robots.

3.5. Acquired Data

Electroencephalogram signals were acquired using the Bitbrain Diadem (https://www.bitbrain.com/neurotechnology-products/dry-eeg/diadem, accessed on 2 December 2025, Model: EEG.A1, Manufacturer: Bitbrain, Zaragoza, Spain), which is a wearable dry-EEG with 12 sensors over the pre-frontal, frontal, parietal, and occipital brain areas. In particular, the acquired channels are: AF7, Fp1, Fp2, AF8, F3, F4, P3, P4, PO7, O1, O2, PO8, plus ground and reference electrode on the left earlobe.
For the collection of electrocardiogram (ECG), electrodermal activity (EDA), respiration (RESP), and electromyography (EMG), we used the Versatile Bio (https://www.bitbrain.com/neurotechnology-products/biosignals/versatile-bio, accessed on 2 December 2025, Model: BIO.A1, Manufacturer: Bitbrain, Zaragoza, Spain) sensor from Bitbrain. The ECG sensor was placed in a V2 configuration to reduce signal noise caused by arm movements. To allow free movement during the experiment, the EDA sensor was placed on the index and middle fingers of the non-dominant hand. The EMG sensor was placed on the right trapezius, while the respiratory band was placed over the subject’s chest. In Figure 5, a sample of the collected physiological signals is represented. These devices have been used in other HRC setups such as [26].
Video recordings of the participants were obtained using the laptop webcam (Lenovo Legion Pro 5 16IRX8, Lenovo, Beijing, China), placed in front of them during both cognitive tasks and industrial tasks. The camera acquisition rate is 30 Hz. Finally, audio recordings were obtained using a Bluetooth microphone (Model: LJM-NINE, Manufacturer: KUKIHO, Shenzhen, China), with audio captured at 48 kHz.
All the physiological signals were acquired at 256 Hz using the software SennsLab (v7.1, https://www.bitbrain.com/neurotechnology-products/software/sennslab, accessed on 2 December 2025). The software manages Bluetooth communication with the devices and synchronizes the physiological signals and audio-video data. The data are displayed in real-time, allowing for visual inspection during the experiment.

4. Methods

4.1. Data Processing

The ECG signals were filtered using a combination of a band-pass filter (with a frequency range from 0.05 to 40 Hz) and a Savitzky–Golay filter.
Electromyography signals were filtered using a band-pass filter with a frequency range from 10 to 500 Hz coupled with a detrending algorithm, which removes the signal trend by evaluating the linear least-squares fit of the data as specified in the SENIAM recommendations [27].
The electrodermal activity signal was filtered using a low-pass filter with a cut-off frequency of 10 Hz, coupled with convolutional signal smoothing. The signal is then down-sampled to 100 Hz and divided into phasic and tonic components using the algorithm presented in [28].
Respiration signals were filtered using a second-order band-pass filter with a frequency range from 0.03 to 5 Hz.
Electroencephalogram signals were processed using two filters: a second-order band-pass filter with a frequency range from 0.5 to 40 Hz and a band-stop filter from 49 to 51 Hz to remove the amplifier noise.
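For illustration, the sketch below applies the band-pass and band-stop filters described above with SciPy. The Butterworth filter family and the zero-phase implementation are assumptions, as the paper only specifies cut-off frequencies and, for some signals, the filter order.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256  # physiological sampling rate reported in Section 3.5

def bandpass(signal, low_hz, high_hz, fs=FS, order=2):
    # Zero-phase Butterworth band-pass (filter family assumed, not stated in the paper).
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

def bandstop(signal, low_hz, high_hz, fs=FS, order=2):
    # Zero-phase Butterworth band-stop, e.g., 49-51 Hz for EEG.
    sos = butter(order, [low_hz, high_hz], btype="bandstop", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

# Example usage on synthetic data
t = np.arange(0, 10, 1 / FS)
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

eeg_filtered = bandstop(bandpass(raw, 0.5, 40), 49, 51)   # EEG: 0.5-40 Hz band-pass + 49-51 Hz band-stop
ecg_filtered = bandpass(raw, 0.05, 40)                    # ECG: 0.05-40 Hz (Savitzky-Golay step omitted)
resp_filtered = bandpass(raw, 0.03, 5)                    # RESP: 0.03-5 Hz
```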

4.2. Features Extraction

4.2.1. Physiological Data

Following the processing pipeline, a total of 250 features were extracted from the processed physiological signals, segmented into 60 s windows, using the Neurokit2 package [29] (0.2.10). These features comprise time-domain, frequency-domain, and complexity measures. For ECG signals, Heart Rate Variability (HRV) features were computed, following the definitions outlined in [30]. EMG feature descriptions can be found in [31], while EDA-related features are detailed in [32].
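As a hedged illustration of this step, the following sketch extracts HRV and EDA features from a single 60 s window with NeuroKit2; the exact feature subset, aggregation, and package calls used to build the released feature set may differ.

```python
import neurokit2 as nk
import pandas as pd

FS = 256          # sampling rate of the physiological signals
WINDOW_S = 60     # window length used for feature extraction

def physio_features(ecg_window, eda_window, fs=FS):
    # ECG: clean the signal, detect R-peaks, then compute HRV indices.
    peaks, _ = nk.ecg_peaks(nk.ecg_clean(ecg_window, sampling_rate=fs), sampling_rate=fs)
    hrv = nk.hrv(peaks, sampling_rate=fs, show=False)

    # EDA: decompose into phasic/tonic components and summarize SCR activity.
    eda_signals, _ = nk.eda_process(eda_window, sampling_rate=fs)
    eda = nk.eda_intervalrelated(eda_signals)

    return pd.concat([hrv, eda], axis=1)

# Example on simulated signals (one 60-s window)
ecg = nk.ecg_simulate(duration=WINDOW_S, sampling_rate=FS)
eda = nk.eda_simulate(duration=WINDOW_S, sampling_rate=FS, scr_number=3)
print(physio_features(ecg, eda).shape)
```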
Concerning the EEG signals, after processing we segment the signal into 5 s windows and compute 7 features for each of the 12 channels, together with the ratios over the right and left hemispheres (θ_F3/α_P3 and θ_F4/α_P4), which were significant in discriminating between levels of mental workload in [33]. We evaluate the power in the frequency bands γ (30–80 Hz), β (13–30 Hz), α (8–13 Hz), θ (4–8 Hz), and δ (1–4 Hz) using Welch’s Power Spectral Density (PSD) [34]. Welch’s method estimates the power spectrum of a signal by segmenting it into overlapping windows, computing the Discrete Fourier Transform (DFT) for each window, and then averaging the squared magnitudes. The PSD is computed as follows:
$$P(\omega) = \frac{1}{K} \sum_{k=1}^{K} \frac{|X_k(\omega)|^2}{M}$$
where X k ( ω ) is the DFT of the k-th windowed segment, and M is the number of points in each segment. Moreover, we compute Differential entropy (DiffEn) and Sample Entropy (SampEn) for each channel.
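The band powers and hemispheric ratios can be computed, for example, with SciPy's implementation of Welch's method, as in the sketch below; the segment length and overlap are assumptions, since the paper does not report them.

```python
import numpy as np
from scipy.signal import welch

FS = 256
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 80)}

def band_powers(channel, fs=FS, nperseg=2 * FS):
    # Welch PSD: average periodograms over overlapping segments (50% overlap by default).
    freqs, psd = welch(channel, fs=fs, nperseg=nperseg)
    return {name: np.trapz(psd[(freqs >= lo) & (freqs < hi)],
                           freqs[(freqs >= lo) & (freqs < hi)])
            for name, (lo, hi) in BANDS.items()}

# Example: theta/alpha ratio between frontal (F3) and parietal (P3) channels on one 5-s window
rng = np.random.default_rng(0)
eeg = {"F3": rng.standard_normal(5 * FS), "P3": rng.standard_normal(5 * FS)}
ratio_left = band_powers(eeg["F3"])["theta"] / band_powers(eeg["P3"])["alpha"]
print(f"theta_F3 / alpha_P3 = {ratio_left:.3f}")
```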

4.2.2. Face Action Units

To optimize computational efficiency, facial data were analyzed at a reduced frame rate of 2 fps. Action Unit (AU) detection was performed using the pre-trained XGBoost model from Py-Feat [35], which identifies the presence of facial muscle activations. The model estimates a probability score for each of the 20 detected action units at every selected frame, forming a multivariate time series per repetition.
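A minimal sketch of this step is given below. It assumes frames are first sampled at 2 fps with OpenCV and then passed to Py-Feat's Detector with the XGBoost AU model; exact Py-Feat argument and attribute names may vary across versions.

```python
import os
import cv2
from feat import Detector  # Py-Feat

def sample_frames(video_path, out_dir="frames", target_fps=2.0):
    """Save frames from the video at roughly 2 fps and return their file paths."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back to the 30 Hz webcam rate
    step = max(int(round(src_fps / target_fps)), 1)
    paths, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            path = os.path.join(out_dir, f"frame_{idx:06d}.png")
            cv2.imwrite(path, frame)
            paths.append(path)
        idx += 1
    cap.release()
    return paths

# Assumed Py-Feat usage: the pre-trained XGBoost AU model is selected via au_model="xgb".
detector = Detector(au_model="xgb")
frames = sample_frames("participant01_task.mp4")   # hypothetical file name
predictions = detector.detect_image(frames)        # one row of predictions per frame
au_timeseries = predictions.aus                    # AU probability columns only
```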

4.2.3. Voice Features

The spoken segments were automatically detected using the Silero-VAD model [36]. Features consisted of statistical measurements of the fundamental frequency, harmonicity, shimmer, and jitter. Moreover, the features include speech formants and Mel-Frequency Cepstral Coefficients (MFCCs). From the latter, we evaluated statistical measurements such as mean and standard deviation as in [37], but we also included median, kurtosis, and skewness measurements.
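As an illustration of the MFCC part of this feature set, the sketch below computes per-segment MFCC statistics with librosa; the number of coefficients (20) is an assumption, as the paper does not specify it for the voice features.

```python
import librosa
import numpy as np
from scipy.stats import kurtosis, skew

def mfcc_statistics(wav_path, n_mfcc=20):
    """Mean, SD, median, kurtosis, and skewness of each MFCC over a spoken segment."""
    y, sr = librosa.load(wav_path, sr=None)                  # keep the original 48 kHz rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape: (n_mfcc, n_frames)
    stats = {
        "mean": mfcc.mean(axis=1),
        "std": mfcc.std(axis=1),
        "median": np.median(mfcc, axis=1),
        "kurtosis": kurtosis(mfcc, axis=1),
        "skewness": skew(mfcc, axis=1),
    }
    # Flatten into one feature vector (5 statistics x n_mfcc coefficients).
    return np.concatenate([stats[k] for k in ("mean", "std", "median", "kurtosis", "skewness")])

features = mfcc_statistics("segment_0001.wav")   # hypothetical segment file
print(features.shape)                            # (100,) for n_mfcc=20
```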

4.2.4. Text Embeddings

Given the spoken segments, we used the large variant of OpenAI’s Whisper model [38] to transcribe the voice into text. This transcription is then fed into a Sentence Transformer model [39] to extract the embeddings of the given text. Since all participants are Italian native speakers, we employed a model fine-tuned for the Italian language [40].
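A minimal sketch of this transcription-and-embedding step is shown below, using the openai-whisper and sentence-transformers packages. The Italian Sentence-BERT model name follows reference [40]; the language setting and loading details are assumptions.

```python
import whisper
from sentence_transformers import SentenceTransformer

# Transcribe the spoken segment with Whisper (large variant; Italian speech assumed).
asr_model = whisper.load_model("large")
result = asr_model.transcribe("segment_0001.wav", language="it")   # hypothetical segment file
text = result["text"]

# Embed the transcription with an Italian Sentence-BERT model (name taken from [40]).
embedder = SentenceTransformer("nickprock/sentence-bert-base-italian-xxl-uncased")
embedding = embedder.encode(text)   # fixed-size vector representation of the utterance
print(embedding.shape)
```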

5. Results

The proposed experimental protocol allows for the identification of a wide range of mental states in the participants. In Figure 6, the average ground truth label for each of the tasks is presented. It can be seen that participants experienced different emotional states and cognitive load levels during the experiment, allowing the dataset to capture a more holistic view of the participants’ psycho-physical state.
Using the features mentioned in Section 4.2, we assess the performance of out-of-the-box baseline models in a regression and a classification task. As baseline models, we select RandomForest [41], AdaBoost [42], and XGBoost [43]. To evaluate the baseline models, we performed Leave-One-Subject-Out (LOSO) validation and computed the performance as mean and standard deviation across subjects. Both features and labels are normalized (min–max) using the maximum and minimum values of each subject. For the sake of simplicity, we evaluated three modalities: the data obtained using the Versatile Bio (ECG, EDA, EMG, RESP), the EEG data, and the voice features.
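The following sketch illustrates this evaluation scheme with scikit-learn: per-subject min–max normalization followed by Leave-One-Subject-Out regression. Hyperparameters are library defaults and the data arrays are placeholders, so it is a template of the procedure rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneGroupOut

def minmax_per_subject(values, subjects):
    """Scale each subject's values to [0, 1] using that subject's min and max."""
    out = np.zeros_like(values, dtype=float)
    for s in np.unique(subjects):
        m = subjects == s
        lo, hi = values[m].min(axis=0), values[m].max(axis=0)
        out[m] = (values[m] - lo) / np.where(hi - lo == 0, 1, hi - lo)
    return out

# Placeholder arrays: features X (n_samples, n_features), labels y, subject IDs.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 250))
y = rng.uniform(20, 80, 200)
subjects = np.repeat(np.arange(20), 10)

Xn = minmax_per_subject(X, subjects)
yn = minmax_per_subject(y[:, None], subjects).ravel()

rmse_per_subject = []
for train, test in LeaveOneGroupOut().split(Xn, yn, groups=subjects):
    model = RandomForestRegressor(random_state=0).fit(Xn[train], yn[train])
    rmse = mean_squared_error(yn[test], model.predict(Xn[test])) ** 0.5
    rmse_per_subject.append(rmse)

print(f"RMSE: {np.mean(rmse_per_subject):.2f} ± {np.std(rmse_per_subject):.2f}")
```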
First, we performed the regression over the normalized scores of NASA and STAI. The results are presented in Table 2. Here, it can be noticed that physiological data provided the lowest RMSE, suggesting that they carry the most relevant information for estimating stress and cognitive load.
For the classification task, we derived three classes from each subject’s STAI and NASA-TLX scores collected throughout the entire experience. The Low class comprises the tasks where the subject gave a score lower than μ − δ/2, where μ is the subject’s mean score across all the tasks and δ is the standard deviation. The Medium class consists of all tasks where the subject answered with a score between μ − δ/2 and μ + δ/2. Finally, the High class comprises the tasks where the subject answered with a score higher than μ + δ/2. The results for the classification task are presented in Table 3. In this task, physiological features (ECG, EDA, EMG, RESP) achieved the highest F1 scores, particularly for cognitive load classification. The confusion matrices, as mean and standard deviation across all test subjects, for each modality are displayed in Figure 7.
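For clarity, the per-subject discretization described above corresponds to the following small helper (an illustrative sketch; variable names are ours):

```python
import numpy as np

def discretize_scores(scores):
    """Map one subject's questionnaire scores to Low/Medium/High classes."""
    scores = np.asarray(scores, dtype=float)
    mu, delta = scores.mean(), scores.std()
    classes = np.full(scores.shape, "Medium", dtype=object)
    classes[scores < mu - delta / 2] = "Low"
    classes[scores > mu + delta / 2] = "High"
    return classes

# Example: STAI-Y1 scores of one subject across tasks (hypothetical values)
print(discretize_scores([28, 35, 31, 44, 30, 52, 29]))
```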
Overall, physiological signals provide the most informative features for both regression and classification tasks, outperforming EEG and voice-based features. EEG signals contain valuable information but are more susceptible to noise, resulting in slightly lower performance compared to physiological data. Voice-based features exhibit the lowest predictive power, suggesting that vocal markers alone may not be sufficient for accurately estimating stress and cognitive load.

6. Discussion

The baseline experiments conducted on the MultiPhysio-HRC dataset provide several important findings. First, the results show that physiological and behavioral signals collected across heterogeneous tasks carry information related to stress and cognitive load, even though the achieved accuracies are modest. The confusion patterns indicate that models can distinguish broad workload trends but tend to misclassify adjacent states, suggesting that workload varies along a continuum rather than forming sharply separable categories. This behavior is consistent with the existing literature on affective computing and workload estimation, where classification performance often decreases as tasks become more ecological and multimodal.

When compared with findings from related datasets such as WESAD, DREAMER, StressID, and SenseCobot, our results follow similar trends. Studies based on controlled laboratory protocols or audiovisual elicitation typically report higher accuracies, largely due to low-motion conditions, stronger signal-to-noise ratios, and well-separated stimuli. In contrast, datasets collected in realistic human–robot collaboration or industrial environments show the same challenges observed here: increased variability in physiological signals, overlapping cognitive and physical demands, and reduced label separability. The fact that our baseline models behave similarly to those developed on other real-world datasets confirms that the collected signals reflect meaningful physiological responses and are aligned with what is expected in unconstrained settings.

Regarding reliability and practical utility, the physiological data included in MultiPhysio-HRC remain informative despite the noise introduced by movement and task complexity. ECG, EDA, respiration, and EEG patterns exhibit consistent trends associated with stress and mental effort across participants, demonstrating their potential for building practical monitoring systems. On the other hand, the low performance of baseline, single- or few-modality models further stresses the importance of highly multimodal datasets to support the development of advanced machine learning methods that can operate in realistic industrial HRC environments. In particular, the results indicate that more advanced approaches, such as multimodal fusion techniques, could further enhance predictive performance, as shown in [44]. From a practical standpoint, the dataset enables research into mental state-aware robot behaviors, adaptive task allocation, and safety-oriented monitoring.
Several limitations must be acknowledged. The number of participants, although substantial for a multimodal physiological study, limits generalization across populations. Subjective questionnaires, used as ground truth, provide validated but coarse labels and may not capture rapid fluctuations in mental state. Motion artifacts, particularly during manual and collaborative disassembly tasks, introduce additional noise that can reduce model performance. Moreover, only traditional machine learning approaches were tested; more sophisticated deep learning architectures may better exploit the dataset’s temporal and multimodal structure. Finally, although the dataset includes multiple scenarios, further expansion to additional industrial tasks and larger participant cohorts would improve robustness and generalization.
We summarize our main contributions as follows:
  • Real-World HRC Context: To the best of our knowledge, MultiPhysio-HRC is the first publicly available dataset to include realistic industrial-like HRC scenarios comprehensively.
  • Complete Multimodal Data: While existing datasets often include subsets of modalities, MultiPhysio-HRC integrates facial features, audio, and a comprehensive set of physiological signals: EEG, ECG, EDA, RESP, and EMG. This combination allows for a holistic assessment of mental states, addressing cognitive load, stress, and emotional dimensions.
  • Task Diversification: The dataset comprises tasks specifically designed to elicit various mental states. These include cognitive tests, immersive VR activities, and industrial tasks.
  • Rich Ground Truth Annotations: Ground truth labels were collected through validated psychological questionnaires at multiple stages during the experiment. Combined with multimodal measurements, these labels offer unparalleled granularity for studying human states in HRC contexts.
Overall, MultiPhysio-HRC represents a significant contribution to the field by providing a comprehensive multimodal dataset collected across both controlled and realistic HRC settings. It fills a critical gap identified in recent literature and provides a foundation for developing practical mental state estimation methods for human-aware robotic systems.

7. Conclusions

In this paper, we introduced MultiPhysio-HRC, a multimodal physiological signals dataset for industrial Human–Robot Collaboration (HRC). Our dataset provides a comprehensive collection of physiological signals (EEG, ECG, EDA, RESP, and EMG), facial features, and voice data, recorded in multiple scenarios, including real-world industrial-like settings. Through the diversity of the proposed exercises, we elicited diverse cognitive and emotional states, enabling a rich understanding of human psycho-physical responses. The baseline models applied to the dataset suggest that physiological signals contain valuable information for estimating cognitive load and stress levels. However, the results indicate that achieving high accuracy remains challenging, underscoring the need for advanced machine learning approaches and multimodal fusion techniques. By making MultiPhysio-HRC publicly available, we aim to accelerate research in affective computing and human-aware robotics, fostering safer and more human-centered industrial human–robot collaboration.
Future work will aim to expand the dataset with additional industrial human–robot collaboration scenarios, as well as to explore advanced multimodal learning techniques that combine physiological, facial, and audio signals. Incorporating continuous ground-truth measures and studying how mental state estimates can inform adaptive robot behaviors will further extend the practical impact of this dataset for human-aware HRC.

Author Contributions

A.B., S.B., O.A., A.V., L.M.G. and L.M. conceived the study and designed the experimental framework. A.B., S.B., O.A., and P.U. developed the methodology; A.B. and P.U. implemented the software and prepared the visualizations. A.B., S.B., O.A., P.U. and L.M. carried out the investigation and validation. A.V. and L.M. provided resources. A.B. wrote the original draft, and all the coauthors reviewed and edited the manuscript. S.B. and O.A. supervised the work, with O.A. also overseeing project administration; A.V., O.A., S.B. and L.M. secured funding. All authors have read and agreed to the published version of the manuscript.

Funding

The research in this paper has been partially funded by the Horizon Europe project Fluently (Grant ID: 101058680) and Eurostars project Singularity (Grant ID: 2309).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the University of Applied Sciences and Arts of Southern Switzerland (SUPSI).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The MultiPhysio-HRC dataset presented in this study is publicly available at https://automation-robotics-machines.github.io/MultiPhysio-HRC.github.io/ (accessed on 2 December 2025). The dataset comprises multimodal physiological recordings, audio, and facial features, along with ground-truth annotations from validated psychological questionnaires, collected during real-world human–robot collaboration scenarios. Data are shared under the terms of the Creative Commons Attribution (CC BY) license.

Acknowledgments

We are grateful to all participants who volunteered their time and effort across the two-day acquisition protocol, making this dataset possible. We thank the administrative and technical staff at ARM-Lab (SUPSI) for their support with laboratory logistics, equipment setup, and safety procedures during the collaborative robot experiments. We also appreciate our colleagues who assisted with session scheduling, data collection, and questionnaire administration.

Conflicts of Interest

The authors Pablo Urcola and Luis Montesano are employees of Bitbrain, the company that manufactures the devices used to record physiological data. The investigation reported in this paper is part of an EU project in which Bitbrain participates as a research partner. The focus of the research is not the devices themselves, which have been on the market for a long time, but the type of dataset and analysis. The research followed standard scientific methodologies to avoid any biases and was directed by the authors from SUPSI.

Abbreviations

The following abbreviations are used in this manuscript:
AB          AdaBoost
ASR         Automatic Speech Recognition
AU          Action Unit
DFT         Discrete Fourier Transform
ECG         Electrocardiography
EDA         Electrodermal Activity
EEG         Electroencephalography
EMG         Electromyography
GSR         Galvanic Skin Response
HRC         Human–Robot Collaboration
MFCC        Mel-Frequency Cepstral Coefficients
NASA-TLX    NASA Task Load Index
NARS        Negative Attitudes Toward Robots Scale
PANAS       Positive and Negative Affect Schedule
PSD         Power Spectral Density
RESP        Respiration
RF          Random Forest
RMSE        Root Mean Square Error
SAM         Self-Assessment Manikin
SENIAM      Surface EMG for Non-Invasive Assessment of Muscles
SSSQ        Short Stress State Questionnaire
STAI-Y1     State–Trait Anxiety Inventory, Form Y-1
SUPSI       University of Applied Sciences and Arts of Southern Switzerland
USI         Università della Svizzera italiana
VAD         Voice Activity Detection
VR          Virtual Reality
XGB         XGBoost (Extreme Gradient Boosting)

Appendix A

Table A1 reports the complete list of features extracted from the physiological signals used in this work: electrocardiography (ECG, summarized via HRV), electrodermal activity (EDA), surface electromyography (EMG), and respiration (RESP). Features are grouped by domain (time, frequency, nonlinearity/complexity). Unless stated otherwise in the main text, features were computed on fixed-length analysis windows and then aggregated across windows per sample.
Table A1. Physiological features computed from ECG/HRV, EDA, EMG, and RESP signals, grouped by domain (time, frequency, nonlinearity).
Signal | Domain | Features
ECG (HRV) | Time | MeanNN, SDNN, RMSSD, SDSD, CVNN, CVSD, MedianNN, MadNN, MCVNN, IQRNN, Prc20NN, Prc80NN, pNN50, pNN20, MinNN, MaxNN, HTI, TINN
ECG (HRV) | Frequency | ULF, VLF, LF, HF, VHF, LFHF, LFn, HFn, LnHF
ECG (HRV) | Nonlinearity | SD1, SD2, SD1SD2, S, CSI, CVI, CSI_Modified, PIP, IALS, PSS, PAS, GI, SI, AI, PI, C1d, C1a, SD1d, SD1a, C2d, C2a, SD2d, SD2a, Cd, Ca, SDNNd, SDNNa, DFA_alpha1, MFDFA_alpha1_Width, MFDFA_alpha1_Peak, MFDFA_alpha1_Mean, MFDFA_alpha1_Max, MFDFA_alpha1_Delta, MFDFA_alpha1_Asymmetry, MFDFA_alpha1_Fluctuation, MFDFA_alpha1_Increment, ApEn, SampEn, ShanEn, FuzzyEn, MSEn, CMSEn, RCMSEn, CD, HFD, KFD, LZC
EDA | Time | Mean, SD, Kurtosis, Skewness, Mean Derivative, Mean Negative Derivative, Activity, Mobility, Complexity, Peaks Count, Mean Peaks Amplitude, Mean Rise Time, Sum Peaks Amplitude, Sum of Rise Time, SMA
EDA | Frequency | Energy, Spectral Power, Energy Wavelet lv1–lv4, Total Energy Wavelet, Energy Distribution lv1–lv4, Mean MFCCs 1–20, SD MFCCs 1–20, Median MFCCs 1–20, Kurtosis MFCCs 1–20, Skewness MFCCs 1–20
EDA | Nonlinearity | ApEn, SampEn, ShanEn, FuzzEn, MSE, CMSE, RCMSE, Entropy Wavelet lv1–lv4
EMG | Time | RMSE, MAV, VAR
EMG | Frequency | Energy, MNF, MDF, ZC, FR, DWT_MAV_1–4, DWT_STD_1–4
EMG | Nonlinearity | (none)
RESP | Time | Mean, Max, Min, RAV_Mean, RAV_SD, RAV_RMSSD, RAV_CVSD, Symmetry_PeakTrough (Mean, Median, Max, Min, Std), Symmetry_RiseDecay (Mean, Median, Max, Min, Std), RRV_RMSSD, RRV_MeanBB, RRV_SDBB, RRV_SDSD, RRV_CVBB, RRV_CVSD, RRV_MedianBB, RRV_MadBB, RRV_MCVBB
RESP | Frequency | RRV_VLF, RRV_LF, RRV_HF, RRV_LFHF, RRV_LFn, RRV_HFn
RESP | Nonlinearity | RRV_SD1, RRV_SD2, RRV_SD2SD1, RRV_ApEn, RRV_SampEn

References

  1. Lorenzini, M.; Lagomarsino, M.; Fortini, L.; Gholami, S.; Ajoudani, A. Ergonomic human-robot collaboration in industry: A review. Front. Robot. AI 2023, 9, 813907. [Google Scholar] [CrossRef]
  2. Lu, Y.; Zheng, H.; Chand, S.; Xia, W.; Liu, Z.; Xu, X.; Wang, L.; Qin, Z.; Bao, J. Outlook on human-centric manufacturing towards Industry 5.0. J. Manuf. Syst. 2022, 62, 612–627. [Google Scholar] [CrossRef]
  3. Valente, A.; Pavesi, G.; Zamboni, M.; Carpanzano, E. Deliberative robotics – a novel interactive control framework enhancing human-robot collaboration. CIRP Ann. 2022, 71, 21–24. [Google Scholar] [CrossRef]
  4. Spezialetti, M.; Placidi, G.; Rossi, S. Emotion Recognition for Human-Robot Interaction: Recent Advances and Future Perspectives. Front. Robot. AI 2020, 7, 532279. [Google Scholar] [CrossRef]
  5. Heinisch, J.S.; Kirchhoff, J.; Busch, P.; Wendt, J.; von Stryk, O.; David, K. Physiological data for affective computing in HRI with anthropomorphic service robots: The AFFECT-HRI data set. Sci. Data 2024, 11, 333. [Google Scholar] [CrossRef]
  6. Tamantini, C.; Laura Cristofanelli, M.; Fracasso, F.; Umbrico, A.; Cortellessa, G.; Orlandini, A.; Cordella, F. Physiological Sensor Technologies in Workload Estimation: A Review. IEEE Sens. J. 2025, 25, 34298–34310. [Google Scholar] [CrossRef]
  7. Healey, J.; Picard, R. Detecting Stress During Real-World Driving Tasks Using Physiological Sensors. IEEE Trans. Intell. Transp. Syst. 2005, 6, 156–166. [Google Scholar] [CrossRef]
  8. Schmidt, P.; Reiss, A.; Duerichen, R.; Marberger, C.; Van Laerhoven, K. Introducing WESAD, a Multimodal Dataset for Wearable Stress and Affect Detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder, CO, USA, 2 October 2018; pp. 400–408. [Google Scholar] [CrossRef]
  9. Katsigiannis, S.; Ramzan, N. DREAMER: A Database for Emotion Recognition Through EEG and ECG Signals From Wireless Low-cost Off-the-Shelf Devices. IEEE J. Biomed. Health Inform. 2018, 22, 98–107. [Google Scholar] [CrossRef] [PubMed]
  10. Sarkar, P.; Posen, A.; Etemad, A. AVCAffe: A Large Scale Audio-Visual Dataset of Cognitive Load and Affect for Remote Work. Proc. AAAI Conf. Artif. Intell. 2023, 37, 76–85. [Google Scholar] [CrossRef]
  11. Chaptoukaev, H.; Strizhkova, V.; Panariello, M.; Dalpaos, B.; Reka, A.; Manera, V.; Thümmler, S.; Ismailova, E.; Nicholas, W.; Bremond, F.; et al. StressID: A Multimodal Dataset for Stress Identification. In Proceedings of the Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 29798–29811. [Google Scholar]
  12. SenseCobot. SenseCobot. 2023. Available online: https://zenodo.org/records/8363762 (accessed on 2 December 2025).
  13. Borghi, S.; Ruo, A.; Sabattini, L.; Peruzzini, M.; Villani, V. Assessing operator stress in collaborative robotics: A multimodal approach. Appl. Ergon. 2025, 123, 104418. [Google Scholar] [CrossRef]
  14. Bussolan, A.; Baraldo, S.; Gambardella, L.M.; Valente, A. Assessing the Impact of Human-Robot Collaboration on Stress Levels and Cognitive Load in Industrial Assembly Tasks. In Proceedings of the ISR Europe 2023, 56th International Symposium on Robotics, Stuttgart, Germany, 26–27 September 2023; pp. 78–85. [Google Scholar]
  15. Nenna, F.; Zanardi, D.; Orlando, E.M.; Nannetti, M.; Buodo, G.; Gamberini, L. Getting Closer to Real-world: Monitoring Humans Working with Collaborative Industrial Robots. In Proceedings of the Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, New York, NY, USA, 11 March 2024; HRI ’24. pp. 789–793. [Google Scholar] [CrossRef]
  16. Scarpina, F.; Tagini, S. The Stroop Color and Word Test. Front. Psychol. 2017, 8, 557. [Google Scholar] [CrossRef] [PubMed]
  17. Meule, A. Reporting and Interpreting Working Memory Performance in n-back Tasks. Front. Psychol. 2017, 8, 352. [Google Scholar] [CrossRef]
  18. Schmidtke, K. Tower of Hanoi Problem. In The Corsini Encyclopedia of Psychology; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2010; pp. 1–2. [Google Scholar] [CrossRef]
  19. Avram, O.; Fasana, C.; Baraldo, S.; Valente, A. Advancing Human-Robot Collaboration by Robust Speech Recognition in Smart Manufacturing. In Proceedings of the European Robotics Forum 2024; Secchi, C., Marconi, L., Eds.; Springer: Cham, Switzerland, 2024; pp. 168–173. [Google Scholar]
  20. Bansod, Y.; Patra, S.; Nau, D.; Roberts, M. HTN Replanning from the Middle. Int. FLAIRS Conf. Proc. 2022, 35. [Google Scholar] [CrossRef]
  21. Coleman, D.; Sucan, I.; Chitta, S.; Correll, N. Reducing the barrier to entry of complex robotic software: A moveit! case study. arXiv 2014, arXiv:1404.3785. [Google Scholar] [CrossRef]
  22. Spielberger, C.; Gorsuch, R. Manual for the State-Trait Anxiety Inventory (form Y) (“Self-Evaluation Questionnaire”); Consulting Psychologists Press: Palo Alto, CA, USA, 1983. [Google Scholar]
  23. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In Advances in Psychology; Elsevier: Amsterdam, The Netherlands, 1988; Volume 52, pp. 139–183. [Google Scholar] [CrossRef]
  24. Bradley, M.M.; Lang, P.J. Measuring emotion: The self-assessment manikin and the semantic differential. J. Behav. Ther. Exp. Psychiatry 1994, 25, 49–59. [Google Scholar] [CrossRef]
  25. Nomura, T.; Kanda, T.; Suzuki, T.; Kato, K. Psychology in Human-Robot Communication: An Attempt through Investigation of Negative Attitudes and Anxiety toward Robots. In Proceedings of the RO-MAN 2004, 13th IEEE International Workshop on Robot and Human Interactive Communication (IEEE Catalog No.04TH8759), Kurashiki, Okayama, Japan, 22–22 September 2004; pp. 35–40. [Google Scholar] [CrossRef]
  26. Loizaga, E.; Bastida, L.; Sillaurren, S.; Moya, A.; Toledo, N. Modelling and Measuring Trust in Human–Robot Collaboration. Appl. Sci. 2024, 14, 1919. [Google Scholar] [CrossRef]
  27. Stegeman, D.F.; Hermens, H.J. Standards for Surface Electromyography: The European Project “Surface EMG for Non-Invasive Assessment of Muscles (SENIAM)”.
  28. Greco, A.; Valenza, G.; Lanata, A.; Scilingo, E.; Citi, L. cvxEDA: A Convex Optimization Approach to Electrodermal Activity Processing. IEEE Trans. Biomed. Eng. 2016, 63, 797–804. [Google Scholar] [CrossRef]
  29. Makowski, D.; Pham, T.; Lau, Z.J.; Brammer, J.C.; Lespinasse, F.; Pham, H.; Schölzel, C.; Chen, S.H.A. NeuroKit2: A Python toolbox for neurophysiological signal processing. Behav. Res. Methods 2021, 53, 1689–1696. [Google Scholar] [CrossRef]
  30. Pham, T.; Lau, Z.J.; Chen, S.H.A.; Makowski, D. Heart Rate Variability in Psychology: A Review of HRV Indices and an Analysis Tutorial. Sensors 2021, 21, 3998. [Google Scholar] [CrossRef]
  31. Orguc, S.; Khurana, H.S.; Stankovic, K.M.; Leel, H.; Chandrakasan, A. EMG-based Real Time Facial Gesture Recognition for Stress Monitoring. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; pp. 2651–2654. [Google Scholar] [CrossRef]
  32. Shukla, J.; Barreda-Angeles, M.; Oliver, J.; Nandi, G.C.; Puig, D. Feature Extraction and Selection for Emotion Recognition from Electrodermal Activity. IEEE Trans. Affect. Comput. 2021, 12, 857–869. [Google Scholar] [CrossRef]
  33. Raufi, B.; Longo, L. An Evaluation of the EEG Alpha-to-Theta and Theta-to-Alpha Band Ratios as Indexes of Mental Workload. Front. Neuroinformatics 2022, 16, 861967. [Google Scholar] [CrossRef]
  34. Welch, P. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust. 1967, 15, 70–73. [Google Scholar] [CrossRef]
  35. Cheong, J.H.; Jolly, E.; Xie, T.; Byrne, S.; Kenney, M.; Chang, L.J. Py-Feat: Python Facial Expression Analysis Toolbox. Affect. Sci. 2023, 4, 781–796. [Google Scholar] [CrossRef] [PubMed]
  36. Team, S. Silero VAD: Pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier. 2021. Available online: https://github.com/snakers4/silero-vad (accessed on 2 December 2025).
  37. Tomba, K.; Dumoulin, J.; Mugellini, E.; Abou Khaled, O.; Hawila, S. Stress Detection Through Speech Analysis. In Proceedings of the 15th International Joint Conference on e-Business and Telecommunications, Porto, Portugal, 26–28 July 2018; pp. 394–398. [Google Scholar] [CrossRef]
  38. Radford, A.; Kim, J.W.; Xu, T.; Brockman, G.; McLeavey, C.; Sutskever, I. Robust speech recognition via large-scale weak supervision. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 28492–28518. [Google Scholar]
  39. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics: Hong Kong, China, 3–7 November 2019. [Google Scholar]
  40. Procopio, N. sentence-bert-base-italian-xxl-cased. Available online: https://huggingface.co/nickprock/sentence-bert-base-italian-xxl-uncased (accessed on 2 December 2025).
  41. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  42. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
  43. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; KDD ‘16’. pp. 785–794. [Google Scholar] [CrossRef]
  44. Bussolan, A.; Baraldo, S.; Gambardella, L.M.; Valente, A. Multimodal fusion stress detector for enhanced human-robot collaboration in industrial assembly tasks. In Proceedings of the 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN), Pasadena, CA, USA, 26–30 August 2024; pp. 978–984. [Google Scholar] [CrossRef]
Figure 1. Data acquisition protocol.
Figure 2. Displayed screen of each cognitive task: SCWT (top left), N-Back (bottom left), Arithmetic (top right), and Hanoi tower (bottom right).
Figure 3. Overview of the battery module disassembly process. (1) The module housing is opened, and pins are inserted. (2) The cobot detaches the upper enclosure from the module, while the human pushes the pins. (3) Screws securing the electronics cover are loosened. (4) The electronics cover and wiring connections are separated. (5) Remaining fasteners holding the cell pack are unscrewed. (6) Plates and connectors are removed to expose the cell array. (7) The cell holders are released. (8) Cells are then extracted using robotic handling for safe removal and further processing.
Figure 4. Experimental robotic cell setup. The multiple components of the disassembled battery can be seen placed on the table.
Figure 5. Sample of the acquired physiological data. The participant signals are filtered and normalized (min–max).
Figure 6. The radar chart displays the mean values of various ground truth metrics (STAI, NASA, Valence, Arousal, and Dominance) across different experimental conditions. The values are normalized (min–max) by subject.
Figure 7. Confusion matrices for the best model for each modality and each class. The left column reports results for the STAI class, while the right column reports results for the NASA class. Mean and standard deviation across test subjects are reported. (a) Confusion matrix obtained with XGB for Stress classification using physiological features. (b) Confusion matrix obtained with RF for Cognitive Load classification using physiological features. (c) Confusion matrix obtained with RF for Stress classification using EEG features. (d) Confusion matrix obtained with XGB for Cognitive Load classification using EEG features. (e) Confusion matrix obtained with XGB for Stress classification using speech features. (f) Confusion matrix obtained with RF for Cognitive Load classification using speech features.
Table 1. Summary of publicly available multimodal datasets for affective computing, stress, and workload estimation.
Dataset | Scenario/Context | Modalities | Target Constructs | Ground Truth | # Part. | Limitations w.r.t. HRC
Healey (2005) [7] | Real-world driving | ECG, EDA, EMG, RESP | Driver stress | Road segment stress labels | 24 | Driving only; no collaborative or industrial tasks
WESAD [8] | Three different affective states (TSST, amusement, neutral) | ECG, EDA, EMG, RESP, temperature, acceleration | Stress and affect | PANAS, SAM, STAI, SSSQ | 15 | Wearable setting; no physical collaboration or robot interaction
DREAMER [9] | Audiovisual emotion elicitation in lab | EEG, ECG | Valence, arousal, dominance | SAM | 23 | No workload or HRC tasks; short passive stimuli only
AVCAffe [10] | Remote collaborative work via video conferencing | Facial video, audio, physiological signals | Cognitive load, affect | Cognitive load and V/A annotations | 106 | Remote setting; no physical tasks or shared workspace with robots
StressID [11] | Breathing, emotional video clips, cognitive and speech tasks | ECG, EDA, RESP, facial video, audio | Stress and affect | NASA-TLX, SAM | 65 | Broad lab tasks, but no human–robot collaboration
SenseCobot [12,13] | Collaborative robot programming in a simulated industrial cell | EEG, ECG, GSR, facial expressions | Stress | NASA-TLX | 21 | Focus on programming; no physical collaboration
MultiPhysio-HRC (ours) | Manual and robot-assisted battery disassembly, cognitive tests, immersive VR tasks | EEG, ECG, EDA, EMG, RESP, facial AUs, audio features | Stress, cognitive load, affect, attitudes toward robots | STAI-Y1, NASA-TLX, SAM, NARS | 36 | First dataset combining rich multimodal physiology with realistic industrial HRC and disassembly workflows
Table 2. Results from the regression of the STAI-Y1 and NASA-TLX scores using baseline models (RF: RandomForest, AB: AdaBoost, XGB: XGBoost).
Response | Signal | Model | RMSE
STAI-Y1 (μ = 31.86, max = 55.00) | Physio (n = 250) | RF | 0.20 ± 0.09
STAI-Y1 | Physio | AB | 0.20 ± 0.09
STAI-Y1 | Physio | XGB | 0.23 ± 0.09
STAI-Y1 | EEG (n = 88) | RF | 0.32 ± 0.08
STAI-Y1 | EEG | AB | 0.30 ± 0.08
STAI-Y1 | EEG | XGB | 0.32 ± 0.08
STAI-Y1 | Voice (n = 439) | RF | 0.32 ± 0.08
STAI-Y1 | Voice | AB | 0.33 ± 0.08
STAI-Y1 | Voice | XGB | 0.34 ± 0.07
NASA-TLX (μ = 39.56, max = 91.11) | Physio (n = 250) | RF | 0.19 ± 0.08
NASA-TLX | Physio | AB | 0.19 ± 0.09
NASA-TLX | Physio | XGB | 0.20 ± 0.09
NASA-TLX | EEG (n = 88) | RF | 0.31 ± 0.08
NASA-TLX | EEG | AB | 0.29 ± 0.08
NASA-TLX | EEG | XGB | 0.32 ± 0.08
NASA-TLX | Voice (n = 439) | RF | 0.32 ± 0.08
NASA-TLX | Voice | AB | 0.32 ± 0.08
NASA-TLX | Voice | XGB | 0.33 ± 0.08
Table 3. Results from the classification of the 3 stress classes and of the 3 cognitive load classes using baseline models (RF: RandomForest, AB: AdaBoost, XGB: XGBoost).
Response | Signal | Model | F1-Score
Stress Class | Physio (n = 250) | RF | 0.37 ± 0.15
Stress Class | Physio | AB | 0.38 ± 0.12
Stress Class | Physio | XGB | 0.39 ± 0.14
Stress Class | EEG (n = 88) | RF | 0.37 ± 0.12
Stress Class | EEG | AB | 0.34 ± 0.16
Stress Class | EEG | XGB | 0.37 ± 0.11
Stress Class | Voice (n = 439) | RF | 0.35 ± 0.15
Stress Class | Voice | AB | 0.34 ± 0.12
Stress Class | Voice | XGB | 0.36 ± 0.12
Cognitive Load Class | Physio (n = 250) | RF | 0.49 ± 0.16
Cognitive Load Class | Physio | AB | 0.48 ± 0.46
Cognitive Load Class | Physio | XGB | 0.47 ± 0.15
Cognitive Load Class | EEG (n = 88) | RF | 0.35 ± 0.09
Cognitive Load Class | EEG | AB | 0.33 ± 0.14
Cognitive Load Class | EEG | XGB | 0.35 ± 0.10
Cognitive Load Class | Voice (n = 439) | RF | 0.41 ± 0.15
Cognitive Load Class | Voice | AB | 0.37 ± 0.12
Cognitive Load Class | Voice | XGB | 0.38 ± 0.15
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
