Forecasting Vital Signs in Human–Robot Collaboration Using Sequence-to-Sequence Models with Bidirectional LSTM: A Comparative Analysis of Uni- and Multi-Variate Approaches

Abstract: Our research investigates an approach to forecasting human vital signs by formulating the problem as a sequence-to-sequence (seq2seq) task using bidirectional long short-term memory (BiLSTM) models. The study compares the forecasting accuracy of uni- and multivariate modeling strategies over forecasting horizons ranging from 1 s to 10 s. The dataset comprises sensor data collected during a lab study in which thirteen participants engaged in a collaborative assembly scenario with a robot. Our results show that univariate models outperform multivariate ones in terms of forecasting accuracy. These findings offer insights into the forecasting of human physiological parameters, with potential implications for human-robot collaboration, personalized medical monitoring, and healthcare applications.


Introduction
In the dynamic realm of human-robot collaboration (HRC), a significant challenge lies in equipping robotic systems with the ability to adapt seamlessly to users' internal states, such as stress or relaxation. Ongoing research in this field has shown that stress can be assessed indirectly by integrating diverse sensors that monitor physiological indicators, including electrocardiography (ECG), pupil dilation (PD), electromyography (EMG), electroencephalography (EEG), heart rate variability (HRV), skin temperature, respiratory rate, and electrodermal activity (EDA), also known as galvanic skin response (GSR) [1][2][3][4]. Machine learning classification techniques have made noteworthy advances in stress detection [5][6][7][8]. In diverse environments, such as academic, driving, or office-like settings, accuracy rates exceeding 90% have been achieved [5]. By moving beyond real-time emotion recognition to anticipatory modeling, robotic systems can adjust their behavior proactively, leading to more natural and productive collaboration.

However, despite the promising developments in stress detection, the forecasting of future states remains largely unexplored. Some research has been conducted on forecasting vital signs in intensive care patients [9], postoperative complications [10], or in health monitoring [11]. In [11], the authors compared different models, evaluating their accuracy in univariate forecasts of pulse, oxygen saturation (SpO2), and blood pressure. Notably, deep learning models such as long short-term memory (LSTM) and gated recurrent unit (GRU) networks outperformed classical forecasting strategies like autoregressive (AR) and autoregressive integrated moving average (ARIMA) models, with GRUs performing best. Earlier work in other use cases also found that bidirectional long short-term memory (BiLSTM) models improve average time series prediction accuracy by 37.78% compared to classical LSTMs [12]. It was also observed that training the bidirectional variant was slower, suggesting that it extracts unique features inaccessible to other models [13]. In the field of mental state and vital sign forecasting, the performance of BiLSTMs is unknown.

Given this state of research, a promising avenue for further investigation is the interplay between diverse sensor modalities, which may enhance vital sign forecasting. Specifically, it remains open whether using multiple modalities simultaneously in a multivariate forecasting framework improves forecasting accuracy by leveraging information that remains latent in univariate models. This study contributes by contrasting multivariate and univariate forecasting strategies. It also provides insights into vital sign forecasting with BiLSTMs in the context of collaborative robotics.

Dataset
The dataset used in this study consists of vital signs from 13 subjects recorded in the context of a collaborative assembly task, in which a human worker assembles a component together with a collaborative robot (cobot). To capture the influence of the cobot on the human's vital signs, six different scenarios were executed, differing in factors such as the degree of collaboration or the working speed of the robot. The recording was stopped between scenarios. As a result, each of the 13 subjects contributes 6 individual sequences, each lasting approximately 2 min, culminating in a total of 76 sequences. The sensor modalities are the interbeat intervals (IBI) of the heart, measured via ECG, and the EDA of the skin, both recorded with the BITalino (r)evolution Plugged Kit BLE/BT (PLUX Wireless Biosignals, Portugal), as well as the pupil dilation (PD), measured with Pupil Core eye-tracking glasses (Pupil Labs, Berlin, Germany).

Bidirectional Long Short-Term Memory Model
Bidirectional long short-term memory (BiLSTM) networks are a type of recurrent neural network (RNN) architecture used in natural language processing and in other sequential data tasks, such as time series modeling. They extend traditional LSTMs by processing the input sequence in both the forward and the backward direction, capturing context from both past and future positions [14]. Like standard LSTMs, they address the vanishing gradient problem of regular RNNs and improve the modeling of long-range dependencies in sequential data.
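The bidirectional processing described above can be written compactly. The following is a sketch in generic notation, not the exact formulation of [14]: the symbols $x_t$ (input at position $t$ of the look-back window), $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ (forward and backward hidden states), and the subscripts fw and bw are assumptions introduced here for illustration.

```latex
\begin{aligned}
\overrightarrow{h}_t &= \mathrm{LSTM}_{\mathrm{fw}}\!\left(x_t,\ \overrightarrow{h}_{t-1}\right), \\
\overleftarrow{h}_t  &= \mathrm{LSTM}_{\mathrm{bw}}\!\left(x_t,\ \overleftarrow{h}_{t+1}\right), \\
h_t &= \left[\,\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\,\right]
\end{aligned}
```

In a forecasting setting, "future" context refers only to later positions inside the observed input window; the backward pass never sees the values that are to be predicted.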

Preprocessing
The data preprocessing involved three steps. First, each modality was handled independently. For IBI, no modality-specific processing was needed. For PD, blinks were removed using the procedure outlined in [15,16]. For EDA, the skin conductance response was extracted as described in [17]. In the second step, all modalities underwent uniform processing, which included resampling, smoothing, and data normalization to enhance quality and ensure consistency. In the final step, the individual modalities were synchronized to create a multivariate dataset. Extensive feature engineering was then performed on this dataset, yielding both static features (e.g., means, minimums, and maximums of the time series) and dynamic features (e.g., moving averages and lag features).
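The normalization and the dynamic features mentioned above can be illustrated with a short sketch. The concrete choices here are assumptions, since the text does not specify them: z-score normalization, a trailing moving average, and the helper names `z_normalize`, `moving_average`, and `lag_feature` are all hypothetical, and the sample values are made up.

```python
from statistics import mean, pstdev


def z_normalize(values):
    """Scale a series to zero mean and unit variance (assumed normalization)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def moving_average(values, window):
    """Trailing moving average; early positions average the samples available so far."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def lag_feature(values, lag):
    """Series shifted by `lag` steps; positions without history are None."""
    if not 0 <= lag <= len(values):
        raise ValueError("lag must be between 0 and len(values)")
    return [None] * lag + list(values[:len(values) - lag])


# Made-up interbeat intervals in seconds, for illustration only.
ibi = [0.80, 0.82, 0.78, 0.85, 0.90]
features = {
    "ibi_norm": z_normalize(ibi),
    "ibi_ma3": moving_average(ibi, 3),
    "ibi_lag1": lag_feature(ibi, 1),
}
```

Each derived series has the same length as the input, so the features can be stacked column-wise with the raw modalities to form a multivariate input.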

Stationarity
Stationarity means that statistical parameters such as the mean and variance remain approximately constant throughout the observed time span [18]. This property is particularly important in forecasting applications. To assess stationarity, we employed the augmented Dickey-Fuller (ADF) test, which is one of the most commonly used tests of stationarity [19][20][21]. To induce stationarity, the series were differenced, resulting in stationarity in 99% of all sequences.
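As a minimal sketch of the differencing step, assuming plain first-order differencing (the order actually used is not stated in the text), the following hypothetical helpers compute the differenced series and invert it again so that forecasts can be mapped back to the original scale. The ADF test itself is available, for instance, as `adfuller` in `statsmodels.tsa.stattools` and is not reimplemented here.

```python
def difference(series, order=1):
    """Apply `order` rounds of first differencing: d_t = x_t - x_(t-1)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(first_value, diffs):
    """Invert one round of differencing, given the first original value."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


# A series with a linear trend becomes constant after one differencing step.
trend = [1.0, 2.0, 3.0, 4.0, 5.0]
stationary = difference(trend)
restored = undifference(trend[0], stationary)
```

Each round of differencing shortens the series by one sample, which has to be accounted for when the modalities are aligned afterwards.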

Sequence-to-Sequence Modeling
In the context of time series forecasting, sequence-to-sequence modeling is a technique wherein a learner maps a sequence of past values to a sequence of future values [22]. To adapt the dataset into a format suitable for input and output sequences, we employed the sliding window method presented in [23]. Three variations of each dataset were created for one-second, five-second, and ten-second forecasting horizons, with consistent look-back window lengths.
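The sliding window idea can be sketched as follows. This is a generic illustration and not necessarily the exact procedure of [23]; the function name `sliding_windows`, the `step` parameter, and the expression of window lengths in samples (seconds multiplied by an assumed sampling rate) are assumptions made here.

```python
def sliding_windows(series, lookback, horizon, step=1):
    """Split a series into (input, target) pairs.

    Each input holds `lookback` consecutive samples and each target holds
    the `horizon` samples that immediately follow it.
    """
    pairs = []
    last_start = len(series) - lookback - horizon
    for start in range(0, last_start + 1, step):
        x = series[start:start + lookback]
        y = series[start + lookback:start + lookback + horizon]
        pairs.append((x, y))
    return pairs


# Toy example: 6 samples, look back 3 samples, forecast 2 samples.
pairs = sliding_windows([0, 1, 2, 3, 4, 5], lookback=3, horizon=2)
```

For a multivariate dataset, the same function works if each element of `series` is a tuple holding one value per modality. Windows should be built per recorded sequence so that no pair spans two separate recordings.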

Measures of Evaluation
To assess the forecasting accuracy of the models, we employ the Symmetric Mean Absolute Percentage Error (sMAPE). The formula for calculating sMAPE is presented below [24].
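For reference, a widely used form of sMAPE is sketched here. Whether [24] uses exactly this variant (for instance, with or without the factor 1/2 in the denominator) is an assumption; $y_t$ denotes the observed value, $\hat{y}_t$ the forecast, and $n$ the number of forecast points.

```latex
\mathrm{sMAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{\left|\hat{y}_t - y_t\right|}{\left(\left|y_t\right| + \left|\hat{y}_t\right|\right)/2}
```

In this form the error is bounded between 0% and 200%, and lower values indicate more accurate forecasts.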
To establish a baseline for assessing the model's performance and to ensure the robustness of our results, we employ a simple benchmark known as the Naïve Forecast, as recommended by [25]. In this approach, the prediction for every future time step t+k is the last observed value y_t, which makes it simple to calculate but nonetheless an effective benchmark. This basic forecasting method is given by Equation (2) [26].

$$\hat{y}_{t+k} = y_t \tag{2}$$
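A minimal sketch of how the naïve baseline could be scored with sMAPE is given below. The sMAPE variant implemented here (mean of the absolute actual and forecast values in the denominator, result in percent) is an assumption, as are the function names `naive_forecast` and `smape` and the sample values.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every step of the horizon."""
    return [history[-1]] * horizon


def smape(actual, forecast):
    """sMAPE in percent; assumed variant with (|y| + |y_hat|) / 2 as denominator."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    total = 0.0
    for y, y_hat in zip(actual, forecast):
        denom = (abs(y) + abs(y_hat)) / 2
        if denom != 0:
            total += abs(y_hat - y) / denom
        # If both values are zero, the term contributes no error.
    return 100.0 * total / len(actual)


# Made-up values: three observed samples, then two samples to be forecast.
history = [0.80, 0.82, 0.78]
future = [0.80, 0.85]
baseline = naive_forecast(history, horizon=len(future))
baseline_error = smape(future, baseline)
```

A learned model is only useful if its sMAPE on the same windows is lower than `baseline_error`.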

Univariate Forecast
Table 1 illustrates the superior performance of the BiLSTM model compared to the baseline across all forecasting horizons for univariate IBI. Notably, extending the forecasting horizon from 1 to 5 s leads to a marked increase in sMAPE, amounting to 13.91% for univariate IBI and 3.32% for univariate PD. In contrast, extending the forecasting horizon from 5 to 10 s results in an increase of only 1.35% for univariate IBI and 0.27% for univariate PD.

Multivariate Forecast
Table 3 compares forecasting accuracy for univariate and multivariate models across different horizons. The multivariate approach consistently yields a slightly higher sMAPE, exceeding that of the univariate IBI model by just 0.24% at the 5-s horizon.

Discussion
The results presented in this work reveal a consistent disparity in performance between univariate and multivariate models. Despite the potential for multivariate models to leverage relationships among individual parameters, generated features, and additional skin conductance data, this supplementary input does not improve forecasting accuracy. Several explanations are possible. Firstly, no meaningful relationships may exist among the parameters under investigation. Such a lack of inherent correlations would limit the capacity of multivariate models to extract predictive information, rendering the additional input variables ineffective. Secondly, the quality of the supplementary skin conductance data may be a contributing factor. It is conceivable that these data introduce noise into the prediction process, thereby diminishing overall accuracy. Further investigation into the reliability and relevance of the additional data may clarify its impact on model performance. Thirdly, the features selected for the multivariate models may have no significant influence on prediction accuracy or, in some cases, a detrimental one. Irrelevant or confounding features can hinder the model's ability to discern meaningful patterns in the data, leading to suboptimal forecasts. These findings underscore the importance of understanding the underlying relationships within the data and the potential consequences of incorporating additional variables.

Conclusions
The findings presented in this study shed light on the predictive performance of univariate versus multivariate deep learning models in forecasting vital signs. Notably, the univariate prediction of IBI and pupil dilation yields better results than the multivariate approach, which incorporates additional variables such as skin conductance and generated features. This suggests that the univariate models are sufficient to capture the relevant patterns within these physiological signals. As the forecasting horizon increases from one to five seconds, a marked decrease in accuracy is observed. However, accuracy remains relatively stable when the forecasting horizon is extended from five to ten seconds. These findings have important implications for predictive modeling in physiological signal analysis where high precision is required, such as assessing cognitive load or attention levels. The observed stability in forecasting accuracy at longer horizons indicates that the univariate approach may offer a reliable foundation for longer-term physiological forecasting tasks. This work informs the selection of modeling approaches for vital sign forecasting and underscores the importance of considering the specific predictive goals and horizons in such applications. Future research should explore alternative feature engineering strategies, data preprocessing techniques, and model architectures to unlock the latent predictive potential of multivariate approaches.
Funding: This research was funded by the Ministry of Innovation, Science and Research of North Rhine-Westphalia, Germany.
Institutional Review Board Statement: Institutional Review Board approval was not sought for this study as the data utilized in our research originated from a pre-existing dataset. Furthermore, all data employed in our analysis had been anonymized to protect the privacy and confidentiality of the individuals involved. This rigorous anonymization process ensured that no personally identifiable information was accessible or discernible in the dataset, thereby mitigating any ethical concerns associated with the use of human subjects' data. Consequently, the research conducted in this study adheres to established ethical guidelines and does not require additional IRB approval.
Informed Consent Statement: Informed consent was obtained from all the subjects involved in the study.

Table 1. sMAPE of the univariate forecast of the interbeat intervals.

Table 2 displays results for univariate PD forecasting. The Naïve Method consistently shows higher prediction errors than the BiLSTM model across all horizons.

Table 2. sMAPE of the univariate forecast of the pupil dilation.

Table 3. sMAPE of the multivariate compared to the univariate forecasts.