Article

Predicting Cybersickness in Virtual Reality from Head–Torso Kinematics Using a Hybrid Convolutional–Recurrent Network Model

1 Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, VIC 3216, Australia
2 College of Science and Engineering, James Cook University, Townsville, QLD 4811, Australia
3 Faculty of Computing and Information Technology (FCIT), Sohar University, Sohar 311, Oman
4 School of Communication and Creative Arts, Deakin University, Burwood, VIC 3125, Australia
5 School of Engineering, Swinburne University of Technology, Melbourne, VIC 3122, Australia
6 School of Information Technology, Deakin University, Geelong, VIC 3216, Australia
* Authors to whom correspondence should be addressed.
Computers 2026, 15(3), 193; https://doi.org/10.3390/computers15030193
Submission received: 2 February 2026 / Revised: 9 March 2026 / Accepted: 11 March 2026 / Published: 17 March 2026
(This article belongs to the Special Issue Innovative Research in Human–Computer Interactions)

Abstract

Motion sickness (MS) is a prevalent condition that can significantly degrade user comfort and immersion, particularly in virtual reality (VR) environments. Accurate prediction models are essential for early detection and mitigation of MS symptoms, thereby improving the overall VR experience. Most existing approaches rely on bio-physiological data acquired through body-mounted sensors, which may restrict user mobility and diminish immersion. This study proposes a less intrusive alternative, leveraging head and torso kinematic data for MS prediction. We introduce a hybrid Convolutional–Recurrent Neural Network (C-RNN) designed to capture both spatial and temporal features for enhanced classification accuracy. Using a dataset of 40 participants, the proposed C-RNN outperformed traditional machine learning models—including Support Vector Machines (SVMs), k-Nearest Neighbors (KNN), Decision Trees (DT), and a baseline Recurrent Neural Network (RNN)—across multiple evaluation metrics. The C-RNN achieved 85.63% accuracy, surpassing SVM (60%), KNN (73.75%), DT (74.38%), and RNN (81.88%), with corresponding gains in precision, recall, F1-score, and ROC AUC. These results demonstrate that head–torso motion patterns provide sufficient predictive signal for accurate MS detection, offering a non-intrusive, efficient alternative to physiological sensing that supports improved comfort and sustained immersion in VR.

1. Introduction

Motion sickness (MS) is a prevalent condition that arises when sensory inputs conflict with prior perceptual expectations [1,2]. Developments in transportation, automotive technology, and immersive entertainment have broadened the range of situations that can induce MS, including tilting vehicles, amusement rides, space travel, and virtual reality (VR) environments [3]. Common symptoms include dizziness, nausea, vomiting, drowsiness, stomach discomfort, and general malaise [1,4,5].
MS may be triggered by physical motion or by dynamic visual stimuli, such as simulators, video games, and virtual environments. Visually induced motion sickness (VIMS), which stems from illusory self-motion, has been extensively studied [2,6,7,8]. In VR, the term cybersickness describes a similar phenomenon exacerbated by the high level of immersion and sensory conflict generated by head-mounted displays (HMDs) [9,10,11].
Recent advances in computer-generated environments have accelerated the adoption of virtual reality (VR) across industrial, commercial, public, and research sectors [12,13,14]. VR applications now extend to healthcare, military training, human–computer interaction, behavioral research, and architectural design [15,16]. As these interactive applications grow in complexity and duration, data-driven and machine learning approaches have been increasingly adopted to monitor user state, interaction quality, and discomfort in immersive environments [17,18,19].
In parallel, the digitization of educational and research infrastructures has led to the integration of VR into centralized and remotely supervised platforms, such as virtual laboratories and industrial training systems. These implementations facilitate scalable, data-driven learning while supporting sustainable educational models, particularly in engineering and laboratory-based disciplines. Within such environments, real-time user monitoring and adaptive feedback mechanisms are essential to maintain both learning effectiveness and user safety [20].
The most widely accepted explanatory framework for MS is the Sensory Conflict Theory, which attributes symptoms to discrepancies between visual, vestibular, and proprioceptive signals [10]. When these sensory channels provide inconsistent information—such as during VR immersion or vehicular travel—the brain interprets the mismatch as motion, which can provoke sickness. In VR, additional factors such as stereoscopic rendering, vergence–accommodation conflicts, and head-tracking inaccuracies can intensify symptoms like nausea and disorientation [21,22]. Sensory conflict theory has therefore informed numerous strategies aimed at reducing MS by minimizing such mismatches.
Efforts to predict MS have relied on both subjective and objective methods. Subjective tools, such as the Simulator Sickness Questionnaire (SSQ) [23], the Motion Sickness Questionnaire (MSQ) [3], and the Fast Motion Sickness Scale (FMS) [5], are widely used but are limited by their retrospective nature and inability to capture time-resolved triggers [24]. Objective measures—including heart rate variability, skin conductance, body temperature, and electrogastrography—reduce subjective bias and have shown measurable correlations with MS severity [25,26,27]. However, such physiological measurements typically require body-mounted sensors, which may restrict mobility and reduce immersion.
Kinematic indicators such as head and body movements offer a promising, non-intrusive alternative. Prior research has demonstrated significant associations between movement patterns and MS symptoms in contexts including driving, flying, and VR [28,29,30,31]. Recent vision-based driver monitoring studies have shown that head position and motion patterns can be reliably detected using deep learning architectures such as capsule networks, even under challenging real-world driving conditions, supporting the feasibility of motion-based monitoring without intrusive sensing [32]. Reduced head movement has been linked to lower MS incidence in simulators [33], while experienced drivers tend to exhibit more stable body movements and reduced symptoms [31]. Large, time-varying discrepancies between virtual and physical head orientations (DVP) have also been identified as strong cybersickness triggers [22], with electroencephalography (EEG) studies further confirming the neural correlates of symptom severity [21,34]. Recent studies have explored hybrid AI models for quantifying cybersickness, demonstrating improved performance over single-model approaches [35,36,37,38]. For example, Hag et al. [39] introduced a multimodal 1CNN–GRU–Attention architecture trained on galvanic skin response (GSR) data, showing that non-sequential fusion of spatial and temporal representations with attention mechanisms significantly enhances classification robustness under imbalanced conditions. Similarly, recent attention-based deep learning models incorporating head and eye movement dynamics have demonstrated improved simulator sickness quantification through adaptive feature weighting strategies. Despite these advances, many existing works still rely on sequential architectures such as CNN–LSTM, where convolutional features are passed into recurrent units in a chained manner, potentially leading to partial loss of spatial–temporal information. 
Despite these advances, there is a lack of work focusing solely on head–torso kinematics for MS prediction using deep learning, without reliance on intrusive physiological sensors. This study addresses that gap by introducing a hybrid Convolutional–Recurrent Neural Network (C–RNN) that combines 1D-CNN layers for extracting discriminative spatial features with recurrent layers for modeling temporal dependencies. This design is justified because CNNs can automatically capture short-term oscillatory patterns in motion data that may signal the onset of discomfort, while RNNs preserve temporal context, enabling more accurate classification over time. Our novelty lies in demonstrating that such a model can achieve high accuracy using only non-intrusive, low-cost motion data, offering a practical alternative to multimodal physiological approaches.
We evaluate our approach on a publicly available dataset of 40 participants performing a VR driving task, applying identical preprocessing to all baseline models to ensure fair comparisons. Results show that the proposed C–RNN outperforms traditional ML classifiers (SVM, KNN, DT) and a baseline RNN in accuracy, precision, recall, F1-score, and ROC AUC and performs competitively against more complex deep learning methods reported in related work.
The remainder of this paper is organized as follows: Section 2 describes the dataset and materials, Section 3 details the methodology, Section 4 presents the results, Section 5 discusses the findings and their implications, and Section 6 concludes the paper.

2. Materials and Methods

This study employed a publicly available dataset originally collected by Chang et al. [30,31], which investigates the influence of physical driving experience on body movements and MS during virtual driving. The dataset includes data from 40 participants, evenly split by gender (20 men and 20 women), with a mean age of 24.08 ± 2.86 years. Participants were classified into two groups: the driver group, consisting of individuals who held a valid driver’s license and drove at least once per week during the two months preceding the study, and the non-driver group, consisting of individuals with no driving experience. All participants had normal vision and no history of illness or vestibular system disorders.
In this work, our contribution lies in the development of a novel C-RNN-based predictive model for MS detection, incorporating systematic pre-processing of motion signals, model training and evaluation, and a comparative performance analysis against established machine learning baselines.
The experiment was conducted using an Xbox system with a standard gaming unit and gamepad, while head and torso movements were recorded using a magnetic tracking device (Flock of Birds, Ascension Technologies, Inc., Burlington, VT, USA). These sensors recorded six-degree-of-freedom (DOF) position and orientation data at 60 Hz from each receiver, which were saved for subsequent analysis.
To assess the occurrence of MS, participants responded to a forced-choice yes/no question using a modified version of the Simulator Sickness Questionnaire (SSQ).
Participants engaged in the VR experience using Forza Motorsport 3 on the Xbox, where they were asked to drive a Ford/R3 714 vehicle along the 6.95 km Extreme Circuit, known as the ‘Camino Viejo de Montserrat.’ The game session could last up to 40 min, and participants had the option to play continuously or stop whenever they chose. During driving, head and torso movement data were recorded. Pre- and post-session SSQ self-reports were also collected to derive an overall MS severity score.
The dataset is structured into 40 text files, each belonging to a single subject. Each file contains data from three receivers, where each sample holds six degrees of freedom (X, Y, Z positions and A, E, R angles) of head and torso movements over a minimum duration of 5 min.
In addition to the motion data, an SSQ raw data file was provided, which includes evaluation data from each subject, such as gender, condition (sick/well), pre-SSQ, and post-SSQ responses. The SSQ scores are used for data annotation to correlate motion data with the SSQ scores. A detailed summary of the dataset structure is shown in Table 1.

3. Methodology

The proposed model combines two powerful deep learning techniques: Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN), forming a unique Convolutional Recurrent Neural Network (C-RNN) designed for MS prediction. The CNN focuses on identifying spatial patterns in movement data, while the RNN efficiently handles the time-series features, capturing the flow and dependencies over time. By combining both spatial and temporal patterns, this model provides a more accurate and dynamic prediction of MS. Figure 1 illustrates the overall methodology for MS prediction.

3.1. Data Preprocessing

The head and torso movement data were initially collected at a sampling rate of 60 Hz, with 6-DOF positions. Since each participant’s session varied in duration (5 to 40 min), we selected a consistent 5-min segment of recorded sensor data for each subject, consisting of 2 min from the middle and 3 min from the end of the experiment, ensuring consistency across all participants. Segmentation was applied to all signals: the continuous sensor stream $X = \{x_1, x_2, \ldots, x_n\}$ is divided into fixed-length windows, typically representing 20 s intervals. Each segment $S_k$ contains the sequence of data points from its time window, such that
$$S_k = \{x_{k \cdot d + 1},\, x_{k \cdot d + 2},\, \ldots,\, x_{(k+1) \cdot d}\}$$
where $d$ is the number of data points in each segment and $k$ is the segment index. A z-score normalization is then applied to each segment to standardize the values, ensuring consistent scaling across samples and mitigating the effects of outliers and noise:
$$X_i' = \frac{X_i - \mu}{\sigma}$$
where $X_i$ represents a data point and $\mu$ and $\sigma$ denote the mean and standard deviation of the segment.
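As a concrete illustration, the windowing and per-segment z-scoring described above can be sketched in Python (the function name and the 20 s window length are our own choices for this sketch; NumPy is assumed):

```python
import numpy as np

def segment_and_normalize(signal, fs=60, window_s=20):
    """Split a (time, channels) motion signal into fixed-length windows
    and z-score each window independently.

    `signal` is assumed to be shaped (n_samples, n_channels), e.g. the
    18 head/torso channels sampled at 60 Hz."""
    d = fs * window_s                      # data points per 20 s segment
    n_segments = signal.shape[0] // d      # drop any trailing partial window
    segments = []
    for k in range(n_segments):
        s = signal[k * d:(k + 1) * d]
        mu, sigma = s.mean(axis=0), s.std(axis=0)
        segments.append((s - mu) / (sigma + 1e-8))  # guard divide-by-zero
    return np.stack(segments)              # (n_segments, d, n_channels)

# Example: 5 min of 18-channel data at 60 Hz yields fifteen 20 s segments.
x = np.random.default_rng(0).normal(size=(5 * 60 * 60, 18))
segs = segment_and_normalize(x)
```

Each output segment then has approximately zero mean and unit variance per channel, which keeps subjects with different movement amplitudes on a comparable scale.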

3.2. Data Annotation of MS Data

To construct the ground truth for classifying the severity level of MS, the analysis scores from the pre-SSQ and post-SSQ were used, which categorize each subject’s data as “sick” or “well.” Each segment $S_k$ is assigned a label $y_k$ based on the pre- and post-SSQ responses:
$$y_k = \begin{cases} 0 & \text{if well} \\ 1 & \text{if sick} \end{cases}$$
Finally, the segmented and labeled data pairs are prepared for model training as
$$\hat{y}_k = \mathrm{Model}(S_k), \quad |S_k| = \text{window size}$$
where $\hat{y}_k$ is the predicted label for segment $S_k$, indicating whether the participant is ‘sick’ (1) or ‘well’ (0). This prepares the dataset for input into the prediction model.
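The propagation of each subject’s binary SSQ outcome to every segment extracted for that subject can be sketched as follows (all field names and values here are hypothetical, for illustration only):

```python
# Hypothetical per-subject SSQ records; field names are illustrative only.
ssq = [
    {"subject": 1, "condition": "sick"},
    {"subject": 2, "condition": "well"},
]

def label_segments(subject_segments, ssq_records):
    """Propagate each subject's binary SSQ outcome (1 = sick, 0 = well)
    to every motion segment extracted for that subject."""
    condition = {r["subject"]: 1 if r["condition"] == "sick" else 0
                 for r in ssq_records}
    return [(seg, condition[subj]) for subj, seg in subject_segments]

# (subject_id, segment) pairs -> (segment, label) pairs ready for training
pairs = label_segments([(1, "seg_a"), (1, "seg_b"), (2, "seg_c")], ssq)
```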

3.3. Model Architecture for MS Prediction

In this study, several machine learning and deep learning models, including Support Vector Machines (SVM), k-Nearest Neighbors (KNN), Decision Trees (DT), Recurrent Neural Networks (RNNs), and a Convolutional Recurrent Neural Network (C-RNN), were used for the prediction of MS, as these models have demonstrated effectiveness in previous research [40]. The performance of these models was evaluated using multiple metrics, including accuracy, precision, recall, F1-score, and ROC AUC.

3.3.1. C-RNN Architecture

The proposed C-RNN is a hybrid deep learning model that combines the strengths of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to effectively capture both spatial and temporal patterns in motion data. The 1D-CNN layers, positioned before the RNN, play a dual role. They automatically detect and refine discriminative local patterns in the motion signals, such as short-term oscillations in head and torso movement that may signal the onset of discomfort. At the same time, they reduce feature dimensionality and filter out irrelevant variations, allowing the RNN to concentrate on longer-term temporal dependencies that are more indicative of motion sickness progression.
By extracting and refining local spatial features before sequence modeling, the CNN stage enhances the RNN’s ability to interpret temporal dynamics, leading to improvements in both recall (fewer false negatives) and precision (fewer false positives). This integration ensures that subtle yet meaningful patterns in the movement data are preserved and amplified before temporal analysis.
While CNNs are well-suited for capturing spatial structures from images, signals, or short time-series segments, RNNs excel at modeling sequential dependencies, making them ideal for tracking changes over time. The C-RNN architecture, therefore, offers a powerful approach for tasks that require simultaneous spatial and temporal analysis, such as video interpretation, speech recognition, and time-series prediction [41].
The structure of our C-RNN model comprises an input layer, two 1D-CNN layers, an RNN layer, and fully connected layers, as illustrated in Figure 2.
In the input layer, each subject’s data sample consists of 18 features obtained from three head and torso sensors sampled at 60 Hz, resulting in 3600 time steps per minute. The input data were then reshaped for the two 1D-CNN layers, each comprising 128 filters with a kernel size of 3 and the Rectified Linear Unit (ReLU) activation function. These layers are responsible for extracting spatial and temporal patterns from the input data. A dropout layer with a rate of 0.1 follows each convolutional layer to prevent overfitting by randomly disabling a fraction of the neurons during training. A MaxPooling1D layer is then applied to reduce the spatial dimensionality of the extracted features, and its output is flattened into a one-dimensional feature vector that serves as the input to the RNN. The RNN layer consists of 64 units, and its output passes through two fully connected layers with 32 and 1 units, using Exponential Linear Unit (ELU) and sigmoid activation functions, respectively. The ELU mitigates the vanishing gradient problem by allowing negative outputs for negative inputs, which helps prevent neurons from becoming inactive during training and improves overall model performance [42]. A further dropout layer with a rate of 0.1 precedes the final fully connected layer to prevent overfitting. A binary cross-entropy loss function and the Adam optimizer were used, together with early stopping (patience of 10 epochs) to restore the best weights at the end of training.
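To make the data flow through this architecture concrete, the following is a heavily scaled-down NumPy sketch of a single inference pass. The layer sizes and random weights are toys, not the trained model; a deep learning framework would be used in practice, and dropout, being a training-time regularizer, is omitted from this forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """'Valid' 1-D convolution: x (T, C_in), w (K, C_in, C_out), b (C_out)."""
    K = w.shape[0]
    out = np.empty((x.shape[0] - K + 1, w.shape[2]))
    for t in range(out.shape[0]):
        out[t] = np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
    return out

def relu(z): return np.maximum(z, 0.0)
def elu(z): return np.where(z > 0, z, np.exp(z) - 1)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def max_pool(x, size=2):
    T = x.shape[0] // size
    return x[:T * size].reshape(T, size, -1).max(axis=1)

def simple_rnn(x, Wx, Wh, b):
    """Plain (Elman) RNN; returns the final hidden state."""
    h = np.zeros(Wh.shape[0])
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ Wx + h @ Wh + b)
    return h

# Toy weights mirroring the reported layout: two Conv1D + ReLU stages,
# max pooling, a simple RNN, Dense(ELU), Dense(1, sigmoid).
C, F, H = 18, 16, 8                          # channels, conv filters, RNN units
w1, b1 = rng.normal(size=(3, C, F)) * 0.1, np.zeros(F)
w2, b2 = rng.normal(size=(3, F, F)) * 0.1, np.zeros(F)
Wx, Wh, br = rng.normal(size=(F, H)) * 0.1, rng.normal(size=(H, H)) * 0.1, np.zeros(H)
Wd1, bd1 = rng.normal(size=(H, 4)) * 0.1, np.zeros(4)
Wd2, bd2 = rng.normal(size=(4, 1)) * 0.1, np.zeros(1)

def c_rnn_forward(x):
    z = relu(conv1d(x, w1, b1))              # local spatial features
    z = relu(conv1d(z, w2, b2))
    z = max_pool(z)                          # reduce temporal resolution
    h = simple_rnn(z, Wx, Wh, br)            # temporal dependencies
    return float(sigmoid(elu(h @ Wd1 + bd1) @ Wd2 + bd2))

p_sick = c_rnn_forward(rng.normal(size=(1200, C)))   # one 20 s segment
```

The output is a single sigmoid probability of the ‘sick’ class, thresholded at 0.5 for the binary decision.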

3.3.2. Models Training and Evaluation

The dataset was divided into training and testing sets with an 80/20 ratio using stratified sampling to ensure that both subsets had similar class distributions. Five-fold cross-validation was applied to the training set to enhance model robustness and evaluate performance across multiple subsets of the data. This technique mitigates the risk of overfitting by ensuring the model is evaluated on various data partitions, offering a more reliable assessment of its generalization capability. The models were evaluated using metrics such as accuracy, precision, recall, F1-score, and AUC (area under the ROC curve).
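A minimal sketch of the stratified 80/20 split and five-fold partitioning described above, using only the standard library (the function names and the illustrative label counts are our own):

```python
import random

def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) with the class ratio preserved."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test += idxs[:n_test]
        train += idxs[n_test:]
    return sorted(train), sorted(test)

def k_folds(indices, k=5):
    """Partition training indices into k roughly equal validation folds."""
    return [indices[i::k] for i in range(k)]

labels = [1] * 30 + [0] * 70          # illustrative imbalanced labels
train_idx, test_idx = stratified_split(labels)
folds = k_folds(train_idx)
```

Stratifying per class before splitting guarantees that the ‘sick’/‘well’ ratio in the held-out test set matches the full dataset, which matters under class imbalance.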
Accuracy is calculated as
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
where True Positive (TP) denotes the count of accurately predicted positive classes; False Positive (FP) represents the count of erroneously predicted positive classes; True Negative (TN) indicates the count of accurately predicted negative classes; and False Negative (FN) signifies the count of erroneously predicted negative classes. Precision is the proportion of predicted positive instances that are correct, while recall is the proportion of true positive instances correctly identified. The F1-score combines precision and recall into a single metric, and the AUC measures the model’s ability to distinguish between positive and negative samples.
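All four reported metrics follow directly from the confusion-matrix counts; a small sketch (the counts below are illustrative only, not the paper’s confusion matrix):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only.
m = classification_metrics(tp=40, fp=10, tn=97, fn=13)
```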

4. Results

This section details the evaluation of the proposed C-RNN model for MS prediction, comparing it with other models, including SVM, KNN, DT, and RNN. The models were evaluated using a range of performance metrics, including accuracy, precision, recall, and F1-score, to address the class imbalance in the dataset. The dataset was evaluated using five-fold cross-validation. In each fold, one subset was used for testing, while the remaining data were used for training and validation. The results from each fold were averaged to provide the final performance metrics, and a comparative analysis was conducted across models.
Table 2 summarizes the performance metrics for all models evaluated. The C-RNN model achieved the highest accuracy of 85.63%, outperforming SVM (60%), KNN (73.75%), DT (74.38%), and RNN (81.88%) by a significant margin. The precision, recall, and F1-score for the C-RNN model also showed substantial improvements compared to other models, reflecting its superior ability to accurately predict MS symptoms.
To further evaluate the differences in model performance, a paired t-test was conducted on accuracy, precision, recall, and F1-score between the C-RNN model and each baseline. The results show that the C-RNN significantly outperforms SVM, KNN, and DT ($p < 0.05$), but not the RNN, highlighting its superior ability to predict the onset of MS, as shown in Figure 3. Figure 4 further illustrates the training and validation accuracy and loss curves of the C-RNN and RNN across training epochs. We observed rapid convergence during the initial epochs, with accuracy reaching approximately 89% for the C-RNN and 79% for the RNN. The loss decreases markedly for the C-RNN, indicating its ability to learn from the training data, whereas the RNN’s slower decline suggests that it learns the extracted features less effectively.
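For reference, the paired t statistic used in such comparisons can be computed from per-fold scores as follows (the fold accuracies below are hypothetical; with five folds, df = 4 and the two-sided 5% critical value is about 2.776):

```python
import math

def paired_t(a, b):
    """Paired t statistic for two equal-length sets of per-fold scores."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)                  # df = n - 1

# Hypothetical per-fold accuracies for two models under 5-fold CV.
crnn = [0.84, 0.87, 0.85, 0.86, 0.86]
svm = [0.58, 0.62, 0.60, 0.59, 0.61]
t = paired_t(crnn, svm)    # compare against t_crit(df=4) ≈ 2.776
```

Pairing by fold removes fold-to-fold variance, so even small but consistent per-fold gaps can yield a significant statistic.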

5. Discussion

The results demonstrate the strong potential of using head and torso movements for MS prediction in VR environments. The superior performance of the proposed C-RNN model underscores the importance of integrating convolutional and recurrent layers to effectively capture the complex spatial–temporal patterns associated with motion sickness. Specifically, the convolutional layers extract localized motion features, while the recurrent layers model their temporal dependencies, enabling a more comprehensive representation of dynamic sensory conflicts. In contrast, although traditional machine learning models such as SVM, Decision Trees (DT), and k-Nearest Neighbors (KNN) can provide moderate classification performance, they lack the capacity to model sequential dependencies inherent in time-series data. Consequently, they are less effective in capturing the progressive and time-dependent nature of motion sickness development. The performance evaluation shows that RNN and C-RNN models achieve high accuracy, at 81.88% and 85.63%, respectively, outperforming traditional machine learning models. As highlighted in Figure 3, C-RNN consistently exceeds the other models in precision, recall, and F1-score, demonstrating its superior MS classification ability. The AUC values for C-RNN (0.90) and RNN (0.88) indicate strong discriminatory power. C-RNN achieves a slightly higher AUC value, indicating superior discriminatory capability between sick and well subjects. The improved precision of C-RNN implies fewer false positives, while its higher recall suggests fewer false negatives. The F1-score further confirms the better classification performance of C-RNN [43,44,45]. The superior performance of the C-RNN can be interpreted within the framework of sensory conflict theory. Motion sickness arises from time-dependent discrepancies between visual and vestibular signals. 
These discrepancies manifest as short-term oscillatory motion patterns and longer temporal instability in head–torso coordination. The convolutional layers detect localized oscillatory inconsistencies in motion trajectories, while the recurrent layer models their temporal accumulation, reflecting the progressive buildup of sensory conflict. Traditional classifiers lack the capacity to capture this hierarchical temporal structure, which explains their comparatively lower performance. While previous studies have fused several sensor modalities and in some cases reported slightly higher results, the C-RNN’s ability to accurately predict MS symptoms using only head and torso movement sensors offers a more user-friendly and non-intrusive solution for MS detection. For example, Li et al. [46] and Islam et al. [37] integrated several sensors and achieved accuracies of 75.6% and 87.77%, respectively. Unlike multimodal approaches that rely on EEG, ECG, or eye-tracking signals, our results demonstrate that head–torso kinematics alone provide sufficient predictive information, reducing hardware complexity and improving practical deployability in consumer VR systems. This approach could be particularly beneficial in both VR and real-world applications, where minimizing sensor requirements and maintaining user comfort are crucial for improving the overall experience.
The findings support sensory conflict theory, which posits that MS arises when the brain receives inconsistent inputs from multiple sensory systems, including vision, the vestibular system, and proprioception. In VR, these conflicts occur when virtual visual stimuli do not align with physical body movements, leading to disorientation and discomfort. Accurate prediction of MS therefore enables proactive mitigation strategies aimed at minimizing such sensory discrepancies before symptoms intensify.
During VR experiences involving head and torso movements, mismatches between perceived and actual motion can disrupt multisensory integration and trigger MS symptoms [26,47]. The present results demonstrate that head–torso kinematic patterns contain sufficient predictive information to capture these evolving sensory inconsistencies. Prior research further reinforces the importance of head dynamics in cybersickness: Walker et al. [48] reported a significant correlation between reduced head movement frequency and increased sickness severity, while Arcioni et al. [22] emphasized the relationship between head movement behavior and postural stability in head-mounted display (HMD) environments. Together, these findings provide theoretical support for leveraging temporal patterns in head–torso coordination to model cybersickness progression.
Despite these promising results, several limitations should be acknowledged. The dataset comprises 40 young adults (mean age of 24 years), which may constrain generalizability across broader age groups and diverse susceptibility profiles. MS sensitivity has been shown to vary with age, vestibular function, and prior driving experience. Although stratified sampling and cross-validation were applied to enhance reliability and mitigate overfitting, validation on larger and demographically diverse cohorts is necessary to strengthen statistical robustness and external validity.
Additionally, the dataset is derived from a virtual driving scenario. Predictive performance may differ across other VR applications—such as flight simulators, first-person gaming environments, or educational simulations—where motion dynamics and visual flow characteristics vary substantially. Cross-domain evaluation is therefore an important direction for assessing model adaptability across heterogeneous VR contexts.
Another limitation relates to the reliance on the SSQ, which is inherently subjective and retrospective. While widely validated, the SSQ lacks fine-grained temporal resolution for identifying specific triggering events during VR exposure. Future research should incorporate real-time objective physiological or behavioral measurements, such as heart rate variability or eye-tracking metrics, to reduce reporting bias and enable event-correlated analysis. Integrating such measurements with kinematic data may facilitate dynamic monitoring systems capable of continuous cybersickness prediction rather than post hoc classification.
Overall, the proposed C-RNN demonstrates strong potential for real-world MS prediction and severity classification using non-intrusive kinematic signals. Advanced preprocessing techniques, including adaptive windowing strategies, signal smoothing filters, and automated feature extraction mechanisms, may further enhance robustness by reducing noise in six-DOF motion data and isolating the most discriminative features. Beyond methodological refinements, future research should explore the integration of motion sickness prediction models into large-scale VR educational and industrial training platforms. Embedding motion-based monitoring within virtual laboratories and immersive training systems may support sustainable and adaptive digital environments by enhancing user safety, improving engagement, and enabling personalized real-time adjustments. Architectural refinements, hyper-parameter optimization, and multimodal fusion strategies incorporating complementary modalities such as eye-tracking or electroencephalography (EEG) may further strengthen robustness and generalizability. Ultimately, this line of research aims to advance intelligent VR systems capable of continuous monitoring, adaptive mitigation, and scalable deployment across diverse immersive applications.

6. Conclusions

This study presented a motion-based approach for predicting MS in virtual reality environments using head and torso kinematic signals. By leveraging a hybrid Convolutional–Recurrent Neural Network (C-RNN), the model effectively captured spatial motion characteristics and their temporal progression, achieving reliable MS classification without relying on additional physiological sensors. Unlike multimodal sensor-based systems, the proposed framework operates using motion data readily available in standard VR tracking pipelines, improving practicality and user comfort. The predictive capability of the model enables adaptive mitigation strategies, such as visual flow adjustment or field-of-view modulation, to reduce cybersickness in real time. Overall, the findings demonstrate that deep learning applied to head–torso dynamics provides an effective and deployable solution for enhancing comfort and usability in immersive VR systems.

Author Contributions

Conceptualization, A.H. and H.A.; methodology, A.H. and M.R.C.Q.; software, A.H.; validation, H.A., S.N. and S.G.; formal analysis, A.H. and A.K.; investigation, A.H. and T.H.; data curation, A.H.; writing—original draft preparation, A.H.; writing—review and editing, all authors; supervision, S.N. and S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Australian Research Council (Project ID: DE210101623).

Data Availability Statement

The data used in this study are publicly available from the original studies by Chang et al. [30,31].

Acknowledgments

The authors would like to acknowledge the support of the Australian Research Council (Project ID: DE210101623) and IISRI, Deakin University, for providing the required equipment and labs to successfully conduct this project.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bertolini, G.; Straumann, D. Moving in a moving world: A review on vestibular motion sickness. Front. Neurol. 2016, 7, 14. [Google Scholar] [CrossRef]
  2. Koohestani, A.; Nahavandi, D.; Asadi, H.; Kebria, P.M.; Khosravi, A.; Alizadehsani, R.; Nahavandi, S. A Knowledge Discovery in Motion Sickness: A Comprehensive Literature Review. IEEE Access 2019, 7, 85755–85770. [Google Scholar] [CrossRef]
  3. Golding, J.F. Motion sickness susceptibility. Auton. Neurosci. Basic Clin. 2006, 129, 67–76. [Google Scholar] [CrossRef]
  4. Bos, J.E.; Bles, W.; Groen, E.L. A theory on visually induced motion sickness. Displays 2008, 29, 47–57. [Google Scholar] [CrossRef]
  5. Keshavarz, B.; Hecht, H. Validating an efficient method to quantify motion sickness. Hum. Factors 2011, 53, 415–426. [Google Scholar] [CrossRef] [PubMed]
  6. D’Amour, S.; Bos, J.E.; Keshavarz, B. The efficacy of airflow and seat vibration on reducing visually induced motion sickness. Exp. Brain Res. 2017, 235, 2811–2820. [Google Scholar] [CrossRef]
  7. Kennedy, R.S.; Drexler, J.; Kennedy, R.C. Research in visually induced motion sickness. Appl. Ergon. 2010, 41, 494–503. [Google Scholar] [CrossRef]
  8. Lubeck, A.J.; Bos, J.E.; Stins, J.F. Motion in images is essential to cause motion sickness symptoms, but not to increase postural sway. Displays 2015, 38, 55–61. [Google Scholar] [CrossRef]
  9. Howard, M.C.; Van Zandt, E.C. A meta-analysis of the virtual reality problem: Unequal effects of virtual reality sickness across individual differences. Virtual Real. 2021, 25, 1221–1246. [Google Scholar] [CrossRef]
  10. Ng, A.K.; Chan, L.K.; Lau, H.Y. A study of cybersickness and sensory conflict theory using a motion-coupled virtual reality system. Displays 2020, 61, 101922. [Google Scholar] [CrossRef]
  11. Yildirim, C. Cybersickness during VR gaming undermines game enjoyment: A mediation model. Displays 2019, 59, 35–43. [Google Scholar] [CrossRef]
  12. Abe, M.; Yoshizawa, M.; Sugita, N.; Tanaka, A.; Chiba, S.; Yambe, T.; Nitta, S.i. A method for evaluating effects of visually-induced motion sickness using ICA for photoplethysmography. In Proceedings of the 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada, 20–25 August 2008; pp. 4591–4594. [Google Scholar] [CrossRef]
  13. Reinhard, R.; Rutrecht, H.M.; Hengstenberg, P.; Tutulmaz, E.; Geissler, B.; Hecht, H.; Muttray, A. The best way to assess visually induced motion sickness in a fixed-base driving simulator. Transp. Res. Part F Traffic Psychol. Behav. 2017, 48, 74–88. [Google Scholar] [CrossRef]
  14. Salmerón-Manzano, E.; Manzano-Agugliaro, F. The Higher Education Sustainability through Virtual Laboratories: The Spanish University as Case of Study. Sustainability 2018, 10, 4040. [Google Scholar] [CrossRef]
  15. Hell, S.; Argyriou, V. Machine Learning Architectures to Predict Motion Sickness Using a Virtual Reality Rollercoaster Simulation Tool. In Proceedings of the 2018 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), Taichung, Taiwan, 10–12 December 2018; pp. 153–156. [Google Scholar] [CrossRef]
  16. Teaford, M.A.; Cook, H.E.; Hassebrock, J.A.; Thomas, R.D.; Smart, L.J. Perceptual Validation of Nonlinear Postural Predictors of Visually Induced Motion Sickness. Front. Psychol. 2020, 11, 1533. [Google Scholar] [CrossRef]
  17. Zielasko, D.; Riecke, B.E. To Sit or Not to Sit in VR: Analyzing Influences and (Dis)Advantages of Posture and Embodied Interaction. Computers 2021, 10, 73. [Google Scholar] [CrossRef]
  18. Ramaseri Chandra, A.N.; El Jamiy, F.; Reza, H. A Systematic Survey on Cybersickness in Virtual Environments. Computers 2022, 11, 51. [Google Scholar] [CrossRef]
  19. Sharifkhani, M.; Davidson, J.; MacCallum, K.; Evans-Freeman, J.; Brown, C.; Bullsmith, C.; Richards, B. Sustainable Practices in Education: Virtual Labs. In People, Partnerships and Pedagogies; Cochrane, T., Narayan, V., Brown, C., MacCallum, K., Bone, E., Deneen, C., Vanderburg, R., Hurren, B., Eds.; ASCILITE Publications: Christchurch, New Zealand, 2023; pp. 205–214. [Google Scholar] [CrossRef]
  20. Alcayde, A.; Robalo, I.; Montoya, F.G.; Manzano-Agugliaro, F. SCADA System for Online Electrical Engineering Education. Inventions 2022, 7, 115. [Google Scholar] [CrossRef]
  21. Liu, M.; Yang, B.; Xu, M.; Zan, P.; Chen, L.; Xia, X. Exploring quantitative assessment of cybersickness in virtual reality using EEG signals and a CNN-ECA-LSTM network. Displays 2024, 81, 102602. [Google Scholar] [CrossRef]
  22. Arcioni, B.; Palmisano, S.; Apthorp, D.; Kim, J. Postural stability predicts the likelihood of cybersickness in active HMD-based virtual reality. Displays 2019, 58, 3–11. [Google Scholar] [CrossRef]
  23. Balk, S.A.; Bertola, M.A.; Inman, V.W. Simulator Sickness Questionnaire: Twenty Years Later. In Proceedings of the 7th International Driving Symposium on Human Factors in Driver Assessment, Training, and Vehicle Design: Driving Assessment 2013, Bolton Landing, NY, USA, 17–20 June 2013; pp. 257–263. [Google Scholar] [CrossRef]
  24. Gruden, T.; Popović, N.B.; Stojmenova, K.; Jakus, G.; Miljković, N.; Tomažič, S.; Sodnik, J. Electrogastrography in autonomous vehicles—An objective method for assessment of motion sickness in simulated driving environments. Sensors 2021, 21, 550. [Google Scholar] [CrossRef]
  25. Keshavarz, B.; Peck, K.; Rezaei, S.; Taati, B. Detecting and predicting visually induced motion sickness with physiological measures in combination with machine learning techniques. Int. J. Psychophysiol. 2022, 176, 14–26. [Google Scholar] [CrossRef] [PubMed]
  26. Recenti, M.; Ricciardi, C.; Aubonnet, R.; Picone, I.; Jacob, D.; Svansson, H.Á.; Agnarsdóttir, S.; Karlsson, G.H.; Baeringsdóttir, V.; Petersen, H.; et al. Toward predicting motion sickness using virtual reality and a moving platform assessing brain, muscles, and heart signals. Front. Bioeng. Biotechnol. 2021, 9, 635661. [Google Scholar] [CrossRef]
  27. Zhang, L.L.; Wang, J.Q.; Qi, R.R.; Pan, L.L.; Li, M.; Cai, Y.L. Motion Sickness: Current Knowledge and Recent Advance. CNS Neurosci. Ther. 2016, 22, 15–24. [Google Scholar] [CrossRef]
  28. Iskander, J.; Attia, M.; Saleh, K.; Nahavandi, D.; Abobakr, A.; Mohamed, S.; Asadi, H.; Khosravi, A.; Lim, C.P.; Hossny, M. From car sickness to autonomous car sickness: A review. Transp. Res. Part F Traffic Psychol. Behav. 2019, 62, 716–726. [Google Scholar] [CrossRef]
  29. Lackner, J.R. Motion sickness: More than nausea and vomiting. Exp. Brain Res. 2014, 232, 2493–2510. [Google Scholar] [CrossRef]
  30. Chang, C.H.; Chen, F.C.; Kung, W.C.; Stoffregen, T.A. Effects of physical driving experience on body movement and motion sickness during virtual driving. Aerosp. Med. Hum. Perform. 2017, 88, 985–992. [Google Scholar] [CrossRef]
  31. Chang, C.H.; Stoffregen, T.A.; Cheng, K.B.; Lei, M.K.; Li, C.C. Effects of physical driving experience on body movement and motion sickness among passengers in a virtual vehicle. Exp. Brain Res. 2021, 239, 491–500. [Google Scholar] [CrossRef]
  32. Hollosi, J.; Ballagi, A.; Kovacs, G.; Fischer, S.; Nagy, V. Bus Driver Head Position Detection Using Capsule Networks under Dynamic Driving Conditions. Computers 2024, 13, 66. [Google Scholar] [CrossRef]
  33. Saruchi, S.A.; Zamzuri, H.; Hassan, N.; Ariff, M.H.M. Modeling of head movements towards lateral acceleration direction via system identification for motion sickness study. In Proceedings of the 2018 International Conference on Information and Communications Technology, ICOIACT 2018, Yogyakarta, Indonesia, 6–7 March 2018; pp. 633–638. [Google Scholar] [CrossRef]
  34. Ozkan, A.; Uyan, U.; Celikcan, U. Effects of speed, complexity and stereoscopic VR cues on cybersickness examined via EEG and self-reported measures. Displays 2023, 78, 102415. [Google Scholar] [CrossRef]
  35. Hwang, J.U.; Bang, J.S.; Lee, S.W. Classification of Motion Sickness Levels using Multimodal Biosignals in Real Driving Conditions. In Proceedings of the 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Prague, Czech Republic, 9–12 October 2022; pp. 1304–1309. [Google Scholar] [CrossRef]
  36. Islam, R.; Lee, Y.; Jaloli, M.; Muhammad, I.; Zhu, D.; Rad, P.; Huang, Y.; Quarles, J. Automatic Detection and Prediction of Cybersickness Severity using Deep Neural Networks from user’s Physiological Signals. In Proceedings of the 2020 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Porto de Galinhas, Brazil, 9–13 November 2020; pp. 400–411. [Google Scholar] [CrossRef]
  37. Islam, R.; Desai, K.; Quarles, J. Cybersickness Prediction from Integrated HMD’s Sensors: A Multimodal Deep Fusion Approach using Eye-tracking and Head-tracking Data. In Proceedings of the 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Virtual, 4–8 October 2021; pp. 31–40. [Google Scholar] [CrossRef]
  38. Hag, A.; Qazani, M.R.C.; Wei, L.; Nahavandi, S.; Asadi, H. Attention-Based Deep Learning for Quantifying Simulator Sickness using Eye and Head Motion Data in the Genesis Simulator. In Proceedings of the 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Vienna, Austria, 5–8 October 2025; pp. 1045–1051. [Google Scholar] [CrossRef]
  39. Hag, A.; Chalak Qazani, M.R.; Asadi, H. Quantifying Motion Sickness in Virtual Reality Using a Multimodal 1CNN–GRU–Attention Approach With GSR Data. IEEE Trans. Intell. Transp. Syst. 2025, 26, 22003–22014. [Google Scholar] [CrossRef]
  40. Yang, A.H.X.; Kasabov, N.; Cakmak, Y.O. Machine learning methods for the study of cybersickness: A systematic review. Brain Inform. 2022, 9, 24. [Google Scholar] [CrossRef] [PubMed]
  41. Jeong, D.; Yoo, S.; Yun, J. Cybersickness analysis with EEG using deep learning algorithms. In Proceedings of the 26th IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2019 Proceedings, Osaka, Japan, 23–27 March 2019; pp. 827–835. [Google Scholar] [CrossRef]
  42. Clevert, D.A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). In Proceedings of the 4th International Conference on Learning Representations, ICLR 2016 Conference Track Proceedings, San Juan, Puerto Rico, 2–4 May 2016; pp. 1–14. [Google Scholar] [CrossRef]
  43. Castro, J.B.; Feitosa, R.Q.; Happ, P.N. An Hybrid Recurrent Convolutional Neural Network for Crop Type Recognition Based on Multitemporal Sar Image Sequences. In Proceedings of the 2018 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2018, Valencia, Spain, 22–27 July 2018; pp. 3824–3827. [Google Scholar] [CrossRef]
  44. Kim, J.; Kim, W.; Oh, H.; Lee, S.; Lee, S. A Deep Cybersickness Predictor Based on Brain Signal Analysis for Virtual Reality Contents. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10579–10588. [Google Scholar] [CrossRef]
  45. Lagoutaris, V.; Moustakas, K. Motion Prediction Of Traffic Agents With Hybrid Recurrent-Convolutional Neural Networks. In Proceedings of the International Conference on Digital Signal Processing, DSP, Rhodes (Rodos), Greece, 11–13 June 2023; pp. 1–5. [Google Scholar] [CrossRef]
  46. Li, Y.; Liu, A.; Ding, L. Machine learning assessment of visually induced motion sickness levels based on multiple biosignals. Biomed. Signal Process. Control 2019, 49, 202–211. [Google Scholar] [CrossRef]
  47. Laessoe, U.; Abrahamsen, S.; Zepernick, S.; Raunsbaek, A.; Stensen, C. Motion sickness and cybersickness—Sensory mismatch. Physiol. Behav. 2023, 258, 114015. [Google Scholar] [CrossRef]
  48. Walker, A.D.; Muth, E.R.; Switzer, F.S.; Hoover, A. Head movements and simulator sickness generated by a virtual environment. Aviat. Space Environ. Med. 2010, 81, 929–934. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Framework for predicting motion sickness using head and torso movement data.
Figure 2. Proposed C-RNN model for predicting MS using head and torso movements.
Figure 3. Performance comparison of C-RNN with other models, showing mean and standard deviation of key evaluation metrics.
Figure 4. The performance of RNN and C-RNN in terms of accuracy and loss functions.
Table 1. Summary of the data structure provided in the dataset.

Files | Subjects | Data Structure | Duration | Frequency Rate
Movement data (40 .txt files) | 40 subjects; 1 file per subject | 6 DOF (X, Y, Z, A, E, R); 3 receivers | 5–40 min | 60 Hz
SSQ raw data.xlsx | 1 row per subject | Gender, condition (sick/well), pre-SSQ and post-SSQ | — | —
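Given the structure summarised in Table 1 (6-DOF signals from 3 receivers sampled at 60 Hz), each session can be segmented into fixed-length windows before classification. The window length, overlap, and the flattening of 3 receivers × 6 DOF into 18 channels are illustrative assumptions, not choices taken from the paper:

```python
import numpy as np

FS = 60          # sampling rate in Hz, per Table 1
N_CHANNELS = 18  # assumed: 3 receivers x 6 DOF (X, Y, Z, A, E, R)

def window_session(data, win_sec=1.0, overlap=0.5):
    """Slice a (samples, channels) session into overlapping windows."""
    win = int(FS * win_sec)
    step = max(1, int(win * (1 - overlap)))
    return np.stack([data[i:i + win]
                     for i in range(0, len(data) - win + 1, step)])

# Example: a 5-minute session (the shortest duration listed in Table 1).
session = np.random.default_rng(1).standard_normal((5 * 60 * FS, N_CHANNELS))
windows = window_session(session)  # (n_windows, 60, 18)
```

With 1 s windows and 50% overlap, a 5-minute session yields 599 windows of shape (60, 18), each of which could be fed to a per-window classifier.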
Table 2. Performance comparison of SVM, KNN, DT, RNN, and C-RNN for MS classification based on head and torso movements.

Algorithm | Precision | Recall | F-Score | ROC AUC | Accuracy
SVM | 58% | 65% | 60% | 66% | 60%
KNN | 72% | 76% | 73% | 82.5% | 73.75%
DT | 72% | 75% | 73.5% | 75% | 74.38%
RNN | 80% | 78% | 79% | 88.5% | 81.88%
C-RNN | 83% | 89% | 86% | 90.01% | 85.63%
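For reference, the threshold-based metrics reported in Table 2 can be computed from binary predictions as below. This is a generic sketch, not the authors' evaluation code, and ROC AUC is omitted because it requires ranked scores rather than hard labels:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for binary labels (1 = sick)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": np.mean(y_true == y_pred)}

# Toy example with six hypothetical window-level predictions.
m = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
```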
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
