Electrogastrogram-Derived Features for Automated Sickness Detection in Driving Simulator

The rapid development of driving simulators for the evaluation of automated driving experience is constrained by the simulator sickness-related nausea. The electrogastrogram (EGG)-based approach may be promising for immediate, objective, and quantitative nausea assessment. Given the relatively high EGG sensitivity to noises associated with the relatively low amplitude and frequency spans, we introduce an automated procedure comprising statistical analysis and machine learning techniques for EGG-based nausea detection in relation to the noise contamination during automated driving simulation. We calculate the root mean square of EGG amplitude, median and dominant frequencies, magnitude of Power Spectral Density (PSD) at dominant frequency, crest factor of PSD, and spectral variation distribution along with newly introduced parameters: sample and spectral entropy, autocorrelation zero-crossing, and parameters derived from the Poincaré diagram of consecutive EGG samples. Results showed outstanding robustness of sample entropy with moderate robustness of autocorrelation zero-crossing, dominant frequency, and its median. Machine learning reached an accuracy of 88.2% and revealed sample entropy as one of the most relevant and robust parameters, while linear analysis highlighted spectral entropy, spectral variation distribution, and crest factor of PSD. This study clearly indicates the need for customized feature selection in noisy environments, as well as a complementary approach comprising machine learning and statistical analysis for efficient nausea detection.


Introduction
Driving simulators are used in various industries to study human behavior, observe drivers' performance and driving skills, and validate new human-machine interfaces in vehicles. They provide a robust, safe, and controllable testing environment, but sometimes cause simulation sickness and other unpleasant sensations [1,2]. The general approach to assessing sickness incorporates both subjective techniques, mainly based on questionnaires (such as the Motion Sickness Questionnaire or the Simulator Sickness Questionnaire), and objective measures, commonly comprising physiological recordings (heart rate, body temperature, electrodermal activity (EDA), electrogastrogram (EGG), etc.) [3-6]. The main difference between these two approaches is that subjective measures (driver's perception) are commonly administered after the recording session, while physiological measurements can be used for continuous assessment. Although the two approaches differ in nature, a holistic approach comprising both objective and subjective measures is advised for complete practical assessment [7-9].
Real-time or quasi-real-time continuous assessment with physiological measures is an attractive approach for the evaluation of nausea and the simulator sickness phenomenon. However, these measures have proved to be susceptible to a variety of environmental factors and to noise contamination, as has been shown previously for skin temperature, EDA, and EGG [9,10]. Although the measurement of pupil diameter and pupillary rhythm

Rationale for Introduction of New EGG-Based Parameters
The level of randomness is a particularly useful measure in many areas of data analysis. For example, in stock market analysis, approximate entropy as a measure of randomness suggests more data predictability in times of crisis, indicating more pronounced repeated patterns [23]. Sample entropy is also a promising method for determining the regularity of signals based on the existence of patterns. The method is similar to approximate entropy, but it is independent of the signal length and has better relative consistency [24]. It was successfully used, for example, to separate uterine electromyogram (EMG) records of term and pre-term delivery groups [25]. On the other hand, spectral entropy, i.e., the entropy of a signal's normalized power distribution, can be used to estimate the uniformity of the signal power distribution and, as a result, to discriminate between narrowband and wideband signals. The method was, for example, used to determine the depth of anesthesia from an electroencephalogram (EEG) [26,27]. For the EGG-based detection of nausea, we used both sample and spectral entropy as we hypothesize that these features would discriminate between less and more random changes in EGG signals corresponding to the baseline recording and nausea occurrence, respectively. This is well suited to EGG-based nausea assessment, as repeated patterns are more pronounced in the baseline EGG waveform, which consists mainly of normal gastric rhythm and is narrowband in nature, than in the EGG signal during nausea occurrence.
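To make the two entropy measures concrete, the following is a minimal Python sketch. It is not the authors' Matlab implementation: the function names, the tolerance r = 0.2 times the standard deviation, and the plain FFT-based power estimate are assumptions chosen for illustration.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy: -ln(A / B).

    B counts pairs of length-m templates whose Chebyshev distance is within r,
    A counts the same for length m + 1; self-matches are excluded.
    r = r_factor * std(x) is an assumed, commonly used tolerance.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * np.std(x)

    def count_pairs(length):
        # The same n - m starting points are used for both template lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_pairs(m)
    a = count_pairs(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return float(-np.log(a / b))


def spectral_entropy(x, normalize=True):
    """Shannon entropy of the normalized power spectrum (assumed FFT estimate)."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    p = power / power.sum()
    p = p[p > 0]  # skip empty bins, where p * log(p) is taken as 0
    h = -np.sum(p * np.log2(p))
    if normalize:
        h /= np.log2(power.size)  # scale to the range 0..1
    return float(h)
```

For a regular 3 cpm sine sampled at 2 Hz, both functions should return clearly lower values than for white noise, which is the kind of contrast hypothesized above between baseline and nausea segments.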
The autocorrelation zero-crossing is another method for determining the randomness of signals. The method is based on calculating the first zero-crossing of the autocorrelation function: the closer it is to the maximum of the autocorrelation function, the more random the signal. This method has been used successfully in the analysis of the EMG and electrocardiogram (ECG) where it showed less proneness to the signal interferences [25,28]. For that reason, we hypothesize that it could be simply adopted for EGG-based analysis.
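As an illustration of this measure, the sketch below returns the first lag at which the normalized autocorrelation becomes non-positive. The function name and the use of the raw (biased) autocorrelation estimate are assumptions; the text does not specify how the autocorrelation is estimated.

```python
import numpy as np


def autocorr_first_zero_crossing(x):
    """Return the first lag (in samples) where the autocorrelation is <= 0.

    A small lag means the zero-crossing lies close to the maximum at lag 0,
    i.e., a more random signal. Returns None if no crossing is found.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Keep only non-negative lags of the full autocorrelation sequence.
    acf = np.correlate(x, x, mode="full")[x.size - 1:]
    acf = acf / acf[0]  # normalize so that acf[0] == 1
    crossings = np.flatnonzero(acf <= 0)
    return int(crossings[0]) if crossings.size else None
```

For a 3 cpm rhythm sampled at 2 Hz (a period of 40 samples), the first zero-crossing falls near a quarter period, whereas for white noise it is expected within the first few lags.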
Previously, EGG frequency dynamics were expressed by utilizing the number of turning points as a test of randomness to study gastric coupling in animals [29,30]. Here, we adopt a common approach to examine heart rate variability based on standard deviations of a Poincaré plot to study EGG-related nausea alterations. A Poincaré plot allows for the evaluation of non-linear aspects of signal sequences, and it is commonly used in biomedical engineering to characterize heart rate variability [31]. Our hypothesis for the adoption of a Poincaré plot and its adaptation in EGG analysis is based on the premise that changes in EGG signal as a result of nausea occurrence would express more non-linear properties in comparison to the baseline EGG recordings.
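The Poincaré descriptors borrowed from heart rate variability analysis can be sketched as follows. The code applies the standard HRV definitions of SD1 and SD2 to consecutive EGG samples, which is an assumption about how the adaptation is done; the function name is hypothetical and SDEGG is not computed because its definition is not given in this section.

```python
import numpy as np


def poincare_sd(x):
    """SD1 and SD2 of the Poincaré plot of x[n+1] against x[n].

    SD1 is the spread perpendicular to the identity line (short-term
    variability); SD2 is the spread along it (long-term variability).
    """
    x = np.asarray(x, dtype=float)
    x_n, x_next = x[:-1], x[1:]
    sd1 = np.std((x_next - x_n) / np.sqrt(2), ddof=1)
    sd2 = np.std((x_next + x_n) / np.sqrt(2), ddof=1)
    return float(sd1), float(sd2)
```

A slowly varying signal gives an elongated cloud (SD1 much smaller than SD2), while uncorrelated noise gives a roughly circular cloud (SD1 close to SD2).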

Noise Effect on EGG-Based Parameters
Although promising, EGG-based assessment of nausea comes with a major drawback: exaggerated noise. Due to its relatively low amplitude and low frequency content, the EGG signal is easily affected by noise and artifacts, especially movement artifacts. This is particularly pronounced in motion-based driving simulators, as their interactive and dynamic environment decreases EGG data quality through excessive noise [10].
To test the applicability of previously used and newly introduced EGG-based parameters and their robustness to noise levels, we added synthetic colored noise to datasets comprising EGG signals recorded in 20 healthy participants during driving simulation to create semi-synthetic datasets. To the best of our knowledge, this is the first study reporting the effect of noise on EGG features for simulator sickness-related nausea assessment.
Moreover, we introduce and demonstrate a new machine learning approach to automatically evaluate nausea occurrence by EGG-based parameters in relation to noise. We report results of both statistical analysis and machine learning by reasoning for a middle ground approach between "data modeling" and "algorithmic modeling" cultures to yield a strong empirical foundation as recommended in [32].

Aims of the Study
Our objective is twofold. We firstly aim at the extensive exploration of known and novel EGG-based parameters for nausea assessment. Secondly, we seek features with proven higher levels of robustness to noises and artifacts that would be more appropriate for nausea detection in dynamic driving simulator environments. To achieve our aims, we employ automated techniques for the extraction of proposed EGG-based features with various Signal-to-Noise Ratios (SNRs) with added synthetic noise and contribute to the existing body of knowledge in the following ways:

1. We present an extended list of EGG-based features for nausea assessment following pertinent reasoning for their calculation.
2. We report on the sensitivity/robustness of the proposed EGG-based parameters to different levels of SNRs and the noise effect on nausea detection.

Materials and Methods
We introduce an extensive list of EGG-based parameters for nausea detection and for the evaluation of parameter robustness to different levels of SNRs. We use both traditional statistical linear analysis and non-linear machine learning to test the usability of the selected EGG features for the detection of nausea incidence.

Available EGG Data and Recording Procedure
The data for the analysis are obtained from the study by Gruden et al. [4]. In the following, we summarize the most important information on the data collection required to explain the main goals of the present research.
The study was conducted at the Faculty of Electrical Engineering, University of Ljubljana, Slovenia. Twenty individuals (two females), mostly students or staff from the Faculty of Electrical Engineering in Ljubljana participated in the study. The participants were between 19 and 40 years old, had a valid driving license, and had more than one year of driving experience. They were instructed to fast for at least 6 h and not to drink for at least 2 h before the study [33].
The study was performed in the Nervtech driving simulator (Nervtech d.o.o., Trzin, Slovenia) [34,35] with a motion platform with 4 degrees of freedom (yaw, pitch, roll, and heave) (Figure 1). The cockpit consisted of an adjustable car seat, a Fanatec ClubSport Wheel Base V2 steering wheel with dynamic feedback, and a Fanatec ClubSport Pedals V3 pedal set with three pedals (both from Fanatec, Endor AG, Landshut, Germany) [36,37]. The driving environment was simulated using the SCANeR simulation software (AV simulation, Boulogne, France) [38], a virtual reality headset (Oculus, Facebook Technologies LLC, Menlo Park, CA, USA) [39], and a stereo speaker set. Based on the definition of physical and functional fidelity by Kinkade and Wheaton [40] and Hays [41], the simulator used in this study can be described as a high-fidelity driving simulator. After the participants were introduced to the experiment procedure, they signed informed consent forms in accordance with the Declaration of Helsinki and the University Code of Ethics. The participants then completed a test trial to get acquainted with the experimental environment. They were instructed to raise their hand to stop the experiment at any time if the sickness was too severe to continue. The main part of the experiment was divided into three parts:

1. Baseline measurement before the driving simulation.
2. Driving simulation in autonomous vehicle.
3. EGG measurement while resting after driving simulation.

The driving simulation consisted of a less dynamic drive, which took place on a highway road, and a more dynamic drive, which took place on a countryside road. The duration of each driving scene was approximately 7 min. The screenshots of the highway and countryside roads are given in Figure 2. The user study had a within-subject design as the participants' state was measured before, during, and after AV driving simulation. In order to induce higher levels of sickness and at the same time to reduce the potential and exaggerated EGG artifacts originating from motion or steering, fully autonomous driving was used (SAE level 5) [42]. The participants' primary task was, therefore, only to observe the autonomous driving.
Five Ag/AgCl surface electrodes (H92SG, Kendall/Covidien, Dublin, Ireland) were placed on the participants' stomach to measure three-channel EGG following the recommendations [33,43,44]. The EGG signal was recorded using an amplification and filtering device [10] and digitized using Biopac UIM100C MP150 Analog-to-Digital Converter (ADC) (Biopac Systems, Goleta, CA, USA) [45]. The sampling frequency was set to 2 Hz and the resolution was 16 bits.

EGG Preprocessing and Creation of Semi-Synthetic EGG Dataset
Preprocessing and feature extraction are performed offline in Matlab version R2019b (The Mathworks, Natick, MA, USA). For feature extraction, we used four segments corresponding to the baseline recording, countryside driving, highway driving, and EGG measurement after driving simulation while resting. Furthermore, participants were instructed to continuously report any experience of nausea or sickness by pressing a button, which was recorded alongside EGG data. As the frequency of EGG slow waves is approximately 3 cycles per minute, a large segment of data is required to reliably estimate the parameters. The nausea incidence variable was therefore defined per segment: it was set to 1 if at least one button press was detected in that segment and to 0 otherwise.
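The segment-level labeling rule can be written in a few lines. The sketch below is illustrative only: the segment names, their boundaries in seconds, and the button-press times are hypothetical values, not data from the study.

```python
def label_segments(segments, press_times):
    """Return 1 for each segment containing at least one button press, else 0.

    segments: dict mapping a segment name to (start_s, end_s), in seconds.
    press_times: iterable of button-press timestamps, in seconds.
    """
    return {
        name: int(any(start <= t < end for t in press_times))
        for name, (start, end) in segments.items()
    }


# Hypothetical timeline: four segments of roughly 7 min each.
example_segments = {
    "baseline": (0, 420),
    "countryside": (420, 840),
    "highway": (840, 1260),
    "rest": (1260, 1680),
}
example_presses = [455.0, 610.5, 1300.2]  # hypothetical press times
labels = label_segments(example_segments, example_presses)
```

With these hypothetical inputs, the countryside and rest segments are labeled 1 and the other two segments are labeled 0.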
In the previous study examining noise effect on EGG analysis [46], SNRs from −50 dB to 15 dB with step 5 dB were used, but the majority of changes are seen from −20 dB to 15 dB. Hence, we created a semi-synthetic dataset by adding pseudo-random noise matching SNR values of −20 dB, −10 dB, 0 dB, 10 dB, and 20 dB. We report on exact mean SNRs as we are not able to keep SNRs at a constant level due to the application of a pseudo-random generator and the fact that noise power was set before the filtering which removed the majority of the wideband noise. The actual obtained SNRs in this study are −23 dB, −13 dB, −3 dB, 7 dB, and 17 dB.
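A minimal sketch of adding pseudo-random noise at a nominal SNR is given below. It assumes white Gaussian noise whose power is set relative to the signal power before any filtering; the function names and the generator are illustrative and not taken from the original Matlab code.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a nominal SNR (in dB).

    Returns the noisy signal and the noise realization itself.
    """
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.size)
    return signal + noise, noise


def realized_snr_db(signal, noise):
    """SNR actually obtained for one noise realization, in dB."""
    return float(10 * np.log10(np.mean(np.square(signal)) / np.mean(np.square(noise))))
```

Because the noise is a finite pseudo-random realization, the realized SNR differs slightly from the nominal one, and any filtering applied afterwards changes it further. This is one reason to report the SNRs actually obtained rather than the nominal targets.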
Semi-synthetic EGG signals are digitally filtered with the 6th order zero-phase-distortion Butterworth bandpass filter with cutoff frequencies set at 1 cycle per minute (cpm) and 10 cpm (0.0167 Hz and 0.167 Hz). This process colored the noise by filtering it in a range from 1 cpm to 10 cpm. Due to the relatively low EGG amplitude (in the range of µV), the raw EGG signal is very sensitive to noises and motion artifacts. The recorded EGG signals are therefore inspected by an experienced researcher who removed parts with identified motion artifacts. Following an established procedure, these artifacts were identified as short, high-amplitude spikes in contrast to a slowly changing, low-amplitude signal representing gastric activity [4,10,22,47]. From the three measured EGG channels, the channel with the fewest artifacts is selected for further analysis. As 3 subjects are excluded as a consequence of low signal quality, the remaining analysis is performed in 17 subjects [4]. Succeeding feature extraction and nausea detection procedures are completely automated.
Sensors 2022, 22, 8616
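The bandpass step can be sketched with SciPy as follows. This is an assumed equivalent rather than the authors' Matlab code: the design order of 3 (which yields a 6th-order bandpass filter), the second-order-section form, and the constant and function names are illustrative choices.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_HZ = 2.0          # sampling frequency reported for the recordings
LOW_HZ = 1.0 / 60    # 1 cpm
HIGH_HZ = 10.0 / 60  # 10 cpm


def bandpass_egg(x, fs=FS_HZ, low=LOW_HZ, high=HIGH_HZ, design_order=3):
    """Zero-phase Butterworth bandpass between 1 and 10 cpm.

    For a bandpass design, butter(N, ...) returns a filter of order 2N,
    so design_order=3 gives a 6th-order filter (an assumed convention).
    sosfiltfilt runs the filter forward and backward to cancel phase shift.
    """
    sos = butter(design_order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(x, dtype=float))
```

A 3 cpm component passes almost unchanged, while a component at 0.5 Hz (30 cpm) is strongly attenuated. Wideband noise passed through the same filter becomes band-limited, i.e., colored.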

Automated Procedure for EGG-Based Features Extraction
All extracted EGG-based parameters are presented in Table 1. For the calculation of the level of randomness, we use Sample Entropy (SampEnt) of time series (SampEntT) and of Power Spectral Density (PSD) (SampEntP), as well as Spectral Entropy (SpectEnt). Additionally, we calculate the autocorrelation zero-crossing, as well as geometrical features corresponding to the Poincaré plots (SD1, SD2, and SDEGG in Table 1). By visual inspection of SampEntT for embedding dimensions m = 2, 3, and 4, we conclude that there are two distinct groups with entropy higher and lower than 10. We use this information as empirical reasoning for splitting the data into two groups and for transforming SampEntT into categorical data for further analysis.
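For orientation, several of the classical amplitude and spectral features can be computed as sketched below. The exact definitions used in Table 1 are not reproduced in this section, so the formulas here (FFT-based PSD, band limits of 1 to 10 cpm, crest factor as peak over root mean square of the PSD) are common conventions adopted as assumptions, and all names are hypothetical.

```python
import numpy as np


def basic_egg_features(x, fs=2.0, band_cpm=(1.0, 10.0)):
    """RMS, dominant and median frequency, PSD magnitude at DF, PSD crest factor."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)  # simple periodogram

    # Restrict the spectrum to the gastric band of interest.
    in_band = (freqs >= band_cpm[0] / 60) & (freqs <= band_cpm[1] / 60)
    f, p = freqs[in_band], psd[in_band]

    peak = int(np.argmax(p))
    cumulative = np.cumsum(p)
    # Median frequency: first bin at which half of the band power is reached.
    median_idx = int(np.searchsorted(cumulative, cumulative[-1] / 2))

    return {
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "dominant_frequency_cpm": float(f[peak] * 60),
        "magnitude_at_df": float(p[peak]),
        "median_frequency_cpm": float(f[median_idx] * 60),
        "psd_crest_factor": float(p[peak] / np.sqrt(np.mean(p ** 2))),
    }
```

Under these assumed definitions, a clean 3 cpm sine of unit amplitude yields a dominant and median frequency of 3 cpm and an RMS of about 0.707.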
To observe the impact of noise on the selected set of parameters, we statistically compare the values of calculated parameters in different SNR conditions. For normally distributed data, paired sample t-test is used to evaluate the existence of statistically significant differences between original and noisy EGG-based parameters for all SNRs, while Cohen's d (Cd) is used to estimate the effect size. For non-normally distributed data, paired Wilcoxon's Signed-Ranks test is used along with Cliff's delta (Cdelta) as effect size measure. To test normality of the data, we use Shapiro-Wilk's normality test with p set to 0.05. Statistically significant differences among parameters corresponding to the nausea occurrence with those revealing the non-nausea occurrence are explored by t-test for normally distributed data, while non-parametric Wilcoxon-Mann-Whitney's U test is used to compare dependent variables for two independent groups for non-normally distributed data. Here, we also calculated Cd and Cdelta for normally and non-normally distributed data, respectively.
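The paired comparison described above can be sketched as a small decision procedure. The sketch rests on several assumptions: normality is checked on the paired differences, Cohen's d is computed on those differences, and SciPy replaces the original statistical software; the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def cohens_d_paired(a, b):
    """Cohen's d for paired samples: mean difference over SD of differences."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff.mean() / diff.std(ddof=1))


def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all cross pairs."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return float((np.sum(a > b) - np.sum(a < b)) / (a.size * b.size))


def compare_paired(original, noisy, alpha=0.05):
    """Paired t-test with Cohen's d if differences look normal, else Wilcoxon with Cliff's delta."""
    original = np.asarray(original, dtype=float)
    noisy = np.asarray(noisy, dtype=float)
    _, p_normal = stats.shapiro(original - noisy)
    if p_normal > alpha:
        _, p_value = stats.ttest_rel(original, noisy)
        return {"test": "paired t-test", "p": float(p_value),
                "effect": cohens_d_paired(original, noisy)}
    _, p_value = stats.wilcoxon(original, noisy)
    return {"test": "Wilcoxon signed-rank", "p": float(p_value),
            "effect": cliffs_delta(original, noisy)}
```

A feature would be called robust at a given SNR when the returned p value exceeds 0.05, i.e., when the noisy and original values do not differ significantly.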
Summary statistics for categorical entropy parameters (SampEntT) in relation to nausea occurrence are reported in conjunction with Pearson's Chi square test with a simulated p value, which is used to explore whether nausea incidence is significantly associated with the categorical features. Unless stated otherwise, the level of statistical significance is set at 0.05.
A binary Random Forest (RF) classifier is constructed with the aim of classifying EGG-based parameters obtained from EGG data recorded with and without nausea occurrence. The choice of hyperparameters is based on the previous publication on a similar dataset classifying the physiological data [50] and on Breiman and Cutler's Random Forests for Classification and Regression [60,61]. The seed is set to 100 for the sake of reproducibility. The rationale for RF selection is that previous studies exploring the effect of noise on classification performance showed that RF is relatively resistant to noise, probably due to the bagging ensemble procedure [62,63]. All parameters from Table 1 are used as input to the RF classifier. We perform leave-one-out cross-validation on the training data as this type of validation is more suitable for datasets with a smaller number of instances [64]. The data are split into training (75%) and test sets (25%), taking into account the distribution balance within the splits by the createDataPartition Caret procedure [54] to ensure that both training and test sets are representative of the dataset. Furthermore, we use the automated Caret procedure for tuning the classifier parameters.
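A Chi square test with a simulated p value can be approximated by a permutation scheme, sketched below. Shuffling one variable keeps both marginal totals fixed, which is assumed here to mirror the Monte Carlo option of common statistical software. The function names, the 2000 replicates, and the binary 0/1 coding are illustrative assumptions.

```python
import numpy as np


def chi2_statistic(table):
    """Pearson's Chi square statistic for a contingency table."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return float(np.sum((table - expected) ** 2 / expected))


def contingency_2x2(x, y):
    """2x2 table of counts for two binary (0/1) variables."""
    return np.array([[np.sum((x == i) & (y == j)) for j in (0, 1)] for i in (0, 1)])


def simulated_chi2_p(x, y, n_sim=2000, seed=100):
    """Observed statistic and Monte Carlo p value under fixed margins.

    Assumes both variables take both values, so no expected count is zero.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    observed = chi2_statistic(contingency_2x2(x, y))
    exceed = 0
    for _ in range(n_sim):
        simulated = chi2_statistic(contingency_2x2(x, rng.permutation(y)))
        exceed += int(simulated >= observed - 1e-12)
    return observed, (exceed + 1) / (n_sim + 1)
```

Here x would be the categorical entropy group and y the nausea incidence of each segment. A simulated p value is useful when some expected cell counts are small, as can happen with few nausea segments.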
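The data partitioning can be illustrated without any machine learning library. The sketch below mimics a class-balanced 75/25 split and leave-one-out resampling on the training indices. It is an approximation of what the Caret procedures do, not a reimplementation: the rounding rule, the random generator, and the function names are assumptions.

```python
import numpy as np


def stratified_split(labels, train_fraction=0.75, seed=100):
    """Split indices into train/test while preserving class proportions."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        n_train = int(np.ceil(train_fraction * idx.size))  # assumed rounding rule
        train.extend(rng.choice(idx, size=n_train, replace=False))
    train = np.sort(np.array(train))
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test


def leave_one_out(indices):
    """Yield (training indices, held-out index) for each instance."""
    for held_out in indices:
        yield np.array([i for i in indices if i != held_out]), held_out
```

With 68 segments, of which 12 are labeled as nausea, this split gives 51 training and 17 test segments, with 3 nausea segments in the test set. Leave-one-out resampling then produces 51 folds of 50 training segments each.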
The Caret procedure for RF resampling across tuning parameters determined the optimal value of mtry, the number of variables randomly sampled at each split. The automatic procedure selects the mtry value that yields the highest accuracy. For parameters obtained from noisy data, we apply two types of RF classifiers: (1) RF trained and tested on noisy data and (2) RF trained on original and tested on noisy data. Together with the reported parameters for machine learning (ML) evaluation (Kappa, 95% confidence interval, accuracy, sensitivity, specificity, precision, recall, and AUC (Area Under the Receiver Operating Characteristic Curve) for both training and test sets), we present feature importance plots. Features are ranked according to a score obtained as the number of times the feature is selected by all trees in the created binary RF.
Table 2 presents the results of the statistical analysis for the EGG-based parameters obtained on noisy data with different SNR levels in comparison to the parameters without synthetically added colored noise. Comparative results of binary RF classification for original and noisy data are shown in Table 3. Evaluation ML parameters for classifiers trained on the original dataset and tested on noisy data are presented in Table 4.
Table 2. Results of paired t-tests and Cohen's d (for normally distributed data), as well as paired Wilcoxon's Signed-Ranks tests and Cliff's delta (for not normally distributed data) for comparison of non-noisy and noisy EGG-based parameters are presented. For testing normality, p was set at 0.05. Values with p > 0.05 (no significant difference due to the noise is observed, i.e., EGG-based features are robust to the noise levels) are highlighted in bold. SNR stands for Signal-to-Noise Ratio, V stands for V-statistics, p for probability, Cdelta for Cliff's delta, Cd for Cohen's d, t for t-statistics, and df for degrees of freedom.
Using categorical SampEntT parameters for m = 2, 3, and 4 in the RF classifiers did not influence the results, so we decided not to present them. For the original dataset, mtry = 17 is selected by the automatic tuning procedure with an accuracy of 0.790. For noisy datasets, mtry is set to 2 with an accuracy of 0.782, 17 with an accuracy of 0.827, 2 with an accuracy of 0.827, 17 with an accuracy of 0.827, and 2 with an accuracy of 0.810 for SNRs of −23 dB, −13 dB, −3 dB, 7 dB, and 17 dB, respectively.
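The evaluation measures listed above all follow from the four cells of a binary confusion matrix. The sketch below shows the arithmetic on hypothetical counts for a 17-segment test set; the counts are illustrative only and are not taken from Tables 3 or 4.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity (recall), specificity, precision, and Cohen's kappa."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Hypothetical counts: 14 positives all detected, 1 of 3 negatives detected.
example = binary_metrics(tp=14, fn=0, fp=2, tn=1)
```

Sensitivity and specificity swap when the other class is treated as positive, so it is worth confirming which class the software designates as positive before interpreting them.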

Results
Importance plots for RF applied on parameters obtained from noisy and original EGG data are presented in Figure 3. The results of the t-tests for normally distributed data and of the Wilcoxon-Mann-Whitney's U tests for non-normally distributed data are given in Table 5. Table 6 presents summary statistics for categorical SampEntT EGG-based parameters in relation to nausea occurrence.
Table 5. Results of statistical tests for the comparison of parameters with and without nausea for non-noisy and noisy data with different noise levels. For testing normality, p is set at 0.05. Statistically significant differences are presented in bold (p < 0.05). SNR stands for Signal-to-Noise Ratio, W stands for W-statistics, p for probability, Cdelta for Cliff's delta, Cd for Cohen's d, t for t-statistics, and df for degrees of freedom.
Pearson's Chi square test is used to explore the correlation of nausea occurrence with categorical SampEntT parameters for m = 2, 3, and 4, and for SNR = 17 dB, 7 dB, −3 dB, −13 dB, and −23 dB. However, no statistically significant relationships are found.

Discussion
As expected, noise influences EGG signals and consequently EGG-based parameters [66]. The presented results suggest that EGG-based parameters differ in their robustness to additive colored noise. Here, we provide a quantitative approach to evaluate the noise effect on the results of statistical analysis and RF performance for sickness-related nausea detection during automated driving simulation. The discussion on different levels of SNR should be taken with caution as the original dataset already contains a certain level of noise in the studied frequency band (from 1 cpm to 10 cpm). Hence, we stress that the reported SNRs of semi-synthetic EGG data are higher than the actual SNRs. Taking into account that we applied channel selection and manual deletion of segments with exaggerated noise following the procedure applied in [4], we may assume that the difference between the true and the reported SNRs is not substantial.
Our results provide some important and intriguing insights into the behavior of selected EGG parameters in noisy conditions demonstrating their usability to detect EGG signals affected by nausea and simulator sickness. At the same time, we are aware that a larger and more diverse sample should be used to confirm or contradict the following insights obtained from the presented results.

Effect of Noise on EGG-Based Parameters
As expected, changing SNRs affect EGG-based parameters differently, and the robustness of most parameters degrades as noise increases. Some features, however, were largely independent of the SNR. We therefore discuss each observed feature independently and indicate its potential and limitations in different noisy conditions. Table 2 reveals that SampEntP is the least affected and does not change significantly across SNRs ranging from 17 dB to −23 dB, whereas SampEntT shows sensitivity to noise at lower SNRs (−13 dB, −23 dB). This may be due to the fact that the PSD of colored noise should be relatively flat on the studied segment of the EGG spectrum (and thus relatively deterministic), whereas the noise signal in the time domain is random. The sensitivity of SampEntT to noise at lower SNR values is to be expected as substantial noise compromises the calculation of sample entropy [24].
The results for both SampEntT and SampEntP are the same for all three embedding dimensions m, indicating that the choice of m (if kept small) has no effect on SampEntP sensitivity to the noise in the EGG signal. Moreover, SampEntP is the only parameter with an effect size Cdelta below 0.2, indicating negligible changes for all noise levels (Table 2). For all parameters, the effect sizes (Cdelta or Cd, where applicable) degraded, i.e., indicated larger differences at higher noise levels.
Autocorrelation zero-crossing and median frequency did not change significantly for positive SNRs. This is expected as the calculation of autocorrelation and median frequency depends on the signal power, which for SNR > 0 dB is larger than the power of noise. On the other hand, DF remained stable in noisy conditions down to SNR = −3 dB. Unlike autocorrelation zero-crossing and median frequency, DF depends on just one peak location, which could remain the same even when noise power is larger than the signal power, as long as the noise does not exceed the DF peak. SDV is unaffected only at an SNR of 17 dB, and this is expected as the variability of the signal is not the same as the variability of colored noise. Hence, SDV is easily affected, especially at lower SNRs.
Other parameters (RMS, MagDF, CS, SD1, SD2, and SDEGG) are influenced significantly for all SNR levels. Partly, this is expected as RMS and MagDF depend on the amplitude of the EGG signal which changes with added noise. SD1, SD2, and SDEGG depend on the relation between consecutive EGG samples, which is obviously highly sensitive to the colored pseudo-random noise. What is somewhat surprising is the fact that even relatively low noise levels (SNRs of 17 dB and 7 dB) had such a significant influence on these parameters.

Effect of Noise on Random Forest Classifier for Nausea Detection
The accuracy of the binary RF classifier is satisfactory (>88%). This is almost twice as high as the result presented by Dennison et al. [67], where a unimodal classifier trained on only two EGG features (percentages of band power of slow and fast stomach activity) reached an accuracy of 48.52% for four-scale sickness severity assessment. On the other hand, Dennison et al. [67] reported an accuracy higher than 95% when heterogeneous sensor data or solely EEG features were fed to the classifier for nausea severity classification. Future work should focus on multi-modal data fusion and properly selected EGG-based features to achieve even better classification accuracies. The AUC indicated a poor classification result (0.616 for the training set, with the best result of 0.667 for the test set), which is expected as the classifiers have poor specificity of ≤0.333 (Tables 3 and 4). The AUC remained constant for the training set throughout all results, probably indicating less training confidence. Although the AUC for the test set reached a maximum of 0.667 and thus slightly exceeded the AUC on the training set (0.616), it still indicates poor performance. It could be argued that without higher specificity, the data cannot reveal a strong machine learning pattern. The classifier performance remained stable even for noisy datasets (Table 3). The highest degradation happened at the lowest SNRs of −23 dB and −13 dB, as expected. This is in line with previously reported results indicating that RF performance degrades with higher noise levels, although RF is reasonably resistant to noise [62,63]. The reason for this may be that the majority of EGG-based features changed statistically significantly for these noise levels (Table 2) and that non-linear relations among parameters probably changed.
A similar result is seen when a classifier previously trained on original data is tested with a noisy dataset (Table 4), indicating that RF may be considered a good candidate for nausea detection in simulated automated vehicles by EGG-based parameters.
Class imbalance may be a problem with the available dataset. Although all applied methods should compensate for imbalance, they cannot eliminate it. Overall, 12 out of 68 (~17.6%) EGG segments had positive nausea incidence. We hypothesize that RF deals well with imbalanced data in comparison with other classifiers as it uses data bootstrapping by random sampling with replacement [60]. The problem with nausea occurrence is that although sensitivity/recall is high (100%), meaning that all subjects with nausea are correctly classified, the specificity is low (33.3%), so the classifier is not good at discerning those without reported nausea. This may not be caused solely by the RF, as it may also be the consequence of subjective reporting of nausea occurrence. We do not exclude the possibility that some subjects failed to report sickness when it actually happened.
Importance plots should be taken with caution due to the existence of cross-correlations among introduced parameters that can influence importance. Feature cross-correlations cannot influence the RF accuracy [68,69], indicating that differences in the importance plot originate from the SNR influence. Importance plots reveal that SampEntP rose to the top five with the highest importance, indicating that it may be one of the most relevant features (Figure 3). This is rather important as SampEntP could not differentiate between nausea and non-nausea data with a classical statistical approach (Table 5). The reason for this may be that the statistical test, unlike the ML algorithm, fails to detect non-linear relations.
Autocorrelation zero-crossing and CS appear among the top five features only in the importance plot for the original dataset, while SampEntT, DF, MagDF, and median frequency also appear within the five most relevant features. Interestingly, RMS, SDV, and features derived from the Poincaré plot (SD1, SD2, and SDEGG) do not appear within the top five features in any of the importance plots for original and semi-synthetic noisy EGG-based parameters (Figure 3). This may be a consequence of RF failing to detect their influence or of the fact that the influence of parameters derived from the Poincaré plot is minor in comparison to other EGG-based parameters.
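The effect of this imbalance on headline accuracy can be made explicit with a short calculation using only the figures quoted above (12 of 68 positive segments, sensitivity of 100%, specificity of 33.3%). The function names are hypothetical.

```python
def majority_baseline_accuracy(n_total, n_minority):
    """Accuracy of a trivial classifier that always predicts the majority class."""
    return (n_total - n_minority) / n_total


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity; insensitive to class proportions."""
    return (sensitivity + specificity) / 2


baseline = majority_baseline_accuracy(68, 12)   # about 0.824
balanced = balanced_accuracy(1.0, 1.0 / 3.0)    # about 0.667
```

A trivial majority-class predictor would already reach roughly 82% accuracy on such data, so accuracy values in the high 80s are best read alongside specificity or a balanced measure.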

Effect of Noise on Detection of Nausea through Statistical Tests
SpectEnt shows a statistically significant difference between those with and without reported nausea (Table 5) for all positive SNR values. This is not in line with the results reported in Table 2 where this feature remained statistically changed for all SNR values. We can argue that the changes in the signals introduced by nausea occurrence are more dominant in comparison to changes introduced by noise, which makes these two parameters mildly robust to the added colored noise. Similarly, CS does not show any tolerance to noise, but its ability to discern between EGG segments with and without nausea occurrence is stable for a relatively low noise level (SNR = 17 dB). However, this is only true if p is set to 0.001 for testing normal distribution, but not if p is set to 0.05. Median frequency, DF, autocorrelation zero-crossing, as well as SampEntT and SampEntP for all embedding dimensions m show no statistically significant difference between segments with and without nausea occurrence for the original dataset. The difference that arose with added noise is probably coincidental or an artifact of the additive synthetic noise. DF is not affected by nausea occurrence, and the p value is much lower than in the previous study that reported results on the same dataset [4]. However, these results cannot be directly compared as different independent variables and different statistical tests were used in the current study.
Transformation of SampEntT for three embedding dimensions m into categorical variables for original data and all noise levels did not produce any significant result, leading to the conclusion that SampEntT should be treated as a numerical variable. Table 6 supports this finding, as the reported proportions are indecisive for the original data. More convincingly, categorical SampEntT appears very sensitive to SNR levels, which contrasts with the robustness of SampEntT for all embedding dimensions m shown in Table 2.
Effect sizes reported by Cd and Cdelta tended to decrease as noise contamination increased, i.e., at lower SNRs (Table 5), meaning that higher noise contamination influences parameter sensitivity to nausea occurrence. In all cases where statistically significant differences are reported in Table 5, absolute effect sizes ranged from small (>0.2) to large (>0.8). In all other cases, differences were rather small (<0.4) or negligible (<0.2), except for SpectEnt which, despite a large effect size (>0.6), did not reveal statistical significance for higher noise levels (−13 dB and −23 dB).
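As an illustration of how such effect sizes can be obtained, the sketch below assumes that Cd and Cdelta denote Cohen's d (with pooled standard deviation) and Cliff's delta, respectively. The function names and the sample values are hypothetical and do not reproduce the R code used in the study.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d: difference of means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y), estimated over all pairs (x_i, y_j)."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


# Hypothetical feature values for two groups (illustration only).
nausea = [1.0, 2.0, 3.0, 4.0]
no_nausea = [3.0, 4.0, 5.0, 6.0]
print(cohens_d(nausea, no_nausea), cliffs_delta(nausea, no_nausea))
```

Cohen's d relies on means and variances, whereas Cliff's delta is rank-based and may be preferable when the feature distribution departs from normality.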

Limitations of the Study
Although we present a detailed analysis of EGG-based parameters for simulator nausea evaluation, we recognize the following limitations:

1. We use a discrete set of predefined SNRs, and one should note that the actual SNRs were much lower than the nominal values, as our data were already contaminated with noises and artifacts. Despite the linear Butterworth filtering applied in the preprocessing stage, noise with overlapping frequency content probably remains present in the semi-synthetic EGG dataset. Future efforts towards the generation of synthetic noises would provide a firm basis for exact SNR contamination and more reliable analysis.

2. It should also be noted that the sample entropy scaling parameter r is kept constant at 0.15 of the standard deviation of the noiseless data. This value was determined empirically based on the recommendations in [24]. Adjusting this value for different SNRs may have a further effect on the results and should be investigated in the future.

3. We apply procedures for automatic feature calculation. However, guided visual observation and manual corrections are still considered a gold standard for the evaluation of EGG-based parameters, especially in cases of excessive noise [10,70,71]. We use visual inspection only for channel selection. Despite this drawback, we obtained promising results in nausea assessment by both statistical and ML approaches.

4. We select the embedding dimension m for sample entropy calculation empirically. For future selection and discussion of the embedding dimension, one may consult the reasoning by Matilla-García et al. [72].

5. We did not apply unimodal or multi-modal machine learning algorithms, and we do not provide a comparison of existing machine learning techniques as in [67].

6. Our method is applied only to nausea occurrence. Further customization of the presented EGG-based parameters and of the complementary approach by RF and statistical analysis should enable the assessment of sickness levels, similarly to [67].

7. The dataset used for the analysis contains more male than female participants. However, we do not consider this to be a major drawback of our study, as we were not interested in the differences between the genders but focused on the relationships between the occurrence of nausea, the EGG parameters, and noise. Moreover, a systematic review performed by Grassini and Laumann [73] showed conflicting results in published studies focused on determining sex differences in experiencing simulator sickness.

8. We did not use multiple biomarkers for the assessment of sickness occurrence, as our focus was solely on the direct assessment of gastric activity. However, future studies should focus on a promising heterogeneous approach as, for example, suggested by Dennison et al. [67].
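The roles of the embedding dimension m and the scaling parameter r mentioned in limitations 2 and 4 can be illustrated with a minimal sample entropy sketch. It assumes the commonly used definition with Chebyshev distance between templates and self-matches excluded; the use of the population standard deviation, the non-strict tolerance comparison, and the hypothetical `sd_ref` argument (which keeps r tied to the standard deviation of a noiseless reference) are assumptions for illustration. It is not the Matlab implementation used in the study.

```python
import math
import random
from statistics import pstdev


def sample_entropy(x, m=2, r_factor=0.15, sd_ref=None):
    """Sample entropy -ln(A/B), where B and A count template pairs of
    length m and m + 1 whose Chebyshev distance is at most r."""
    n = len(x)
    # Tolerance r: fraction of the signal SD, or of a reference SD if given.
    r = r_factor * (pstdev(x) if sd_ref is None else sd_ref)

    def count_matches(length):
        count = 0
        for i in range(n - m):             # same number of templates for m and m + 1
            for j in range(i + 1, n - m):  # j > i excludes self-matches
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


# Hypothetical test signals (not EGG data): a sine wave and a noisy copy.
random.seed(0)
clean = [math.sin(0.3 * i) for i in range(200)]
noisy = [v + random.gauss(0.0, 0.2) for v in clean]
sd_clean = pstdev(clean)
print(sample_entropy(clean, m=2, sd_ref=sd_clean))
print(sample_entropy(noisy, m=2, sd_ref=sd_clean))  # r fixed to the noiseless SD
```

Passing `sd_ref` mirrors the choice of keeping r constant across noise levels; omitting it rescales r to each signal, which is one way the adjustment discussed in limitation 2 could be explored.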

Conclusions
The presented results highlight the importance of appropriate EGG parameter selection for nausea detection when higher levels of noise are anticipated during driving simulation. Although some EGG-based features are sensitive to nausea occurrence, they may at the same time be sensitive to higher noise levels. This is important for the study design of EGG-based nausea detection within driving simulators encompassing haptic frameworks.
Feature engineering and decision making by both machine learning and statistical tests may be fully automated for the future adoption of EGG-based nausea detection. These two approaches are complementary, as ML algorithms benefit from non-linear relations that cannot be revealed by statistical tests such as in the case of sample entropy parameters.
Sample entropy of EGG signals stands out among all other parameters due to its exceptional robustness to colored noise and its ability to differentiate between EGG segments with and without nausea occurrence for signals recorded in the driving simulator. The potential of sample entropy to detect nausea in noisy EGG signals should be further explored in other EGG-related dynamic studies. The assessment of EGG signals with the sample entropy feature may open the door to experiments that were not previously feasible, as the pronounced sensitivity of EGG to noise and artifacts may no longer present an obstacle.
Sickness-related nausea detection in driving simulators by EGG-based parameters is an important aspect that could assist in the overall comfort improvement in both simulators and automated vehicles. This study emphasized the importance of proper EGG-based feature selection when dynamic and noisy EGG recording is anticipated (e.g., different levels of driving automation resulting in subject's maneuvers interfacing vehicle commands or in-vehicle infotainment). Additionally, our results revealed the superiority of sample entropy in relation to other parameters and in combination with the RF ML algorithm.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The raw data presented in this study and Matlab code for feature extraction are available on request from the corresponding author. R code and CSV tables with relevant EGG-based parameters are available on Zenodo repository and shared under GNU General Public License [59].