An Immersive Virtual Reality Game for Predicting Risk Taking through the Use of Implicit Measures

Featured Application: The tool presented in this article can be applied as an ecological measure for evaluating decision-making processes in risky situations. It can be used in different contexts, both in Occupational Safety and Health practice and for research purposes.
Abstract: Risk taking (RT) measurement constitutes a challenge for researchers and practitioners and has been addressed from different perspectives. Personality traits and temperamental aspects such as sensation seeking and impulsivity influence the individual's approach to RT, prompting risk-seeking or risk-aversion behaviors. Virtual reality has emerged as a suitable tool for RT measurement, since it enables the exposure of a person to realistic risks, allowing embodied interactions, the application of stealth assessment techniques and physiological real-time measurement. In this article, we present the assessment on decision making in risk environments (AEMIN) tool, an enhanced version of the spheres and shield maze task, a previous tool developed by the authors. The main aim of this article is to study whether it is possible to discriminate participants with high versus low scores in the measures of personality, sensation seeking and impulsivity, through their behaviors and physiological responses while playing AEMIN. Applying machine learning methods to the dataset, we explored: (a) whether these data make it possible to discriminate between the two populations in each variable; and (b) which parameters better discriminate between the two populations in each variable. The results support the use of AEMIN as an ecological assessment tool to measure RT, since it brings to light behaviors that allow the subjects to be classified as high or low in risk-related psychological constructs. Regarding physiological measures, galvanic skin response seems to be less salient in prediction models.


Introduction
Risk taking (RT) is a component of the decision-making process in uncertain situations, in which the subject rationally knows the probability of each outcome [1,2]. The decision-making process is influenced by three main elements [3,4]: decision features, which are the characteristics of the decision itself, such as the ordering of the choice options [5]; situational factors, which refer to the context of the decision, for example, time pressure [6]; and individual differences, which have been identified as the perception of benefits, the perception of risks and risk attitude in the field of RT [7]. In the first stage of the RT process, the subject thinks about the possible positive/negative outcomes of his/her actions before acting [8]. During this process, emotional states influence the weighting of the cost-benefit assessment [9], and their relation with RT has been widely studied. On the one hand, it has been suggested that people experiencing positive emotions tend

Measurement of RT
RT measurement constitutes a challenge for researchers and practitioners and has been addressed from different perspectives. To date, most of the theoretical constructs used in RT assessment are based on explicit measures such as self-reports, although these measures have been applied from different points of view. While some authors employ self-reported measures to assess risk-related psychological constructs, such as personality, impulsivity and sensation seeking [34][35][36], other authors use self-reported daily habits as a measure of RT [8,37]. Alternatively, ref. [38] developed a measure of the tendency to engage in real-life RT behaviors in different domains: ethical, financial, health, recreational and social.
However, self-reported measures present some limitations. On the one hand, the use of these instruments assumes that humans are able to think and verbalize accurately about their attitudes, emotions and behaviors, while it has been demonstrated that most of the brain processes that regulate attitudes, emotions and behaviors are not conscious and, consequently, cannot be verbalized [39][40][41]. On the other hand, questionnaires have an important intrinsic bias, since individuals need to recall past situations or imagine future experiences in order to answer, rather than actually undergoing the experiences that the researchers wish to analyze [42].
To overcome these limitations, the "stealth assessment" approach [43] emerged, focusing on how psycho-cognitive states can be assessed in an ecological, non-intrusive, non-biased way. Studies under this paradigm record subjects' performance during a serious game, and conclusions about individual competencies are then drawn from the data [44,45]. In the field of RT, the Bechara gambling task (BGT) [46] and the balloon analogue risk task (BART) [34] could be considered the most widely used measures that aim to assess RT under this methodology. In BGT, participants are given four decks of cards and are asked to choose a card from any one of the four decks. Once a card is chosen, it is turned over, and the amount of money won or lost for choosing that card is revealed. This is repeated 100 times; the player is never told the distribution of wins and losses associated with each deck, and instead learns the distributions from experience. In BART, a balloon is presented in the middle of a screen, and subjects are asked to pump it as much as possible, knowing that it could explode at any time. At the beginning of the task, participants are told that the more they inflate each balloon without bursting it, the larger the financial reward they will obtain. Although the reliability of these tools has been retested [47,48], it has been shown that the correspondence between performance in neuropsychological tests and real-life behaviors is very weak [49][50][51].

Virtual Reality for RT Assessment
Conversely, virtual reality (VR) makes it possible to simulate real experiences in which subjects can interact as if they were in the real world [52], and there is empirical evidence demonstrating similarities between the neural mechanisms that subjects experience when immersed in a virtual environment and those experienced in real life [53,54]. VR allows the behavioral responses of the users to be recorded while they are interacting with a virtual environment [55], making VR an innovative, effective, active, engaging and adaptive tool that has been applied in numerous fields of human behavior research (e.g., [52]), providing better results than 2D solutions [56].
VR has emerged as a suitable tool for RT measurement, since it enables the exposure of a person to realistic risks, allowing embodied interactions, the application of stealth assessment techniques and physiological real-time measurement [57]. On this basis, we developed the spheres and shield maze task (SSMT) [58], a virtual environment for RT measurement. It consisted of an out-of-context maze, through which participants had to pass from start to finish within three minutes, accumulating as much "karma" as possible by collecting spheres along the way. Participants could lose "karma" if they were attacked by a risk. Furthermore, participants had the option of activating a shield, which protected them from the risks. This virtual environment was a first approach to the measurement of the risk-related constructs sensation seeking and impulsivity, although it presented two main limitations. First, the practice session was too short. Second, it measured only three variables: "karma", distance covered and shield use, ignoring real-time behavioral and psychophysiological measures. In this article, we propose an enhanced version of the SSMT, with which we intend to overcome these issues.

Implicit Measures in VR
The interactions of the users with the virtual environment can also be studied by analyzing their gaze movements, which have been shown to be related to information processing in risky decisions [59] and problem solving [60]. The eye tracking (ET) measure can be integrated into a VR set-up in order to record fixations and eye movements during an experience in a virtual environment. This technology has been applied in combination with VR to study the influence of contextual elements on human behavior, such as in street robbery [61], identifying whether the presence of particular components of a physical space can influence decision-making. Furthermore, ET has been employed to study whether a relationship exists between gaze patterns and human behavior [62,63], or even whether these gaze patterns could help to predict humans' decisions [64].
In the field of RT, ET has been employed as a reliable indicator of information processing patterns in risky decisions. On the one hand, a greater number of fixations, longer fixations and a larger amount of available information fixated have been related to deeper predecision processes, which lead to risk aversion [65][66][67][68]. On the other hand, in a study with construction workers, lower dwell time was connected with a higher risk perception [69]. The authors interpreted this result as follows: participants with higher risk perception identified the hazards rapidly, so they could spend their time searching for other possible hazards present in the situation.
In addition to behavioral measures, physiological measures have been proposed as implicit measures of human behavior [42]. Galvanic skin response (GSR) has been successfully used as a measure of implicit processes such as emotional arousal [70], which plays a decisive role in the decision-making process. GSR has been employed in combination with VR to evaluate the stress generated by changes in contextual aspects, such as architectural stimuli [71], as a predictor of anxiety level [72] and as a measure to discriminate between Autism Spectrum Disorder and typical development populations [73], among others.
In the field of RT, high physiological arousal acts as a "warning signal" in risky situations and tends to lead to safe decisions [1]. This relationship has been demonstrated to be mediated by emotional intelligence, in such a way that low emotional intelligence may lead to maladaptive decision-making, due to an impaired interpretation of physiological arousal [74]. Additionally, situational factors, such as time pressure, influence the relationship between GSR and RT. In an experiment with two kinds of decisions (time pressure and time delay), the relationship between GSR and RT was positive in situations under time pressure, and negative in situations under time delay [75].
Despite these measures having been widely adopted in VR-based experiments, to our knowledge, ET and GSR have not been employed in combination with VR to evaluate RT.

The Current Study
Starting from these premises, we present the assessment on decision making in risk environments (AEMIN) tool, a new interactive virtual environment for RT measurement. Compared to the SSMT, AEMIN has a longer duration, which allows a wider and richer recording of information from the subjects, and contains more elements along the maze, such as spheres of different colors and a pause button. Furthermore, features in AEMIN were computed separately depending on whether the subject was in a risk zone or in a no risk zone, to provide further information about the subjects' behavior depending on the situation. Additionally, the appearance and characteristics of the risks have been improved, in order to provide a more natural experience and, consequently, more natural behaviors. A detailed description of AEMIN is provided in the Materials and Methods section.
The main aim of this study is to discriminate participants with high versus low scores in the measures of neuroticism, extraversion, openness to experience, agreeableness, conscientiousness, sensation seeking and impulsivity, through their behaviors and physiological responses while playing AEMIN. Applying machine learning (ML) methods to the dataset, we explored: (a) whether these data make it possible to discriminate RT domains, sensation seeking and impulsivity, allowing a general level of RT to be determined qualitatively for each subject; and (b) which parameters better discriminate between the two populations in each variable.

Participants
A group of 98 subjects was recruited to participate in the experiment. They were balanced in terms of gender (56 men and 55 women) and age (35% under 30, 35% between 30 and 45, 30% above 45; mean age = 37.08, SD = 10.91). Prior to their participation, they received written information on the study and gave their written consent for their involvement. The responses were anonymized and randomized to ensure the privacy of the information. The study obtained the ethical approval of the Ethical Committee of the Polytechnic University of Valencia (P4_18_06_19).

Self-Reported Measures
The risk-related constructs were measured by means of a battery of self-reported measures:
Personality: Spanish version of the NEO five-factor inventory (NEO-FFI). This comprises 60 items and is composed of the factors neuroticism, extraversion, openness, conscientiousness and agreeableness [76,77]. The reported Cronbach's alpha reliability coefficients ranged from 0.75 to 0.83. The internal consistency of the scales in the present study was: neuroticism α = 0.77; extraversion α = 0.85; openness α = 0.79; agreeableness α = 0.75; conscientiousness α = 0.84.
Sensation seeking: Spanish version of the 40-item Sensation Seeking Scale-V (SSS-V) [78,79]. This includes subscales for thrill and adventure seeking, experience seeking, disinhibition and boredom susceptibility, and a total sensation seeking score. The reported Cronbach's alpha reliability coefficients ranged between 0.67 and 0.81, which suggests the subscales have acceptable internal consistency. The internal consistency of the scale in the present study was α = 0.77.
Impulsivity: Short Spanish version of the UPPS-P impulsive behavior scale [32,80]. Composed of 20 items, this measures five impulsivity traits: negative urgency, lack of premeditation, lack of perseverance, sensation seeking and positive urgency. The reported Cronbach's alpha coefficients ranged from 0.66 to 0.81. The internal consistency of the scales in the present study was: negative urgency α = 0.72; lack of premeditation α = 0.77; lack of perseverance α = 0.78; sensation seeking α = 0.79; positive urgency α = 0.60.
As a measure of the sense of presence in the virtual environment, participants completed the ITC Sense of Presence Inventory (ITC-SOPI) [81], which is composed of the dimensions of spatial presence, engagement, ecological validity and negative effects. The reported Cronbach's alpha coefficients in ITC-SOPI ranged from 0.76 to 0.94. The internal consistency of the scales in the present study was: spatial presence α = 0.91; engagement α = 0.84; ecological validity α = 0.77; negative effects α = 0.86.

The Virtual Environment
We present the assessment on decision making in risk environments (AEMIN) tool, a new interactive virtual environment for RT measurement. As an extension of the SSMT [58], AEMIN is composed of two mazes that participants must pass through from start to finish before the allocated time expires, without (virtually) hurting themselves (see Figure 1a). One of the mazes must be solved individually, while in the other the subject is accompanied by four avatars. The avatars are represented by robots (see Figure 1b), which can express basic emotions through a screen located on their faces.
Participants have 10 min to negotiate each maze and are instructed to accumulate as much energy as possible, since it is the source of life of their avatar. If a robot is low on energy, its breathing becomes labored and its movements slower, which costs time when finding the exit of the maze. Green spheres are distributed throughout the maze, and participants earn energy by collecting them. Furthermore, participants can lose energy if they are attacked by a risk. These risks are also distributed throughout the maze and are of four types: bridges, swarms of insects, storms and haunted rooms. Some spheres are close to hazards and others are located in no-risk zones. Participants have the option of activating a shield, which protects them from the risks. When the shield is active, the user's speed is reduced and (s)he cannot collect any spheres. The shield is a finite resource that subjects need to optimize. While passing through the maze, the participants have information about the remaining time (orange circle in Figure 1), their level of energy (green circle in Figure 1), and the battery life of the shield (blue circle in Figure 1). Table 1 shows a brief description of each risk and the consequences of each one for the robots. In addition, there are some purple spheres hidden in some endless roads. Catching one of these purple elements can have uncertain effects, such as simplifying the route or subtracting 10 s from the participant's remaining time.
The game can be paused by the participant at any time, in which case (s)he is moved to a virtual relaxing room until (s)he is ready to return to the game. We included this virtual room because its use by the participant can be considered an inhibition strategy and an indicator of emotional self-control.
The navigation metaphor is indirect walking, in which pushing down on the controller's integrated touchpad moves the user's avatar in the direction (s)he is facing at 2 m/s (speeds above 3 m/s can increase cybersickness symptoms [82]). Before undertaking the AEMIN game, the participants underwent a guided practice session in which they learned how to travel through the virtual environment, how to collect spheres and how to activate the shield.
The virtual environment was developed in Unity (version 2018.4.1f1) using C# as the programming language. Participants played the AEMIN game using the HTC Vive Pro Eye head-mounted display, with a resolution of 2880 × 1600 pixels (1440 × 1600 per eye), a field of view of 110° and a refresh rate of 90 Hz. The ET data were obtained in Unity through the ET SDK (SRanipal), with a maximum frequency of 120 Hz and an accuracy of 0.5°-1.1°.
GSR was also recorded during the experiment. Data were collected with the Shimmer3 GSR sensor, sampled at 128 Hz. Skin conductance was measured between two reusable electrodes attached to the participant's fingers.
The computer used had an Intel Core i7-770 CPU at 3.60 GHz and an NVIDIA GeForce GTX 1070.

Table 1. Description of each risk and its consequences for the robots.
| Risk | Description | Consequences |
|---|---|---|
| Bridge | Walkway that allows the robots to cross from one side to another to continue the path. Participants can cross it as many times as they like in both directions. | If a robot falls into the pit, it loses part of the battery of its shield. Additionally, the robot reappears at the beginning of the bridge and needs a little time to cross again. |
| Swarm of insects | Swarm of flying insects that flits over an area of the maze. | If an insect bites a robot, it suffers blurred vision a few seconds later and needs a little time to recover normal vision. The bite also makes the robot lose energy. |
| Storm | In some areas of the maze the weather is stormy. | If lightning strikes a robot, it suffers a large loss of energy. |
| Haunted room | Room that becomes increasingly smaller when someone enters it. The room has an entrance and an exit door, and participants can cross them as many times as they like, in both directions. Participants are asked to catch the key inside the room to open the doors. | Opening the doors is an investment of time. |

Experimental Procedure
Each participant responded to the self-report questionnaires on a personal computer. The process took approximately 30 min and was completed in an experimental room, supervised by a research assistant. The subject was then taken to a second experimental room, where (s)he received a brief contextualization of the VR game. Next, the research assistant fitted the participant with the GSR device and the HMD system in the correct position. After calibration of the eye tracking apparatus, the subject was asked to sit and relax for 90 s in order to record a GSR baseline. During this period, the subject listened to a relaxing audio to create a common state of calm. After that, the subject stood up and completed the practice session, which included a brief presentation of the avatars. The participant then solved the two mazes (50% of the participants began with the individual scene, and the other 50% started with the group of avatars). Finally, the subjects responded to the presence questionnaire on a personal computer.

Data Processing
The virtual environment (VE) is divided into two areas: risk zones and no risk zones. A risk zone is any area where the subject is inside a risk, that is, a bridge, swarm of insects, storm or haunted room. The no risk zone covers all situations where the subject is not inside a risk zone. According to this division, we analyzed two groups of variables: (a) measures in risk zones; and (b) measures in no risk zones. The features were also divided according to the source of data from which they were computed. Three different sources of data were established: VR, ET and GSR. Table 2 summarizes the complete set of features that was used from each source. Features computed from VR data are divided into navigation and interaction features. The navigation features describe the trajectory of the subject in the maze, whereas the interaction features count the number of times that the subject uses or touches an element in the maze.
ET data were processed in order to classify samples into fixations and saccades, using the dispersion threshold (DT) algorithm with 1° as the dispersion threshold and 0.25 s as the time window threshold [83]. A complete set of features was obtained from this classification into fixations and saccades.
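To make the fixation/saccade classification concrete, the following is a minimal sketch of a dispersion-threshold pass using the two thresholds reported above (1° and 0.25 s). It is not the authors' implementation. The sample layout (time in seconds, horizontal and vertical gaze angle in degrees) and the dispersion definition, (max x - min x) + (max y - min y), are assumptions made here for illustration; the source only names the algorithm and cites [83].

```python
from typing import List, Tuple

# Hypothetical sample layout: (time_s, x_deg, y_deg).
Sample = Tuple[float, float, float]


def dispersion(window: List[Sample]) -> float:
    """Assumed dispersion measure: horizontal range plus vertical range."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(
    samples: List[Sample],
    max_dispersion_deg: float = 1.0,
    min_duration_s: float = 0.25,
) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) for each detected fixation.

    Samples that do not fall inside any fixation are implicitly
    treated as saccade samples.
    """
    fixations = []
    n = len(samples)
    start = 0
    while start < n:
        # Grow the window until it spans at least the minimum duration.
        end = start
        while end < n and samples[end][0] - samples[start][0] < min_duration_s:
            end += 1
        if end >= n:
            break  # too few samples left to span the minimum duration
        if dispersion(samples[start:end + 1]) <= max_dispersion_deg:
            # Extend the window while the dispersion stays under the threshold.
            while end + 1 < n and dispersion(samples[start:end + 2]) <= max_dispersion_deg:
                end += 1
            fixations.append((samples[start][0], samples[end][0]))
            start = end + 1
        else:
            # No fixation starts here: drop the first sample and retry.
            start += 1
    return fixations
```

Features such as the number of fixations or the mean fixation duration can then be derived from the returned intervals.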
Two steps were carried out before extracting features from GSR. The first was the manual cleaning of the signal, since the GSR signal commonly suffers from different types of noise that hide correlations between the subject's signal and his/her level of stress [84]. The manual correction was done using the Ledalab software in MATLAB. The second step was the division of the signal into phasic and tonic components using continuous decomposition analysis (CDA) [85]. After this pre-processing, a set of features was obtained from the raw signal and from the phasic and tonic components, including time-domain and non-linear analysis [86].
In order to frame the task as a classification problem, each target variable was divided into two groups: high score and low score. The division was done according to the normality of the distribution of each target variable. If the distribution was normal, the target was split at its mean value, whereas if the distribution was not normal, the target was split at its median. The significance of the difference between groups in each target variable was checked with a t-test for features with a normal distribution and a Mann-Whitney test for features without a normal distribution.
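The mean-or-median split described above can be sketched as follows. The source does not say which normality test was used, so this sketch assumes a Jarque-Bera test, chosen only because its p-value has a closed form in the standard library (the chi-square survival function with 2 degrees of freedom is exp(-x/2)). The 0.05 significance level and the rule that scores equal to the cut-off count as "low" are also assumptions.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def jarque_bera_pvalue(values: Sequence[float]) -> float:
    """Asymptotic p-value of the Jarque-Bera normality test (assumed test)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("constant data: normality cannot be assessed")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Chi-square survival function with 2 degrees of freedom.
    return math.exp(-jb / 2.0)


def split_high_low(
    scores: Sequence[float], alpha: float = 0.05
) -> Tuple[float, List[str]]:
    """Split scores at the mean if they look normal, else at the median."""
    looks_normal = jarque_bera_pvalue(scores) >= alpha
    cut = statistics.fmean(scores) if looks_normal else statistics.median(scores)
    # Assumption: scores equal to the cut-off are labeled "low".
    labels = ["high" if s > cut else "low" for s in scores]
    return cut, labels
```

The Jarque-Bera p-value is only asymptotically valid, so with small samples a different normality test may be preferable.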

Statistical Analysis
Firstly, multivariate outlier detection was performed for each group of variables (VR, ET and GSR). The Mahalanobis distance of every subject was calculated, together with the probability of that distance under a chi-square distribution. Subjects that belonged to the most extreme 1% of the data distribution were defined as outliers.
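As an illustration of this outlier rule, the sketch below flags points whose squared Mahalanobis distance falls in the most extreme 1% of a chi-square distribution. To stay within the standard library it is restricted to two features, where both the inverse covariance matrix and the chi-square tail probability (exp(-d²/2) for 2 degrees of freedom) have closed forms. The actual analysis covers many features per group and would need a linear-algebra library; the function name and the input layout are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def mahalanobis_outliers_2d(
    points: Sequence[Tuple[float, float]], alpha: float = 0.01
) -> List[bool]:
    """Flag points in the most extreme `alpha` tail (two features only)."""
    n = len(points)
    mx = statistics.fmean(p[0] for p in points)
    my = statistics.fmean(p[1] for p in points)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    flags = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance via the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        # Chi-square tail probability with 2 degrees of freedom.
        p_value = math.exp(-d2 / 2.0)
        flags.append(p_value < alpha)
    return flags
```

Because the covariance is estimated from the same data, a single subject can never exceed a squared distance of (n - 1)²/n, so very small groups cannot produce outliers under this rule.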
Some pre-processing steps were carried out before the modelling study. Variables with a Pearson correlation higher than 0.95 in absolute value were removed. After that, non-normal feature distributions were transformed using logarithms. Variables that became normally distributed after the transformation kept it; those that did not were left untransformed.
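These two pre-processing steps can be sketched as below. Several details are assumptions, since the source does not specify them: when two features are highly correlated the later one is dropped, the normality check is passed in as a hypothetical `is_normal` callable, and features containing non-positive values are left untransformed because their logarithm is undefined.

```python
import math
from typing import Callable, Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def drop_correlated(
    features: Dict[str, List[float]], threshold: float = 0.95
) -> Dict[str, List[float]]:
    """Keep a feature only if |r| <= threshold with every feature kept so far.

    Assumption: of a highly correlated pair, the later feature is dropped.
    """
    kept: Dict[str, List[float]] = {}
    for name, values in features.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


def log_if_it_helps(
    values: Sequence[float], is_normal: Callable[[Sequence[float]], bool]
) -> List[float]:
    """Log-transform a non-normal feature, keeping the result only if normal."""
    if is_normal(values) or min(values) <= 0:
        return list(values)
    logged = [math.log(v) for v in values]
    return logged if is_normal(logged) else list(values)
```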
An ML method was applied to find the best possible selection of features for classifying whether a subject has a high or low score in the studied target variables. The model used was a support vector machine (SVM) [87]. The modelling pipeline is the same for every target.
The pipeline is designed to find the best possible features and to explore the importance of each one in combination with the rest. To address this goal, the ML pipeline iteratively removes the feature which achieves the lowest accuracy for each model in the iteration. At iteration k, the mean accuracy is computed in a cross-validation (CV) with 10 folds and 2 repetitions. After that, a backward feature selection (BFS) [88] method removes one feature, selecting the set of k-1 features with the highest accuracy. This method also uses a CV with 10 folds and 2 repetitions. The process ends when only one feature remains, and the set of features with the highest accuracy is selected. After that, hyperparameter tuning is performed on the SVM. Finally, the model is validated in a CV of 10 folds with 4 repetitions. The average and standard deviation of accuracy, kappa, true positive rate (TPR) and true negative rate (TNR) were reported. Moreover, the experiment explored the importance of each group of features using four different sets based on the source. Three datasets including VR, GSR and ET features, respectively, were created. An additional dataset, called ALL, which joins all the features, was also included.
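The elimination loop described above can be sketched independently of the classifier. In the sketch below, score_fn stands in for the mean accuracy of the SVM in the repeated 10-fold CV; here it is replaced by a toy scoring function so that the example runs on its own. The function names, the feature names and the tie-breaking rules (first best candidate; the larger subset is kept on equal accuracy) are assumptions for illustration.

```python
def backward_selection(features, score_fn):
    """Greedy backward elimination over feature subsets.

    features: list of feature names.
    score_fn: callable taking a list of names and returning a score, for
              example a mean cross-validated accuracy (supplied by the caller).
    Returns the best subset found over all subset sizes and its score.
    """
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > 1:
        # Evaluate every subset with one feature removed and keep the best.
        candidates = [[f for f in current if f != drop] for drop in current]
        scores = [score_fn(c) for c in candidates]
        top = max(range(len(candidates)), key=scores.__getitem__)
        current = candidates[top]
        if scores[top] > best_score:
            best_subset, best_score = list(current), scores[top]
    return best_subset, best_score


# Toy stand-in for the cross-validated accuracy: two informative features
# raise the score, every other feature slightly lowers it.
INFORMATIVE = {"time_in_risk", "fixation_duration"}


def toy_score(subset):
    useful = len(INFORMATIVE & set(subset))
    noise = len(subset) - useful
    return (50 + 10 * useful - noise) / 100


features = ["time_in_risk", "fixation_duration", "noise_1", "noise_2"]
best_subset, best_score = backward_selection(features, toy_score)
```

With the toy score, the loop discards the two uninformative features first and returns the two informative ones. In the study's setting, score_fn would wrap the repeated CV of the SVM, and the selected subset would then go on to hyperparameter tuning and final validation.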
To check whether the ML pipeline overfits, the results are compared against those obtained from a randomly generated target. The only condition imposed on the generation of this random target is that its labels must coincide with those of each real target in less than 67.5% of the cases, in order to avoid a random target that is very similar to a real one. The objective is to compare, by means of a one-way ANOVA test, the distribution of the accuracies obtained in the last CV of the ML pipeline for the random target with that of each real target. Six random targets are generated to increase the number of accuracy samples from the random targets. Figure 2 shows a scheme of the ML pipeline and of the overfitting check described here. If the comparison between both distributions shows a statistically significant difference (p-value < 0.05), it supports that the ML pipeline performs above chance level. Finally, the dataset with the highest accuracy is reported as the best classification model.
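The generation of the random targets can be sketched with simple rejection sampling: a binary vector is drawn at random and accepted only if its agreement with every real target is below 67.5%. The function names, the seeds, the retry limit and the two example targets are assumptions for this illustration. The ANOVA comparison itself is not shown.

```python
import random


def agreement(a, b):
    """Fraction of positions at which two label vectors coincide."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def random_target(real_targets, max_agreement=0.675, seed=0, max_tries=10000):
    """Draw a random binary target that stays dissimilar to every real target."""
    rng = random.Random(seed)
    n = len(real_targets[0])
    for _ in range(max_tries):
        candidate = [rng.randint(0, 1) for _ in range(n)]
        if all(agreement(candidate, t) < max_agreement for t in real_targets):
            return candidate
    raise RuntimeError("no random target satisfied the agreement constraint")


# Two hypothetical real targets (high = 1, low = 0) for 20 subjects.
real_targets = [
    [i % 2 for i in range(20)],
    [1 if i < 10 else 0 for i in range(20)],
]
# Six random targets, as in the overfitting check, one per seed.
random_targets = [random_target(real_targets, seed=s) for s in range(6)]
```

Each accepted vector would then be passed through the same ML pipeline as the real targets, and the resulting accuracies pooled for the comparison.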
Regarding the presence questionnaires, mean and standard deviation for each dimension were calculated.

Results
Of the initial 98 subjects, 10 were removed because their data were not collected properly. A total of 88 subjects were processed properly (43 women, 45 men, mean age = 35.33 and SD = 10.50) (for further details, please see Supplementary Material). Outlier analyses were performed by data source, dividing between VR, ET and GSR. Three outliers were found in the ET data source, whereas no outliers were found for VR and GSR. The final dataset, without outliers, comprised 85 subjects (42 women, 43 men, mean age = 35.49 and SD = 10.64).
Of the target variables, 93.33% were normally distributed, whereas only one target variable, thrill and adventure seeking, which represents 6.67%, was not. All the target variables present statistically significant differences between the high and low groups. Table 3 shows the descriptive statistics of every subscale.

A total of 4 features were removed because they showed no variation between subjects. A further 31 features (20.53%) had a Pearson correlation above 0.95 and were also removed from the dataset. Moreover, 16 variables were transformed using logarithms. The final dataset comprised 120 features, of which 42 belong to VR, 60 to ET and 18 to GSR. Table 4 presents the best models obtained by the ML pipeline, together with the dataset used, the balance of the sample, the significance level of the comparison between the target variable and the generated random targets, and four metrics: accuracy, kappa, TPR and TNR.
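For reference, the four metrics can be computed from the confusion counts of a binary classifier as in the following sketch. The counts in the example are hypothetical and are not taken from Table 4. Treating the high-score group as the positive class is an assumption.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, Cohen's kappa, TPR and TNR from binary confusion counts.

    tp/fn: positive-class subjects classified correctly/incorrectly.
    tn/fp: negative-class subjects classified correctly/incorrectly.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((fp + tn) / n) * ((fn + tn) / n)
    p_e = p_pos + p_neg
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "tpr": tp / (tp + fn),
        "tnr": tn / (tn + fp),
    }


# Hypothetical counts for 100 subjects with balanced classes.
metrics = binary_metrics(tp=40, fn=10, fp=20, tn=30)
```

In this balanced example an accuracy of 0.70 corresponds to a kappa of 0.40, because half of the agreement is already expected by chance. TPR and TNR (0.80 and 0.60 here) show how that accuracy is distributed between the two groups.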
According to Table 4, neuroticism, extraversion, openness, thrill and adventure seeking, experience seeking, disinhibition, boredom susceptibility, negative urgency, lack of premeditation and positive urgency have been well recognized, since their accuracy shows statistically significant differences from the random models. On the other hand, agreeableness, conscientiousness, sensation seeking (overall), lack of perseverance and sensation seeking (impulsivity subscale) have not been recognized above chance level. The ALL data source appears 10 times (66.67%) as the data source with the highest accuracy, the VR data source 4 times (26.67%) and the ET data source once (6.67%). Table 5 summarizes the selected features for each model.

Discussion
In this article, we present the assessment on decision making in risk environments (AEMIN) tool, as a new interactive virtual environment for RT measurement. The main aim of this study is to discriminate participants with high versus low scores in the measures of personality, sensation seeking and impulsivity, through their behaviors and physiological responses while playing AEMIN. Applying ML methods to the dataset, we explored: (a) whether these data make it possible to discriminate between RT domains, so that a general level of RT can be qualitatively determined for each subject; and (b) which parameters better discriminate between the two populations in each variable.
The results are discussed by sections: (1) accuracy of the models to discriminate RT domains; (2) the influence of the features used in each model selected; (3) limitations and further studies; (4) conclusion.

Personality Recognition
Regarding the final models on personality recognition, the dimensions of neuroticism, extraversion and openness to experience have been properly recognized. The validation set using 88 subjects achieved 72.6% accuracy (kappa: 0.447), 75.4% accuracy (kappa: 0.506) and 70.8% accuracy (kappa: 0.402), respectively. The selected models for predicting agreeableness and conscientiousness did not exceed chance level.
Interestingly, these results show that neuroticism, extraversion and openness to experience are the best predicted personality dimensions. On the one hand, neuroticism has been related to negative affect and sensitivity to punishment [15], but its relationship with RT seems to be more complex and context-related. Although high levels of neuroticism may lead to risk aversion in most domains, as a way of avoiding guilt or anxiety about negative outcomes, the relation between neuroticism and RT seems to be inverse in the health domain [18], in which some studies identified a tendency to take risks to alleviate anxiety and other emotions in subjects with high neuroticism [89]. On the other hand, high extraversion and openness to experience have been related to risk approach across domains, due to a generalized need for stimulation and cognitive risk seeking, acceptance of experimentation, tolerance of uncertainty, change and innovation [14,20]. In the light of these findings, we could conclude that our tool suitably brings out the personality dimensions that are most context-dependent and most related to the approach to risk, rather than those related to general risk avoidance.

Sensation Seeking Recognition
Regarding the final models on sensation seeking recognition, the dimensions of experience seeking, thrill and adventure seeking, boredom susceptibility and disinhibition were predicted with robust models. The validation set achieved 73.3% accuracy (kappa: 0.456), 72.6% accuracy (kappa: 0.311), 72.1% accuracy (kappa: 0.425) and 73.1% accuracy (kappa: 0.402), respectively. The selected model for predicting overall sensation seeking score seemed to be overfitted.
These results suggest that AEMIN is a suitable tool to measure sensation seeking. As mentioned in previous sections, there is broad consensus in the literature regarding the influence of each of the sensation seeking subdimensions on RT [21][22][23][24][25][26], so we consider that AEMIN meets the expectations in this regard.

Impulsivity Recognition
Regarding the final models on impulsivity recognition, the dimensions of negative urgency, lack of premeditation and positive urgency were predicted with robust models. The validation set achieved 77.5% accuracy (kappa: 0.553), 75.1% accuracy (kappa: 0.5) and 70.8% accuracy (kappa: 0.341), respectively. The selected models for predicting lack of perseverance and sensation seeking seemed to be overfitted.
In this case, three of the five subdimensions of impulsivity were well predicted. Interestingly, negative and positive urgency, which are related to context-related behaviors when facing negative/positive situations [32], are included. This result suggests that with AEMIN we could identify RT behaviors in widely varying situations, encompassing negative and positive contexts.

Influence of VR Features
Regarding VR variables, the results show that navigation variables, which are related to the movements of the subject in the virtual environment, seem to be more meaningful in risky zones, while interaction variables, which are related to the interactions of the subjects with the different elements of the virtual environment (buttons and virtual elements), seem to be more relevant in no risk zones. Results of the presence questionnaires are similar to, or even better than, those obtained in other works [90,91].
Starting with the navigation variables in risk zones, the results show that the time spent in risk zones has a strong influence on the prediction of variables of personality, impulsivity and sensation seeking. A longer time spent in a risk zone could mean either that the subject has passed through these areas more slowly, or that (s)he has passed through them a greater number of times. In any case, this may be related to a higher or lower susceptibility to punishment or to negative consequences. The neuroticism and negative urgency variables, in which time spent in risk zones appears as an important predictor, are related to the sensitivity to punishment or to negative stimuli [15,32], which would explain the relationship with the time spent in risk zones in our virtual environment. On the other hand, it also appears as important for the classification of the subjects in the thrill and adventure seeking and positive urgency variables, together with the number of visits to each risk. In these cases, it is possible that subjects with greater interest in risky physical activities, or with impulsive behaviors when facing situations perceived as positive, decide to experiment and spend more time in these risk areas to see what the consequences are.
The variable of distance covered in risk zone refers to the length of the subject's trajectory in the risk zones of the maze. This variable appears as important in the classification of the subjects in the variables of thrill and adventure seeking and disinhibition, both belonging to the dimension of sensation seeking. This variable was also measured in the previous version of AEMIN [58], and significant correlations were obtained with almost all sensation seeking subdimensions, so our results in both articles seem to be consistent. Covering a greater distance in AEMIN could be interpreted as a greater interest in exploring different areas of the maze, which could be related to the variables thrill and adventure seeking and disinhibition, since both of them are reflected in high engagement in activities that generate new sensations and in rule-breaking behaviors [21].
Velocity and acceleration in risk zones appear as important variables in predicting lack of premeditation, experience seeking and disinhibition. The action of quickly passing through the risk areas, without stopping to pick up spheres, could have a double interpretation. On the one hand, it can be understood as an unpremeditated or risky action. On the other hand, it could also be interpreted as an intention to get through something bad as quickly as possible, avoiding the possible damage that could be caused by passing through a risk area.
As for the interaction variables in risk zones, the number of green spheres collected in risk zones helps to classify subjects into extraversion and thrill and adventure seeking subdimensions. Picking up spheres that are in risk areas can pose a risk, since the subject must pass through these areas without the protection of the shield to pick them up. Therefore, the decision to take a sphere that is in a risk zone may be related to excitement seeking, which is characterized by an interest in shiny colors and noisy environments [92] and is a common feature of the extraversion and thrill and adventure seeking dimensions [93].
The use of the shield is influential only in risk zones, where it helps to classify subjects in terms of boredom susceptibility. This result seems surprising, since it was expected that the use of the shield would be a more revealing variable in terms of the behaviors of the subjects, adding richness to the predictive models of a greater number of variables. In the SSMT [58], the use of the shield was related to subdimensions of impulsivity and sensation seeking, so similar results were expected in the predictive models of the present article. One possible reason could be that participants did not fully understand the mechanics of the shield and did not use it enough to reflect certain behavioral patterns. This will be considered as one of the limitations of this research, and we will work to improve the understanding of the shield element in enhanced versions of AEMIN.
The use of the pause button in risk situations appears as an important variable in the prediction of negative urgency. The use of the pause button in risky situations may reflect a strategy of psychological distancing from negative stimuli, while the non-use of this resource may be due to thoughtless reactions to risky situations. This could have a strong relationship with the negative urgency variable, defined as the tendency to show impulsive behaviors in negative situations [33].
Regarding the interaction variables in no risk zone, the number of purple spheres collected appears as significant to classify the subjects in high/low lack of premeditation, experience seeking and disinhibition. These purple spheres were included in the virtual environment as elements that generate uncertainty, so collecting these spheres is clearly a risky behavior, which can be taken due to a lack of premeditation, or due to the voluntary search for new experiences or sensations.
The use of the pause button in no risk zone is meaningful for the prediction of thrill and adventure seeking. The use of this button in non-risk areas may be related to wanting to rest from the experience in general or to being curious to try it, and not so much to applying a specific coping technique in a specific moment of stress as occurs in the risk areas.
Total interactions with elements in no risk zones appears as an important variable for predicting neuroticism, negative urgency and experience seeking. A greater or lesser number of interactions with the elements of the virtual environment can be related to very different behaviors or decisions, since the virtual environment contains very different elements, from the shield to the spheres or the pause button. The total interactions variable may indicate a greater or lesser involvement of the subject within the virtual environment, as well as a better understanding of the mechanics of the game. On the other hand, it can also be related to anxious or impulsive behaviors, as well as to the search for different experiences and the desire to explore the virtual environment.

Influence of ET Features
Our results show that ET variables have a strong influence in most of the classification models, so we could say that it is an important measure in combination with those variables of the VR dataset, both in risk zones and in no risk zones. The variables that provide the most relevant information to classify the subjects in terms of risk-related dimensions are: fixation duration, number of fixations in no risk zones, visits to keys, green spheres and purple spheres, angular saccade distance, velocity in saccades and distance in saccades.
As mentioned in previous sections, fixations duration could be an indicator of depth of processing [94], and a good predictor of perception [95] and risk aversion [65]. The fixations duration in risk zones appeared as a meaningful variable to classify subjects in boredom susceptibility. This result suggests that participants with high boredom susceptibility show different information processing patterns in risk zones than those with low boredom susceptibility, since these areas arouse a different interest in them, taking them out of the routine of the game. Conversely, fixations duration in no risk zones was an important variable in the classification models of neuroticism, extraversion and experience seeking. In these cases, a deeper processing of information in no risk areas can be interpreted as a state of alert, waiting for something bad to happen, or as a search for new or different elements in areas that apparently are simpler and show a smaller number of stimuli than risk zones.
Regarding the number of fixations and visits to concrete objects, these variables are an indicator of interest in concrete elements [94] and have been related to risk aversion, as a strategy to collect information in the analytical pre-decision process [65][66][67]. Spheres and keys that open the doors are the most important elements when studying the number of fixations in AEMIN. The number of fixations in green spheres located in risk zones appears as an important variable in the classification of subjects regarding neuroticism, while the number of fixations in green spheres in no risk zones is related to lack of premeditation. On the other hand, the number of fixations on keys is a fundamental variable in the classification of the subjects in neuroticism and experience seeking. Finally, the number of fixations in purple spheres is important to classify as high or low neuroticism. In light of these results, we could understand that the elements that can be captured or collected during the game (green spheres, purple spheres and keys) are the most relevant when analyzing the number of fixations. On the other hand, other elements of the game that seem more visually striking and of greater interest within the game, such as risks, do not appear as meaningful variables from the point of view of the number of fixations. Interestingly, these results could help game designers to guide the user's attention to specific elements of the virtual environment, incorporating the interaction of "collecting", as a guarantee that visual patterns related to personality, impulsivity and sensation seeking will come to light.
Angular saccade distance and distance in saccades could discriminate global and focal visual search strategies [94,96]. These variables appear as relevant for classification in risk zones, for predicting extraversion, openness, boredom susceptibility and disinhibition. These variables are also meaningful for the classification in boredom susceptibility when the subject is in a no risk zone. These results could indicate that subjects' visual search patterns in risk zones can help to classify them into high or low extraversion, openness and disinhibition. On the other hand, to differentiate subjects with high or low boredom susceptibility, it is necessary to study their visual search strategy throughout the experience, both in risk areas and in no risk areas.
Finally, the velocity in saccades is an indicator, together with the number and duration of fixations, of an adaptive attention process that depends on the uncertainty or the perceived difficulty of each situation, so that slower saccades have been related to information acquisition processes in situations perceived as uncertain or difficult [68,97,98]. In our study, the velocity in saccades appears as an important variable in risk areas for classifying subjects in lack of premeditation, while it is meaningful in no risk areas for classifying subjects in extraversion, openness, boredom susceptibility and disinhibition. This result could be interpreted as follows: the velocity in saccades in no risk areas, as a behavior dependent on the perception of difficulty or uncertainty of a situation, is an indicator of the subjects' interpretation of the no risk zones, based on their level of extraversion, openness, boredom susceptibility and disinhibition. Thus, it is possible that some participants were in a high alert state while passing through these areas, since they identified them as of low certainty, while other subjects crossed these areas with the feeling of being in a safe place.

Influence of GSR Features
The variables obtained from GSR while the subject was in the no risk areas were relevant in the final models, while they were not relevant when the subject was in risk areas. All the GSR variables selected in the final models correspond to metrics of the phasic component of the signal, which is characterized by rapid and event-related changes, so it takes less time to show changes [99]. The GSR signal usually peaks between 2 and 10 s after stimulation and recovers at approximately the same rate [99]. Ayata et al. [100] found that a 3-s time window in the phasic signal is optimal for the prediction of valence and arousal. Since the periods of time in which subjects usually remain in risk areas are short (between 1 and 8 s), except in rooms, where they can spend more time, it is possible that changes in the phasic signal, which can be interpreted as a "warning" in risky situations [1], are reflected a few seconds after the subject has left the risk areas.
Another possible interpretation of these results is that changes in the phasic component are meaningful to differentiate subjects with high/low extraversion, boredom susceptibility and disinhibition based on their level of activation in the no risk zones. In this regard, decisions in no risk zones involve less time pressure and are of the type: selection of paths, or deciding whether or not to take spheres. Since time pressure has been raised as one of the influential factors in the relationship between GSR and RT [75], it is possible that, in these decisions in which there are no situational biases, decision-making is more guided by individuals' personality and temperamental factors than in risk zones.

Limitations and Further Studies
We acknowledge that this study presents some methodological limitations. First, the sample size was not large. Second, we built the high/low target variables based on the mean or median of the responses from this study, so the resulting split may not extrapolate to the rest of the population. Third, it is possible that participants did not fully understand the mechanics of the shield and did not use it enough to reflect certain behavioral patterns. For future investigations, we will recruit a larger sample of participants, we will look for validated reference scales to label the subjects and we will work to improve the understanding of the shield element in enhanced versions of AEMIN.

Conclusions
Concerning the features that better predict each dimension, we could conclude that behavioral measures (interaction with the virtual environment and ET) provide the core information in the classification models. Therefore, the results support the use of AEMIN as an ecological assessment tool to measure RT, since it brings to light behaviors that allow the subjects to be classified into high/low risk-related psychological constructs. Regarding physiological measures, GSR seems to be less salient in prediction models.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/ASAPLableni/AEMIN-Dataset.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.