Combining Virtual Reality and Organizational Neuroscience for Leadership Assessment

Featured Application: The use of virtual serious games, biological implicit measures, and machine learning techniques for the assessment of leadership styles can enhance traditional assessment by integrating behavior-based information.

Abstract: In this article, we introduce three-dimensional Serious Games (3DSGs) under an evidence-centered design (ECD) framework and use an organizational neuroscience-based eye-tracking measure to capture implicit behavioral signals associated with leadership skills. While ECD is a well-established framework used in the design and development of assessments, it has rarely been utilized in organizational research. The study proposes a novel 3DSG combined with organizational neuroscience methods as a promising tool to assess and recognize leadership-related behavioral patterns that manifest during complex and realistic social situations. We offer a research protocol for assessing task- and relationship-oriented leadership skills that uses ECD, eye-tracking measures, and machine learning. Seamlessly embedding biological measures into 3DSGs enables objective assessment methods that are based on machine learning techniques to achieve high ecological validity. We conclude by describing a future research agenda for the combined use of 3DSGs and organizational neuroscience methods for leadership and human resources.


Introduction
Leadership style as a predictor of organizational effectiveness is a central topic in organizational behavior research. Traditional leadership theories have described the relationship between leaders and employees as a passive one. More recently, however, contemporary leadership theories have emphasized a mutual relationship between leaders and employees, in which leaders seek employee feedback and adjust their own behaviors, thereby increasing mutual trust [1,2]. These theories value employee skills and encourage employees to question current systems and situations in order to solve problems rationally and creatively [2]. Within this theoretical framework, dichotomous classifications of leadership styles have been proposed, namely "task orientation" vs. "relationship orientation" [3] and "concern for people" vs. "concern for task" [4].
Task-oriented leadership (TOL) refers to leaders who focus on getting tasks done to achieve specific performance goals. Related behaviors in TOL include planning

Evidence-Centered Design for Serious Games
With the introduction of computer games in the 1970s, playing time increased rapidly among both children and adults, and, drawn by the appeal and playability of games, researchers began developing "serious" games (SGs) with purposes that went beyond entertainment [11]. SGs are currently used for assessment and training across a wide range of skills in fields such as education, health, and the military, with positive results [12–14]. In organizational leadership research, few studies have investigated the effectiveness of SGs as training tools for improving leadership skills [15,16]. Moreover, leadership training studies have mainly used two-dimensional SGs, characterized by flat graphics and lower realism, rather than three-dimensional SGs (3DSGs), which offer stereographic depth and a higher sense of realism [17]. 3DSG design features, such as fantasy or simulated worlds, storytelling, challenges and rewards, and technologies for building interactive three-dimensional environments, can stimulate user involvement and motivational engagement, provide more ecologically valid assessment and training, and reduce test anxiety and bias compared with traditional approaches. Furthermore, 3DSGs offer a stealthy, non-verbal way to assess various behaviors during gameplay (e.g., decision making, time taken to perform a task), can measure a wide range of skills, and can capture large amounts of data over time. Because serious games involve achieving a set of goals while navigating complex situations, companies can also use them in employee training to develop problem-solving skills.
Evidence-centered design (ECD) has been proposed as a valid and reliable reference framework for developing automated stealth assessments [16,18]. ECD rests on the premise that a test is a measurement instrument associated with specific claims about its scores, and that a good test closely matches its items to the test takers' skills. ECD was conceived in the educational field to improve the validity and reliability of test measures for learners. The framework defines three interconnected core models: the competency model(s), the structural model(s), and the task model(s). The competency model specifies the theoretical skills and abilities (unobservable indicators) that researchers want to measure with the game; the structural model identifies the multifaceted behaviors (observable indicators) that can reveal those skills and abilities; and the task model specifies the tasks or situations that can elicit behaviors related to the targeted skills and abilities. Therefore, ECD and 3DSGs allow for the development of varied contextual situations, thereby providing greater predictive validity and less bias than traditional assessments [16,19–21].
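The three ECD models can be pictured as linked data structures. The Python sketch below is a hypothetical illustration, not the study's implementation: the class names, the `evidence` method, and the example indicator (counting communication choices in a game log) are assumptions chosen to show how a competency model points to observable indicators (structural model) and to the situations designed to elicit them (task model).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ObservableIndicator:
    """Structural-model entry: an observable behavior extracted from a game log."""
    name: str
    extract: Callable[[dict], float]


@dataclass
class TaskModel:
    """Task-model entry: a game situation designed to elicit the behavior."""
    scenario_id: int
    description: str


@dataclass
class CompetencyModel:
    """Competency model: an unobservable construct linked to indicators and tasks."""
    construct: str
    indicators: List[ObservableIndicator] = field(default_factory=list)
    tasks: List[TaskModel] = field(default_factory=list)

    def evidence(self, log: dict) -> Dict[str, float]:
        """Apply every observable indicator to one participant's game log."""
        return {ind.name: ind.extract(log) for ind in self.indicators}


# Hypothetical example: count how often a participant chose to communicate.
rol = CompetencyModel(
    construct="ROL",
    indicators=[
        ObservableIndicator(
            "n_communication_choices",
            lambda log: float(log["choices"].count("communication")),
        )
    ],
    tasks=[TaskModel(1, "hypothetical conflict between two crew members")],
)
print(rol.evidence({"choices": ["communication", "execution", "communication"]}))
# {'n_communication_choices': 2.0}
```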

Biological Implicit Measures of Leadership Behavior
As mentioned previously, leadership styles have mainly been studied using self-reported measures, and only a limited number of studies have considered the role of implicit leadership signals. Scholars have deemed this overreliance on explicit measures problematic and have suggested researching implicit measures that complement conventional organizational behavior surveys and qualitative methods, both to aid theory building and to better understand the role of physiological processes in leader behaviors [22]. Hence, to further leadership research, we assess ROL and TOL styles, which are often measured with self-reported questionnaires, using organizational neuroscience (ON) methods. Behavioral management scholars have used neurophysiological techniques such as heart rate variability, eye tracking, electroencephalography (EEG), and galvanic skin response (GSR) in leadership and team research [23,24]. Following previous work such as that of Hannah et al. [25], we also apply a multi-method approach, combining 3D serious games in a virtual reality environment, ON eye tracking, and machine learning to capture and analyze these leader behavior variables. Indeed, leaders' behaviors can be seen as the outcome of dynamic interactions in social situations, which may depend on various explicit and implicit factors such as decision-making behavior, nonverbal body language, gestures and postures, verbal cues, and eye-gaze patterns. In this regard, the flexibility afforded by the ECD framework helps in developing varied contextual scenarios in a virtual environment that require individuals to apply different decision-making behaviors. In this study, we focus on decision-making and use Rowe and Boulgarides' decision style theory [26] to derive the hypothesized relationships between leader behaviors and decision-making styles.
According to this theory, an individual's decision-making style depends on how that person comprehends and perceives the situation and chooses to respond to stimuli. Individual decision-making styles tend to have either a high task focus or a high relationship focus. An individual's cognitive complexity and tolerance for ambiguity shape this focus: a low tolerance for ambiguity is often related to a focus on tasks and structure, whereas leaders who are primarily concerned with human and social values tend to focus on relationships [27]. Rowe and Boulgarides [26] linked their decision-style typology to individual task and relationship needs and posited that directive decision-makers are primarily driven by a need for power, whereas behavioral decision-makers are driven by a need for affiliation [28,29]. Because directive decision-makers have a low tolerance for ambiguity, they have a strong desire for structure, rules, and procedures [27], which is considered similar to the "initiating structure" leadership behavior [30]. An example of directive decision-making is giving clear orders to subordinates and executing the decisions made. Behavioral decision-makers also have a low tolerance for ambiguity, but their concern is maintaining good relationships by offering psychological support and encouragement to followers in complex situations [27]. An example of behavioral decision-making is communicating consistently with followers and seeking and using their feedback when making decisions. We further hypothesize that the relationship between leadership and decision-making styles can be detected within an ECD/3DSG framework using ON methods. Social eye-gaze patterns refer to the implicit and automatic tendency to direct one's attention to others' behaviors and to interpret social cues relevant to decision-making (for a review, see [31]).
According to social attention theory, visual attention allows people to recognize each other, communicate their mental states, and predict others' behaviors [32]. Social attention may also be relevant for solving team-related problems, such as selecting a leader to follow. Preliminary evidence showed that people were able to identify leadership cues while watching muted speech clips [23,33]. Eye-tracking tools can capture participants' orientation of attention toward stimuli and their spontaneous eye-gaze patterns during social situations [33,34], providing moment-to-moment indicators of an individual's rapid and automatic attention to leader cues. Specifically, eye-gaze patterns can capture three indicators: (a) attention orientation toward someone or something, through the number of fixations (attention directed to stimuli); (b) attentional engagement, through the duration of fixations (level of processing); and (c) exploration patterns, i.e., the way in which the stimuli are explored.
Based on this premise, we conduct a study to assess two leadership styles through the combined use of a serious game (as a new behavioral, interactive, and stealth method) and eye-gaze technology. A detailed description of the serious game is provided in Section 3. The goal of this study is to differentiate participants with high and low leadership-style scores, captured through their behavioral responses during gameplay (i.e., decision-making behaviors and eye-gaze patterns) as well as through a traditional self-reported measure. Additionally, we apply machine learning (ML) methods to the dataset to explore (a) whether these data can discriminate between TOL and ROL styles and (b) which parameters of each variable best discriminate between the two styles. We also note that the game presented in this study was used by the company Neurosteps (https://neurosteps.com/, accessed on 15 June 2016) to evaluate the leadership styles of professionals responsible for leading teams. The use of this game produced satisfactory results and provided valuable evaluations of many internal aspects, such as 360-degree evaluations through skills interviews.
The two main hypotheses tested were as follows: (1) participants' decision-making behaviors, captured during their entire 3DSG experience, will allow for the classification of TOL and ROL styles; (2) participants' fixation duration time and number of fixations, both captured through eye tracking during their entire 3DSG experience, will allow for the classification of TOL and ROL styles.
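As a rough illustration of the three eye-gaze indicators described earlier (number of fixations, fixation duration, and the pattern in which stimuli are explored), the sketch below summarizes a chronological list of fixations. It assumes each fixation is already labeled with a target area of interest, and it uses transition counts between consecutively fixated targets as one common way to summarize exploration patterns; the study does not state that exploration was measured this way, and the target labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Fixation = Tuple[str, float]  # (target label, fixation duration in seconds)


def gaze_indicators(fixations: List[Fixation]):
    """Return (a) fixation counts per target, (b) total fixation duration per
    target, and (c) transition counts between consecutively fixated targets."""
    counts: Counter = Counter()
    durations: Dict[str, float] = defaultdict(float)
    transitions: Counter = Counter()
    previous = None
    for target, duration in fixations:
        counts[target] += 1
        durations[target] += duration
        # Only a change of target counts as a transition.
        if previous is not None and previous != target:
            transitions[(previous, target)] += 1
        previous = target
    return dict(counts), dict(durations), dict(transitions)


fixations = [("expert", 0.30), ("strategist", 0.45), ("strategist", 0.20), ("expert", 0.25)]
counts, durations, transitions = gaze_indicators(fixations)
print(counts)       # {'expert': 2, 'strategist': 2}
print(transitions)  # {('expert', 'strategist'): 1, ('strategist', 'expert'): 1}
```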

Participants
The study sample consisted of 56 participants (24 female, 32 male) with a mean age of 38.8 years (SD = 8.2): 16 university students and 40 university-educated professionals from the consulting, pharmaceutical, and banking sectors. All participants identified as Caucasian. Participants gave written consent and were well rested; none had consumed alcohol in the 12 h, or stimulating beverages such as coffee in the 4 h, before their lab appointment. No participant reported using medications or drugs. The study was approved by the ethics committee of the Universitat Politècnica de València (Spain) and followed the conventions of the 1964 Declaration of Helsinki.

Leadership Assessment
The Blake and Mouton Managerial Grid Leadership Self-Assessment Questionnaire [35], an 18-item self-report scale that measures a respondent's leadership style along relationship and task orientation, was administered online to each participant. Participants rated the extent to which each statement applied to them on a 5-point Likert scale (1 = never, 5 = always). Task orientation index scores range from 0 to 20, with higher values indicating greater task orientation; relationship orientation index scores range from 0 to 15, with higher values indicating greater people orientation. Internal consistency was assessed with Cronbach's alpha (α = 0.82), which is considered satisfactory.
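For reference, Cronbach's alpha for k items is α = k/(k − 1) · (1 − Σσᵢ² / σ²_total), where σᵢ² is the variance of item i and σ²_total is the variance of respondents' summed scores. A minimal Python sketch, assuming a complete respondent-by-item matrix with no missing answers; the toy ratings are hypothetical and unrelated to the study data.

```python
from statistics import variance
from typing import List


def cronbach_alpha(responses: List[List[float]]) -> float:
    """responses[r][i] is respondent r's rating on item i (no missing values)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 4))  # 0.6667
```

Perfectly consistent items (every respondent gives the same rating on all items) yield α = 1.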

Serious Game Task Modeling
A story narrative, specifically designed for leadership style assessment, was created in the serious game. We scripted and developed the storyline to capture TOL and ROL responses. The game scenarios take place in a sailing ship, whose goal was to reach a destination by solving several problems during the course of sailing (for more details, see the Supplementary Materials).
The 3DSG involved four adult virtual agents (two women and two men), each created with specific personality traits and individual competencies pertaining to the TOL and ROL styles. The four characters were defined as an expert, an organizer, an emotional and communicative individual, and a strategist (Figure 1). According to their profiles, the expert virtual agent (a man) displayed logical, analytical, mathematical, and technical thinking styles and behaviors reflecting high intelligence and critical competitiveness (typically classified under TOL). The organizer virtual agent (a woman) was characterized by structured and sequential thinking, planning, and organizing; she displayed emotional behaviors, was introverted and faithful, and tended to show certain manic behaviors (we classified these under TOL). The emotional and communicative virtual agent (a man) was idealistic, emotional, and sensitive to both his own and others' problems; he was easy to communicate with but poorly organized, lacked self-control, and had little independence. In addition, this agent was outgoing, talkative, spontaneous, and playful (we classified these traits under ROL). Lastly, the strategist virtual agent (a woman) was characterized by innovative and holistic thinking and offered creative solutions to everyday problems (we classified her under ROL). The 3DSG consisted of 10 scenarios; at the beginning of each, one of the virtual agents presented a problem situation to the others and to the study participant (for more details, see Appendix A).
To find a solution to a problem, the participant had to make several decisions along the way, each of which led the storyline to a different narrative based on the virtual agents' personality traits and competencies. Each decision was designed systematically around four decision-making behaviors described in the literature: (a) communication, (b) execution, (c) giving orders, and (d) doing nothing. In the communication style, the leader asks for and collects information from other team members and always considers their positions or opinions before making the final decision. This decision-making process is strongly related to the ROL style, in which the group's opinions matter more to the leader than the execution of actions and the time it takes to solve the problem. In the execution style, the leader takes the initiative and chooses an action to solve the problem without considering the group's opinions or emotional state. We related this process to both the ROL and TOL styles, as the focus is on solving the problem. In the giving-orders style, the team's opinions are not particularly relevant, although the leader does review them. People orientation is low here, but task orientation is not high either, because the leader does not execute the action. We consider this style similar to the authoritarian leadership style, in which there is little consensus among team members.
Finally, participants also had the option of doing nothing, a passive behavior that is highly characteristic of passive-avoidant leadership.
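The four decision categories and the orientations hypothesized for them can be tallied per participant. This sketch assumes each logged decision is already labeled with one of the four categories; the label strings and the tag names for giving orders (authoritarian) and doing nothing (passive-avoidant) are illustrative, and execution carries both ROL and TOL tags, following the description above.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable

# Hypothesized orientation tags per decision category (labels are illustrative).
DECISION_TAGS: Dict[str, FrozenSet[str]] = {
    "communication": frozenset({"ROL"}),
    "execution": frozenset({"ROL", "TOL"}),
    "giving_orders": frozenset({"authoritarian"}),
    "doing_nothing": frozenset({"passive_avoidant"}),
}


def decision_profile(choices: Iterable[str]) -> Dict[str, float]:
    """Share of a participant's decisions that carry each hypothesized tag."""
    choices = list(choices)
    if not choices:
        return {}
    unknown = set(choices) - set(DECISION_TAGS)
    if unknown:
        raise ValueError(f"unknown decision categories: {sorted(unknown)}")
    tag_counts = Counter(tag for c in choices for tag in DECISION_TAGS[c])
    return {tag: count / len(choices) for tag, count in tag_counts.items()}


profile = decision_profile(["communication", "execution", "communication", "doing_nothing"])
print(sorted(profile.items()))  # [('ROL', 0.75), ('TOL', 0.25), ('passive_avoidant', 0.25)]
```

Because execution carries two tags, the shares need not sum to 1.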
The following section introduces the TOL/ROL competency models and their respective indicators, presenting a graphic model of the indicators for each competency model. Unobservable indicators are the theoretical leadership constructs related to decision-making behavior and implicit information processing, whereas observable indicators are the data gathered from user performance and visual attention behaviors (for details, see Appendix B).
The 3DSG was developed in Unity 5.5.1f1 using the C# programming language in Visual Studio.

Experimental Procedure
The study consisted of a 60 min session in which participants played the 3DSG in a room equipped to avoid distractions. At the beginning of the session, the eye-tracking application was started manually and calibrated; then the 3DSG began. Total play duration depended on participants' decision times and was typically between 20 and 40 min. The application ran on a laptop with an NVIDIA GeForce GT 740 graphics card, and participants sat at an average distance of 60 cm from the screen to ensure correct collection of eye-tracking data. The monitor measured 17 in and had a resolution of 1920 × 1080 pixels.

Eye-Tracking Measurement and Data Processing
Visual attention was measured with the Tobii EyeX eye tracker [36], which samples eye gaze at 60 Hz. The EyeX SDK for Unity was used to integrate data collection into the virtual environment. The SDK provides user fixations, defined as the act of fixing one's gaze on a particular point in the environment. The number and duration of fixations on a priori defined areas of interest (AOIs) were computed in Unity, which determined whether the participant was fixating on a particular virtual agent while taking into account that agent's role at that moment. Specifically, for each eye-tracking (ET) variable, two types of feature roles were defined according to the situation and the virtual agents: (a) basic and (b) behavioral. Basic roles comprised three ET features (active, passive, and background) captured throughout the 3DSG. The active role refers to the virtual agents and/or elements that are the protagonists of a situation; the passive role refers to virtual agents and/or elements that are present in the situation but have no protagonist role in it; and the background role refers to the remaining elements of the virtual scenario (Figure 2).
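Because an agent's basic role depends on the situation, fixations on agents have to be re-expressed as fixations on roles before aggregation. The sketch below assumes a hypothetical per-situation lookup table (the role assignments shown are invented for illustration) and treats any target not listed for a situation, such as scenery, as background; the study's Unity implementation is not described at this level of detail.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical role assignments: situation -> {agent name: basic role}.
ROLES_BY_SITUATION: Dict[int, Dict[str, str]] = {
    1: {"Susana": "active", "Martina": "passive"},
    2: {"Martina": "active", "Susana": "passive"},
}


def time_by_role(fixations: List[Tuple[int, str, float]]) -> Dict[str, float]:
    """fixations: (situation, target, duration_s). Targets without a listed
    role in that situation are counted as 'background' (an assumption)."""
    totals: Dict[str, float] = defaultdict(float)
    for situation, target, duration in fixations:
        role = ROLES_BY_SITUATION.get(situation, {}).get(target, "background")
        totals[role] += duration
    return dict(totals)


print(time_by_role([(1, "Susana", 0.5), (2, "Susana", 0.25), (1, "mast", 0.25)]))
# {'active': 0.5, 'passive': 0.25, 'background': 0.25}
```

The same agent (here Susana) contributes to different roles in different situations, which is why the lookup is keyed by situation.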
In addition, five behavioral roles were defined, depending on the situation and on the specific behaviors displayed by each virtual agent in that situation: (a) informative role: the participant's gaze focused on the virtual agents from whom they could obtain more information to make the decision; (b) empathy role: the participant's gaze focused on agents who explicitly asked for help or had emotional needs; (c) decision-making role: the participant's gaze focused on the virtual agents with whom they could make a direct decision; (d) authoritarian role; and (e) supportive role: the participant's gaze focused on the agent with supportive personality traits. These roles could change throughout the game. For example, Figure 3 shows a situation in which the strategist (Susana) tells the other agents that she wishes to return to the hotel to pick something up; she does not wish to elaborate and asks the other agents to consider her feelings (i.e., to show empathy).

Several variables were extracted during the game (see Table 1). Specifically, 29 variables pertained to the decisions made by participants in the 10 situations, such as which action they chose to perform in a particular situation, which character they preferred to perform an action with, and how many times they asked the characters questions during a particular situation. Fifty-six variables were extracted from the eye-tracking sensor. These variables covered four metrics (see Table 2), each computed either for the virtual agent the participant was looking at or for the role that agent had at that moment.
Table 1 (partially recovered rows): Mean fixation time (s), 5; Mean time to first fixation (s), 5.

Table 2. Eye-tracking metrics extracted during the serious game.
Metric | Definition
Mean time (s) | Mean time the participant looked at the defined virtual agent/role during the whole experience.
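Only the definition of mean time survives in Table 2, so the following sketch is an assumption about how the metrics named in this section might be computed for one target: mean time is taken as the mean per-situation looking time, mean fixation time as the mean duration of individual fixations, and mean time to first fixation as the mean delay from situation onset to the first fixation on the target. The log format (situation, target, onset, duration) is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (situation, target, onset_s, duration_s); onset is measured from the start
# of that situation. Hypothetical log format.
Fix = Tuple[int, str, float, float]


def target_metrics(fixations: List[Fix], target: str, n_situations: int) -> Dict[str, Optional[float]]:
    hits = [f for f in fixations if f[1] == target]
    # Total looking time per situation (0 when the target was never fixated).
    per_situation = [sum(f[3] for f in hits if f[0] == s) for s in range(1, n_situations + 1)]
    # Earliest fixation onset on the target within each situation.
    first_onset: Dict[int, float] = {}
    for situation, _, onset, _ in hits:
        first_onset[situation] = min(onset, first_onset.get(situation, onset))
    return {
        "mean_time_s": mean(per_situation),
        "n_fixations": float(len(hits)),
        "mean_fixation_time_s": mean(f[3] for f in hits) if hits else None,
        "mean_time_to_first_fixation_s": mean(first_onset.values()) if first_onset else None,
    }


log = [(1, "expert", 1.0, 0.5), (1, "expert", 3.0, 0.25), (2, "expert", 2.0, 0.25), (2, "organizer", 0.5, 1.0)]
print(target_metrics(log, "expert", n_situations=2))
```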

Statistical Analysis
One participant did not complete the questionnaire; hence, the answers of the remaining 55 participants were analyzed. A multivariate outlier analysis [37] was performed to detect and remove participants whose leadership scores could be considered extreme. For this purpose, the Mahalanobis distance of each participant from the sample centroid was calculated from their self-reported ROL and TOL scores, and the probability of this distance under a chi-square distribution was computed. Participants whose probability was below 0.01 were classified as outliers and excluded from further analysis. This analysis removed four participants, resulting in a final sample of 51.
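A minimal sketch of the outlier rule in Python, assuming (as is usual for this test, though not stated in the text) that each participant's squared Mahalanobis distance from the sample mean is referred to a chi-square distribution with degrees of freedom equal to the number of variables, here two. For two degrees of freedom the upper-tail probability has the closed form exp(−d²/2), so no statistics library is needed. The scores in the usage line are hypothetical.

```python
import math
from typing import List, Tuple


def mahalanobis_outliers(scores: List[Tuple[float, float]], alpha: float = 0.01) -> List[int]:
    """Indices of participants whose (ROL, TOL) pair is a multivariate outlier."""
    n = len(scores)
    mx = sum(x for x, _ in scores) / n
    my = sum(y for _, y in scores) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in scores) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in scores) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in scores) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    outliers = []
    for i, (x, y) in enumerate(scores):
        dx, dy = x - mx, y - my
        # Squared distance using the closed-form inverse of a 2x2 matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        # Chi-square (2 df) upper-tail probability is exp(-d2 / 2).
        if math.exp(-d2 / 2) < alpha:
            outliers.append(i)
    return outliers


cluster = [(10 + a, 7 + b) for a in (-2, -1, 1, 2) for b in (-2, -1, 0, 1, 2)]
print(mahalanobis_outliers(cluster + [(40, 7)]))  # [20]
```

For more than two variables, the closed-form tail probability no longer applies and a chi-square survival function (for example R's `pchisq(d2, df, lower.tail = FALSE)`) would be needed.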
Both leadership subscales were normally distributed (Shapiro-Wilk test, p > 0.05), so each was dichotomized into high and low scores at its mean. This step was necessary for building the machine learning (ML) models described in the following section. The categorization was then used to describe the variables by calculating the mean and standard deviation of the low- and high-score groups. Furthermore, hypothesis tests were performed to find statistically significant differences related to leadership scores. Using the two groups defined by the categorized leadership scores, different statistical tests were applied to the variables from the virtual reality (VR) experience, depending on their nature (numerical/categorical) and normality (Shapiro-Wilk p-value). As a first exploratory description of ROL and TOL, the tests compared the high and low levels: a chi-square test was used for categorical variables and, for numerical variables, either a t-test or a Wilcoxon test, depending on normality. The significance level was set at α = 0.05.
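The dichotomization and test-routing rules can be written down compactly. In this hedged sketch, a score equal to the mean is assumed to count as low, and a numerical variable is assumed to be treated as Gaussian only if the Shapiro-Wilk test fails to reject normality in both groups; the text does not specify either detail. The p-values would come from a Shapiro-Wilk implementation (for example R's `shapiro.test` or `scipy.stats.shapiro`), and the selected test would then be run with R's `chisq.test`, `t.test`, or `wilcox.test`.

```python
from statistics import mean
from typing import List


def dichotomize(scores: List[float]) -> List[str]:
    """Label each score 'high' if above the sample mean, else 'low' (assumed tie rule)."""
    m = mean(scores)
    return ["high" if s > m else "low" for s in scores]


def choose_test(is_categorical: bool, shapiro_p_high: float = 1.0,
                shapiro_p_low: float = 1.0, alpha: float = 0.05) -> str:
    """Route a game variable to the test named in the text."""
    if is_categorical:
        return "chi-square"
    # Assumption: Gaussian only if neither group rejects normality.
    if shapiro_p_high > alpha and shapiro_p_low > alpha:
        return "t-test"
    return "wilcoxon rank-sum"


print(dichotomize([9.0, 12.0, 6.0, 10.0]))  # ['low', 'high', 'low', 'high']
print(choose_test(False, shapiro_p_high=0.30, shapiro_p_low=0.01))  # wilcoxon rank-sum
```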

Machine Learning
Machine learning was used to build leadership recognition models based on the behaviors recorded in the serious game. First, feature selection was performed to reduce dimensionality; to prevent overfitting, the maximum number of variables a model could select was set to 15. Feature selection used a backward sequential wrapper [38]: the method first builds a model with an ML algorithm using all available features and measures its performance, and then, at each step, removes a feature whose presence decreases the performance measure (i.e., Cohen's kappa). Once the best feature set was obtained, hyperparameter tuning was performed. Different hyperparameters were optimized for each ML algorithm (Table 3), evaluating ten equally spaced values within the defined range. Because this tuning was limited and not continuous, the final hyperparameters were not necessarily optimal, which also helped to avoid overfitting.

Table 3 (partially recovered values): 3, 5, 7, 9, 11.

After obtaining the best set of features and hyperparameters for each ML algorithm, the model was trained and validated, and its metrics (accuracy, Cohen's kappa, sensitivity or true positive rate (TPR), and specificity or true negative rate (TNR)) were obtained. Feature selection, hyperparameter tuning, and model building were all validated using repeated 5-fold cross-validation (two repetitions). Machine learning and statistical analyses were performed in R (version 3.6.1).
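The backward sequential wrapper can be sketched generically: the `score` callable stands in for training a model on a feature subset and returning its cross-validated Cohen's kappa (in the study, repeated 5-fold cross-validation). This Python sketch is an assumption about the control flow, not the study's R code; ties are broken by feature name, and a feature is removed without a performance gain only while the subset still exceeds the 15-feature cap.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def backward_selection(features: Iterable[str],
                       score: Callable[[FrozenSet[str]], float],
                       max_features: int = 15) -> Tuple[FrozenSet[str], float]:
    """Backward sequential wrapper: repeatedly drop the feature whose removal
    gives the best score, stopping when no removal improves the score and
    the subset is within the feature cap."""
    current = frozenset(features)
    current_score = score(current)
    while len(current) > 1:
        # Score every subset obtained by dropping exactly one feature.
        best_score, drop = max((score(current - {f}), f) for f in sorted(current))
        if best_score > current_score or len(current) > max_features:
            current, current_score = current - {drop}, best_score
        else:
            break
    return current, current_score


# Toy usage: only "a" and "b" carry signal; every extra feature costs 0.1.
def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & {"a", "b"}) - 0.1 * len(subset - {"a", "b"})


selected, best = backward_selection(["a", "b", "c", "d", "e"], toy_score)
print(sorted(selected), round(best, 2))  # ['a', 'b'] 2.0
```

Because each step refits the model once per remaining feature, the cost grows roughly quadratically with the number of candidate features, which is one reason to cap the subset size.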

TOL and ROL Description
Participants scored 9.43 ± 3.10 on ROL and 6.64 ± 2.19 on TOL. Table 4 describes the two groups (high/low) into which each subscale was divided. Overall, 55% of participants scored high on the ROL subscale and 49% scored high on TOL.

Statistical Significance of Leadership Styles in the Serious Game
The categorized scores were used to analyze differences between the high and low leadership groups (Table 5). Participants with a high ROL score tended to spend more time looking at virtual agents when those agents played a passive role, and less time looking at the organizer (Martina). They also tended not to pick any team in situation 5, whereas participants with a low ROL score more often picked the negative team. For TOL, participants with higher scores tended to fixate more often on virtual agents who played a supportive role.

Automatic Leadership Recognition Models
The leadership recognition models for ROL and TOL achieved 79% and 76% accuracy, respectively (Table 6). In both cases, kappa exceeded 0.5, reflecting a balance between TPR and TNR; the models therefore correctly identified each group of participants, in terms of both TOL and ROL, with more than 70% accuracy. Regarding the selected features, 9 of the 14 features in the ROL model were variables describing the decisions made by the participant (Table 7), whereas 7 of the 10 features in the TOL model described the role of the virtual agents that participants looked at (Table 8).
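Why kappa is reported alongside accuracy can be seen from a 2 × 2 confusion matrix. The sketch below uses hypothetical counts, not the study's results: two classifiers with the same 80% accuracy differ sharply in kappa, because one of them always predicts the majority class and therefore has a TNR of 0.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, Cohen's kappa, TPR and TNR from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Chance agreement from actual (row) and predicted (column) marginals.
    p_yes = ((tp + fn) / n) * ((tp + fp) / n)
    p_no = ((fp + tn) / n) * ((fn + tn) / n)
    expected = p_yes + p_no
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return {
        "accuracy": accuracy,
        "kappa": (accuracy - expected) / (1 - expected),
        "tpr": tp / (tp + fn),  # sensitivity
        "tnr": tn / (tn + fp),  # specificity
    }


balanced = classification_metrics(tp=40, fn=10, fp=10, tn=40)
majority = classification_metrics(tp=80, fn=0, fp=20, tn=0)
print({k: round(v, 2) for k, v in balanced.items()})
# {'accuracy': 0.8, 'kappa': 0.6, 'tpr': 0.8, 'tnr': 0.8}
print({k: round(v, 2) for k, v in majority.items()})
# {'accuracy': 0.8, 'kappa': 0.0, 'tpr': 1.0, 'tnr': 0.0}
```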

Discussion
The current study used a 3DSG in a virtual reality environment, based on ECD, together with an ON-based eye-tracking measure to capture the behavioral signals associated with task and relationship leadership. In addition, machine learning was used to build recognition models from the subjects' behavioral responses recorded in the serious game. We thus offered a multi-method approach to capture and analyze these leader behavior variables. One goal of this study was to highlight neuroscience methods as a promising tool for assessing behavioral leadership styles in complex organizational situations. A 3DSG gives behavioral researchers a way to simulate the complex scenarios in which managerial decision-making occurs. By seamlessly integrating biological measures into the 3DSG, we showed that such a hypothesis-driven research design can provide objective assessments of leader behaviors with improved ecological validity compared with self-reported measures. The study included a preliminary analysis of the frequency distribution (high vs. low) of TOL and ROL to investigate differences between the two styles, and a broad set of supervised ML models combining decision-making behaviors and eye-gaze patterns to evaluate how well TOL and ROL could be discriminated. The results are discussed under five points: (1) the significant differences between TOL and ROL (high vs. low scores) measures; (2) the use of ML methods for TOL/ROL style discrimination and the features that best discriminate between the two styles; (3) theoretical implications; (4) practical implications; and (5) limitations and future directions.

High and Low TOL/ROL Differences between Measures
One goal was to identify differences in how TOL and ROL are captured by traditional measures and by the 3DSG. Using traditional measures, our findings show that 55% of participants scored high on the ROL subscale, while 49% scored high on TOL. Statistically significant differences were found between high and low ROL and TOL styles, suggesting that there were more participants with high TOL and ROL styles than with low scores. This suggests that the traditional measure adequately classified participants by TOL and ROL style. Building on this baseline, the 3DSG results showed that both styles could be differentiated mainly through the biological implicit measures related to social eye-gaze patterns. The results reported in Section 4.2 suggest that the interactive and dynamic 3DSG elicits more realistic behaviors that reflect the complexity of real-life situations. Consistent with the theoretical framework of ROL-style behaviors, which relate to supporting and motivating people, fostering communication, and recognizing teamwork [7], our results also revealed that participants paid greater attention to the passive-role agent and less attention to the organizer virtual agent (Martina). In contrast, participants with high TOL paid more attention to supportive-role agents, consistent with the emphasis TOL individuals place on planning and organizing tasks and monitoring the team [5,6]. Regarding decision-making behaviors, participants with a higher ROL style tended to communicate more in interpersonal conflict situations than in problem-solving situations during the 3DSG. This result is again consistent with the leadership literature, in which people with a high ROL focus encourage positive interpersonal relationships, collaboration, and teamwork [7].
According to our results, the decision-making behaviors and the biological implicit measures captured during the 3DSG experience could discriminate between TOL and ROL styles, thereby supporting our first hypothesis. Furthermore, our results suggest that decision-making behaviors are more closely related to ROL, and eye-gaze patterns to TOL.

ML Methods for TOL/ROL Style Discrimination and Features that Better Discriminate between the Two Styles
The leadership recognition models for ROL and TOL confirmed the previous results, achieving 79% and 76% accuracy, respectively, with a balance between sensitivity and specificity. This suggests that each style can be identified reliably and supports our second hypothesis. In terms of feature selection, the ROL model selected 9 of its 14 features from participants' decision-making behaviors, of which four related to communication and execution and one related to giving orders. On the other hand, the TOL model selected 7 of its 10 features from eye-gaze patterns, mainly related to attention orientation toward someone, measured through the number of fixations (attention directed to stimuli), and to attentional engagement, measured through the duration of fixations (level of processing; Eckstein et al., 2017). Specifically, most of the seven selected features (5 of 7) relate to the agents' behavioral roles within each situation, and fewer (2 of 7) relate to an agent's active or passive role at a specific moment of a situation. This suggests that, depending on the problem to be solved, TOL individuals focused their visual attention more intently before making the final decision. This is consistent with TOL, in which leaders focus on getting tasks done to achieve specific performance goals [5,6].
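The two eye-gaze constructs mentioned above, fixation count as a proxy for attention orientation and mean fixation duration as a proxy for attentional engagement, can be derived per area of interest (AOI) from a list of fixations. The following Python sketch assumes that each fixation has already been mapped to the role of the virtual agent being looked at; the role labels, field names, and durations are hypothetical and do not reflect the study's actual feature definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fixation:
    aoi: str  # hypothetical AOI label: role of the virtual agent being fixated
    duration_ms: float


def aoi_features(fixations: list[Fixation]) -> dict[str, dict[str, float]]:
    """Per-AOI fixation count (attention orientation) and mean fixation
    duration in ms (attentional engagement)."""
    counts: dict[str, int] = defaultdict(int)
    totals: dict[str, float] = defaultdict(float)
    for fix in fixations:
        counts[fix.aoi] += 1
        totals[fix.aoi] += fix.duration_ms
    return {
        aoi: {"fixation_count": counts[aoi], "mean_fixation_ms": totals[aoi] / counts[aoi]}
        for aoi in counts
    }


# Hypothetical fixations recorded while a participant watches one scene.
sample = [
    Fixation("organizer", 220.0),
    Fixation("supportive", 310.0),
    Fixation("supportive", 450.0),
    Fixation("passive", 180.0),
    Fixation("supportive", 260.0),
]
features = aoi_features(sample)
for aoi, values in sorted(features.items()):
    print(aoi, values)
```

In this toy input, the supportive-role agent receives 3 fixations with a mean duration of 340 ms. Keying the counters on a (situation, role) pair instead of a role alone would yield per-situation features of the kind the TOL model selected.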

Theoretical Implications
By applying a non-obtrusive neuroscience method (i.e., an eye-tracking measure) and a virtually simulated environment to gauge behavioral responses in complex situations, we extend decision-making theory to integrate directive and behavioral decision-making styles with an individual's task and social orientation. Whereas most empirical studies assess leader behaviors and decision-based responses using subjective self-report measures, we contribute to the related literature by combining neuroscience and VR, following studies such as Hannah et al. [25], who also employed ON methods to capture adaptive decision-making and leader self-complexity. As strategic decision-making is theorized to be related to an individual's cognitive complexity [39], our application of eye-tracking helps reveal socio-cognitive responses (manifested through mean fixation duration and fixation count) when people are faced with a complex situation. In this way, we attempted to establish a non-obtrusive neurological basis for Rowe and Boulgarides' decision style theory [26]. Furthermore, we extend the work of Gerpott et al. [23], who used social attention theory to study emergent vs. non-emergent leaders through eye-gaze patterns and called for future research testing other assumptions about eye gaze in relation to effective leadership. Our study contributes to this theory by applying the method to TOL and ROL. Moreover, our research is an empirical examination of the taxonomy of eye-tracking research proposed by Meißner and Oll [31]. Following this taxonomy, the current study used the suggested eye-tracking measures to capture psychological constructs relating to emotional arousal.

Practical Implications
The present study's implications for practice lie in whether and how the neurological foundations of an individual's decision-making style can be retrained when exhibited behaviors are counterproductive in the workplace. Behavioral scholars have argued for and against the relevance of brain plasticity in retraining and developing one's brain in the context of negative work behaviors, and have suggested that physiological and biological measures could help address negative behaviors through interventions [40]. Furthermore, similar study designs that employ fMRI or qEEG could help us understand how a leader's decision-making style adapts and evolves over their tenure in the organization. Eye-tracking technology has also been used to study job interviewer bias: the gaze patterns of participants acting as hiring managers were measured as they viewed job candidates' perceived negative attributes, such as facial stigmas or physical imperfections [41]. Lastly, the application of serious games to organizational training and employee skill development is also a key practical contribution of the current study.

Limitations and Future Directions
We note that the small sample size is a limitation of this study, as it could compromise the generalizability of our theorized relationships. However, the goal of the study was not to build a predictive model applicable to a wide general population but to examine whether behaviors recorded in a serious game could be used to recognize leadership styles. The results suggest that machine learning can be used for this purpose. In terms of future directions, as a multi-method study that employs neurological methods and advanced technologies such as VR and machine learning, this research is an initial step toward studying the behavioral responses of both organizational leaders and employees using newer, non-traditional techniques. There are also opportunities to study leader behaviors beyond ROL and TOL, such as transformational leadership and its associated cognitive mechanisms. To extend the use of VR and eye-tracking in behavioral management research, Meißner and Oll [31] suggest using 'virtual humans' to simulate interpersonal encounters during job selection and interviews, in order to understand interviewees' eye movements (such as avoiding eye contact) when they experience stress.
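With a small sample, one common way to estimate how well behavioral features recognize a leadership style without setting aside a separate test set is leave-one-out cross-validation (LOOCV). The Python sketch below is not the authors' pipeline: it uses a nearest-centroid classifier as a simple stand-in for the supervised models in the study, and the feature vectors (one decision-making feature and one eye-gaze feature per participant) and labels are hypothetical.

```python
from statistics import fmean


def centroid(rows: list[list[float]]) -> list[float]:
    """Column-wise mean of a set of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_predict(train_x: list[list[float]], train_y: list[str], x: list[float]) -> str:
    """Assign x to the label whose training-set centroid is closest."""
    labels = sorted(set(train_y))
    centroids = {
        lab: centroid([row for row, y in zip(train_x, train_y) if y == lab])
        for lab in labels
    }
    return min(labels, key=lambda lab: sq_dist(centroids[lab], x))


def loocv_accuracy(xs: list[list[float]], ys: list[str]) -> float:
    """Hold out each participant once, fit on the rest, and score the prediction."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)


# Hypothetical participants: [decision-making feature, eye-gaze feature].
xs = [[1.0, 0.9], [1.2, 1.1], [0.9, 1.0], [3.0, 3.1], [3.2, 2.9], [2.8, 3.0]]
ys = ["low", "low", "low", "high", "high", "high"]
print(f"LOOCV accuracy: {loocv_accuracy(xs, ys):.2f}")
```

Each participant is predicted by a model that never saw them, so the estimate is not inflated by training on the test case. With real features on different scales (counts, milliseconds, decision frequencies), the features would also need to be standardized using statistics from the training fold only, and any feature selection would have to be repeated inside each fold.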

Conclusions
The current research focuses on the use of VR and implicit measures as a novel paradigm for leadership assessment. We assess ROL and TOL using a combination of psychometric, neurological, and technological methods. Specifically, we employ an interactive serious game in a VR environment to elicit the behaviors used as input for assessing whether an individual belongs to the ROL or TOL category, and we then use machine learning to analyze the results and build recognition models for both leadership styles. In addition, we use an ON-based eye-tracking technique to capture individual subjects' reactions when exposed to situations that require them to display either task or relationship leadership behaviors. We view the main contributions of this research as the use of a more advanced 3DSG, as opposed to a 2DSG, within an ECD framework, and the contributions to social attention and decision-making theories.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/app11135956/s1. A glimpse of the story narrative can be found at youtu.be/ks2Paya6x6M (accessed on 24 June 2021). The datasets generated for this study are available on request from the corresponding author.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to restrictions of privacy policies on sensitive data categories.
Acknowledgments: This work was supported by the Generalitat Valenciana funded project "Mixed reality and brain decision (REBRAND)" (PROMETEO/2019/105).

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of the Polytechnic University of Valencia (Protocol code: P04_04_06_20).

Conflicts of Interest:
The authors declare no conflict of interest.

Funding:
This research was funded by the Generalitat Valenciana funded project "Mixed reality and brain decision (REBRAND)" (PROMETEU/2019/105).