An Assessment of Individuals’ Systems Thinking Skills via Immersive Virtual Reality Complex System Scenarios

Abstract: This study utilized authentic Virtual Reality (VR) to replicate the real-world complex system scenarios of a large retail supply chain. The proposed VR scenarios were developed based on an established systems thinking instrument that consists of seven dimensions: level of complexity, independence, interaction, change, uncertainty, systems’ worldview, and flexibility. However, in this study, we only developed the VR scenarios for the first dimension, level of complexity, to assess an individual’s Systems Thinking Skills (STS) when he or she engages in a turbulent virtual environment. The main objective of this study was to compare a student’s STS when using traditional ST instruments versus VR scenarios for the complexity dimension. The secondary aim was to investigate the efficacy of VR scenarios utilizing three measurements: Simulation Sickness Questionnaire (SSQ), System Usability Scale (SUS), and Presence Questionnaire (PQ). In addition to the three measures, a NASA TLX assessment was also performed to assess the perceived workload with regard to performing the tasks in VR scenarios. The results show that students’ preferences in the VR scenarios are not significantly different from their responses obtained using the traditional systems skills instrument. The efficacy measures confirmed that the developed VR scenarios are user friendly and lie in an acceptable region for users. Finally, the overall NASA TLX score suggests that users require 36% perceived work effort to perform the activities in VR scenarios.


Introduction
The intense competition in today's global economy stimulates the advancement of technologies, producing new requirements for future jobs. "The Future of Jobs," published by the World Economic Forum (WEF) in 2016, identified the critical workforce skills needed in the future complex workplace environment. The report indicated that complex problem solving and critical/systems thinking (ST) skills are the most important skills for the next five years, outpacing the need for other skills such as people management, emotional intelligence, negotiation, and cognitive flexibility. In other words, because the skills required are beyond the narrow focus of traditional engineering disciplines, holistic thinking modes should receive greater emphasis in the training of a future workforce [1,2]. The need for these skills is growing because the complexity and uncertainty associated with modern systems are increasing remarkably [3,4].

Systems Thinking: Overview and Application
A complex system usually involves high levels of change, uncertainty, and interrelations among the subsystems. Thus, its behavior cannot be deduced from the study of its elements independently since the complexity of a system is determined by the volume of information needed to understand the behavior of this system as a whole and the degree of detail necessary to describe it [26]. Maani and Maharaj [6] support the same notion by agreeing that implementing reductionist methods to solve complex problems is insufficient to interpret systems involving high levels of complexity. It has been proven through the literature and practice that systems thinking can help deal with the increasing complexity of businesses in particular [27,28] and the world in general [6,29]. There is a growing emphasis on systems thinking in almost all fields, including education [30][31][32], engineering [33,34], military [35,36], agriculture [37][38][39], weather [40,41], and public health [42][43][44].
The existing literature is replete with studies, both theoretical and observational, concerning systems thinking and management. Senge [45] stated the benefits of systems thinking in helping to determine the fundamental management goals to build the adaptive management approach in an organization. A case study by Senge [46] validated the relationship and relevance of systems thinking in all levels of leadership. Jacobson [26] explained how systems thinking approaches aid the integration of management systems in an organization. In his study, he presented a list of procedures to be followed to determine the most appropriate way to implement the management systems model. In another study, Leischow et al. [42] discussed the importance of systems thinking in marketing by implementing a systemic approach to marketing management. The systems thinking approach was also adopted in risk management [47][48][49], medicine management [50][51][52], project management [53][54][55], quality management [56,57], and many other domains of management.

Systems Thinking Skills and Assessment
With the popularity and advancement of systems theories and methods, the identification and assessment of systems thinking skills becomes more important. More emphasis is placed on the development of tools and techniques that can effectively determine and measure systems thinking capabilities. For example, Cabrera and Cabrera [22] described systems thinking as a "set of skills to help us engage with a systemic world more effectively and prosocially" [22] (p. 14). Over time, researchers attempted to develop tools and techniques to measure an individual's skillset in both a qualitative and quantitative manner. For example, Dorani and co-authors [58] developed a set of questions to assess systems thinking skills by combining the concepts of the System Thinking Hierarchical (STH) model [59] and the System Thinking Continuum (STC) [60] with Richmond's seven thinking tracks model [61]. This scenario-based assessment process consists of six questions, each measuring important systems thinking skills, i.e., dynamic thinking, cause-effect thinking, system-as-cause thinking, forest thinking (holistic thinking), closed-loop thinking, and stock-and-flow thinking. Cabrera et al. [22] summarized the evolution of systems thinking skills into four waves and embraced the methodological plurality and universality of systems framework in the third and fourth waves. By emphasizing the broader plurality and universality of systems methods, Cabrera and colleagues introduced DSRP (distinctions, systems, relationships, and perspectives) theory [62][63][64] that offers a comprehensive framework of systems thinking. Later, Cabrera et al. [65] introduced an edumetric test called Systems Thinking Metacognition Inventory (STMI) to measure three important aspects of a systems thinker, i.e., systems thinking skills, confidence in each skill, and metacognition.
Many other researchers developed different guidelines and assessment tools to assess systems thinking skills by embedding systems theories and principles [6][66][67][68][69].

Systems Thinking and Technology
Virtual Reality (VR) technology dates back to the mid-1940s with the advancement of computer science [70]. In 1965, Ivan Sutherland first proposed the idea of VR when he stated: "make that (virtual) world in the window look real, sound real, feel real, and respond realistically to the viewer's actions" [71] (p. 3). In the 1980s, some schools started adopting personal computers and digital technology, and multiple studies shifted focus towards VR technologies, their applications, and their implications [72]. At the beginning of the 1990s, the area of virtual reality experienced an enormous advancement. The primary purpose of VR technologies was to generate a sense of presence for the users and enable the user to experience an immersive virtual environment generated by a computer as if the user was there [73].
Systems 2021, 9, 40
Although the ideas around virtual reality emerged during the 1960s, the actual application of Virtual Reality (VR) has since expanded to almost all fields of science and technology. Its growth during the last two decades demonstrates the increasing popularity of this technology and the unlimited potential it has for future research in various sectors such as engineering, military, education, medicine, and business. In academia, for example, VR technologies have proved to be of notable benefit. Numerous studies have encouraged the use of VR, especially in education, for several reasons. In a study of 59 students, Bricken and Byrne [73] reported results that favored VR technologies in enhancing student learning. Along the same line, Pantelidis [74] demonstrated the potential benefit of VR technology in classroom pedagogy. Another study involving 51 students was conducted by Crosier et al. [75] to assess the capacity of VR technologies to convey the concept of radioactivity. The results of the study showed that students gained more knowledge in the VR environment compared to traditional methods. Mikropoulos [76] also showed that the advanced imagery features and manipulative capabilities of VR, provided by multisensory channels and three-dimensional spatial representations, had a positive impact on students' learning process.
Similarly, Dickey [77] stated that VR technologies generate realistic environments that enable students to enhance their competencies, remove the need for expensive equipment, and avoid hazardous settings sometimes necessary for learning. Echoing Dickey's findings from 2005, Dawley and Dede [78] demonstrated that VR helps in developing students' cognitive capabilities by simulating real-world settings that make the users feel as if they were there. In other words, with VR, students no longer need to "imagine" a situation but are able to be there in real time and interact with different objects and scenarios related to the subject studied using simple gestures. A similar study by Hamilton et al. [79] examined how VR technology helps students grasp queuing theory concepts in industrial engineering in an immersive environment. The results showed that the virtual queuing theory module is a feasible option for learning queuing theory concepts. Similarly, Byrne and Furness [80] and Winn [81] highlighted the efficacy and usability of VR technology in modern pedagogy. Overall, the literature review revealed that the integration of VR technology in education has significant benefits for students.
For a detailed investigation into the application of VR technology across different disciplinary domains, readers are referred to such works as McMahan et al. [82] (entertainment industry), Opdyke et al. [83], Ende et al. [84], Triantafyllou et al. [85] (healthcare), and Durlach and Mavor [86] (military application). Systems thinking skills, on the other hand, come with practice and are applied across different fields, and virtual reality is one effective way to practice and learn systems thinking. The systems thinking skills recognized in the academic literature include the ability to visualize the system as a whole, develop a mental map of a system, and think in dynamic terms to understand behavioral patterns. In VR games, the participants engage in various experiences and use their systems thinking skills to respond to various complex systems problems based on real-world scenarios. The review of the literature also shows that several systems thinking tools and techniques have evolved over the decades to address complex system problems. Some tools could assess only one or two ST skills [87][88][89] and only to a certain extent. Many of the current tools are purposefully designed for a specific domain, such as education, to measure the students' ST skills [59][90][91][92]. However, none of the standalone tools could capture the overall systems thinking skills of an individual. These tools and techniques might satisfy a specific need, but they do not facilitate solutions against the backdrop of complex system domains. Moreover, the developers of many current ST tools have neither published their claims nor demonstrated accompanying evidence of validity and reliability.
Reinforcing this criticism, Camelia and Ferris [90] stated that "there are over 200 instruments designed to measure any of a variety of attitudes toward science education, but most have been used only once, and only a few shows satisfactory statistical reliability or validity" [90] (p. 3).

Gamification
A game is constituted by a set of fundamental conditions. If one of these conditions is not met, the game is considered incomplete and cannot be carried on [93,94]. With these conditions in place, games can be developed to help players grasp concepts in any field. Gamification has attracted, and continues to attract, attention from both industry and academia [93,95,96]. Although a considerable number of games have been invented for gaming purposes, not as many were built to help understand particular concepts in scientific, academic, or business fields [97][98][99]. An example of such games is the beergame. The beergame is an online game to teach operations. It allows students to experience real, traditional supply chains in which coordination, sharing, and collaboration are missing. This non-coordinated game/system shows the problems that result from the absence of systemic thinking (website: https://beergame.org (accessed on 21 May 2020)) [23]. While the beergame was built to help students gain insight and conceptual background into supply chains, Littlefield Labs was developed to help its users acquire certain skills. Littlefield is an online analytics game simulating a queuing network where students compete to maximize their revenue [100]. Students can see their performance history to examine the impact of their previous decisions and how to manage future decisions. Capstone games are more involved than analytics games because they provide more complex instructions and a wider area for decision making. Capstone simulations are an example of capstone games. Capstone is an online business simulation developed to explain marketing, finance, operations, and other areas. A taxonomy of online games was developed specifically for these games to classify them based on their pedagogical objective [100]. Table 1 below shows the taxonomy.
In addition to the taxonomy, Table 2 below presents a general description of the three games. The table discusses the uniqueness of the games compared to each other, describes their process, and presents different parameters these games have.

Table 1. Taxonomy of online games (categories: Insight, Analysis, Capstone).

Table 2. General description of the three games.

Beer Distribution Game
Description: The role-play game represents a supply chain process where players need to coordinate different departments (Factory, Distributor, Wholesaler, and Retailer) in a beer distribution process. The game requires a minimum of 4 players and 60-90 min of play. The objective of the game is to meet customer demand with a minimum total cost in a period of 20 weeks.
Skills developed: Helps develop planning skills, management skills, coordination, and decision-making skills.
Limitations: Players are unable to make decisions jointly.

Littlefield Labs
Description: In a normal setting, students form groups and compete to see who will generate the highest cash by making decisions in a blood testing service: alter lot size, control inventory and orders, select schedules, and manage capital. This simulated game includes a two-hour task to be completed in a class and a seven-day task to be played as a non-class assignment. The game is easy to grasp.
Skills developed: The game is designed to encourage participants' forecasting skills, process analysis skills, and management skills.
Limitations: The workload is not efficiently distributed among players; players may have a poor understanding of the basics of the simulation and may rely on trial and error rather than following a strategy.

Capstone Simulation
Description: This online simulation game allows students to try entrepreneurship strategies in a game where they can control the whole lifecycle of a product, from launching it to disposing of it. Decision rounds vary between 8 and 12 depending on the type of capstone simulation. The optimal game setting includes four to six teams of four to five students, with a maximum of eight teams.
Skills developed: The game is designed to develop selection of tactics, strategic thinking, management skills, and cross-functional alignment.
Limitations: The cost of the game is high.
Since the focus of this study was on the tools and methods of systems thinking (the third theme), we surveyed the literature to study the tools, techniques, and games used to measure systems thinking. Based on the review, we found that (1) several of these tools are survey-based instruments; (2) few tools, such as the systems thinking assessment by Grohs et al. [101], have been developed to measure ST (however, the validity and reliability of these tools are questionable since no validation has been reported for them); and (3) new technologies such as virtual reality and mixed reality have not been used in the domain of systems thinking. The motivation of this study was to measure individuals' ST skills using real-case scenarios in which individuals make decisions in uncertain, complex environments while managing different entities in the system. To develop a more valid, real-case scenario, we used the beer game as inspiration.

Efficacy Measures of the VR Scenarios
When referring to VR scenarios, the efficacy measures generally indicate the quality of the environment or the ability to perform the intended outcome [72,102]. An extensive literature review showed that many different qualitative questionnaires exist from past research efforts [103,104]; however, the questionnaires used in those studies lack flexibility and cannot readily be applied to an arbitrary VR study. As a result of this heuristic search, three effective assessments were chosen to collect information of interest from the users. These assessments include the simulation sickness questionnaire (SSQ), the system usability scale (SUS), and a presence questionnaire (PQ).
Kennedy et al. [105] prepared a simulation sickness questionnaire by including 21 symptoms that can result from virtual environment exposure. These 21 symptoms are grouped into three areas: nausea, oculomotor disturbance, and disorientation. This questionnaire gauges virtual movement sickness by allowing the user to rate each symptom from 0 to 3. The overall score is obtained by summing the weighted score of each category and then multiplying the result by 3.74. The weights for nausea, oculomotor disturbance, and disorientation are 9.54, 7.58, and 13.92, respectively. This final score reflects the severity of the symptoms experienced. Table 3 below shows the score categorization of the final SSQ score.

Table 3. Categorization of the final SSQ score. Only part of the table is recoverable: "Symptoms are a concern" (score range not recoverable) and ">20: A problem simulator".

The second efficacy measure, the System Usability Scale (SUS), comes primarily from a tool developed by Brooke [106]. This tool consists of 10 questions using a 5-point Likert scale to measure the user's expectation of the virtual system. These questions can be reworded positively or negatively and can be modified to be more specific to the environment under question. The final score of the usability scale is obtained by summing all the items' scores and then multiplying the result by 2.5.
The presence questionnaire, which is the third measure, is an indicator of the user's feelings about the virtual system. This survey, which includes 22 questions, was introduced by Witmer and Singer [107] and utilizes a 0-6 scale. Similar to the two previous questionnaires, the answers are summed to obtain an overall score for user presence.
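The scoring of the three efficacy measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names and input layout are hypothetical. The SSQ function assumes the raw subscale sums are already available and follows the commonly cited scoring of Kennedy et al.'s questionnaire, in which the weights 9.54, 7.58, and 13.92 scale the subscale scores and the total is the unweighted sum of the subscales multiplied by 3.74; this may differ in detail from the wording above. The SUS function assumes Brooke's standard alternating item wording; if items are reworded, their polarity would need to be adjusted.

```python
from typing import Sequence

# Subscale weights and total-score multiplier quoted in the text for the SSQ.
SSQ_WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58, "disorientation": 13.92}
SSQ_TOTAL_FACTOR = 3.74


def ssq_scores(raw: dict) -> dict:
    """Score the SSQ from raw subscale sums (each symptom rated 0-3).

    Assumption: `raw` already holds the summed ratings per subscale;
    the symptom-to-subscale mapping is not reproduced here.
    """
    scores = {name: raw[name] * weight for name, weight in SSQ_WEIGHTS.items()}
    # Assumption: the total uses the unweighted subscale sums times 3.74.
    scores["total"] = sum(raw[name] for name in SSQ_WEIGHTS) * SSQ_TOTAL_FACTOR
    return scores


def sus_score(responses: Sequence[int]) -> float:
    """Score ten SUS items rated 1-5.

    Assumption: odd-numbered items are positively worded
    (contribution = rating - 1) and even-numbered items are negatively
    worded (contribution = 5 - rating).
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten ratings between 1 and 5")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based: even i = odd item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5


def pq_score(responses: Sequence[int]) -> int:
    """Sum the 22 presence items, each rated on the 0-6 scale from the text."""
    if len(responses) != 22 or any(not 0 <= r <= 6 for r in responses):
        raise ValueError("PQ needs 22 ratings between 0 and 6")
    return sum(responses)
```

Keeping the three scorers as separate functions mirrors the way the questionnaires are administered independently, so each can be validated against its own rating range.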
These three efficacy measures address the lack of generalized, qualitative questionnaires for the evaluation of VR scenarios in the literature. The non-specific nature of the surveys allows for their continued use in future VR studies while still capturing the necessary research information.
Although much has been written in the existing literature about the application of VR across different fields, including education, there is an apparent lack of empirical investigations conducted to measure students' ST skills using immersive VR complex system scenarios. The rationale of this research was to address this current gap in the literature. To the best of our knowledge, this is the first attempt to appraise the ST skills of students through VR immersive technology. The research contributes to the field by:
• Developing a set of VR gaming scenarios to measure the ST skills of the students based on the systems skills instrument by Jaradat [15]. In this study, the proposed VR scenarios were developed to measure only the first dimension of the instrument, level of complexity-simplicity vs. complexity (see Table 2). Six binary questions were used to determine the complexity dimension level.
• Investigating whether or not the proposed VR scenarios can be an appropriate environment to authentically measure students' level of ST skills.
• Conducting different types of statistical analysis such as ANOVA and post hoc to provide better insights concerning the findings of the research.
• Demonstrating the efficacy and extensibility of VR technology in the engineering education domain.

Research Design and Methodology
This section has four parts. First, the systems thinking instrument used in the experiment is demonstrated. In the second part, the developed VR scenarios and the environment design are presented. The third part presents the design of the experiment to illustrate the experiment's flow. The research design and methodology section ends with the mitigation techniques used in the study. The theoretical model of this study is illustrated in Figure 1 and details are provided in the following subsections.

Systems Thinking Instrument-An Overview
The systems thinking instrument comprised 39 questions with binary responses [15]. The responses of the participants were recorded in the score sheet. The score sheet had seven letters, each one indicating an individual's level of inclination toward systems thinking when dealing with system problems. The instrument was composed of seven scales identifying 14 major preferences that determine an individual's capacity to deal with complex systems. The seven scales that constituted the instrument are presented below and shown in Table 4. For more details about the instrument, including the validity, readers can refer to [15] (p. 55).

Table 4. Systems thinking preferences dimensions.

Level of Complexity: Defines an individual's comfort zone in dealing with complex system problems.
Less systemic, Simplicity (S): Avoid uncertainty, work on linear problems, prefer best solution, and prefer small-scale problems.
More systemic, Complexity (C): Expect uncertainty, work on multidimensional problems, prefer a working solution, and explore the surrounding environment.

Level of Independence: Describes how an individual deals with the integration of multiple systems.
Less systemic, Autonomy (A): Preserve local autonomy, tend more to independent decision and local performance level.
More systemic, Integration (G): Preserve global integration, tend more to dependent decision and global performance.

Level of Interaction: Indicates the type of scale an individual will choose to adopt.
Less systemic, Isolation (N): Inclined to local interaction, follow detailed plan, prefer to work individually, enjoy working in small systems, and interested more in cause-effect solution.
More systemic, Interconnectivity (I): Inclined to global interactions, follow general plan, work within a team, and interested less in identifiable cause-effect relationships.

Level of Change: Reflects an individual's inclination in accepting changes.
Less systemic, Resistance to Change (V): Prefer considering few perspectives, overspecify requirements, focus more on internal forces, like short-range plans, tend to settle things, and work best in a stable environment.
More systemic, Tolerant of Change (Y): Prefer taking multiple perspectives into consideration, underspecify requirements, focus more on external forces, like long-range plans, keep options open, and work best in changing environment.

Level of Uncertainty: Depicts an individual's choice in making decisions with insufficient knowledge.
Less systemic, Stability (T): Prepare detailed plans beforehand, focus on the details, uncomfortable with uncertainty, believe work environment is under control, and enjoy objectivity and technical problems.
More systemic, Emergence (E): React to situations as they occur, focus overall, comfortable with uncertainty, believe work environment is difficult to control, enjoy subjectivity and non-technical problems.

Systems Worldview: Depicts an individual's understanding of system behavior at the whole versus part level.
Less systemic, Reductionism (R): Focus on particulars, prefer analyzing the parts for better performance.
More systemic, Holism (H): Focus overall, interested more in the big picture, interested in concepts and abstract meaning of ideas.

Level of Complexity: The level of complexity refers to the level of interconnection spawned from systems and their components. In other words, it stands for the level at which the forces acting on a set of processes find a balance. It also indicates which strategy an individual adopts while facing an issue: a simple strategy or a complex one.

Level of Independence: The level of independence stands for the level of integration or autonomy an individual will adopt while dealing with a complex system. The individual tends toward a dependent decision and global performance level (integration) or an independent decision and local performance level (autonomy).

Level of Interaction: The level of interaction stands for the individual's preference with regard to the manner in which he/she interacts with systems.

Level of Change: The level of change indicates the degree of tolerance with which an individual accepts changes.

Level of Uncertainty: Uncertainty refers to situations where information is unknown or incomplete. The level of uncertainty illustrates how the individual makes decisions when he/she is uncertain about the situation. This level ranges from stability, which means being uncomfortable with uncertainty, to emergence, which means dealing with uncertainty without any prior plan.

Level of Systems Worldview: The systems worldview depicts how the individual sees the systems' structure, as a whole or as a combination of separate parts. There are two main levels: holism and reductionism. Holism refers to focusing on the whole and the big picture of the system. Reductionism, on the other hand, holds that the whole is simply the sum of the parts and that its properties are the sum of the properties of the parts. Under this view, the whole must be broken into elementary parts in order to analyze it.

Level of Flexibility: Flexibility characterizes the capability and willingness to react when there are unanticipated changes in circumstances. The level of flexibility of individuals ranges from flexible to rigid. For some individuals, the idea of flexibility produces considerable anxiety, especially when they have already formulated a plan; for others, the option for flexibility is vital to determine their plan.
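To make the structure of the instrument concrete, the following Python sketch shows one way binary responses could be turned into a letter profile. It is an illustration under stated assumptions, not the scoring procedure of [15]: the majority rule, the tie handling, and the function names are hypothetical, and the letter pair for the flexibility dimension is omitted because it is not listed in Table 4.

```python
from collections import Counter

# Letter pairs (less systemic, more systemic) taken from Table 4.
# The flexibility pair does not appear in the table, so it is left out.
DIMENSIONS = {
    "complexity": ("S", "C"),
    "independence": ("A", "G"),
    "interaction": ("N", "I"),
    "change": ("V", "Y"),
    "uncertainty": ("T", "E"),
    "worldview": ("R", "H"),
}


def dimension_letter(dimension, answers):
    """Return the preferred letter for one dimension, or None on a tie.

    `answers` holds one letter per binary question.
    Assumption: the preference is the letter chosen most often.
    """
    less, more = DIMENSIONS[dimension]
    if any(a not in (less, more) for a in answers):
        raise ValueError(f"answers for {dimension} must be {less!r} or {more!r}")
    counts = Counter(answers)
    if counts[less] == counts[more]:
        return None  # hypothetical tie handling; the source does not specify it
    return less if counts[less] > counts[more] else more


def profile(responses):
    """Build a profile string from a dict of per-dimension answer lists."""
    letters = []
    for dimension in DIMENSIONS:
        letter = dimension_letter(dimension, responses[dimension])
        letters.append(letter if letter is not None else "?")
    return "".join(letters)
```

For the complexity dimension studied here, `dimension_letter("complexity", answers)` would be called with the six binary answers of a participant; an even split is reported as undetermined rather than forced into either preference.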

VR Scenario Case and Environment Design
The experiment was conducted using three VR-compatible computers for one week. Before participants engaged with the virtual scenarios, they were asked to complete a demographic questionnaire and a questionnaire detailing any simulation sickness they may have experienced in the past. They were also asked to describe their familiarity with virtual reality, video game-playing experience, and retail store experience using a Likert scale. After completing the two questionnaires, participants were asked to answer the six questions of the systems skills instrument that address the complexity dimension. These questions assess the participants' ability to deal with complexity and illustrate their preferences. After answering the instrument questions, students were assigned to computers and began the VR scenarios. Following the completion of the VR scenarios, three questionnaires were used to evaluate the user experience (post-simulation sickness, system usability, presence questionnaire). For each participant, the surveys and VR scenarios took approximately 30 min to complete.

VR Supply Chain Case Scenario
The VR case scenario was developed based on real-life situations in which participants have to make decisions and choose between several options. Their answers/preferences indicate how they think in complex situations, and this determines their systems thinking skills when dealing with complex system problems. The simulated scenarios were set up using the Unity3D game engine (Unity Technologies, San Francisco, CA, USA). To engage participants in the VR environment, the Oculus Rift VR headset (Oculus VR, Facebook Inc., Menlo Park, CA, USA) was used. The VR scenario was composed of five complex scenes in a marketplace and is illustrated in the next section.
The complex system scenario is a decision-making VR game in which participants experience immersive real-life situations in a large retail grocery chain where uncertainty, ambiguity, and complexity exist. This supply chain could represent any non-coordinated system where problems arise due to a lack of systemic thinking. Although these scenarios were developed based on the well-known beer game, the aim and the scope of the developed scenarios were different and purposely designed to measure a participant's skill set in addressing the grocery chain's problems.

The Design of the VR Scenarios
A VR grocery chain was chosen because a majority of the study participants would be familiar with a grocery store and easily grasp concepts such as stocking shelves and displaying merchandise. Furthermore, this type of environment would also allow for multiple scenes and stories to be developed to guide the user through all 39 questions. In each scene, the user assumed the role of the grocery store manager. As the users began each prompted task, they were asked to make decisions that they thought were best for the store. Each decision they were prompted to make corresponded to a question in the instrument, and the user's choice was recorded. Each decision-making event was presented in a non-biased, binary way that allowed the users to choose their personal preference and give genuine reactions without feeling that a wrong decision was made because, within the ST practice, there are no "bad" or "good" decisions.
Before starting the VR experience, an ID identifier and a computer were assigned to each participant so that the information from the systems thinking questionnaire would be matched to the data from the VR scenario. Each student was assisted by a member of the research team to ensure that the experiment was conducted properly. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Institutional Review Board of Mississippi State University (IRB-18-379).
The VR scenarios started with an identification window where the assistant entered the participant ID and selected a mitigation type. To begin the VR scenarios, the student was asked to "Click the A button to start" by pointing the laser at the caption, as illustrated in Figure 2a. Oral instructions that were provided to facilitate and direct the participants' interactions could be activated/deactivated during the simulation. The first audio recording began after pressing the caption and indicated how to move and interact in the scenes using the buttons and triggers on the touch controllers (see Figure 2b).
To start with the first VR scene, main store I, the participant was required to target and press a blue orb to continue with the simulation (see Figure 3a). Once the orb was selected, an audio recording was prompted to illustrate what the participant was supposed to do in the virtual supply chain store. The VR scenarios were reformulated, real-life events of the systems thinking instrument, which ensured a better output since the participants would respond to the situations based on their understanding (see Figure 3).
Depending on the user's preference, the following scene could be either the can stacking or the Christmas decoration scene. This revealed one trait of the individual's systems skills. The can-stacking scene consisted of three shelves with three different types of cans on each. The user was asked to place the rest of the cans on the shelves; he/she would either place the cans the same way they were given or as he/she pleased. Based on the way the user performed, another trait of their ST skills was reflected. Similarly, the Christmas scene was developed to identify individuals' systems skills/preferences. In this scene, there were two trees: one decorated and one undecorated. The participant was asked to decorate the undecorated tree using the same ornaments used in the first tree.

Figures 4 and 5 demonstrate both Christmas and can-stacking scenes, respectively.
At the end of the second or third scene, the user was required to click on the appearing blue orb, which took him/her back to the main store scene II. In this scene, he/she responded to further questions and was then transferred to the final scene, the Christmas inventory scene, where the user interacted with three animated characters, as shown in Figure 6. These characters were employees working in the retail grocery supply chain system.
Table 5 provides a glimpse of the existing scenes. These developed scenes acted as a baseline in measuring the systems thinking skills.

Table 5. Systems thinking preferences' dimensions.

| Scene | Description | ST Measurements |
| --- | --- | --- |
| First Scene: The Main Store I | Participant selects one of the offered options: Christmas scene or can stacking scene. | The aim of this scene is to determine the participants' preference regarding typical versus peculiar complex systems. |
| Second Scene: The Can Stacking | In this scene, the participant fills the shelves with the given cans. | The scene evaluates the participant's inclination toward working in standardized vs. working in unique complex systems. |
| Third Scene: The Christmas Decoration | Participant decorates the Christmas tree. | The scene evaluates the participant's inclination toward working in standardized vs. working in unique complex systems. |
| Fourth Scene: The Main Store II | Participant selects the system view to examine the inventory levels. | The objective of this scene is to examine the individuals' preferences in dealing with small vs. large systems. |
| Fifth Scene: The Christmas Inventory | In this scene, three workers give their point of view regarding the low inventory levels. Based on the given opinions, the participant determines the best solution for this issue. Finally, the participant verifies the performance of the process. | The scene assesses individuals' tendency to solve a complex system problem based on few people vs. many people judgments. The scene indicates participants' approach to determine the right solution vs. an apt solution for a complex system issue. The last scene pinpoints participants' inclination to quit or accommodate the system when the desired performance is reached. |

Experimental Design Steps and Study Population
A total of 30 participants took part in this study based on immersive VR scenarios for systems thinking skills. The users were not allowed to wear the headset until they felt comfortable in the virtual environment before beginning the assessment. The first scene (the main store I) began with audio and closed-captioned instructions for the user, and the user was shown how to select the "Begin" button to start the assessment. This method of selecting objects and progressing forward in the assessment was repeated throughout the VR scenarios. Figure 7 below shows the data collection flow of the entire study and Figure 8 depicts the flow of VR scenarios.
After the participants completed the VR scenarios, they were asked to complete three questionnaires (post-simulation sickness, system usability, presence questionnaire) and the NASA TLX [108] survey to evaluate their perceived workload of the environment. Six demographic questions (Gender, Origin, Race, Birth Year, Classification, and Major), three background questions (Virtual reality experience, Video game playing experience, and Retail store experience), and two dependent variables (The Level of Complexity scores in the ST skills instrument and the Level of Complexity scores in the ST skills VR gaming scenario) were collected from the participants.
The male students made up 63% of the sample and the majority (63%) of students were domestic. Among them, 50% of the participants were born between 1995 and 2000. Over 60% of the participants were pursuing bachelor's degrees and 70% reported as Industrial Engineering students. For the background questions, students were asked to respond to a 5-point Likert scale (0-4) to describe their previous experiences. For the Video Game Experience, the scale was formed as: 0 = Never Played, 1 = Once or Twice, 2 = Sometimes, 3 = Often, and 4 = Game Stop Second Home. The description for the other two background questions was: 0 = None, 1 = Basic, 2 = Average, 3 = Above Average, and 4 = Expert. Around 36% of the participants had an average experience toward virtual reality and none of them described themselves as an expert. Regarding the video game-playing experience, 50% of the students rated themselves as occasional players and 16% as expert players. The majority of the participants (53%) declared that they had average knowledge regarding the retail store experience.

Simulation Sickness Mitigation Techniques
In previous VR studies, Hamilton et al. [79] and Ma et al. [72] identified some users who were unfamiliar with video games, VR technology, and other immersive environments. To improve the VR experience and minimize simulation sickness among participants, three mitigation techniques were employed with this study's VR immersive scenarios. For the first mitigation technique, the regular Unity field of view was assigned and researchers designed an increasing reticle for the second option. The increasing reticles were designed in a way that three glowing rings appeared in the center field of view based on the velocity of user rotation in the VR simulation. Starting with one small ring with slight movements, users would glimpse three glowing rings with higher speeds. The reason for implementing the increasing reticle was to reduce simulation sickness by maintaining the user focus on the center field of view. In the third mitigation technique, users experienced a peripheral view during intense motion. To implement the third option, researchers used VR Tunnelling Pro asset from the Unity Asset store. The asset comes with multiple tunneling modes (3D cage, windows, cage, etc.) that work by fading out users' peripheral view without significant information loss. This tunneling technique is capable of reducing simulation sickness when users engage with intense thumbstick movements in VR simulations.
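The increasing-reticle behavior described above can be illustrated with a minimal sketch. The study implemented it in Unity; the Python below only mirrors the logic of showing more rings as rotation speed rises. The function name `visible_rings` and the threshold values are hypothetical, since the paper does not report the speeds at which each ring appeared.

```python
from bisect import bisect_right

# Hypothetical angular-speed thresholds (degrees per second). The paper does
# not report the values used in the Unity implementation.
RING_THRESHOLDS_DEG_PER_S = (5.0, 45.0, 90.0)


def visible_rings(rotation_speed_deg_per_s, thresholds=RING_THRESHOLDS_DEG_PER_S):
    """Return how many reticle rings (0 to 3) to show for a rotation speed.

    Slight movements show one small ring; faster rotation shows up to three.
    The sign of the speed (turn direction) is ignored.
    """
    speed = abs(rotation_speed_deg_per_s)
    # bisect_right counts how many thresholds the speed has reached.
    return bisect_right(thresholds, speed)


if __name__ == "__main__":
    for speed in (0.0, 10.0, 60.0, -120.0):
        print(speed, visible_rings(speed))
```

Keeping the thresholds in a single tuple would make it easy to tune the reticle during pilot sessions, assuming such tuning is desired.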

Results
In this section, students' systems thinking skills are assessed along with the results of three efficacy measures: simulation sickness, system usability, and user experience. In addition to the efficacy measures, the NASA TLX assessment was used to measure users' perceived work effort to finish the VR scenarios.

The Assessment of Participants' Systems Thinking Skills
The study used two ST skills score sheets. The first one was prepared from the student's preferences in the ST skills instrument and the second one was prepared from the student's decisions in the VR scenarios. The primary focus of preparing score sheets was to investigate the student's responses toward a high-systematic approach to evaluate their level of systems skills. All responses were captured from binary-coded questions, and the responses for high-systematic skills were referred to as irregular patterns, unique approach, large systems, many people, a working solution, and adjusted system performance in the scenes.
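As a minimal sketch of how such a score sheet could be tallied, the snippet below counts the high-systematic choices across the six binary questions. The function name `high_systemic_score` and the ordering of the answers are assumptions made for illustration; the paper does not describe its scoring code.

```python
# The six high-systematic preferences named in the text, one per binary question.
HIGH_SYSTEMIC_CHOICES = (
    "irregular patterns",
    "unique approach",
    "large systems",
    "many people",
    "a working solution",
    "adjusted system performance",
)


def high_systemic_score(chose_high_systemic):
    """Count the high-systematic choices of one participant (0 to 6).

    `chose_high_systemic` holds six booleans, one per binary question, in the
    (assumed) order of HIGH_SYSTEMIC_CHOICES.
    """
    answers = list(chose_high_systemic)
    if len(answers) != len(HIGH_SYSTEMIC_CHOICES):
        raise ValueError("expected one answer per binary question (six in total)")
    return sum(1 for answer in answers if answer)


if __name__ == "__main__":
    # Hypothetical participant who picked the high-systematic option three times.
    print(high_systemic_score([True, False, True, True, False, False]))
```

The same function can be applied to the instrument answers and to the VR decisions, which yields the two scores per participant that are compared in this section.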
The score of the high-systematic skills for each participant ranged from 0 to 6, and the distribution of scores is shown in Figure 9. The distributions of the ST skills instrument scores and the ST skills VR scenario scores were non-normal. The average score on the ST skills instrument (Mean (M) = 3.57, Standard Deviation (SD) = 1.736) was higher than the participants' average score in the VR scenarios (M = 2.97, SD = 1.245).
To investigate the mean score differences between the ST skills instrument and the VR scenarios, a paired sample t-test was performed under the normality assumption. First, the Shapiro-Wilk normality test confirmed the normality of score differences for the matched pairs in both scoring sheets (W = 0.954, df = 30, p = 0.213). Additionally, a Q-Q plot was produced to confirm the normality of the score differences, as shown in Figure 10. Both the plot and the normality test confirmed that the distribution of the differences was not significantly different from a normal distribution.
With the confirmed normality, a paired samples t-test was carried out and the test confirmed that students' ST skills scores in VR scenarios were not significantly different from their ST skills scores via the ST skills instrument (t (29) = −1.469, p = 0.153). The results verified the construct validity of VR scenarios used to measure the students' ST skills in the study. These results also confirmed the validity and usability of the ST skills instrument, where the same binary questions were presented via VR scenarios in a different setting with no significantly different results.
Furthermore, a one-way Analysis of Variance (ANOVA) was carried out to investigate the impact of demographic and knowledge variables on VR ST skills scores. Again, the Shapiro-Wilk normality test confirmed the normality of the distribution of VR scores (W = 0.921, p = 0.059). None of the independent variables had a statistically significant influence on students' ST skills except knowledge in VR (F (3) = 3.041, p = 0.047). The post hoc Scheffe test revealed that the level of average knowledge in VR was significantly different from the level of above-average knowledge in VR and the latter group showed a higher average systematic score (M = 3.71, SD = 1.380) than the level of average knowledge in VR (M = 2.18, SD = 0.982).
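The paired comparison between instrument and VR scores can be sketched as follows. This is only an illustration using the Python standard library: it computes the paired-samples t statistic and its degrees of freedom from the per-participant differences (VR score minus instrument score). The data are hypothetical, and the p-value, which requires the t distribution, is left to a statistics package.

```python
import math
import statistics


def paired_t_statistic(instrument_scores, vr_scores):
    """Return (t, df) for a paired-samples t-test on VR minus instrument scores."""
    if len(instrument_scores) != len(vr_scores):
        raise ValueError("both score sheets must cover the same participants")
    diffs = [vr - inst for inst, vr in zip(instrument_scores, vr_scores)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two participants are needed")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical scores for six participants (not the study data).
    instrument = [4, 2, 5, 3, 6, 1]
    vr = [3, 2, 4, 4, 5, 1]
    t, df = paired_t_statistic(instrument, vr)
    print(f"t({df}) = {t:.3f}")
```

A negative t under this sign convention means the VR scores are lower on average than the instrument scores, which matches the direction of the means reported above.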

Simulation Sickness Assessment
Simulation sickness is a type of motion sickness that can occur during a VR simulation and results in symptoms such as sweating and dizziness. The user's inability to sync the visual motion with the vestibular system is the main reason for such discomfort in virtual environments. To reduce simulation sickness in this study, students were permitted to play with VR headsets and Unity before engaging with the actual study. As demonstrated in the data collection flow, participants marked their prior experience with simulation-related activities on a pre-simulation sickness questionnaire and responded to the post-questionnaire with the new experience at the end of the study. The questionnaire captured 16 probable symptoms that can be placed into three general groups through factor analysis: Nausea, Oculomotor, and Disorientation [105]. For each symptom, a four-point Likert scale was used to capture the degree of user discomfort (0 = none, 1 = slight, 2 = moderate, 3 = severe).
The scores for nausea, oculomotor, and disorientation in the pre-questionnaire were 7.63, 12.38, and 12.99, respectively, and the overall SSQ score was 12.59. This indicated that users experienced significant symptoms on previous simulation-related activities. The scores for three sub-symptoms in the post SSQ questionnaire were 13.67, 17.43, and 22.72, respectively. The overall SSQ score was 19.95 and the score verified the VR module was in an acceptable range and no immediate modifications were needed. However, the score triggered a necessity for design modifications to ensure a smooth simulation for future studies. The paired sample t-test confirmed that post-simulation scores were not significantly different from their prior simulation sickness scores, at a 0.05 significance level. Table 6 presents the SSQ scores concerning independent variables. The ANOVA results revealed that none of the demographics or knowledge-based questions significantly impacted the SSQ score, at 0.05 significance. This means gender, field of study, age, or any previous knowledge in similar technology/content made no difference in simulation sickness in this study. Furthermore, the three mitigation techniques were not significantly different from each other; however, the mean of no mitigation technique indicated fewer simulation sickness symptoms than the other mitigation techniques.
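For readers unfamiliar with SSQ scoring, the sketch below shows how subscale and total scores are conventionally derived from raw symptom sums. The weights (9.54, 7.58, 13.92, and 3.74) are the conventional SSQ weights and are an assumption here, because the paper does not list them. The mapping of the 16 symptoms to subscales is omitted, so the function takes the raw subscale sums directly.

```python
# Conventional SSQ weights; assumed here, not stated in the paper.
NAUSEA_WEIGHT = 9.54
OCULOMOTOR_WEIGHT = 7.58
DISORIENTATION_WEIGHT = 13.92
TOTAL_WEIGHT = 3.74


def ssq_scores(raw_nausea, raw_oculomotor, raw_disorientation):
    """Convert raw subscale sums (symptom ratings of 0 to 3) into SSQ scores."""
    raw_total = raw_nausea + raw_oculomotor + raw_disorientation
    return {
        "nausea": raw_nausea * NAUSEA_WEIGHT,
        "oculomotor": raw_oculomotor * OCULOMOTOR_WEIGHT,
        "disorientation": raw_disorientation * DISORIENTATION_WEIGHT,
        "total": raw_total * TOTAL_WEIGHT,
    }


if __name__ == "__main__":
    # Hypothetical raw sums for one participant (not the study data).
    for name, value in ssq_scores(2, 3, 1).items():
        print(f"{name}: {value:.2f}")
```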

System Usability Assessment
System usability is a measurement used to assess how easy a given system is for its users. The SUS questionnaire, which is used widely to capture user response toward the usability of a system, covers four important factors: efficiency, satisfaction, ease, and intuitiveness [106]. In this study, the SUS questionnaire was prepared with 10 items, including six positively worded items (1,2,3,5,7,9) and four negatively worded items (4,6,8,10). A five-point Likert scale was initially used and then converted to a scale ranging from 0 to 4. For the positively worded items, the adjusted score was obtained by subtracting one from the user response, and for negatively worded items it was obtained by subtracting the user response from five. The final SUS score was calculated by multiplying the sum of the adjusted scores by 2.5. The new scale ranged from 0 to 100, and a score above 68 was considered above-average user agreement while any score less than 68 was deemed below-average user agreement [109].
As shown in Table 7, all items scored above 2, indicating at least average user agreement on the system usability of the developed VR scenarios. The total SUS score was 74.25, which is above the average threshold of 68 and confirmed that users considered the VR scenarios to be effective and easy to use. Table 8 presents the SUS score with respect to independent variables. As with the SSQ, the independent variables did not significantly affect SUS scores at the 0.05 significance level. Interestingly, neither prior experience in VR nor the mitigation techniques affected the usability of the developed VR scenarios.
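The SUS calculation described above can be sketched as a short function. It follows the item split given in the text (positively worded items 1, 2, 3, 5, 7, 9 and negatively worded items 4, 6, 8, 10); the function name `sus_score` and the dictionary input format are assumptions made for illustration.

```python
POSITIVE_ITEMS = (1, 2, 3, 5, 7, 9)   # positively worded items, as listed in the text
NEGATIVE_ITEMS = (4, 6, 8, 10)        # negatively worded items, as listed in the text


def sus_score(responses):
    """Compute the SUS score (0 to 100) for one participant.

    `responses` maps each item number (1 to 10) to a rating from 1 to 5.
    """
    expected_items = set(POSITIVE_ITEMS) | set(NEGATIVE_ITEMS)
    if set(responses) != expected_items:
        raise ValueError("responses must cover items 1 through 10 exactly")
    if any(not 1 <= rating <= 5 for rating in responses.values()):
        raise ValueError("each rating must be between 1 and 5")
    # Positive items: response minus one; negative items: five minus response.
    adjusted = sum(responses[i] - 1 for i in POSITIVE_ITEMS)
    adjusted += sum(5 - responses[i] for i in NEGATIVE_ITEMS)
    return adjusted * 2.5


if __name__ == "__main__":
    # Hypothetical participant who answered 3 on every item.
    neutral = {item: 3 for item in range(1, 11)}
    print(sus_score(neutral))
```

A participant who answers 3 on every item scores 50, which sits below the 68 threshold used in the text to separate below-average from above-average agreement.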

User Presence Experience Assessment
User presence can be defined as the sense of 'being there' in a computer-simulated environment. Similarly, Witmer and Singer [107] (p. 225) described presence as "experiencing the computer-generated environment rather than the actual physical locale." The PQ consists of 22 six-point Likert scale questions to capture user agreement covering five subscales: involvement, immersion, visual fidelity, interface quality, and sound. The first 19 questions, excluding sound items, were used to calculate the total PQ score, which ranged from 0 to 114. Table 9 presents the average score for the five subscales in the PQ; the averages indicate that all subscales had "above average" (>4) user agreement except interface quality. The below-average score for interface quality emphasized the need for a better visual display quality that does not interfere with performing tasks in future VR modules. The average PQ score for the 19 items was 78.7, indicating the user experience for the developed VR scenarios was in an acceptable range. The impact of independent variables on PQ is shown in Table 10. The ANOVA result showed that only nationality and age had a significant impact on the PQ score at the 0.05 significance level. The post hoc Scheffe test revealed that international students perceived a higher user experience than domestic students. Also, the post hoc test showed that students born in 1986-1990 were significantly different from students who were born in 1996-2000 at the 0.05 significance level. The mean PQ value suggested that older participants perceived greater user experience than the younger students in the VR simulation.

NASA Task Load Index (NASA TLX) Assessment
NASA TLX is a multi-dimensional scale that assesses a user's perceived workload for a given task [108]. Six subscales were used to calculate the overall workload estimates regarding different aspects of user experiences: mental demand, physical demand, temporal demand, performance, effort, and frustration. These subscales were designed to represent the user workload after a thorough analysis conducted on different types of workers performing various activities [110]. The overall score varied between 0-100, and higher scores indicated greater perceived workload for the given task.
The mean and median of the overall NASA TLX scores suggested that participants required approximately 36% work effort to perform the activities in the developed VR scenarios. The interquartile range showed that the middle 50% of users' overall scores fell between 26% and 47%. Figure 11 displays the weighted scores for the six subscales based on user responses. The performance dimension made the highest contribution to the overall index scores. Mental demand was the second-highest contributor, and physical demand made the lowest contribution to perceived workload in performing the tasks in the VR scenarios.
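As an illustration of how a weighted overall workload score can be obtained, the sketch below follows the standard NASA TLX procedure, in which each subscale rating (0 to 100) is weighted by the number of times that subscale was chosen in 15 pairwise comparisons. The paper does not describe its weighting step, so this procedure, the function name `weighted_tlx`, and the sample values are assumptions.

```python
SUBSCALES = (
    "mental demand",
    "physical demand",
    "temporal demand",
    "performance",
    "effort",
    "frustration",
)

# Six subscales give 15 pairwise comparisons, so the weights must sum to 15.
TOTAL_COMPARISONS = 15


def weighted_tlx(ratings, weights):
    """Return the overall weighted NASA TLX score (0 to 100) for one participant."""
    if set(ratings) != set(SUBSCALES) or set(weights) != set(SUBSCALES):
        raise ValueError("ratings and weights must cover all six subscales")
    if sum(weights.values()) != TOTAL_COMPARISONS:
        raise ValueError("pairwise-comparison weights must sum to 15")
    weighted_sum = sum(ratings[name] * weights[name] for name in SUBSCALES)
    return weighted_sum / TOTAL_COMPARISONS


if __name__ == "__main__":
    # Hypothetical ratings and weights for one participant (not the study data).
    ratings = {"mental demand": 50, "physical demand": 10, "temporal demand": 30,
               "performance": 60, "effort": 40, "frustration": 20}
    weights = {"mental demand": 4, "physical demand": 0, "temporal demand": 2,
               "performance": 5, "effort": 3, "frustration": 1}
    print(round(weighted_tlx(ratings, weights), 2))
```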
For further investigation, the impact of independent variables on the overall index score was analyzed. The normality of the distribution of the overall index score was tested with the Shapiro-Wilk normality test, and the p-value indicated that the scores were normally distributed (W = 0.9650, p = 0.4137). A one-way ANOVA was used to evaluate the differences in the levels of independent variables on the overall scores. As shown in Table 11 below, the results indicated none of the independent variables significantly impacted the overall index score.

Conclusions and Future Directions
The main focus of this study was to assess a student's systems thinking skills through developed VR scenarios. Researchers used a valid systems thinking skills tool developed by Jaradat [15] to construct VR scenarios representing complex, real-world problems. A VR retail store combined with realistic scenarios was used to evaluate students' level of complexity using six binary questions. Two scoring sheets were prepared to record a student's high-systematic approach, and the result showed the approach in which students reacted to the VR scenarios were not significantly different from their response obtained from the traditional systems skills instrument. This confirmed the construct validity of the ST skills instrument and the reliability of using VR scenarios to measure students' high systematic skills. The student's prior knowledge in VR significantly impacted his/her systematic skills, in which students with above-average prior VR exposure advanced the higher-systematic skills in the study.
The study showed gender does not affect the students' systems thinking skills. This result is consistent with other studies in the literature. For example, Stirgus et al. [111] showed that both male and female engineering students demonstrated a similar level of systems thinking skills in the domain of complex system problems. Cox et al. [112] also showed that gender had no effect on student's systems thinking ability based on a study conducted to investigate the systems thinking level of last-or penultimate-year of secondary-school (age 16-18 year) students in Belgium. The literature shows that the level of education is considered a significant factor in assessing individuals' systems thinking skills. For example, Hossain et al. [113] and Nagahi et al. [114] explained that individuals with higher education backgrounds tend to be more holistic thinkers. The findings of this study are consistent with previous results with regards to the individuals' simple average ST scores.
The efficacy results indicated that the developed VR scenarios met user expectations and are therefore an efficient assessment mechanism. The post-simulation sickness results indicated that the VR scenarios fall within an acceptable range for users and require no immediate modifications. The simulation sickness associated with the developed VR scenarios did not differ according to participants' previous simulation-related experience. The regular Unity field of view produced lower simulation sickness symptoms than the two new mitigation techniques (increasing reticle and peripheral view) employed with the VR simulation. The participants indicated 'above-average' agreement on the usability of the VR scenarios, which also implies that the scenarios are user friendly and easy to use. Furthermore, users experienced the virtual environment with no technical interference, apart from lower interface quality. The PQ results also showed that age positively influenced the user experience in the study.

Managerial Implications
Knowing an individual's systems thinking skills is vital for many organizational personnel, including recruitment managers and decision makers. A thorough review of the literature showed that practitioners have only limited tools and techniques for measuring an individual's systems thinking skills when making decisions in their organizations. In this study, the researchers used an advanced, multi-dimensional tool to capture users' systems thinking characteristics in a complex supply chain store. The developed VR scenarios demonstrate the effective use of advanced technologies to measure individual systems thinking skills. Unlike traditional paper-based evaluations, VR technology gives users an opportunity to interact with scenario-based, real-world, complex problems and respond accordingly. Some of the potential research implications can be categorized as follows.

• Many related theories, concepts, perspectives, and tools have been developed in the systems thinking field. Still, this study serves as a first attempt to bridge ST theories and the latest technology to measure an individual's ST skills by simulating real-world settings.

• This research used VR to replicate the real-world, complex system scenarios of a large retail supply chain; however, researchers and practitioners can apply the same concept to other areas such as military, healthcare, and construction by developing and validating different scenarios relevant to their field.

• The findings of this research confirmed that modern technology is a safe and effective means of measuring an individual's level of ST skills. These VR scenarios can work as a recommender system that assists practitioners and enterprises in evaluating individuals' or employees' ST skills.

Limitations and Directions for Future Studies
The current study measured high-systematic skills using only the Level of Complexity dimension of the ST skills instrument. To provide a complete assessment of an individual's ST skills, all seven dimensions will be modeled in a VR simulation in future studies. User responses to the efficacy measures drew the researchers' attention to new features for future studies. In addition, new mitigation techniques will be integrated into future VR modules to investigate whether they reduce simulation sickness. To provide better interface quality for users, advanced graphics will be included in future studies. Moreover, new evaluations will be used to assess the efficiency and effectiveness of the VR modules, along with alternative simulation technologies. Alternative multi-paradigm modeling tools and cross-platform game engines (e.g., Simio or Unreal Engine) can be used to evaluate user satisfaction in future studies. For this study, the researchers used the Oculus Rift S to connect users with the modeling software. Other, cheaper VR devices such as HTC Vive and PlayStation VR can be compared with the Oculus to explore possibilities for a better user experience. Because the sample size of this study was small, further studies with more data are needed to draw firmer conclusions about the proposed methodology.