Impact of Virtual Reality-Based Design Review System on User’s Performance and Cognitive Behavior for Building Design Review Tasks

Abstract: Virtual reality (VR) can potentially enhance various assessment-intensive design and construction tasks, such as construction design and review. However, it may lead to cognitive overload, adversely affecting participants' performance. It is critical to understand the effects of VR on cognitive behavior when implementing VR technology in the construction industry. The principal objective of this study was to investigate participants' cognitive load (CL), task performance (TP), and situational awareness (SA) in the VR environment when evaluating building designs in review tasks. Participants were asked to review the design, based on their memory, knowledge, and understanding, in one of three environments: paper-based, monitor-based, and immersive virtual environment. Participants' CL was measured using the National Aeronautics and Space Administration Task Load Index (NASA TLX), TP was evaluated on completion time and the number of errors correctly detected, and SA was assessed using the Situational Awareness Rating Technique (SART). The statistical results show a higher CL and better performance in the immersive virtual environment. These findings can contribute to a better understanding of the characteristics and capabilities of cognitive processes in design review activities in the VR environment. Based on NASA-TLX, the results show that participants in the non-immersive virtual environment (NIVE) perceived a lower CL than those in the immersive virtual environment (IVE). The results of this study indicate that participants performed better in the virtual environments, as they identified more errors there than with traditional drawings. However, the total cognitive score was greater in the virtual environments. IVR increased participants' understanding, while they were less aware of their surroundings. Participants' task completion time was shorter in the two virtual environments than with traditional paper-based drawings.
The key finding of this study is that the virtual environment affects participants' TP, CL, and SA in design review tasks. The findings suggest that these VREs aided construction professionals by providing exhaustive information in different formats. This research also helps us better understand cognitively demanding problems and can help design construction documents more appropriately, which will help professionals work more efficiently in virtual environments.


Introduction
The Architecture, Engineering, and Construction (AEC) industry produces complex, customized, temporary, and unique products. The design process is iterative and relies strongly on individual experience [1]. Designers use their mental abilities or supporting information documents to develop the AEC design model. One major challenge the AEC industry faces is an inefficient construction design approval process, owing to the slow adoption of modern technologies such as Building Information Modelling (BIM), Virtual Reality (VR), Augmented Reality (AR), and cloud computing [2][3][4]. These technologies have the potential to address these shortcomings, and their multifaceted industry applications make them well suited to construction projects [5]. BIM allows users to examine the design and functional properties of a building, as well as perform other tasks such as cost estimation, project planning, scheduling, resource management, and structural analysis [6][7][8]. It can also help the construction industry improve safety planning, on-site communication, constructability, and design review meetings [9][10][11]. Furthermore, VR technology has emerged recently and revolutionized multiple sectors. The AEC sector has combined VR with BIM to improve the visualization of virtual worlds and interaction with the real world and its components, since VR technologies combine information systems with immersive environments.
BIM-based VR technologies enable project stakeholders to walk through a virtual environment (VE) while viewing a 1:1 scale three-dimensional (3D) model. Users can navigate the model at the same scale as the actual building and review its design, e.g., the sill levels of windows and doors, ceiling heights, and beam and column sizes [12]. Understanding the significance of the Virtual Reality Environment (VRE) is important because it creates an intermediate design that the organization can evaluate during critical analysis [4]. Design review meetings in the VRE frequently require modifications between different components of the architectural design [13]. During the project phase, important decisions regarding cost, quality, and schedule influence the overall construction estimate. Therefore, each activity and its related materials and specifications are discussed in the design review meeting, and amendments are made to the initial draft before actual construction begins.
VR in construction design visualization reproduces the human experience of architecture in the real world and is expected to boost construction efficiency and save cost and time [14]. As a result, VR supports cognitively demanding construction tasks, including assembly placement [15] and arrangement and inspection [16], while minimizing mental effort and task completion time. VR can improve spatial and conceptual learning, immersion and presence, and cognitive and psychomotor outcomes. However, studies have shown that for cognitive outcomes such as knowledge and understanding, Immersive Virtual Reality (IVR) does not outperform traditional approaches or Non-Immersive Virtual Reality (NIVR). Because IVR is a 3D, 360° experience, it would likely provide more information than traditional methods [17]. Immersive interaction can lead to cognitive effects, wasted time, loss of contact with reality, and powerful emotions [18]. Augmented reality frequently supports users' cognitive abilities by providing superimposed information. However, such information can cause cognitive overload, which might negatively impact participants' performance [19]. Zhong found that VR training may help individuals' cognitive and executive function [20]. Another study stated that CL will play a major role in future applications of highly immersive VR devices and that many researchers want to explore CL in these new environments [21]. Cognitive Load (CL) is concerned with the transfer of knowledge from working memory to long-term memory. To date, no comprehensive studies have investigated the impact of VR on user performance and cognitive behavior in design review tasks in the building construction industry.
This study sets out to explore the impact of VR-based construction design review on construction professionals by investigating participants' CL, Task Performance (TP), and Situational Awareness (SA) in three distinct environments: 3D monitor-based VR, head-mounted display-based VR, and paper-based design review. An experimental approach was adopted to achieve the research objective: three participant groups were given residential building design review tasks using one of three techniques: a VR headset, a monitor screen, or traditional paper drawings. TP is evaluated on task completion time and error rate. CL is calculated using the National Aeronautics and Space Administration Task Load Index (NASA-TLX) [22], and SA is assessed using the Situational Awareness Rating Technique (SART) [23] in a site-like design simulation setting. The results provide important insights into participants' TP, CL, and SA in the three environments and into the impact of VR-based construction design tasks. The present research makes an important contribution and is the first extensive study to examine user performance and cognitive behavior in a VRE for design review tasks in the construction industry.

Literature Review
VR is a simulation of an environment, or a computer-generated VE, that allows participants to experience a place or event other than where they are physically present; the flight simulator is an early example of VR technology [24]. In 1838, Charles Wheatstone's work featured two mirrors positioned at a 45° angle to the user's eyes, each reflecting an image located off to the side; it was the first concept to give VR a sensory feeling of immersion [25]. In 1950, Morton Heilig invented the Sensorama, the first sensory display [26]. It was a scripted, arcade-like experience, and 11 years later he also invented the first head-mounted display (HMD) prototype, which provided stereoscopic images with stereo sound but no interactive response or motion tracking. According to previous studies, commercial VR development began in 1988, and in 1991 the first commercial VR entertainment system, called "Virtuality", was unveiled [27]. In 1992, Steuer defined VR as a type of human experience enabled by the sensation of being present in a given environment [28].
Owing to technological advancements, VR is used in many applications, including medical sciences, video gaming, cinema and entertainment, education and training, engineering, architecture, and urban planning. Palmer Luckey designed the prototype of the Oculus Rift, which was capable of rotational tracking [29]. In 2015, HTC and the Valve Corporation collaborated to develop the HTC VIVE VR headset and motion controllers, both built on Valve's SteamVR platform. This release introduced novel positional tracking technology, which used infrared light and specially designed wall-mounted base stations to track the user's location. At the start of 2017, Sony developed a similar tracking system for PlayStation VR and used the same technology to create a wireless headset. In 2019, Oculus launched the standalone Oculus Quest and the Oculus Rift S. These headsets used inside-out tracking, unlike the outside-in tracking of earlier headsets [30]. Later in 2019, Valve introduced a headset featuring a 130° field of view and off-ear headphones for comfort and immersion; its open-handed controllers support individual finger tracking, and the headset includes front-facing cameras and a front expansion slot designed for extensibility [31]. Oculus introduced the Oculus Quest 2 in 2020 with improved performance, a lower price, and a better screen. To use this headset, users had to sign in with a Facebook account [32]. In 2021, the European Union Aviation Safety Agency (EASA) approved the first VR-based Flight Simulation Training Device, which makes rotorcraft pilots safer by letting them practice dangerous maneuvers in a virtual environment [33]. As COVID-19 regulations were enacted in 2020 and 2021, the VR industry experienced a rapid boom.

Virtual Reality in Design and Construction
Recent technological advancements have enabled construction practitioners to improve the construction methodology and quality of project designs to achieve success. In the early design process, 2D architectural drawings cannot represent and communicate the full range of possible solutions. Evaluating a design against construction requirements and specifications is known as a design review [34]. Previously, design reviews were commonly conducted using two-dimensional (2D) computer-aided design and physical assets [35]. Owing to the rapid development of technology in the construction industry, design review has evolved to include different visualization tools [36,37]. Computer-generated designs and visualizations have improved in the recent past, and this continual improvement in visualization has reduced design review problems [36,38].
One visualization technology gaining interest in design reviews is IVR. One study [39] found that design reviews in a VE result in a better understanding of the proposed design, more efficient meetings, and better team management. Another study [40] proposed that VR engages reviewers by reducing the effort required to contemplate the design; its authors also concluded that the level of detail in the VR model is important because too many details may disproportionately affect the original purpose of the review. VR applications have also been seen in industries other than construction, such as reviewing the performance of nuclear power plants [41], patient rooms in medical facilities [42], education [43], and safety training [44]. Paes and Irizarry compared traditional workspaces with IVR platforms and found that users' spatial perception improved in an immersive virtual environment (IVE) [45]. Florio suggests that design reviewers use visualization tools, models, and prototypes to "confirm or reject each hypothesis" during this experimental process, known as design review or critical analysis [14]. Moreover, virtual 3D models help stakeholders who are not familiar with the symbols and notations of 2D drawings to understand the design, resulting in improved communication, collaboration, and the development of more integrated solutions [46][47][48].

Impact of Virtual Reality on the Cognitive Load, Task Performance, and Situational Awareness
Research suggests that the study of cognition developed in tandem with the advancement of computers and artificial intelligence (AI) [14]. The term cognition is associated with computing and analyzing information. Researchers define it as the ability to acquire knowledge involving rich information through reasoning and perception. Human cognition involves gathering information and developing experiences from interactions with the environment, as shown in Figure 1 [49]. Every human perceives, processes, and creates a mental portrayal of their particular reality. According to [50], designers think about what they are doing while doing it, a process called "reflection-in-action".
VR headsets are consumer-grade products that remain scarce; thus, measuring TP with these commercial VR systems is difficult. Furthermore, these VR systems are composed of various components, including VR headsets, desktop monitors, smartphones, and VR applications, each of which directly affects the user's performance. TP measures assume that an individual's mental workload while interacting with the system during a task is a good indicator of CL [51]. Task completion time and error identification rate are examples of CL and TP metrics [52].
The NASA Task Load Index (TLX) is a subjective workload assessment tool that allows researchers to assess the workload of participants working with various human-machine interface systems. In 1988, Hart and Staveland developed the NASA TLX questionnaire to quantify the physical and mental load associated with performing a given task [53]. NASA TLX calculates an overall CL from a weighted average of ratings on six subscales: mental demand, physical demand, temporal demand, performance, effort, and frustration. NASA TLX has been used to measure CL in physical, virtual, simulated, and laboratory settings [15,19,54,55].
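The weighting described above can be sketched in Python. This is a minimal sketch of the standard Hart and Staveland scoring, assuming the usual 0-100 rating scale and 15 pairwise comparisons in which the rater picks the more important subscale of each pair; the subscale keys and function names are hypothetical, and the study itself used a modified five-item version described in the Measurements section.

```python
from itertools import combinations

# Hypothetical identifiers for the six NASA-TLX subscales.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")
PAIRS = list(combinations(SUBSCALES, 2))  # the 15 unordered pairs


def tlx_weights(choices):
    """Turn the 15 pairwise choices into weights (each 0-5, summing to 15)."""
    if len(choices) != len(PAIRS):
        raise ValueError("expected exactly one choice per pair (15)")
    weights = dict.fromkeys(SUBSCALES, 0)
    for (a, b), chosen in zip(PAIRS, choices):
        if chosen not in (a, b):
            raise ValueError(f"{chosen!r} is not in the pair ({a}, {b})")
        weights[chosen] += 1
    return weights


def weighted_tlx(ratings, weights):
    """Overall workload = sum(weight * rating) / 15, with ratings on 0-100."""
    total = sum(weights.values())
    if total != len(PAIRS):
        raise ValueError("weights must sum to 15")
    return sum(weights[s] * ratings[s] for s in SUBSCALES) / total


if __name__ == "__main__":
    # Toy example: the first subscale of every pair is judged more important.
    weights = tlx_weights([a for a, _ in PAIRS])
    ratings = {"mental": 70, "physical": 10, "temporal": 60,
               "performance": 40, "effort": 50, "frustration": 30}
    print(weights)
    print(round(weighted_tlx(ratings, weights), 2))  # 46.67
```

Deriving weights from pairwise choices, rather than asking for weights directly, is what makes the weighted score reflect each rater's own view of which demands mattered for the task.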
In their examination of SA measurement, Salman et al. [56,57] grouped existing techniques into five categories: (1) physiological methods [58], such as heart rate, electroencephalography (EEG) [59,60], and, most recently, electrodermal activity (EDA) [61]; (2) performance methods, such as task success or failure and hazard detection; (3) observer rating techniques, such as the situational awareness behavioral rating technique [62]; (4) self-rating techniques, such as the crew and mission awareness rating scales and the Situational Awareness Rating Technique (SART) [23]; and (5) freeze-probe techniques, such as the situational awareness global assessment technique (SAGAT). Each of these techniques has benefits and drawbacks. SART is generally acknowledged as low-cost, simple to administer, and easy to analyze [63,64]. It has three dimensions: (1) demand on attentional resources (D), (2) supply of attentional resources (S), and (3) understanding of the situation (U).

Research Methodology
This paper proposes a new methodology to achieve the research objectives. For this purpose, a realistic experiment was created around a residential building to simulate construction experts' design review meetings aimed at finding design errors. The VRE was created in the university's BIM laboratory. Design review tasks were assigned to participants in one of the three modalities shown in Figure 2: one group used an Oculus Quest 2 headset-based IVE, the second group used a monitor-based non-immersive virtual environment (NIVE), and the third group used traditional paper-based drawings. The experiment measured the impact of the VEs and traditional paper-based review on participants' TP (number of errors identified and task completion time), CL (using NASA-TLX), and SA (using SART). The experimental steps and procedure are shown in Figures 2 and 3.

Participants
Participants were selected from post-graduate students of the Civil Engineering Department of the National University of Science and Technology based on their knowledge of the AEC industry. Ninety-six students accepted the invitation to participate after being informed by email and face-to-face. All participants were civil engineering post-graduate students: 43 from the construction engineering and management department, 29 from the structural engineering department, and 24 from the transportation engineering department. Of these, 33 had one to four years of field experience. Participants were 22-30 years old (mean age 26); 64 were male and 32 were female, and 22 had prior experience with virtual reality. Participants were divided into three equal groups of thirty-two: one for the immersive environment using Oculus Quest 2, one for the monitor-based VE, and one for the paper-based drawings. The immersive group included the 22 participants with prior VR experience and 10 volunteers without any. To avoid bias in the data, these 10 participants were given 25-30 min of VR experience through games at least one day before the experiment. Participants' demographics were gathered to examine how they might affect the findings. Participants were asked about their knowledge of VR games because the study's virtual interface resembles that of VR games; participants who had played such games, interacted with, or knew this technology were recruited. These environments can affect participants' performance and sense of presence.

Task Overview
All participants were asked to find the design errors in the drawings and design of a four-story residential building. Typical design errors and their categorization were collected from industry experts through interviews and a literature review [65]. The industry experts came from various construction sectors, including clients, consultants, contractors, and academia. These design errors were incorporated into the building model used in our study. Each participant played the role of a construction design reviewer tasked with identifying 12 types of design errors: (1) stair not connected to the upper floor, (2) slab and door/window clash, (3) column and door/window clash, (4) stair and beam clash, (5) stair and slab clash, (6) stair and column clash, (7) sill height error, (8) window sill height error, (9) beam size changed, (10) column size changed, (11) extra beam, and (12) floor level changed.

Experimental Procedure
In the paper-based design review experiment, participants were asked to identify design errors of each of the twelve types listed above using their mental abilities, as shown in Figures 4 and 5. The second group of participants performed the same task in the NIVE, a monitor-based design review in which they navigated a 3D model of the building and assessed its errors. The 3D building model was created in Revit 2020 and converted into a game-like VE, which participants navigated using computer input devices. The last group of participants performed the same task in an IVE using Oculus Quest 2 (Figure 6).

The participants in all three groups were asked to complete the design review task as quickly and effectively as possible, and their review time and the number of errors identified were recorded. NASA-TLX was then used to measure CL at the end of each group's experiment. To measure participants' SA in the two VEs, a realistic construction environment was created using the sound of a construction site. Participants' CL and TP were measured using the same techniques described above, while their SA was measured using SART at the end of that session.

Measurements
The NASA-TLX method was used to measure participants' CL. It is widely adopted because it is low-cost and provides a subjective mental workload (MWL) assessment. It comprises six elements: mental demand, physical demand, temporal demand, effort, frustration, and performance. All elements except physical demand ("physical effort required to do a task") were used to measure participants' CL, because none of the three environments in our study required physical effort. Performance, although already one of the NASA-TLX elements, was also measured directly, because NASA-TLX performance incorporates self-esteem, satisfaction, and motivation. Accordingly, participants in each experiment rated mental demand, temporal demand, effort, frustration, and performance on a scale from 1 (low) to 5 (high), as shown in Table 1.
Table 1. Questions that were asked to measure cognitive load using NASA-TLX.

Element | Question
Mental demand | Was the task mentally demanding?
Temporal demand | Was the task temporally demanding (time pressure for completing the task)?
Performance | How successful were you in completing the task?
Effort | How much has hard work been performed to achieve the task?
Frustration | How much were you insecure, discouraged, irritated, or stressed during the task?
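The text does not state how the five retained ratings were combined into an overall CL score. A common option for unweighted data is the "Raw TLX", the plain mean of the subscale ratings; the sketch below assumes that choice and the 1-5 scale of Table 1. Because the performance question asks how successful participants were (higher means better), the sketch also offers optional reverse scoring so that every item points toward higher workload; whether the study did this is not stated, so by default ratings are left unchanged. All names are hypothetical.

```python
# Hypothetical keys for the five items kept in this study (physical demand dropped).
STUDY_ITEMS = ("mental", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings, items=STUDY_ITEMS, low=1, high=5, reverse=()):
    """Unweighted ('Raw TLX') score: the mean of the retained item ratings.

    Items named in `reverse` are mirrored on the scale (low + high - rating),
    e.g. to make a 'how successful were you?' rating point toward workload.
    """
    scores = []
    for item in items:
        value = ratings[item]
        if not low <= value <= high:
            raise ValueError(f"{item} rating {value} is outside {low}-{high}")
        scores.append(low + high - value if item in reverse else value)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    answers = {"mental": 4, "temporal": 3, "performance": 4, "effort": 5, "frustration": 2}
    print(raw_tlx(answers))                            # 3.6
    print(raw_tlx(answers, reverse=("performance",)))  # 3.2
```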
SART is a widely used technique for measuring SA. It is a subjective rating technique for assessing a participant's SA after a trial. SA was measured at the end of the experiment using a seven-point Likert scale ranging from 1 (low) to 7 (high). The technique contains ten elements: (1) information quantity, (2) information quality, (3) familiarity, and the remaining elements listed in Table 2. These elements are divided into three categories: understanding of the surrounding conditions (U), demand on attentional resources (D), and allocation (supply) of attentional resources to the present situation (S), where U is the sum of items (1)-(3), D is the sum of items (4)-(6), and S is the sum of items (7)-(10). Each participant's overall SART score is calculated using Equation (1):
$$SA = U - (D - S) \tag{1}$$
where SA is situational awareness, U is understanding, D is attentional demand, and S is attentional supply. Finally, TP was measured directly from the task completion time (how long participants needed to complete the design review task) and the number of errors correctly identified in each environment during the experimental session.

Item | Question
Information quantity | How much information about the surrounding did you take in?
Information quality | How well did you understand/comprehend the information about the surroundings that you took in?
Familiarity | How familiar were you with the surroundings during the task?
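Equation (1) can be sketched in Python as follows. The sketch assumes the ten ratings are supplied in the item order used in the text (items 1-3 understanding, 4-6 demand, 7-10 supply), each on the 1-7 scale; the function name is hypothetical. With this grouping the score ranges from -14 (U = 3, D = 21, S = 4) to 46 (U = 21, D = 3, S = 28).

```python
def sart_score(ratings):
    """Return (U, D, S, SA) with SA = U - (D - S) for ten SART ratings on 1-7.

    Items are grouped as in the text: 1-3 understanding (U),
    4-6 attentional demand (D), 7-10 attentional supply (S).
    """
    if len(ratings) != 10:
        raise ValueError("SART needs exactly ten item ratings")
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("every rating must lie on the 1-7 scale")
    understanding = sum(ratings[0:3])
    demand = sum(ratings[3:6])
    supply = sum(ratings[6:10])
    return understanding, demand, supply, understanding - (demand - supply)


if __name__ == "__main__":
    print(sart_score([5, 6, 4, 3, 4, 5, 6, 5, 4, 6]))  # (15, 12, 21, 24)
```

Returning U, D, and S alongside SA keeps the three dimensions available for separate comparison across environments, not just the combined score.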

Data Analysis Techniques
Normality tests are sensitive to sample size. The Shapiro-Wilk and Kolmogorov-Smirnov tests are the best-known normality tests [66]. The Shapiro-Wilk test is commonly used to assess normality in samples of fewer than 50 participants [67]. It has become a popular normality test because of its good power properties [68]: it detects departures from normality due to skewness, kurtosis, or both [69], and it performs well even with small samples. The Kolmogorov-Smirnov test also checks normality; it is more general but less powerful than the Shapiro-Wilk test [68]. In this test, the distribution of the test statistic does not depend on the cumulative distribution function being tested, and the test is exact. In this research, both tests were performed, as each group contained 32 participants. For both tests, the null hypothesis is that the data are normally distributed, and the alternative hypothesis is that they are not. The significance level was set at 0.05. If the p-value is greater than 0.05, we fail to reject the null hypothesis that the data are normally distributed; if it is less than 0.05, we reject the null hypothesis and conclude that the data are not normally distributed. Depending on these results, a parametric or non-parametric test was performed.
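The decision rule above can be sketched with SciPy's `scipy.stats.shapiro` and `scipy.stats.kstest`. This is an illustration, not the authors' code; the study does not say which software was used. Two assumptions are worth flagging. First, the KS test here compares each sample with a normal distribution whose mean and standard deviation are estimated from that same sample; without the Lilliefors correction (which some statistics packages apply) this makes the KS p-value conservative. Second, the text says only that a "parametric" test would be used for normal data; naming one-way ANOVA as that alternative is an assumption.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level stated in the text


def normality_check(sample, alpha=ALPHA):
    """Shapiro-Wilk and Kolmogorov-Smirnov p-values plus the resulting decision."""
    x = np.asarray(sample, dtype=float)
    sw_p = stats.shapiro(x).pvalue
    # KS against N(sample mean, sample SD); the parameters come from x itself,
    # so this p-value is conservative (no Lilliefors correction applied).
    ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue
    # H0: data are normal. Fail to reject only if both p-values exceed alpha.
    return {"shapiro_p": float(sw_p), "ks_p": float(ks_p),
            "normal": bool(sw_p > alpha and ks_p > alpha)}


def choose_test(groups, alpha=ALPHA):
    """Pick a parametric test only if every group passes both normality tests."""
    if all(normality_check(g, alpha)["normal"] for g in groups):
        return "one-way ANOVA (assumed parametric choice)"
    return "Kruskal-Wallis H"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    skewed = rng.exponential(scale=1.0, size=32)  # 32 per group, as in the study
    print(normality_check(skewed))
    print(choose_test([skewed, skewed, skewed]))
```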
The non-parametric Kruskal-Wallis H test determines whether there are statistically significant differences between two or more independently sampled groups [70]. This test has four assumptions: (i) the dependent variable is measured at least at the ordinal level; (ii) the independent variable consists of two or more categorical, independent groups; (iii) the observations are independent, with no relationship between the observations within each group or between the groups; and (iv) the shape of each group's distribution must be determined, because it governs how the results are interpreted. The null hypothesis of the Kruskal-Wallis H test is that there is no significant difference between the sample distributions, and the alternative hypothesis is that at least one distribution differs. The significance level was set at 0.05. If the p-value is less than 0.05, we reject the null hypothesis and conclude that there is a statistically significant difference between the sample distributions; if it is greater than 0.05, we fail to reject the null hypothesis. These tests assess whether there is any statistically significant difference between the paper-, monitor-, and Oculus Quest 2-based design review tasks.
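The H statistic itself is straightforward to compute. The sketch below is a dependency-free version for exactly three groups, matching the three environments in this study: it pools and ranks all observations (averaging ranks for ties), computes H, applies the usual tie correction, and converts H to a p-value with the chi-square approximation. With three groups there are 2 degrees of freedom, for which the chi-square survival function is exactly exp(-H/2). For other numbers of groups, `scipy.stats.kruskal` is the usual choice; the function names here are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 0-based positions i..j -> mean rank
        i = j + 1
    return ranks


def kruskal_wallis_3(g1, g2, g3, alpha=0.05):
    """Kruskal-Wallis H test for exactly three independent groups (df = 2).

    Returns (H, p, reject_h0); H0 is that all groups share one distribution.
    """
    groups = [list(g1), list(g2), list(g3)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    h /= correction

    p = math.exp(-h / 2)  # chi-square survival function with 2 degrees of freedom
    return h, p, p < alpha  # reject H0 when p < alpha


if __name__ == "__main__":
    print(kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

With 32 participants per group, the chi-square approximation used here is generally considered adequate; very small groups would call for exact tables instead.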

Data Analysis
After the data were collected from the participants, they were analyzed. The Shapiro-Wilk and Kolmogorov-Smirnov tests produced p-values below 0.05 for this study's data, indicating that the data are not normally distributed, as shown in Table 3. All p-values for the five elements of CL were below 0.05 in both tests across the three instructional media. Therefore, the non-parametric Kruskal-Wallis H test, which is more appropriate for non-normally distributed data, was used for the rest of the analysis.

Experiment
As discussed above, the Kruskal-Wallis H test was carried out to determine how traditional paper-based, monitor-based, and Oculus Quest 2-based design review tasks affected the user's CL. The results are shown in Figure 7. No significant difference in mental demand was found between the paper- and monitor-based environments or between the monitor- and Oculus Quest 2-based environments. However, there was a statistically significant difference between the paper and Oculus Quest 2 environments. For temporal demand, there was no significant difference between the first two media, whereas all other medium combinations differed significantly. The performance, effort, and frustration elements of CL showed no significant difference among the three media. The overall CL showed no statistically significant difference among the three environments for design review tasks (p > 0.05). Nevertheless, the immersive environment of the Oculus Quest 2 was the most cognitively demanding of the three modalities. Sweller and Rogers explained that a review task will be impaired or fail if the required CL exceeds the limits of working memory. A detailed comparison of the three design review groups' NASA-TLX data shows that participants in the non-immersive virtual environment (NIVE) perceived a lower CL than those who used the immersive virtual environment (IVE).

Based on the above experimental data, participants' mean completion times (Figure 8) and error identification (Table 4) were compared using the Kruskal-Wallis H test. Figure 8 compares the completion times of the three design review groups: the Oculus Quest 2 group took 17.49 min on average to complete the task, less than the monitor-based (19.37 min) and paper-based (24.72 min) groups. With p < 0.05, the difference in average completion time among the paper-based, monitor-based, and Oculus Quest 2-based groups is statistically significant.
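The pairwise statements above (for example, paper versus Oculus Quest 2 for mental demand) imply a post-hoc procedure after the omnibus Kruskal-Wallis test, but the text does not name it. One common choice is Dunn's test with a Bonferroni adjustment; the sketch below assumes that procedure and uses hypothetical scores, so it illustrates the technique rather than reproducing the study's analysis. The function names are illustrative.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks of the pooled data; tied values share their mean rank."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def dunn_bonferroni(groups):
    """Dunn's pairwise z-tests on pooled ranks, Bonferroni-adjusted.

    `groups` maps a group label to its scores. Returns a dict
    {(label_a, label_b): (z, adjusted_p)}.
    """
    labels = list(groups)
    pooled = [x for lab in labels for x in groups[lab]]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_rank, start = {}, 0
    for lab in labels:
        size = len(groups[lab])
        mean_rank[lab] = sum(ranks[start:start + size]) / size
        start += size
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    # Tie adjustment to the variance of the ranks.
    tie_term = sum(t ** 3 - t for t in counts.values()) / (12 * (n - 1))
    pairs = list(combinations(labels, 2))
    results = {}
    for a, b in pairs:
        se = math.sqrt((n * (n + 1) / 12 - tie_term)
                       * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
        p_two_sided = math.erfc(abs(z) / math.sqrt(2))
        results[(a, b)] = (z, min(1.0, p_two_sided * len(pairs)))
    return results


scores = {"paper": [1, 2, 3], "monitor": [4, 5, 6], "quest": [7, 8, 9]}
for pair, (z, p_adj) in dunn_bonferroni(scores).items():
    print(pair, round(z, 3), round(p_adj, 3))
```

With these invented scores, only the paper versus Quest pair stays below 0.05 after adjustment, mirroring the kind of pattern reported for mental demand.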
The Kruskal-Wallis H test was also used to compare error identification across the three groups, as shown in Table 4.
The SART score was calculated using Equation (1). The Oculus Quest 2 group had the highest cumulative SA score (12.34), followed by the monitor-based group (12.0), and the paper-based group had the lowest score (10.00). The SA values differed significantly among the three media (p < 0.05). For every error type in each medium, Table 4 shows the number of errors placed, the mean (SD), the Kruskal-Wallis H value, and the significance (p-value) across the three media. As discussed above, 22 errors of 12 types were placed.
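Equation (1) is not reproduced in this section. Assuming it is the standard SART combination rule, SA = U − (D − S), the sketch below applies it to the cumulative group means of understanding (U), attentional demand (D), and attentional supply (S) reported in the Results. These means reproduce the reported totals to within rounding (12.35 versus the reported 12.34, and 10.01 versus 10.00), which supports the assumption without confirming it.

```python
def sart_score(u: float, d: float, s: float) -> float:
    """Standard SART combination rule: SA = U - (D - S)."""
    return u - (d - s)


# Cumulative group means (U, D, S) as reported in the Results text.
reported = {
    "paper-based": (8.13, 9.25, 11.13),
    "monitor-based": (10.00, 8.88, 10.88),
    "Oculus Quest 2": (10.69, 8.46, 10.12),
}
for medium, (u, d, s) in reported.items():
    print(f"{medium}: SA = {sart_score(u, d, s):.2f}")
# paper-based: SA = 10.01
# monitor-based: SA = 12.00
# Oculus Quest 2: SA = 12.35
```

The rule rewards understanding and spare attentional supply while penalizing demand, so a group can score higher overall even with lower attentional supply if its understanding is sufficiently higher.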
A statistically significant difference among the three groups was found in detecting the changed stair beam and column size errors. In contrast, there was no statistically significant difference among the three groups for the remaining errors: stair not connected to the upper floor, slab and door/window, column and door/window, stair and beam, stair and column, sill height error, sill height of bathroom windows/exhaust fan, beam size changed, floor level changed, and extra beam. Overall, of the 22 design errors intentionally placed in the building design model, the Oculus Quest 2 group identified 12.28 errors on average, the monitor-based group 12.13, and the paper-based group 10.42. The p-value was greater than 0.05, which means that no significant difference was found in the total number of errors detected across the three media, as shown in Table 4.

The Kruskal-Wallis H test was also applied to compare SA, which is the ability to know, perceive, and predict the factors and variables that can affect a participant's performance in a specific situation or environment [23]. The SART scores were obtained from the same experiment, in which participants rated themselves. Table 5 shows the cumulative means and standard deviations of SA, the Kruskal-Wallis H values, and their respective levels of significance.

Figure 7. Cognitive load score of participants in the different groups.

As stated above, the SART measures are divided into three main groups, attentional demand (D), attentional supply (S), and understanding (U), which are further divided into ten elements. A statistically significant difference was found in understanding (U) and attentional supply (S) (p < 0.05), but there was no statistically significant difference in attentional demand (D) (p > 0.05).
Of the ten SART elements, seven (instability of situation, variability of situation, division of attention, spare mental capacity, information quantity, information quality, and familiarity) showed no statistically significant differences. The other three elements, complexity of the situation, arousal, and concentration, showed significant differences. Examining the cumulative means of the three main groups, the Oculus Quest 2 had a higher understanding U (10.69) than the monitor-based (10.00) and paper-based (8.13) environments. The cumulative mean of attentional demand D was higher for the paper-based drawings (9.25) than for the monitor-based (8.88) and Oculus Quest 2 (8.46) environments, and attentional supply S was also highest for the paper-based drawings (11.13), compared with the monitor-based drawings (10.88) and the Oculus Quest 2 (10.12). Attentional demand had a p-value greater than 0.05, indicating no significant difference, whereas attentional supply and understanding differed significantly among the three media. The overall SA score was highest for the Oculus Quest 2 (12.34), followed by the monitor-based (12.0) and paper-based (10.0) media. The overall SA also differed significantly among the three media.
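The ten elements and three main groups described above can be expressed as a lookup table that turns one participant's item ratings into D, S, and U totals, which then enter the SART combination rule. The grouping follows the standard ten-item SART and agrees with the elements named in this section; the item identifiers, the 1 to 7 rating scale, and the helper name are assumptions, since the questionnaire itself is not reproduced here.

```python
# Standard ten-item SART grouping; assumed to match the study's questionnaire.
SART_DIMENSIONS = {
    "D": ("instability", "complexity", "variability"),
    "S": ("arousal", "concentration", "division_of_attention",
          "spare_mental_capacity"),
    "U": ("information_quantity", "information_quality", "familiarity"),
}


def sart_components(ratings: dict[str, float]) -> dict[str, float]:
    """Sum one participant's item ratings into the D, S and U groups."""
    missing = [item for items in SART_DIMENSIONS.values()
               for item in items if item not in ratings]
    if missing:
        raise KeyError(f"missing SART items: {missing}")
    return {dim: sum(ratings[item] for item in items)
            for dim, items in SART_DIMENSIONS.items()}


# Hypothetical participant who rated every item 4 on a 1-7 scale.
example = {item: 4 for items in SART_DIMENSIONS.values() for item in items}
print(sart_components(example))  # {'D': 12, 'S': 16, 'U': 12}
```

Group means of these per-participant totals correspond to the cumulative D, S, and U values compared across media.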

Cognitive Load and Task Performance
The NASA TLX results showed that the CL of participants in the immersive and non-immersive environments was higher than with paper-based drawings. Both VEs can be very distracting and overstimulating because they offer a realistic, three-dimensional 360° experience, and users obtain much more information from these VEs than from the traditional medium [71]; this is why participants' mental demand was slightly higher in the VEs in our experiment. Temporal demand was also lower for the paper-based and monitor-based media than for the Oculus Quest 2 because the immersive environment has a higher level of immersion and temporal dissociation [72].
Participants reported a higher CL in the Oculus Quest 2 (2.15) than in the monitor-based (2.045) and paper-based (2.005) environments. Still, their performance was better in the VRE, in terms of both the time required to complete the task and the number of errors identified in the design review, because participants had to focus on a single source of information or display system at a time. Moreover, the simulated VEs reduced the participants' design review effort because participants did not have to shift their gaze between drawing pages; hence, effort was lower in these environments than in the traditional paper-based review. By removing the need to flip between drawing sheets, the virtual display systems had a positive impact on participants' performance. They also conveyed dimensional and depth information, such as the distance between the slab and doors and windows, the slab-to-floor height, and the sizes of beams and columns and whether these sizes were appropriate, because participants could adjust their viewing height in the VE. This increased participants' performance in finding building design errors in the VEs compared with the traditional environment. In addition, the participants in the two VEs completed the task faster than the paper-based group.
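The CL values above are presumably group means of per-participant NASA-TLX scores. This section does not state whether the raw (unweighted) or weighted TLX was used, nor the rating scale, so the sketch below supports both, assuming the five subscales reported in the Results (physical demand is not reported). The ratings and weights are hypothetical; in the classic weighted TLX, each weight is the number of times a subscale was chosen in pairwise comparisons.

```python
from typing import Dict, Optional

# Subscales reported for CL in this study (physical demand is not reported).
SUBSCALES = ("mental", "temporal", "performance", "effort", "frustration")


def tlx_score(ratings: Dict[str, float],
              weights: Optional[Dict[str, float]] = None) -> float:
    """Raw TLX (unweighted mean) or, if weights are given, a weighted mean.

    Weighted form: sum(weight * rating) / sum(weight), where each weight is
    how often that subscale was preferred in pairwise comparisons.
    """
    if weights is None:
        weights = {name: 1.0 for name in SUBSCALES}
    total_weight = sum(weights[name] for name in SUBSCALES)
    return sum(weights[name] * ratings[name] for name in SUBSCALES) / total_weight


# Hypothetical ratings and pairwise-comparison weights for one participant.
ratings = {"mental": 3, "temporal": 2, "performance": 2,
           "effort": 1, "frustration": 2}
weights = {"mental": 4, "temporal": 3, "performance": 1,
           "effort": 1, "frustration": 1}
print(tlx_score(ratings))           # 2.0
print(tlx_score(ratings, weights))  # 2.3
```

The weighted form lets a participant who found mental demand most important raise its influence on the overall score, which is why the two variants can rank groups differently.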

Situational Awareness
In our experiment, the SART results showed that participants using the Oculus Quest 2 and the monitor-based group performed better than the traditional paper-based group. Participants using the Oculus Quest 2 could also remain aware of their surroundings because the headset can switch on its camera view when it is double-tapped. This feature may have contributed to their better performance. The 3D design model presented on the monitor and in the Oculus Quest 2 appeared to help participants understand their task; with respect to SA, both VE groups performed better than the traditional group. Attentional demand (D) and attentional supply (S) were lower in the two VEs than in the paper-based environment, whereas understanding (U) was higher. Participants who used the Oculus Quest 2 focused solely on the immersive VE, which made it easy to comprehend their surroundings by using the cognitive resources of attentional supply (arousal, concentration, division of attention, and spare mental capacity). A design reviewer must perform various cognitive tasks simultaneously, such as analyzing, comprehending, remembering, and considering alternatives, if any, and must be fully aware of their surroundings to perform well. In general, using VR headsets (HMDs) could improve performance in the industry and reduce the risks and errors that arise during project execution.

Limitations
Although the study was conducted successfully, some limitations must be considered. First, the experiment produced mixed results, some statistically significant and some not, and the small number of participants may limit the generalizability of the study because of human heterogeneity; for example, how someone performs a task using a new technology such as virtual reality can depend on their acceptance of the technology and their prior performance. Although participants were given comprehensive pre-training sessions, their learning abilities still varied. Participants' cognitive ability to find design errors in the building model also differs, which may lead to misinterpretation of the results. In addition, the self-assessment questionnaires may have skewed the results, because participants might interpret the questions differently and answer them according to their own understanding. Future studies should minimize human variation in the experiment to confirm the results of our analysis. Physiological techniques could help here: electroencephalography (EEG) can capture mental stress during demanding tasks and help identify optimal task allocation, workplace efficiency, and workspace safety, while electrodermal activity (EDA) measures skin response during cognitively demanding activity. Because of limited resources, we measured participants' CL, SA, and TP subjectively, although all of these can be measured with such advanced techniques. This research is also limited by the fact that it used only the Oculus Quest 2 VR headset. Other headsets with different resolutions and refresh rates may produce different results, and a future study investigating how different VR headsets, with different resolutions and refresh rates, affect user performance would be of interest.
Despite the limitations mentioned above, this research extends knowledge and understanding of the impact of VR and cognitive issues on design review tasks in the construction industry. Further research might measure SA in a more realistic environment by adding dimensions to the simulated environment, such as hearing, touch, smell, and the ability to interact with and affect the surroundings.

Conclusions
In recent years, VR has been adopted in the AEC industry, where it helps construction stakeholders collaborate effectively and better understand and visualize information. However, this additional information may cause cognitive stress for users, negatively affecting their performance.
This research examined the impact of VR-based design review tasks on construction professionals. The study analyzed TP, CL, and SA in building design review tasks performed by three groups, each working in one of three environments: paper-based, 3D monitor-based VR (non-immersive), and head-mounted VR (immersive). The BIM design model of the building was created from 2D paper drawings, and design errors were incorporated into both the paper-based drawings and the BIM model. Participants were tasked with identifying the design errors in their assigned environment. Before performing the task, participants were given a demonstration of the medium they would use and the task they had to perform. At the end of the task, they were asked questions, and their performance was analyzed.