The Effect of Task Complexity on Time Estimation in the Virtual Reality Environment: An EEG Study

Abstract: This paper investigated the effect of task complexity on time estimation in the virtual reality environment (VRE) using behavioral, subjective, and physiological measurements. Virtual reality (VR) is not a perfect copy of the real world, and individuals perceive time duration differently in the VRE than they do in reality. Though many researchers have found a connection between task complexity and time estimation under non-VR conditions, the influence of task complexity on time estimation in the VRE remains unknown. In this study, twenty-nine participants performed a VR jigsaw puzzle task at two levels of task complexity. We observed that as task complexity increased, participants showed larger time estimation errors, reduced relative beta-band power at Fz and Pz, and higher NASA-Task Load Index scores. Our findings indicate the importance of controlling task complexity in the VRE and demonstrate the potential of using electroencephalography (EEG) as a real-time indicator of complexity level.


Introduction
Virtual reality (VR) technology provides an immersive computer-generated environment in which individuals are able to interact with multisensory input [1,2]. As VR technology has become increasingly portable and inexpensive in recent decades, VR applications have expanded into diverse education and training areas, where they are used to reproduce scenarios that may be expensive or hard to mimic in real life [3]. However, because the head-mounted display (HMD) that is part of VR technology blocks real-world external stimuli, it is challenging for VR users to estimate the amount of time they have spent inside the virtual reality environment (VRE) [1].
Time estimation refers to the process by which an individual subjectively judges duration. Two categories of time estimation have been defined. In the prospective time estimation paradigm, participants are informed that they are required to estimate duration prior to starting a task; in contrast, in the retrospective time estimation paradigm, participants are not informed that they are required to estimate duration prior to starting a task. Both paradigms ask participants about duration after the task [4,5]. Because participants in many experimental settings surmise that they may be asked to judge duration, prospective time estimation has been used more widely in studies than retrospective time estimation [5]. Prospective time estimation is explained by the attention allocation model, which holds that fewer resources are available for processing time estimation when other cognitive tasks require more attentional resources [6-8].
Multiple studies have underlined the importance of time estimation in indicating task performance and users' satisfaction in diverse task settings [9,10]. Accurate time estimation is known to be an indicator of enhanced user experience, high decision-making quality, and high academic performance. For example, users were likely to overestimate the time they spent on webpage loading and downloading tasks, and their user experience ratings decreased correspondingly [11]. In addition, inaccurate time estimation has been shown to result in customers' dissatisfaction with waiting in line [12,13]. Previous studies have also paid attention to how time estimation influences decision-making. Klapproth's study [14] indicated that when individuals made larger time estimation errors, they tended to make impulsive decisions without evaluating the value of the options. Another study, too, found that individuals made impulsive decisions, choosing instant rewards over delayed rewards with greater value [15], when they overestimated the time. In the academic arena, Aitken [16] indicated that students who underestimated the amount of time they had spent reading textbooks and answering exercise questions were more likely to delay starting coursework and to have lower academic performance. Josephs and Hahn [17] observed that students who made accurate time estimations when reading a psychology manuscript tended to have higher grade-point averages (GPAs).
Though researchers have realized the importance of time estimation, few studies have investigated time estimation in the VRE, and those that have report mixed results. One stream of research showed that during chemotherapy, most individuals estimated the time duration in the VRE to be shorter than the actual time elapsed [1,18]. In other studies, participants estimated that the time elapsed in the VRE was longer than the actual time they had spent playing a music game and a shooting game [19], and walking according to the direction of a virtual marker [20]. One of the main reasons for such inconsistency is that these studies examined time estimation in the VRE without taking task complexity into account.
In Sections 1.1 and 1.2, we provide a literature review (the first and second paragraphs). Then, we discuss the limits of the existing studies in terms of their experimental designs (the third paragraph of Sections 1.1 and 1.2) and assessment tools (Section 1.3). Such limitations led us to develop the three research questions that appear at the end of each section.

Task Complexity and Time Estimation
One approach to defining task complexity is to consider the number of elements comprising a task [21,22]. A high-complexity task includes many elements; a low-complexity task comprises few elements. As a task's complexity level increases, an individual needs to process more information, leading to a higher demand for attentional resources [22-24].
Several studies have revealed that in non-VREs, individuals estimate time less accurately as tasks become more complex. For example, Zakay and colleagues [25] designed verbal tasks with three levels of complexity: reading words (low complexity), naming objects (mid complexity), and giving synonyms for the words on the cards (high complexity). Of the three task conditions, the participants made the largest time estimation errors during the synonym-providing task, which had the highest complexity. More recently, Chan and Hoffmann [26] found that participants estimated the elapsed time to be longer than the actual time as the complexity of two tasks (a pin-to-hole assembly task and a task that required moving an object to a target area) increased. Brown [27] used a figure-tracing task in which participants were instructed to trace a six-point star within a double boundary line. At the low complexity level, the participants performed the task with a pencil; at the high complexity level, the participants used a mirror-drawing apparatus. The study revealed that the subjects tended to make larger errors in both prospective and retrospective time estimations during high-complexity tasks than during low-complexity tasks.
Although many researchers have discussed the effect of task complexity on time estimation under non-VRE conditions, few studies have considered VREs. A recent study by Schatzschneider et al. [28] used the VRE to investigate the impact of cognitive task types (verbal vs. spatial memory tasks) on time estimation. They observed that cognitive tasks significantly impacted participants' time estimation in the VRE. However, the researchers did not consider the complexity of each task type. As the effect of task complexity on time estimation in the VRE remains unknown, our first research question (RQ) was as follows:

RQ1: Does task complexity influence time estimation in the VRE?
Although the beta-band is thought to be sensitive to task complexity [36-38], there is no consensus as to the direction of the relationship. On the one hand, some studies have found that beta-band power increases with task complexity. For example, Wilson et al. [39] observed higher beta-band power when the number of characters to be memorized increased from one to five. Murata [35] reported that the average beta-band power at Cz, Fz, and Pz increased with the complexity of the N-back task, in which participants were asked to respond when the letter in the current trial was the same as the one shown N trials before. In a word discrimination experiment, researchers found enhanced beta-band power when the semantic complexity increased [40]. On the other hand, other studies have reported opposite results. Fernández et al. [41] indicated that beta-band power trended higher on a simple number-reading task than on arithmetic tasks. Bočková et al. [42] showed a higher beta-band power on simple letter-copying tasks than on a complex task in which participants were instructed to write a letter other than the one shown.
Such inconsistency in the relationship between task complexity and beta-band power has not yet been examined in the VRE. Recently, researchers have successfully applied EEG systems together with consumer-level VR HMDs [43,44]. For example, Dey and colleagues employed a 32-channel portable EEG system with the HTC VIVE VR headset to generate an adaptive VR training system by collecting alpha-band activities in real time [44]. Taking advantage of recent technological advancements in EEG devices that are portable and provide data in real time [45,46], our second research question was to determine whether brain signals indicate task complexity in the VRE.

RQ2: Do changes in brain signals indicate task complexity in the VRE?

NASA-Task Load Index
The NASA-Task Load Index (NASA-TLX), a widely applied workload assessment tool [47], has proven its usefulness in measuring task complexity from a subjective perspective [48,49], but it has some disadvantages. NASA-TLX is normally applied after an individual finishes a task, meaning that the result depends on the individual's memory and willingness to answer the questions without any bias [50]. Given the limitations of NASA-TLX, researchers have considered estimating perceived workload using responses such as time estimation [50,51] and EEG signals [37,45]. To further verify in the VRE that time estimation and EEG signals can work as workload assessment tools, our third research goal was to use correlation analysis to discover the relationship between time estimation error, EEG band power, and the NASA-TLX score.

RQ3: Are there relationships among time estimation, brain signals, and perceived workload in the VRE?
To answer the three research questions, we designed controlled experiments with independent and dependent variables and analyzed collected experimental data, which are described in Section 2. Section 3 presents the descriptive statistics of dependent variables and the results from the analysis of variance (ANOVA) that show the effects of the independent variables on the dependent variables (RQs 1 and 2). The results from the correlation analysis are also provided to show the relationship between various dependent variables (RQ3).

Participants
Twenty-nine subjects (13 males and 16 females) aged 21-29 years (mean = 23.72, SD = 2.09) at the University of Washington participated in the study. Participants had either normal or corrected vision without visual or auditory impairments. Each participant signed a consent form prior to the experiment and received USD 20 in compensation for their participation. The study was reviewed and approved by the University of Washington's Institutional Review Boards (IRBs) before recruiting participants.

Apparatus
We used an HTC VIVE head-mounted VR device with a refresh rate of 90 Hz. A 90 Hz refresh rate is sufficient to eliminate flicker, a contributing factor to motion sickness [52,53]. A single-player commercial VR software program, Jigsaw360 (Head Start Design), was used to deliver the main task in the study. To minimize the potential effect of visual complexity on task performance, apart from using a different number of jigsaw pieces, we kept all experimental settings between the high- and low-task-complexity conditions identical. We used the same background color, the same jigsaw puzzle design (i.e., solid gray color), and the same contrast between the jigsaw pieces. Likewise, the virtual positions of the jigsaw pieces were consistent across task complexity conditions, as was the virtual position of the participants. All participants were required to sit in a chair to minimize their body movements. Thus, we assume that there were minimal differences in the spatial components of the high- and low-task-complexity conditions in the VRE.
Auditory and visual effects in the background of the game were eliminated to minimize any nuisance effect on the EEG signals. Participants' brain signals were collected using a wireless EEG (Epoc Flex by Emotiv) with a sampling rate of 128 Hz. All raw data were recorded using EmotivPro software (Emotiv). The experiment was performed on a Lenovo X1 laptop (Lenovo) equipped with an Intel i7 processor and Intel UHD 620 Graphics. We used R version 3.6.1 to perform statistical analysis [54].

Experimental Design
We employed a 2 × 3 within-subjects design. Table 1 summarizes the independent and dependent variables. The independent variables were task complexity and block sequence. The jigsaw game had two levels of complexity: 8 pieces (low task complexity) and 18 pieces (high task complexity). Figure 1 shows the interface of low and high complexity levels in the VR jigsaw game. There were three blocks in the experiment, and each block consisted of two levels of task complexity; each participant was thus required to complete six trials in total. We used a randomized block design where the order of two trials (i.e., low or high complexity task) within a block was randomized to avoid any order effect. The dependent variables were absolute time estimation error; the relative beta-band power at the Cz, Fz, and Pz electrodes; and the NASA-TLX score. The Cz, Fz, and Pz electrodes are located on the midline of the central, frontal, and parietal regions, respectively, according to the International 10-20 system [55]. A number of studies have reported that the beta-band power at electrodes Cz, Fz, and Pz is sensitive to task complexity [35,36,39].

Figure 2 presents the procedure of the experiment. Before the experiment, the participants completed an online survey with screening questions about their demographic information and previous VR experiences. The experiment lasted about 105 min per participant. At the beginning, each participant was asked to sign a consent form and read a brief set of instructions. Participants were informed that if they experienced any discomfort such as motion sickness, they were free to withdraw from the study immediately. Each participant then performed a practice trial with the VR headset to become familiar with the VR jigsaw game. Participants were informed that in the main study, they would be asked a time estimation question verbally each time they finished a jigsaw puzzle. After a participant completed their practice trial, we set up the Epoc Flex EEG device. The Cz, Fz, and Pz electrodes were attached to the participant's head with electrode gel.
Each trial during the main experiment required the participants to put every jigsaw piece in the right place on a blank tray with a white border, as shown in Figure 1. To pick up a jigsaw piece, the participants needed to target the jigsaw piece using the virtual laser pointer and then press the trigger button on the controller. Then, they kept holding the button and moved the controller to drag the jigsaw piece to the desired location. To place the jigsaw piece, the participants simply released the button. The participants were instructed to gently move their necks and wrists and avoid moving other body parts to minimize the artifacts caused by bodily movement on the EEG signal recording. After completing each trial, the participants were instructed to estimate the time, in seconds, that the trial had taken them by answering a question: "How long do you think you spent on the previous jigsaw game?" Before moving on to the next trial, the participants also assessed their subjective workloads using the NASA-TLX questionnaire [47]. Each participant gave their answers verbally while wearing the VR headset. There was a one-minute break between trials. The participants were instructed to play the game at a comfortable pace. During the experiment, participants' brain signals, time estimations, actual completion times, and NASA-TLX scores were recorded.
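The randomized block design described in the Experimental Design section (three blocks, each containing one low- and one high-complexity trial in random order) can be sketched as follows. This is only an illustration: the function name `make_trial_order`, the level labels, and the seed are hypothetical, and the paper does not state how the randomization was actually generated.

```python
import random


def make_trial_order(n_blocks=3, levels=("low", "high"), seed=None):
    """Return (block, complexity) pairs with the level order shuffled within each block."""
    rng = random.Random(seed)  # seed is an assumption, used here only for reproducibility
    order = []
    for block in range(1, n_blocks + 1):
        block_levels = list(levels)
        rng.shuffle(block_levels)  # randomize low/high order inside this block only
        order.extend((block, level) for level in block_levels)
    return order


schedule = make_trial_order(seed=0)
for block, level in schedule:
    print(f"block {block}: {level} complexity")
```

Shuffling within each block, rather than across all six trials, keeps one trial of each complexity level in every block, which is what allows block sequence to be analyzed as a separate factor.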

Data Analysis
To investigate the effect of task complexity on time estimation, the study explored behavioral, physiological, and subjective responses in the VRE, as described in Section 2.4. As shown in Table 1, three dependent variables-absolute time estimation error, relative beta-band power, and NASA-TLX-represent behavioral, physiological, and subjective responses, respectively. The following sections provide detailed information about the data analyses related to participants' behavioral (Section 2.5.1), physiological (Section 2.5.2), and subjective responses (Section 2.5.3).

Absolute Time Estimation Error
In this study, we used the absolute value of the time estimation error [27,56] to describe the accuracy of the time estimation results. We used the absolute value rather than the signed value to avoid the artificially high estimation accuracy that results when overestimations and underestimations cancel each other out [27,57]. As shown in Equation (1), the absolute time estimation error (AE) was specified as the absolute difference between the participants' estimated time (ET) and the actual task completion time (CT):

$$\mathrm{AE} = \left|\mathrm{ET} - \mathrm{CT}\right| \qquad (1)$$
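As a small worked illustration of why the absolute value is used, consider two hypothetical trials with the same actual completion time, one underestimated and one overestimated by the same amount. The numbers below are invented for the example and are not taken from the study's data.

```python
def absolute_error(estimated_s, completed_s):
    """Absolute time estimation error: |ET - CT|, both in seconds."""
    return abs(estimated_s - completed_s)


# Hypothetical (ET, CT) pairs in seconds: one underestimate, one overestimate.
trials = [(50.0, 60.0), (70.0, 60.0)]

signed_errors = [et - ct for et, ct in trials]
absolute_errors = [absolute_error(et, ct) for et, ct in trials]

mean_signed = sum(signed_errors) / len(signed_errors)
mean_absolute = sum(absolute_errors) / len(absolute_errors)

print("mean signed error:", mean_signed)
print("mean absolute error:", mean_absolute)
```

The signed errors average to zero and would suggest perfect accuracy, whereas the mean absolute error retains the 10 s deviation present in each trial.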

Relative Beta-Band Power
To analyze EEG band power, we first applied the fast Fourier transform (FFT) based on Welch's [58] method to decompose the voltage signals into the power spectrum. FFT is a method that is widely used to calculate power spectrum density with high computational efficiency [59]. Compared to the original FFT, Welch's FFT separates raw data into overlapping sample segments and derives the power spectral density by averaging the periodograms of all segments [58]. This reduces the variance in estimated band power [60]. We conducted Welch's FFT and data filtering by using the SciPy package [61] in Python 3.7.3. We filtered the data based on the frequency ranges of the delta (1-4 Hz), theta (4-8 Hz), alpha (8-12 Hz), beta (12-30 Hz), and gamma (30-100 Hz) bands [62,63]. After performing the FFT, we transformed frequency band power into a relative form: the ratio of a specific band power to the total power of the signal. For example, when calculating the relative beta-band power, we considered beta-band power to be the numerator and the total power of the signal to be the denominator. We preferred relative band power to frequency band power because the former has a smaller variance among subjects and is less influenced by individuals' electrical properties [30,64,65].
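The relative band power computation described above can be sketched with `scipy.signal.welch`. This is a minimal sketch under stated assumptions, not the authors' pipeline: the segment length (`nperseg=256`), the default 50% overlap and Hann window, the 1-100 Hz range used as "total power", and the synthetic test signal are all assumptions that the paper does not specify. Note also that at a 128 Hz sampling rate the spectrum only extends to 64 Hz, so any range above that is truncated.

```python
import numpy as np
from scipy.signal import welch

FS = 128  # EEG sampling rate reported in the Apparatus section (Hz)
BANDS = {  # band edges in Hz, as listed in the text
    "delta": (1, 4), "theta": (4, 8), "alpha": (8, 12),
    "beta": (12, 30), "gamma": (30, 100),
}


def relative_band_power(signal, fs=FS, band=(12, 30), total=(1, 100), nperseg=256):
    """Ratio of power in `band` to power in `total`, from a Welch PSD estimate."""
    freqs, psd = welch(signal, fs=fs, nperseg=nperseg)  # averaged periodograms
    in_band = (freqs >= band[0]) & (freqs < band[1])
    in_total = (freqs >= total[0]) & (freqs <= total[1])  # capped at Nyquist (fs / 2)
    # Frequency bins are evenly spaced, so the ratio of sums equals the ratio of integrals.
    return psd[in_band].sum() / psd[in_total].sum()


# Synthetic stand-in for one EEG channel: a 20 Hz oscillation plus weak noise.
rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / FS)
demo = np.sin(2 * np.pi * 20 * t) + 0.1 * rng.standard_normal(t.size)

beta_share = relative_band_power(demo, band=BANDS["beta"])
alpha_share = relative_band_power(demo, band=BANDS["alpha"])
print(f"relative beta power: {beta_share:.2f}, relative alpha power: {alpha_share:.2f}")
```

Because the measure is a ratio, multiplying the whole signal by a constant (for example, a participant-specific gain) leaves it unchanged, which is one way to see why relative power varies less between individuals than absolute power.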

NASA-TLX
NASA-TLX records individuals' mental, physical, and temporal demands, frustrations, efforts, and performance to gauge their subjective workload. In practical usage, many researchers have modified the NASA-TLX to suit different purposes and situations. The most common modification is the elimination of the weighting process in the NASA-TLX [47]. The same modification was made in this study. To make it easier for participants to answer the questions verbally, we simplified the scale from 0 to 100 with 5-point increments to 21 gradations.

Table 2 summarizes the descriptive statistics of the behavioral response (i.e., time estimation error), subjective response (i.e., NASA-TLX score), and physiological responses (i.e., the relative beta-band power at Cz, Fz, and Pz) under the low- and high-task-complexity conditions. Participants tended to make larger estimation errors under the high-task-complexity condition than under the low-task-complexity condition. In addition, the mean NASA-TLX score was greater under the high-task-complexity condition. However, the average relative beta-band power at Cz, Fz, and Pz was lower under the high-complexity condition than under the low-complexity condition. We also present the average task completion time and estimated time under both conditions in Table 2. In general, participants spent more time on the jigsaw game and made longer time estimates under the high-complexity condition than under the low-complexity condition.

To investigate the impact of the independent variables on the absolute time estimation error, we conducted a repeated-measures ANOVA. Figure 3 presents the interaction plot with the absolute time estimation error. We found a significant effect of task complexity on the absolute time estimation error (F(1, 140) = 82.02, p < 0.001), meaning that the more complex the task was, the larger the absolute error in time estimation was.
The effect of the sequence of blocks on the absolute time estimation error was also significant (F(2, 140) = 4.60, p < 0.05). Fisher's least significant difference (LSD) post hoc analysis indicated that all three blocks differed significantly. Participants made larger time estimation errors in the first block than in the second block, larger errors in the second block than in the third block, and larger errors in the first block than in the third block. There was no significant interaction effect between task complexity and block sequence on the absolute time estimation error.
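The error degrees of freedom reported above can be reproduced with simple bookkeeping. The sketch below assumes that the ANOVA treated participant as an additive factor with a single pooled error term; the paper does not state the model specification, so this is an inference that merely happens to match the reported F(1, 140) and F(2, 140) for a complete data set of 29 participants and six trials each.

```python
# Design sizes taken from the Participants and Experimental Design sections.
n_subjects = 29
n_complexity = 2   # low, high
n_blocks = 3

n_obs = n_subjects * n_complexity * n_blocks   # one observation per trial
df_total = n_obs - 1
df_subject = n_subjects - 1                    # assumed additive participant factor
df_complexity = n_complexity - 1
df_block = n_blocks - 1
df_interaction = df_complexity * df_block      # complexity x block
df_error = df_total - df_subject - df_complexity - df_block - df_interaction

print("observations:", n_obs)
print("df (complexity, error):", (df_complexity, df_error))
print("df (block, error):", (df_block, df_error))
```

The same bookkeeping does not yield the error degrees of freedom of 98 reported for the EEG analyses, which suggests that fewer observations entered those models; the text does not say how many.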

Relative Beta-Band Power of EEG
The repeated-measures ANOVA results revealed a significant effect of task complexity on the relative beta-band power at Fz (F(1, 98) = 5.26, p < 0.05) and Pz (F(1, 98) = 5.85, p < 0.05) and a marginal effect on the relative beta-band power at Cz (F(1, 98) = 2.92, p = 0.09). Figure 4 shows the interaction plots of the two independent variables with the relative beta-band power at Fz (a), Pz (b), and Cz (c). The relative beta-band power at the three electrodes decreased with task complexity. The block sequence presented no significant results, indicating that task repetition did not affect the fraction of beta-band power. There was no significant interaction effect between task complexity and block sequence on any of the three signals.

Subjective Workload
We conducted a similar repeated-measures ANOVA on the subjective workload scores. We found a significant main effect of task complexity on the NASA-TLX score (F(1, 140) = 123.89, p < 0.001). As illustrated in Figure 5, high task complexity generated a higher NASA-TLX score: participants perceived a higher workload in the high-complexity task than in the low-complexity task. The sequence of blocks also significantly impacted the NASA-TLX score (F(2, 140) = 37.36, p < 0.001). Fisher's LSD post hoc analysis showed that all three blocks differed significantly. In our study, participants' perceived workload gradually decreased from the first block to the third block.

Table 3 summarizes the relationships between the absolute time estimation error, relative beta-band power at the three electrodes, and the NASA-TLX score, obtained using repeated-measures correlation analysis [66]. We found a significant negative correlation between the absolute time estimation error and the relative beta-band power at Cz. This correlation indicated that the proportion of beta-band power at Cz showed an overall decreasing trend when individuals made larger time estimation errors. In addition, we observed a marginal correlation between the absolute time estimation error and the relative beta-band power at Pz, whereas the correlation at Fz was not significant. We also observed a significant correlation between the NASA-TLX score and the absolute time estimation error, indicating that larger time estimation errors were associated with increasing perceived workload.

Table 3. Correlation coefficients between dependent variables.

Variables                                                              Coefficient   p-Value
Absolute time estimation error and relative beta-band power at Cz      −0.25         0.01 *
Absolute time estimation error and relative beta-band power at Fz      −0.16         0.11
Absolute time estimation error and relative beta-band power at Pz      −0.17         0.08
Absolute time estimation error and NASA-TLX score                      0.55          <0.001 ***
* p < 0.05, *** p < 0.001.
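For readers unfamiliar with repeated-measures correlation, the idea can be sketched as follows: each participant's values are centered on that participant's own mean, and the correlation is computed on the pooled centered values, so that stable between-participant differences do not drive the coefficient. The sketch below is an illustrative formulation under that assumption, with N − k − 1 error degrees of freedom for N observations from k participants; it is not the authors' code, the function name is hypothetical, and the data are invented.

```python
import numpy as np
from scipy import stats


def repeated_measures_corr(subject, x, y):
    """Within-subject correlation via per-subject mean centering (illustrative sketch)."""
    subject = np.asarray(subject)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = np.empty_like(x)
    yc = np.empty_like(y)
    ids = np.unique(subject)
    for s in ids:
        rows = subject == s
        xc[rows] = x[rows] - x[rows].mean()  # remove this subject's baseline in x
        yc[rows] = y[rows] - y[rows].mean()  # remove this subject's baseline in y
    r = float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))
    dof = len(x) - len(ids) - 1              # observations - subjects - 1
    t_stat = r * np.sqrt(dof / (1.0 - r ** 2))  # undefined if |r| is exactly 1
    p = float(2.0 * stats.t.sf(abs(t_stat), dof))
    return r, dof, p


# Invented data: three subjects with different baselines but a shared negative trend.
subject = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1, 2, 3, 11, 12, 13, 21, 22, 23]
y = [3.0, 2.1, 1.0, 13.0, 11.9, 11.0, 23.1, 22.0, 21.0]

r_rm, dof_rm, p_rm = repeated_measures_corr(subject, x, y)
r_pooled = float(np.corrcoef(x, y)[0, 1])
print(f"repeated-measures r = {r_rm:.3f} (df = {dof_rm}), pooled Pearson r = {r_pooled:.3f}")
```

In this invented example the pooled Pearson coefficient is strongly positive because of baseline differences between subjects, while the repeated-measures coefficient recovers the negative within-subject trend.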

Discussion and Conclusions
To the best of our knowledge, this study is the first to investigate how time estimation is influenced by task complexity in the VRE. The results of this study revealed three major findings that answer the three questions raised. First, we found that the participants made greater time estimation errors as the task complexity in the VRE increased. Second, in the VRE, the relative beta-band power of EEG was greater for the low task-complexity condition than it was for the high task-complexity condition. Third, we found a negative relationship between time estimation error and relative beta-band power and a positive relationship between time estimation error and mental workload.
We explored the impact of task complexity on time estimation in a new task setting, i.e., the VRE. In addition to considering the participants' behavioral and subjective responses, we employed EEG to capture the participants' physiological responses in real time. The composite of behavioral, physiological, and subjective responses enabled us to overcome the limitations of the self-reported NASA-TLX questionnaire, which is available only after the completion of tasks and may be subject to self-report bias. Our findings in the VRE provided new evidence of the direction of the relationship between the beta-band power of EEG and task complexity.
Our finding that the absolute time estimation error increased with task complexity in the VRE is consistent with the attentional resource allocation theory [67,68]. According to this theory, estimating time and conducting cognitive tasks both require attentional resources, of which each person has only a limited capacity. In line with the structuralist point of view [21,22], this study provided a smaller number of elements to be completed during the low-complexity task and a greater number of elements to be completed during the high-complexity task. As a result, the participants needed to allocate greater attention to the high-complexity task and paid less attention to the process of time estimation, leading to less accurate time estimations than seen during the low-complexity task [15,67,69].
We also observed the effect of block sequences on the absolute time estimation error, demonstrating that task repetition significantly improved the accuracy of time estimation results. We note that the practice trial was designed to familiarize participants with the VR environment, task procedure, and device usage with the HMD and VR controllers. The screening survey showed that 38% of participants did not have VR experience. If we had not included the practice trial, the participants likely would have made many task errors due to task misunderstanding and their lack of familiarity with the VR system. These errors would have been reflected in the time estimation. As we designed the experiment to investigate the effect of task complexity only on time estimation, the practice trial was necessary. While the practice trial minimized the impact of previous VR experience, we found an effect of task repetition on the absolute time estimation error. Without the practice trial, we would have expected the effects of block sequences to increase dramatically. Throughout the experiment, we purposefully avoided providing any feedback to the participants on their estimation results. The participants were thus unable to adjust their estimations based on their previous results. Task repetition decreases the attentional resources assigned to the given task [70,71] and leaves more attentional resources available for time estimation, thereby contributing to more accurate time estimation results [70].
In addition to considering relative beta-band power, we examined changes in relative theta-band power in the frontal and parietal regions. We did not find significant results when we tested the relative theta-band power as the dependent variable. Although previous studies have reported increased theta-band power with increased task complexity [35,72,73], our results suggest that beta-band power may be more sensitive than theta-band power to changes in task complexity in the VRE. The significant correlation between the relative beta-band power at Cz and time estimation error indicates a connection between beta-band activities in the midline central region and time estimation, whether inside or outside of the VRE. In a previous study conducted in a non-VRE setting, Kulashekhar and colleagues [74] observed a suppression in beta-band activities in the midline central region during time estimation. Ghaderi et al. [75] also reported a significant change in beta-band power at Cz when participants estimated elapsed time. As previous researchers have suggested, the beta-band plays an essential role in time estimation mechanisms [74-76], and our findings further support this view in the VRE.
Additionally, our correlation results between the absolute time estimation error and the NASA-TLX score corroborate previous findings indicating that time estimation is a measure of mental workload in surgery training [49] and in 2D video games [77]. Our results also validate time estimation usage in the VRE. When the environment limits the use of the NASA-TLX questionnaire, time estimation can serve as a valid and quick mental workload evaluation tool.
Given that the age distribution of the participants was not diverse in that most of the participants were college-aged young adults, we did not originally consider age to be the main factor in this study. However, we acknowledge that age plays a significant role in time estimation tasks [56,78]. Previous studies found noticeable differences between age groups such that the elderly group made larger time estimation errors than the younger groups did [56,78]. To test the effect of age on our dependent variables, we divided age into two levels (i.e., older than or equal to 24 and younger than 24) using a median split. However, the ANOVA result showed that age did not have a significant effect on absolute estimation error (F (1, 27)  To increase the generalizability of the results, future studies might aim to recruit participants from different age groups.
Our study has several limitations that merit future research. First, this study included factors that may have imperceptibly influenced the quality of EEG signals, such as the vibration of the VR controller and the electrical interference between the VR HMD and the EEG headset. To eliminate noise, future research might perform further data filtering by comparing the EEG signals at the baseline and during the experiment. Second, the statistical power of the analysis was 0.6, indicating a 40% probability of having a Type II error in the results, which could be attributed to the small sample size. Third, only three EEG electrodes were used in the experiment. In the future, brain signals from multiple electrodes might be analyzed synchronously to further investigate brain connectivity and its relationship with time estimation in the VRE. Fourth, our study did not focus on the event-related potential (ERP). The error-related negativity (ERN) is an ERP component that is found about 100 ms after individuals make and observe incorrect responses [79]. For example, Pezzetta et al. [80] reported an ERN on the frontal lobe when participants failed to grasp a virtual glass in the VRE. By recording the ERN when participants make errors in the VR puzzle game, future studies might consider variations in ERN under different complexity levels and the correlation between ERN amplitudes and time estimation error. In addition, future research may consider the inclusion of conditions in the real world. A comparative study between the VRE and the real world will provide evidence to understand how the VRE affects human time estimation.
Our study has implications for applying EEG systems to assist task design in VR applications. The finding that higher task complexity resulted in larger time estimation errors indicates the importance of controlling complexity levels in VR applications. Current EEG systems are able to perform FFT automatically and export the power of frequency bands in real time. Taking advantage of this capability, designers will be able to monitor, in real time, changes in time estimation errors during the use of VR applications, as inferred from changes in EEG responses. Compared to a laboratory environment, practical VR design settings pose more challenges for manipulating task complexity. This is due to the diversity of functional requirements associated with VR design [81]. The relationship between task complexity and EEG signals suggests the possibility of using relative beta-band power as an index to help developers adjust the complexity levels in VR applications.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest: The authors declare no conflict of interest.