A Comparison of Monoscopic and Stereoscopic 3D Visualizations: Effect on Spatial Planning in Digital Twins

From the user perspective, 3D geospatial data visualizations are one of the bridges between the physical and the digital world. As such, their potential is frequently discussed within and beyond digital twins. Their effects on human cognitive processes in complex spatial tasks are poorly understood, and no uniform standards exist for the 3D technologies used in these tasks. Although stereoscopic geovisualizations presented using 3D technologies enhance depth perception, it has been suggested that the visual discomfort experienced when using 3D technology outweighs its benefits and results in lower efficiency and more errors. In the present study, sixty participants were tested on their ability to make informed decisions when selecting the correct position of a virtual transmitter in a digital twin and a digital terrain model, respectively. Participants were randomly assigned to two groups, one using 3D technology with stereoscopic shutter glasses and the other working with standard computer screen-based visualizations. The participants who used shutter glasses performed significantly worse in terms of response time (W = 175.0; p < 0.001, r = −0.524). This finding corroborates previous conclusions concerning the unsuitability of stereoscopic visualization technology for complex decision-making in geospatial tasks.


Introduction
Terms such as digital twins and virtual reality (VR) have become a common part of the vocabulary not only for scientists but also the general public. We are witnessing the massive penetration of these technologies into the daily life of the whole society. The technologies hold irreplaceable positions not only in the entertainment industry but in daily use in a number of serious scientific disciplines. With the use of technologies such as 3D shutter glasses, 3D monitors, Cave Automatic Virtual Environments (CAVE) or head-mounted displays (HMDs), 3D visualization is transferred to many geo-related disciplines.
All these VR technologies can improve people's abilities to manage digital twins and/or similarly complex systems [1,2]. A digital twin consists of a digital representation that enables data exchange and contains models, simulations and algorithms related to its real-world counterpart and that counterpart's behaviour and features [3]. This definition demonstrates the complexity of digital twins as well as the need for informed, evidence-based decision-making. Possible uncertainties related to the geospatial data used in digital twins can make decision-making more difficult for users [4][5][6][7]. Despite this issue, the digital twin opens up new perspectives, for example, in urban planning and geography in general [7][8][9][10].
Three-dimensional visualizations can assist in completing various spatial tasks in digital twins on different scales, from global and regional levels to local levels [11][12][13]. Despite technological developments and a wide range of 3D visualization applications, relatively little is still known about the theoretical background of such visualizations [14] or their specific effect on human cognition in various activities. Although this problem was introduced sixteen years ago (in 2005), it remains relevant today, especially regarding interactive visualizations [13,15]. An important part of creating VR for digital twins is understanding data visualization options across different user interface and visualization types [16,17].
The above-mentioned technologies generally engage the principle of stereoscopy to display a 3D visualization. Buchroithner and Knust [18] denoted a stereoscopic 3D visualization as "real-3D". Real-3D visualizations employ both monocular and binocular depth cues (namely binocular disparity and binocular convergence). Alongside real-3D, a "pseudo-3D" visualization is described [18] as a technology which employs only monocular depth cues (e.g., linear perspective, relative size, interposition, texture gradient or kinetic depth effect). Pseudo-3D visualizations are generally found as depictions on planar media (such as classic computer screens or widescreen projections). Pseudo-3D visualization is also considered a cheaper and more disseminated type of visualization since it places no extra demands on peripheral stereoscopic technologies, which are essential in the case of real-3D visualization.
This study explores the use of two types of 3D visualization used for presentation in digital twins. Both types may be considered equivalent in terms of information load since they depict exactly the same amount of information [19]. However, each stimulates different cognitive processes and potentially related interactive behaviour when the visualization is perceived [19,20]. In this paper, we examine these cognitive processes in the context of user performance in spatial tasks. Since the cognitive processes cannot be measured directly, in this paper we infer them from the participants' behaviour/performance in the digital twin which was manifested during the experiment.
Specifically, we compare the influence of certain 3D visualization types, pseudo-3D and real-3D, on human performance. In particular, user aspects comprising response times, interaction activity and error rate/correctness are explored. The main motivation is to identify these user aspects of spatial planning relying on interactive 3D models populated with geospatial data.

Related Work
In previous research, real-3D visualization has been suggested as an effective tool for geovisualization since it provides additional depth cues [18,21,22]. The concept of geovisualization in the present paper is based on MacEachren and Kraak's definition, which describes it as a field providing the theory, methods and tools for the visual exploration, analysis, confirmation, synthesis and communication of geospatial data [23]. Specifically, 3D geovisualizations concern the 3D visual representation of the real world, its parts or geospatial data in general [24]. The development of 3D geovisualization relies heavily on computer graphics [14,24]. MacEachren and Kraak [25] identified research on user interfaces and virtual environments as one of the most important tasks in the scope of 3D geovisualization. However, no clear recommendations for the production and use of real-3D geovisualization in geosciences are currently available [13,26–36]. Despite frequent discussions on the role of additional depth cues in real-3D visualizations, the controversial question of the effectiveness and efficiency of this visualization type in practical application remains unresolved. In particular, increased response time, cognitive load in performance-based tasks, and user distraction and discomfort while wearing peripheral devices such as 3D shutter glasses or HMDs are often discussed in terms of geovisualization [13,37,38].

Technologies for 3D Geovisualizations
With respect to the unavailability of standards or recommendations for the use of 3D technologies, well-focused user feedback helps us to understand the usability aspects and crucial factors involved in 3D geovisualization. Probably the first ever user evaluation dealing with real-3D and pseudo-3D depiction in geovisualization was conducted by Kraak [39] in 1988. The origin of this work corresponded to the first wave of interest in the development of real-3D visualization technologies which engaged stereoscopy [40][41][42]. These early real-3D visualizations were largely limited by the computing power of the time, and the display devices were often clumsy and offered less than optimal usability. The second wave of interest in real-3D visualization arose with the dynamic technological development in the first decade of this century, as the use of relatively cheap, high-resolution and portable display devices accelerated. In this new era, 3D visualization once again became the subject of systematic study, especially with regard to the use of real-3D technologies in medicine [43][44][45] or aviation [2,46–48]. User testing of real-3D geovisualizations has also begun to emerge. In 2005, Kirschenbauer [21] evaluated geospatial data displayed on an autostereoscopic Liquid Crystal Display (LCD) against 2D paper maps. In the study, the demands on time increased (efficiency was lower) when working with real-3D maps, although accuracy did not. Later, Fuhrmann et al. [49] evaluated the differences in how a topographic map and its 3D holographic equivalent were perceived by users, who were asked to plan an optimal route to a given point. The eye-movement data demonstrated that the holographic map was a better option for this activity. Kjellin et al. [50] compared a static 2D map, 2D animation and real-3D visualization of spatio-temporal data and found no universal visualization which was ideally suitable for all the explored tasks. Similarly, Seipel and Carvalho [51] evaluated a 2D map and perspective views on a flat map (with real-3D visualization), both types presenting bar charts. They showed that the 2D maps and real-3D visualizations were equivalent in terms of effectiveness (accuracy of responses) and efficiency (response time). In addition to the previous two user studies, Seipel and Carvalho [51] investigated the differences between the above-described real-3D and pseudo-3D visualizations. Real-3D visualization of bar charts in the presented 2D maps did not provide any advantage resulting from additional depth cues; therefore, the pseudo-3D visualization was suggested as more suitable for the presentation of this type of geospatial data.
Real-3D and pseudo-3D visualizations were also compared by Sprinarova et al. [52] in different tasks working with digital terrain models (DTMs). Participants achieved greater efficiency (measured as task solving time) with the real-3D visualization but greater effectiveness (successful solving of tasks) with the pseudo-3D visualization. Jurik et al. [53] investigated the effect of 3D visualizations which presented both static and interactive stimuli. The real-3D visualization significantly increased effectiveness (accuracy) in making altitude comparisons within the DTMs, although only in non-interactive (static) tasks. In the interactive tasks, the differences in effectiveness flattened out with no further significant differences in efficiency. It was also observed that in the case of the real-3D visualizations, participants systematically omitted some points of interest located in the DTMs [53]. This research was followed by Kubicek et al. [20], in which participants identified the terrain profiles in different 3D visualizations of DTMs. The results of this user study demonstrated that the effect of the 3D visualization type (real-/pseudo-3D) on user performance was unclear because the level of navigational interactivity had a significantly greater effect on the usability of the geovisualization than the 3D visualization type. Users in this study, however, perceived real-3D as not only more attractive and desirable; it also gave them the feeling of more efficient and effective task-solving. Evaluation of real-3D and pseudo-3D visualizations using the altitude and profile comparison tasks was later performed by Jurik et al. [15]. Despite the proven significant effect in speed, accuracy was not clearly shown to differ in the conditions. A clear trend was observed in that the option to interact with DTMs in the real-3D visualization mode extended the user response times. 
Based on the findings of the previous two studies [15,20], interactivity was suggested as a working substitute for the real-3D visualization in spatial tasks.
In contrast to the studies mentioned above, which mainly used 3D shutter glasses for real-3D perception, Dong et al. [54] compared pseudo-3D visualizations of thematic maps and virtual globes with real-3D visualizations of the same geodata using HMD. Based on the eye-tracking data analysis, they found that the users with HMD-based technology processed information more effectively (which the authors deduced from the lower average fixation duration), but the participants using pseudo-3D visualizations had significantly quicker visual search times (measured as average saccade duration) and response times. Similarly, Zhao et al. [37] also compared HMD and pseudo-3D visualizations. Their study compared different modes of virtual movement (teleportation and walking). However, the results differed from the study by Dong et al. [54], in which the pseudo-3D visualization on a desktop monitor was preferred. No major advantage of using walking over teleportation was identified [37].

3D Geovisualizations and Spatial Tasks
The digital twin can be used for different purposes that relate to specific tasks with (3D) geospatial data. The nature of the spatial tasks depends on the specific type of geospatial data and domain-specific objectives. The same factors also influence the spatial tasks applied in regular user testing sessions. Several taxonomies and classifications of tasks with 3D visualizations generally [55] and 3D geovisualization and 3D maps specifically [50,56,57] have been proposed. The most common types of tasks (e.g., "identify" and "compare"; see Roth [58,59]) are usually the simplest. Based on the meta-analysis by Herman et al. [60], these simple tasks are also often used in user testing which includes 3D geovisualization. Regarding user studies which compare real-3D and pseudo-3D visualization, Kraak [39], Kjellin et al. [50] and Sprinarova et al. [52] employed "identify" tasks, especially as searches for specified objects. Kraak [39] also employed quantitative estimation [55] of values from 3D geovisualization. Another common group of tasks is spatial understanding (according to Laha et al. [55]), which also incorporates the "compare" category (according to Roth [58]), because spatial understanding includes absolute and relative comparisons of given objects and their properties (i.e., height of bars, altitude of objects). This type of task was used by Kirschenbauer [21], Seipel and Carvalho [51], Seipel [22], Jurik et al. [53] and Jurik et al. [15]. Tasks which involve planning or more complex decision-making are not very often applied in comparisons of real-3D and pseudo-3D visualization. Of the above-mentioned studies, Fuhrmann et al. [49] used this type of task in the form of route planning.
However, the real applications of 3D geovisualization offer a wide range of even more complex tasks which should be adjusted and applied to current usability testing in order to study complex human interaction strategies, behaviour and cognition in ecologically valid contexts. If the experimental results are to be generalized to interactive real-3D geovisualizations or to VR, static visualizations (non-interactive perspective views) cannot be considered as stimuli for this type of user testing. This is supported by the results of user testing conducted by Herman et al. [60], where static and interactive pseudo-3D geovisualizations were found to differ significantly in terms of both efficiency and effectiveness. To keep the level of experimental control high, many previous user studies focused only on isolated issues of visual perception or dealt with non-interactive content. Static perspective views or animations/video were used as stimuli by Kraak [39], Fuhrmann et al. [49], Kjellin et al. [50], Seipel and Carvalho [51] and Seipel [22].
For a meaningful comparison of real-3D and pseudo-3D visualizations of digital twins, it is necessary to work with ecologically valid tasks. In the case of interactive 3D geovisualizations, it is necessary to analyse not only the effectiveness, efficiency, subjective evaluation and preferences (satisfaction) but also the specific user interactions, which can be recorded with suitable user logging. All performance and self-reported metrics can be linked and analysed, potentially even in real time [61].

User Aspects
The structure of digital twin users is similar to the structure of the world population, with the following basic principles and limitations. Perception, sensorimotor processes and cognition are closely related to the individual abilities, motivations, interests, experience and knowledge of people, and they can be limited by human cognitive capacity, the current physiological and mental state of the person or external factors such as distractions or time constraints. From this point of view, customizable, dynamically adjustable and interactive 3D geovisualizations may be suitable tools for the transmission of geospatial content into a preferred mode of information access [62]. However, several limitations exist. According to Ware [63], as many as 20% of the population are not able to see stereoscopically. Visual discomfort is also a common problem, reported by almost half of the tested users of 3D technologies [51]. In addition, real-3D visualization increases cognitive load and user distraction [22,45], which may be an important factor that offsets any advantage resulting from additional visual depth cues. For these reasons, 3D technologies are currently being tested both experimentally and practically and are likely to continue requiring such testing.
A relatively common aspect of user testing of 3D geovisualizations is that research subjects are generally participants who possess a deeper knowledge of geosciences (students or graduates of geography, geoinformatics, cartography or related fields) [39,49,52]. In other studies [22,50,51], the users' experience or professions were not screened or described in any detail, which limits the potential re-use of the analysed data and findings. During their education and training, experts in geoscience disciplines develop specific knowledge and many strategies for working with various cartographic products and geospatial applications. Such specific strategies may greatly affect (or bias) the sensorimotor or interactive aspects of the human-computer interaction process. Because the present study explores more general principles, including human perception and interaction with 3D geographical content, we decided to study a non-expert population, as in previous studies [15,53]. This strategy provides data on general human behaviour and cognition that are undistorted by expertise and better represents the vast majority of potential users of virtual 3D applications.

Materials and Methods
Addressing open challenges from the aforementioned studies, we designed an original experimental study comparing the effectiveness and efficiency of stereoscopic and monoscopic 3D visualization for spatial planning within and beyond digital twins (Figure 1). We employed interactive tasks which addressed complex problem-solving and decision-making strategies. We studied the effect of the visualization type on general user performance and explored certain processes in task solving. The above-mentioned visual discomfort and related troublesome aspects of real-3D visualization technologies suggested a decrease in effectiveness and efficiency.
o H2: Real-3D visualizations decrease participant efficiency (response time) in task solving.
• RQ2: How does the 3D visualization type affect a user's strategy during spatial decision-making?
o H3: The interaction strategy is affected by the type of 3D visualization.
o H4: The interaction activity normalized by task time (activity per second) is affected by the type of 3D visualization.
The hypotheses were tested using a between-subject experimental design with two comparative groups. Sixty non-expert volunteers were recruited for the experiment. Participants in the real-3D condition used 3D shutter glasses; participants in the pseudo-3D condition used standard PC monitors (Figure 2). Participants were asked to explore a given DTM and place a virtual transmitter in it so that all the buildings placed in the terrain were covered by the signal (propagated from the top of the transmitter). In the present study, we used two variants of the spatial planning task. The first task variant contained buildings with uniform priority (3 trials). The second task variant included buildings with a higher priority, which were distinguished by red colour (3 trials). The number of buildings as well as the presented terrain models varied between the trials.
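The coverage criterion in this task, a clear line of sight from the top of the transmitter to each building, can be illustrated with a minimal sketch. It assumes the terrain is a small grid of heights, samples the straight ray at the nearest grid cell, and uses a hypothetical mast height of 30 m; the function names and parameters are illustrative and are not taken from the experimental application.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) index into the height grid


def has_line_of_sight(dtm: Sequence[Sequence[float]], tx: Cell, target: Cell,
                      mast_height: float = 30.0, samples: int = 200) -> bool:
    """Return True if the straight ray from the mast top to the target clears the terrain.

    mast_height is an assumed value; the ray is checked at nearest-cell samples only.
    """
    (r0, c0), (r1, c1) = tx, target
    z0 = dtm[r0][c0] + mast_height  # top of the transmitter
    z1 = dtm[r1][c1]                # ground level at the building
    for k in range(1, samples):
        f = k / samples
        cell = (round(r0 + f * (r1 - r0)), round(c0 + f * (c1 - c0)))
        if cell in (tx, target):
            continue  # the endpoint cells themselves cannot block the ray
        ray_z = z0 + f * (z1 - z0)  # ray height above this sample
        if dtm[cell[0]][cell[1]] > ray_z:
            return False
    return True


def covered_buildings(dtm: Sequence[Sequence[float]], tx: Cell,
                      buildings: Sequence[Cell],
                      mast_height: float = 30.0) -> List[Cell]:
    """List the buildings that are visible from a transmitter placed at tx."""
    return [b for b in buildings if has_line_of_sight(dtm, tx, b, mast_height)]
```

On a flat grid with a single 100 m ridge cell, a building directly behind the ridge is reported as uncovered, while buildings off to either side remain covered.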

Participants
Data were gathered from 60 volunteers, specifically humanities students (mainly psychology, languages, philosophy) from Masaryk University (11 males). The volunteers were mainly young adults (m = 23.5 years; med = 23 years; s.d. = 5.36) with no or limited previous training in working with geovisualizations. They were invited via email or social networks to participate. The prerequisite for participation was no visual impairment or any other possible medical limitation, for example, epilepsy. The participants were informed that they could cancel the experiment at any time and were free to leave the session if they felt dizzy or experienced any other discomfort. The study was approved by the Masaryk University Ethics Committee for Research under project identification number EKV-2016-059. Participants provided written consent through an informed consent form and were rewarded for their participation in the experiment with candies.

Procedure
The study employed a between-subject experimental design in which two groups of participants used different 3D visualization types (pseudo-3D and real-3D) to solve decision-making tasks in an interactive geovisualization (Figure 3). We pseudo-randomly assigned participants to the groups according to their reported gender to keep a balanced male-to-female ratio in each group. After arriving at the lab and signing the informed consent form regarding the experimental procedure, participants received instruction on how to use the control devices and were familiarised with the purpose of the experiment, which was the assessment of 3D technologies for geovisualizations. We then collected the participants' demographic data and instructed them how to proceed through the testing session. We emphasized that their performance would be measured and that they should therefore focus on both accuracy and speed. A task training session followed, in which participants practiced how to control and navigate the interactive geovisualizations by solving several simple tasks (comparison of altitudes of given points and identification of certain terrain profiles) both wearing and not wearing 3D glasses. After the training, the participants completed six decision-making trials (see Section 3.3). At the end of the experimental session, the participants were debriefed and rewarded.
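One common way to obtain a gender-balanced pseudo-random assignment is to shuffle participants within each gender stratum and then alternate between the two groups. The sketch below illustrates that idea only; the exact assignment procedure used in the study is not described, and the function name, the seed and the participant identifiers are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def assign_balanced(participants: Sequence[Tuple[str, str]],
                    seed: int = 0) -> Dict[str, List[str]]:
    """Split (participant_id, gender) pairs into two groups, balanced within each gender.

    Illustrative stratified assignment, not the study's documented procedure.
    """
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    strata: Dict[str, List[str]] = {}
    for pid, gender in participants:
        strata.setdefault(gender, []).append(pid)

    groups: Dict[str, List[str]] = {"real-3D": [], "pseudo-3D": []}
    labels = list(groups)
    position = 0  # runs across strata so that odd-sized strata even out
    for gender in sorted(strata):
        ids = strata[gender]
        rng.shuffle(ids)
        for pid in ids:
            groups[labels[position % 2]].append(pid)
            position += 1
    return groups
```

With 11 males and 49 females, as in the sample described above, this yields two groups of 30 with five or six males in each.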
The experiment was conducted in the HUME Lab (Experimental Humanities Laboratory) facility at the Faculty of Arts, Masaryk University, using conventional desktop PCs with 27-inch 3D monitors and active 3D shutter glasses (NVIDIA 3D Vision ® 2 Wireless Glasses, 60 Hz for each eye) under artificial lighting conditions. A testing application based on the Unity ® game engine was designed and used in the experiment. This application supports real-time rendering of large 3D models using both monoscopic (pseudo-3D) and stereoscopic (real-3D) visualization, provides interaction with these 3D models and also automates data collection following the principles of user logging methods. For each trial, user performance in solving the given tasks (efficiency and effectiveness) and user interactions were recorded. The participants controlled the experimental interface and tasks with a conventional optical mouse, which provided both groups with the same control options (degrees of freedom).
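As an illustration of how logged interactions can be turned into an activity measure normalized by task time (activity per second), the following sketch counts logged events per trial and divides by the trial duration. The event structure and the event names are hypothetical; the log format of the testing application is not specified here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class InteractionEvent:
    """One logged user action (hypothetical structure)."""
    t: float   # seconds since the start of the trial
    kind: str  # e.g. "rotate", "pan", "zoom", "move_transmitter"


def activity_per_second(events: Sequence[InteractionEvent], task_time: float) -> float:
    """Number of logged interaction events divided by the trial duration in seconds."""
    if task_time <= 0:
        raise ValueError("task_time must be positive")
    return len(events) / task_time
```

Counting events is only one possible operationalization; summing the duration of interactions instead would give a different, equally plausible activity measure.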

Stimuli and Tasks
The "digital replica" of the terrain was provided by the airborne laser scanning-based fourth-generation Digital Terrain Model of the Czech Republic (DTM 4G). The DTM 4G was processed at a ground resolution of 5 × 5 metres to provide as detailed a digital twin of the physical world as possible, taking into account the processing capabilities of contemporary computers.

The DTM 4G digital twin was divided into parallel processing steps from the geographical perspective: a DTM measuring 5 × 5 kilometres was used in all tasks. The developed digital terrain replicas were selected so that the relative vertical variation was similar in all territories. The study proceeded from a geographical regionalization of the Czech Republic [65], and all terrains depicted were highlands with a relative height variation of 200 to 300 m.
The developed digital terrain replica was covered with an orthophoto. The developed digital terrain replicas were vertically scaled with a fixed factor of 3.0 to adjust human perception of the digital terrain replica. Movement in the digital terrain replica was recomputed in the graphical user interface through a reference scale equal to 1:1000.
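The figures above imply simple arithmetic: a 5 × 5 km tile at 5 m resolution corresponds to 1000 cells per side, and a vertical factor of 3.0 makes a relief of 200 to 300 m appear as 600 to 900 m. A minimal sketch, assuming the terrain is held as a plain grid of heights in metres (the helper names are illustrative, not part of the testing application):

```python
from typing import List, Sequence


def cells_per_side(extent_m: float, resolution_m: float) -> int:
    """Number of grid cells along one side of a square terrain tile."""
    return int(extent_m // resolution_m)


def exaggerate(heights: Sequence[Sequence[float]], factor: float = 3.0) -> List[List[float]]:
    """Apply a fixed vertical scaling factor to every height value."""
    return [[h * factor for h in row] for row in heights]
```

For example, `cells_per_side(5000, 5)` gives 1000 and `exaggerate([[200.0, 300.0]])` gives `[[600.0, 900.0]]`.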

Task 1: Uniform Buildings
The participants were asked to explore a DTM and place a virtual transmitter in the digital twin environment so that the signal transmitted directly from the top of the transmitter covered all the buildings in the scene (represented by red cubes, see Figure 4). Namely, they were asked to grab the transmitter and position it in the terrain so that the line of sight to all the "red" buildings would be preserved. They were also reminded that they could move around the terrain to obtain better views, and once they were satisfied with their placement, they could click the "continue" button to move on to the next trial. To measure participant performance and activity, we pseudo-randomly placed 4, 6 and 8 buildings, respectively, into the three different terrains (three trials in total). For a demonstration of the tasks with uniform buildings, see the Supplementary Video S1.

Task 2: Prioritized Buildings
In the next three trials, we placed red and green buildings in the terrain (Figure 5). The participants were asked to cover as many buildings as possible with the signal, but the red buildings had priority. Again, we pseudo-randomly placed 4 (2 red, 2 green), 6 (3 red, 3 green) and 8 (4 red, 4 green) buildings, respectively, into the terrain for coverage with the signal (3 trials in total). In the final trial, it was not possible to cover all 8 buildings with the signal, and the participants only had to determine the most suitable solution. Participants were informed about this fact and were motivated to find as good a solution as they could (i.e., to cover all red buildings and as many green ones as possible). Before this session commenced, participants were reminded of the assignment, i.e., to cover as many buildings as possible. For a demonstration of the tasks with prioritized buildings, see the Supplementary Video S2.
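The priority rule (red buildings first, then as many green ones as possible) can be read as a lexicographic comparison of candidate placements. The sketch below illustrates that reading; it is an assumption about how placements might be ranked, not the scoring used in the study, and all names are hypothetical.

```python
from typing import Dict, Set, Tuple


def placement_key(covered: Set[str], priority: Set[str]) -> Tuple[int, int]:
    """Rank a placement by covered priority (red) buildings first, then by the others."""
    red = len(covered & priority)
    green = len(covered - priority)
    return (red, green)


def best_placement(candidates: Dict[str, Set[str]], priority: Set[str]) -> str:
    """Return the name of the candidate placement with the highest lexicographic rank."""
    return max(candidates, key=lambda name: placement_key(candidates[name], priority))
```

Under this ordering, a placement that covers two red buildings is preferred over one that covers one red and three green buildings, even though the latter covers more buildings in total.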

After each participant completed the final transmitter placement task, a static screen thanked them for participating in the experiment.

Effectiveness and Efficiency
ISO 9241-210 [64], which addresses human-centred design for interactive systems, is a standardization cornerstone for user evaluation of digital twins [66]. Effective performance is usually understood as the quick and correct solution of a given task (trial). In the present study, we calculated effectiveness as the correctness rate (Equation (1)) and efficiency as the task completion time (in seconds). In performance-based tasks, speed and accuracy may conflict; this phenomenon is known as the speed-accuracy trade-off [67]. However, in many applied situations, the only acceptable solution requires both, and we therefore applied two additional usability parameters, which are defined in ISO 9241-210 [64]. The parameters combine both task time and correctness rate [66] and are given as time-based efficiency (Equation (2)) and overall relative efficiency (Equation (3)).
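In the usability literature, the three measures referenced as Equations (1) to (3) are commonly written in the form below. The notation is an assumption made here for illustration: $N$ is the number of trials, $R$ the number of participants, $n_{ij}$ equals 1 if participant $j$ solved trial $i$ correctly and 0 otherwise, and $t_{ij}$ is the time participant $j$ spent on trial $i$; $E$, $E_{TB}$ and $E_{OR}$ denote effectiveness, time-based efficiency and overall relative efficiency.

```latex
\begin{align}
E &= \frac{\text{number of trials completed successfully}}{\text{total number of trials undertaken}} \times 100\% \tag{1}\\[4pt]
E_{TB} &= \frac{\displaystyle\sum_{j=1}^{R}\sum_{i=1}^{N} \frac{n_{ij}}{t_{ij}}}{N R} \tag{2}\\[4pt]
E_{OR} &= \frac{\displaystyle\sum_{j=1}^{R}\sum_{i=1}^{N} n_{ij}\, t_{ij}}{\displaystyle\sum_{j=1}^{R}\sum_{i=1}^{N} t_{ij}} \times 100\% \tag{3}
\end{align}
```

In this form, $E_{TB}$ is expressed in correct solutions per second, while $E$ and $E_{OR}$ are percentages.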
The general effectiveness (correctness rate) is given by Time based efficiency expresses the speed-accuracy ratio and includes the number of trials and participants and is given by

Effectiveness and Efficiency
ISO 9241-210 [64], which addresses human-centred design for interactive systems, is a standardization cornerstone for the user evaluation of digital twins [66]. Effective performance is usually understood as the quick and correct solution of a given task (trial). In the present study, we calculated effectiveness as the correctness rate (Equation (1)) and efficiency as the task completion time (in seconds). In performance-based tasks, speed and accuracy may conflict; this phenomenon is known as the speed-accuracy trade-off [67]. However, in many applied situations, the only acceptable solution requires both, and we therefore applied two additional usability parameters, which are defined in ISO 9241-210 [64]. These parameters combine task time and correctness rate [66] and are given as time-based efficiency (Equation (2)) and overall relative efficiency (Equation (3)).
The general effectiveness (correctness rate) is given by Equation (1). Time-based efficiency expresses the speed-accuracy ratio, takes into account the number of trials and participants, and is given by Equation (2). Overall relative efficiency expresses the ratio of the time taken by the participants who successfully completed the task (trial) to the total time taken by all participants from a single group, and is given by Equation (3).
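The equations themselves are not reproduced in this copy of the text. The block below is a reconstruction under stated assumptions, not the authors' typeset formulas: it uses the usual formulations of these usability measures, which agree with the verbal definitions above and with the units reported later (percentages and goal/s). The symbols N (number of trials), R (number of participants), n_ij (1 if participant j solved trial i correctly, otherwise 0) and t_ij (time participant j spent on trial i, in seconds) are introduced here for illustration.

```latex
% Equation (1): effectiveness (correctness rate), as a percentage
E = \frac{\text{number of trials solved correctly}}
         {\text{total number of trials undertaken}} \times 100\%

% Equation (2): time-based efficiency, in goals per second
E_{T} = \frac{\displaystyle\sum_{j=1}^{R}\sum_{i=1}^{N} \frac{n_{ij}}{t_{ij}}}{N R}

% Equation (3): overall relative efficiency, as a percentage
E_{OR} = \frac{\displaystyle\sum_{j=1}^{R}\sum_{i=1}^{N} n_{ij}\, t_{ij}}
              {\displaystyle\sum_{j=1}^{R}\sum_{i=1}^{N} t_{ij}} \times 100\%
```

Under these definitions, an unsolved trial contributes zero to the numerator of both Equation (2) and Equation (3), so slow but correct work lowers time-based efficiency while leaving overall relative efficiency largely unchanged.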

Overall relative efficiency (EOR) is expressed as a percentage. Time-based efficiency and overall relative efficiency can both be calculated for individual participants or groups of participants.
The calculation of the parameters mentioned above and their further processing, including statistical analysis, are depicted in Figure 6.
Figure 6. Procedure of data processing and analysis.
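The three usability measures of this section can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis script (which was written in R): the `Trial` record, the function names and the sample data are hypothetical, and the formulas assume the common definitions of correctness rate, time-based efficiency and overall relative efficiency that match the verbal descriptions above.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One solved trial of one participant (hypothetical record layout)."""
    correct: bool    # whether the transmitter placement was judged correct
    seconds: float   # task completion time in seconds


def effectiveness(trials):
    """Correctness rate: percentage of trials solved correctly."""
    return 100.0 * sum(t.correct for t in trials) / len(trials)


def time_based_efficiency(trials):
    """Mean of (goal reached / time) over all trials, in goals per second."""
    return sum((1.0 if t.correct else 0.0) / t.seconds for t in trials) / len(trials)


def overall_relative_efficiency(trials):
    """Percentage of the total time that was spent on correctly solved trials."""
    total_time = sum(t.seconds for t in trials)
    return 100.0 * sum(t.seconds for t in trials if t.correct) / total_time


# Made-up data: pooling the trials of several participants into one list
# is equivalent to averaging over trials and participants.
trials = [Trial(True, 20.0), Trial(True, 40.0), Trial(False, 40.0), Trial(True, 25.0)]
print(effectiveness(trials))                # percentage of correct trials
print(time_based_efficiency(trials))        # goals per second
print(overall_relative_efficiency(trials))  # percentage of time spent on correct trials
```

Passing the trials of a single participant yields the individual-level values, while passing the pooled trials of a whole group yields the group-level values.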

Interactive Activity
Our study also presents an analysis of the strategies the participants applied in solving the given tasks (trials), measured using user logging. The measurements were expressed in two ways: as absolute values, or relativized by dividing the number of activities by the duration of solving the task (trial). The recorded data were analysed as follows: Average angular 3D velocity of virtual movement (degrees per second).
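As an illustration of how absolute and relativized interaction measures can be derived from a user log, the following Python sketch assumes a hypothetical log format (time stamp, camera position, yaw and pitch in degrees, click flag). The record layout, the function names and the definition of total rotation as the sum of absolute yaw and pitch changes are assumptions made for this example and are not taken from the logging tool used in the study.

```python
import math
from typing import NamedTuple


class LogSample(NamedTuple):
    """One logged state of the virtual camera (hypothetical layout)."""
    t: float       # seconds since the start of the trial
    pos: tuple     # camera position (x, y, z) in scene units
    yaw: float     # horizontal rotation in degrees
    pitch: float   # vertical rotation in degrees
    click: bool    # True if a mouse click was logged with this sample


def angle_delta(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(b - a) % 360.0
    return min(d, 360.0 - d)


def interaction_measures(samples):
    """Absolute measures plus their per-second (relativized) variants."""
    duration = samples[-1].t - samples[0].t
    pairs = list(zip(samples, samples[1:]))
    clicks = sum(s.click for s in samples)
    path = sum(math.dist(a.pos, b.pos) for a, b in pairs)
    rotation = sum(angle_delta(a.yaw, b.yaw) + angle_delta(a.pitch, b.pitch)
                   for a, b in pairs)
    return {
        "clicks": clicks,
        "trajectory_length": path,
        "total_rotation_deg": rotation,
        "clicks_per_second": clicks / duration,
        "average_speed": path / duration,
        "angular_velocity_deg_s": rotation / duration,
    }


# Made-up log of three samples covering two seconds.
log = [
    LogSample(0.0, (0.0, 0.0, 0.0), 0.0, 0.0, False),
    LogSample(1.0, (3.0, 4.0, 0.0), 350.0, 0.0, True),
    LogSample(2.0, (3.0, 4.0, 12.0), 350.0, 10.0, True),
]
measures = interaction_measures(log)
```

The wrap-around handling in `angle_delta` treats a change from 0 to 350 degrees as a 10 degree rotation, which avoids inflating the total rotation when the yaw crosses the 0/360 boundary.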

Statistical Analysis
Both efficiency and effectiveness parameters (described in Section 3.4.1.) as well as interaction activity data (Section 3.4.2.) were analysed using statistical methods to identify differences between real-3D and pseudo-3D visualizations. The sample size was estimated by a power analysis using the G*Power software [68]. The Shapiro-Wilk test [69] was used to assess the normality of the distributions of the gathered measurements and, according to the results, we applied parametric (Welch's t test [70]) or non-parametric (Wilcoxon rank sum test [71,72]) methods where suitable to compare performance between the real-3D and pseudo-3D conditions. All p-values < 0.05 are reported as statistically significant, along with the effect sizes (r [73] and Hedges' g [74]), where Hedges' g is an unbiased version of Cohen's d [75]. Following convention, we interpret r values of 0.1, 0.3 and 0.5 and Hedges' g values of 0.15, 0.40 and 0.75 as small, medium and large, respectively [76,77]. The analysis was conducted in R-3.6.0 [78], and the ggplot2 package [79] was used for visualizations.
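The two effect sizes named above can be illustrated with a short, standard-library Python sketch. It is not the authors' R script. The function names are hypothetical, the rank sum effect size uses the normal approximation with a continuity correction and no tie correction, and the group sizes of 30 and 30 in the usage lines are an assumption (the text only states that 60 participants were split into two groups).

```python
import math
from statistics import mean, variance


def rank_sum_effect_size(w, n1, n2):
    """Effect size r = Z / sqrt(N) from a rank sum statistic W.

    W is taken in the form reported by R's wilcox.test (the U statistic of
    the first group). Normal approximation with continuity correction;
    ties are ignored in this sketch.
    """
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    if w < mu:
        correction = 0.5
    elif w > mu:
        correction = -0.5
    else:
        correction = 0.0
    z = (w - mu + correction) / sigma
    return z / math.sqrt(n1 + n2)


def hedges_g(a, b):
    """Cohen's d (pooled SD) multiplied by the small-sample correction factor."""
    n1, n2 = len(a), len(b)
    pooled_sd = math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b))
                          / (n1 + n2 - 2))
    d = (mean(a) - mean(b)) / pooled_sd
    return d * (1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0))


# Assuming two groups of 30 participants each:
r_task_time = rank_sum_effect_size(175.0, 30, 30)   # approx. -0.524
r_clicks = rank_sum_effect_size(300.5, 30, 30)      # approx. -0.284
```

Under the assumption of equal groups, the W values reported in the Results reproduce the reported r values to the stated precision, which suggests that the effect sizes were derived in a comparable way.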

Results
In the present study, we analysed and reported raw observations of correctness rate (referred to as effectiveness) and completion time (efficiency) in decision-making tasks in digital twins. Since effective task solving usually requires solutions that are both fast and correct, we applied two additional usability measures which combine the effectiveness and efficiency parameters, i.e., time-based efficiency and overall relative efficiency. Furthermore, we reported participants' interaction activity performed during task solving, which was assessed as a set of specific measurements recorded by the user logging method. Two variants of interaction activity measurements are reported: (1) absolute interaction activity measurements and (2) measurements relativized by dividing the number of activities by the task solving time.

Effectiveness and Efficiency
The Wilcoxon rank sum test revealed a large, significant effect of the 3D visualization type on efficiency, measured as task time (W = 175.0; p < 0.001; r = −0.524), and on time-based efficiency (W = 732.000; p < 0.001; r = 0.54). The participants engaged with real-3D visualizations spent more time looking for the correct solution than participants using the pseudo-3D visualizations (Table 1). The differences in the average values of the two groups are more evident in Figure 6 (among others, see the outliers). In the pseudo-3D group, two participants were characterized as outliers in terms of their total task-solving time. The pseudo-3D visualizations were more time efficient for accurate decision-making in spatial tasks. We did not discover any significant differences in effectiveness (correctness rate) or overall relative efficiency. In effectiveness, the majority of participants determined the correct solution, although the pseudo-3D group scored slightly better (Figure 7). The pseudo-3D visualization users significantly outperformed real-3D users in task time and time-based efficiency, suggesting that pseudo-3D not only facilitated speed in solving the tasks but also maintained a level of precision. Examining the differences in overall relative efficiency and time-based efficiency at the group level, we find that the results of these two measurements also differ. While overall relative efficiency revealed only a small difference between real-3D (m = 85.14%) and pseudo-3D (m = 86.73%), time-based efficiency in the pseudo-3D visualization was twice as good (m = 0.015 goal/s for real-3D; m = 0.030 goal/s for pseudo-3D).

Interactive Activity
The number of mouse clicks, which was considered a measure of general interactive activity, showed significant differences (W = 300.500; p = 0.028) between the real-3D and pseudo-3D conditions, with a medium effect (r = −0.284). The participants generally used more clicks in real-3D than in pseudo-3D to solve the tasks (see Table 2 for detailed values). Similarly, the length of the virtual trajectory, which represented the distance travelled by the virtual camera around the terrain during the task, showed significant differences between the groups, with a large effect (W = 172.000; p < 0.0001; r = −0.529), and indicated that real-3D users travelled more (Figure 8). Total 3D rotation denoted the tendency to manage the task by rotating the terrain. This parameter also showed significant differences between the groups (W = 218.000; p < 0.001), with a medium effect (r = −0.442), and indicated that the participants rotated more in the real-3D condition. Average speed expressed the length of the virtual trajectory travelled per second; participants in the real-3D condition were generally quicker in their virtual movements (t = −3.364; p < 0.001), with a large effect (g = −0.855). Mouse clicks per second and average 3D angular velocity did not reveal any significant differences between the groups.

Discussion
In testing the main hypotheses, we did not find any significant difference in general effectiveness (correctness rate) between the real-3D and pseudo-3D groups of digital terrain twins. We therefore reject hypothesis H1, a conclusion which corresponds with the previous study by Jurik et al. [15] in which the expected general differences in effectiveness between the real-3D and pseudo-3D visualizations were not supported. It was confirmed that 3D perception of digital twins does not require shutter glasses; a general computer screen outperformed more expensive professional equipment. Even in the present study, no clear effect from the visualization types was identified in the correctness scores.
However, the evidence in the data suggests that the real-3D visualization significantly decreased efficiency in complex spatial decision-making tasks (task time; p < 0.001; r = −0.524) and strongly decreased performance with respect to combined task solution time and correctness rate (time-based efficiency; p < 0.001; r = 0.54). We therefore also reject hypothesis H2. With regard to the speed-accuracy trade-off [67], the decreased performance in digital twins was not entirely traded for a quicker solution. In contrast to the subjectively perceived effectiveness of real-3D technology discussed in a previous study by Kubicek et al. [20], the effect discovered in the present study indicates that real-3D visualizations increased the task solution time and decreased the weighted efficiency performance in the given tasks. From these results, we conclude that real-3D visualizations significantly decreased performance in virtual 3D geovisualizations.
We found that pseudo-3D visualization in digital twins was significantly better not only in general efficiency but also in the economy of interactive movement. The participants who engaged with the pseudo-3D visualizations made significantly fewer mouse clicks (p = 0.028; r = −0.284). The participants who engaged with the real-3D visualizations travelled significantly longer trajectories with the virtual camera in the digital twin environment (p < 0.001; r = −0.529) and performed more 3D rotations with the model in an effort to find the correct solution (p = 0.001; r = −0.442). We therefore accept hypothesis H3. This evidence directly contradicts some of the findings of Sprinarova et al. [52], which suggested that real-3D visualization, due to additional depth cues, reduces the need to rotate (and otherwise interact with) the digital twin models. With respect to the efficiency scores, this means that the participants who engaged with the pseudo-3D visualization moved within the environment significantly less to find a solution and also worked more quickly and with greater accuracy. The high values of the effect size highlighted the large effect of real-3D technology on efficiency during task solution. In the remaining measures, the number of mouse clicks per second and the average 3D angular velocity (degrees rotated per second), we found no differences between the groups. We therefore reject hypothesis H4.
In general, the results of the present study correspond to several previous studies which have explored human perception and decision-making using dynamic and realistic geospatial content [15,37,53]. The results demonstrate that real-3D visualization of a digital twin is problematic in aspects such as efficiency and effectiveness, which further highlights the importance of the previously discussed visual distraction and discomfort while wearing peripheral 3D devices [22]. The efficiency effects revealed by the results clearly speak for the advantage of pseudo-3D visualization in geospatial applications of digital twins: effective task solution in applied issues requires both speed and accuracy, which should be facilitated by the interface of the specific application.
The application of digital twins within and beyond spatial planning is influenced by the dichotomy of uncertainty and realism [7]. Uncertainty has always been an important phenomenon in the geosciences [4–6]. The uncertainty sources commonly described in the literature [4–7] also arose in the visualizations used in our research. The following uncertainty sources were identified in the presented paper:

• Uncertainties of the input DTM (that were described by the data producer) and uncertainties related to DTM generalization (that affects our research). For more accurate modelling, it would be appropriate to use a digital surface model (DSM; including buildings and vegetation) instead of a DTM so that the spatial planning task reflects reality more closely.
• Uncertainties originating from modelling simplification: a signal from transmitters is in reality influenced by other factors (e.g., diffraction, absorption), while the modelling approach was based solely on direct visibility.
• Uncertainties related to limitations of human perception: each person uniquely perceives the presented digital terrain replica. In general, the human perception-related uncertainties were limited by comparing the results from user groups instead of individuals.
As stated by Klippel et al. [7], uncertainty most probably increases when developing a more complex digital twin. Further empirical research seems necessary to find a balance between uncertainty and realism in the scope of immersive visualizations of digital twins. As stated by [5,19], visualizations, scenarios and datasets need to be informationally equivalent.
Uncertainty in digital twins is also present on the level of statistical evaluations. Such an uncertainty is a common limitation in inferential statistics. The effect size and confidence levels computed in advance are commonly employed to address the statistical uncertainty [73–75,80], as was the case in our research.
In summary, the novelty and practical consequences of our research are as follows.
• Time-based efficiency and overall relative efficiency usability metrics were used as a novel methodological approach to find a balance between the quality and rapidness of evidence-based decision making in spatial planning based on digital twins. We also analysed the interaction activity, which is not a common feature in similar studies.
• A higher complexity of tasks used for evaluations in our experiment can be identified in comparison to similar studies [15,20,22,52,53]. Such a higher complexity implies, among other things, (1) a digital twin closer to reality, (2) a task that better simulates evidence-based decision making, and (3) a higher impact in practice.
• Based on the findings presented in this paper, we consider pseudo-3D (monoscopic) visualization to be a more suitable and, at the same time, more accessible way of presenting digital twins in practice. The interactive pseudo-3D visualization of digital twins makes it possible to involve a wider range of experts and the general public in spatial planning, which promotes the principles of participation to increase public acceptance.

Conclusions and Future Work
The results of this study with 60 participants confirmed that shutter glasses (real-3D) offer no benefit over an ordinary computer screen (pseudo-3D) for 3D visualizations in digital twins. The participants who used the real-3D visualizations performed significantly worse in the tasks in terms of task completion time and time-based efficiency (a score measuring weighted task correctness). The limitations of shutter glasses appear primarily in complex human decision-making and problem-solving processes carried out over digital twins. Typical examples are evidence-based policy making, observations and measurements based on 3D digital twins, and data-driven analytics.
Open challenges include the following topics:
• Exploration of the differences in user performance between pseudo-3D geovisualizations and real-3D geovisualizations presented on different hardware, such as HMDs, that provides a higher degree of immersion.
• Ecological validity of tasks, i.e., an improvement of the experimental tasks to appear more realistic. For instance, it would be useful to test more complex tasks, such as planning optimal routes through a digital terrain replica.
• The testing of realistic 3D digital twins of more complex environments, such as cities, building interiors or complex natural environments (like caves or tropical forests).
• Differences resulting from gender, cultural background, knowledge and other factors, which need to be verified for digital twins.
Both this study and future work on the open challenges aim to bring the interactions between the physical and digital worlds in digital twins a step closer together.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/rs13152976/s1, Video S1: Example of interactive 3D geovisualization for Task 1-uniform buildings; Video S2: Example of interactive 3D geovisualization for Task 2-prioritized building.

Funding: The APC was funded by the project entitled "Geographical research on dynamics of natural and societal spatial processes" (MUNI/A/1570/2020). This research was also supported by the research infrastructure HUME Lab Experimental Humanities Laboratory, Faculty of Arts, Masaryk University.

Institutional Review Board Statement:
The study was conducted according to guidelines of the Declaration of Helsinki, and approved by the Research Ethics Committee of Masaryk University (protocol code EKV-2016-059; 26 September 2016).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
Data from user testing aggregated for individual participants as well as the script for statistical analysis are available online at https://www.mdpi.com/2072-4292/13/15/2976/s1 (accessed on 6 June 2021).