A Task-Centred Methodology to Evaluate the Design of Virtual Reality User Interactions: A Case Study on Hazard Identification

Virtual reality (VR) is a computer-based technology that professionals from many different fields can use to simulate an environment with a strong feeling of presence and immersion. Nonetheless, one main issue when designing such environments is to provide user interactions that are adapted to the tasks the users perform. We therefore propose a task-centred methodology to design and evaluate these user interactions. Our methodology allows user interaction designs to be determined from previous VR studies, and user evaluations to be based on a task-related computation of usability. We applied it to the hazard identification case study, since VR can be used in a preventive approach to improve worksite safety. Analysing this task and its related user interactions with our methodology yielded two possible designs of interaction techniques for the worksite exploration subtask. For their usability evaluation, we compared our task-centred evaluation approach with a non-task-centred one. Our hypothesis was that our approach could lead to different interpretations of user study results than a non-task-centred one. Our results, obtained by comparing weighted usability scores from our task-centred approach with unweighted ones for our two interaction techniques, confirmed this hypothesis.


Introduction
The Architecture, Engineering and Construction (AEC) industry is a sector in which designing and preparing future elements and actions is key to the whole lifecycle of a building [1], from its early conception, with ground and cost studies, to its maintenance, with the planning of future repairs. To address such needs for foresight, anticipation, and preventive correction, this sector is increasingly using computer technologies [2,3]. Among them, building information modelling (BIM) methodology and tools are among the most widely used, notably for their ability to track, update, and share data [4,5]. Nonetheless, other computer technologies can also be used in the industry, such as virtual reality (VR), augmented reality (AR), the internet of things, or artificial intelligence [6]. For example, VR has been particularly adopted during the design stage of construction [7], and AR during the maintenance stage [8].
In recent years, the literature has shown that computer-based technologies can be beneficial for a wide variety of purposes and issues in the AEC industry, such as energy management [9], construction monitoring [10], or design review [7,11].
Focusing here on one cross-cutting issue in particular, safety is paramount in the AEC industry, both before and after the construction of a building. On the one hand, during the operation and maintenance stages, i.e., after construction, the safety of the building users must be ensured, which can be done by carrying out corrective and follow-up actions through safety assessments [12]. To conduct such safety assessments, computational approaches such as artificial neural networks can be used to predict potential safety problems and, thus, to anticipate maintenance actions, as explained by Harirchian et al. [13]. On the other hand, before the building is used, and therefore during its construction, it is the safety of the AEC workers that must be ensured. The AEC industry is indeed particularly dangerous for its workers [14][15][16][17]. In many countries, accidents are more frequent and more often fatal for workers in the AEC industry than in other industries [18]. To address this crucial issue [14], computer-based approaches, such as BIM-based ones, have been proposed in the literature [19,20].
In this paper, for our case study, we focused on the safety of construction workers on a worksite. According to the Occupational Safety and Health Administration (OSHA), the most common construction hazards are falls, electrocution, caught-in-between, and struck-by hazards [21]. To address these safety issues, one option is to conduct preventive and corrective actions on worksites during construction, for example using augmented reality signals that highlight hazardous zones [22] or monitoring systems that can prevent mechanical failures [23]. Another option is to take safety measures before construction begins, notably by improving AEC professionals' understanding of these worksite hazards [24] and of the ways to identify and mitigate them [25,26]. Along these lines, a technique called design for safety has been proposed in the literature to prevent hazards from the design stage of the building lifecycle [27][28][29]. It has been particularly studied in BIM authoring tools, either to avoid fall hazards by automatically adding protections in hazardous zones [20,30], or to prevent electrocution fatalities [31]. For the two other main kinds of hazards (caught-in-between and struck-by), other preventive actions can target construction planning instead of the building design itself. Indeed, these kinds of hazards are directly related to the dynamic aspect of a worksite, since they are mostly due to the movements of construction vehicles and workers [32,33]. As construction vehicles and workers follow planned movements during construction, hazardous zones (i.e., zones with conflicts that may result in collisions) can be identified from the construction planning, usually using paper plans [34] or possibly BIM 4D simulations [35], i.e., construction simulations with 3D representations over time.
In our approach to improving workers' safety, we focused on the aforementioned preventive actions, and in particular on conducting hazard identification reviews. VR can be used for this purpose [36], notably for hazards related to dynamic events, thanks to the enhanced sense of immersion and presence it offers [37][38][39]. Accordingly, several studies in the literature have addressed the use of VR for hazard assessment [40]. Many of them focused on hazard identification in VR for training purposes, for either workers or civil engineers [41]. For example, Sacks et al. [42] compared the effect of traditional training and VR training on civil engineers' memory by evaluating their hazard knowledge after different time intervals. In their experiment, the VR-trained participants performed better than the control group, who used paper plans. Zhao et al. [43,44] and Fang et al. [45] focused on training workers to avoid hazardous behaviours, for example when using a crane. Along the same lines, Xu et al. [46] and Joshi et al. [47] conducted objective and subjective user evaluations of VR safety training, which confirmed the benefits of this technology for construction workers. Nonetheless, VR has also been studied for purposes other than training, notably as an environment that allows safety managers to detect design or construction planning defects that would create hazardous zones in the future worksite [48][49][50][51].
In line with the aforementioned papers, which showed the benefits of VR for the hazard identification task, our paper studies the use of VR environments for this specific task, and in particular their design and evaluation process. As virtual reality is intrinsically linked to human factors [52,53], one issue is that the design of VR environments, and notably of their user interactions, should be adapted to the VR users' tasks [54]. The design of user interactions in VR is paramount, since their usability is directly linked to user task performance [55,56] and, as a result, to the effectiveness of the tasks performed in VR [57]. To address this issue, user-centred design methodologies have been developed in the VR literature. These methodologies classify the existing interaction techniques and present approaches for making design choices about VR user interactions [58][59][60]. Nonetheless, a remaining problem is that, in these methodologies, the main actors of the design process must have expertise in VR [58]. Such user-centred methodologies appear to be conceived for VR developers, guiding them through a process of integrating the users [60]. As a result, the influence of the users, and of the characteristics of their tasks, on the design may be lower than expected. To address this problem, we propose a new methodology for the design and evaluation of VR user interactions that is centred on the tasks performed by the users and makes them the main actors of the VR design process. The novelty of our work lies in this task-centred approach to both the design and the evaluation of VR user interactions.
Regarding our task-centred methodology, and our evaluation approach in particular, we first followed the principles stated by Bowman et al. about usability [61,62]. We therefore based our evaluation approach mainly on quality factors to define and determine the usability of VR user interaction techniques. We then aimed to apply these usability quality factors in a task-centred way: we proposed to attribute to the quality factors different weights that represent their relative importance for the current task. These weights can then be used to compute the usability of an interaction technique in a task-centred way. In the user evaluation conducted in this paper, we computed both weighted-sum and unweighted (simple-sum) usability scores, to compare our task-centred evaluation approach with a non-task-centred one [59,60]. We expected weighted and unweighted scores to lead to different interpretations of the usability of the two interaction techniques tested here. Section 2 presents our task-centred methodology, how we applied it to the hazard identification case study, and the user evaluation we conducted to test our hypothesis. Section 3 gives the results of our experiment. Finally, Section 4 interprets these results, and Section 5 presents conclusions and ideas for future work.

Our Task-Centred Methodology to Design and Evaluate VR User Interactions
In our research, we propose a new task-centred methodology to design and evaluate VR user interactions. Its objective is to guide the creation of VR applications and, in particular, to drive the choice of VR interactions in terms of usability according to the user task. Our methodology relies on taxonomies and characterisations of tasks, user interactions, and user interaction techniques, and builds on previous work in this direction from the VR literature [58,60,63]. It is composed of three task-centred steps: two to design the VR user interactions theoretically, and one to carry out a concrete user evaluation. Figure 1 gives an overview of our methodology.
The first step of our methodology consists of building a model that defines and decomposes the user task to be performed. For that purpose, a hierarchical task analysis [64] is conducted on this task, which outputs a decomposition into subtasks that we call primitive tasks. The aim of this analysis is to go from a field-specific task to more generic primitive tasks that can be directly linked to VR user interactions, such as selection, manipulation, and navigation. Examples of such primitive subtasks are "to move to some places" or "to orient an object". To perform this hierarchical task analysis, we created a semi-automated system that guides the decomposition of the task into subtasks. Our second step then consists of determining proposals of interaction techniques adapted to the subtasks obtained in the first step. This step uses a rule-based system to deduce, from the characteristics of a given subtask, the characteristics of the interaction technique that would be best adapted to it in terms of usability. This system relies on usability studies conducted in the literature for different kinds of tasks and conditions using different VR interaction techniques. Based on these studies, and depending on the kind of interaction (selection, manipulation, or navigation) and on the user task characteristics, our system can determine either a single proposal of interaction technique, or several proposals when previous results are lacking for some VR interaction technique characteristics. Examples of such interaction technique characteristics are the order of magnitude of the speed for a navigation interaction, or the kind of selection tool for a selection interaction.
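A minimal Python sketch of such a rule-based deduction, under hypothetical assumptions: each rule maps one subtask characteristic to fixed values of interaction technique characteristics, and any characteristic that no rule fixes is expanded into all of its candidate values, which yields several proposals. The characteristic names, candidate values, and rules below are illustrative only (loosely echoing the worksite exploration results reported in step 2) and are not the authors' rule base.

```python
from itertools import product

# Hypothetical candidate values for each characteristic of a navigation technique.
CANDIDATES = {
    "direction_control": ["free", "constrained"],
    "acceleration_control": ["user_start_stop", "automatic"],
    "speed_magnitude": ["human", "vehicle", "teleport"],
    "target_selection_mode": ["direct", "indirect"],
}

# Hypothetical rules: (subtask characteristic, required value) -> fixed technique values.
RULES = [
    (("needs_information_gathering", True), {"speed_magnitude": "human"}),
    (("free_exploration", True), {"direction_control": "free",
                                  "acceleration_control": "user_start_stop"}),
]


def propose_techniques(subtask: dict) -> list[dict]:
    """Return one proposal per combination of values left unfixed by the rules."""
    fixed: dict = {}
    for (key, value), consequences in RULES:
        if subtask.get(key) == value:
            fixed.update(consequences)
    # Characteristics with no applicable rule keep all of their candidate values.
    open_keys = [k for k in CANDIDATES if k not in fixed]
    proposals = []
    for combo in product(*(CANDIDATES[k] for k in open_keys)):
        proposal = dict(fixed)
        proposal.update(zip(open_keys, combo))
        proposals.append(proposal)
    return proposals


explore = {"needs_information_gathering": True, "free_exploration": True}
for p in propose_techniques(explore):
    print(p["target_selection_mode"])  # prints "direct", then "indirect"
```

Expanding unfixed characteristics into every candidate value mirrors the behaviour described above: when the literature gives no result for a characteristic, the system returns several proposals instead of guessing one.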
Finally, our methodology ends with an evaluation step that consists of creating a VR application with the proposed interaction techniques and conducting a task-centred user study with this VR application. Each primitive task (determined in step 1) and its associated interaction technique(s) (determined in step 2) should be evaluated independently in the user study. Moreover, when there are multiple proposals of interaction techniques for one user interaction, these different proposals must be tested in the user study, and their usability can be compared to improve the knowledge about such VR interaction techniques and their use for the studied user task. In any case, VR applications must first be developed as concrete supports for the user study. At this stage, external constraints that may influence the concrete implementation of an interaction technique, such as the budget or the kind of devices to be used, must be stated and taken into account. Then, a task-centred user study must be conducted, based on one main principle: the application of quality factors to the usability evaluation [61,65], according to the studied subtask. The quality factors of an interaction vary depending on the kind of interaction; examples are spatial awareness for a navigation interaction and ease-of-use for a selection interaction. Our task-centred approach relies on defining the relative importance of each quality factor depending on the current user task and subtask, which can notably draw on the users' expertise in the field to which the task belongs. To define this relative importance, our methodology uses weights that represent the importance of each quality factor, in terms of usability, for the VR interaction when performing the current user task. Finally, a weighted usability score can be computed using a formula that takes into account the user scores for each quality factor and the previously determined weights.
Additionally, to reduce the number of quality factors and, thus, the number of measures to take, a preliminary selection of the quality factors that are relevant for the user task can be done first, through an interview with professionals of the task field.

Objective of the Study and Hypothesis
In this paper, we present how we applied our task-centred methodology on the case study of the hazard identification task. This methodology allowed us to determine the design of the VR user interactions for each subtask; nonetheless, for the worksite exploration subtask, we determined two possible designs for its associated navigation user interaction. These two designs can be evaluated and compared in terms of usability in the evaluation step of our task-centred methodology. In this study, we propose to follow both our task-centred evaluation approach and a non-task-centred one. Our objective is to study the effect of the application of our approach on the usability results and their interpretation, compared to a non-task-centred approach.
For that purpose, we compute the usability scores of our two possible interaction techniques in two different ways: an unweighted way, corresponding to a non-task-centred approach, and the weighted way from our methodology. Figure 2 shows these two computational approaches and their associated formulas, U1 for the unweighted way and U2 for the weighted way [66]. We then compare the unweighted and the weighted scores of our two navigation interaction techniques, to see whether these comparisons agree or differ depending on the kind of computation. Our hypothesis is that our task-centred evaluation with the weighted computation gives different results and a different usability interpretation compared with the non-task-centred evaluation.
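The exact formulas for U1 and U2 are given in Figure 2 and are not reproduced here. As a hedged Python sketch, the snippet below assumes that U1 is the plain mean of the quality factor scores and U2 the weighted sum divided by the total weight, so that both stay on a 0 to 1 scale. All technique scores and weights are made-up placeholders, chosen only to show how the two computations can rank the same two techniques differently.

```python
def unweighted_usability(scores: dict[str, float]) -> float:
    """U1 (assumed form): plain mean of the quality factor scores."""
    return sum(scores.values()) / len(scores)


def weighted_usability(scores: dict[str, float], weights: dict[str, float]) -> float:
    """U2 (assumed form): sum of weight * score, divided by the total weight."""
    total = sum(weights[f] for f in scores)
    return sum(weights[f] * s for f, s in scores.items()) / total


def best_technique(per_technique: dict[str, dict[str, float]], weights=None) -> str:
    """Name of the technique with the highest U1 (weights=None) or U2 score."""
    def score(t: str) -> float:
        s = per_technique[t]
        return unweighted_usability(s) if weights is None else weighted_usability(s, weights)
    return max(per_technique, key=score)


# Placeholder factor scores on a 0-1 scale (illustrative, not measured values).
techniques = {
    "direct": {"rapidity": 0.9, "information_gathering": 0.5,
               "ease_of_use": 0.8, "spatial_awareness": 0.6},
    "indirect": {"rapidity": 0.4, "information_gathering": 0.8,
                 "ease_of_use": 0.7, "spatial_awareness": 0.7},
}
# Placeholder weights favouring information gathering.
weights = {"rapidity": 1, "information_gathering": 4,
           "ease_of_use": 2, "spatial_awareness": 2}

print(best_technique(techniques))           # direct   (U1: 0.70 vs 0.65)
print(best_technique(techniques, weights))  # indirect (U2: about 0.63 vs 0.71)
```

Dividing by the total weight means that raw weighting points can be used directly without rescaling them first; if the weights already sum to 1, U2 reduces to a plain weighted sum.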

Presentation of the Case Study and the Hazardous Situations
In this paper, we took as a case study the task of hazard identification in a preventive context, i.e., before construction begins on a worksite. Thanks to the immersion it provides, VR can improve user task performance for this task, which is usually performed by construction safety civil engineers on paper plans [34]. Nonetheless, the extent of the VR benefits may vary depending on the kind of hazard. Hazards linked to workers' behaviours benefit more from VR immersion [49] than hazards related to worksite design issues, such as trench collapse hazards, or to personal protective equipment issues, such as electrocution hazards. Consequently, in this study we focused on the two following kinds of hazards, which directly involve workers: fall hazards and struck-by hazards (more specifically, worker-vehicle collisions). Figure 3 shows two examples of these kinds of hazards in our VR application. With these two kinds of hazards, our user task is directly related to workers' behaviours, and thus all the hazardous situations in our study involve a worker. As a positive side effect, this helps our participants signal hazards in a unified way, by always targeting a worker rather than another element of the environment such as a vehicle. However, a potential negative side effect is that the task could turn into a "search-a-worker" task; to avoid this, we placed workers in both safe and hazardous situations in our VR application. Similarly, to prevent false positives, we equipped all virtual workers with basic personal protective equipment, so that they could only be identified as being in hazardous situations for the following reasons: the absence of external (not personal) protections such as guardrails or barriers, their proximity to a moving vehicle, or the use of inappropriate material on scaffolding, such as pallets.
Figure 4 shows, at the top, a worker in a safe situation thanks to guardrails and, at the bottom, another worker in a struck-by hazardous situation.

Step 1: Hierarchical Analysis of the Hazard Identification Task
To design our VR user interactions for the hazard identification task, we first followed our hierarchical task analysis step, which consists of decomposing this task into subtasks until primitive tasks such as move, place, or orient are reached. Through this analysis, we obtained the following primitive tasks:
• to move in the whole worksite;
• to activate the functionality to signal workers in hazardous situations;
• to target workers in hazardous situations;
• to place/orient the machines present on the worksite.
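As an illustration only, the decomposition above can be represented as a task tree whose leaves are the primitive tasks. In this Python sketch, the `Task` class and the intermediate nodes are hypothetical (the text only reports the resulting primitive tasks), and the leaves paraphrase the four primitive tasks listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A node of a hierarchical task analysis; leaves are primitive tasks."""
    name: str
    subtasks: list["Task"] = field(default_factory=list)

    def primitive_tasks(self) -> list[str]:
        """Depth-first collection of the leaf (primitive) tasks."""
        if not self.subtasks:
            return [self.name]
        leaves: list[str] = []
        for sub in self.subtasks:
            leaves.extend(sub.primitive_tasks())
        return leaves


# Hypothetical intermediate nodes; the four leaves are the primitive tasks above.
hazard_identification = Task("identify hazards on the worksite", [
    Task("explore the worksite", [Task("move in the whole worksite")]),
    Task("signal hazardous situations", [
        Task("activate the signalling functionality"),
        Task("target workers in hazardous situations"),
    ]),
    Task("review the machine layout", [Task("place/orient the machines")]),
])

print(hazard_identification.primitive_tasks())
```

Each leaf returned by `primitive_tasks()` is what step 2 would then take as input when looking for a matching interaction technique.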
The primitive task "to move in the whole worksite" (i.e., exploration of the worksite) is one of the key subtasks for a successful hazard review, since poor locomotion could lead to missed hazards, for example by moving too fast. This is why we focus in this study on this worksite exploration subtask, and thus on the design of its related user navigation interaction.

Step 2: Determination of Proposals of VR Interaction Techniques
For each primitive task, we used our rule-based system to determine proposals of VR interaction techniques according to the characteristics of these primitive tasks. We obtained the following results:
• for the subtask "activate the functionality to signal workers in hazardous situations": a physical/interface button interaction technique;
• for the subtask "target workers in hazardous situations": a virtual raycast pointer or a virtual hand go-go interaction technique;
• for the subtask "place/orient the machines present on the worksite": a world-in-miniature interaction technique;
• for the subtask "explore the worksite": two potential navigation interaction techniques with one unfixed value (target selection mode).
For the worksite exploration subtask, we obtained two proposals of navigation interaction techniques, which share the following characteristic values: user control of the direction of movement without any restrictions, user control of the acceleration through start-and-stop actions, and a navigation speed on the order of magnitude of human speed, since users need to collect information to identify hazards while navigating. These two proposals differ in only one characteristic, the selection mode of the target for the next navigation point, each proposal taking one of the two possible values for this characteristic: the direct selection mode (in the 3D space) and the indirect one (e.g., on a 2D map or through a list of zones). Based on that, we proposed the two following concrete implementations for our VR application prototypes:
• a free pointing steering technique with a direct selection mode in the 3D space, with smooth translation at human speed to the targeted place;
• a free pointing steering technique with an indirect selection mode on a 2D map, with smooth translation at human speed to the targeted place.
With both VR navigation interaction techniques, the user targets a place to move towards with a virtual raycast and then travels smoothly to this place. The direct navigation technique lets a user select the place or target to move towards by pointing at the ground in the 3D space of the worksite. The indirect navigation technique lets a user select it by pointing on a 2D map that represents the worksite. Figure 5 shows, on the left, a user moving with the direct technique and, on the right, another user moving with the indirect technique.

Step 3: Creation of VR Prototypes and Task-Centred Evaluation Approach
For each of the navigation interaction techniques obtained from step 2, we needed to develop a VR application prototype. In both prototypes, the interaction techniques related to all subtasks other than worksite exploration, such as the hazardous situation targeting and tagging subtasks, were also implemented according to the step 2 results. We developed these VR prototypes in the Unity3D 2019 game engine, and the building models used here had previously been exported from the Autodesk Revit 2018 BIM software.
When choosing the VR devices for this study, our main constraint was their accessibility to the experiment participants. This accessibility issue in VR commonly stems from cost restrictions on VR equipment, depending on where the experiment is conducted: available laboratory equipment, restrictions on outdoor experiments, etc. In our case, we faced two main restrictions due to the full lockdown of laboratories in 2020 during the COVID-19 pandemic: our devices and data collection had to support fully remote use, and our participants had to own their devices. We therefore developed our VR prototypes for two kinds of devices: desktop computers and VR head-mounted displays (HMDs). We expected that offering two device options would increase the number of participants. We thus built four VR prototypes in total, one for each of the two interaction techniques on each of the two kinds of devices.
The first option available to our participants was a desktop application on a computer. To support remote participation, we distributed our application online through a webpage, together with all the required instructions and questionnaires. Our application ran in web browsers using the WebGL technology and had been optimised for desktop computer screens. We offered this modality as an easily accessible option to address our accessibility issues. Additionally, pointing interaction techniques could easily be reproduced in this kind of non-immersive 3D environment, which allowed us to evaluate our navigation techniques properly.
The second option that we provided to our users was an HMD application. This option was viable thanks to an initiative of IEEE VR community members who had taken VR equipment out of their laboratories during the lockdown and had offered to take part in remote experiments [67]. We also distributed our HMD application through a webpage with all the instructions and questionnaires. Participants had to download and install our application on their HMD, which allowed subjects to perform their task in an immersive 3D environment despite the lockdowns. To maximise the number of participants, we built versions of our VR application for several HMDs: the Oculus Quest, the Oculus Rift, and the HTC Vive. This was possible since all these HMDs share the key characteristics needed for this study: head tracking, which allows free head movements during the hazard identification review, and tracked hand controllers that our users could use for our navigation and selection interaction techniques. Figure 6 shows, on the left, a user navigating in the desktop application with a mouse pointer and, on the right, a user navigating in the HMD application with an HMD controller.
For our usability evaluation with users, different quality factors can be taken into account in the case of a navigation interaction. Bowman et al. [61] proposed the following list of quality factors for navigation interactions:
• rapidity/speed during travel;
• precision/accuracy, i.e., the proximity of the arrival point to the desired target;
• spatial awareness, i.e., "the user's implicit knowledge of disposition and orientation within the environment during and after travel";
• ease of use of the interaction technique for a novice;
• ease of learning of the interaction technique for a novice;
• information gathering, i.e., "the user's ability to actively obtain information from the environment during travel";
• presence felt by the users in the virtual environment thanks to the navigation interaction.
Nonetheless, Bowman et al. also noted that, depending on the user task and its context, some quality factors are more relevant than others for evaluating the usability of an interaction technique [61]. As suggested in our methodology, a preliminary selection of the quality factors relevant to our specific case study and navigation interaction can be made through an interview with AEC professionals. First, the professionals stressed the importance of the information gathering quality factor in the context of hazard identification, and considered the spatial awareness and ease-of-use quality factors as complementary. Indeed, our navigation interaction technique should allow a thorough inspection of the virtual worksite while moving (information gathering factor), should not be complex, so that users can focus on their inspection rather than on navigating (ease-of-use factor), and should not disorient users, so that they can locate the hazards spatially (spatial awareness factor). Moreover, they noted that, with large distances to travel during the worksite inspection and no specific need for precision, the accuracy quality factor could be discarded, whereas the rapidity quality factor could be kept. They nonetheless remarked that rapidity was less important, since the quality and completeness of the hazard identification mattered more than its speed. Finally, the presence quality factor was hard to consider in our study, since we were also using non-immersive devices.
To sum up, for our navigation interaction and exploration task, the quality factors selected for our user study were:
• rapidity;
• information gathering ability;
• ease-of-use;
• spatial awareness.
Usability scores would be computed for each quality factor from the measures taken during the user study. Following our methodology, weights that represent the quality factor importance would be used for such computation, and in this study, they would be attributed by our participants through several weighting techniques that we explain in our user study protocol.

A User Study about Our Task-Centred Evaluation Approach on the Hazard Identification Case Study and Its Navigation Interaction in VR
Subjects
For our user study, we recruited unpaid volunteers. Our experiment was available on the Internet without specific hardware requirements, except for the HMD participants. Conversely, in terms of knowledge and skills, performing the hazard identification task would normally require participants to be safety engineers or construction workers. To avoid this limitation, however, we provided on the experiment webpage an informative document about the kinds of hazards to be identified. Similarly, to avoid computer skill prerequisites, we provided a complete tutorial on the commands for navigating and tagging workers in danger. This allowed users with any kind of background to take part in our experiment.
Thanks to this study design, 34 different subjects took part in the experiment on the desktop version of our application, and 28 of them performed their task correctly (the remaining 6 did not entirely finish the experiment). Data collected through a pre-questionnaire confirmed a wide variety of profiles in terms of previous VR experience and of AEC and safety knowledge. These subjects had mainly been recruited from the authors' personal and professional networks. For the HMD version of the application, half of the participants were VR experts from the IEEE VR community [67], and the other half were non-experts in VR. In total, 12 subjects took part in the HMD experiment with an Oculus Quest, Oculus Rift, or HTC Vive HMD.

Protocol and Design of the Experiment
Our experiment followed a within-subject design to evaluate the usability of our two navigation interaction techniques during the global task of hazard identification. Our participants thus had to perform this task twice, once with each navigation technique. For that, we built two scenarios with fall and struck-by hazardous situations in a single virtual worksite. In these scenarios, workers were positioned at different places on the worksite. Some of them were in safe places, whereas the others were either close to a moving machine without any element delimiting a restricted walking area, or close to a large height difference without any guardrails.
After giving their informed consent, the participants followed the protocol below, which was entirely explained on our experiment webpages. First, general explanations were given about the purpose of the experiment. Then, participants had to read an informative document about the hazards present on the virtual worksite. After that, a tutorial about the VR application commands was given: navigation techniques, selection and manipulation techniques to target a worker, etc. Next, participants had to fill in a short pre-experiment questionnaire about their previous experience and knowledge of VR technology, hazard identification and mitigation, and AEC industry processes. Along with this form, they also had to attribute weights to the quality factors used in this usability evaluation. The concept of a quality factor was explained to the participants, and short definitions of all the quality factors were given. They had to evaluate which quality factors were more important than the others by distributing weighting points.
Then, each participant completed a training session in the application using the worker-tagging interaction techniques and the navigation techniques. Finally, each participant performed the hazard identification task in the application, through either the desktop or the HMD version, once with each of our two navigation interaction techniques. The order of the navigation interaction techniques and of the scenarios (i.e., the different sets of hazardous situations) was randomised and counterbalanced.

Measures for Each Quality Factor and Scores Computation
For each of our four quality factors (rapidity, information gathering, situation awareness, and ease of use), measures were taken during the experiment. Since our experiment was run remotely, with participants in full autonomy, priority was given to non-invasive and automatic measures. Inspired by previous experiments on navigation technique usability [58,61], we took the measures shown in Table 1.

Table 1. Measures taken for each quality factor.
Quality factor | Measure
Rapidity | Navigation time (s)
Information gathering | Number of hazards marked + time spent to identify them (s)
Situation awareness | Time spent orienting the camera, excluding time spent moving (s)
Ease of use | Number of clicks required to navigate the whole worksite

From the data collected for each of these measures, we computed each participant's personal scores for the four quality factors. These scores were obtained by normalising the data from all participants, after handling outlying values. We computed them through a linear transformation of the data, attributing the value 1 to the "best" value (e.g., the lowest time, the highest number of hazards detected) and 0 to the "worst" one. For information gathering, the only quality factor with two measures here, we computed an intermediate score for each of the two measures and then took their average. This normalisation from 0 to 1 put all the scores on the same scale before computing the final usability scores, which are also on a 0 to 1 scale.
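The normalisation described above can be sketched as a min-max rescaling whose direction depends on whether a higher raw value is better. This is a minimal sketch under assumptions: outliers are taken as already handled, a tie among all participants is given a score of 1 by an arbitrary convention of ours, and the function names and measures are hypothetical, not study data.

```python
def normalise(values, higher_is_better):
    """Linearly map raw measures to [0, 1]: best value -> 1, worst value -> 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: no spread to rescale, so every participant scores 1.
        return [1.0 for _ in values]
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def information_gathering_scores(hazards_marked, identification_times):
    """Average of two intermediate scores: more hazards is better, less time is better."""
    s_hazards = normalise(hazards_marked, higher_is_better=True)
    s_time = normalise(identification_times, higher_is_better=False)
    return [(a + b) / 2 for a, b in zip(s_hazards, s_time)]


# Hypothetical measures for three participants.
navigation_time = [120.0, 90.0, 150.0]  # seconds, lower is better
hazards = [4, 6, 5]                     # hazards marked, higher is better
id_time = [30.0, 20.0, 40.0]            # seconds, lower is better

print(normalise(navigation_time, higher_is_better=False))  # [0.5, 1.0, 0.0]
print(information_gathering_scores(hazards, id_time))      # [0.25, 1.0, 0.25]
```

Because each score is relative to the observed minimum and maximum, adding or removing a participant can change everyone else's scores; this is one reason outliers need handling before normalising.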

Weight Attribution to Quality Factors
In this study, testing our hypothesis required computing both unweighted and weighted usability scores. To obtain the weights, we asked the participants to attribute them through a pre-experiment questionnaire, before using our VR application. Moreover, we used two different techniques to obtain these relative importance weights: comparing the importance of all the quality factors at once, or comparing them by pairs. By doing so, we expected that our users would be comfortable with at least one of these two techniques, and would succeed in attributing weights according to the importance they perceived for each quality factor in the context of this hazard identification task.
In the first technique, called "simple" weighting, participants defined the weighting by setting four sliders from 0 to 6, as shown in Figure 7. Each value directly reflected the importance given to the corresponding quality factor. In the second technique, called "cross-weighting" [68,69], they defined the weights using a cross table that compares the quality factors by pairs. For each pair of quality factors, participants had to select which one was, in their view, the more important of the two, and then attribute a relative weight of 0, 1, or 2. A weight of 2 meant that the selected quality factor was "much more important" than the other, 1 that it was "moderately more important", and 0 that the two quality factors had the same importance. Figure 8 shows the interface provided on our webpage for the cross-weighting technique. Finally, the pairwise weights are summed to obtain a total weight for each quality factor, which yields a weighted usability formula similar to the one obtained with the simple-weighting technique.

In total, we used four different ways to compute our global usability scores in this user study: (1) the unweighted way, giving "unweighted usability scores" without following our task-centred approach; (2) the users' "simple-weighting" way; (3) the users' "cross-weighting" way; and (4) an expert weighting way. Ways (2) to (4) give "weighted usability scores" following our task-centred evaluation approach. The expert weighting had been attributed before the experiment based on the following comments made by AEC professionals about the quality factors: information gathering is of primary importance, situation awareness and ease of use are of medium importance, and rapidity is of no importance.
This resulted in the following expert weighting, computed using the cross-weighting technique: 0 for rapidity, 4 for information gathering, 2 for situation awareness, and 2 for ease of use.
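To make the weighting procedure concrete, the sketch below sums cross-weighting choices into per-factor totals and then combines the 0 to 1 quality-factor scores into a usability score. Two assumptions are ours, not the paper's: the usability score is taken to be a weighted mean of the quality-factor scores (which keeps it on a 0 to 1 scale), and the pairwise choices in `EXPERT_PAIRWISE` are hypothetical, chosen only so that the totals reproduce the expert weighting of 0, 4, 2 and 2. Slider values from the simple-weighting technique could be passed directly as `weights`. All names are illustrative.

```python
from itertools import combinations

FACTORS = ["rapidity", "information_gathering", "situation_awareness", "ease_of_use"]


def cross_weights(pairwise):
    """Sum pairwise cross-weighting choices into one total weight per quality factor.

    `pairwise` maps each pair (in FACTORS order) to (preferred factor or None, weight),
    with weight 0 (same importance), 1 (moderately more) or 2 (much more important).
    """
    totals = {factor: 0 for factor in FACTORS}
    for pair in combinations(FACTORS, 2):
        preferred, weight = pairwise[pair]
        if weight not in (0, 1, 2):
            raise ValueError(f"invalid weight {weight} for {pair}")
        if weight > 0:
            if preferred not in pair:
                raise ValueError(f"{preferred} is not part of {pair}")
            totals[preferred] += weight
    return totals


def usability(scores, weights=None):
    """Weighted mean of 0-1 quality-factor scores (assumed formula); unweighted if no weights."""
    if weights is None:
        weights = {factor: 1 for factor in FACTORS}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(weights[f] * scores[f] for f in FACTORS) / total


# Hypothetical pairwise choices whose totals match the expert weighting (0, 4, 2, 2).
EXPERT_PAIRWISE = {
    ("rapidity", "information_gathering"): ("information_gathering", 2),
    ("rapidity", "situation_awareness"): ("situation_awareness", 2),
    ("rapidity", "ease_of_use"): ("ease_of_use", 2),
    ("information_gathering", "situation_awareness"): ("information_gathering", 1),
    ("information_gathering", "ease_of_use"): ("information_gathering", 1),
    ("situation_awareness", "ease_of_use"): (None, 0),
}

# Hypothetical quality-factor scores of one participant.
SCORES = {"rapidity": 1.0, "information_gathering": 0.5, "situation_awareness": 0.5, "ease_of_use": 0.0}

expert_weights = cross_weights(EXPERT_PAIRWISE)
print(expert_weights)
# {'rapidity': 0, 'information_gathering': 4, 'situation_awareness': 2, 'ease_of_use': 2}
print(usability(SCORES))                  # 0.5 (unweighted)
print(usability(SCORES, expert_weights))  # 0.375 (expert weighting)
```

In this toy case, a participant who is fast but only average at information gathering loses usability once rapidity receives a weight of 0, which is the kind of shift in interpretation the task-centred weighting is meant to capture.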

Desktop Application: Results Dataset and Significant Effect of Order on Scores
We first needed to check, on our 56-entry desktop application dataset, the potential effects on our quality factor and usability scores of two factors: the order (the number of the user's personal session, 1 or 2) and the scenario (the set of hazards to be identified). As our main objective was to study the effect of the interaction technique factor, we aimed to rule out significant effects of the other factors. To do this analysis, an ANOVA could be run on our data with the interaction technique, order, and scenario factors, so we started by verifying the required ANOVA assumptions. First, we applied the Shapiro-Wilk test to each series of data (the four quality factor scores and the four global usability scores). It returned p-values above a significance level of α = 0.05 for some series and above α = 0.01 for the others, so normality could not be rejected for any of our scores. Then, we applied Levene's test: it likewise returned p-values above α = 0.05 or α = 0.01 depending on the series, so the homogeneity of variance across the combinations of interaction technique and scenario could not be rejected either. Finally, our three observed factors (the navigation interaction technique, the session order and the scenario) were independent by nature.
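For readers who want to see what these checks compute, the sketch below implements a one-way ANOVA F test and Levene's test in its original, mean-centred form (a one-way ANOVA on the absolute deviations from each group mean), using the Python standard library only. The F-distribution p-value comes from the regularised incomplete beta function, evaluated by a continued fraction. It is a simplification: the study's analyses involve several factors (interaction technique, order, scenario), whereas this sketch handles a single factor, and the Shapiro-Wilk test is not reproduced. The group values are hypothetical. In practice a statistics library would typically be used instead (e.g., SciPy's `scipy.stats.f_oneway`, or `scipy.stats.levene` with `center='mean'`).

```python
from math import exp, lgamma, log


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularised incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Survival function P(F > f) of the F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def anova_oneway(*groups):
    """One-way ANOVA on a single factor: returns (F statistic, p-value)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, f_sf(f, df_between, df_within)


def levene(*groups):
    """Levene's test (mean-centred): one-way ANOVA on absolute deviations from group means."""
    deviations = []
    for g in groups:
        mean = sum(g) / len(g)
        deviations.append([abs(x - mean) for x in g])
    return anova_oneway(*deviations)


# Hypothetical usability scores for two interaction techniques.
direct = [0.8, 0.7, 0.9, 0.6]
indirect = [0.5, 0.4, 0.6, 0.3]
print(levene(direct, indirect))        # equal spreads here, so F is close to 0
print(anova_oneway(direct, indirect))  # F close to 10.8 with 1 and 6 degrees of freedom
```

A non-significant Levene or Shapiro-Wilk result only means the assumption could not be rejected with the available data; it does not prove that the assumption holds.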
Therefore, we ran an ANOVA with these three factors on all our scores; Table 2 shows the p-values obtained for the scenario and order factors. Interaction effects between the factors, not displayed here, were not significant at α = 0.05. First, the scenario factor had no significant effect on any score. However, the order factor had a significant effect on several scores, with p-values below α = 0.05: participants significantly improved their scores in their second session, regardless of the interaction technique used. We therefore ran another ANOVA, with the interaction technique and scenario factors, on the scores obtained by the participants during their first use of our application, to check for any residual effect of the scenario factor with the order fixed to the first session. Table 3 reveals no significant effect of the scenario on these data. As a result, for a cleaner analysis of the interaction technique effect, we decided to use this 28-entry subset containing only the users' first-session results. The following figures and discussions of the desktop application results are thus based on this "between-subjects dataset". For our HMD application dataset, we also needed to check the potential effects of the order and scenario factors on our scores, by running an ANOVA with these two factors and the interaction technique factor. Before that, we checked whether our data met the required ANOVA assumptions. First, the Shapiro-Wilk test returned a p-value above α = 0.05 for every data series, so normality could not be rejected. As for the sphericity of the data, we checked it graphically by observing similar variances for each group.
Indeed, Mauchly's test can only be run when a factor has more than two levels; with only two levels, as is the case here, sphericity necessarily holds, as noted in the literature. Finally, the three observed factors were independent by nature.
Then, running this ANOVA with the three factors on our scores detected no significant effect of either the scenario or the session order, as shown in Table 4. As a result, the results shown and analysed in the following figures and discussions are based on the complete within-subject HMD application dataset.

Desktop Application Results
First, Figure 9 shows the users' scores for each individual quality factor, from left to right: rapidity, information gathering, situation awareness and ease of use. Next, Figure 10 shows the usability scores computed with the four ways described above, from left to right: the scores without weights, the scores using the weights given by the users with the simple-weighting technique, those relying on the users' cross-weighting, and finally those with the expert weighting.
From a brief visual analysis of Figure 9, participants seem to have performed better in terms of rapidity with the direct navigation interaction technique than with the indirect one. The information gathering and situation awareness scores seem to follow the same trend. Regarding ease of use, the trend seems to be the opposite, though more moderate, with better scores for the indirect technique. As for the global usability scores, Figure 10 suggests that usability was better with the direct technique for the unweighted and the simple-weighted scores. For the cross-weighted and expert-weighted scores, this trend is less clear, and usability might be similar for the two interaction techniques.

HMD Application Results
First, Figure 11 shows the distribution of the rapidity, information gathering, situation awareness, and ease of use scores, from left to right. Next, the results of our four computations of usability scores can be observed in Figure 12, from left to right: the scores without weights, the scores using the weights given by the users with the simple-weighting and the cross-weighting techniques, and finally the scores with the "expert" weighting.
From a brief visual analysis of Figure 11, participants seem to have performed better in terms of rapidity with the direct navigation interaction technique. The information gathering and situation awareness scores seem to follow the same trend. Regarding ease of use, the trend tends to be the opposite, with better scores for the indirect technique. Finally, Figure 12 suggests that usability was better with the direct technique for the unweighted scores, on the left. However, for the other (weighted) usability scores in the figure, this trend appears less pronounced than for the unweighted ones.

Statistical Analysis on Desktop Application Usability Scores and Discussion
For our desktop application dataset, Table 5 (on the top) shows the p-values obtained from the two-way ANOVA conducted on all our quality factor scores, considering the interaction technique and scenario factors, and Table 6 (on the bottom) shows the ones obtained for all our usability scores. Scenario p-values revealed no significant effect as shown previously in Table 3. Focusing then on the interaction technique factor, Table 5 (on the top) shows its related p-values for the rapidity, information gathering, situation awareness and ease of use scores, and Table 6 (on the bottom) for the four usability scores.

Table 6. p-values of the interaction technique factor on the usability scores (desktop application); * p < 0.05.
Scores | Interaction technique factor p-value
Unweighted usability | 0.048 *
Simple-weighted usability | 0.032 *
Cross-weighted usability | 0.090
"Expert" usability | 0.085

This statistical analysis confirmed the results observed previously in the figures. First, the interaction technique factor has a significant effect on the rapidity, information gathering and situation awareness scores, with p-values of 0.006, 0.002, and 0.013, respectively, at a significance level of α = 0.05. Looking at the results, the best scores for these quality factors are obtained with the direct interaction technique. Moreover, the interaction technique has no significant effect on the ease of use scores, even if a subtle trend towards better scores with the indirect technique can be noticed.
Then, this analysis shows that the interaction technique factor has a significant effect on the unweighted usability score, with a p-value of 0.048 at a significance level of α = 0.05. The same holds with the simple weighting, with a p-value of 0.032. However, with the cross-weighted usability scores, using both the weights defined by the users and the expert ones, the interaction technique has no significant effect, with p-values of 0.090 and 0.085, respectively, at α = 0.05. These last results tend to confirm our hypothesis about the effect of our task-centred approach on the interaction technique usability results. Indeed, the interpretation of the usability results for our two navigation interaction techniques differs between these weighted scores and the unweighted ones.
In total, with this desktop application, two out of three weighted computations led to an interpretation that differs from the one obtained with the unweighted computation, as expected from our hypothesis. Nonetheless, given the unexpected result of the computation based on the simple-weighting technique, future research should be done on the techniques used to define the quality factor weights. In this experiment, this difference could be explained by the fact that the cross-weighting technique, even if it may be more complex for users, led them to a weight distribution closer to the expert one than the simple weighting did. Indeed, participants were expected to attach the greatest importance to the information gathering quality factor, since by definition it is the most important quality factor for this task of exploration and identification. After rescaling the weights to a total of 12 points for both weighting techniques, the participants gave on average more importance to this quality factor with the cross-weighting technique (average: 5.20) than with the simple weighting (average: 4.16). Similarly, the lowest weights were expected for rapidity, the least important quality factor for this task; once again, our users gave it on average less importance with the cross-weighting technique (average: 1.15) than with the simple-weighting technique (average: 1.67).
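The averages above are reported after rescaling each participant's weights to a common total of 12 points. The paper does not detail how this rescaling was done, so the sketch below assumes a proportional one; under that assumption the expert weighting (0, 4, 2, 2, total 8) becomes 0, 6, 3 and 3. The function names and the participant weightings in the usage lines are hypothetical.

```python
def rescale(weights, target_total=12.0):
    """Proportionally rescale one weighting so that its weights sum to target_total (assumed method)."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("cannot rescale an all-zero weighting")
    return {factor: w * target_total / total for factor, w in weights.items()}


def average_weight(weightings, factor, target_total=12.0):
    """Mean rescaled weight of one quality factor over several participants."""
    rescaled = [rescale(w, target_total)[factor] for w in weightings]
    return sum(rescaled) / len(rescaled)


expert = {"rapidity": 0, "information_gathering": 4, "situation_awareness": 2, "ease_of_use": 2}
print(rescale(expert))
# {'rapidity': 0.0, 'information_gathering': 6.0, 'situation_awareness': 3.0, 'ease_of_use': 3.0}

# Hypothetical slider weightings (0 to 6 each) from two participants.
sliders = [
    {"rapidity": 2, "information_gathering": 5, "situation_awareness": 4, "ease_of_use": 4},
    {"rapidity": 1, "information_gathering": 6, "situation_awareness": 3, "ease_of_use": 2},
]
print(average_weight(sliders, "information_gathering"))
```

Rescaling makes the two techniques comparable even though their raw totals differ: simple-weighting totals depend on how many slider points each participant used, whereas cross-weighting totals depend on the pairwise choices.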

Statistical Analysis on HMD Application Usability Scores and Discussion
For our HMD application dataset, Table 7 (on the top) shows the p-values obtained from our three-way ANOVA on all our quality factor scores, considering the interaction technique, order and scenario factors, and Table 8 (on the bottom) shows the ones obtained for all our usability scores. Order and scenario p-values revealed no significant effect as shown previously in Table 4. Table 7 (on the top) shows the interaction technique p-values for the rapidity, information gathering, situation awareness and ease of use scores, and Table 8 (on the bottom) for the four usability scores. This statistical analysis confirmed the results observed graphically on the previous figures. First, the interaction technique factor does have a significant effect on the rapidity, information gathering, situation awareness and ease of use scores with p-values of 0.043, 0.0002, 0.0006, and 0.0009, respectively, on a level of significance of α = 0.05. Looking at the scores, the best ones are with the direct interaction technique for the rapidity, information gathering, and situation awareness quality factors, whereas for the ease of use the best scores are with the indirect interaction technique.
Then, the analysis shows that the interaction technique factor has a significant effect on the unweighted usability score, with a p-value of 0.048 at a significance level of α = 0.05. However, with all three weighted scores proposed here (the users' simple weighting, the users' cross-weighting, and the expert weighting), the interaction technique factor has no significant effect on the usability scores, with p-values of 0.200, 0.638, and 0.114, respectively, at α = 0.05. This means that, with the HMD application, all three weighted techniques in our user study led to a different interpretation of the evaluated interaction techniques than the unweighted evaluation. These results confirm our hypothesis about the effect of our task-centred approach on the interaction technique usability results and their interpretation, and about its difference from a non-task-centred approach.
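The counts of weighted computations whose conclusion differs from the unweighted one (two of three on desktop, three of three on HMD, five of six overall) can be checked mechanically by comparing each reported p-value with α = 0.05. The sketch below only re-encodes the p-values quoted in the text for the interaction technique factor; the dictionary keys and function name are illustrative. Note that it treats a switch from significant to non-significant as a changed verdict, which is not the same as evidence that the two techniques are equivalent.

```python
ALPHA = 0.05

# p-values of the interaction technique factor on the usability scores, as reported in the text.
REPORTED_P = {
    "desktop": {"unweighted": 0.048, "simple": 0.032, "cross": 0.090, "expert": 0.085},
    "hmd": {"unweighted": 0.048, "simple": 0.200, "cross": 0.638, "expert": 0.114},
}


def count_changed_verdicts(p_values, alpha=ALPHA):
    """Count weighted computations whose significance verdict differs from the unweighted one."""
    changed, total = 0, 0
    for modality in p_values.values():
        unweighted_significant = modality["unweighted"] < alpha
        for name, p in modality.items():
            if name == "unweighted":
                continue
            total += 1
            if (p < alpha) != unweighted_significant:
                changed += 1
    return changed, total


print(count_changed_verdicts({"desktop": REPORTED_P["desktop"]}))  # (2, 3)
print(count_changed_verdicts({"hmd": REPORTED_P["hmd"]}))          # (3, 3)
print(count_changed_verdicts(REPORTED_P))                          # (5, 6)
```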

Conclusions and Future Work
Virtual reality simulations are paramount in the numerical engineering workflow, before physical actions are carried out in reality, such as on a worksite in the AEC industry. In this context, we presented in this paper our task-centred methodology to design and evaluate virtual reality user interactions. To improve this numerical workflow, we created this methodology to guide and facilitate the design of VR applications by VR end users. Our methodology provides them with a formalised approach to the design of VR user interactions, which they can follow without VR expertise. Through a guided analysis of a user task and its characteristics, a professional in the field to which this task belongs can obtain proposals of VR user interaction designs from our semi-automated system in very few steps. Regarding the evaluation of these VR user interaction designs, our task-centred approach formalises the users' needs, which in turn improves the usability of the VR user interaction designs. Moreover, this formalisation should also help VR end users reduce the number of iterations required when creating a VR application for their specific purposes.
We then showed in this paper how we applied our task-centred methodology to the case study of the hazard identification task: construction of a model of the task, determination of proposals for the design of VR user interactions, and user evaluation. For the worksite exploration subtask, we obtained two possible designs of navigation interaction techniques. These two interaction techniques were then prototyped, evaluated and compared in terms of usability in our evaluation step, which mainly relies on applying a task-related weight to each quality factor used in the evaluation. In this study, we compared the usability results and interpretations of our task-centred evaluation with weights to those obtained without such weights, i.e., with all quality factors at the same level of importance. Our hypothesis was that our task-centred evaluation based on weights would give usability results that could be interpreted in a significantly different way from a non-task-centred evaluation without any task-related weights.
To run this study and its user evaluation, we built VR applications with our two navigation interaction techniques. Due to the public health crisis, we conducted this experiment remotely, and our users participated through either a desktop application or an HMD application. We took different measures for each quality factor related to this navigation interaction, and then computed both weighted usability scores (with three different formulas) and unweighted ones. Across our desktop and HMD modalities, five out of six weighted usability scores computed here lead to an interpretation of the results that differs from the one obtained with unweighted usability scores. Indeed, our unweighted scores show a significant difference in usability between our two navigation interaction techniques, whereas these five weighted scores reveal no significant difference. These results thus confirm our hypothesis about the difference between task-centred and non-task-centred evaluations of VR user interactions. This supports the importance of following an approach that takes into account the expertise of professionals in the field to which a task belongs when creating a VR application for this task, and therefore validates our task-centred approach.
As future work, it would first be interesting to evaluate, in a user study, the different ways we used to obtain the quality factor weights, since one of the six weighted computations (the simple weighting on the desktop application) led to different results from the other five. A dedicated experiment could be conducted on the weighting procedures themselves (simple and cross-weighting), using, for example, criteria of usability and of understanding of how to attribute weights. This would be an important verification to determine which weighting technique should preferably be used in a task-centred evaluation. Then, our results about our hypothesis should be confirmed by running similar experiments on other case studies with different tasks, interaction techniques and quality factors. Additionally, these other experiments could provide the data to train a machine learning model that automatically attributes the quality factor weights according to the user task. Finally, for on-site applications, which rely on augmented reality rather than virtual reality, our task-centred methodology could be adapted to augmented reality interactions, and studies could be conducted applying this modified methodology.

Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are openly available at https://github.com/pierreraimbaud/webp/tree/master/data (accessed on 30 April 2021).