3. Methodology and Experiment Setup
This section describes six experiments that were conducted to evaluate the usability of the VR system using the evaluation framework originally introduced in [27]. The framework specifies a five-step procedure for conducting VR usability tests in product development contexts:
First, the objective of the experiment and the tasks to be performed must be defined, such as conducting ergonomic evaluations or design reviews.
Second, the designated circumstances for the target group must be specified. In this step, the characteristics of the participants are defined according to factors such as level of experience, age, gender, familiarity with digital tools, professional background, number of participants, test location, and the hardware used.
Third, the specific circumstances of the actual test participants are recorded. These are later compared with the predefined target conditions in order to identify and quantify any deviations.
Fourth, the usability test is executed. During this phase, two inspectors per experiment complete the inspection questionnaire (objectively measurable criteria), while the immersed participants complete the empirical questionnaire (criteria based on subjective perception).
Finally, the responses of the participants are evaluated by analyzing the seven usability factors together with the results of the TLX and the SUS.
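The comparison in the third step can be illustrated with a minimal sketch. It assumes that the target and recorded conditions are stored as simple key-value records. The function name `find_deviations`, the field names, and all values below are hypothetical and are not taken from the framework in [27].

```python
def find_deviations(target: dict, recorded: dict) -> dict:
    """Return {factor: (target_value, recorded_value)} for every factor that differs."""
    deviations = {}
    for factor, expected in target.items():
        actual = recorded.get(factor)  # None if the factor was not recorded
        if actual != expected:
            deviations[factor] = (expected, actual)
    return deviations


# Hypothetical target and recorded conditions (illustrative values only).
target = {"group": "U2", "participants": 10, "hardware": "H2", "location": "lab"}
recorded = {"group": "U2", "participants": 8, "hardware": "H2", "location": "lab"}

deviations = find_deviations(target, recorded)
share_deviating = len(deviations) / len(target)  # fraction of factors that deviate
print(deviations, share_deviating)
```

In this sketch a deviation is any factor whose recorded value differs from its target value; a study could weight or grade deviations differently.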
Across the six experiments, several common experimental conditions were maintained. A single VR software application was used throughout the project; however, it was continuously improved through a series of updated versions, with each updated version being implemented and evaluated in a subsequent experiment. Although the software evolved across experiments, each experiment was conducted with a distinct group of participants, and no participant was involved in more than one experiment. Consequently, individual learning effects due to repeated exposure can be excluded. However, given the exploratory multi-case design of this study, observed differences across experiments reflect the combined impact of system configurations, user backgrounds, and contextual tasks, rather than a single isolated factor.
The proportion of discovered usability problems in each experiment was calculated using Equation (2), which estimates the number of problems P identified by the participating users. The reported percentages represent theoretical estimates of expected usability-problem discovery coverage based on the Nielsen–Landauer model and an assumed problem-discovery probability of p = 0.31. These values do not represent empirically verified proportions of all existing usability problems.
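As a rough illustration of this coverage estimate, the following sketch assumes that Equation (2) has the standard Nielsen–Landauer form P = N(1 − (1 − p)^n), so that the expected proportion of discovered problems is 1 − (1 − p)^n for n users. The function name `expected_coverage` is hypothetical; p = 0.31 is the assumed value stated above, and the group sizes are the participant counts reported for the experiments.

```python
def expected_coverage(n_users: int, p: float = 0.31) -> float:
    """Expected share of usability problems found by n_users.

    Assumes the standard Nielsen-Landauer form 1 - (1 - p)^n.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** n_users


# Participant counts reported for the six experiments.
for n in (5, 10, 23, 40):
    print(f"n = {n:2d}: expected coverage = {expected_coverage(n):.1%}")
```

Because the estimate depends only on n and the assumed p, groups of the same size receive the same theoretical coverage regardless of the task or system version.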
3.1. Hardware Configuration
Two types of hardware were used in the experiments: a standalone headset (H1) and a PC-based headset (H2). Table 4 provides a comparison of the two hardware systems.
3.2. Participant Framework
Target participants for this study were divided into three groups:
U1: bachelor’s students in mechanical engineering without practical experience.
U2: junior engineers with early industry experience.
U3: senior engineers from an industrial development team.
To ensure methodological rigor and transparency, the following details outline participant demographics, recruitment procedures, eligibility criteria, and experimental standardization:
Participant Demographics and Data Privacy: In compliance with institutional data privacy protocols and ethical guidelines to maximize participant anonymity, specific individual demographic characteristics (such as exact age and gender) were not recorded, as they fell outside the primary scope of this usability evaluation. Instead, the data collection focused strictly on professional and technical backgrounds, specifically digital tool and VR experience, which are reported per cohort in Table 5.
Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki and the ethical guidelines of Ostfalia University of Applied Sciences. The study was non-invasive and used anonymized data, so formal ethics board approval was waived. According to Ostfalia University of Applied Sciences’ regulations, formal review by the Ethics Committee was not required in this case, as the research did not affect a person’s physical, mental, social or legal integrity.
Recruitment Procedures: Participants were recruited via purposive sampling across three distinct streams: undergraduate and graduate engineering students at our university, professional design and engineering teams from the primary industrial partner, and researchers from an international partner consortium. All participation was entirely voluntary and conducted under informed consent.
Inclusion and Exclusion Criteria: The primary inclusion criterion required participants to have a documented background in engineering, design, or product development relevant to the specific evaluation contexts. Conversely, the explicit exclusion criterion was a known predisposition to severe motion sickness, ensuring participant safety during immersion.
Task Standardization, Familiarization, and Duration: To mitigate variance in baseline VR literacy, a standardized, self-paced familiarization phase was implemented prior to testing. Participants were granted unconstrained time to adapt to the hardware interface without external time pressure, ensuring they achieved operational mastery before commencing the formal evaluation. Consequently, task durations varied dynamically based on individual user pacing rather than rigid time limits.
Due to the small and uneven sample sizes across the experimental groups, the comparisons between test groups are presented descriptively and should be interpreted as indicative rather than statistically conclusive. The comparison between students, junior engineers, and senior engineers was intentionally designed to reflect the different experience levels present in the industrial context. These groups represent the main stakeholder categories within the company, including working students, early-career engineers, highly experienced professionals, and internationally distributed teams. The purpose of this grouping was not to establish statistically significant differences, but rather to explore how users with varying levels of expertise interact with and respond to the introduction of new technologies such as VR. This approach provides practical insights into the usability and potential adoption of the system across the spectrum of potential users within the organization.
3.3. Tasks
The tasks performed in the experiments involved testing various VR application scenarios, such as ergonomic evaluations and design reviews. Each experiment aimed to optimize a specific phase or task within the product development process through the use of VR. Beyond that, the tasks were designed as domain-specific instances of comparable design and engineering activities rather than fundamentally different tasks. Each participant group performed tasks that reflect their real-world professional or educational context, thereby ensuring ecological validity. Specifically, students in the ergonomics design program carried out tasks aligned with their coursework, with which they were already familiar through prior use of CAD tools. Likewise, participants from the rail manufacturing sector engaged in cable routing tasks corresponding to their routine work, while professionals from the cutting machine industry performed tasks derived from their operational processes. In all cases, these tasks had previously been conducted using CAD systems and were adapted for execution within the VR environment investigated in this study. Despite differences in application domains, the underlying interaction principles, workflows, and evaluation framework were kept consistent across all experiments. Time limits to complete the tasks were applied only to the participants in Experiment VI, as they were bachelor’s students and the testing window was strictly restricted to their scheduled lecture time.
3.4. Experiments
In this section, each experiment and its setup are reviewed in order to clarify the respective objectives and the tested scenarios. Furthermore, the section explains how these objectives are achieved through planned tasks derived from product development activities and executed within the VR environment. All experiments followed a similar overall procedure consisting of participant introduction, system familiarization, task execution, and post-task evaluation. However, the duration and depth of the introduction and familiarization phases varied depending on the participants’ prior experience and the specific experimental context.
Participant background information, including prior experience with digital tools and VR systems, was collected prior to each experiment using a 5-point Likert scale, where 1 indicates very low experience and 5 indicates very high experience. The results are summarized in Table 5. For each group, the average experience score was calculated using a weighted mean approach. Specifically, each scale value was multiplied by the number of corresponding responses, and the sum of these products was divided by the total number of responses. This method provides a representative mean score for each group, allowing comparison across experiments.
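The weighted mean described above can be sketched as follows. The function name `weighted_likert_mean` and the response counts are hypothetical and serve only to illustrate the calculation; they are not the values reported in Table 5.

```python
from typing import Mapping


def weighted_likert_mean(counts: Mapping[int, int]) -> float:
    """Weighted mean of Likert responses.

    counts maps each scale value (1-5) to the number of responses with that value.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one response is required")
    # Multiply each scale value by its response count, then divide by all responses.
    return sum(value * n for value, n in counts.items()) / total


# Hypothetical response counts for one group (not taken from Table 5).
example_counts = {1: 2, 2: 3, 3: 4, 4: 1, 5: 0}
print(weighted_likert_mean(example_counts))  # (2 + 6 + 12 + 4 + 0) / 10 = 2.4
```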
Participant experience with digital and VR tools was recorded to characterize the background and contextual conditions of the participant groups rather than to serve as a control variable in the usability analysis.
The purpose of collecting these data was to verify that the participant groups were reasonably comparable and representative of typical user profiles. If substantial deviations between expected and recorded experience levels had been observed, exclusion of the respective group would have been considered. However, the recorded values indicated that all groups exhibited experience levels within an acceptable and comparable range. Therefore, no group was excluded, and the variability in user experience was retained. This approach supports the objective of evaluating system usability under realistic conditions, where users may have heterogeneous backgrounds and varying levels of familiarity with digital and VR tools.
For experiments integrated into academic coursework, participation in the usability evaluation and questionnaire components remained voluntary and did not influence course grading. Students were informed that they could withdraw from the study at any time without academic disadvantage. Ethical review procedures were conducted in accordance with institutional requirements, and no personally identifiable participant data were collected.
The objective of experiment I was to conduct a design review of a timekeeper, a cyber-physical device used for measuring disruption times, with dimensions of 10 cm × 10 cm (Figure 2). The purpose of the design review was to verify the design accuracy, support quality assurance, and improve traceability for future development iterations within the product development process.
To achieve this objective, participants performed several review tasks in the virtual environment. These tasks included dimensional measurements, geometric validation, assessment of surface characteristics and side counts, and systematic documentation of identified findings using a digital checklist workflow integrated into the VR system.
The experiment was conducted using the first version of the developed VR software and involved 23 participants divided into two groups based on their institutional affiliation (Ostfalia University of Applied Sciences, Germany, and Tshwane University of Technology, South Africa). Both groups used a PC-connected VR headset (H2).
Before starting the experiment, participants received a short introduction to the VR system and its interaction functionalities. They were then given a brief familiarization phase to practice navigation and object interaction in the virtual environment. After this introduction, participants performed the assigned design review tasks individually.
During the experiment, participants used the VR system to inspect the virtual model of the timekeeper and identify potential design issues according to the defined checklist. The identified problems were recorded and later analyzed to determine the number of usability or design-related issues detected by the participants.
The objective of experiment II was to evaluate the ergonomic and functional design of a train interior (Figure 3) using the VR system. The primary goal of the design review was to identify potential improvements related to passenger comfort, accessibility, and spatial efficiency in future train cabin configurations.
To achieve this objective, participants interacted with a virtual model of the train interior and performed several evaluation tasks designed to simulate typical passenger activities and spatial interactions. The evaluation focused on key aspects of interior usability, including passenger comfort and available movement space, compatibility of storage areas with passengers’ personal belongings, clarity of user orientation within the cabin, aesthetic perception of the environment, as well as safety and accessibility during passenger movement.
The experiment was conducted using an updated version of the VR software application, in which the usability issues identified during experiment I had been addressed and corrected. The improved software version aimed to enhance interaction efficiency, navigation stability, and measurement accuracy. The VR environment was operated using the PC-based headset (H2).
A total of 10 participants took part in the experiment. All participants were bachelor’s students enrolled in the Ergonomics and Industrial Design course, representing users with theoretical knowledge of ergonomic principles but limited professional experience.
Before starting the experiment, participants received a short introduction to the VR system and were given time to familiarize themselves with the navigation and interaction methods. After this familiarization phase, each participant individually explored the virtual train interior and performed the assigned evaluation tasks.
During the session, participants inspected the spatial configuration of the interior, assessed comfort and movement possibilities, and identified potential design limitations. Observations and identified issues were documented using the integrated digital checklist system within the VR environment. The collected data were later analyzed to evaluate the usability of the VR-based ergonomic assessment approach.
The objective of experiment III was to conduct a collaborative design review of a cutting machine (Figure 4) using the multi-user functionality of the developed VR system. The purpose of this test was to evaluate how effectively development teams could perform design reviews within separate teams.
The review session was conducted as an online multi-user VR meeting, where several participants were connected simultaneously to the same virtual session. This setup allowed participants to collaboratively inspect the machine model, discuss design aspects in real time, and identify potential design improvements.
The evaluation focused on several important aspects related to the machine’s operational and ergonomic performance. These included the analysis of material flow and operator ergonomics, the inspection of safety mechanisms and guarding elements, the accessibility of maintenance components, and the evaluation of loading and operational procedures during machine use.
The experiment utilized a further updated version of the VR software, building upon the improvements implemented in experiment II. This version additionally incorporated multi-user communication and synchronized interaction features, enabling real-time collaboration between participants. The system was operated using the PC-based headset (H2).
A total of five participants took part in the experiment. All participants were engineers from a product development team in a company specializing in freezer-cutting machines. Their professional background provided practical industry experience relevant to machine design, safety evaluation, and production processes.
Before the collaborative review began, participants received a short briefing on the VR system and the multi-user interaction features. Once connected to the virtual environment, participants explored the cutting machine model, discussed design elements, and identified potential issues related to ergonomics, safety, and operational efficiency.
Throughout the session, identified findings were documented and later analyzed to assess the effectiveness of the VR-based collaborative design review process within an industrial development context.
The objective of experiment IV was to evaluate the routing of cables on the roof of a regional train using the developed VR system. The purpose of this assessment was to analyze the cable layout and identify potential spatial limitations while ensuring compliance with technical requirements such as minimum bending radii and safe separation distances between cables and surrounding components.
To achieve this objective, participants interacted with a virtual model of the train roof assembly containing the cable routing configuration. The evaluation focused on several key aspects of the installation and maintenance process. These included verifying that the cables could be installed smoothly without physical obstructions, confirming that assembly and maintenance procedures could be performed practically, preventing overcrowding within the cable routing paths, and ensuring compliance with relevant safety and technical standards.
The experiment was conducted using a further developed version of the VR software, which incorporated additional improvements based on feedback and observations from the previous experiments. These improvements primarily focused on enhancing interaction stability, learnability, and the visualization of complex assemblies. The VR environment was operated using the PC-based headset (H2).
A total of five participants took part in the experiment. All participants were engineers from a product development team in a company operating in the railway industry, providing practical experience in train system design and technical installation processes.
Before starting the experiment, participants received a brief introduction to the VR system. However, they did not engage in any prior practice interaction because the available time for the employees was insufficient. After the introduction, participants individually inspected the cable routing configuration within the virtual environment. During the inspection, they assessed spatial feasibility, installation accessibility, and potential design issues related to safety and maintenance. Identified findings were documented and later analyzed to evaluate the effectiveness of the VR-based cable routing assessment approach.
The objective of experiment V was to investigate how VR tools can support and enhance design evaluation processes, particularly for assessing ergonomic and spatial characteristics of a train interior (Figure 5). Specifically, the aim was to determine how effectively VR could be used to identify design issues related to accessibility, passenger interaction, and spatial configuration.
Participants interacted with a detailed virtual representation of the train interior and performed several evaluation tasks designed to simulate realistic user interactions within the cabin environment. The evaluation focused on multiple aspects of the design, including the visual inspection of internal components and their accessibility, passenger movement and safety considerations, accurate dimensional analysis of the interior space, inspection of internal structural elements, and assessment of storage requirements and passenger behavior patterns.
The experiment utilized a modified version of the VR software developed in the previous experiment, incorporating additional improvements to visualization and interaction functions. Unlike the previous experiments, the system was operated using a standalone VR headset (H1), enabling a more flexible and portable VR setup.
A total of 40 participants took part in the study. The participants were junior engineers with limited industry experience, yet they possessed a solid background and familiarity with engineering design concepts.
Before starting the experiment, participants received a short introduction to the VR system and were allowed time to familiarize themselves with the navigation and interaction mechanisms. After this training phase, each participant individually explored the virtual train interior and completed the assigned evaluation tasks.
During the experiment, participants inspected the design from different viewpoints, assessed ergonomic aspects of the interior layout, and identified potential design improvements. Observations and identified issues were recorded using the integrated documentation tools of the VR system.
The objective of experiment VI was similar to that of experiment V, focusing on the evaluation of ergonomic and spatial aspects of a train interior using the VR system (Figure 6). However, in this case the tasks were designed to be more complex and rigorous, providing a deeper assessment of the participants’ ability to identify design issues within the virtual environment.
Participants were required to perform the same evaluation activities as in the previous experiment, including visual inspection of internal components, assessment of passenger movement and safety, dimensional analysis, structural inspection, and evaluation of storage requirements. The increased level of difficulty resulted from the fact that the experiment was conducted as part of a graded academic course requirement, requiring participants to perform a more detailed and systematic evaluation.
The experiment was conducted using a new version of the VR software, which incorporated additional refinements to improve usability and interaction performance. The system was again operated using the standalone VR headset (H1).
A total of ten participants took part in the experiment. The participants were bachelor’s students enrolled in the Ergonomics and Industrial Design course, representing users with foundational knowledge of ergonomic analysis and product evaluation.
Before beginning the evaluation tasks, participants received an introduction to the VR system and completed a short familiarization session. They then individually performed the assigned tasks within the virtual train interior model.
During the experiment, participants analyzed the design in detail and documented any detected issues related to ergonomics, accessibility, spatial arrangement, or structural design.
An overview of the experiments, the tested scenarios, and the objectives of the evaluations is presented in Table 6.
4. Analysis of the Participants’ Responses
In this section, all experiments and their results are analyzed and discussed in detail. The objective of this section is to understand how each usability factor affects the overall acceptance of the system. In this analysis, the usability factors are the usability dimensions, SUS and TLX. To evaluate the system usability across the heterogeneous experimental configurations, usability metrics were synthesized using descriptive statistics, specifically calculating the arithmetic mean and standard deviation (σ) for each experiment. The calculated means provide an indication of central tendency for usability values within each experiment, while the standard deviations serve as essential indicators of descriptive uncertainty and intra-experiment variance. The complete distribution of these metrics across all experimental cohorts is documented in Table 7, establishing a transparent basis for context-specific observations without implying generalizable causal rankings.
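The descriptive statistics named above can be sketched as follows. The text does not state whether the sample or the population standard deviation was used, so the sketch reports both as an assumption. The function name `describe` and the example scores are hypothetical and are not taken from Table 7.

```python
import statistics
from typing import Sequence, Tuple


def describe(scores: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, sample SD, population SD) for one experiment's usability values."""
    if len(scores) < 2:
        raise ValueError("at least two values are needed for a sample SD")
    mean = statistics.fmean(scores)
    sample_sd = statistics.stdev(scores)       # divides by n - 1
    population_sd = statistics.pstdev(scores)  # divides by n
    return mean, sample_sd, population_sd


# Hypothetical usability values for one experiment (illustrative only).
example_scores = [40.0, 50.0, 60.0, 70.0]
mean, sample_sd, population_sd = describe(example_scores)
print(f"M = {mean:.1f}, sample SD = {sample_sd:.1f}, population SD = {population_sd:.1f}")
```

For small groups, such as those with five participants, the two standard deviation variants differ noticeably, so the choice of variant matters when comparing dispersion across experiments.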
The first experiment investigated the usability of the VR system in a cross-cultural context, involving junior engineers from two countries. The objective of this evaluation was to examine how cultural differences between teams influence the usability of the VR system for both inspectors and actively involved users. A total of 23 engineers participated in the usability test, conducted in collaboration with Ostfalia University of Applied Sciences (Germany) and Tshwane University of Technology (South Africa).
The calculated usability degree exhibited substantial variability (M = 51.7, SD = 19.8), indicating considerable dispersion in the observed usability outcomes. The learnability dimension received a notably low score, pointing to challenges in system understanding. The mean SUS-item ratings were comparable between universities, averaging 3.17 and 3.05, suggesting a consistent perception of usability across cultural groups. These findings indicate that, although the system was generally usable, users experienced difficulties in quickly understanding and learning how to interact with the software effectively.
During the inspection, both participant groups provided largely consistent responses, indicating a high degree of objectivity in the evaluation process. The analysis confirmed that the VR system provided the core functionalities required to perform the assigned design review task, including object grouping, model scaling for inspection, and the selection of individual components within the virtual environment.
Despite this, the inspection phase also revealed several limitations. Specifically, the lack of supporting features such as error feedback and recovery instructions was identified as an area requiring improvement.
In contrast to the inspection results, the empirical survey revealed noticeable differences between the two participant groups. User responses showed variations in user interactions, particularly regarding task completeness. While participants from one group generally considered the available functions sufficient for the assigned tasks, participants from the other group indicated the need for additional features. These differences may be partially attributed to variations in prior experience with VR technologies, which in turn influence user expectations and evaluation criteria.
The TLX results further highlighted differences in user opinions, particularly with regard to overall satisfaction. While some participants reported satisfaction with their performance, others expressed lower levels of satisfaction due to challenges in system interaction.
In summary, the observations of experiment I indicate that the VR system used provides the essential functionality required for design review tasks and is generally perceived as usable across different cultural contexts. However, limitations related to system learnability, task appropriateness, and self-description of the software were identified. These findings highlight the need for improvements in interface design and user support mechanisms to enhance usability and ensure a more consistent user experience across various user groups.
The second usability experiment focused on how users interacted with the VR system during their first encounter with a set of predefined, product-related tasks. The main aim was to examine the initial user experience, with particular emphasis on usability, interaction behavior, and perceived workload, rather than on task efficiency or complete functional performance. The study involved students enrolled in an ergonomics course who completed structured tasks in a virtual train model. These tasks included object inspection, taking measurements, navigating the environment, and using basic interaction functions.
Overall, the usability evaluation indicated a higher level of usability than in the first experiment, with an average usability degree of 62.3%. The values nevertheless varied considerably (M = 62.3, SD = 20.3), indicating a wide spread in the observed usability values across this experimental setting. The mean SUS-item rating was 3.18 on a five-point scale, suggesting a generally acceptable usability perception among participants. While users were able to complete the assigned tasks successfully, several usability aspects revealed opportunities for improvement, particularly regarding system learnability and the clarity of certain interaction mechanisms.
Analysis of the responses showed that several core interaction functions were clearly recognized by the participants. Object selection and manipulation were identified as available and simple, suggesting that the software supports the required interaction tasks. In addition, users perceived the system response positively because they experienced immediate feedback following their actions. However, other functional aspects revealed noticeable uncertainty among users. Features such as error handling or object grouping were not clearly recognized by many participants, and a number of users reported difficulties in evaluating these features. This suggests that such features were either not sufficiently visible within the interface or were not required during the assigned tasks. Similarly, system status information, such as battery status, was not noticed, indicating limitations in interface transparency.
The evaluation of usability dimensions based on established principles revealed mixed results across all categories. Task suitability was generally perceived positively, particularly with regard to the availability of relevant functions for completing the assigned tasks. However, supporting elements such as error messages or contextual help were considered less effective, indicating a need for improved user guidance during task execution. In terms of expectation conformity, the interface design was largely perceived as unintuitive. Menu structures and visual elements did not align with user expectations, and several graphical representations were difficult to understand. Learnability emerged as one of the weaker aspects of the system. Many participants experienced difficulties identifying features related to system guidance or preview functions. Visual orientation within the interface was not consistently clear, indicating that first-time users may require additional instructional support or guided interaction mechanisms. Similarly, error tolerance and system controllability were not clearly perceived by users. Functions such as ‘undo’ or alternative input methods were not widely recognized, suggesting that these features were either insufficiently communicated or not encountered during the experimental tasks.
Despite these limitations, several usability aspects received positive feedback. The system was generally perceived as self-descriptive, with users reporting a clear sense of control during interaction and an adequate understanding of icons. Furthermore, user engagement was notably high, as participants expressed a positive initial impression of the software.
The results of the NASA TLX indicated a manageable level of workload during task execution. Participants described the tasks as moderately demanding, primarily due to the novelty of the VR environment and unfamiliar interaction techniques. Physical workload was perceived as low, and time pressure was considered appropriate for the experimental setup. Emotional responses varied, with some users reporting satisfaction and a sense of accomplishment, while others experienced temporary uncertainty, particularly when interacting with unfamiliar system features.
In summary, the results of experiment II indicate that the VR system provides a generally positive first user experience with moderate usability and manageable workload. Core interaction functions performed effectively and supported task completion. However, several usability aspects, including learnability, expectation conformity, and error tolerance, require further optimization to improve overall usability and reduce uncertainty for users with limited practical experience.
The third experiment investigated the usability of the VR system within a real industrial context, focusing on a multi-user design review of a cutting machine. The evaluation was conducted with experienced engineers and emphasized collaborative interaction, technical inspection, and ergonomic assessment (for example, reachability) within a virtual environment. The experiment targeted the usability of the VR system from the perspective of active users only.
Overall, the findings suggest that the VR system is generally usable. The calculated usability degrees were comparatively consistent (M = 50.3, SD = 7.1), indicating low dispersion. The mean SUS-item rating was 2.77, a moderate score. The system was perceived as relatively simple to operate, with users indicating that most functions could be learned quickly. At the same time, the willingness to use the system regularly was rated lower, suggesting that further improvements are required to achieve long-term adoption.
The empirical evaluation identified several missing or insufficiently implemented software features, particularly in the areas of learnability and error tolerance. Key shortcomings included the absence of flexible error message handling, lack of visible system status indicators (e.g., controller status), missing diagnostic tools, and the absence of ‘undo’ functionality. In addition, the software did not provide clear previews of actions or sufficient visual cues to indicate menu hierarchy levels. These limitations negatively affect the transparency of the system and increase the cognitive effort required for task execution.
The empirical questionnaire results were limited by the small number of responses. Although five participants took part in the experiment, only four completed the survey. The usability metrics were therefore calculated, in accordance with the defined methodology, from these four valid responses; the fifth participant’s answers were taken into account where possible, based on the questions they answered. Nevertheless, a positive tendency in self-descriptiveness and user engagement was observed, indicating that participants recognized the potential value of the VR system for collaborative engineering tasks. In addition, the CEO of the company stated clearly that they are planning to implement VR in their design-review process with clients, because it provides more clarity and allows clients to familiarize themselves with the machine, especially those who may not be able to understand CAD designs.
The NASA TLX results indicate that the perceived workload during task execution was generally low to moderate. Task complexity was rated between simple and moderately complex, primarily due to limited prior experience with VR systems and insufficient preparation. Physical workload was perceived as low, confirming that interaction with the VR system did not impose significant physical strain. Time pressure was not considered an issue by any participant, with the task pace described as appropriate or even slow.
User satisfaction and perceived performance varied among participants. While some users reported that tasks were easy and understandable, one participant found them more challenging and indicated that their performance could be improved with additional training. Overall effort was rated as low, although one participant reported difficulty in reaching their desired performance level. Emotional responses were mostly positive, with participants generally feeling relaxed; however, minor stress and frustration were reported in relation to technical issues such as audio communication problems and occasional system instability during the multi-user session.
Despite these limitations, participants demonstrated active engagement with the VR system and were able to complete the assigned collaborative tasks. The multi-user functionality enabled effective communication and joint inspection of the virtual model, highlighting the potential of VR for distributed design reviews. At the same time, the identified usability issues, such as system feedback, learnability, and technical reliability, indicate that further refinement is necessary to ensure consistent performance in professional environments.
In summary, the observed pattern in experiment III suggests that the VR system is functionally applicable and positively perceived in an industrial multi-user design review scenario. However, improvements in system robustness, user guidance, and feature transparency are required to enhance usability. These aspects will be taken into account in the next version of the software.
The fourth experiment evaluated the application of the VR system within another real industrial design review scenario, focusing on the analysis of cable routing on the roof of a regional train. The objective of this evaluation was to assess system usability in a professional engineering context, with special focus on interaction quality, task support, and user acceptance in comparison to traditional CAD software (in this case CATIA V5). The targeted outcomes of this evaluation involve a detailed analysis of the seven usability dimensions, as the experiment was conducted with an experienced industrial team.
Overall, the results indicate a moderate level of usability. The usability degree exhibited substantial variability (M = 54.6, SD = 19.5), indicating a broad distribution of the observed usability outcomes. The mean SUS-item rating yielded a value of 2.9 on a five-point scale, reflecting a rather critical perception of the system among professional users. Although participants were able to complete the assigned tasks, the results highlight several usability limitations that negatively affected efficiency, intuitiveness, and overall acceptance.
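For readers who wish to relate the mean SUS-item ratings reported here to the conventional 0 to 100 SUS score, the following Python sketch shows both calculations. It assumes the standard ten-item SUS with five-point responses, in which odd-numbered items are positively worded and even-numbered items are negatively worded. How the mean SUS-item rating in this study was derived is not stated in this section; reverse-coding the even-numbered items before averaging is an assumption, and the function names and example responses are hypothetical.

```python
from typing import Sequence


def _validate(responses: Sequence[int]) -> None:
    """Check that there are ten responses, each on the 1..5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("each response must lie between 1 and 5")


def sus_score(responses: Sequence[int]) -> float:
    """Conventional SUS score on a 0..100 scale for one participant."""
    _validate(responses)
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # positively worded item
        else:
            total += 5 - rating   # negatively worded item
    return total * 2.5


def mean_item_rating(responses: Sequence[int]) -> float:
    """Mean rating on the 1..5 scale after reverse-coding even items.

    Assumption: this is one plausible reading of a 'mean SUS-item rating';
    the study's exact procedure is not given in this section.
    """
    _validate(responses)
    aligned = [
        rating if item_number % 2 == 1 else 6 - rating
        for item_number, rating in enumerate(responses, start=1)
    ]
    return sum(aligned) / len(aligned)


# Hypothetical participant who answers every item with the neutral midpoint.
neutral = [3] * 10
print(sus_score(neutral), mean_item_rating(neutral))
```

Under these assumptions, a mean item rating close to 3 corresponds to a conventional SUS score close to 50, which may help to place the reported ratings in context.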
From a functional perspective, the system showed strong capabilities in visualization and object interaction. However, ideal precision was not achievable with the available headset at the time. The participants were development engineers who conducted a design review in VR, following their usual review practices. Their feedback was strongly influenced by comparisons between VR and the CAD software they typically used. Many participants initially resisted the VR technology, citing the need for additional training before adoption. One notable comment from the team leader was:
“If I have to invest more money and time to prepare the workforce and adapt the process to implement a new software that only supports one phase of the process, while I can already perform all tasks with the current software, then I do not need it.”
Participants evaluated system responsiveness and the ability to manipulate and inspect complex geometries within the virtual environment negatively. Users reported insufficient flexibility in object selection and difficulties related to controller input, which reduced interaction efficiency.
The analysis of responses revealed deficiencies in system self-descriptiveness. While basic feedback mechanisms were present, important system states such as controller status or active interaction modes were not continuously visible. This lack of transparency led to uncertainty during task execution, particularly when switching between different tools or interaction modes. In addition, inconsistencies in interaction logic were identified, as users were required to manually deactivate functions before activating new ones, which does not align with typical user expectations.
Evaluation of usability dimensions showed a mixed performance across categories. Task suitability was generally rated as adequate, as the system provided the core functions required for the design review tasks. However, the lack of supporting features, such as advanced measurement tools and precise representation of cable radii, limited the effectiveness of the system for detailed engineering analysis. This technical limitation had a direct negative effect on user trust and perceived reliability of the VR model.
Expectation conformity was only partially fulfilled. While some interface elements, such as menu structures and visual design, were considered understandable, the overall interaction concept was perceived as non-intuitive. Participants indicated that additional training would be required before the system could be effectively integrated into existing workflows.
The learnability of the system was identified as a critical weakness. Although some visual cues, such as color coding and icons, supported user orientation, these were not sufficiently clear or consistent. The absence of preview functions and limited guidance mechanisms made it difficult for users to anticipate the outcome of actions, increasing cognitive effort during task execution.
Similarly, error tolerance and controllability were limited. The system lacked essential features such as undo functionality and diagnostic feedback, restricting users’ ability to recover from mistakes. While most participants were eventually able to perform the required interactions, the process was often inefficient and required additional effort.
User satisfaction results reflect these usability challenges. While participants acknowledged the high potential of VR for immersive visualization and collaborative design reviews, they also emphasized that the system is currently less efficient than conventional CAD tools. Resistance to adoption was observed, particularly from a managerial perspective, where the additional effort required for training and process integration was perceived as a barrier.
The NASA TLX results indicate a moderate workload. Physical demand was generally low, confirming that VR interaction does not impose notable physical strain. However, cognitive load and frustration levels were elevated in some cases, mainly due to interaction difficulties and system limitations. Time pressure was not considered a significant issue.
In summary, the results of experiment IV suggest that, in this case-specific configuration, the VR system offers advantages in terms of visualization and spatial understanding, particularly for large and complex models. However, limitations in usability, interaction design, and technical accuracy could affect user efficiency and acceptance in a professional engineering context. To enable successful integration into industrial workflows, improvements are required in system intuitiveness, feature completeness, and reliability, as well as in reducing the gap between VR and established CAD-based processes.
The fifth experiment aimed to evaluate the VR system in a broader and more diverse user context, with a particular focus on identifying missing functionalities and collecting user-driven recommendations for improving the system. Due to the relatively large number of participants and their varied professional backgrounds across different engineering domains, this experiment emphasized qualitative insights into user needs alongside the assessment of usability across seven dimensions.
A total of 40 participants, junior engineers with practical experience in various industrial departments, took part in the evaluation. This heterogeneous background enabled a comprehensive assessment of the system from multiple professional perspectives, particularly regarding its applicability in real-world engineering tasks.
Overall, the results indicate relatively high usability with a moderate spread in the observed usability degree (M = 68.7, SD = 10.8); this is the highest usability rating among all conducted experiments. The mean SUS-item rating was 3.0 on a five-point scale, indicating a generally positive perception of the system. Participants were able to complete the assigned tasks effectively, and the system showed improved performance compared to earlier versions. Nevertheless, the primary outcome of this experiment lies in the identification of missing features and improvement potential.
A key result of this study is the identification of 18 missing functions required by users to effectively perform their tasks. These functions were derived from participants’ direct interaction with the system and reflect practical requirements from different engineering domains. In addition, participants proposed 23 recommendations aimed at improving system usability, functionality, and integration into existing workflows. The feedback was notably detailed and critical, reflecting the participants’ technical background and professional experience.
Analysis of the usability dimensions revealed positive performance across most categories. In terms of task suitability, participants confirmed that the system provides the core functionalities required for design evaluation and spatial analysis. However, the absence of several advanced features limited the completeness and efficiency of task execution.
Regarding self-descriptiveness, the system was perceived as understandable, with users generally able to interpret system behavior and interaction outcomes. Nevertheless, some participants indicated that additional guidance and clearer system feedback would further improve usability, particularly for more complex tasks.
The expectation conformity dimension was evaluated positively overall. Interface elements, such as menus and visual structures, were largely consistent with user expectations. However, certain interaction mechanisms still deviated from conventional engineering software workflows, requiring adaptation by the users.
In terms of learnability, the system showed noticeable improvement compared to earlier experiments. Participants were generally able to familiarize themselves with the system within a short period. However, given the complexity of some tasks, additional onboarding support and training features were still considered beneficial.
The evaluation of controllability indicated that users were able to interact with the system and perform the required operations successfully. Interaction with objects and navigation within the virtual environment were generally perceived as manageable, although some users reported minor inefficiencies in control precision.
The error tolerance dimension remained an area with improvement potential. Participants noted the absence of certain features, such as undo functions and error handling mechanisms, which are essential for efficient and confident task execution in professional environments.
Finally, user engagement was rated highly. Participants expressed strong interest in the VR system and recognized its potential for supporting engineering tasks, particularly in visualization and interdisciplinary collaboration. The immersive nature of the system contributed positively to user motivation and acceptance.
The TLX results further support these findings, indicating low perceived workload across cognitive, physical, and temporal dimensions. Participants reported high levels of satisfaction and relatively low effort during task execution, suggesting that the system provides a comfortable and efficient interaction environment despite existing limitations.
In summary, the observations of experiment V suggest that the VR system achieves a high level of usability and user acceptance in this case-specific configuration. The large and varied participant group enabled the identification of a substantial number of missing functions and practical improvement recommendations, which are critical for further system development. While the system performs well across most usability dimensions, targeted enhancements, particularly in feature completeness and error tolerance, are necessary to fully support professional engineering workflows.
The sixth experiment investigated the usability of the VR system under conditions of increased task and time pressure. In this experiment, participants were required to complete predefined tasks within a limited time frame as part of a graded academic activity. The objective of this evaluation was to analyze how time pressure and performance requirements influence usability across the seven defined dimensions, as well as their impact on perceived workload, user satisfaction, and suggested improvements.
A total of ten participants, all bachelor’s students in mechanical engineering, took part in the experiment. Compared to previous experiments, the participants reported a higher level of experience with digital tools and VR systems, which provides a suitable basis for evaluating the system under more demanding conditions. Overall, the results indicate a moderate to good level of usability. Despite the imposed time constraints, participants were able to complete the assigned tasks at least in part, which suggests that the system supports task execution even under pressure.
The analysis of the seven usability dimensions shows generally positive results. The usability degree was moderately high with noticeable variability (M = 63.6, SD = 13.4).
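The dispersion of the usability degree is described verbally for each experiment (low, substantial, moderate, noticeable). As a purely descriptive aid, the following Python sketch relates each reported standard deviation to its mean by computing the coefficient of variation (CV = SD / M). The M and SD values are those reported for experiments III to VI; the choice of the CV as a summary measure and all identifiers are assumptions made for illustration and are not part of the evaluation framework.

```python
# Reported usability degrees as (mean M, standard deviation SD),
# copied from the results of experiments III to VI.
reported = {
    "III": (50.3, 7.1),
    "IV": (54.6, 19.5),
    "V": (68.7, 10.8),
    "VI": (63.6, 13.4),
}


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Return the standard deviation relative to the mean (dimensionless)."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return sd / mean


cv_by_experiment = {
    name: coefficient_of_variation(m, sd) for name, (m, sd) in reported.items()
}

# Print the experiments ordered from lowest to highest relative dispersion.
for name, cv in sorted(cv_by_experiment.items(), key=lambda item: item[1]):
    print(f"Experiment {name}: CV = {cv:.2f}")
```

The resulting ordering (III, V, VI, IV, with CVs of about 0.14, 0.16, 0.21, and 0.36) is consistent with the verbal descriptions. Because the experiments differ in participants, tasks, and configurations, it should be read descriptively and not as a statistical comparison.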
Task suitability was generally not rated positively. Although participants confirmed that the system provides the necessary functions to complete the tasks, some users indicated that certain functions were missing, which affected the completeness of task execution. Expectation conformity was evaluated positively, since the interface structure, including menus and icons, was generally perceived as clear and understandable. Nevertheless, some participants reported that object manipulation was not fully intuitive, indicating differences between expected and actual interaction behavior.
Self-descriptiveness was predominantly evaluated negatively, as participants reported that they did not feel in control of the interaction and were unable to understand the system’s behavior. In addition, not all users were able to clearly identify the next steps during task execution, which indicates that system guidance is still limited in more complex situations.
Learnability represented one of the weaker dimensions. Participants reported occasional difficulties in understanding system functions, especially in relation to error messages and the predictability of system responses. These aspects appeared to become more noticeable under time pressure, where additional guidance could help reduce uncertainty.
Controllability was generally sufficient, as users were able to select and manipulate objects within the virtual environment. However, some inconsistencies in interaction precision were observed.
Error tolerance was identified as a weak aspect, since participants reported issues such as missing recovery functions and a limited ability to correct mistakes. These limitations negatively influenced user confidence, especially in time-constrained scenarios.
User engagement was generally positive, as most participants reported a good first impression and did not perceive the system as overly demanding. However, perceived efficiency varied, indicating that time pressure influenced interaction performance.
In addition to the usability evaluation, participants provided several suggestions for system improvement. Frequently mentioned aspects include the integration of alternative interaction methods such as hand tracking, as well as the implementation of a tutorial or guided onboarding. Furthermore, improvements in object interaction, such as snapping functions and more interactive elements, were suggested. Participants also highlighted the need for better system adaptability, for example through adjustable user height or automatic detection. Additional features such as object scaling and coloring were also identified as relevant improvements. These suggestions indicate the need for a more intuitive, flexible, and user-adapted system.
The NASA TLX results show that the overall workload was high under time pressure, which arose because the tasks had to be completed within the previously agreed timeframe. Cognitive demand was perceived as high due to the need to understand the system during task execution. Physical demand was low, and participants reported no significant physical strain. The time pressure itself was perceived as manageable, and the effort required to complete the tasks remained relatively low. Some participants reported dissatisfaction with their performance, and some experienced frustration due to unclear interaction elements.
The mean SUS-item rating was 2.9 on a five-point scale, a moderate score. Participants reported that the system is not particularly easy to use and that its functions are not fully integrated. The system was also considered moderately learnable within a reasonable amount of time. However, some users reported a certain level of complexity and occasional inconsistencies in system behavior. The need for technical support was not dominant, but it was still present in more complex interaction scenarios.
In summary, the results of experiment VI suggest that the VR system maintains a stable level of usability under time pressure in this case-specific configuration, in which the participants were students enrolled in the course and had to finish their tasks within the scheduled lecture time.
To improve transparency and support the descriptive interpretation of workload-related findings, Table 8 summarizes the NASA Task Load Index (NASA-TLX) observations across all experiments. Because the experiments differed in participant groups, task contexts, hardware configurations, software versions, and environmental conditions, the reported workload outcomes are presented descriptively rather than as statistically comparable measures. The table provides a structured overview of the perceived cognitive, physical, temporal, and emotional workload dimensions observed within each experimental configuration, allowing the reader to identify case-specific usability patterns without implying inferential or causal relationships between experiments.
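As an illustration of how subscale ratings can be condensed into a single descriptive workload value, the Python sketch below computes an unweighted (raw) NASA-TLX score as the mean of the six subscales. Whether the study used raw or weighted scoring is not stated here, so the raw variant, the 0 to 100 rating range, the subscale identifiers, and the example ratings are all assumptions.

```python
from statistics import mean
from typing import Mapping

# The six NASA-TLX subscales. On every subscale a higher value means a higher
# load; for "performance", a higher value means poorer perceived performance.
TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings: Mapping[str, float]) -> float:
    """Return the unweighted mean of the six subscale ratings (0..100)."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for name in TLX_SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must lie between 0 and 100")
    return mean(ratings[name] for name in TLX_SUBSCALES)


# Hypothetical ratings for one participant, not taken from the study.
example_ratings = {
    "mental_demand": 60,
    "physical_demand": 15,
    "temporal_demand": 40,
    "performance": 35,
    "effort": 30,
    "frustration": 30,
}
print(raw_tlx(example_ratings))
```

A per-experiment value obtained in this way would remain descriptive, in line with the interpretation of Table 8.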
Table 9 provides a descriptive overview of the six conducted usability experiments, including the software versions, hardware configurations, participant groups, usability degrees, sample sizes, and the theoretically expected usability-problem discovery coverage based on the Nielsen–Landauer model. The table is intended to summarize the heterogeneous experimental configurations and support contextual interpretation of the reported usability findings. Because the experiments differed in hardware, software, participant backgrounds, task contexts, and sample sizes, the reported values should be interpreted descriptively and not as directly comparable inferential results.
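In the Nielsen–Landauer model, the expected discovery coverage is 1 − (1 − λ)^n, where n is the number of participants and λ is the probability that a single participant uncovers a given problem. The Python sketch below evaluates this expression for three sample sizes stated in the results (five, ten, and 40 participants). The value λ = 0.31 is the average commonly cited from Nielsen and Landauer's data and is an assumption here; the value underlying Table 9 may differ, and the function names are illustrative.

```python
import math


def discovery_coverage(n_participants: int, detection_rate: float = 0.31) -> float:
    """Expected share of usability problems found by n participants."""
    if n_participants < 0:
        raise ValueError("n_participants must be non-negative")
    if not 0.0 < detection_rate < 1.0:
        raise ValueError("detection_rate must lie strictly between 0 and 1")
    return 1.0 - (1.0 - detection_rate) ** n_participants


def participants_needed(target_coverage: float, detection_rate: float = 0.31) -> int:
    """Smallest number of participants whose expected coverage reaches the target."""
    if not 0.0 < target_coverage < 1.0:
        raise ValueError("target_coverage must lie strictly between 0 and 1")
    if not 0.0 < detection_rate < 1.0:
        raise ValueError("detection_rate must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_coverage) / math.log(1.0 - detection_rate))


# Sample sizes stated in the results: experiments III, VI, and V.
for n in (5, 10, 40):
    print(f"n = {n:2d}: expected coverage = {discovery_coverage(n):.4f}")
```

Under the assumed λ, five participants are expected to uncover roughly 84% of the problems in a given test configuration, and an expected coverage of 99.9% would require 19 participants. The coverage always refers to one specific test case, so a changed configuration can still reveal further problems.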
5. Discussion
This study set out to explore whether variables such as software, hardware, user background, and context of use affect the usability of VR systems within the product development process. Based on six experiments involving participants with different levels of experience, as well as varied hardware configurations and use cases, the findings indicate that VR provides demonstrable benefits in specific phases of product development, while its effectiveness remains highly context-dependent.
The comparative analysis of experiments II and VI, which involved inexperienced users, and experiments III and IV, which included senior engineers from a development team, highlighted the influence of user background on usability outcomes. Professional development teams were more concerned with technical precision and the integration of VR into existing workflows, resulting in lower acceptance when the system did not fully align with their operational requirements. The results indicate that participants’ professional backgrounds and prior experience can shape their expectations and perceived system needs. Consequently, professional users identified specific deficiencies and missing functionalities, as their feedback was closely tied to the practical requirements of their workplace tasks, for example when relying on a particular CAD software package. This highlights the importance of tailoring usability assessments to the users’ operational environments.
In addition, the usability ratings given by the development engineers in the third and fourth experiments were closely aligned. This again suggests that the user background influences both perceived usability and technology acceptance. In these cases, the engineers evaluated the technology more critically in relation to their actual professional needs. When real decision-making and the potential profitability of an investment are at stake, the technology tends to be assessed more rigorously. In contrast, the student participants tended to evaluate the technology based on personal preferences, without considering profitability.
Regarding the software factor, the first experiment employed version 1.70, while the second used a slightly updated version (1.70.3). Usability degree improved in the latter case, reflecting the positive effect of addressing previously identified inefficiencies. A similar pattern was observed in experiments IV and V, where software updates again led to higher usability ratings. These outcomes should be interpreted within the specific experimental configuration and not as evidence of overall system superiority.
The comparison between PC-based and standalone VR systems suggests that hardware configuration influences usability, particularly for inexperienced users. Usability improved slightly with the standalone devices, which suggests that such systems offer greater ease of use and flexibility for less experienced participants. However, PC-based systems remain necessary for high-precision engineering applications where graphical performance and model complexity are critical. For example, in the cable-routing application of experiment IV, the resolution of the standalone systems was found to be insufficient to represent bending radii accurately.
With respect to the use case factor, the TLX results were generally positive across all experiments, indicating low physical and cognitive workload. However, in the final two experiments conducted under identical technical conditions and with the same tasks but differing user roles and objectives, the participants in experiment VI exhibited higher levels of stress and cognitive effort. This was likely because the tasks in experiment VI were performed as part of a formal course assessment, which introduced additional cognitive pressure and performance-related stress. These findings suggest that the specific use case and contextual purpose of the activity can influence user acceptance and perceived task load.
Another finding concerned the openness of the two team leaders in experiments III and IV to adopting new technology. The leader from the medium-sized enterprise (experiment III) was more receptive, whereas the leader from the large enterprise (experiment IV) was more cautious. This observation aligns with the findings of [22], discussed in Section 2.2, which indicate that medium-sized enterprises tend to be more open to new technologies.
It was reported that, after the first experiment, 99.9% of the usability problems had been identified. This value relates exclusively to the test case carried out there. As a result, further problems may be identified in subsequent experiments due to the changed boundary conditions. However, the objective of the subsequent usability tests was not only to detect problems, but also to optimize the overall usability of the system as well as to evaluate the factors that influence the usability.
It has been shown that the application of a standardized usability evaluation contributes to the continuous improvement of the VR system. The progressive software enhancements are clearly observable and indicate a positive correlation between usability assessments and the iterative development process of the targeted system. This means that the advancements achieved in the software can be directly associated with improvements in usability. This finding underscores the effectiveness of a systematic evaluation approach.
It is important to distinguish the purpose of a usability test. When the primary objective is to identify system errors or to determine the required number of participants, it is recommended to conduct a minimal number of tests using the approach described in Section 2.3. However, if the objective is the continuous optimization of the system in order to enhance user satisfaction and technology acceptance, it is recommended to conduct iterative usability evaluations. In this case, each testing cycle should incorporate previously identified variables, such as user feedback, and involve new user groups, new scenarios, and updated versions of both the software and hardware.
Across several experiments, participants consistently identified missing functions required for performing domain-specific tasks. This indicates that usability of VR systems is not only determined by interaction quality but also by the completeness of task-relevant features. Particularly in professional environments, the absence of specialized features could reduce perceived usefulness and limit system acceptance, even when the underlying interaction mechanisms function correctly.
The comparison between VR and conventional CAD tools emerged as a recurring theme, particularly among professional engineers. While VR was highly valued for its immersive visualization and spatial understanding of complex assemblies, participants emphasized that traditional CAD systems still provide superior precision, feature depth, and workflow integration. This suggests that VR systems are currently better suited as complementary tools for design reviews and collaborative visualization rather than as direct replacements for established engineering software.
Several experiments revealed that first-time users required additional onboarding and guidance to interact effectively with the system. This was particularly evident in experiment IV, where participants received only a brief introduction without any practical VR familiarization. As a result, some users rejected the system, although the limited onboarding was only one of several contributing factors. These findings indicate that training and onboarding procedures are essential for the successful adoption of VR tools in engineering contexts. Systems intended for industrial environments should therefore incorporate guided tutorials or training modules to reduce the initial learning curve and improve overall user acceptance.
In addition to the importance of the onboarding process prior to applying VR, the learnability dimension was evaluated predominantly negatively across most experiments. Although several optimizations were implemented, the system was still perceived as difficult to learn. A likely explanation is that the technology is relatively new and many participants were not yet familiar with it, which naturally increases the initial learning effort. Minor inconsistencies in the interaction design or limited exposure time may also have contributed to this perception.
The collaborative evaluation conducted in experiment III also highlights the potential of VR as a communication platform for distributed teams. Participants reported that the shared virtual environment facilitated discussion and joint model inspection, suggesting that VR can support collaborative decision-making processes in product development.