In recent years, interactive virtual reality (VR) and augmented reality (AR) technologies have expanded rapidly across both scientific and industrial domains. This growth has been fueled by the increasing availability of diverse hardware devices, the development of tools for designing immersive and interactive environments, and the accessibility of high-performance computing platforms capable of supporting real-time simulations. These factors have significantly lowered the barriers to the adoption of VR, enabling its integration into a wide spectrum of applications ranging from scientific visualization and education to product development and training [1, 2, 3]. In the industrial context in particular, VR has emerged as a versatile tool for enhancing productivity, safety, and efficiency. The recent literature highlights its potential across different sectors, underscoring the role of VR in improving complex workflows and supporting decision-making. Three prominent classes of applications have gained considerable attention. First, VR-based assembly training immerses operators in realistic scenarios, allowing them to practice complex procedures without the risks and costs associated with physical prototypes [4, 5, 6]. Some virtual assembly methodologies are shared with AR implementations or even use the same devices. In particular, the works in [7, 8, 9] introduced and refined methods for interactive virtual assembly in augmented and virtual reality, including natural hand interaction and physics-based manipulation. The present study adopts the same interaction logic (e.g., pick, rotate, insert) but scales it to a comparative usability benchmark across multiple commercial headsets. Second, visual inspection tasks have been supported by VR-based solutions that enable detailed examination of components, defect detection, and quality assurance within controlled environments [10, 11, 12]. Third, medical training represents one of the most promising areas of VR adoption, where immersive simulations support surgical rehearsal, anatomy exploration, and skill acquisition in safe yet highly realistic conditions [13, 14, 15]. Within this domain, prior work by some of the authors has contributed methodological advances that are directly relevant to the present usability study. Avola et al. [16] developed an immersive VR endoscopic prototype that captured fine motor actions and task completion times, demonstrating how quantitative performance metrics can be extracted from VR simulations. Avola et al. [17] introduced VRheab, a motor rehabilitation system based on recurrent neural networks, which relied on repeated user trials and objective logging of movement accuracy. Avola et al. [18] proposed a low-cost full-body rehabilitation framework using serious games, emphasizing the use of weighted trial averaging to reduce learning and fatigue effects. Although these studies addressed clinical applications, they established practical protocols for collecting and processing user performance data in VR, specifically the use of multiple repetitions per task and the handling of outlier trials, which we directly adapt in our key performance indicator (KPI) computation (Equations (1) and (2)). These self-citations are therefore motivated by methodological continuity rather than by mere self-reference. Together, these examples demonstrate the cross-disciplinary relevance of VR for industrial and professional training, while also pointing to its critical role in supporting human-centered tasks.
Another rapidly growing domain in which interactive VR simulations are gaining momentum is that of industrial digital twins. The coupling of digital twins (virtual replicas of physical assets updated in real time) with immersive VR environments enables not only advanced data visualization but also direct interaction with the digital model for tasks such as monitoring, fault diagnosis, layout validation, and the simulation of complex operational procedures. Several studies have highlighted how VR can serve as a spatial and collaborative interface for digital twins, enhancing the understanding of multiphysics phenomena and facilitating shared decision-making among heterogeneous teams (engineering, production, maintenance). Recent industrial examples further demonstrate concrete applications, ranging from engineering reviews and project validation to solutions enabling remote inspection and maintenance of critical assets through their digital replicas [19, 20, 21]. Chheang et al. [19] showed how collaborative virtual reality can support part inspection in additive manufacturing by overlaying digital twins onto physical components. Yang et al. [20] developed a live digital twin system with VR interfaces for immersive manufacturing monitoring and control. Mahdi et al. [21] proposed an OPC UA-based architecture for wire arc additive manufacturing, addressing real-time data synchronization between physical and virtual assets. These implementations showcase the potential of VR as an interaction layer for digital twins, while also underscoring the importance of comparative evaluations of hardware in terms of latency, tracking accuracy, and usability, factors that ultimately determine the effectiveness of VR–digital twin integration in operational industrial settings.
In this context, three recent works by the authors have established methodological foundations that directly inform the present study. Cellupica et al. [22] developed an interactive digital twin framework for training users in the use of a sensorized upper-limb prosthesis within a VR environment, introducing a reusable architecture for real-time sensor data integration and haptic feedback. Cirelli et al. [23] demonstrated a real-time interactive digital twin for structural dynamics, combining reduced-order modeling with augmented reality to allow users to manipulate and interrogate a mechanical component under impulse loads, highlighting the importance of low-latency interaction and physics fidelity. Cirelli et al. [24] extended this approach to augmented reality for structural digital twins, focusing on real-time visualization and user-driven exploration of reduced-order models. Together, these studies identify key requirements for immersive digital twin interfaces, such as low tracking latency, stable controller interaction, and visual clarity, which directly depend on the choice of VR hardware. The present study therefore builds on these prior works by shifting the focus from the feasibility of individual digital twin applications to a systematic, comparative evaluation of commercial VR headsets under controlled, task-oriented conditions. From the literature, three kinds of contribution can be identified: reviews discussing architectural requirements and fidelity levels of digital twins; applied studies integrating VR for real-time control and monitoring (e.g., mining or additive manufacturing contexts); and research evaluating the role of immersive interaction in reducing diagnosis time and improving the quality of operational decisions. These works also emphasize practical challenges, including real-time data synchronization, network latency, visualization of large CAD models within VR scenes, and the need for user interfaces that make dynamic twin data explicit, all of which directly affect hardware selection and user experience design. Further industrial examples, spanning engineering reviews, project validation, and remote inspection and maintenance of critical assets through their digital replicas [25, 26, 27], confirm the potential of VR and AR as an interaction layer for digital twins and reinforce the need for comparative hardware evaluations.
Despite these advances, significant challenges remain in the implementation of realistic and interactive VR scenes. Achieving a convincing sense of immersion requires not only high-fidelity graphical rendering but also low-latency interaction, ergonomic usability, and seamless integration of hardware and software. These requirements impose constraints on both device selection and system design, often leading practitioners to choose VR headsets and controllers based primarily on technical specifications such as resolution, refresh rate, or tracking capabilities. However, such specifications do not always translate into optimal user interaction or usability in practice. Moreover, many VR environments are developed in a device-agnostic manner, which ensures portability across platforms but frequently results in suboptimal performance and interaction quality [28, 29, 30]. This gap between hardware capabilities and actual user experience underscores the need for systematic evaluations that extend beyond datasheets and specifications.
Recent contributions have started to address the need for systematic evaluations of extended reality (XR) hardware in industrial contexts. Barve and De Amicis [31] conducted one of the first user-centered validations of a low-cost VR head-mounted display (HMD) harvester simulator, quantifying cognitive load, usability, and simulator sickness with established instruments. Gornall et al. [32] provided a systematic review of XR applications in Construction 4.0, mapping which XR modes, devices, and graphics engines are most prevalent, and identifying implementation barriers. Malva et al. [33] performed a qualitative, criteria-based comparative analysis of eleven commercially available AR/MR devices for physical asset management, evaluating them against operational requirements such as ergonomics, display characteristics, and spatial mapping performance. Together, these studies underscore the growing interest in evidence-based hardware assessment for professional and industrial XR deployments, yet none of them offers a standardized, task-oriented benchmarking methodology focused on controller-based VR interaction for assembly, inspection, and training operations, which is the specific gap addressed by the present work.
To fill this gap, the present study introduces a methodological framework with several novel elements beyond a simple device comparison. First, we propose a composite performance metric that normalizes task execution times and accuracy scores onto a common scale, making the comparison fair and reproducible across different headsets. Second, our task set is systematically designed to isolate the fundamental interaction primitives underlying industrial operations, such as grasping, rotating, inserting, button pressing, and throwing, following established ergonomic standards. Third, we adopt a weighted scoring scheme to reduce the influence of learning and fatigue effects across repeated trials, improving the reliability of the collected data. Finally, the entire evaluation pipeline, from task definition to statistical analysis (including repeated measures ANOVA and post hoc comparisons), is presented as a reusable benchmark for future assessments of VR hardware. Taken together, these elements constitute a transferable usability benchmarking methodology for VR headsets in industrial contexts, not merely a one-time device comparison.
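To make the idea of a composite metric with weighted trials concrete, the following Python fragment gives a minimal sketch. It is illustrative only: the study's actual KPI definitions (Equations (1) and (2)) are not reproduced here, and the min-max normalization, the trial weights, the equal time/accuracy balance `alpha`, and all function names are assumptions introduced for this example.

```python
def weighted_trial_mean(trials, weights):
    """Weighted average over the repeated trials of one task.

    Assumption: weights are chosen by the experimenter, e.g. to
    down-weight early (learning) or late (fatigue) repetitions.
    """
    if len(trials) != len(weights):
        raise ValueError("one weight per trial is required")
    return sum(t * w for t, w in zip(trials, weights)) / sum(weights)


def min_max(values, lower_is_better=False):
    """Rescale values to [0, 1] so that 1 is always the best result."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All devices performed identically on this measure.
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if lower_is_better else scaled


def composite_scores(times, accuracies, weights, alpha=0.5):
    """Combine normalized time and accuracy into one score per headset.

    times / accuracies: dict mapping a headset label to its list of trials.
    alpha: hypothetical balance between speed and accuracy.
    """
    headsets = list(times)
    mean_t = [weighted_trial_mean(times[h], weights) for h in headsets]
    mean_a = [weighted_trial_mean(accuracies[h], weights) for h in headsets]
    norm_t = min_max(mean_t, lower_is_better=True)  # shorter time is better
    norm_a = min_max(mean_a)                        # higher accuracy is better
    return {
        h: alpha * t + (1.0 - alpha) * a
        for h, t, a in zip(headsets, norm_t, norm_a)
    }


if __name__ == "__main__":
    # Made-up numbers for three hypothetical headsets, three trials each.
    times = {"A": [10, 8, 9], "B": [14, 12, 13], "C": [12, 10, 11]}
    accuracies = {"A": [0.9, 0.9, 0.9], "B": [0.7, 0.7, 0.7], "C": [0.8, 0.8, 0.8]}
    print(composite_scores(times, accuracies, weights=[1, 2, 1]))
```

Because both measures are mapped onto the same [0, 1] scale before being combined, a headset is not favored simply because one measure has a larger numeric range than the other. Other normalizations (for instance, z-scores) would serve the same purpose.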
The objective of the present study is to address this gap by providing a comparative analysis of leading VR hardware devices with a focus on user interaction and general usability, evaluated through quantitative metrics rather than qualitative impressions. Prior work, by contrast, often relies on subjective assessments, single-device studies, or generalized specifications [34, 35, 36, 37]; for example, Kamm et al. focused on the feasibility of a single headset for dexterity training in multiple sclerosis, Khorasani et al. compared interaction styles without cross-device analysis, Chang et al. reviewed VR sickness measurement methods, and Kaminska et al. provided general usability guidelines without quantitative KPIs. The present research employs a set of application-driven tests designed specifically to reflect industrial use cases, comparing five commercial headsets through objective metrics and inferential statistics. The analysis focuses on interaction mediated through hand-held controllers, which remain the most widespread and reliable mode of input in industrial VR applications. Gesture recognition, while increasingly supported by many devices, is still heavily dependent on software libraries and contextual conditions, making standardized comparisons more complex at present. By narrowing the scope to controller-based interaction, the study ensures reproducibility and comparability across platforms, while also providing insights that are directly relevant to the majority of current industrial VR applications. Owing to the exploratory nature of this work and the absence of prior direct comparisons among the selected devices, no directional hypotheses are formulated. Instead, the study aims to systematically collect quantitative and qualitative usability data, providing a foundation for future hypothesis-driven research.