1. Introduction
In craniomaxillofacial (CMF) radiology and surgery, technological advancements have catalysed a shift from conventional preoperative planning to three-dimensional (3D) patient-specific computer-assisted surgical simulation (CASS), driven by the continuous improvement in both the quality and quantity of radiological imaging data [1,2,3]. However, while the diagnostic capability and information content of medical imaging have steadily advanced, visualisation methods have not progressed at a comparable pace. Radiologists and surgeons still rely on a conventional two-dimensional computer screen interface (CI) operated via mouse and keyboard inputs, despite the inherently three-dimensional nature of radiological scans and anatomical structures. The lack of true 3D immersion demands a high level of spatial reasoning, leading to time-consuming and potentially error-prone interpretations and surgical simulations [4].
The application of CMF CASS extends to a variety of procedures, including orthognathic surgery, correction of deformities, trauma reconstruction and tumour resection, among others [5,6,7,8,9,10,11,12]. These simulations enable clinicians to preoperatively assess different surgical approaches, supporting the selection of the most suitable strategy and the planning of surgical cutting guides for osteotomies [13]. Notably, CASS improves both efficiency and accuracy compared with conventional planning methods [5,14]. Accurate preoperative planning is paramount in CMF procedures to mitigate risks such as excessive bleeding and temporary or permanent nerve damage [15,16,17,18,19,20].
While virtual reality (VR) offers a true 3D user interface that enhances perception and comprehension, its integration into CMF CASS remains an area of ongoing research [21,22,23]. Studies on VR in CMF surgery are limited, with only a few focusing on orthognathic procedures [22]. Additionally, only two studies have specifically examined anatomical landmarking and user-experience preferences [24,25]. These studies indicate that current customised VR systems, while showing promising results in simple experimental models, have yet to be fully validated by independent teams. The integration of VR into clinical medical practice presents multiple challenges. Firstly, there is a scarcity of strong evidence confirming its effectiveness in medical settings, as most existing studies originate from pioneering installations and often lack methodological consistency and comprehensive data. Secondly, the absence of FDA-approved VR medical devices poses a significant hurdle [23]. Additionally, a major constraint is the steep learning curve and the potential difficulties in adoption, particularly for experienced users who are accustomed to traditional techniques [26]. More recent reviews have highlighted the growing interest in immersive virtual and extended reality applications in radiology, while emphasising that clinical integration and robust validation remain key challenges for routine adoption [27,28].
The quality of radiological images plays a crucial role in CMF CASS. In recent years, cone-beam computed tomography (CBCT) has become a widely used tool for CMF modelling due to its cost-effectiveness and dose efficiency [29]. Advances in CBCT technology have led to higher-resolution imaging, making it essential to carefully balance image quality with radiation dose. Ensuring clear visualisation of anatomical structures while minimising radiation exposure is vital for both diagnostic accuracy and patient safety. The impact of various CBCT imaging parameters on radiation dosage has been extensively examined in the scientific literature [25,26,29,30,31,32,33]. Studies have shown that radiation exposure can be adjusted to meet clinical needs, often allowing for lower doses while preserving diagnostic quality [32,33].
However, despite extensive research on radiation dosage in CBCT, there has been limited investigation into how variations in radiation dose and imaging parameters specifically affect the clinical quality and usability of radiological images in CMF CASS. Furthermore, the effects of such adjustments on VR-based CMF CASS remain unexplored, highlighting a critical gap in current knowledge. Bridging this gap could enhance the precision of diagnostics and surgical planning, ultimately elevating patient-care standards in CMF radiology and surgery. A deeper understanding of the relationship between imaging parameters, radiation dose and clinical usability would contribute to more effective and safer imaging practices.
This study explored a previously unexamined area in CMF CASS by evaluating the impact of high-dose (HD) and low-dose (LD) CBCT imaging modes on image quality and usability across two user interfaces: computer interface (CI) and virtual reality (VR). In the CI environment, image assessment was based on CBCT multiplanar reconstruction (MPR) views, whereas in VR the same data were visualized using an immersive three-dimensional representation. Five CMF radiologists performed a subjective assessment using Likert scoring on CBCT-based visualizations and 3D-segmented models (3D-SEG), focusing on CMF CASS for bimaxillary osteotomy (BIMAX), a complex surgical procedure used to correct maxillofacial deformities affecting both the upper and lower jaws [31].
Beyond assessing image quality, this study examined whether VR provides visualisation and usability comparable to CI, assessed the potential impact of the VR headset on clinical outcomes and sought to validate VR image quality in a medical setting. To assess the learning curve and identify potential implementation challenges, mental workload was measured, and user experiences were gathered to inform future VR-based CMF software development.
The findings of this study will contribute to refining imaging protocols, enhancing the precision of VR applications in CMF radiology and CASS platforms and extending these advancements to broader radiological and surgical applications. By improving imaging standards, this research aims to elevate diagnostic accuracy and optimise surgical planning across multiple medical fields.
Taken together, this study was designed to examine how variations in CBCT radiation dose settings (high-dose and low-dose) influence image quality and clinical usability in CMF CASS when assessed using two different user interfaces: CI and VR. In addition, the study explored whether VR-based visualization can achieve quality and usability comparable to CI and evaluated potential effects of the VR headset on clinical performance. User workload metrics and qualitative user-experience feedback were further collected to identify learning-curve effects and practical implementation challenges relevant to future VR-based CMF applications.
2. Materials and Methods
2.1. Patient Data
This retrospective study used only previously acquired clinical CBCT data, and no patients were imaged specifically for research purposes. Ethical approval requirements and data-usage permissions are detailed in the Institutional Review Board Statement at the end of the manuscript. The CBCT scans were retrospectively retrieved from the Picture Archiving and Communication System (PACS) of Tampere University Hospital, Finland. The dataset included six patients (three males and three females; mean age: 40.6 years) and comprised a total of 12 CBCT scans. All patient data were pseudonymised.
Initially, all patients underwent HD full-facial CBCT scans for combined orthodontic and orthognathic surgery planning at the onset of treatment. Prior to surgery, LD full-facial CBCT scans were additionally acquired for 3D patient-specific computer-assisted surgical simulation (CASS).
All scans were obtained using a Planmeca Viso G7 scanner (Planmeca Oy, Helsinki, Finland) with a voxel size of 300 µm. The imaging parameters for HD mode were 110 kVp, 11 mA and 4.5 s, while LD mode was performed using 90 kVp, 14 mA and 4.5 s. Radiation dose was quantified using the dose–area product (DAP), the product of the air kerma and the cross-sectional area of the X-ray beam, which therefore reflects both the radiation intensity and the size of the irradiated field. The field of view varied among patients. The mean DAP was 1106.45 mGy·cm² for HD imaging mode and 717.15 mGy·cm² for LD imaging mode. Exposure parameters and DAP values for all patients are presented in Table 1.
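The protocol figures above can be compared with a little arithmetic. The sketch below is illustrative only: the dictionary layout and function names are hypothetical, and it uses just the kVp, mA, exposure-time and mean DAP values quoted in this section. It shows that the LD protocol actually has a higher tube current–time product (63.0 versus 49.5 mAs), so its roughly 35% lower mean DAP presumably reflects the reduced tube voltage, and possibly differences in field of view, which varied among patients.

```python
# Exposure settings quoted in Section 2.1 (layout and names are illustrative)
PROTOCOLS = {
    "HD": {"kvp": 110, "ma": 11, "time_s": 4.5, "mean_dap_mgy_cm2": 1106.45},
    "LD": {"kvp": 90, "ma": 14, "time_s": 4.5, "mean_dap_mgy_cm2": 717.15},
}


def tube_current_time_product(protocol: dict) -> float:
    """Tube current-time product in mAs (tube current x exposure time)."""
    return protocol["ma"] * protocol["time_s"]


def relative_dap_reduction(reference: dict, reduced: dict) -> float:
    """Fractional reduction of the mean DAP of `reduced` relative to `reference`."""
    ref = reference["mean_dap_mgy_cm2"]
    return (ref - reduced["mean_dap_mgy_cm2"]) / ref


if __name__ == "__main__":
    for name, p in PROTOCOLS.items():
        print(f"{name}: {p['kvp']} kVp, {tube_current_time_product(p):.1f} mAs")
    reduction = relative_dap_reduction(PROTOCOLS["HD"], PROTOCOLS["LD"])
    print(f"Mean DAP reduction (LD vs HD): {reduction:.1%}")
```

Because the field of view varied among patients, a ratio of mean DAP values is only a coarse summary; per-patient comparisons would use the individual values in Table 1.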
2.2. Image Quality Evaluation
Five experts—CMF radiologists with 16 to 40 years of clinical experience—subjectively assessed the radiological diagnostic image quality and usability for CMF CASS. The evaluation comprised four sessions, during which CBCT scans were presented in randomised order. Each session lasted approximately 2.5 h and included an assessment of six scans from both imaging modes (three HD and three LD) using both the CI and VR user interfaces.
Sessions 1 and 2 constituted the first evaluation round. After a minimum interval of fourteen days to minimise recall bias, Sessions 3 and 4 formed the second round. Each radiologist therefore assessed all scans twice.
Each scan was evaluated under both viewing conditions (CBCT views and 3D-SEG) in both interfaces. With 15 anatomical landmarks per viewing condition, this yielded 30 assessments in the CI and 30 in the VR interface, that is, 60 assessments per radiologist per scan. With five radiologists, each scan produced 300 individual assessments per round. The overall evaluation workflow is illustrated in Figure 1.
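The rating counts follow directly from the balanced design. A minimal sketch of the arithmetic, with hypothetical variable names, assuming every radiologist rated all 15 landmarks for every scan in each interface and viewing condition in both rounds:

```python
# Balanced design from Section 2.2 (variable names are illustrative)
LANDMARKS = 15          # anatomical landmarks and structures
VIEWING_CONDITIONS = 2  # CBCT views and 3D-SEG
INTERFACES = 2          # CI and VR
RADIOLOGISTS = 5
SCANS = 12              # six patients, each with one HD and one LD scan
ROUNDS = 2              # Round 1 (Sessions 1-2) and Round 2 (Sessions 3-4)

per_interface = LANDMARKS * VIEWING_CONDITIONS                # 30 per radiologist per scan
per_radiologist_per_scan = per_interface * INTERFACES         # 60
per_scan_per_round = per_radiologist_per_scan * RADIOLOGISTS  # 300
total_ratings = per_scan_per_round * SCANS * ROUNDS           # 7200

print(per_interface, per_radiologist_per_scan, per_scan_per_round, total_ratings)
```

The total agrees with the 7200 Likert-scale ratings reported in Section 3.1.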
To ensure comprehensive coverage of the anatomical areas relevant to CMF imaging and CASS planning, 15 anatomical landmarks and structures were selected for evaluation. Although clinical CASS workflows typically combine CBCT views and 3D-SEG viewing modalities, the present study focused exclusively on the assessment and comparison of image quality.
Eight landmarks pertain to the surface of the skull:
1. Pogonion (Pg)
2. Right mental foramen (RMenF)
3. Right mandibular foramen (RManF)
4. Left temporomandibular joint (the condylar head and its position relative to the skull base) (LTMJ)
5. Anterior nasal spine (ANS)
6. Nasion (N)
7. Midline of the upper incisor tips (MUI)
8. Incisive canal (IC)
Subsurface areas were evaluated through the following landmarks:
9. Path of the right mandibular canal (RMC)
10. Tips of the roots of the right lower second molar (TRLSM)
11. Tips of the roots of the left upper first molar (TLUFM)
12. Walls of the left maxillary sinus (WLMS)
13. Tip of the root of the right upper canine (TRUC)
Additionally, two structures related to root canals and more detailed dental anatomy were evaluated:
14. Root canals of the right lower second molar (RCRLSM)
15. Root canals of the left upper first molar (RCLUFM)
For subjective analysis of image quality and usability, a five-point Likert scale was used with the following definitions:
0: Not usable for diagnostics
1: Partly usable for diagnostics
2: Minor issues, almost fully usable for diagnostics
3: Almost perfect, fully diagnostically usable
4: Perfect, fully diagnostically usable.
Ratings of 3 and 4 were considered diagnostically acceptable, whereas ratings from 0 to 2 were classified as not fully suitable. Median and mean Likert values were calculated for all data and analysed separately for each anatomical landmark. Intra- and interobserver agreements were determined to validate the assessments.
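The dichotomisation and summary statistics described above can be sketched as follows. The function name and the example ratings are hypothetical and are not study data; the example illustrates why a median alone can hide differences when most ratings cluster at 3–4.

```python
from statistics import mean, median

ACCEPTABLE = {3, 4}  # Likert 3-4: diagnostically acceptable; 0-2: not fully suitable


def summarise_likert(ratings):
    """Return the median, mean and share of diagnostically acceptable ratings."""
    if not ratings:
        raise ValueError("no ratings supplied")
    if any(r not in range(5) for r in ratings):
        raise ValueError("ratings must be integers from 0 to 4")
    acceptable = sum(1 for r in ratings if r in ACCEPTABLE)
    return {
        "median": median(ratings),
        "mean": mean(ratings),
        "acceptable_share": acceptable / len(ratings),
    }


if __name__ == "__main__":
    # Hypothetical ratings for one landmark: the median stays at 4 while the
    # mean drops to 3.4 because of a single rating of 2.
    print(summarise_likert([4, 4, 3, 4, 2]))
```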
CBCT data were visualized as multiplanar reconstructions (MPR) in the CI and as an interactive volumetric representation in the VR environment. For clarity, these visualizations are hereafter collectively referred to as CBCT views.
The CI evaluation was performed using a Barco MDRC-2224 BL 24-inch DICOM-calibrated monitor (1920 × 1200 resolution; Barco NV, Kortrijk, Belgium) and Planmeca Romexis software (Planmeca Oy, Helsinki, Finland; version 6.2.1.19). The VR evaluation was conducted using the Planmeca VR software (Planmeca Oy, Helsinki, Finland), originally designed for dental implantology and modified for the purposes of this study.
The system was implemented using the Unity 3D development platform (Unity Technologies, San Francisco, CA, US (version 2021.1)). A Meta Quest 3 headset (Meta Platforms, Menlo Park, CA, US) with a resolution of 2064 × 2208 per eye (4.56 million pixels per eye; 9.11 million total) and Meta Touch Plus controllers were used. The workstation consisted of an MSI Raider GE78HX (Micro-Star International, New Taipei City, Taiwan) equipped with an Intel i9 processor, 32 GB RAM and an NVIDIA 4080 GPU (NVIDIA Corporation, Santa Clara, CA, USA).
Radiologists used a comprehensive range of tools within both user interfaces. In CI, the software followed the standard radiological workflow, presenting coronal (Figure 2a), sagittal (Figure 2b) and axial (Figure 2c) CBCT views, as well as a 3D-segmented model (Figure 2d). These could be freely rotated, zoomed, enhanced, thresholded and cropped according to expert preference.
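Windowing and thresholding are standard grey-value operations. As a hedged illustration only (this is not how Romexis or Planmeca VR implement them, and CBCT grey values are not calibrated Hounsfield units), a linear window following the DICOM linear VOI LUT formula and a simple global threshold can be sketched as below; the function names and example values are hypothetical.

```python
def window_level(value: float, center: float, width: float,
                 out_min: float = 0.0, out_max: float = 255.0) -> float:
    """Map a voxel value to a display grey level with a linear window.

    Follows the DICOM linear VOI LUT function: values below the window
    clip to out_min, values above it clip to out_max.
    """
    if width < 1:
        raise ValueError("window width must be >= 1")
    lower = center - 0.5 - (width - 1) / 2
    upper = center - 0.5 + (width - 1) / 2
    if value <= lower:
        return out_min
    if value > upper:
        return out_max
    return ((value - (center - 0.5)) / (width - 1) + 0.5) * (out_max - out_min) + out_min


def threshold_mask(values, threshold: float):
    """Binary mask: True where a voxel value is at or above the threshold."""
    return [v >= threshold for v in values]


if __name__ == "__main__":
    # Hypothetical window centre/width and threshold, chosen only for illustration
    print([window_level(v, center=40, width=400) for v in (-1000, 39.5, 1000)])
    print(threshold_mask([100, 500, 900], threshold=400))
```

A global threshold of this kind is only a crude way of separating bone from soft tissue; in practice the window and threshold settings would need to be tuned per scanner and protocol.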
In the VR interface, users interacted with a full 3D-segmented model (Figure 3a), supplemented by manipulation tools for windowing, thresholding and cutting (Figure 3b). The CBCT views provided movable coronal (Figure 3c) and sagittal (Figure 3d) radiological slices displayed alongside the model. Alternatively, the model could be visualised as a hologram-like CBCT view. All interactions with the VR CBCT views were controlled via a cursor ball, with the displayed slice updating dynamically according to the cursor's position relative to the 3D model across 360°.
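Conceptually, a cursor-driven slice amounts to resampling an arbitrarily oriented plane from the CBCT volume. The following is a minimal nearest-neighbour sketch of that generic reslicing technique; the function name, its parameters and the `volume[z][y][x]` layout are hypothetical assumptions, and it is not the Planmeca VR implementation, which is not described here.

```python
import math


def _normalise(v):
    n = math.sqrt(sum(c * c for c in v))
    if n == 0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def oblique_slice(volume, centre, normal, half_size, spacing=1.0, fill=0):
    """Nearest-neighbour resampling of a planar slice from a 3D volume.

    volume    -- nested lists indexed volume[z][y][x] (hypothetical layout)
    centre    -- (x, y, z) voxel coordinates of the cursor position
    normal    -- (x, y, z) plane normal; need not be unit length
    half_size -- number of samples on each side of the centre
    Returns a (2*half_size+1) x (2*half_size+1) grid; samples falling
    outside the volume are set to `fill`.
    """
    n = _normalise(normal)
    # Any vector not parallel to n yields a valid in-plane basis
    helper = (1.0, 0.0, 0.0) if abs(n[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _normalise(_cross(n, helper))  # first in-plane axis
    v = _cross(n, u)                   # second in-plane axis (unit length)
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    out = []
    for i in range(-half_size, half_size + 1):
        row = []
        for j in range(-half_size, half_size + 1):
            p = [centre[k] + spacing * (i * u[k] + j * v[k]) for k in range(3)]
            x, y, z = (int(math.floor(c + 0.5)) for c in p)  # nearest voxel
            if 0 <= x < width and 0 <= y < height and 0 <= z < depth:
                row.append(volume[z][y][x])
            else:
                row.append(fill)
        out.append(row)
    return out
```

Rotating the normal vector with the cursor would sweep the slice through 360°; a production viewer would typically use trilinear interpolation rather than nearest-neighbour sampling.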
2.3. Intra- and Interobserver Agreements
Inter- and intraobserver reliability were assessed based on the two evaluation rounds described above. Round 1 consisted of Sessions 1 and 2, and Round 2 of Sessions 3 and 4, separated by a minimum interval of fourteen days. All six patients and all twelve CBCT scans were evaluated in both rounds by all five radiologists.
Inter- and intraobserver agreements were calculated using the percentage of absolute agreement, defined as the proportion of identical ratings among observers relative to the total number of ratings. Agreement scores ranged from 0 to 1 and were interpreted as follows: <0.50: poor; 0.50–0.75: fair; 0.75–0.90: good; >0.90: excellent agreement.
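The percentage of absolute agreement and the interpretation bands can be sketched as follows. The function names are hypothetical; intraobserver agreement would compare one radiologist's Round 1 and Round 2 ratings, and interobserver agreement would compare pairs of radiologists (as in an E1–E5 matrix). Because the fair and good bands share the endpoint 0.75, assigning that value to "good" is an assumption.

```python
from itertools import combinations


def percent_agreement(ratings_a, ratings_b):
    """Proportion (0-1) of items on which two rating sequences are identical."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("rating sequences must be non-empty and of equal length")
    identical = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return identical / len(ratings_a)


def pairwise_agreement(ratings_by_rater):
    """Percent agreement for every pair of raters, e.g. an E1-E5 matrix."""
    return {
        (r1, r2): percent_agreement(ratings_by_rater[r1], ratings_by_rater[r2])
        for r1, r2 in combinations(sorted(ratings_by_rater), 2)
    }


def interpret_agreement(p):
    """Label an agreement proportion using the bands defined above.

    Assumption: exactly 0.75 is labelled 'good', since the published
    fair and good ranges share that endpoint.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("agreement must lie between 0 and 1")
    if p < 0.50:
        return "poor"
    if p < 0.75:
        return "fair"
    if p <= 0.90:
        return "good"
    return "excellent"
```

Percentage agreement is simple and transparent, but it does not correct for chance agreement, which can be substantial when most ratings fall at 3–4.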
2.4. Statistical Analysis
Given the limited number of patients and expert evaluators, the analyses were restricted to descriptive statistics. No inferential statistical testing was performed, and no p-values or confidence intervals are reported. Likert-scale ratings were summarised using both mean and median values to provide an indicative overview of image quality and usability across imaging modes, visualization interfaces, and anatomical structures.
The study was designed as a pilot and feasibility investigation aimed at exploring observable trends rather than testing predefined hypotheses or establishing statistical significance. Consequently, no a priori sample size calculation was conducted. The inclusion of six patient datasets and five CMF radiologists was based on feasibility considerations and the exploratory objective of obtaining expert-level comparative assessments to inform the design of future, larger-scale studies.
2.5. NASA Task Load Index
Following each VR session, every radiologist completed the NASA Task Load Index (NASA-TLX), a subjective measure of perceived mental workload. The index evaluates workload across several dimensions, producing an overall workload score for each user. In this study, the NASA-TLX was collected using a scale ranging from 0 (very low) to 20 (very high) across five indicators: (1) mental demand, (2) temporal demand, (3) performance, (4) effort and (5) frustration. Because the use of CI is routine in radiologists’ daily work, a separate usability analysis for CI was not required. The NASA-TLX questionnaire used in this study is shown in Figure 4.
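How the overall score was combined is not specified here. The sketch below assumes the unweighted "raw TLX" variant, in which the overall workload is the mean of the subscale ratings; the standard instrument also includes physical demand and can weight subscales via pairwise comparisons, neither of which is used in this sketch. It also shows the per-session median and range summary used in Table 8. Function names and example values are hypothetical, not study data.

```python
from statistics import median

SUBSCALES = ("mental", "temporal", "performance", "effort", "frustration")


def raw_tlx(scores):
    """Unweighted overall workload: mean of the five 0-20 subscale ratings (assumed)."""
    missing = [s for s in SUBSCALES if s not in scores]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for s in SUBSCALES:
        if not 0 <= scores[s] <= 20:
            raise ValueError(f"{s} rating outside 0-20")
    return sum(scores[s] for s in SUBSCALES) / len(SUBSCALES)


def session_summary(per_expert, subscale):
    """Median and (min, max) range of one subscale across experts for a session."""
    values = [scores[subscale] for scores in per_expert.values()]
    return median(values), (min(values), max(values))


if __name__ == "__main__":
    # Hypothetical ratings from one expert, for illustration only
    print(raw_tlx({"mental": 10, "temporal": 8, "performance": 14,
                   "effort": 8, "frustration": 5}))
```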
2.6. The Follow-Up Interviews
Follow-up interviews were conducted after each VR session, with two interviews in the first round and two in the second. Radiologists provided insights into their overall experience with the VR interface, highlighting both strengths and areas for improvement. They also assessed the system’s usability compared to the CI and offered recommendations for future VR development. To ensure candid and comprehensive feedback, experts were encouraged to share their thoughts freely.
3. Results
3.1. Image Quality Evaluation
All results are presented descriptively, focusing on observed differences and trends in mean and median ratings.
A total of 7200 Likert-scale ratings were collected to evaluate image quality across HD and LD imaging modes, CI and VR user interfaces, and CBCT views and 3D-SEG viewing conditions. To provide an overview of overall image quality, median and mean values were calculated for each modality combination. While the median scores showed minimal variation across imaging modes and viewing conditions, the mean values demonstrated slightly greater variability (Table 2).
While Table 2 provides an overall summary of image quality across modalities, a more detailed assessment at the anatomical-landmark level is necessary to understand where these differences arise. Table 3 presents the median ratings for each of the 15 anatomical structures across all imaging conditions, highlighting the generally high perceived image quality. Table 4 complements these findings by presenting the corresponding mean values, which reveal finer distinctions between modalities and viewing conditions. These detailed tables form the basis for the subsequent analysis of diagnostic sufficiency.
The Likert scores for anatomical landmarks and structures derived from CBCT views data showed minimal variation, with consistently high ratings across both visualization interfaces. In contrast, the 3D-SEG evaluations demonstrated greater variability in mean values, revealing modality-dependent differences that were not apparent in the median scores. Ratings of 3–4 on the Likert scale were predefined as diagnostically acceptable, whereas ratings of 0–2 were considered not fully suitable for diagnostic use. In the present dataset, however, only one mean score fell below 3 across all conditions, indicating a ceiling effect that limited meaningful differentiation at the level of mean values. Therefore, a stricter exploratory cutoff of 3.5 was applied to the mean Likert scores for relative comparison only, with the aim of identifying anatomical regions that were comparatively more challenging to interpret.
Applying this cutoff highlighted four structures with notably reduced visibility.
Representative examples of the different viewing conditions are shown in Figure 5, illustrating how anatomical structures, including the left temporomandibular joint (LTMJ), appear across 3D-SEG and CBCT views in both CI and VR interfaces.
The RMC was the most difficult structure for both interfaces. The lowest score was recorded in HD VR (2.80), followed by HD CI (3.03). LD data improved performance in both interfaces, yielding identical mean values of 3.07. Similar challenges were observed for the WLMS. Here, HD CI achieved a score of 3.10, while HD VR performed slightly better at 3.35. However, unlike the RMC, LD images resulted in a modest reduction in the scores for the WLMS, with means of 3.03 for CI and 3.28 for VR.
The LTMJ also exhibited reduced visibility in CI under both HD and LD conditions, with mean scores of 3.15 and 3.25, respectively. VR performed better for this structure, improving from 3.45 in HD to 3.58 in LD. Representative examples of the LTMJ under different modalities and viewing conditions are shown in Figure 5: 3D-SEG visualizations in VR (Figure 5a) and CI (Figure 5b), coronal and sagittal CBCT views in VR (Figure 5c,d), and coronal and sagittal CBCT views in CI (Figure 5e,f).
For the RCRLSM, the VR interface produced consistent scores of 3.33 in both HD and LD, whereas CI scores improved slightly from 3.45 in HD to 3.55 in LD, indicating modest benefits from LD data in this case.
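The exploratory cutoff can be applied mechanically to the mean scores quoted in the preceding paragraphs. This sketch covers only those quoted values (the complete per-landmark means are in Table 4), and the names are hypothetical; it reproduces the observation that only one mean fell below 3 and that the four discussed structures all have at least one mean below 3.5.

```python
# Mean Likert scores quoted in Section 3.1 for the four structures discussed:
# (structure, dose mode, interface) -> mean
REPORTED_MEANS = {
    ("RMC", "HD", "VR"): 2.80, ("RMC", "HD", "CI"): 3.03,
    ("RMC", "LD", "VR"): 3.07, ("RMC", "LD", "CI"): 3.07,
    ("WLMS", "HD", "CI"): 3.10, ("WLMS", "HD", "VR"): 3.35,
    ("WLMS", "LD", "CI"): 3.03, ("WLMS", "LD", "VR"): 3.28,
    ("LTMJ", "HD", "CI"): 3.15, ("LTMJ", "LD", "CI"): 3.25,
    ("LTMJ", "HD", "VR"): 3.45, ("LTMJ", "LD", "VR"): 3.58,
    ("RCRLSM", "HD", "VR"): 3.33, ("RCRLSM", "LD", "VR"): 3.33,
    ("RCRLSM", "HD", "CI"): 3.45, ("RCRLSM", "LD", "CI"): 3.55,
}

CUTOFF_ACCEPTABLE = 3.0   # lower bound of the predefined acceptable range
CUTOFF_EXPLORATORY = 3.5  # stricter cutoff used for relative comparison only


def flag_below(means, cutoff):
    """Return sorted (structure, dose mode, interface) keys whose mean is below the cutoff."""
    return sorted(key for key, value in means.items() if value < cutoff)


if __name__ == "__main__":
    print("Below 3.0:", flag_below(REPORTED_MEANS, CUTOFF_ACCEPTABLE))
    flagged = flag_below(REPORTED_MEANS, CUTOFF_EXPLORATORY)
    print("Structures below 3.5:", sorted({structure for structure, _, _ in flagged}))
```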
Overall, only 3.2% of all Likert ratings (228 scores) fell within the 0–2 range, indicating performance below full diagnostic capability. The distribution of these ratings is summarised in Table 5. The remaining 6972 ratings (96.8% of all assessments) were scored at a diagnostically acceptable level (Likert 3 or 4). This high proportion of acceptable scores was consistent across both HD and LD imaging modes and across interfaces: in CI, 99.7% of CBCT views and 94.0% of 3D-SEG ratings were acceptable, and in VR, the corresponding proportions were 99.6% and 94.0%, respectively.
No scores of 0 (“not usable for diagnostics”) were recorded. Scores of 1 (“partly usable for diagnostics”) were absent in CBCT views assessments but occurred in 3D-SEG images across both interfaces. These consisted of twenty ratings in total and were distributed across the five experts as follows: 5, 5, 4, 5, and 1. Most suboptimal ratings were 2 (“minor issues, almost fully usable for diagnostics”). A total of 208 scores fell into this category, occurring predominantly in 3D-SEG conditions, namely CI (HD: 50, LD: 48) and VR (HD: 56, LD: 42), with fewer occurrences in CBCT views: CI (HD: 3, LD: 2) and VR (HD: 4, LD: 3). The distribution of score-2 evaluations among the experts was 33, 103, 60, 4, and 8, respectively.
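The tallies in this paragraph can be cross-checked by summing the per-expert and per-condition breakdowns. The sketch below does this with the counts quoted in the text; the assumption that each interface × viewing-condition cell holds one quarter of the 7200 ratings follows from the balanced design in Section 2.2, and the variable names are hypothetical.

```python
TOTAL_RATINGS = 7200
PER_CELL = TOTAL_RATINGS // 4  # each interface x viewing-condition cell: 1800 ratings

score1_by_expert = [5, 5, 4, 5, 1]
score2_by_expert = [33, 103, 60, 4, 8]
score2_by_condition = {
    ("CI", "3D-SEG"): 50 + 48, ("VR", "3D-SEG"): 56 + 42,
    ("CI", "CBCT views"): 3 + 2, ("VR", "CBCT views"): 4 + 3,
}

n_score1 = sum(score1_by_expert)  # 20
n_score2 = sum(score2_by_expert)  # 208
# The per-expert and per-condition breakdowns of score 2 should agree
assert n_score2 == sum(score2_by_condition.values())
n_suboptimal = n_score1 + n_score2  # 228
acceptable_share = 1 - n_suboptimal / TOTAL_RATINGS

# CBCT views contained no score-1 ratings, so their acceptable share follows directly
cbct_share = {
    iface: 1 - score2_by_condition[(iface, "CBCT views")] / PER_CELL
    for iface in ("CI", "VR")
}

print(n_score1, n_score2, n_suboptimal, f"{acceptable_share:.1%}")
print({k: f"{v:.1%}" for k, v in cbct_share.items()})
```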
These findings highlight the influence of anatomical characteristics, interface type, and viewing condition on interpretability, as well as notable interobserver variation. The implications of these patterns for diagnostic confidence and workflow integration are addressed in the following section.
3.2. Intra- and Interobserver Agreement
The summarised intra- and interobserver agreements for all radiographic assessments ranged from fair to excellent. Intraobserver agreement values varied between 0.76 and 0.97 across the five evaluators (E1–E5), corresponding to good to excellent agreement, while interobserver agreement values ranged from 0.73 to 0.91 (fair to excellent). These results are presented in Table 6.
The full interobserver agreement matrix for all evaluators (E1–E5) across CI, VR, CBCT views and 3D-SEG assessments is presented in Table 7.
Differences in interobserver agreement were observed between the CI and VR interfaces. In the CI, agreement ranged from 0.91 to 0.99 for CBCT views and from 0.60 to 0.78 for 3D-SEG. In the VR interface, agreement values ranged from 0.78 to 0.97 for CBCT views and from 0.60 to 0.86 for 3D-SEG. The detailed interobserver agreement matrices for CI 3D-SEG, CI CBCT views, VR 3D-SEG and VR CBCT views are provided in the Supplementary Material (Tables S1–S4).
Four key observations highlighted the variability in interobserver agreement across different anatomical areas, which was notably influenced by specific radiologists. For the RMC, interobserver agreement values ranged from 0.42 to 0.70, with notable differences across modalities: CI 3D-SEG showed values between 0.17 and 0.58, VR CBCT views from 0.33 to 0.96, and VR 3D-SEG between 0.17 and 0.50. Pairwise agreement values involving Expert E3 were particularly low (0.21, 0.46, 0.46 and 0.17).
For the WLMS, interobserver agreement ranged from 0.53 to 0.80, with CI 3D-SEG values ranging from 0.12 to 0.54. Agreement values involving Expert E2 were the lowest in this analysis (0.38, 0.25, 0.29 and 0.12). In VR 3D-SEG, agreement values ranged from 0.08 to 0.88, with those involving E2 being 0.21, 0.08, 0.08 and 0.08.
The LTMJ exhibited interobserver agreement values from 0.25 to 0.58 within CI 3D-SEG, reflecting variability in the assessment of its anatomical position and relationships. Lastly, for the pogonion, interobserver agreement in CI 3D-SEG varied widely, from 0.04 to 1.00, with agreement values involving Expert E2 consistently low (0.08, 0.04, 0.04 and 0.46).
3.3. NASA Task Load Index
The ranges of NASA-TLX values for the five experts and the median scores for the first and fourth sessions are presented in Table 8. Mental demand decreased slightly from the 1st to the 4th session, reflected in a lower median (11 → 10) and a narrower range (4–18 → 2–15). Temporal demand increased marginally, with the median rising from 8 to 10, while its range decreased (4–18 → 3–15). Performance ratings decreased between sessions, as shown by a reduced median (18 → 14) and a contraction in range (15–19 → 11–17). Effort also decreased, with the median dropping from 12 to 8 and the range narrowing from 4–20 to 3–11. Frustration levels declined, with both the median (9 → 6) and range (4–19 → 3–14) showing reductions at the 4th session.
The mental demand (Figure 6a) and temporal demand (Figure 6b) scores varied between experts. Mental demand decreased across all experts except Expert 3 (E3). Temporal demand also generally decreased from the first to the fourth session, with the exception of Experts 3 (E3) and 4 (E4). Performance scores (Figure 6c) showed a slight overall decline, except for Expert 1 (E1), who maintained stable performance across sessions. Both effort (Figure 6d) and frustration (Figure 6e) levels decreased for all experts except E3.
3.4. Follow-Up Interviews
Follow-up interviews provided additional insight into the radiologists’ experiences with the VR interface. All radiologists found VR 3D-SEG more intuitive than CI 3D-SEG and highlighted its potential to improve visualisation in complex cases. Responses were consistent across all interview rounds, with both strengths and weaknesses of the VR interface identified during the image-quality evaluations.
In the later sessions (Sessions 3–4), new challenges emerged relating to navigation and interaction within the VR CBCT views environment. These challenges were not associated with image quality but instead with the logical functionality and operational stability of the VR interface when compared with the familiar CI CBCT views. Radiologists emphasised the need for clearer user guidance and improved navigation functions in VR CBCT views to support usability and to facilitate potential integration into clinical radiological workflows in future versions.
4. Discussion
Understanding the interplay between radiation dose, diagnostic image quality and usability is essential for optimising patient safety and enhancing outcomes in CMF radiology and CASS. This study investigated how two CBCT imaging modes, HD and LD, influence both subjective image quality and usability, including the interpretation of anatomical structures and landmarks, in CMF BIMAX CASS across CI and VR. The quality of CBCT data, encompassing both CBCT views and 3D-SEG, is crucial for accurate CASS [24], highlighting the importance of imaging protocols tailored to specific CBCT systems [31]. Furthermore, the overall usability of VR was assessed within this context, aiming to optimise the balance between dose reduction and the maintenance of high diagnostic reliability. Previous research has suggested that LD CBCT can be advantageous in clinical settings by enhancing patient safety. However, to the best of our knowledge, the effects of radiation dosage in VR-based radiological assessments have not been previously studied. Moreover, concerns remain regarding the diagnostic quality of images viewed through commercial VR headsets, which currently lack DICOM calibration.
In dental applications, virtual reality has been explored in implant planning, demonstrating potential to enhance spatial understanding and manipulation during preoperative procedures [34].
Studies from other radiological fields have explored VR’s potential to enhance non-invasive imaging and support complex diagnostics and surgical planning. VR has shown promise in improving the understanding of intricate anatomical and pathological structures [35], including applications involving blood vessels [36], brain tissue and tumour models [37], fracture analysis and custom bone implant planning [38].
It should be noted that the low-dose (LD) protocol differed from the high-dose (HD) protocol not only in tube current–time product but also in tube voltage, which was reduced from 110 kVp to 90 kVp. Changes in kVp influence X-ray beam penetrability and contrast characteristics in addition to image noise. The enhanced subject contrast at lower kVp may partially compensate for increased noise, particularly in osseous structures, and provides a plausible explanation for why certain anatomical features received slightly higher scores under LD conditions. Consequently, the observed differences should not be attributed solely to dose reduction but rather to the combined effects of exposure parameter adjustments. These findings underscore the importance of interpreting LD versus HD comparisons in CBCT as protocol-based differences rather than as isolated dose effects.
The mean Likert scores for subjective diagnostic image quality in CBCT views across both CI and VR interfaces showed no major differences between HD and LD modes, although there was a slight tendency favoring LD. In contrast to CBCT views, the mean Likert scores for 3D-SEG demonstrated greater variability across both CI and VR interfaces, although all scores remained within diagnostically acceptable limits. Minor difficulties were observed in four anatomical areas when using 3D-SEG: two were common to both interfaces, with one additional area noted in CI and another in VR.
While this study focused exclusively on image-quality assessment and comparison, it is important to note that in clinical CASS workflows, both CBCT views and 3D-SEG viewing modes are typically used in combination. This integrated approach not only facilitates cross-validation of anatomical structures but also ensures diagnostic robustness, particularly in cases where 3D-SEG visualisation may be suboptimal and CBCT views can serve as a reliable alternative.
The RMC was initially challenging to interpret in both CI and VR 3D-SEG. The complexity of the mandibular canal (MC) poses both anatomical and radiological difficulties. Previous studies have shown significant variability in MC identification, with some canals being difficult or impossible to visualise. This is particularly evident in the molar region, where visibility can be affected by factors such as reduced bone density and sparse trabecular structure. When cortical borders are absent, canal localisation may depend on contrast differences or trabecular patterns [39,40], further complicating 3D-SEG assessments. Precise localisation of the MC is essential in BIMAX, as the mandibular split along the sagittal plane carries a risk of nerve injury and bleeding [41]. Despite these challenges, the RMC achieved satisfactory diagnostic scores, with mean Likert values exceeding 3 (“almost perfect, fully usable”) in both HD and LD CI 3D-SEG and in LD VR 3D-SEG. Interestingly, HD VR 3D-SEG was the only mode to fall below 3 (mean 2.80; “minor issues, almost fully usable”), suggesting improved segmentation in LD mode.
The second common challenge across interfaces was the segmentation of the WLMS. Accurate visualisation of the maxillary sinus walls is vital in CMF CASS, especially in 3D-SEG, as BIMAX involves a horizontal osteotomy passing through these regions. Evaluation is complicated by the lateral thinness of the sinus walls and occasional anatomical extensions into the dental area, which may include tooth roots [42]. Notably, this was the only region in which HD outperformed LD, aligning with prior findings on orbital-floor segmentation, where thin bony structures were compromised by smaller voxel sizes and low contrast-to-noise ratios in LD modes [29]. In follow-up interviews, radiologists noted that the WLMS was more easily visualised in VR, attributed to the favourable visualisation characteristics of VR 3D-SEG.
Assessing the LTMJ was more challenging in CI than in VR. Before BIMAX, it is essential to evaluate facial disproportion and the condition of the LTMJ to identify any discrepancies that may indicate potential postoperative complications [
43]. Radiologists noted that in VR 3D-SEG, the complex LTMJ area, which is often obscured by overlapping structures, was more clearly visualised than in CI 3D-SEG, echoing their observations for the WLMS. Additionally, as for the RMC, LD 3D-SEG outperformed HD 3D-SEG, here in both CI and VR.
In VR, challenges arose in evaluating the RCLUFM. Although root canals are not typically segmented in 3D models, they were included here to assess how well 3D-SEG depicts fine detail. Radiologists noted that anatomical curvature, small dimensions and occasional canal obliteration limited visibility in this region.
Intraobserver agreement was consistently good for all radiologists, reflecting strong internal consistency. Interobserver agreement ranged from moderate to good, indicating overall consensus with some variability. Closer examination of mean Likert scores in CI and VR 3D-SEG showed that two radiologists were more critical, one particularly in VR and the other in both CI and VR 3D-SEG, whereas the remaining experts rated the evaluated regions as diagnostically sufficient.
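Agreement on ordinal Likert ratings is commonly summarised with a weighted kappa or an intraclass correlation coefficient. This section does not state which statistic was used, so the following is only an illustrative sketch, assuming two ratings of the same items and a quadratically weighted Cohen's kappa. The function name `weighted_kappa` and the 1 to 4 example scale are hypothetical. Weighting penalises near-misses (for example 3 versus 4) less than distant disagreements, which suits ordinal scores.

```python
from collections import Counter
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   categories: Sequence[int], scheme: str = "quadratic") -> float:
    """Cohen's weighted kappa for two ratings of the same items on an ordinal scale.

    Hypothetical helper for illustration; `scheme` is "quadratic" or "linear".
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of items")
    if len(categories) < 2 or scheme not in ("quadratic", "linear"):
        raise ValueError("need at least two categories and a known weighting scheme")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: row = rater A's category, column = rater B's.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions per rater; their product gives agreement expected by chance.
    count_a = Counter(index[a] for a in rater_a)
    count_b = Counter(index[b] for b in rater_b)
    p_a = [count_a[i] / n for i in range(k)]
    p_b = [count_b[i] / n for i in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for the two extreme categories.
        d = abs(i - j) / (k - 1)
        return d * d if scheme == "quadratic" else d

    observed_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    expected_dis = sum(weight(i, j) * p_a[i] * p_b[j] for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_dis / expected_dis


# Hypothetical scores from one radiologist in two reading sessions (intraobserver use).
session_1 = [4, 3, 3, 2, 4, 3]
session_2 = [4, 3, 2, 2, 4, 4]
print(round(weighted_kappa(session_1, session_2, categories=[1, 2, 3, 4]), 3))
```

For routine analyses, scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same quantity; the hand-written version simply makes the observed-versus-chance structure explicit.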
The NASA-TLX assessments provided further insights into VR usability and cognitive workload. Radiologists reported a wide range of scores, reflecting individual variability in adapting to the VR interface. Although all participants were experienced CMF radiologists routinely using CBCT on CI platforms, VR was a novel user interface for most, with only one out of five having prior hands-on VR experience. This novelty likely contributed to the variation observed across sessions.
Changes in NASA-TLX scores across sessions should be interpreted primarily as a learning-curve effect rather than as evidence of an inherent superiority of VR over CI. As VR represented a novel user interface for most participants, reductions in mental demand, effort and frustration over successive sessions likely reflect increasing familiarity with the interaction paradigm and workflow. Importantly, the study did not include a direct baseline comparison between CI and VR for CBCT view interpretation, and qualitative feedback indicated that CI was generally preferred for assessing CBCT views, whereas VR was considered more advantageous for interacting with 3D-segmented models.
Over the four sessions, mental demand, effort and frustration decreased notably, suggesting that users adapted to the interface and workflow and experienced less cognitive and emotional strain over time. Interestingly, performance scores declined slightly, which may reflect rising self-expectations or increasing task complexity. Temporal demand increased modestly in the final session, possibly reflecting mounting time pressure or a faster pace prompted by greater task familiarity. These trends were further elucidated in the follow-up interviews.
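The NASA-TLX summarises workload from six subscales rated from 0 to 100, either as an unweighted mean (raw TLX) or weighted by 15 pairwise comparisons between subscales. This section does not specify which variant was used, so the sketch below, with hypothetical function names and ratings, merely shows both scores and the per-subscale session-to-session change of the kind discussed above. On the original form the performance subscale is anchored from "perfect" to "failure", so the meaning of a change in performance ratings depends on how that subscale was presented and coded.

```python
from statistics import mean

# The six NASA-TLX subscales, each rated 0-100.
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings: dict[str, float]) -> float:
    """Raw (unweighted) TLX: the mean of the six subscale ratings."""
    return mean(ratings[s] for s in SUBSCALES)


def weighted_tlx(ratings: dict[str, float], tallies: dict[str, int]) -> float:
    """Weighted TLX: each rating weighted by how often its subscale was chosen
    in the 15 pairwise comparisons, with the weighted sum divided by 15."""
    if sum(tallies[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison tallies must sum to 15")
    return sum(ratings[s] * tallies[s] for s in SUBSCALES) / 15


def subscale_change(first: dict[str, float], last: dict[str, float]) -> dict[str, float]:
    """Per-subscale change from an earlier to a later session (negative = lower later)."""
    return {s: last[s] - first[s] for s in SUBSCALES}


# Hypothetical ratings from one participant in session 1 and session 4.
session_1 = {"mental": 70, "physical": 20, "temporal": 40,
             "performance": 30, "effort": 65, "frustration": 55}
session_4 = {"mental": 50, "physical": 20, "temporal": 50,
             "performance": 35, "effort": 45, "frustration": 30}
print(raw_tlx(session_1), raw_tlx(session_4))
print(subscale_change(session_1, session_4))
```

Reporting per-subscale changes alongside the overall score keeps opposing trends visible, such as falling frustration alongside rising temporal demand, which a single summary value would average out.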
Radiologists’ adaptation to the VR interface varied. Three participants rapidly mastered efficient controller movements and task execution. Nevertheless, all users experienced some difficulty with the CBCT views mode in VR, unlike the 3D-SEG viewing mode, which was generally perceived as intuitive. Two participants reported considerable frustration in certain sessions, and one noted a marked increase in temporal demand by session three. Additionally, two radiologists experienced physical symptoms such as nausea and vertigo, particularly during longer VR exposures; such symptoms are commonly reported in immersive environments.
When asked to compare CI and VR usability and suggest improvements, radiologists generally favoured VR 3D-SEG for its greater intuitiveness in exploring anatomical structures and their spatial relationships. Compared with CI 3D-SEG, overlapping anatomical structures in VR were perceived as less distracting, facilitating spatial perception and overall ease of interpretation. Conversely, for CBCT views, CI was preferred for its standardised layout, consistent usability and alignment with established clinical routines. The absence of familiar tools and workflows in VR CBCT views was a notable drawback, leading most experts to prefer traditional slice-based navigation over the holographic representation of CBCT views.
This gap in user experience points to the need for a more streamlined and standardised VR workflow. Suggested improvements included an intuitive user interface with customisable shortcuts to reduce cognitive load and improve navigation efficiency. Enhanced controller functionality, ergonomic headset design and built-in vision correction were also recommended to improve comfort and precision. Voice-command integration was proposed to enable hands-free task execution, particularly in time-sensitive or high-volume clinical environments.
Looking ahead, radiologists envisioned VR as a fully integrated alternative within standard radiology workstations and hospital PACS. Seamless PACS integration would ensure real-time access to imaging studies while maintaining compliance with data-security protocols. Furthermore, VR platforms should support integrated speech recognition for immediate reporting, mirroring existing CI systems. This would eliminate the need for post-examination dictation, ensure workflow continuity, reduce reporting turnaround times and minimise errors associated with context switching.
In addition to these workflow-related preferences, radiologists emphasised that the clinical value of VR ultimately depends on the diagnostic reliability of the visual information it provides. Their feedback therefore highlights not only usability requirements but also the need to determine whether VR can deliver image quality that is truly comparable with established clinical standards.
Taken together, the results also suggest that the diagnostic performance of the VR software and headset may approach that of the gold-standard CI setup with DICOM-calibrated monitors, indicating promising clinical potential for future VR-based radiological applications. However, further research involving larger and more diverse datasets is needed to confirm these findings across broader clinical scenarios. In addition, future VR systems should ideally support calibration to the DICOM Grayscale Standard Display Function (GSDF; DICOM PS3.14) to ensure consistent diagnostic quality and enable safe, standardised integration into clinical radiology workflows.
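DICOM display calibration maps each driving level of a monitor onto the GSDF, which defines luminance as a function of a just-noticeable-difference (JND) index j from 1 to 1023 (about 0.05 to 4000 cd/m²), so that equal steps in driving level are approximately equally perceptible. As a minimal sketch of the target curve a VR display would have to reproduce, the code below evaluates the GSDF formula, inverts it numerically by bisection and spaces grey levels evenly in JND index between a display's minimum and maximum luminance. The constants are transcribed from PS3.14 and should be checked against the standard; the function names and the luminance range are hypothetical, and a real calibration would additionally require photometric measurement of the headset and a look-up table, which is not shown.

```python
import math

# GSDF constants as given in DICOM PS3.14 (luminance in cd/m^2 as a function of JND index j).
A, B, C, D, E = -1.3011877, -2.5840191e-2, 8.0242636e-2, -1.0320229e-1, 1.3646699e-1
F, G, H, K, M = 2.8745620e-2, -2.5468404e-2, -3.1978977e-3, 1.2992634e-4, 1.3635334e-3
J_MIN, J_MAX = 1.0, 1023.0


def gsdf_luminance(j: float) -> float:
    """Target luminance (cd/m^2) for JND index j, defined for 1 <= j <= 1023."""
    x = math.log(j)
    num = A + C * x + E * x**2 + G * x**3 + M * x**4
    den = 1 + B * x + D * x**2 + F * x**3 + H * x**4 + K * x**5
    return 10 ** (num / den)


def jnd_index(luminance: float) -> float:
    """Invert the GSDF by bisection; the curve increases monotonically over [1, 1023]."""
    lo, hi = J_MIN, J_MAX
    if not gsdf_luminance(lo) <= luminance <= gsdf_luminance(hi):
        raise ValueError("luminance outside the GSDF range (about 0.05 to 4000 cd/m^2)")
    for _ in range(60):
        mid = (lo + hi) / 2
        if gsdf_luminance(mid) < luminance:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def target_luminances(l_min: float, l_max: float, levels: int = 256) -> list[float]:
    """Target luminance for each driving level, spaced evenly in JND index
    between the display's measured minimum and maximum luminance."""
    j_lo, j_hi = jnd_index(l_min), jnd_index(l_max)
    step = (j_hi - j_lo) / (levels - 1)
    return [gsdf_luminance(j_lo + i * step) for i in range(levels)]


# Hypothetical display range: 0.5 cd/m^2 (black) to 400 cd/m^2 (white), 8-bit input.
targets = target_luminances(0.5, 400.0)
print(f"{targets[0]:.3f} {targets[128]:.2f} {targets[-1]:.1f}")
```

Bisection is used here instead of the closed-form inverse polynomial from the standard to keep the sketch short and to rely on a single transcribed set of constants.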
The present study has several limitations that should be considered when interpreting the findings. First, the small sample size of six patients limits the generalisability of the results. The study was designed as a pilot and feasibility investigation to explore trends in image quality, usability and observer experience rather than to support population-level inference. While the limited number of cases restricts broad extrapolation, the multidimensional evaluation enabled detailed within-case comparisons across dose settings and visualisation interfaces.
Second, the study relied on a single CBCT scanner, one VR headset and one VR software platform, which may limit applicability to other technical configurations. In addition, image-quality assessment was based primarily on subjective Likert-scale evaluations without objective image-quality metrics. This approach was chosen to emphasise clinically relevant usability and interpretability in conventional and VR-based visualisation, but it limits the strength of quantitative performance comparisons. Future studies should therefore include larger and more diverse patient cohorts and multiple imaging and VR platforms, and should combine subjective expert assessments with objective, task-specific image-quality measures.
Third, not all types of image artifacts associated with low-dose CBCT imaging were represented in the present dataset. Similarly, the range of included clinical conditions does not cover all pathologies that may be relevant for future transfer to digitalised VR-based models. As a result, the findings may not fully capture artifact-dependent or pathology-specific challenges in low-dose imaging and VR visualisation. Future studies should include a broader spectrum of artifact types and clinical scenarios to further assess the robustness of the proposed approaches.
Finally, voxel size may influence the generation and visualisation of the three-dimensional models used in VR. Larger voxel sizes can reduce spatial resolution and surface detail, potentially affecting the visual fidelity and perceived accuracy of the rendered models, particularly for fine osseous structures. Although a consistent voxel size was used across all acquisitions in this study, the impact of voxel size on VR-based model quality was not systematically evaluated. Future work should investigate how different voxel sizes affect segmentation accuracy, surface representation and user perception in VR environments.
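The effect of voxel size on thin structures such as the lateral sinus walls can be illustrated with an idealised one-dimensional partial-volume model: each voxel takes the fraction of its width occupied by bone, and a global threshold then decides which voxels become part of the segmented surface. The sketch below is hypothetical and ignores noise, blur and the actual segmentation method used in the study; the wall dimensions, voxel sizes and 0.5 threshold are illustrative assumptions.

```python
def voxel_coverage(wall_start: float, wall_thickness: float,
                   voxel_size: float, n_voxels: int) -> list[float]:
    """Fraction of each voxel occupied by an idealised bony wall along one axis
    (box-filter partial-volume model; positions and sizes in mm)."""
    wall_end = wall_start + wall_thickness
    fractions = []
    for i in range(n_voxels):
        lo, hi = i * voxel_size, (i + 1) * voxel_size
        overlap = max(0.0, min(hi, wall_end) - max(lo, wall_start))
        fractions.append(overlap / voxel_size)
    return fractions


def segmented_thickness(wall_start: float, wall_thickness: float, voxel_size: float,
                        n_voxels: int, threshold: float = 0.5) -> float:
    """Apparent wall thickness (mm) after a global threshold on the partial-volume values."""
    coverage = voxel_coverage(wall_start, wall_thickness, voxel_size, n_voxels)
    return sum(voxel_size for c in coverage if c >= threshold)


# Hypothetical 0.5 mm wall at two positions, sampled over a 4 mm field.
for start in (0.25, 0.75):
    for size in (1.0, 0.25):
        n = int(round(4.0 / size))
        print(start, size, segmented_thickness(start, 0.5, size, n))
```

Under these assumptions, a 0.5 mm wall straddling two 1 mm voxels disappears entirely after thresholding, whereas the same wall lying within a single 1 mm voxel is segmented as 1 mm thick; with 0.25 mm voxels the wall keeps its true 0.5 mm thickness at both positions.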