1. Introduction
Periodontal bone loss is one of the most clinically relevant indicators for diagnosing, staging, and monitoring periodontitis, and radiographic assessment remains an indispensable component of periodontal evaluation. In recent years, cone-beam computed tomography (CBCT) has gained increasing importance due to its ability to provide three-dimensional visualization of periodontal structures, offering advantages over conventional two-dimensional imaging, such as improved detection of intrabony defects, dehiscences, and complex bone morphologies [1,2]. However, the diagnostic value of CBCT is closely dependent on image quality, including voxel size, noise levels, and reconstruction algorithms, which directly influence the visibility and measurability of fine periodontal structures [3,4,5].
Voxel size has been widely discussed as a key determinant of measurement precision in CBCT-based periodontal assessments. However, its diagnostic impact remains incompletely resolved. While smaller voxel sizes generally improve spatial resolution, they are associated with increased radiation exposure, whereas larger voxel sizes may reduce anatomical detail and increase the risk of measurement inaccuracy or diagnostic misclassification [6,7]. Several studies have reported that voxel enlargement compromises the detection of subtle cortical defects, external resorptions, and early-stage bone loss [8,9,10]. However, other investigations suggest that clinically acceptable measurement variability may still be achievable under certain low-resolution or low-dose conditions, depending on defect morphology and examiner experience [11]. Moreover, CBCT studies differ substantially in acquisition protocols, voxel ranges, reconstruction algorithms, and evaluation methods, limiting direct comparability across findings. Variations in exposure parameters, segmentation accuracy, and post-processing techniques further contribute to inconsistent conclusions regarding diagnostic fidelity [12,13,14]. Consequently, despite extensive research, the extent to which voxel size and image degradation systematically influence periodontal bone measurements, particularly in the context of standardized measurement workflows and emerging AI applications, remains insufficiently characterized [6].
Importantly, most previous investigations have evaluated voxel size or exposure parameters under device-specific acquisition settings rather than through systematic, experimentally controlled degradation of identical datasets. As a result, it remains difficult to isolate the independent contribution of spatial resolution reduction or image blurring to examiner-derived measurement variability. Controlled degradation modeling offers a methodological advantage by allowing the manipulation of defined image parameters while preserving all other anatomical and acquisition-related variables. Such an approach enables the assessment of relative measurement shifts under standardized conditions and provides a reproducible experimental framework for analyzing image-quality-dependent variability.
Beyond clinical diagnostics, interest in leveraging CBCT datasets for artificial intelligence (AI)-driven periodontal assessment has surged in recent years. Initial studies have reported promising results; for instance, deep learning models can automatically detect, classify, and quantify periodontal bone loss from both 2D radiographs and 3D CBCT scans. Some AI systems even approach expert-level performance in identifying bony defects and measuring bone levels under ideal conditions [9,15,16,17,18,19]. However, these advances come with notable limitations and open questions. Critically, most algorithms have been developed and tested on relatively high-quality, controlled imaging datasets, sometimes even excluding lower-quality scans, which inherently limits their generalizability [20]. In practice, CBCT image quality varies widely due to differences in voxel size, noise levels, reconstruction algorithms, and artifact burdens. While AI models have achieved promising performance on standardized datasets, their robustness under real-world variability remains unclear. Recent research demonstrates that AI-driven noise reduction and artifact suppression can significantly enhance subjective and objective image quality parameters, such as contrast-to-noise ratio and artifact index, suggesting the potential utility of AI in dental CBCT [21]. However, these improvements are often observed in controlled settings and do not necessarily translate across heterogeneous data sources. Systematic analyses of deep learning-based image enhancement methods indicate that, although low-dose CBCT images can be processed to improve diagnostic visibility, the intrinsic variability in CBCT image quality (including noise and artifact patterns) poses ongoing challenges that have not yet been fully resolved [22]. Moreover, deep learning techniques for artifact reduction show promise but are currently limited by the need for consistent, high-quality training data and may not generalize well to all clinical CBCT acquisitions [22]. Finally, broader reviews of AI applications in dental imaging highlight that models performing well on one dataset often lack external validation and may not maintain accuracy across scanners and imaging protocols, raising concerns for generalizability in real-world practice [23]. This inconsistency underscores the need to better understand how CBCT image degradation systematically affects both human and algorithmic interpretation, and thus motivates the current study, which investigates the impact of voxel enlargement and simulated blur on periodontal bone evaluation.
From a methodological perspective, variability in image quality may introduce structured label noise into AI training datasets. If examiner-derived measurements systematically shift under specific degradation conditions, such variability may propagate into ground-truth annotations, thereby affecting model calibration and generalizability. Consequently, understanding how controlled image degradation conditions influence human measurement behavior represents a critical prerequisite for establishing reliable reference baselines in AI-based periodontal diagnostics.
Image noise and blur further complicate periodontal assessment in CBCT and reflect inherent limitations of the modality. Unlike conventional medical CT, CBCT reconstruction relies on cone-beam geometry and filtered back-projection algorithms that are particularly susceptible to noise amplification and incomplete scatter correction. Consequently, reconstruction and smoothing algorithms often fail to preserve fine anatomical detail and instead introduce image blurring, an established limitation of CBCT. Although post-processing and noise-reduction techniques may improve subjective readability, several studies have shown that such approaches can distort edge definition and bias quantitative measurements, especially when assessing thin cortical bone or subtle periodontal defects [24,25,26]. Image noise is further exacerbated by artifacts caused by high-density materials. Metallic restorations, implants, and prosthetic components generate beam hardening, streak artifacts, and photon starvation, leading to localized contrast loss and artificial blurring. These effects are particularly pronounced in periodontal regions adjacent to metallic structures or complex anatomical boundaries, where partial volume effects further impair defect visualization [27]. While metal-artifact-reduction and advanced reconstruction techniques have been proposed, their effectiveness remains limited and highly dependent on scanner design and acquisition parameters. When such degradations occur, examiner-dependent variability increases substantially, as observers rely more on subjective interpretation than on clearly defined anatomical landmarks. This has been consistently reported in studies evaluating CBCT reproducibility under varying acquisition and reconstruction conditions [28,29,30].
Despite the growing body of literature addressing voxel size, artifact reduction, and AI-based periodontal diagnostics, a structured experimental framework systematically modeling defined image degradation conditions within a single standardized dataset remains largely absent. Specifically, there is a lack of proof-of-concept investigations quantifying how controlled voxel enlargement and simulated blurring influence examiner-derived periodontal bone measurements under identical anatomical conditions. Addressing this gap is essential to disentangle measurement variability caused by image degradation from variability arising from anatomical heterogeneity or acquisition differences [27,31,32].
Therefore, the aim of the present study was to assess how systematic degradation of CBCT datasets via voxel enlargement and simulated blur affects periodontal bone measurements performed by clinicians. By establishing a controlled in vitro reference standard and quantifying examiner variability across multiple image degradation conditions, this study seeks to provide a reproducible methodological framework for analyzing image-quality-dependent measurement shifts. Rather than evaluating absolute diagnostic validity, the present investigation focuses on relative measurement behavior and inter-examiner reproducibility under standardized degradation scenarios.
Moreover, given the rapid proliferation of AI-based periodontal imaging tools, understanding how image quality affects human-derived measurement baselines is a critical step toward designing robust AI training pipelines capable of accommodating real-world variability. The findings are intended to serve as hypothesis-generating evidence and as a methodological foundation for future multicenter validation studies integrating human and algorithmic performance assessments under controlled image degradation conditions.
In the present study, the term “image degradation conditions” refers to systematically modified CBCT datasets generated through voxel enlargement and/or simulated blurring.
2. Materials and Methods
2.1. Study Design and Conceptual Framework
This study was designed as an exploratory in vitro proof-of-concept investigation to evaluate the influence of controlled image degradation conditions on periodontal bone-level measurements. A single CBCT dataset was systematically modified to generate predefined image degradation conditions. The primary outcome was examiner-derived linear bone-level measurement (mm). The study aimed to quantify relative measurement shifts and inter-examiner reproducibility across degradation conditions rather than to assess diagnostic accuracy against a gold standard. Given the exploratory proof-of-concept design and the use of a single CBCT dataset under controlled experimental manipulation, the study was not intended to establish absolute diagnostic thresholds but to isolate the relative influence of defined image-quality parameters on examiner-derived measurements under standardized conditions.
2.2. CBCT Dataset and Acquisition Parameters
For systematic investigations, a CBCT dataset was acquired from an anonymized participant who exhibited one periodontally healthy tooth (31) and one tooth with clearly detectable periodontal bone loss (41). Imaging was performed using a Planmeca ProMax 3D Mid unit (Planmeca, Helsinki, Finland). The original isotropic voxel size of the acquired dataset was 0.2 mm × 0.2 mm × 0.2 mm, which served as the baseline spatial resolution for all subsequent degradation procedures. The complete dataset was anonymized in Romexis 7 (Planmeca), and the patient orientation was randomized prior to export. Randomization of orientation was performed to minimize potential observer bias related to anatomical alignment. The dataset was then saved as a single-shot DICOM series, which served as the reference for all subsequent processing steps and for generating the artificially degraded comparison datasets. The CBCT acquisition was performed with a cylindrical field of view of 200 mm in diameter and 100 mm in height, using a tube voltage of 90 kV, a tube current of 8 mA, and an exposure time of 13.5 s. The recorded dose–area product (DAP) was 1745 mGy·cm², and the computed tomography dose index (CTDI) was 8.1 mGy. No additional reconstruction filters beyond the manufacturer’s standard reconstruction protocol were applied during initial image acquisition. All DICOM data were exported without compression, and grayscale intensity values were preserved in their native format without rescaling or histogram normalization prior to degradation processing.
2.3. Image Degradation Pipeline
The analysis was based on the exported clinical DICOM data, from which a single axial series was selected to function as the reference volume. The selected axial series contained the complete mandibular anterior region and was verified to include consistent slice spacing and intact DICOM metadata prior to further processing. Artificial image degradation was applied to this dataset using two methodological approaches: a controlled reduction in spatial resolution through voxel enlargement and a simulation of image blur using filter-based noise modification. All degradation procedures were performed on the identical reference dataset in order to isolate the independent effect of each image degradation condition while preserving anatomical constancy. No intensity normalization, contrast enhancement, histogram equalization, or additional preprocessing steps were applied before or after degradation procedures. The degradation workflow was implemented using Python-based processing (SimpleITK v2.3.1, NumPy v1.26.4, SciPy v1.11.4), and all transformation parameters were applied uniformly across datasets to ensure methodological consistency and reproducibility.
2.3.1. Voxel Enlargement
To simulate reduced spatial resolution, the original DICOM reference volume was resampled using a custom Python script built on the SimpleITK library (v2.3.1; Insight Software Consortium, open-source software, USA). The original isotropic voxel size of 0.2 mm was increased to 0.4 mm (Group 2) and 0.6 mm (Group 3) along all three spatial axes (x, y, z), resulting in isotropic resampling. All valid slices of the input dataset were merged into a three-dimensional volume, and the image spacing was increased by a factor of two or three, respectively, in all spatial axes. Resampling was performed using trilinear interpolation (the SimpleITK default linear interpolator) to approximate clinically realistic reconstruction behavior while avoiding the edge exaggeration associated with higher-order spline interpolation. The resampled volume was then re-sliced into individual layers. Nearest-neighbor interpolation was not used, in order to prevent voxel block artifacts. The resulting DICOM series contained regenerated UID structures to ensure DICOM conformity, while essential relative metadata such as instance number and slice location were preserved. Grayscale intensity values were preserved during resampling without histogram normalization, contrast adjustment, or intensity rescaling.
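The resampling step can be illustrated with a minimal sketch. It is not the study's script: `scipy.ndimage.zoom` with linear interpolation is used here as a stand-in for the SimpleITK resampler, the volume is synthetic, and the function name `enlarge_voxels` is hypothetical.

```python
import numpy as np
from scipy import ndimage


def enlarge_voxels(volume, spacing_mm, factor):
    """Resample `volume` onto a grid whose spacing is `factor` times coarser.

    order=1 selects linear interpolation, mirroring the trilinear choice
    described in the text. scipy.ndimage.zoom is a stand-in here, not the
    SimpleITK resampler used in the study.
    """
    coarse = ndimage.zoom(volume.astype(np.float32), zoom=1.0 / factor, order=1)
    # zoom() rounds the output size to whole voxels, so the effective
    # spacing is recomputed from the actual input and output shapes.
    new_spacing = tuple(
        s * n_in / n_out
        for s, n_in, n_out in zip(spacing_mm, volume.shape, coarse.shape)
    )
    return coarse, new_spacing


# Synthetic stand-in volume (z, y, x) with 0.2 mm isotropic voxels.
rng = np.random.default_rng(0)
reference = rng.normal(size=(60, 120, 120))
spacing = (0.2, 0.2, 0.2)

double_voxel, double_spacing = enlarge_voxels(reference, spacing, factor=2)
triple_voxel, triple_spacing = enlarge_voxels(reference, spacing, factor=3)
```

Doubling the spacing reduces the voxel count by roughly a factor of eight and tripling it by roughly a factor of 27, which gives a sense of how much anatomical sampling is lost in Groups 2 and 3.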
2.3.2. Simulated Blurring
The second form of image degradation involved a simulated blurring process designed to reduce image sharpness and mimic noise-related quality loss. This was carried out using Jupyter Notebook (v7.0) with the pydicom, numpy, and scipy.ndimage libraries. Each slice of the dataset was modified using either a Gaussian blur filter (σ = 4.5) or a median filter (kernel size = 2.0). For Gaussian filtering, the standard deviation σ = 4.5 was applied isotropically across the pixel matrix. This parameter was selected based on pilot simulations to induce measurable attenuation of high-frequency edge information while preserving overall anatomical morphology. The median filter was chosen to simulate clinically realistic noise-reduction post-processing while avoiding excessive morphological distortion. Pixel matrices were modified directly, and all resulting series were saved with updated DICOM metadata under standard DICOM encoding. No additional preprocessing, contrast enhancement, thresholding, intensity normalization, or grayscale compression was applied before or after blurring procedures.
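A slice-wise blurring step of this kind might look like the following sketch. It assumes that σ and the median kernel size are expressed in pixels and that filtering is done in 2D per axial slice; the function name `blur_slices` and the synthetic edge volume are illustrative only.

```python
import numpy as np
from scipy import ndimage


def blur_slices(volume, mode="gaussian", sigma=4.5, median_size=2):
    """Filter every axial slice (axis 0) independently in 2D."""
    out = np.empty_like(volume, dtype=np.float32)
    for i, slice_2d in enumerate(volume):
        pixels = slice_2d.astype(np.float32)
        if mode == "gaussian":
            out[i] = ndimage.gaussian_filter(pixels, sigma=sigma)
        elif mode == "median":
            out[i] = ndimage.median_filter(pixels, size=median_size)
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


# Synthetic volume: a sharp vertical intensity edge in every slice.
volume = np.zeros((4, 64, 64), dtype=np.float32)
volume[:, :, 32:] = 1000.0

gaussian = blur_slices(volume, mode="gaussian", sigma=4.5)
median = blur_slices(volume, mode="median", median_size=2)

# Steepest intensity step across the edge before and after Gaussian blurring.
max_step_before = np.abs(np.diff(volume[0, 32])).max()
max_step_after = np.abs(np.diff(gaussian[0, 32])).max()
```

Comparing `max_step_before` with `max_step_after` shows how strongly a Gaussian kernel of this width flattens an ideal edge, which is the attenuation of high-frequency edge information the text refers to.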
2.4. Experimental Groups
The degradation workflow was applied independently to the identical reference dataset to ensure anatomical constancy across all image degradation conditions.
The experimental conditions were defined as follows:
Group 1 (Original DICOM, Figure 1): original voxel size (0.2 mm) without modification;
Group 2 (Double Voxel, Figure 2): isotropic resampling to 0.4 mm;
Group 3 (Triple Voxel, Figure 3): isotropic resampling to 0.6 mm;
Group 4 (Gaussian Blur, Figure 4): σ = 4.5 applied to original voxel size;
Group 5 (Gaussian Blur + Triple Voxel, Figure 5): σ = 4.5 combined with 0.6 mm isotropic voxel resampling.
These group labels were used consistently throughout the manuscript. All resulting datasets were converted into STL files and exported for subsequent geometric analysis.
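For scripted processing, the five conditions could be captured in a small lookup table. The sketch below is one possible encoding; the class and field names are hypothetical, and only the voxel sizes, σ value, and labels come from the group definitions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DegradationCondition:
    label: str
    voxel_mm: float                  # isotropic voxel edge length
    gaussian_sigma: Optional[float]  # None means no blur was applied


CONDITIONS = {
    1: DegradationCondition("Original DICOM", 0.2, None),
    2: DegradationCondition("Double Voxel", 0.4, None),
    3: DegradationCondition("Triple Voxel", 0.6, None),
    4: DegradationCondition("Gaussian Blur", 0.2, 4.5),
    5: DegradationCondition("Gaussian Blur + Triple Voxel", 0.6, 4.5),
}

# Groups that involve blurring, selected from the table.
blurred_groups = [g for g, c in CONDITIONS.items() if c.gaussian_sigma is not None]
```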
2.5. STL Conversion and Geometric Processing
The STL models were imported into Autodesk Netfabb (Version 2021; Autodesk Inc., San Rafael, CA, USA). STL generation inherently involves segmentation thresholds, triangulation, and mesh reconstruction. Segmentation was performed using a fixed global threshold value applied uniformly across all groups to avoid condition-specific segmentation bias. Mesh triangulation density was kept constant across all groups, and no mesh decimation or polygon reduction was performed following STL export. Automatic mesh-smoothing functions were deactivated to prevent additional geometric alteration beyond the predefined degradation procedures. These processing steps may introduce geometric approximation effects that are independent of the original DICOM voxel resolution. While identical export parameters were applied across all image degradation conditions to ensure methodological consistency, potential interactions between voxel enlargement, smoothing procedures, and mesh geometry cannot be fully excluded and should be considered when interpreting measurement variability. To ensure standardized comparability across all groups, each dataset was aligned in a unified vestibular orientation focused on the mandibular anterior region. Alignment was performed manually using anatomical landmarks to ensure consistent frontal orientation across datasets.
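The possible interaction between smoothing and a fixed global threshold can be illustrated in one dimension. The sketch below is a simplified assumption-laden example, not the study's segmentation: the threshold of 500, the bone intensity of 1000, and the plate widths are hypothetical, and σ is assumed to be in pixels.

```python
import numpy as np
from scipy import ndimage


def plate_profile(width_px, length=200, bone=1000.0):
    """1D intensity profile crossing a bone plate of the given width."""
    profile = np.zeros(length, dtype=np.float32)
    start = (length - width_px) // 2
    profile[start:start + width_px] = bone
    return profile


def apparent_width(profile, threshold):
    """Number of samples classified as bone by a fixed global threshold."""
    return int(np.count_nonzero(profile >= threshold))


THRESHOLD = 500.0  # hypothetical fixed threshold, identical for all conditions
SIGMA_PX = 4.5     # assumed to be expressed in pixels

# Map plate width -> (width in sharp profile, width after Gaussian blur).
results = {}
for width in (10, 4):
    sharp = plate_profile(width)
    blurred = ndimage.gaussian_filter1d(sharp, sigma=SIGMA_PX)
    results[width] = (
        apparent_width(sharp, THRESHOLD),
        apparent_width(blurred, THRESHOLD),
    )
```

In this toy setting the thicker plate keeps roughly its width after blurring, whereas the thin plate falls below the threshold entirely. This is one mechanism by which identical export parameters can still yield condition-dependent mesh geometry.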
2.6. Measurement Protocol
Measurements were performed using the integrated digital measurement tool within the software. First, in the original reference dataset (Group 1), three consecutive measurements were taken on tooth 31, which exhibited no periodontal bone loss, followed by three measurements on tooth 41, which showed marked bone resorption. This procedure was repeated identically for all experimental groups.
Dentists (n = 6), each with a minimum of two years of clinical experience, served as examiners. Each examiner conducted all measurements independently following the same standardized protocol to ensure repeatability and inter-examiner comparability. The cemento-enamel junction (CEJ) and the most coronal alveolar bone crest were manually identified by each examiner. No automated landmark detection was used. No permanent landmark markers were stored between measurement sessions to minimize recall bias.
The primary outcome parameter was the linear distance between the cemento-enamel junction and the most coronal level of the alveolar bone, measured in millimeters (mm). All measurements were performed using the digital measurement tool (Netfabb Version 2021; Autodesk Inc., San Rafael, CA, USA), which provides sub-millimeter resolution. Prior to measurement, each STL model was spatially aligned and leveled to ensure a consistent vestibular viewing orientation across all datasets. Measurements were conducted in a standardized frontal view of the mandibular anterior region, and no alternative viewing angles were permitted during data acquisition to minimize variability related to observer-dependent repositioning.
All measurements were performed at a fixed magnification level, which could be adjusted for visualization purposes but remained constant throughout each measurement session. No additional image enhancement, smoothing, or contrast manipulation was allowed during the measurement process. Each examiner performed three consecutive measurements per tooth and image-quality condition, with a minimum temporal separation of several minutes between repeated measurements to reduce recall bias. The arithmetic mean of the three repeated measurements was used for subsequent statistical analysis. Intra-examiner reliability was not calculated separately, as repeated measurements were averaged to reduce random intra-observer variability.
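The averaging of repeated measurements can be sketched as an array reduction. The values below are random placeholders rather than study data, and the axis layout (examiner, group, tooth, repetition) is an assumption made for illustration.

```python
import numpy as np

N_EXAMINERS, N_GROUPS, N_TEETH, N_REPEATS = 6, 5, 2, 3

# Placeholder values only; real entries would be the recorded CEJ-to-crest
# distances in mm.
rng = np.random.default_rng(1)
raw = rng.normal(
    loc=6.0, scale=1.0, size=(N_EXAMINERS, N_GROUPS, N_TEETH, N_REPEATS)
)

# Arithmetic mean over the three repetitions (last axis).
per_condition_mean = raw.mean(axis=-1)
```

With this layout, `raw` holds one value per individual measurement and `per_condition_mean` holds one value per examiner, group, and tooth.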
2.7. Statistical Analysis
Descriptive statistics were calculated for all measurement values. Normality of the data distribution was assessed using the Shapiro–Wilk test. As the data showed significant deviations from normal distribution, non-parametric statistical methods were applied. Differences between image-quality groups were evaluated using the Kruskal–Wallis H test. In cases of significant overall group effects, pairwise comparisons were performed using the Mann–Whitney U test with Bonferroni correction to adjust for multiple testing. Inter-examiner reliability was assessed using the intraclass correlation coefficient (ICC), calculated as a two-way random-effects model with absolute agreement (ICC(2,1)). ICC values were interpreted according to established guidelines. All statistical tests were conducted at a significance level of α = 0.05. For significant pairwise comparisons, effect sizes were calculated using r = Z/√N to quantify the magnitude of measurement differences. Effect sizes were interpreted according to conventional thresholds proposed for non-parametric effect size r (small ≈ 0.1, moderate ≈ 0.3, large ≥ 0.5), analogous to Cohen’s criteria for correlation coefficients [33]. For ICC estimates, 95% confidence intervals were calculated to provide precision estimates of inter-examiner reliability. Given the exploratory proof-of-concept design and the use of a single CBCT dataset, no a priori power calculation was performed. The analysis is therefore intended to identify relative measurement trends and reproducibility patterns rather than to establish definitive clinical thresholds.
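The non-parametric workflow described above can be sketched with SciPy. The data are synthetic placeholders, the helper name is hypothetical, and Z is derived from U using the normal approximation without tie correction, which is an assumption; the absolute value of Z is used so that r is non-negative.

```python
from itertools import combinations
import math

import numpy as np
from scipy import stats


def mann_whitney_effect_size(x, y):
    """Return (p_value, r) with r = |Z| / sqrt(N).

    Z comes from the normal approximation of U without tie correction
    (an assumption of this sketch); N is the combined sample size.
    """
    n1, n2 = len(x), len(y)
    u, p = stats.mannwhitneyu(x, y, alternative="two-sided")
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return p, abs(z) / math.sqrt(n1 + n2)


# Synthetic placeholder measurements (mm) for five image-quality groups.
rng = np.random.default_rng(42)
groups = {g: rng.normal(loc=6.0 - 0.4 * g, scale=0.8, size=18) for g in range(1, 6)}

# Omnibus test across all groups.
h_stat, p_overall = stats.kruskal(*groups.values())

# Pairwise post hoc tests with Bonferroni adjustment.
pairs = list(combinations(groups, 2))
posthoc = {}
for a, b in pairs:
    p, r = mann_whitney_effect_size(groups[a], groups[b])
    p_adj = min(1.0, p * len(pairs))  # multiply by number of comparisons
    posthoc[(a, b)] = (p_adj, r)
```

With five groups there are ten pairwise comparisons, so each raw p value is multiplied by ten and capped at 1 before being compared with α = 0.05.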
The limited sample structure (six examiners and two teeth derived from a single dataset) restricts statistical power; therefore, findings should be interpreted as indicative of relative measurement trends rather than definitive clinical effect magnitudes.
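For reference, ICC(2,1) can be computed from two-way ANOVA mean squares. The sketch below uses the standard formula for a two-way random-effects, absolute-agreement, single-rater ICC; the ratings matrix is illustrative and is not study data, and how targets and raters map onto the study design is left open here.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has shape (n_targets, k_raters).
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * np.sum((ratings.mean(axis=1) - grand) ** 2)  # between targets
    ss_cols = n * np.sum((ratings.mean(axis=0) - grand) ** 2)  # between raters
    ss_total = np.sum((ratings - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative matrix: 6 targets (rows) rated by 4 raters (columns).
example = np.array([
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
])
icc_example = icc_2_1(example)

# Identical ratings from every rater give perfect agreement.
icc_perfect = icc_2_1(np.tile(np.arange(1.0, 7.0)[:, None], (1, 4)))
```

Because the absolute-agreement form penalizes systematic offsets between raters through the rater mean square, it can be low even when raters rank the targets similarly.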
3. Results
A total of 180 individual measurements were analyzed (6 examiners × 5 image-quality groups × 2 teeth × 3 repetitions). All measurement data showed significant deviations from normality in the Shapiro–Wilk test (p < 0.05). Therefore, non-parametric statistical methods were applied throughout the analysis. The influence of the five image-quality groups on the measurement values was first assessed using the Kruskal–Wallis H test. In case of significant group effects, pairwise post hoc comparisons were conducted using the Mann–Whitney U test (Wilcoxon rank-sum equivalent) with Bonferroni correction to adjust for multiple testing. Inter-examiner agreement among the six dentists was evaluated using the intraclass correlation coefficient (ICC), calculated as a two-way random-effects model with absolute agreement (ICC(2,1)). For significant pairwise comparisons, effect sizes were calculated using r = Z/√N to quantify the magnitude of measurement differences. Effect sizes were interpreted descriptively (r ≈ 0.1 small, r ≈ 0.3 moderate, r ≥ 0.5 large) [33]. For ICC estimates, 95% confidence intervals were calculated to provide precision estimates of inter-examiner reliability.
For the periodontally healthy tooth 31, the Kruskal–Wallis test revealed a statistically significant overall group effect (H = 11.99, p = 0.017), indicating that the different types of artificial image degradation produced measurable changes in measurement outcomes. Post hoc analysis revealed significant pairwise differences between group 1 and group 4 (p_adj = 0.0099, r = 0.211) as well as between group 2 and group 4 (p_adj = 0.0074, r = 0.236) (Table 1). The corresponding effect sizes indicate small-to-moderate magnitude differences. All remaining pairwise comparisons were non-significant after Bonferroni correction (p_adj > 0.05).
Inter-examiner agreement for tooth 31 was generally high across most image groups, with ICC values ranging from 0.770 (group 4) to 0.882 (group 2). The exception was group 3 (TripleVoxel), which demonstrated substantially decreased examiner agreement (ICC = 0.407).
The 95% confidence intervals for ICC values (Table 2) demonstrate varying degrees of precision across image-quality groups. Notably, the TripleVoxel condition (group 3) showed a wide confidence interval (−0.029 to 0.503), reflecting reduced stability and higher uncertainty in examiner agreement under pronounced spatial resolution reduction.
For the periodontally compromised tooth 41, a significant overall effect of image-quality modification was also detected (H = 14.90, p = 0.0049). The magnitude of measurement differences was generally greater than in tooth 31, reflecting greater variability in measurement distributions compared with the healthy reference tooth. Post hoc comparisons (Table 3) demonstrated significant differences between group 1 and group 4 (p_adj = 0.0086, r = 0.061), between group 2 and group 4 (p_adj = 0.0190, r = 0.260), and between group 4 and group 3 (p_adj = 0.0332, r = 0.489). The effect sizes ranged from small (r ≈ 0.06) to moderate-to-large (r ≈ 0.49), indicating that certain degradation conditions were associated with substantial shifts in measurement distributions.
Inter-examiner reliability for tooth 41 also varied across groups. ICC values ranged from 0.916 (group 5) to 0.497 (group 4). Notably, the combined degradation (group 5) yielded the highest agreement, whereas group 4 alone demonstrated the lowest inter-examiner consistency. These findings indicate that different degradation conditions influence measurement variability and examiner agreement to varying degrees.
The 95% confidence intervals for ICC values (Table 4) further illustrate variability in precision. For example, the GaussianBlur condition (group 4) demonstrated a comparatively wide interval (0.054 to 0.836), indicating increased uncertainty in reproducibility estimates under isolated blurring conditions.
The statistical analyses demonstrate that image degradation conditions are associated with measurable differences in periodontal bone-level measurements and variability across groups. Significant group effects in the Kruskal–Wallis tests confirm that certain degradation conditions influence measurement distributions. Effect size calculations provide additional context regarding the magnitude of these differences, ranging from small to moderate-to-large effects depending on the comparison.
ICC results indicate that image quality conditions also affect measurement reproducibility. While original and moderately degraded datasets maintained relatively high inter-examiner agreement, pronounced voxel enlargement was associated with reduced reliability and wider confidence intervals.
For tooth 31 (Figure 6), the boxplot shows differences in the distribution of measured distances across the five image-quality conditions. The original dataset (group 1) displays the highest median values (approximately 8.4–8.6 mm) and a comparatively narrow interquartile range (IQR ≈ 7.0–9.0 mm). The group 2 condition shows a lower median (approximately 7.3–7.5 mm) and a wider IQR (≈5.8–8.1 mm). The group 5 condition presents a median of approximately 7.6–7.8 mm with a broad IQR (≈6.3–8.7 mm). The group 4 condition shows a median similar to that of the group 5 condition (≈7.6–7.8 mm) with a moderately sized IQR (≈7.3–8.1 mm) and one low-value outlier. The group 3 condition exhibits the lowest median values (≈6.5–6.7 mm) and a comparatively narrow IQR (≈6.3–7.5 mm). Across all conditions, differences are observed in median values, spread, and the presence of isolated outliers.
For tooth 41 (Figure 7), the boxplot demonstrates greater variability across image-quality conditions. The group 5 condition shows the highest median values (approximately 5.8–6.0 mm) and the widest interquartile range (IQR ≈ 4.2–6.4 mm). The group 4 condition presents the lowest median values (≈4.0–4.2 mm) and the narrowest IQR (≈3.8–4.3 mm). The group 3 condition shows median values of approximately 4.9–5.0 mm with a broader IQR (≈4.2–5.3 mm) and several high-value outliers exceeding approximately 7 mm. The group 2 condition exhibits a median of approximately 4.6–4.8 mm with a moderate IQR (≈3.7–4.8 mm). Compared with tooth 31, the distributions for tooth 41 show wider interquartile ranges and a higher frequency of outliers across multiple conditions.
4. Discussion
The present study examined how systematically degraded CBCT datasets influence examiner-derived measurement outcomes of periodontal bone levels rather than diagnostic accuracy, with a specific emphasis on the implications for both clinical decision-making and the development of AI-based diagnostic systems. The results demonstrate that reductions in spatial resolution and the introduction of artificial blur substantially affect measurement outcomes, reproducibility, and examiner agreement. These effects are consistent with a growing body of literature showing that CBCT image quality is a major determinant of measurement reliability for periodontal assessments [4,5,8]. Importantly, the present analysis extends prior findings by quantifying not only statistical significance but also the magnitude of measurement shifts using effect size calculations (r), thereby providing a more nuanced interpretation of how strongly specific degradation conditions influence examiner-derived outcomes. In particular, this study highlights how voxel enlargement and blurring filters impair measurement precision, thereby underscoring the need to understand CBCT acquisition parameters not only in clinical contexts but also when generating training data for artificial intelligence algorithms, including repeated measurements and inter-observer variability.
A central observation of this study is the significant influence of voxel size on measurement outcomes, which is consistent with previous research reporting that increasing voxel dimensions reduces spatial resolution and may impair the detectability of fine periodontal structures [
2,
3,
6]. In the present analysis, datasets with doubled or tripled voxel size showed measurable differences in bone-level assessments compared with the reference condition. While statistically significant differences were observed for specific image-quality comparisons, other voxel-related changes were characterized by consistent directional shifts in measured values without reaching statistical significance, indicating systematic tendencies rather than uniform effects. The incorporation of effect size estimates (r) revealed that certain significant comparisons were associated with small-to-moderate effects (e.g., r ≈ 0.21–0.24 for Tooth 31), whereas others demonstrated moderate-to-large magnitude shifts (e.g., r ≈ 0.49 for the comparison between Group 4 and Group 3 in Tooth 41). This differentiation is critical, as statistical significance alone does not necessarily reflect clinical or methodological relevance. These findings are in line with ex vivo and clinical studies demonstrating that voxel size represents an important determinant of CBCT measurement accuracy, although in the present study, no reference standard was available to assess absolute measurement correctness [
28,
30,
34]. Notably, the magnitude and consistency of voxel-related effects were not uniform across all conditions, with more pronounced differences observed for the periodontally compromised tooth (41). This observation corresponds with prior evidence suggesting that sites with irregular defect morphology exhibit increased sensitivity to image-quality variations in CBCT-based periodontal assessment [
1,
35].
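To illustrate how an effect size of this kind can be obtained, the following minimal Python sketch computes r = |Z|/√N for a two-group rank comparison. It assumes the common convention in which Z is taken from the normal approximation of the Mann–Whitney U statistic (here without tie correction) and N is the combined sample size. The exact formula applied in this study is not restated in this section, and the function names and example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def effect_size_r(group_a, group_b):
    """Effect size r = |Z| / sqrt(N) from a Mann-Whitney U comparison.

    Assumption: Z uses the normal approximation without tie correction.
    """
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n + 1) / 12)
    z = (u_a - mean_u) / sd_u
    return abs(z) / math.sqrt(n)


# Hypothetical bone-level readings (mm) under two image conditions.
reference = [2.1, 2.3, 2.2, 2.4]
degraded = [2.6, 2.8, 2.5, 2.9]
r = effect_size_r(reference, degraded)
```

By Cohen's widely used convention, values of r near 0.1, 0.3, and 0.5 are read as small, medium, and large effects, respectively, which is the scale against which the ranges reported above can be placed.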
The influence of blur, whether Gaussian or median filtering, further underscores the sensitivity of CBCT interpretation to simulated noise and artifacts. Gaussian blurring reduced sharpness in a manner that sometimes led to lower bone-level readings than in the reference condition, particularly in Tooth 41, while the median filter produced systematic deviations in Tooth 31. Effect size analysis demonstrated that while some statistically significant blur-related comparisons exhibited only small magnitude effects (e.g., r ≈ 0.06 in Tooth 41), others reached moderate-to-large ranges (r ≈ 0.49), indicating that the impact of blurring is not uniform but condition-dependent. These observations should be interpreted as relative measurement shifts between image degradation conditions rather than definitive evidence of reduced diagnostic accuracy. They align with studies suggesting that post-processing or reconstruction filters can introduce clinically relevant distortions that alter linear measurements [
12,
29]. Furthermore, prior investigations into metal artifact reduction and noise suppression algorithms have similarly demonstrated that image manipulation may paradoxically introduce measurement bias [
24,
25].
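The two kinds of degradation discussed above can be sketched on a single 2D slice. The pure-Python example below is an illustrative assumption and not the processing pipeline used in this study: it simulates voxel enlargement by averaging non-overlapping blocks and applies a 3 × 3 median filter. The function names and the toy intensity values are hypothetical.

```python
from statistics import median


def enlarge_voxels(slice2d, factor):
    """Simulate larger voxels by averaging factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    rows = len(slice2d) // factor
    cols = len(slice2d[0]) // factor
    out = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            block = [
                slice2d[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            new_row.append(sum(block) / len(block))
        out.append(new_row)
    return out


def median_filter3(slice2d):
    """3 x 3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(slice2d), len(slice2d[0])
    out = []
    for r in range(h):
        new_row = []
        for c in range(w):
            window = [
                slice2d[i][j]
                for i in range(max(0, r - 1), min(h, r + 2))
                for j in range(max(0, c - 1), min(w, c + 2))
            ]
            new_row.append(median(window))
        out.append(new_row)
    return out


# Toy slice: soft tissue (0) next to bone (100) with one irregular edge pixel.
toy_slice = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [0, 100, 100, 100],
    [0, 0, 100, 100],
]
doubled = enlarge_voxels(toy_slice, 2)
smoothed = median_filter3(toy_slice)
```

In the doubled-voxel result, the irregular edge pixel is absorbed into a block with an intermediate intensity, which is one way a fine margin can become harder to localize. In practice, an array library would typically be applied to the full 3D volume; the nested-list version is only meant to make the operations explicit.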
The inter-examiner agreement (ICC values) observed in this study provides insight into how image degradation conditions influence measurement reproducibility. While the reference and moderately degraded datasets maintained high reliability consistent with published data [
3,
28], severely degraded image groups—particularly those with tripled voxel sizes—yielded substantially lower ICC values. The addition of 95% confidence intervals for ICC estimates allows a more precise evaluation of reliability stability. Wide confidence intervals observed in certain groups (e.g., TripleVoxel in Tooth 31: −0.029 to 0.503) indicate substantial uncertainty in reproducibility estimates under pronounced spatial resolution reduction. Similarly, the GaussianBlur condition in Tooth 41 demonstrated a broad interval (0.054 to 0.836), reflecting variability in examiner agreement under isolated smoothing conditions. These findings are consistent with previous reports indicating that voxel enlargement may have a stronger impact on reproducibility than other degradation conditions [
7,
36]. It is important to note that ICC values reflect inter-examiner agreement and do not provide direct information about measurement correctness or diagnostic validity. High agreement may therefore indicate consistency among examiners without necessarily reflecting the structural fidelity of the underlying image data. Interestingly, the combined degradation (Group 5) maintained high ICC values for Tooth 41. This suggests that certain degradation patterns may lead to more homogeneous examiner responses even when central measurement tendencies differ between groups. Such findings align with prior observations that subjective interpretability does not always correlate directly with quantitative measurement behavior [
37].
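For readers less familiar with the statistic, the sketch below shows one way an ICC point estimate can be computed from a sites-by-examiners table. It assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), following Shrout and Fleiss; the ICC model actually used in this study is not restated in this section, confidence intervals are not computed here, and the function name and example ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per measured site, one column per examiner.
    """
    n = len(ratings)      # number of measured sites (targets)
    k = len(ratings[0])   # number of examiners (raters)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for sites, examiners, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    msr = ss_rows / (n - 1)               # between-site mean square
    msc = ss_cols / (k - 1)               # between-examiner mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical example: examiner 2 reads every site exactly 1 unit higher.
offset_ratings = [[1, 2], [2, 3], [3, 4]]
icc_offset = icc_2_1(offset_ratings)
```

In this toy example, the two examiners rank the sites identically, yet the constant offset lowers the absolute-agreement ICC to about 0.67. This mirrors the point made above: agreement statistics describe how similarly examiners measure, not whether any of them is correct.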
Given the exploratory proof-of-concept design of this investigation, the clinical implications of these findings should be interpreted cautiously. Periodontal diagnosis often relies on subtle radiographic indicators, and even small deviations in bone measurements can shift clinical staging or alter treatment planning [
37]. Although several comparisons yielded statistically significant differences, the observed effect sizes ranged from small to moderate in most conditions, suggesting that not all statistically detectable shifts necessarily translate into clinically meaningful diagnostic deviations. The present data suggest that lower-resolution CBCT scans may influence measurement consistency, particularly in anatomically compromised sites; however, no direct assessment of clinical misclassification was performed. Considering the continued expansion of CBCT usage in periodontics, clinicians should be cautious when interpreting scans with low voxel resolution or signs of excessive noise reduction. Prior best-evidence guidelines have similarly warned that CBCT should only be used when intraoral radiographs are insufficient, particularly due to concerns over resolution adequacy [
34,
37].
Beyond clinical diagnostics, this study has significant implications for artificial intelligence. AI-based systems for periodontal bone detection, staging, and volumetric quantification, such as convolutional neural networks (CNNs) and landmark-based models, depend heavily on the quality of their training data [
9,
16,
20]. The present findings demonstrate that controlled image degradation conditions can systematically shift examiner-derived reference measurements, thereby introducing structured variability into potential ground-truth annotations. Effect size quantification further indicates that some degradation conditions may produce only minimal measurement displacement, whereas others generate moderate-to-large distributional shifts that could materially influence AI model calibration. Low-quality or inconsistent CBCT datasets may produce algorithms with poor generalizability, increased false-positive or false-negative rates, or an inability to detect nuanced bone changes. Studies have shown that AI performance is directly correlated with image clarity, voxel consistency, and noise characteristics [
17,
38]. Furthermore, research into enhanced-resolution AI systems confirms that deep learning models can exploit subtle texture cues that become inaccessible in degraded datasets [
18]. The present findings, therefore, provide a methodological foundation for future studies investigating how human measurement variability under controlled image degradation conditions may translate into differences in AI training robustness.
Artificial degradation studies such as the present one serve an important role in building robust AI pipelines. By characterizing precisely how degradation affects diagnostic outputs, researchers can simulate realistic low-quality conditions for data augmentation, an established technique for improving model resilience [
39,
40]. However, excessive degradation may distort anatomical morphology beyond clinically realistic scenarios and should therefore be applied cautiously in AI dataset generation. Several previous studies emphasize the necessity of standardizing voxel sizes and acquisition protocols in AI research, to avoid confounding effects that might inflate or suppress algorithmic performance [
41,
42].
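A degradation-based augmentation step of the kind described above could look like the following minimal sketch. It is an assumption for illustration only and not a pipeline evaluated in this study: a Gaussian blur with a randomly drawn but bounded sigma is applied to a 2D slice, with the upper bound serving as a simple guard against unrealistic degradation. All function names, the sigma range, and the toy image are hypothetical.

```python
import math
import random


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    weights = [
        math.exp(-(x * x) / (2 * sigma * sigma))
        for x in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for c in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                j = min(max(c + k - radius, 0), n - 1)
                acc += w * row[j]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable 2D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    blurred = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(blurred), kernel))


def augment(image, rng, min_sigma=0.3, max_sigma=1.5):
    """Blur with a random sigma drawn from a bounded, hypothetical range."""
    sigma = rng.uniform(min_sigma, max_sigma)
    return gaussian_blur(image, sigma), sigma


# Toy slice with a sharp vertical edge; a seeded generator keeps runs repeatable.
rng = random.Random(0)
edge_image = [[0.0] * 3 + [100.0] * 3 for _ in range(6)]
augmented, used_sigma = augment(edge_image, rng)
```

Recording the sampled sigma alongside each augmented image keeps the degradation level traceable, in line with the call for documenting acquisition and processing parameters in AI datasets.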
This study also aligns with concerns raised in the literature about the consequences of STL conversion and mesh resolution reduction during CBCT post-processing. Prior work has demonstrated that triangulation steps can introduce geometric inaccuracies, especially when voxel resolution is low [
43,
44]. In the present study, identical STL export parameters were applied across all image degradation conditions; nevertheless, interactions between voxel enlargement, smoothing procedures, and mesh reconstruction may represent an additional source of geometric approximation that should be considered when interpreting measurement variability.
Despite its contributions, this study has several limitations. First, it uses a single CBCT dataset, which restricts generalizability; however, this allowed for systematic manipulation under tightly controlled conditions. Similar methodological approaches have been used in foundational CBCT accuracy studies [
31]. Second, although examiner variation was analyzed, automated segmentation tools were not directly evaluated. Future research should compare human and AI performance under identical degradation conditions to identify whether AI systems are more or less sensitive to resolution loss than clinicians. Third, this study examined only voxel enlargement and blur, whereas real-world CBCT scans may suffer from beam hardening, motion artifacts, or metal artifacts, all of which may interact with voxel parameters in unpredictable ways [
25,
27]. Fourth, although STL-based linear measurements were used for consistency, future studies should analyze raw volumetric CBCT data directly to minimize transformation-related errors. Fifth, no a priori power calculation was performed due to the exploratory design, and the limited sample structure (six examiners and two teeth derived from a single CBCT dataset) restricts the statistical power of the analysis. Although repeated measurements increase internal consistency, the findings should be interpreted as indicative of relative trends rather than definitive clinical thresholds or effect magnitudes. The limited sample size may also reduce sensitivity to smaller between-group differences; therefore, non-significant findings should not be interpreted as evidence of equivalence.
Clinically, the findings reinforce the necessity of selecting CBCT acquisition protocols that balance radiation exposure with diagnostic needs. While low-dose or large-voxel protocols can reduce patient exposure, they also risk compromising diagnostic evaluability—especially in early or mild periodontal lesions. This echoes prior recommendations that clinicians must tailor CBCT parameters to the anatomical target and specific diagnostic question [
13,
14,
32].
From an AI perspective, this study underscores the importance of curating high-quality training datasets and documenting acquisition parameters thoroughly. AI research in dentistry increasingly depends on multicenter data collection, making cross-platform voxel standardization critical [
19,
40]. The present results do not directly evaluate AI performance but provide an experimental framework for future investigations linking human measurement variability and algorithmic robustness under standardized image degradation conditions.
In conclusion, the present study demonstrates that image degradation conditions in CBCT are associated with measurable effects on periodontal bone measurement outcomes and examiner reproducibility. The integration of effect size analysis and ICC confidence intervals provides a more comprehensive understanding of both the magnitude and precision of these measurement shifts. These findings should be interpreted within the exploratory scope of the study and do not establish absolute diagnostic validity. Future investigations should expand this work by including diverse patient datasets, incorporating real-world artifact patterns, and directly comparing human and AI performance under controlled image degradation conditions.
5. Conclusions
This study demonstrates that controlled degradation of CBCT image quality through voxel enlargement and simulated blur is associated with measurable differences in examiner-derived periodontal bone-level measurements under the conditions of this exploratory investigation. Both spatial resolution and noise-related sharpness influence not only the absolute measurement values but also the reproducibility of examiner-based evaluations. The significant group effects and variability patterns observed in the Kruskal–Wallis, post hoc, and ICC analyses indicate relative measurement shifts and changes in inter-examiner agreement, particularly in periodontally compromised sites, without establishing absolute diagnostic accuracy or validity. Effect size analysis further demonstrated that the magnitude of these measurement shifts ranged from small to moderate-to-large, depending on the degradation condition and tooth morphology, indicating that image quality alterations may exert quantitatively variable impacts on examiner-derived outcomes.
Importantly, the findings highlight that image quality is not solely a technical acquisition parameter but represents a determinant of measurement reliability within the experimental framework applied here. Strong image degradation was associated with increased variability and directional measurement shifts, which may influence measurement consistency, especially when assessing subtle or early bone changes. However, the present study did not directly evaluate clinical misclassification or diagnostic correctness, and therefore, clinical implications should be interpreted as hypothesis-generating rather than definitive. Given the proof-of-concept design based on a single standardized CBCT dataset, the findings should be interpreted as indicative of relative measurement behavior under controlled degradation conditions rather than as definitive evidence of clinical performance thresholds.
From an AI perspective, the study provides methodological insights into the role of imaging fidelity in the development and training of diagnostic algorithms. Artificial intelligence systems depend on accurate, high-quality ground-truth datasets, and image degradation, whether inherent or artificially introduced, may influence the consistency of human-derived reference labels used for training purposes. Systematic degradation modeling, as applied in the present investigation, may therefore serve as a structured framework for assessing how imaging variability propagates into annotation variability, which represents a critical factor in AI model calibration and generalizability. The present study does not directly evaluate AI performance but establishes an experimental framework for subsequent validation studies.
The results emphasize that maintaining appropriate CBCT image quality may contribute to more consistent periodontal measurement behavior by human examiners. Future studies should expand on these findings using larger, more diverse datasets, incorporate real-world artifact patterns, and directly compare human and algorithmic performance under identical degradation conditions. Advancing this line of research may support the development of more standardized acquisition protocols and more methodologically robust CBCT-based assessment strategies in both clinical diagnostics and AI-assisted periodontal analysis.