Periapical Lesions in Panoramic Radiography and CBCT Imaging—Assessment of AI’s Diagnostic Accuracy

Background/Objectives: Periapical lesions (PLs) are frequently detected in dental radiology. Accurate diagnosis of these lesions is essential for proper treatment planning. Imaging techniques such as orthopantomogram (OPG) and cone-beam CT (CBCT) imaging are used to identify PLs. The aim of this study was to assess the diagnostic accuracy of artificial intelligence (AI) software Diagnocat for PL detection in OPG and CBCT images. Methods: The study included 49 patients, totaling 1223 teeth. Both OPG and CBCT images were analyzed by AI software and by three experienced clinicians. All the images were obtained in one patient cohort, and findings were compared to the consensus of human readers using CBCT. The AI’s diagnostic accuracy was compared to a reference method, calculating sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Results: The AI’s sensitivity for OPG images was 33.33% with an F1 score of 32.73%. For CBCT images, the AI’s sensitivity was 77.78% with an F1 score of 84.00%. The AI’s specificity was over 98% for both OPG and CBCT images. Conclusions: The AI demonstrated high sensitivity and high specificity in detecting PLs in CBCT images but lower sensitivity in OPG images.


Introduction
Periapical lesions (PLs) are among the most common dental pathologies, with an estimated prevalence of 52% at the individual level and 5% at the tooth level [1]. PLs are manifestations of apical periodontitis (AP) and are defined as barriers that restrict inflammation related to pathogens and their toxins in root canals [2,3]. PLs present as periapical radiolucencies and are usually asymptomatic, often being incidental radiographic findings [4]. PLs both impact the survival of affected teeth and have the potential to seriously compromise systemic health [5,6]. The most prevalent complication of persistent AP is tooth loss. The presence of these lesions also affects clinical decision making in various medical interventions, from observation to prosthetic treatment [3] and cardiac surgery [7].
Periapical radiolucencies should not be regarded exclusively as manifestations of AP, as they may also result from non-inflammatory processes. Inflammatory processes account for 78% of periapical radiolucencies, so not every lesion near a tooth root is inflammatory in origin [8]. The differential diagnosis includes both odontogenic and non-odontogenic lesions, encompassing benign and malignant entities. Moreover, trauma can result in changes resembling a PL. Clinical assessment and pulp vitality tests remain the basic methods for assessing teeth and informing treatment decisions [9].
Radiological examination is a fundamental tool in patient management. Dental diagnostics and treatment planning, as well as the monitoring of treatment outcomes and complications, rely on both clinical examination and diagnostic imaging [10]. The primary imaging methods used in dental diagnostics include periapical radiographs (PRs), OPGs, and CBCT [3]. PRs are the most widely used modality for dental lesion detection, with a diagnostic accuracy exceeding 90% [3,11]. An OPG is a basic imaging modality that often serves as a first-line diagnostic tool for evaluating both the mandible and the maxilla in a single exposure. The diagnostic accuracy of OPGs in PL detection remains low, with a sensitivity ranging from 28% to 48.8% [12,13,14,15]. However, OPG imaging has high specificity and positive predictive value (PPV) for detecting PLs [14], and its diagnostic accuracy depends heavily on the location of the lesion [15]. PRs and OPGs share inherent limitations, such as the superimposition of anatomical structures, geometrical distortion, anatomical noise, and a two-dimensional representation [10]. Moreover, a PL must cause 30-50% bone mineral loss before it becomes radiographically visible [16].
Since its introduction in the 2000s, CBCT has proven to be a valuable tool for endodontic assessment, as validated in many studies [10,17,18,19]. CBCT overcomes the limitations of conventional two-dimensional dental imaging by providing accurate multiplanar detail of dental and bony structures at a spatial resolution finer than 100 µm [20]. In the diagnosis of PLs, CBCT has been shown to outperform periapical radiography (PR) in terms of intra- and interobserver agreement, communication with patients and other practitioners, and outcome assessment [10,21]. Furthermore, multiple studies have confirmed that CBCT offers higher diagnostic accuracy in detecting PLs, with approximately one-third of lesions being missed by PR [22,23,24]. A recent study by Mostafapoor et al. [25] demonstrated 95% sensitivity and 90% specificity for CBCT in the diagnosis of PLs. However, some conflicting results point to possible overdiagnosis of PLs, confirmed by negative histological findings [9,26].
The recent boom in the use of artificial intelligence (AI) tools in medicine has not bypassed dentistry, where they have found particular relevance in dentomaxillofacial radiology [27,28,29]. The ever-increasing number of radiological examinations [30], coupled with the growing workload of practitioners, has spurred the development of tools to facilitate radiological diagnostics. A system developed by Diagnocat Ltd. (San Francisco, CA, USA), which uses a convolutional neural network (CNN), aims to provide precise and comprehensive dental diagnostics. The company claims that the system was trained on over 35,000 dental radiographs to ensure its diagnostic performance. One of the system's distinctive features is a PL detection tool, which aids in prompt diagnosis. Despite promising results [31,32,33], some authors have reported unacceptably low accuracy of AI in PL assessment on orthopantomograms (OPGs) [34]. Therefore, evaluating the diagnostic accuracy of AI for detecting PLs on both OPG and CBCT images is pertinent.
The aim of the present study was to compare the diagnostic performance of an AI-driven platform for detecting periapical lesions in OPG and CBCT images acquired from the same patients and to compare the program's results with the evaluations of experienced human readers.

Patients
The population of this retrospective diagnostic accuracy study initially consisted of 92 consecutive patients referred for OPG and CBCT imaging at a private dental center. All patients were referred for both imaging modalities by orthodontists or dental surgeons between January and September 2023. After the initial diagnostic OPGs were obtained, selected patients were referred for CBCT scans. The primary clinical indications for CBCT were suspected PLs on the OPGs or the presence of an impacted tooth. A large field of view (FOV; 10 × 13 cm) was used in all cases with suspected PLs and/or unerupted teeth on both sides of the dental arch. The main inclusion criterion was the availability of both OPG and CBCT images acquired within a 30-day interval, to minimize the impact of dental procedures, aging, and other factors on dental status. The exclusion criteria were a scan range not covering the periapical regions of all present teeth, severe motion artifacts, and poor overall image quality. After the initial selection of patients with both OPG and CBCT images, 43 patients were excluded because a small FOV that did not cover the periapical regions of all present teeth had been used. The authors reviewed the CBCT scans and OPGs of 55 patients. Six of these patients were excluded because of motion artifacts and poor image quality. After the eligibility criteria were applied, 49 of the initial 92 patients with both OPGs and CBCT images were included in the final study group.

Image Acquisition and Post-Processing
All CBCT and OPG images were acquired using a Hyperion X9 PRO 13 × 16 unit (MyRay, Imola, Italy). A single standard protocol, the "Regular" setting of the unit, was used for all CBCT scans (90 kV, 36 mAs, CTDIvol 4.09 mGy, and 13 cm field of view (FOV)). All CBCT images were reconstructed with a slice thickness of 0.3 mm. Patient identifiers were removed to maintain anonymity, and images were coded for blinded analysis.
The reading sessions were performed on a dedicated console using iRYS Viewer software version 6.2 (MyRay, Italy). During the CBCT analysis, the viewer's window width and center were predefined at 1048 and 4096 HU, respectively. All images were evaluated on a RadiForce MX243W monitor (Eizo, Hakusan, Japan), which is certified for medical use. The reading sessions were conducted in a CT reporting room without sunlight and with dimmed lighting to ensure appropriate conditions for evaluating radiological examinations.
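The window width and center determine how CT numbers are mapped to the grey levels shown on the monitor. The Python sketch below illustrates a simplified linear windowing function; it is not the iRYS Viewer implementation, the function name `apply_window` is hypothetical, and the DICOM standard's exact linear VOI LUT formula differs slightly (it uses c - 0.5 and w - 1 in place of c and w). CBCT grey values are also not strictly calibrated Hounsfield units, so the mapping is illustrative only.

```python
def apply_window(value: float, center: float, width: float, out_max: int = 255) -> int:
    """Map a CT number to a display grey level with a simplified linear window.

    Values at or below (center - width/2) map to 0, values at or above
    (center + width/2) map to out_max, and values in between scale linearly.
    """
    if width <= 0:
        raise ValueError("window width must be positive")
    lower = center - width / 2.0
    upper = center + width / 2.0
    if value <= lower:
        return 0
    if value >= upper:
        return out_max
    # Linear interpolation inside the window, rounded to the nearest grey level
    # (Python's round() sends exact halves to the even neighbour).
    return int(round((value - lower) / width * out_max))


if __name__ == "__main__":
    # Hypothetical window (center 400, width 2000), not the study's viewer preset.
    for hu in (-1000, -600, 0, 400, 1400, 3000):
        print(hu, apply_window(hu, center=400, width=2000))
```

Widening the window compresses more CT numbers into the same grey-level range (less contrast, more latitude), while shifting the center moves the band of values that is displayed with contrast.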

Multireader Evaluation
The images were independently evaluated by two readers, an orthodontist and a radiologist, both with more than 8 years of experience. The presence or absence of PLs was recorded for each tooth, based on signs of bone destruction. Each reader assessed each radiograph separately and independently, without knowledge of the AI results or the other reader's evaluation. The readers first evaluated the OPGs and then the CBCT images; the OPG and CBCT sessions were held at least one month apart to avoid potential recall bias from the OPG reading session. After the evaluation, the readers discussed the results, jointly reviewed the images, and reached a consensus, which served as the reference standard. To assess the reliability of the AI diagnostic reports, they were compared with this consensus.

AI Evaluation
Both sets of images (OPG and CBCT) of each included patient were manually uploaded to the cloud-based, commercially available Diagnocat platform (Diagnocat Ltd., San Francisco, CA, USA). The AI software automatically generated a separate report for each imaging modality, with the estimated probability of a lesion being present. A finding was classified as positive when the calculated lesion probability exceeded 50%.
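The >50% decision rule amounts to a simple per-tooth threshold. The Python sketch below assumes that lesion probabilities have already been extracted from the AI reports into a dictionary keyed by tooth number; this input format, the FDI tooth labels, and the function name `classify_teeth` are hypothetical and do not reflect Diagnocat's actual report or export format.

```python
from typing import Dict

# Decision rule described in the study: a finding is positive only if the
# AI-estimated lesion probability is strictly greater than 50%.
POSITIVE_THRESHOLD = 0.50


def classify_teeth(probabilities: Dict[str, float],
                   threshold: float = POSITIVE_THRESHOLD) -> Dict[str, bool]:
    """Convert per-tooth lesion probabilities (0-1) into binary PL calls."""
    calls = {}
    for tooth, p in probabilities.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for tooth {tooth} out of range: {p}")
        calls[tooth] = p > threshold  # exactly 0.50 is NOT a positive call
    return calls


if __name__ == "__main__":
    # Hypothetical probabilities for three teeth (FDI notation).
    example = {"16": 0.82, "36": 0.50, "46": 0.07}
    print(classify_teeth(example))  # {'16': True, '36': False, '46': False}
```

Using a strict inequality means a probability of exactly 50% is treated as negative, which matches the wording "higher than 50%".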

Statistical Evaluation
The diagnostic performance of the AI program was assessed against the common reference standard. Sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and F1 score were calculated; the formulas used and further explanations can be found in Hicks et al. [36]. The significance level was set at 0.05. All analyses were conducted in R software, version 4.3.2.
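The measures listed above all derive from a 2 × 2 table of per-tooth AI calls against the reference standard. The Python sketch below shows the standard formulas; it is an illustrative re-implementation, not the authors' R code, and the counts in the usage example are hypothetical rather than the study's data.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # AI positive, reference positive
    fp: int  # AI positive, reference negative
    fn: int  # AI negative, reference positive
    tn: int  # AI negative, reference negative


def tally(ai: Dict[str, bool], reference: Dict[str, bool]) -> Confusion:
    """Build a 2x2 table from per-tooth calls; both dicts must cover the same teeth."""
    if ai.keys() != reference.keys():
        raise ValueError("AI and reference must cover the same teeth")
    c = Confusion(0, 0, 0, 0)
    for tooth, ref in reference.items():
        pred = ai[tooth]
        if pred and ref:
            c.tp += 1
        elif pred:
            c.fp += 1
        elif ref:
            c.fn += 1
        else:
            c.tn += 1
    return c


def metrics(c: Confusion) -> Dict[str, float]:
    """Standard diagnostic accuracy measures as proportions (not percentages).

    Raises ZeroDivisionError if a denominator is zero, e.g. when the
    reference standard contains no positive teeth.
    """
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        # F1 is the harmonic mean of PPV and sensitivity: 2TP / (2TP + FP + FN)
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the study's results).
    print(metrics(Confusion(tp=8, fp=2, fn=2, tn=88)))
```

Because true negatives dominate at the tooth level, specificity, accuracy, and NPV stay high even when sensitivity and PPV are modest, which is why the F1 score is reported alongside them.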

Population
The study material consisted of 49 sets of images. The mean age of all participants was 41 years (range, 12-70 years). A total of 37 females with a mean age of 42.11 years (SD 13.39; range, 12-65 years) and 12 males with a mean age of 51.75 years (SD 12.46; range, 31-70 years) were included.
Sample size calculations indicated that the study group was sufficiently large for the diagnostic accuracy analyses.
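The method used for the sample size calculation is not specified. One common approach for diagnostic accuracy studies is Buderer's formula, sketched below in Python as an assumption rather than as the authors' method: the number of reference-positive units needed to estimate sensitivity within a margin d is z²·Se(1 - Se)/d², which is then divided by the expected prevalence (or by 1 - prevalence for specificity). The expected sensitivity (0.80) and margin (0.15) in the example are hypothetical; the 5% prevalence is the tooth-level figure cited in the Introduction [1].

```python
import math


def buderer_sample_size(expected: float, margin: float, prevalence: float,
                        for_sensitivity: bool = True, z: float = 1.96) -> int:
    """Total units (e.g., teeth) needed to estimate sensitivity or specificity.

    Buderer's approach: compute the number of reference-positive (or
    reference-negative) units required for a confidence interval of
    half-width `margin`, then scale by the expected prevalence.
    """
    if not (0 < expected < 1 and 0 < margin < 1 and 0 < prevalence < 1):
        raise ValueError("expected, margin and prevalence must lie in (0, 1)")
    n_group = z ** 2 * expected * (1 - expected) / margin ** 2
    fraction = prevalence if for_sensitivity else 1 - prevalence
    return math.ceil(n_group / fraction)


if __name__ == "__main__":
    # Hypothetical inputs: expected sensitivity 0.80, margin +/-0.15,
    # tooth-level prevalence 5%.
    print(buderer_sample_size(0.80, 0.15, 0.05))  # 547
```

Low tooth-level prevalence is what drives the total upward: most analyzed teeth are reference-negative, so many teeth are needed to accumulate enough positives for a precise sensitivity estimate.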

Diagnostic Accuracy
A summary of the detected PLs and the number of analyzed teeth is available in Table 1. The analyzed AI software showed low sensitivity (33.33%) in detecting PLs on OPGs compared with the consensus of human readers on CBCT.
The calculated sensitivity of the AI CBCT reports for PL detection was 77.78%. A summary of the diagnostic accuracy parameters is available in Table 2. In total, the software produced four false negative results and one false positive result in the CBCT assessment. Figure 1 presents one of the false negative cases, misdiagnosed by Diagnocat as a widened periodontal ligament (PDL) space but classified as a PL by the human readers. All of the analyzed modalities showed very high (>95%) specificity. An example of a false negative AI diagnosis in both OPG and CBCT is shown in Figure 4. The analysis of the readers' OPG assessments yielded notable findings, with a sensitivity of 66.67% relative to the readers' consensus on CBCT. However, there was a high number of false positives, resulting in a low PPV of 31.58% (Table 3). When the AI's OPG results were compared with the readers' consensus on OPG, the AI showed a very low sensitivity of 21.05% and a low F1 score of 28.24% (Table 4). However, these results are mainly attributable to the high number of false positives in the readers' OPG analysis.

Discussion
Our study aimed to assess the diagnostic accuracy of an AI-driven platform for detecting PLs in OPGs and CBCT images. The results showed that the AI tool had high specificity in both the OPG and CBCT analyses, as well as a high PPV in the CBCT analysis. However, its sensitivity varied with the modality: compared with the reference standard, the AI tool had a sensitivity of 33.33% in the OPG analysis and 77.78% in the CBCT analysis. These findings indicate that the AI tool has limitations in detecting PLs on OPGs but performs relatively well in CBCT-based PL detection. Additionally, our results showed that OPG analysis is not a reliable method for assessing PLs, with high variability in the results of both the AI's and the human readers' analyses compared with the reference standard (CBCT assessment).
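Because the F1 score is the harmonic mean of PPV and sensitivity, F1 = 2·PPV·Se/(PPV + Se), the PPV implied by the reported sensitivity and F1 values can be back-calculated as PPV = F1·Se/(2·Se - F1). The short Python check below applies this to the AI figures given in the Abstract; because the inputs are rounded percentages, the outputs are approximate.

```python
def ppv_from_f1(f1: float, sensitivity: float) -> float:
    """Recover PPV from F1 and sensitivity (both as proportions).

    Rearranging F1 = 2*P*S / (P + S) gives P = F1*S / (2*S - F1).
    """
    denominator = 2 * sensitivity - f1
    if denominator <= 0:
        raise ValueError("inconsistent inputs: F1 must be below 2 x sensitivity")
    return f1 * sensitivity / denominator


if __name__ == "__main__":
    # Sensitivity and F1 of the AI tool as reported in the Abstract.
    for modality, se, f1 in (("OPG", 0.3333, 0.3273), ("CBCT", 0.7778, 0.8400)):
        print(f"{modality}: implied PPV ~ {ppv_from_f1(f1, se):.1%}")
```

With these inputs, the implied PPV is roughly 32% for OPG and 91% for CBCT, which shows how strongly the two modalities differ once false positives are taken into account.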
PR remains the first-choice modality for PL assessment due to its superior resolution, accessibility, and low radiation dose [37]. Recent advances in AI have led to significant progress in developing algorithms capable of detecting PLs and caries [38,39]. AI-based assessment of periapical radiographs has shown promise in enhancing diagnostic accuracy and improving clinicians' performance in PL detection [40,41,42]. However, AI analysis of OPGs presents additional challenges. Panoramic imaging is inherently prone to image distortion, overlap, and lower resolution than PR, which can hinder AI's ability to diagnose PLs accurately. Other limiting factors include low contrast and unclear tooth contours [43]. Furthermore, the sensitivity of PL detection depends heavily on lesion size. A study by Nardi et al. [12] showed that the sensitivity of OPG for detecting PLs larger than 4.6 mm was 48.5%, but for lesions smaller than 4.5 mm it dropped sharply to 20.0%. The authors concluded that such small lesions do not represent the 30-50% loss of bone mineral considered the threshold for radiographic detection of PLs. These limitations make accurate PL detection on OPGs challenging and likely influenced the results of our analysis.
The analysis of the readers' OPG assessments revealed moderate sensitivity: the readers correctly identified 66.67% of the positive cases relative to their consensus on CBCT. However, the high number of false positives resulted in a low PPV of 31.58%. Thus, while the readers detected most positive cases, they also incorrectly classified a substantial number of teeth as positive. In contrast, the AI's OPG assessment showed a very low sensitivity of 21.05% compared with the readers' consensus on OPG, indicating that the AI system identified few of the cases the readers considered positive on OPG. However, these results are largely affected by the high number of false positives in the readers' OPG analysis. Taken together, these results highlight the high variability of OPG-based PL assessment and indicate that OPG is not the preferred imaging modality for this task.
The low sensitivity of OPG for PL detection observed in our study has already been reported in the literature [12,13,14,15]. An interesting study by Zadrożny et al. [34] showed very similar results for the use of OPG in PL detection. The study analyzed the performance of the Diagnocat platform and reported a sensitivity of 39.0% on OPGs, paired with a high specificity of 98.1%. The AI tool showed similar results in caries assessment (sensitivity of 44.5%, specificity of 98.2%). The authors concluded that the tested AI tool presented unacceptable diagnostic performance in detecting PLs and caries. Several other studies have assessed the diagnostic accuracy of AI for PL assessment on OPGs/panoramic radiographs, using various CNN and U-Net architectures developed by the authors [44,45,46,47,48]. These studies reported highly variable sensitivity and F1 scores for PL detection. The deep learning (DL) algorithm developed by Endres et al. exhibited a low F1 score [48]. Nevertheless, the authors concluded that the trained algorithm outperformed 14 of the 24 oral and maxillofacial surgeons participating in their study, suggesting that the algorithm has the potential to assist surgeons in PL diagnostics. A similar study by Çelik et al. [47] demonstrated the high diagnostic performance of DL in PL detection, with F1 scores ranging between 0.8 and 0.895. Another paper by Bayrakdar et al. [46] reported impressive diagnostic performance for a tested U-Net, with a sensitivity, precision, and F1 score for PL segmentation of 0.92, 0.84, and 0.88, respectively. Song's results were also strong, although the F1 scores were slightly lower (74.2-82.8%) [45]. In our view, this indicates significant differences in performance among the tested AI models. The quality of the analyzed images undoubtedly also plays a role. We believe that a highly reliable evaluation of the capabilities of selected AI tools would be possible in studies using publicly available radiological datasets. Diagnocat analyzes a diverse array of data submitted from around the world, in various formats and of varying quality. Therefore, it is conceivable that the excellent results achieved by experimental AI algorithms in laboratory settings might not be reproducible when they are applied to data from sources other than those used for training [49].
CBCT overcomes the limitations of OPG imaging by providing accurate multiplanar detail of the teeth and bony structures. The higher sensitivity of the AI tool in CBCT analyses can be attributed to the superior imaging capabilities of CBCT compared with those of OPG. Our study supports the notion of high AI performance in CBCT evaluation. The sensitivity of PL detection reported here is in line with that of other studies [31,50,51,52,53]. A study by Orhan et al. [31] showed that Diagnocat achieved 92.8% accuracy in PL detection on CBCT images. Similar results were presented in a 2020 study by Brignardello [53], which reported 93% sensitivity and no significant differences in the volumetric assessment of PLs. Comparable results for different CNNs in CBCT PL detection were subsequently achieved by several authors [54,55,56,57]. Figure 4 depicts one of the false negative diagnoses made by the AI in a CBCT image. In our opinion, this was most likely due to the high noise level in the image and the presence of metallic artifacts. This indicates a direction for further improvement in future versions of the AI tool.
To date, few meta-analyses and systematic reviews have been conducted on the use of AI for PL detection [58,59,60,61,62,63,64]. An important study by Silva et al. [60] showed that the pooled diagnostic accuracy of CBCT for PL detection was 88.75% (95% confidence interval, 85.19-92.30). However, this review included only four studies. In a 2023 review by Sadr et al. [62], the pooled sensitivity and specificity of the 28 included studies were 0.925 and 0.852, respectively, and the authors concluded that AI exhibits high diagnostic accuracy for PL detection. AI tools have also been employed in more comprehensive assessments. A 2021 study by Ezhov et al. [65] showed that the Diagnocat AI platform significantly improved the readers' diagnostic performance in numerous tasks (e.g., detection of periodontitis, PLs, and caries). In the AI-aided and unaided groups, the pooled overall sensitivity was 85.4% and 76.7%, respectively, while the specificity was 96.7% and 96.2%, respectively. In our opinion, these results are promising, and the growing scientific evidence demonstrating the benefits of AI-aided, CBCT-based dental diagnostics will soon lead to the widespread use of AI tools in dental offices. With the increasing supply and mass use of AI tools, prices are likely to decrease and availability to increase further, including for dental practices in developing countries.
However, an important limitation of CBCT evaluation must be mentioned: although CBCT overcomes the limitations of conventional OPGs, concerns have been raised about false positives in PL diagnosis. A retrospective study by Pope [9] revealed that 20% of vital teeth showed widening of the PDL space on CBCT, potentially leading to overdiagnosis and overtreatment of early AP. However, this study's methodology has been questioned because symptoms were used as the only reference standard. Biopsies of periapical tissues would be ideal for diagnosis but are impractical and unethical. Similar results were found in a study on persistent PLs after surgery, in which histology showed no inflammation in 42% of the samples despite radiolucency on CBCT [26]. A correct diagnosis was reached in 63% of patients when radiolucency was evident in both the PR and CBCT images. The authors emphasized the tendency of both PR and CBCT to overestimate the presence of radiolucency [26].
The use of CBCT imaging must be carefully considered because of its primary drawback: a higher radiation exposure than conventional radiography. As stated in the joint position statement of the American Association of Endodontists and the American Academy of Oral and Maxillofacial Radiology, CBCT imaging should be reserved for cases in which lower-dose conventional radiography or alternative imaging modalities fail to provide adequate diagnostic information [37]. Furthermore, a smaller FOV generally results in a lower radiation dose, while a smaller voxel size leads to higher resolution and reduced scatter and noise in CBCT images [66]. Therefore, in adherence to the ALARA (As Low As Reasonably Achievable) principle, CBCT is considered a secondary imaging option after intraoral radiography [37]. CBCT may be justified when clinical findings and conventional PRs are inconclusive; however, its routine use in all cases of PL is unwarranted. Because our study group was highly heterogeneous, the indications for large-FOV CBCT imaging varied, including periapical lesions or impacted teeth on both sides of the dental arches, detection of bony changes, and pre-implant assessment. Under these circumstances, the use of CBCT appears justified.
Our study may stimulate further research comparing AI's diagnostic accuracy across different radiological examinations within the same patient cohort. Diagnocat offers numerous diverse modules, such as caries detection and endodontic treatment evaluation, that also require further scientific evaluation. Subsequent studies evaluating the diagnostic parameters of these tools would be very interesting. Considering the continuous development of AI and promising directions for its application, such as radiomics, AI may surpass clinicians in diagnostic accuracy in the future. Further studies with larger and more diverse study groups will also allow for a more comprehensive assessment of the diagnostic value of AI.
The limitations of our study should be acknowledged. First, the study population was relatively small, which may limit the generalizability of the findings. Second, the AI tool used in this study was a commercially available product, and diagnostic performance may vary depending on the specific AI algorithm and training dataset. Third, the study focused only on PL detection; other dental pathologies were not considered. Fourth, only one CBCT/OPG unit was used. Fifth, owing to geographical limitations, the study group was not ethnically diverse and included only white patients.

Figure 1. One of the cases misdiagnosed by the AI tool as a widened PDL space but classified as PL by human readers (blue arrows). Coronal plane (A), axial plane (B).



Figure 2. Sample false negative AI diagnosis of PL in OPG.
Figures 2 and 3 present, respectively, a false negative and a false positive AI OPG diagnosis.
J. Clin. Med. 2024, 13, x FOR PEER REVIEW 6 of 14

Figure 3. Sample false positive AI diagnosis of PL in OPG, correctly classified as negative by the readers in OPG (A). No signs of periapical radiolucency in CBCT (B).

Figure 4. Sample false negative AI diagnosis in both OPG and CBCT evaluations, caused by high image noise from metal artifacts. In the OPG image, the lesion is magnified in the black rectangle (A). CBCT images of the PL (colored ovals) in the axial plane (B) and sagittal plane (C).


Table 1. Summary of the findings of the AI software and human readers (total number of teeth: 1223).

Table 2. Diagnostic accuracy parameters of the AI program's OPG and CBCT analyses compared with the reference standard (readers' consensus on CBCT analysis).


Table 3. Diagnostic accuracy parameters of the readers' OPG analyses compared with the reference standard (readers' consensus on CBCT analysis).

Table 4. Diagnostic accuracy parameters of the AI OPG analysis compared with the results of the readers' consensus on OPG analysis.