Artificial Intelligence in the Diagnosis of Odontogenous Cysts and Ameloblastomas—A Systematic Review and Meta-Analysis

Takács, Anna; Tábi, Dalma; Cavalcante, Bianca Golzio Navarro; Szabó, Bence; Wenning, Alexander Schulze; Gerber, Gábor; Hermann, Péter; Varga, Gábor; Hegyi, Péter; Kivovics, Márton

doi:10.3390/jcm15062447


by Anna Takács ¹,², Dalma Tábi ¹,², Bianca Golzio Navarro Cavalcante ²,³, Bence Szabó ², Alexander Schulze Wenning ², Gábor Gerber ²,⁴, Péter Hermann ²,⁵, Gábor Varga ²,³, Péter Hegyi ²,⁶,⁷ and Márton Kivovics ¹,²,*

¹ Department of Public Dental Health, Semmelweis University, Szentkirályi u. 40., 1088 Budapest, Hungary

² Centre for Translational Medicine, Semmelweis University, Baross u. 22., 1085 Budapest, Hungary

³ Department of Oral Biology, Semmelweis University, Nagyvárad tér 4., 1089 Budapest, Hungary

⁴ Department of Anatomy, Histology and Embryology, Semmelweis University, Tűzoltó u. 58., 1094 Budapest, Hungary

⁵ Department of Prosthodontics, Semmelweis University, Szentkirályi u. 47., 1088 Budapest, Hungary

⁶ Institute for Translational Medicine, Szentágothai Research Centre, Medical School, University of Pécs, Szigeti út 24., 7624 Pécs, Hungary

⁷ Division of Pancreatic Diseases, Heart and Vascular Center, Semmelweis University, 1083 Budapest, Hungary

* Author to whom correspondence should be addressed.

J. Clin. Med. 2026, 15(6), 2447; https://doi.org/10.3390/jcm15062447

Submission received: 13 February 2026 / Revised: 19 March 2026 / Accepted: 20 March 2026 / Published: 23 March 2026

(This article belongs to the Special Issue Oral Surgery: Recent Advances and Future Perspectives)


Abstract

Background/Objectives: Odontogenic cysts and ameloblastomas (AB) are mostly asymptomatic, often discovered later due to severe symptoms, and only histopathological examination provides definitive diagnosis. AI-assisted diagnostics offer a fast, noninvasive, painless diagnostic tool. To our knowledge, this is the first meta-analysis aiming to evaluate the classification, detection, and segmentation performance of artificial intelligence (AI) for odontogenic cysts and ABs as distinct entities and to determine if it can achieve clinically acceptable accuracy. Methods: Our systematic search was conducted on 11 January 2026, in Medline, EMBASE, and Cochrane Central Register of Controlled Trials without restrictions or filters. Studies comparing AI diagnostics with histopathological diagnostics for odontogenic cysts and ABs were included. Diagnostic parameters, including sensitivity, specificity, and accuracy, were extracted and analyzed; additionally, diagnostic odds ratios were calculated. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Recommendations of the GRADE workgroup were followed to determine the certainty of evidence. Results: Thirteen articles were found eligible, of which seven were included in our meta-analysis. The group with the highest sensitivity (Se) was the “no lesion” (N) group (0.9726, 95% CI 0.9284–1; I2 = 46%), followed by the radicular cyst (RC) (mean 0.9054, 95% CI 0.8051–1; I2 = 89%), dentigerous cyst (DC) (mean 0.8788, 95% CI 0.7828–0.9749; I2 = 93%), odontogenic keratocyst (OKC) (0.763, 95% CI 0.6999–0.8262; I2 = 14%) and AB (mean 0.4369, 95% CI 0.231–0.6429; I2 = 79%) groups. Results for AB, RC, and DC were statistically significant. 
AB achieved the highest specificity (Sp) (mean 0.9889, 95% CI 0.9736–1; I2 = 0%), followed by RC (mean 0.9724, 95% CI 0.9431–1; I2 = 79%), DC (mean 0.9516, 95% CI 0.9116–0.9917; I2 = 90%), N (mean 0.9226, 95% CI 0.8385–1; I2 = 95%) and OKC (mean 0.8991, 95% CI 0.8683–0.9298; I2 = 8%) groups. DC, N, and RC had statistically significant results. Diagnostic odds ratios (DOR) showed that classification was better than chance for all lesion types. Conclusions: AI demonstrated high specificity, and is therefore effective in identifying healthy individuals. However, its sensitivity in detecting diseased patients remains suboptimal and requires further improvement.

Keywords:

artificial intelligence; convolutional neural networks; cysts; ameloblastoma; diagnosis; meta-analysis; review

1. Introduction

AI algorithms revolutionize modern healthcare. In dentistry, key applications include diagnostic support, personalized treatment planning (such as orthodontics or implantology), and risk analysis for conditions such as oral cancer, caries, and periodontitis [1].

Convolutional neural networks (CNNs), which are used in dental diagnostics and imaging, were inspired by the visual cortex of the brain [2]. Their exceptional ability to analyze spatial information has resulted in high diagnostic accuracy across various areas, including oral cancer detection and classification [3], caries diagnostics [4], and periodontology [5].

Odontogenic cysts are common lesions in the oral cavity (13.8% prevalence) [6]. Ameloblastomas (AB), although benign, are locally aggressive and account for 10% of odontogenic tumors, affecting 0.5 per million people per year worldwide [2].

They are mostly asymptomatic, often discovered incidentally or later on due to severe symptoms such as tooth displacement, malocclusion, facial asymmetry, or pathological fractures. The possibility of malignant transformation has also been described in the literature [3].

Despite severe complications, a definitive diagnosis can only be made after an invasive and time-consuming histopathological examination of the lesion removed.

As cysts and ABs are usually visible on panoramic radiographs (OPGs), CNN analysis provides a new diagnostic solution for early detection (recognizing the presence of a particular object), classification (assigning the detected object to a predefined group), and segmentation (identifying the pixels that belong to the detected structure) of lesions at a stage where treatment is easier and severe complications can be avoided [7].

Meta-analyses support the continued use of AI methods, as the measured classification accuracy, sensitivity (Se), and specificity (Sp) ranged between 0.8 and 0.9 [7,8].

Data on individual cyst types or AB are fundamental for the validation of AI tools in clinical practice. However, these studies focused on a pooled set of cysts and did not assess the classification performance for each lesion type separately.

Research on detection [9] and segmentation [10] is promising, but still limited, with no meta-analysis available to date.

We aim to provide a comprehensive overview of recent advancements in classification, detection, and segmentation performance. In addition, we aim to verify whether artificial intelligence (AI) diagnostics can achieve clinically acceptable accuracy.

2. Materials and Methods

This systematic review and meta-analysis followed the PRISMA 2020 guideline [11] (Supplementary Materials—Table S1) and the Cochrane Handbook [12]. We adhered to the previously registered protocol (registration number CRD42024523372) with minor exceptions: in the meta-analysis part, positive predictive value (PPV) and negative predictive value (NPV) could not be evaluated due to missing data. We could only conduct a systematic review for detection performance. For this assessment, only Se and PPV could be incorporated from the predefined primary outcomes; however, average precision (AP) and F1 score were additionally included. Sample size weighted averages were calculated to reflect the trends reported in each study.

2.1. Eligibility Criteria

Using the PIRD (Population, Index test, Reference test, Diagnosis of interest) framework, we included patients with panoramic radiographs (P). AI diagnostics (I) were compared to histopathological diagnostics (R) to assess their accuracy in diagnosing odontogenic cysts and AB (D). Diagnostic performance was assessed using Se and Sp, the area under the receiver operating characteristic curve (AUC), and positive and negative predictive values.

Inclusion and Exclusion Criteria

Diagnostic studies using panoramic radiographs were included if they reported data for any lesion type separately. Diagnoses had to be based on histopathological examination.

Reviews and studies with different imaging techniques (e.g., CBCT and periapical radiographs) were excluded.

2.2. Information Sources

We conducted a systematic search on 11 January 2026, in Medline, EMBASE, and Cochrane Central Register of Controlled Trials, without any restrictions or filters. In addition, the citation and reference lists of included articles were checked manually. Our search strategy revolved around cyst types, “ameloblastoma” and “artificial intelligence” (Table 1).

2.3. Selection Process

The EndNote X9 (Clarivate Analytics, Philadelphia, PA, USA) reference management software was used. First, duplicates were removed. Subsequently, two independent authors (A.T. and D.T.) screened articles by title and abstract, then by full text. For both steps, interrater reliability was evaluated using Cohen’s kappa coefficients. Disagreements were resolved by a third independent investigator (M.K.). Study authors were contacted when full texts were unavailable.
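The interrater agreement statistic mentioned above can be sketched as follows. This is a minimal illustration of Cohen's kappa for two raters; the function name and the screening decisions are hypothetical and are not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical title/abstract screening decisions, for illustration only.
reviewer_1 = ["include", "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "include", "include", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a simple percentage when most records are excluded.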

2.4. Data Collection Process

Two authors (A.T. and D.T.) collected data individually in a standardized form (Excel [Microsoft Corporation, Redmond, WA, USA] data sheet). Disagreements were resolved by a third independent investigator (M.K.). Study authors were contacted in cases of missing or unclear data.

2.5. Data Items

The following data were extracted: first author, year of publication, type of lesion, AI system and its structure, number of panoramic radiographs per group, and outcomes.

The statistical models of AI were classified according to their mathematical structure. All included articles used neural networks and deep learning models belonging to CNNs.

Primary outcomes included Se, Sp, AUC, and positive and negative predictive values. The numbers or ratios of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) results were also collected when available.
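As a brief illustration of how Se and Sp follow from the extracted TP, FP, TN, and FN counts, the sketch below uses the standard definitions. The counts are hypothetical and do not come from any included study.

```python
def sensitivity(tp, fn):
    """Share of diseased cases that the test flags as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of non-diseased cases that the test flags as negative."""
    return tn / (tn + fp)


def accuracy(tp, fp, tn, fn):
    """Share of all cases that are classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, tn, fn = 45, 4, 96, 5
se = sensitivity(tp, fn)
sp = specificity(tn, fp)
acc = accuracy(tp, fp, tn, fn)
print(se, sp, acc)
```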

Groups were formed according to lesion types available: AB, RC, DC, and OKC. As numerous articles tended to include a “No lesion group”, we also adapted it into our analysis as group N.

2.6. Study Risk of Bias Assessment

For each study, patient selection, index test, reference standard, and the flow and timing were assessed across four domains of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. All domains for risk of bias, as well as the first three for applicability concerns, could be described as “low,” “unclear,” or “high” [13].

Two independent examiners (A.T. and D.T.) conducted the assessment, and disagreements were resolved through a third independent author (M.K.).

2.7. Synthesis Methods

For the classification performance outcome, random-effect meta-analyses were fitted for each lesion type with at least three available studies that included either true positive, true negative, false positive and false negative numbers or a point and interval estimate of specificity and sensitivity. Specificity and sensitivity were estimated with 95% confidence intervals (CI), using a random intercept logistic regression model as recommended by Schwarzer et al. [14] and Stijnen et al. [15]. The maximum likelihood method was used to estimate the measure of heterogeneity variance (τ²). The Clopper–Pearson method [16] was used to estimate CIs of each study.
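The per-study Clopper–Pearson interval can be sketched without external libraries by inverting the binomial distribution numerically. This is only an illustrative implementation under that assumption (the analysis itself was run in R), and the helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(g, iters=60):
    """Bisection on [0, 1] for a function that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    # Lower bound: the p at which P(X >= k | p) equals alpha / 2.
    if k == 0:
        lower = 0.0
    else:
        lower = _root_of_increasing(
            lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
        )
    # Upper bound: the p at which P(X <= k | p) equals alpha / 2.
    if k == n:
        upper = 1.0
    else:
        upper = _root_of_increasing(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


# Hypothetical study: 45 of 50 diseased cases detected.
print(clopper_pearson(45, 50))
```

The exact interval never leaves the range 0 to 1, which matters for studies that report a sensitivity or specificity of exactly 1.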

We plotted the individual and pooled sensitivities and specificities of the included studies, their summary estimates, and the corresponding 95% confidence and prediction regions on forest plots.

PPV and NPV results were calculated from the estimated specificity and sensitivity values at a prevalence rate of 30%. The equations for these calculations are

$$\mathrm{PPV} = \frac{\mathrm{Sensitivity} \times \mathrm{Prevalence}}{\mathrm{Sensitivity} \times \mathrm{Prevalence} + (1 - \mathrm{Specificity}) \times (1 - \mathrm{Prevalence})}$$

and

$$\mathrm{NPV} = \frac{\mathrm{Specificity} \times (1 - \mathrm{Prevalence})}{\mathrm{Specificity} \times (1 - \mathrm{Prevalence}) + (1 - \mathrm{Sensitivity}) \times \mathrm{Prevalence}}.$$
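The two predictive-value equations can be evaluated directly. The sketch below applies them to the pooled RC estimates quoted in this review (Se 0.9054, Sp 0.9724) at the stated 30% prevalence; the resulting figures only illustrate the formulas and are not results reported by the authors.

```python
def ppv(se, sp, prevalence):
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(se, sp, prevalence):
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_neg = sp * (1 - prevalence)
    false_neg = (1 - se) * prevalence
    return true_neg / (true_neg + false_neg)


# Pooled RC estimates from this review, at the 30% prevalence used here.
rc_ppv = ppv(0.9054, 0.9724, 0.30)
rc_npv = npv(0.9054, 0.9724, 0.30)
print(round(rc_ppv, 4), round(rc_npv, 4))
```

Because both values depend on prevalence, they would shift if the test were applied in a population where the lesion is rarer or more common than 30%.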

The diagnostic odds ratio (DOR) with its 95% CI was also calculated, which is a single indicator that combines sensitivity and specificity of a diagnostic test, thus simplifying the comparison of test performance. It is defined as the ratio of odds of a positive test result for individuals with the disease to the odds of a positive test result in individuals without the disease. The DOR ranges from 0 to infinity, with higher values indicating better performance [17].
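For a single study, the DOR definition above reduces to (TP × TN) / (FP × FN). The sketch below computes it with a Wald-type interval on the log scale; that interval and the 0.5 continuity correction are common conventions assumed here for illustration, not the random-effects pooling used in this review, and the counts are hypothetical.

```python
from math import exp, log, sqrt


def dor_from_se_sp(se, sp):
    """Odds of a positive test in the diseased over the odds in the non-diseased."""
    return (se / (1 - se)) / ((1 - sp) / sp)


def dor_with_ci(tp, fp, tn, fn, z=1.96):
    """Single-study DOR with a Wald-type 95% interval on the log scale."""
    cells = [tp, fp, tn, fn]
    if 0 in cells:
        # Assumed convention: add 0.5 to every cell when any cell is zero.
        cells = [c + 0.5 for c in cells]
    tp, fp, tn, fn = cells
    dor = (tp * tn) / (fp * fn)
    se_log = sqrt(1 / tp + 1 / fp + 1 / tn + 1 / fn)
    return dor, exp(log(dor) - z * se_log), exp(log(dor) + z * se_log)


# Hypothetical confusion-matrix counts, for illustration only.
dor, ci_low, ci_high = dor_with_ci(tp=45, fp=4, tn=96, fn=5)
print(dor, ci_low, ci_high)
```

Small cell counts inflate the standard error of log(DOR), which is one reason DOR intervals can span several orders of magnitude.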

Heterogeneity was assessed by calculating I² measure and its confidence interval arising from separate univariate analyses.
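As a rough illustration of the I² measure, the sketch below uses the generic definition based on Cochran's Q with inverse-variance weights. This is an assumption made for clarity: the review's models were fitted in R, and the estimates and variances shown are hypothetical.

```python
def i_squared(estimates, variances):
    """I² (in percent) from Cochran's Q with inverse-variance weights."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    # Fixed-effect pooled estimate used as the reference point for Q.
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= df:
        return 0.0  # no variation beyond what chance alone would produce
    return 100.0 * (q - df) / q


# Hypothetical study-level estimates and variances, for illustration only.
print(round(i_squared([0.2, 0.8], [0.01, 0.01]), 2))
```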

For each lesion type, we also estimated a pooled random-effect meta-ROC curve with a 95% confidence region using the non-parametric approach in the nsROC package [18], which implemented the methodology proposed by Martinez-Camblor et al. [19].

Statistical analyses were carried out with R statistical software (version 4.1.2., R-core team, 2023) [20] using the meta [21] and the lme4 [22] packages, based in part on the web-tool of Freeman et al. [23]. Statistical analyses followed the advice of Harrer et al. [24].

For detection performance outcome, the criteria for a meta-analysis were not fulfilled. Instead of a meta-analysis, a summary plot was created to visualize the results reported in the studies. For the PPV, sensitivity, F1-score and AP, a simple, sample size weighted average was plotted for each lesion type to better show the trends reported in each study.
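The sample size weighted average used for the detection summary can be sketched as below. The function name and the input numbers are hypothetical and serve only to show the arithmetic.

```python
def weighted_average(values, sample_sizes):
    """Average of study-level values, each weighted by its sample size."""
    if len(values) != len(sample_sizes) or not values:
        raise ValueError("values and sample sizes must match and be non-empty")
    total = sum(sample_sizes)
    return sum(v * n for v, n in zip(values, sample_sizes)) / total


# Hypothetical sensitivities from two studies with 100 and 300 images.
pooled_se = weighted_average([0.70, 0.90], [100, 300])
print(pooled_se)
```

Unlike a meta-analytic estimate, this average carries no confidence interval and does not model between-study variance, so it shows trends only.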

2.8. Certainty of Evidence

The recommendations of the “Grades of Recommendation, Assessment, Development, and Evaluation (GRADE)” workgroup were followed to assess the certainty of evidence [25]. Two reviewers (A.T. and D.T.) conducted the evaluation individually. A third independent investigator (M.K.) resolved disagreements.

3. Results

3.1. Search and Selection

Altogether, the systematic search identified 5664 articles. After duplicate removal, 4808 studies were screened by title and abstract, and 33 were found eligible. Finally, 11 articles were included after the full-text selection [9,10,26,27,28,29,30,31,32,33,34]. Twenty articles had to be excluded because of their different study designs [35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54] and two because of missing histopathological diagnoses [55,56].

Citations and references were also searched. Of 499 articles, 21 were searched for retrieval. Seven reports could not be retrieved [57,58,59,60,61,62,63] and 12 had different designs [64,65,66,67,68,69,70,71,72,73,74,75]. Finally, two more studies could be added to our review [76,77] (Figure 1).

Seven articles included confidence intervals (CI) or confusion matrices with TP, TN, FP, and FN data, so a meta-analysis was possible only for classification performance. Detection and segmentation were addressed in the systematic review.

3.2. Basic Characteristics of Included Studies

The baseline characteristics of the included studies can be found in Supplementary Materials—Table S2.

3.3. Classification—Meta-Analysis

3.3.1. Sensitivity (Se)

Two articles provided data for AB (mean 0.4369, 95% CI 0.231–0.6429; I2 = 79%). Four studies were available for DC (mean 0.8788, 95% CI 0.7828–0.9749; I2 = 93%), OKC (0.763, 95% CI 0.6999–0.8262; I2 = 14%), and N groups (0.9726, 95% CI 0.9284–1; I2 = 46%). Five articles were included for RC (mean 0.9054, 95% CI 0.8051–1; I2 = 89%). AB, RC and DC had statistically significant results. Results were clinically significant for each group, except for AB (Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6).

3.3.2. Specificity (Sp)

Two articles reported measurements for AB (mean 0.9889, 95% CI 0.9736–1; I2 = 0%), with four articles for DC (mean 0.9516, 95% CI 0.9116–0.9917; I2 = 90%), OKC (mean 0.8991, 95% CI 0.8683–0.9298; I2 = 8%) and N (mean 0.9226, 95% CI 0.8385–1; I2 = 95%), as well as five articles for RC (mean 0.9724, 95% CI 0.9431–1; I2 = 79%). DC, N and RC had statistically significant results. All groups achieved clinically significant values (Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6).

3.3.3. Diagnostic Odds Ratio (DOR)

Two articles about AB could be included (mean 63.8918, 95% CI 0.0137–298,254.066; I2 = 32%), with four about DC (mean 441.1951, 95% CI 0.9041–215,309.4163; I2 = 97%), OKC (mean 25.862, 95% CI 9.1302–73.2566; I2 = 46%) and N (mean 1508.0169, 95% CI 1.3792–1,648,837.3325; I2 = 95%), as well as five about RC (mean 1044.5518, 95% CI 11.1264–98,063.2391; I2 = 96%). DC, N and RC had statistically significant results (Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6).

3.3.4. Area Under the Curve (AUC)

Receiver operating characteristic (ROC) curves were generated for each group. The AUC was 0.714 for AB, 0.901 for DC, 0.93 for N, 0.823 for OKC and 0.959 for RC (Supplementary Materials—Figure S1).

3.4. Systematic Review

3.4.1. Detection

Positive Predictive Value

Four articles provided information on PPV. Kwon et al. measured the total PPV of 0.78 from DC, RC, OKC, AB, and N images [9]. Yu et al. published 0.6132 for DC, 0.575 for RC, 0.4988 for AB, and 0.5119 for OKC, with a total PPV of 0.5497 [10]. Watanabe et al. measured PPV in two groups: the first dataset had a PPV of 0.886 for RC and 0.933 for the pooled DC, OKC, and nasopalatine duct cyst (NPDC) group, and the overall PPV was 0.898. In the second dataset, the PPV for RC was 0.87, and 1 in the pooled group of DC, OKC, and NPDC, so the total PPV was 0.9 [33]. Rašić et al. measured a PPV of 0.858 for RC [30].

Considering these results, we calculated the sample size weighted averages for each lesion type: 0.4988 for AB, 0.69 for DC, 0.96 for OKC, 0.83 for RC, and 0.7 in total (Supplementary Materials—Figure S2).

Sensitivity

There were seven articles about Se. Kwon et al. measured the total Se of 0.74 from DC, RC, OKC, AB, and N images [9]. Ariji et al. found an Se value of 0.71 for AB, 1 for OKC, 0.88 for DC, and 0.81 for RC [26]. Yu et al. published 0.7236 for DC, 0.6349 for RC, 0.5112 for AB, and 0.6337 for OKC, with a total Se of 0.6259 [10]. Kise et al. measured 0.71 for AB, 0.88 for OKC, 1 for DC, 0.75 for RC, and 0.92 for Stafne cyst, and the overall value was 0.87 [27]. Watanabe et al. measured Se in two groups: in the first dataset, 0.78 for RC and 0.667 for the pooled group of DC, OKC, and NPDC, ending with a total Se of 0.746. In the second dataset, the Se for RC was 0.8 and 0.7 for the pooled group of DC, OKC, and NPDC, so the overall Se value was 0.771 [33]. Kang et al. measured an Se value of 0.893 for AB, 0.814 for OKC, and 0.917 for DC [76]. Rašić et al. measured an Se value of 0.667 for RC [30].

Considering these results, we calculated the sample size weighted averages for each lesion type: 0.82 for AB, 0.88 for DC, 0.83 for OKC, 0.71 for RC, and 0.71 in total (Supplementary Materials—Figure S2).

F1 Score

Two articles assessed the F1 score. Kwon et al. measured the total F1 score of 0.76 from DC, RC, OKC, AB, and N images [9]. Watanabe et al. measured the F1 score in two groups: in the first dataset, 0.83 for RC and 0.778 for the pooled DC, OKC, and NPDC group, ending with a total mean of 0.815. In the second dataset, F1 for RC was 0.833 and 0.824 for the pooled DC, OKC, and NPDC group, so the overall mean value was 0.831 [33].

Considering these results, we calculated the sample size weighted averages for available lesion types: 0.79 for DC, 0.79 for OKC, 0.83 for RC, and 0.78 in total (Supplementary Materials—Figure S2).
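The F1 score is the harmonic mean of PPV (precision) and Se (recall), so the reported values can be cross-checked against the PPV and Se figures quoted in the preceding subsections. The sketch below does this for Kwon et al. and for the first Watanabe et al. dataset; because the inputs are already rounded, small deviations from the published F1 values are expected.

```python
def f1_score(ppv, se):
    """Harmonic mean of positive predictive value and sensitivity."""
    if ppv + se == 0:
        return 0.0
    return 2 * ppv * se / (ppv + se)


# (PPV, Se, reported F1) as quoted in this review.
reported = {
    "Kwon et al., total": (0.78, 0.74, 0.76),
    "Watanabe et al., dataset 1, RC": (0.886, 0.78, 0.83),
    "Watanabe et al., dataset 1, pooled": (0.933, 0.667, 0.778),
}

for label, (ppv, se, f1_reported) in reported.items():
    print(label, round(f1_score(ppv, se), 3), f1_reported)
```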

Average Precision

Three studies measured AP. Kwon et al. reported an AP of 0.91 for DC, 0.79 for RC, 0.67 for OKC, 0.78 for AB, and a pooled mean of 0.79 with a standard deviation of 0.12 [9]. Yu et al. published 0.7202 for DC, 0.6954 for RC, 0.6543 for AB, and 0.6432 for OKC and a pooled mean of 0.6783 [10]. Ver Berne et al. measured 0.83 AP for RC and 0.74 for periapical granulomas [32].

Considering these results, we calculated the sample size weighted averages for each lesion type: 0.76 for AB, 0.84 for DC, 0.67 for OKC, 0.77 for RC, and a mean of 0.73 altogether (Supplementary Materials—Figure S2).

3.4.2. Segmentation

Only two articles investigated segmentation. The Se of the DL CNN system of Yu et al. was 0.7327 for DC, 0.6751 for RC, 0.5135 for AB, and 0.6422 for OKC, and the average was 0.6409. The Sp was 0.7142 for DC, 0.7253 for RC, and 0.689 for OKC, with an overall mean of 0.7064. Pixel accuracy was 0.7132 for DC, 0.6843 for RC, 0.6725 for AB, and 0.6542 for OKC, and the average was 0.6811. The intersection over union (IoU) was 0.7326 for DC, 0.7234 for RC, 0.6754 for AB, and 0.7023 for OKC, and the mean was 0.7084 [10].

Sivasundaram et al. published data about a modified LeNet CNN. They measured an Se of 0.987 for DC, 0.989 for RC, 0.988 for “odontogenic cyst”, and 0.988 in total. The Sp was 0.987 for DC, 0.989 for RC, 0.988 for “odontogenic cyst”, and 0.988 in total. Pixel accuracy was 0.985 for DC, 0.976 for RC, 0.984 for “odontogenic cyst,” and 0.985 in total. The negative predictive value was 0.976 for DC, 0.978 for RC, 0.971 for “odontogenic cyst”, and 0.9783 in total. The positive predictive value was 0.976 for DC, 0.971 for RC, 0.971 for “odontogenic cyst”, and 0.972 in total. The IoU was 0.973 for DC, 0.981 for RC, 0.976 for “odontogenic cyst,” and 0.976 in total. The dice similarity coefficient (DSC) was 0.987 for DC, 0.976 for RC, 0.989 for odontogenic cyst, and 0.9804 in total [77].
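For readers less familiar with the segmentation metrics, the sketch below computes IoU, DSC, and pixel accuracy from two binary masks using their standard definitions. The tiny masks are hypothetical, and treating two empty masks as a perfect match is an assumption.

```python
def segmentation_metrics(pred, truth):
    """IoU, Dice coefficient and pixel accuracy for two binary masks."""
    tp = fp = tn = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1  # lesion pixel correctly marked
            elif p:
                fp += 1  # background marked as lesion
            elif t:
                fn += 1  # lesion pixel missed
            else:
                tn += 1  # background correctly left unmarked
    union = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    pixel_accuracy = (tp + tn) / (tp + fp + tn + fn)
    return iou, dice, pixel_accuracy


# Hypothetical 3 x 4 masks (1 = lesion pixel), for illustration only.
truth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred = [[0, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
iou, dice, pixel_accuracy = segmentation_metrics(pred, truth)
print(iou, dice, round(pixel_accuracy, 4))
```

Pixel accuracy counts the (usually large) background area as correct, so it tends to look better than IoU or DSC on the same masks.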

3.5. Risk of Bias Assessment

We used the QUADAS-2 tool for our analysis. Most articles indicated a low risk of bias in the domains “index test,” “reference standard,” and “flow and timing.” In the “patient selection” domain, seven studies showed a low risk of bias, five were classified as unclear, and one had a high risk of bias [77]. One high-risk article was found in each of the “index test” [28] and “flow and timing” [76] domains. There were no articles with high-risk concerns in any of the applicability domains. For applicability, the “index test” and “reference standard” domains each contained only one study with an unclear risk, while all articles in the “patient selection” domain were assessed as low-risk. The results are presented in Supplementary Materials—Figure S3.

3.6. Publication Bias and Heterogeneity

Publication bias was evaluated through funnel plots. They appeared symmetrical, indicating a low risk of publication bias. However, due to the low number of articles analyzed, these results should be interpreted with caution (Supplementary Materials—Figure S4).

Heterogeneity was high for all lesion types except for OKC. This result can be attributed to the variations of AI models used in the analysis and the differences in training and image augmentation.

3.7. Certainty of Evidence Assessment

Most studies had moderate certainty of evidence. The only exception was AB, which was rated very low due to a lack of a large effect. Risk of bias and “indirectness” were mostly not considered serious, while “inconsistency” and “imprecision” tended to be serious. Other factors that affected this were undetected publication bias and the large magnitude of effect (Supplementary Materials—Figure S5).

4. Discussion

This article aimed to assess the current knowledge on the accuracy of AI diagnostics of cysts and AB in OPGs. Due to the lack of existing literature, a meta-analysis was feasible only for assessing classification performance.

The mean Se and Sp for RC and N were both above 0.9. The Sp of DC was high, and its mean Se of 0.8788 approached 0.9. In the OKC group, neither mean value reached 0.9: the Se was 0.763, while the Sp, at 0.8991, came very close. The assessment of AB was limited to two articles that reported low Se, but an Sp value exceeding 0.9.

Establishing a minimally acceptable threshold for both Se and Sp may be challenging [78]. Power et al. consider a test to be effective if Se + Sp is at least 1.5, where 2 is perfect, and 1 is useless. By this criterion, the test was beneficial in all groups except AB [79].
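The Se + Sp criterion can be checked against the pooled estimates reported in Sections 3.3.1 and 3.3.2. The short sketch below is only an arithmetic illustration; the dictionary layout and function name are hypothetical.

```python
# Pooled (Se, Sp) per group, as reported in Sections 3.3.1 and 3.3.2.
POOLED = {
    "AB": (0.4369, 0.9889),
    "RC": (0.9054, 0.9724),
    "DC": (0.8788, 0.9516),
    "OKC": (0.763, 0.8991),
    "N": (0.9726, 0.9226),
}


def meets_threshold(se, sp, threshold=1.5):
    """True if Se + Sp reaches the threshold (2 is perfect, 1 is useless)."""
    return se + sp >= threshold


below = [group for group, (se, sp) in POOLED.items() if not meets_threshold(se, sp)]
for group, (se, sp) in POOLED.items():
    print(group, round(se + sp, 4), meets_threshold(se, sp))
```

Only AB falls below 1.5 (0.4369 + 0.9889 = 1.4258), and the shortfall comes entirely from its low sensitivity.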

Leeflang et al. noted that the sensitivity and specificity of AI should match or exceed those of other existing methods [78].

The only noninvasive diagnostic alternative is human examination. Yang et al. found that AI diagnostics were at least comparable to expert dentist opinions [34]. In their study, Cardoso et al. examined the classification performance of individuals with varying levels of expertise in interpreting OPGs. They reported a mean Se of 0.6133 and a mean Sp of 0.86 for AB, a mean Se of 0.84 and mean Sp of 0.88 for DC, and a mean Se of 0.6267 and a mean Sp of 0.8 for OKC. Comparing these results to ours, we can see that AI was superior in all groups and all measurements, except for the Se of AB [80].

Some studies did not report confusion matrices or CI. They were, therefore, inappropriate for our statistical analysis. However, their results were consistent with ours.

None of the prior meta-analyses separately evaluated the classification performance for odontogenic cysts or AB.

Although Shrivastava et al. grouped simple bone cysts, Stafne bone cysts, and glandular odontogenic cysts with odontogenic cysts within OPGs, their Se of 0.93 (95% CI 0.77–0.98) and Sp of 0.93 (95% CI 0.83–0.97) were consistent with our results [7].

Fedato Tobias et al. also reported high diagnostic accuracy with the pooled dataset of odontogenic OKC and AB [8]. Although our analysis indicated a lower performance for AB, when pooled with the OKC group, the results were consistent.

DOR exceeded 1 in all cases, indicating that each test performed better than chance. Higher values signify superiority in distinguishing those with and without the given condition. Of the groups, N achieved the most impressive result, followed by RC, DC, and AB, while OKC recorded the lowest DOR.

ROC curves were generated to assess the AUC. According to Šimundić et al., the diagnostic accuracy of a test is considered “excellent” if the AUC is between 0.9 and 1, “very good” if it is between 0.8 and 0.9, and “good” if the value is between 0.7 and 0.8 [81]. In this analysis, DC, RC, and N were rated as “excellent,” OKC as “very good,” and AB as “good.” It is important to note that the number of articles included was low and that their results clustered in a similar area of the plot (the upper left corner), which calls for cautious interpretation.
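The AUC bands quoted above can be applied mechanically to the values from Section 3.3.4, as in the sketch below. Assigning a boundary value such as 0.9 to the higher band is an assumption, and values under 0.7 are not labelled because the quoted scale does not cover them here.

```python
def rate_auc(auc):
    """Map an AUC to the verbal bands quoted from Šimundić et al."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "very good"
    if auc >= 0.7:
        return "good"
    return "below the quoted bands"


# AUC values reported in Section 3.3.4.
AUCS = {"AB": 0.714, "DC": 0.901, "N": 0.93, "OKC": 0.823, "RC": 0.959}
ratings = {group: rate_auc(auc) for group, auc in AUCS.items()}
print(ratings)
```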

Detection performance could be illustrated with the help of Se.

For AB, measured values ranged from 0.5112 to 0.893. The calculated sample size weighted mean was 0.82. For OKC detection, data from 0.6337 to 1 were published. The sample size weighted mean was 0.83. In the RC group, the Se was 0.6349–0.81, and the weighted mean was 0.71. The highest range of 0.7236–1 belonged to DC, with a weighted mean of 0.88.

A wide range of intra- and inter-subgroup differences was observed.

In some instances, Se ranged from a practically unsatisfactory 0.6337 to 1, indicating that all lesions were perfectly distinguished as positive, without any FNs. Variations in the amounts of training data, AI systems employed, or study designs serve as plausible explanations for these inconsistencies.

The highest mean values were reported for DC, followed by OKC, AB, and RC. The sample size weighted means were in the 0.8–0.9 range, which is considered desirable in the literature, for all groups except RC (0.71).

Two studies measured F1 scores ranging from 0.76 to 0.833. However, direct comparison between these studies is not feasible, as they investigated different pools of lesions rather than individual estimates on both occasions.

Three articles reported AP values. Results for each lesion type were congruent. The overall mean value was 0.73, which implies a need for development.

Our conclusions on detection agree with those of the systematic review by Fedato Tobias et al. [8].

The two available articles on segmentation present conflicting results. Sivasundaram et al. reported outstanding values for DC, RC, and “odontogenic cyst” groups: Se and Sp were higher than 0.9 [77]. In contrast, all Se and Sp values reported by Yu et al. for DC, RC, AB, and OKC remained below 0.8. IoU measurements in both studies displayed similar trends [10].

4.1. Strengths and Limitations

Following a previously published protocol, this study has transparently summarized all available evidence about AI diagnostics of odontogenic cysts and ABs, based on a rigorous methodology. Each lesion type was investigated and evaluated separately to comprehensively summarize all available classification, detection, and segmentation performance evidence.

Although the whole spectrum of odontogenic cysts was explored in our meta-analysis, a major limitation was the small number of eligible studies, and further studies are needed. Variations in AI algorithms and study designs (e.g., the number of training images and augmentation) led to higher heterogeneity in some subgroups. This may represent a potential source of bias in the results and therefore warrants careful interpretation.

Image augmentation featured in numerous articles. As this method showed great diversity, non-augmented data were extracted for greater homogeneity and precision where possible. In cases where this was not feasible, information from the augmented group, which typically encompassed a larger population, was included. Consequently, results from the augmented data may be overrepresented in our pooled analysis, which is a limitation of this study.

Researchers have highlighted the dubious applicability of the QUADAS-2 tool for AI and implemented a modified solution, “QUADAS-AI.” As the tool is under development, we followed the recommendations of Cochrane and utilized QUADAS-2. Consequently, caution should be exercised in drawing conclusions [82].

4.2. Implications for Practice

Translation of scientific achievements into healthcare is pivotal [83,84]. As a first step, healthcare professionals should be familiarized with and trained in the use of novel AI tools. CNNs can be used either as a supplementary option for evaluating OPGs in routine dental check-ups or as a prediction tool for the classification of detected cysts.

Leeflang et al. stated that a test could be beneficial as a first-line solution even if the Se or Sp was lower. Therefore, until further development of Se, AI tests can be used under supervision to improve everyday diagnostics [78].

4.3. Implications for Research

More robust data is needed for a proper meta-analysis on detection and segmentation performance. The consistently high Sp value across all groups suggests that AI effectively minimizes FP results, reducing unnecessary invasive procedures. However, improvements in sensitivity are necessary to better identify diseased patients.

Future research should also address the diagnostic accuracy of tools integrating multiple modalities (e.g., detection combined with classification), along with a more in-depth examination of ethical considerations.

Furthermore, a more standardized and transparent approach to publication is essential. Providing clear details on AI systems, augmentation techniques, and the number of images used for training, testing, and validation will help accelerate progress in this field.

5. Conclusions

AI effectively identifies healthy individuals due to its high specificity. However, its sensitivity for disease detection remains suboptimal and requires further improvement.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/jcm15062447/s1: Figure S1: ROC curves for classification performance; Figure S2: Detection results; Figure S3: Risk of bias assessment; Figure S4: Funnel plots; Figure S5: GRADE assessment; Table S1: PRISMA Checklist; Table S2: Baseline Characteristics Table.

Author Contributions

A.T.: investigation, data curation, writing—original draft; D.T.: investigation; B.G.N.C.: methodology, project administration, writing—review and editing; B.S.: formal analysis, visualization; A.S.W.: methodology, validation; G.G.: resources, validation; P.H. (Péter Hermann): methodology, validation; G.V.: methodology, supervision; P.H. (Péter Hegyi): methodology, resources, supervision; M.K.: investigation, writing—review and editing, conceptualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in the full-text articles included in the systematic review and meta-analysis.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AB	Ameloblastoma
AI	Artificial intelligence
AUC	Area under the receiver operating characteristic curve
CBCT	Cone beam computed tomography
CI	Confidence interval
CNN	Convolutional neural network
DC	Dentigerous cyst
DOR	Diagnostic odds ratio
FN	False negative
FP	False positive
GRADE	Grading of Recommendations, Assessment, Development, and Evaluation
IoU	Intersection over union
N	No lesion
NPDC	Nasopalatine duct cyst
NPV	Negative predictive value
OKC	Odontogenic keratocyst
OPG	Orthopantomography
PIRD	Population, Index test, Reference test, Diagnosis of interest
PPV	Positive predictive value
PRISMA	Preferred Reporting Items for Systematic Reviews and Meta-Analyses
QUADAS	Quality Assessment of Diagnostic Accuracy Studies
RC	Radicular cyst
ROC	Receiver operating characteristic
Se	Sensitivity
Sp	Specificity
TN	True negative
TP	True positive

References

1. Dhingra, K. Artificial intelligence in dentistry: Current state and future directions. Bull. R. Coll. Surg. Engl. 2023, 105, 380–383.
2. Min, S.; Lee, B.; Yoon, S. Deep learning in bioinformatics. Brief. Bioinform. 2017, 18, 851–869.
3. Warin, K.; Limprasert, W.; Suebnukarn, S.; Jinaporntham, S.; Jantana, P.; Vicharueang, S. AI-based analysis of oral lesions using novel deep convolutional neural networks for early detection of oral cancer. PLoS ONE 2022, 17, e0273508.
4. Carvalho, B.K.G.; Nolden, E.L.; Wenning, A.S.; Kiss-Dala, S.; Agocs, G.; Roth, I.; Keremi, B.; Geczi, Z.; Hegyi, P.; Kivovics, M. Diagnostic accuracy of artificial intelligence for approximal caries on bitewing radiographs: A systematic review and meta-analysis. J. Dent. 2024, 151, 105388.
5. Li, H.; Zhou, J.; Zhou, Y.; Chen, Q.; She, Y.; Gao, F.; Xu, Y.; Chen, J.; Gao, X. An Interpretable Computer-Aided Diagnosis Method for Periodontitis from Panoramic Radiographs. Front. Physiol. 2021, 12, 655556.
6. Salihu, B.; Ahmedi, J.; Ademi Abdyli, R.; Recica, B.; Shkreta, M.; Jerliu, N. Global prevalence of odontogenic cysts: A systematic review. Saudi Dent. J. 2026, 38, 23.
7. Shrivastava, P.K.; Hasan, S.; Abid, L.; Injety, R.; Shrivastav, A.K.; Sybil, D. Accuracy of machine learning in the diagnosis of odontogenic cysts and tumors: A systematic review and meta-analysis. Oral Radiol. 2024, 40, 342–356.
8. Fedato Tobias, R.S.; Teodoro, A.B.; Evangelista, K.; Leite, A.F.; Valladares-Neto, J.; de Freitas Silva, B.S.; Yamamoto-Silva, F.P.; Almeida, F.T.; Silva, M.A.G. Diagnostic capability of artificial intelligence tools for detecting and classifying odontogenic cysts and tumors: A systematic review and meta-analysis. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 2024, 138, 414–426.
9. Kwon, O.; Yong, T.H.; Kang, S.R.; Kim, J.E.; Huh, K.H.; Heo, M.S.; Lee, S.S.; Choi, S.C.; Yi, W.J. Automatic diagnosis for cysts and tumors of both jaws on panoramic radiographs using a deep convolution neural network. Dentomaxillofac. Radiol. 2020, 49, 20200185.
10. Yu, D.; Hu, J.; Feng, Z.; Song, M.; Zhu, H. Deep learning based diagnosis for cysts and tumors of jaw with massive healthy samples. Sci. Rep. 2022, 12, 1855.
11. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, 71.
12. Higgins, J.P. Cochrane Handbook for Systematic Reviews of Interventions; Version 5.0.1; The Cochrane Collaboration: London, UK, 2008; Available online: https://www.cochrane.org/authors/handbooks-and-manuals/handbook/current (accessed on 1 January 2026).
13. Whiting, P.F.; Rutjes, A.W.; Westwood, M.E.; Mallett, S.; Deeks, J.J.; Reitsma, J.B.; Leeflang, M.M.; Sterne, J.A.; Bossuyt, P.M.; QUADAS-2 Group. QUADAS-2: A revised tool for the quality assessment of diagnostic accuracy studies. Ann. Intern. Med. 2011, 155, 529–536.
14. Schwarzer, G.; Chemaitelly, H.; Abu-Raddad, L.J.; Rucker, G. Seriously misleading results using inverse of Freeman-Tukey double arcsine transformation in meta-analysis of single proportions. Res. Synth. Methods 2019, 10, 476–483.
15. Stijnen, T.; Hamza, T.H.; Ozdemir, P. Random effects meta-analysis of event outcome in the framework of the generalized linear mixed model with applications in sparse data. Stat. Med. 2010, 29, 3046–3067.
16. Clopper, C.J.; Pearson, E.S. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial. Biometrika 1934, 26, 404–413.
17. Glas, A.S.; Lijmer, J.G.; Prins, M.H.; Bonsel, G.J.; Bossuyt, P.M. The diagnostic odds ratio: A single indicator of test performance. J. Clin. Epidemiol. 2003, 56, 1129–1135.
18. Pérez Fernández, S.; Martínez Camblor, P.; Filzmoser, P.; Corral Blanco, N.O. nsROC: An R package for non-standard ROC curve analysis. R J. 2018, 10, 55–77.
19. Martinez-Camblor, P. Fully non-parametric receiver operating characteristic curve estimation for random-effects meta-analysis. Stat. Methods Med. Res. 2017, 26, 5–20.
20. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
21. Balduzzi, S.; Rücker, G.; Schwarzer, G. How to perform a meta-analysis with R: A practical tutorial. Evid. Based Ment. Health 2019, 22, 153–160.
22. Bates, D.; Mächler, M.; Bolker, B.; Walker, S. Fitting Linear Mixed-Effects Models Using lme4. J. Stat. Softw. 2015, 67, 1–48.
23. Freeman, S.C.; Kerby, C.R.; Patel, A.; Cooper, N.J.; Quinn, T.; Sutton, A.J. Development of an interactive web-based tool to conduct and interrogate meta-analysis of diagnostic test accuracy studies: MetaDTA. BMC Med. Res. Methodol. 2019, 19, 81.
24. Harrer, M.; Cuijpers, P.; Furukawa, T.; Ebert, D. Doing Meta-Analysis with R: A Hands-On Guide; Chapman & Hall: London, UK, 2021.
25. Bezerra, C.T.; Grande, A.J.; Galvao, V.K.; Santos, D.; Atallah, A.N.; Silva, V. Assessment of the strength of recommendation and quality of evidence: GRADE checklist. A descriptive study. Sao Paulo Med. J. 2022, 140, 829–836.
26. Ariji, Y.; Yanashita, Y.; Kutsuna, S.; Muramatsu, C.; Fukuda, M.; Kise, Y.; Nozawa, M.; Kuwada, C.; Fujita, H.; Katsumata, A.; et al. Automatic detection and classification of radiolucent lesions in the mandible on panoramic radiographs using a deep learning object detection technique. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 2019, 128, 424–430.
27. Kise, Y.; Ariji, Y.; Kuwada, C.; Fukuda, M.; Ariji, E. Effect of deep transfer learning with a different kind of lesion on classification performance of pre-trained model: Verification with radiolucent lesions on panoramic radiographs. Imaging Sci. Dent. 2023, 53, 27–34.
28. Lee, H.S.; Yang, S.; Han, J.Y.; Kang, J.H.; Kim, J.E.; Huh, K.H.; Yi, W.J.; Heo, M.S.; Lee, S.S. Automatic detection and classification of nasopalatine duct cyst and periapical cyst on panoramic radiographs using deep convolutional neural networks. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 2023, 138, 184–195.
29. Lee, J.H.; Kim, D.H.; Jeong, S.N. Diagnosis of cystic lesions using panoramic and cone beam computed tomographic images based on deep learning neural network. Oral Dis. 2020, 26, 152–158.
30. Rašić, M.; Tropčić, M.; Pupić-Bakrač, J.; Subašić, M.; Čvrljević, I.; Dediol, E. Utilizing Deep Learning for Diagnosing Radicular Cysts. Diagnostics 2024, 14, 1443.
31. Sim, S.Y.; Hwang, J.; Ryu, J.; Kim, H.; Kim, E.J.; Lee, J.Y. Differential Diagnosis of OKC and SBC on Panoramic Radiographs: Leveraging Deep Learning Algorithms. Diagnostics 2024, 14, 1144.
32. Ver Berne, J.; Saadi, S.B.; Politis, C.; Jacobs, R. A deep learning approach for radiological detection and classification of radicular cysts and periapical granulomas. J. Dent. 2023, 135, 104581.
33. Watanabe, H.; Ariji, Y.; Fukuda, M.; Kuwada, C.; Kise, Y.; Nozawa, M.; Sugita, Y.; Ariji, E. Deep learning object detection of maxillary cyst-like lesions on panoramic radiographs: Preliminary study. Oral Radiol. 2021, 37, 487–493.
34. Yang, H.; Jo, E.; Kim, H.J.; Cha, I.H.; Jung, Y.S.; Nam, W.; Kim, J.Y.; Kim, J.K.; Kim, Y.H.; Oh, T.G.; et al. Deep learning for automated detection of cyst and tumors of the jaw in panoramic radiographs. J. Clin. Med. 2020, 9, 1839.
35. Committeri, U.; Barone, S.; Arena, A.; Fusco, R.; Troise, S.; Maffia, F.; Tramontano, S.; Bonavolontà, P.; Abbate, V.; Granata, V.; et al. New perspectives in the differential diagnosis of jaw lesions: Machine learning and inflammatory biomarkers. J. Stomatol. Oral Maxillofac. Surg. 2024, 125, 101912.
36. Ding, X.; Jiang, X.; Zheng, H.; Shi, H.; Wang, B.; Chan, S. MARes-Net: Multi-scale attention residual network for jaw cyst image segmentation. Front. Bioeng. Biotechnol. 2024, 12, 1454728.
37. Feher, B.; Kuchler, U.; Schwendicke, F.; Schneider, L.; Cejudo Grano de Oro, J.E.; Xi, T.; Vinayahalingam, S.; Hsu, T.H.; Brinz, J.; Chaurasia, A.; et al. Emulating Clinical Diagnostic Reasoning for Jaw Cysts with Machine Learning. Diagnostics 2022, 12, 1968.
38. Hung, K.F.; Ai, Q.Y.H. Radiomics for the differential diagnosis between ameloblastomas and odontogenic keratocysts on panoramic radiography. Cancer Imaging 2023, 23, 96.
39. Kise, Y.; Kuwada, C.; Mori, M.; Fukuda, M.; Ariji, Y.; Ariji, E. Deep learning system for distinguishing between nasopalatine duct cysts and radicular cysts arising in the midline region of the anterior maxilla on panoramic radiographs. Imaging Sci. Dent. 2024, 54, 33–41.
40. Kumar, V.S.; Kumar, P.R.; Yadalam, P.K.; Anegundi, R.V.; Shrivastava, D.; Alfurhud, A.A.; Almaktoom, I.T.; Alftaikhah, S.A.A.; Alsharari, A.H.L.; Srivastava, K.C. Machine learning in the detection of dental cyst, tumor, and abscess lesions. BMC Oral Health 2023, 23, 833.
41. Kuwana, R.; Ariji, Y.; Fukuda, M.; Kise, Y.; Nozawa, M.; Kuwada, C.; Muramatsu, C.; Katsumata, A.; Fujita, H.; Ariji, E. Performance of deep learning object detection technology in the detection and diagnosis of maxillary sinus lesions on panoramic radiographs. Dentomaxillofac. Radiol. 2021, 50, 20200171.
42. Lee, A.; Kim, M.S.; Han, S.S.; Park, P.; Lee, C.; Yun, J.P. Deep learning neural networks to differentiate Stafne’s bone cavity from pathological radiolucent lesions of the mandible in heterogeneous panoramic radiography. PLoS ONE 2021, 16, e0254997.
43. Li, M.; Mu, C.; Zhang, J.; Li, G. Application of Deep Learning in Differential Diagnosis of Ameloblastoma and Odontogenic Keratocyst Based on Panoramic Radiographs. Acta Acad. Med. Sin. 2023, 45, 273–279.
44. Liang, B.; Qin, H.; Nong, X.; Zhang, X. Classification of Ameloblastoma, Periapical Cyst, and Chronic Suppurative Osteomyelitis with Semi-Supervised Learning: The WaveletFusion-ViT Model Approach. Bioengineering 2024, 11, 571.
45. Liu, Z.; Liu, J.; Zhou, Z.; Zhang, Q.; Wu, H.; Zhai, G.; Han, J. Differential diagnosis of ameloblastoma and odontogenic keratocyst by machine learning of panoramic radiographs. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 415–422.
46. Okazaki, S.; Mine, Y.; Iwamoto, Y.; Urabe, S.; Mitsuhata, C.; Nomura, R.; Kakimoto, N.; Murayama, T. Analysis of the feasibility of using deep learning for multiclass classification of dental anomalies on panoramic radiographs. Dent. Mater. J. 2022, 41, 889–895.
47. Poedjiastoeti, W.; Suebnukarn, S. Application of Convolutional Neural Network in the Diagnosis of Jaw Tumors. Healthc. Inform. Res. 2018, 24, 236–241.
48. Rašić, M.; Tropčić, M.; Karlović, P.; Gabrić, D.; Subašić, M.; Knežević, P. Detection and Segmentation of Radiolucent Lesions in the Lower Jaw on Panoramic Radiographs Using Deep Neural Networks. Medicina 2023, 59, 2138.
49. Schneider, T.; Filo, K.; Locher, M.C.; Gander, T.; Metzler, P.; Grätz, K.W.; Kruse, A.L.; Lübbers, H.T. Stafne bone cavities: Systematic algorithm for diagnosis derived from retrospective data over a 5-year period. Br. J. Oral Maxillofac. Surg. 2014, 52, 369–374.
50. Veena Divya, K.; Jatti, A.; Vidya, M.J.; Joshi, R.; Gade, S. A Novel Approach towards Automatic Contour Identification of Jaw Cysts from Digital Panoramic Radiographs to improvise the Treatment planning. Int. J. Biol. Biomed. Eng. 2022, 16, 1–8.
51. Yong, T.H.; Lee, S.J.; Woo, S.Y.; Yoo, J.Y.; Choi, M.H.; Kang, S.R.; Yi, W.J. Periodontitis detection and classification in panoramic radiographs using Deep Convolutional Neural Network (DCNN). Int. J. Comput. Assist. Radiol. Surg. 2019, 14, S192–S193.
52. Zayed, S.O.; Abd-Rabou, R.Y.M.; Abdelhameed, G.M.; Abdelhamid, Y.; Khairy, K.; Abulnoor, B.A.; Ibrahim, S.H.; Khaled, H. The innovation of AI-based software in oral diseases: Clinical-histopathological correlation diagnostic accuracy primary study. BMC Oral Health 2024, 24, 598.
53. Babkair, H.A.; Rashid, M.E.; Abdelghani, A.; Ibrahim, T.M.; Alam, M.K. Assessing AI-Based Software’s Precision in Identifying Oral Lesions from Radiographs. J. Pharm. Bioallied Sci. 2025, 17, S1255–S1257.
54. van Nistelrooij, N.; Ghanad, I.; Bigdeli, A.K.; Thiem, D.G.E.; von See, C.; Rendenbach, C.; Maistreli, I.; Xi, T.; Berge, S.; Heiland, M.; et al. Automated detection and classification of osteolytic lesions in panoramic radiographs using CNNs and vision transformers. BMC Oral Health 2025, 25, 950.
55. Endres, M.G.; Hillen, F.; Salloumis, M.; Sedaghat, A.R.; Niehues, S.M.; Quatela, O.; Hanken, H.; Smeets, R.; Beck-Broichsitter, B.; Rendenbach, C.; et al. Development of a deep learning algorithm for periapical disease detection in dental radiographs. Diagnostics 2020, 10, 430.
56. Liu, F.; Gao, L.; Wan, J.; Lyu, Z.L.; Huang, Y.Y.; Liu, C.; Han, M. Recognition of Digital Dental X-ray Images Using a Convolutional Neural Network. J. Digit. Imaging 2023, 36, 73–79.
57. Ariji, Y.; Araki, K.; Fukuda, M.; Nozawa, M.; Kuwada, C.; Kise, Y.; Ariji, E. Effects of the combined use of segmentation or detection models on the deep learning classification performance for cyst-like lesions of the jaws on panoramic radiographs: Preliminary research. Oral Sci. Int. 2023, 21, 198–206.
58. Ebata, K.; Kise, Y.; Morotomi, T.; Ariji, E. Performance of a deep learning system for simultaneously diagnosing radiolucent and radiopaque lesions in the anterior maxilla on panoramic radiographs. Oral Sci. Int. 2024, 21, 385–392.
59. Maged, S.; Adel, A.; Tawfik, M.; Badawy, W. Dental Diagnostics—A YOLOv8-Based Framework. In 2024 International Conference on Machine Intelligence and Smart Innovation (ICMISI); IEEE: New York, NY, USA, 2024.
60. Oliveira, D.; Barreto, J.B.; Mesquita, I.M.; Paula, I.C., Jr.; Chaves, F.N.; Sampieri, M.B.S.; Madeiro, J.P. Analysis of the influence of pre-processing techniques with convolutional neural networks for automatic detection of cysts in wisdom teeth. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 2022, 11, 299–310.
61. Thomas, J.; Ulagamuthalvi, V. Automatic Detection of Dental Cysts in Panoramic Radiography Images using Preprocessing Techniques and Convolutional Neural Networks. In 2022 Fourth International Conference on Emerging Research in Electronics, Computer Science and Technology (ICERECT); IEEE: New York, NY, USA, 2022.
62. Thongsakul, P.; Paing, M.P. Comparison of Deep Learning-based Models for Oral Disease Detection. In 2024 21st International Joint Conference on Computer Science and Software Engineering (JCSSE); IEEE: New York, NY, USA, 2024.
63. Tropčić, M.; Rašić, M.; Subašić, M. YOLOv8 Unleashed on Orthopantomograms: Deep Learning Approach for Mandibular Cyst Diagnosis. In 2024 47th MIPRO ICT and Electronics Convention (MIPRO); IEEE: New York, NY, USA, 2024.
64. Gwak, M.; Yun, J.P.; Lee, J.Y.; Han, S.-S.; Park, P.; Lee, C. Attention-guided jaw bone lesion diagnosis in panoramic radiography using minimal labeling effort. Sci. Rep. 2024, 14, 4981.
65. Hu, J.; Feng, Z.; Mao, Y.; Lei, J.; Yu, D.; Song, M. MICCAI (7)—A Location Constrained Dual-Branch Network for Reliable Diagnosis of Jaw Tumors and Cysts; Springer International Publishing: Cham, Switzerland, 2021.
66. Kim, P.; Seo, B.; De Silva, H. Concordance of clinician, Chat-GPT4, and ORAD diagnoses against histopathology in Odontogenic Keratocysts and tumours: A 15-Year New Zealand retrospective study. Oral Maxillofac. Surg. 2024, 28, 1557–1569.
67. Lee, S.; Kim, D.; Jeong, H.-G. Detecting 17 fine-grained dental anomalies from panoramic dental radiography using artificial intelligence. Sci. Rep. 2022, 12, 5172.
68. Ngoc, V.T.; Viet, H.; Anh, L.K.; Minh, D.Q.; Nghia, L.L.; Loan, H.K.; Tuan, T.M.; Ngan, T.T.; Tra, N.T. Periapical Lesion Diagnosis Support System Based on X-ray Images Using Machine Learning Technique. World J. Dent. 2021, 12, 189–193.
69. Nurtanio, I.; Astuti, E.R.; Purnama, I.K.E.; Hariadi, M.; Purnomo, M.H. Classifying Cyst and Tumor Lesion Using Support Vector Machine Based on Dental Panoramic Images Texture Features. IAENG Int. J. Comput. Sci. 2013, 40, 29–37.
70. Qutieshat, A.; Al Rusheidi, A.; Al Ghammari, S.; Alarabi, A.; Salem, A.; Zelihic, M. Comparative analysis of diagnostic accuracy in endodontic assessments: Dental students vs. artificial intelligence. Diagnosis 2024, 11, 259–265.
71. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.S.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
72. Song, I.-S.; Shin, H.-K.; Kang, J.-H.; Kim, J.-E.; Huh, K.-H.; Yi, W.-J.; Lee, S.-S.; Heo, M.-S. Deep learning-based apical lesion segmentation from panoramic radiographs. Imaging Sci. Dent. 2022, 52, 351.
73. Tajima, S.; Okamoto, Y.; Kobayashi, T.; Kiwaki, M.; Sonoda, C.; Tomie, K.; Saito, H.; Ishikawa, Y.; Takayoshi, S. Development of an automatic detection model using artificial intelligence for the detection of cyst-like radiolucent lesions of the jaws on panoramic radiographs with small training datasets. J. Oral Maxillofac. Surg. Med. Pathol. 2022, 34, 553–560.
74. Ünal, S.; Keser, G.; Namdar, P.; Yildızbaş, Z.; Kurt, M. Evaluation of artificial intelligence for detecting periapical lesions on panoramic radiographs. Balk. J. Dent. Med. 2024, 28, 64–70.
75. Zadrożny, Ł.; Regulski, P.; Brus-Sawczuk, K.; Czajkowska, M.; Parkanyi, L.; Ganz, S.; Mijiritsky, E. Artificial Intelligence Application in Assessment of Panoramic Radiographs. Diagnostics 2022, 12, 224.
76. Kang, J.; Le, V.N.T.; Lee, D.-W.; Kim, S. Diagnosing oral and maxillofacial diseases using deep learning. Sci. Rep. 2024, 14, 2497.
77. Sivasundaram, S.; Pandian, C. Performance analysis of classification and segmentation of cysts in panoramic dental images using convolutional neural network architecture. Int. J. Imaging Syst. Technol. 2021, 31, 2214–2225.
78. Leeflang, M.M. Systematic reviews and meta-analyses of diagnostic test accuracy. Clin. Microbiol. Infect. 2014, 20, 105–113.
79. Power, M.; Fell, G.; Wright, M. Principles for high-quality, high-value testing. Evid. Based Med. 2013, 18, 5–10.
80. Cardoso, L.B.; Lopes, I.A.; Ikuta, C.R.S.; Capelozza, A.L.A. Study Between Panoramic Radiography and Cone Beam-Computed Tomography in the Diagnosis of Ameloblastoma, Odontogenic Keratocyst, and Dentigerous Cyst. J. Craniofac. Surg. 2020, 31, 1747–1752.
81. Simundic, A.M. Measures of Diagnostic Accuracy: Basic Definitions. EJIFCC 2009, 19, 203–211.
82. Sounderajah, V.; Ashrafian, H.; Rose, S.; Shah, N.H.; Ghassemi, M.; Golub, R.; Kahn, C.E., Jr.; Esteva, A.; Karthikesalingam, A.; Mateen, B.; et al. A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI. Nat. Med. 2021, 27, 1663–1665.
83. Hegyi, P.; Eross, B.; Izbeki, F.; Parniczky, A.; Szentesi, A. Accelerating the translational medicine cycle: The Academia Europaea pilot. Nat. Med. 2021, 27, 1317–1319.
84. Hegyi, P.; Petersen, O.H.; Holgate, S.; Eross, B.; Garami, A.; Szakacs, Z.; Dobszai, D.; Balasko, M.; Kemeny, L.; Peng, S.; et al. Academia Europaea Position Paper on Translational Medicine: The Cycle Model for Translating Scientific Results into Community Benefits. J. Clin. Med. 2020, 9, 1532.

Figure 1. PRISMA flow chart [11].

Figure 2. AB classification sensitivity (a), specificity (b) and diagnostic odds ratio (c) [9,10,26,27,34,76].

Figure 3. DC classification sensitivity (a), specificity (b) and diagnostic odds ratio (c) [9,10,26,27,29,34,76,77].

Figure 4. OKC classification sensitivity (a), specificity (b) and diagnostic odds ratio (c) [9,10,26,27,29,31,34,76].

Figure 5. N classification sensitivity (a), specificity (b) and diagnostic odds ratio (c) [9,10,28,34,76,77].

Figure 6. RC classification sensitivity (a), specificity (b) and diagnostic odds ratio (c) [9,10,26,27,28,29,32,33,77].

Table 1. Search strategy.

Search key for PubMed/MEDLINE and Cochrane Central Register of Controlled Trials:

(autom* OR algorithm OR (artificial AND intelligence) OR ai OR (neural AND network) OR convolutional OR cnn OR (deep AND learning) OR ‘machine learning’ OR (computer AND learning) OR ML OR DL) AND (((cyst* OR cyst) AND (dental OR oral OR odonto* OR follicular OR dentigerous OR eruption OR radicular OR periapical OR periodontal OR gingival OR primordial OR keratocyst)) OR ameloblastoma) AND (radiolog* OR imaging OR OP OR x-ray OR panoramic)

Search key for EMBASE:

(autom* OR ‘algorithm’/exp OR algorithm OR (artificial AND (‘intelligence’/exp OR intelligence)) OR ai OR (neural AND (‘network’/exp OR network)) OR convolutional OR cnn OR (deep AND (‘learning’/exp OR learning)) OR ‘machine learning’/exp OR ‘machine learning’ OR ((‘computer’/exp OR computer) AND (‘learning’/exp OR learning)) OR ml OR dl) AND ((cyst* OR ‘cyst’/exp OR cyst) AND (‘dental’/exp OR dental OR oral OR odonto* OR follicular OR dentigerous OR ‘eruption’/exp OR eruption OR radicular OR periapical OR periodontal OR gingival OR ‘primordial’/exp OR primordial OR ‘keratocyst’/exp OR keratocyst) OR ‘ameloblastoma’/exp OR ameloblastoma) AND (radiolog* OR ‘imaging’/exp OR imaging OR op OR ‘x ray’/exp OR ‘x ray’ OR panoramic)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

