2. Materials and Methods
The paper was prepared in accordance with the PRISMA 2020 statement guidelines (https://www.prisma-statement.org/, accessed on 22 March 2025; see Supplementary Materials). This study analysed the most frequently cited articles in the field of dentistry and artificial intelligence published in the Scopus database between 2020 and 2025. The analysis was limited to the Scopus database because of its high-quality indexing, popularity, and wide accessibility. Citation count is one of the standard filters in this database and is widely used to gauge the popularity of scientific papers; it should be noted, however, that the popularity criterion is biased and does not in itself indicate high scientific quality. The search was conducted in February 2025. The articles were categorised into three groups. The first category was defined by the keywords dental caries, panoramic radiograph, and artificial intelligence [1,2,3,4,5,6,7,8,9,10]. The second category included periapical lesions, panoramic radiograph, and artificial intelligence [1,6,9,11,12,13,14,15,16,17]. The third and final category was based on the keywords tumours and cysts, panoramic radiograph, and artificial intelligence [18,19,20,21,22,23,24,25].
For each of the three categories, the most frequently cited studies were selected, excluding literature reviews, short reports, and book chapters. Articles written in languages other than English were excluded from the analysis. The selected articles are publicly accessible.
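To make the search strategy concrete, a query of roughly the following form would retrieve the first category. This is a hypothetical reconstruction: the exact Scopus query strings, field codes, and date bounds used were not reported.

```python
# Hypothetical Scopus advanced query for the first keyword category and the
# stated 2020-2025 window; the exact query used in the search was not reported.
query = (
    'TITLE-ABS-KEY("dental caries" AND "panoramic radiograph" '
    'AND "artificial intelligence") '
    'AND PUBYEAR > 2019 AND PUBYEAR < 2026 AND LANGUAGE(english)'
)
print(query)
```

The other two categories would follow the same pattern with their respective keywords.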
In the first category, 15 articles were retrieved, of which the 10 most frequently cited were selected (Table 1). In the second category, 16 articles were retrieved, and again the 10 most frequently cited papers were chosen (Table 2). In the third category, only 8 articles were found, two of which had no citations; given the small number of eligible papers, these two uncited articles were also included in the review (Table 3).
The analysed data have been compiled into tables (Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12). For each category, three tables have been developed. The first table contains information on the research objective, the number of panoramic radiographs used, the distribution by gender, and the location where the study was conducted (Table 4, Table 7 and Table 10). The second table includes details on the type of neural network, test quality and error rate, as well as the results and conclusions drawn from the presented data (Table 5, Table 8 and Table 11). The third table presents the risk-of-bias assessment using PROBAST (Table 6, Table 9 and Table 12).
An analysis of the available scientific studies on the detection of dental caries in digital panoramic radiographs using artificial intelligence (AI) reveals significant differences in the number of images analysed, the models used, and the diagnostic performance of those models.
In terms of the number of digital panoramic images used, the smallest dataset was that of Zadrożny et al. [1], whose study included only 30 images, a relatively small sample, especially when compared with the other cited works. In contrast, the largest dataset was used by Gardiyanoğlu et al. [5], who analysed 8138 panoramic images. The datasets of the remaining cited studies fell between these extremes; for instance, Zhu et al. [2] based their study on 1996 images, while Asci et al. [10] utilised 6075 panoramic radiographs.
A range of AI models was employed, but variants of the U-Net neural network dominated, including its advanced versions Nested U-Net, U-Net++, and U-Net3+. Some studies used other techniques, such as PyRadiomics ver. 1 combined with a classical artificial neural network (ANN) [3] to perform computer-based extraction of advanced radiomic features. The study by Zadrożny et al. [1] used Diagnocat, a commercial AI tool for panoramic image analysis, while in the study by Zhou et al. [8], a Swin Transformer model was applied and further optimised for detecting dental caries in children.
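To illustrate the family of architectures that dominated these studies, the sketch below shows a minimal U-Net-style segmentation network in PyTorch. It is a simplified illustration under assumed settings (grayscale input, two encoder levels, arbitrary channel counts), not the configuration of any reviewed study.

```python
# Minimal U-Net sketch for binary caries segmentation on panoramic radiographs.
# Illustrative only: depth, channel counts, and input size are assumptions.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    """Two 3x3 conv + ReLU blocks, the basic U-Net building unit."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = double_conv(1, 32)       # grayscale radiograph input
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = double_conv(64, 32)      # 64 = 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, 1, 1)      # per-pixel caries logit

    def forward(self, x):
        e1 = self.enc1(x)                    # encoder level 1
        e2 = self.enc2(self.pool(e1))        # encoder level 2 (bottleneck)
        d1 = self.up(e2)                     # upsample back to input resolution
        d1 = self.dec1(torch.cat([d1, e1], dim=1))  # skip connection
        return self.head(d1)                 # raw logits; sigmoid gives a mask

mask_logits = TinyUNet()(torch.randn(1, 1, 256, 256))
print(mask_logits.shape)  # torch.Size([1, 1, 256, 256])
```

The defining feature, shared by Nested U-Net, U-Net++, and U-Net3+, is the skip connection that concatenates encoder features into the decoder; the advanced variants mainly add denser or redesigned skip pathways.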
Comparing the diagnostic performance of the individual models, the best results were achieved by the PyRadiomics ver. 1 + ANN model for detecting radiation-induced caries, with a very high AUC of 0.9886, indicating an almost perfect ability to distinguish between healthy and diseased teeth. Similarly high performance was demonstrated by U-Net3+, which achieved 95% accuracy in the study by Alharbi et al. [7]. The Swin Transformer tested by Zhou et al. [8] reached an AUC of 0.9223, likewise indicating strong performance.
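The AUC values quoted here are typically computed from per-case ground-truth labels and model scores; a minimal sketch, assuming scikit-learn and invented data:

```python
# How an AUC such as the 0.9223 reported for the Swin Transformer is usually
# computed: from ground-truth labels and the model's predicted probabilities.
# The labels and scores below are invented for illustration.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                    # 1 = carious, 0 = healthy
y_score = [0.1, 0.4, 0.8, 0.9, 0.3, 0.7, 0.6, 0.2]   # model probabilities

print(roc_auc_score(y_true, y_score))  # 1.0 on this toy data; real models score lower
```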
Conversely, the lowest diagnostic parameters were observed for Diagnocat in the study by Zadrożny et al. [1], which obtained a low intraclass correlation coefficient (ICC = 0.681) for caries detection, meaning its results were not fully reliable. Additionally, Zhu et al. [2] noted that their AI model performed worst specifically in caries detection, with an AUC of only 0.772.
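Unlike AUC, the ICC reported for Diagnocat measures agreement between the AI reading and a reference reading rather than discriminative power. A minimal sketch of such a computation, assuming the pingouin package and invented per-image counts:

```python
# Sketch of an intraclass correlation between AI and reference caries counts.
# The per-image counts are invented; the ICC variant used in [1] is not
# specified here, so the full pingouin table is printed.
import pandas as pd
import pingouin as pg

scores = pd.DataFrame({
    "image": list(range(6)) * 2,
    "rater": ["ai"] * 6 + ["reference"] * 6,
    "count": [2, 0, 3, 1, 4, 2,    # AI-detected carious teeth per image
              2, 1, 3, 0, 4, 3],   # reference reading of the same images
})
icc = pg.intraclass_corr(data=scores, targets="image",
                         raters="rater", ratings="count")
print(icc[["Type", "ICC"]])
```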
Regarding the division of studied individuals by gender, most analyses did not consider this variable as a factor affecting caries detection performance. An exception was the study by Faria et al. [3], which included 15 patients, 13 men and 2 women, indicating a gender imbalance in the dataset. Other studies, such as that by Asci et al. [10], particularly those focused on paediatric diagnostics, analysed different stages of dental development but did not report a clear gender breakdown.
In conclusion, the analysis of the cited studies suggests that the effectiveness of caries detection in panoramic images using AI largely depends on the applied model. The best results were achieved by advanced models such as PyRadiomics + ANN and U-Net3+, whereas simple CNNs and Diagnocat had significantly lower effectiveness. Additionally, the number of analysed images (training cases) varied significantly between studies, which could have influenced the obtained results. Most studies did not analyse differences in caries diagnostics between genders, which may represent a research gap and provide a basis for further analyses and quality assessments of AI models.
Research on the detection of periapical lesions in panoramic radiographs using artificial intelligence (AI) has varied in terms of dataset sizes, applied models, and detection effectiveness. Some of the cited studies also considered the division of patients by gender.
Regarding the number of analysed images, the largest dataset was used in the study by Endres et al. [11], in which 2902 digital panoramic radiographs were examined. This large sample allowed a precise determination of the prevalence of periapical lesions and an assessment of associated risk factors. Conversely, the smallest number of digital panoramic images was used in the study by Zadrożny et al. [1], where only 30 radiographs were evaluated, which may have limited the representativeness of the results.
Among the analysed studies, the most commonly used model was the U-Net neural network, which proved effective in segmenting periapical lesions. Notably, in the studies by Bayrakdar et al. [12] and Song et al. [13], this architecture achieved high detection accuracy. In the study by Bayrakdar et al. [12], the U-Net model achieved a sensitivity of 0.92, a precision of 0.84, and an F1-score of 0.88, making it one of the best-performing models in this research area. By contrast, the studies by Kazimierczak et al. [14] and Zadrożny et al. [1] used the Diagnocat AI system and evaluated its diagnostic performance against radiologists’ assessments.
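The reported F1-score can be cross-checked directly from the published precision and sensitivity, since F1 is their harmonic mean:

```python
# Cross-check of the F1-score reported for U-Net in Bayrakdar et al. [12].
precision, recall = 0.84, 0.92  # published precision and sensitivity (recall)

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.88, matching the reported value
```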
In terms of model performance, the best results were obtained in the study by Bayrakdar et al. [12], where U-Net achieved the highest F1-score of 0.88, confirming its high precision in detecting periapical lesions. In contrast, the lowest performance was observed in the study by Kazimierczak et al. [14], where the F1-score for panoramic radiographs was only 32.73%, indicating significant difficulties in the correct classification of lesions.
Some studies also included gender-based analysis. In the study by Herbst et al. [17], which analysed 1071 digital panoramic radiographs, men were found to have periapical lesions more frequently (52.5%) than women (44.8%). A similar analysis was conducted in the study by Turosz et al. [9], although no specific differences in lesion prevalence between men and women were reported. Most other studies did not address this aspect.
In summary, among the analysed studies, the highest detection accuracy for periapical lesions was demonstrated by the U-Net model in the study by Bayrakdar et al. [12], while the lowest results were recorded in the study by Kazimierczak et al. [14]. The largest dataset was used in the study by Endres et al. [11], enabling a more precise assessment of lesion prevalence. Some studies, such as the one by Herbst et al. [17], also considered gender differences, showing a higher frequency of lesions in men. This analysis confirms that AI, particularly deep neural network-based models, has significant potential as a supportive tool in radiological diagnostics in dentistry.
In the analysed articles on the detection of cysts and tumours in digital panoramic radiographs using artificial intelligence, considerable diversity was observed in both the number of images used and the AI models applied. Each of the reviewed studies approached the problem slightly differently, but certain common patterns can be identified, allowing conclusions to be drawn.
Regarding the number of images, the largest dataset was used in the study by Tajima et al. [22], where a deep convolutional neural network (DCNN) was trained on as many as 7160 digital panoramic radiographs, with an additional 100 images used for testing the system. This model achieved a very high accuracy of 98.3% and a sensitivity of 94.4%, indicating high effectiveness in detecting pathological changes. On the other hand, the smallest dataset was found in the study by Okazaki et al. [19], where only 150 images were used, which could have affected the model's training and its parameters; in this case, the accuracy was 70% and the sensitivity 70.8%.
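Accuracy and sensitivity follow directly from a confusion matrix. The counts below are invented, chosen only so that they reproduce the reported 70% accuracy and 70.8% sensitivity over 150 cases; they are not Okazaki et al.'s actual data:

```python
# Invented confusion-matrix counts that reproduce the reported figures.
tp, fn = 17, 7    # lesions detected / missed by the model
tn, fp = 88, 38   # healthy cases correctly passed / falsely flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 105 / 150
sensitivity = tp / (tp + fn)                 # 17 / 24
print(f"accuracy={accuracy:.1%}, sensitivity={sensitivity:.1%}")
# accuracy=70.0%, sensitivity=70.8%
```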
The most commonly used model in these studies was YOLO (You Only Look Once), which appeared in three articles. In the study by Yang et al. [18], YOLO v2 achieved a precision of 0.707 and a sensitivity of 0.680, which was not the highest result compared with other approaches. However, in the study by Rašić et al. [21], the newer YOLO v8 model, released in 2023, yielded markedly better results: a precision of 95.2% and a mean average precision at IoU = 0.5 (mAP@50) of 97.5%, one of the highest scores in the analysed studies. This demonstrates how much newer, more advanced algorithm versions can improve diagnostic accuracy.
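For context, mAP@50 counts a detection as correct when its predicted box overlaps the ground-truth box with an intersection over union (IoU) of at least 0.5; a minimal sketch with invented boxes:

```python
# IoU criterion underlying mAP@50; box format is (x1, y1, x2, y2),
# and the example coordinates are invented.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred, truth = (100, 100, 200, 200), (120, 110, 210, 220)
print(iou(pred, truth))         # ~0.57
print(iou(pred, truth) >= 0.5)  # True: this detection would count at mAP@50
```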
Apart from YOLO, several studies utilised DCNN and EfficientNet/EfficientDet architectures. DCNN-based models proved effective: in the study by Tajima et al. [22], an accuracy of 98.3% was achieved, while in the study by Ha et al. [23], the EfficientDet model reached an effectiveness exceeding 99.8%, the highest result among all analysed studies. At the other end of the spectrum, AlexNet, used in the study by Okazaki et al. [19], had the lowest effectiveness, with precision and sensitivity not exceeding 70%, suggesting that older CNN architectures may struggle with the analysis of digital panoramic radiographs.
An interesting aspect was the reporting of gender distribution in some studies. In the study by Yang et al. [18], demographic data indicated that 32% of the training dataset comprised women and 68% men. Similarly, Okazaki et al. [19] reported that 51 patients were women and 99 were men. The remaining reviewed studies provided no information on gender distribution, which could be an important factor in assessing the effectiveness of AI models across different patient groups.
The conclusions from this analysis indicate that the best results were achieved by DCNN and EfficientDet models, particularly when trained on large datasets. AlexNet proved to be the least effective, suggesting that older neural network architectures may not be suitable for such tasks. It was observed that larger datasets led to better model performance, and the lack of gender-based analysis in most studies may point to a potential gap in evaluating the impact of demographic characteristics on diagnostic accuracy.
In summary, the studies clearly show that the use of artificial intelligence for detecting cysts and tumours in panoramic radiographs has great potential, but the effectiveness of AI models depends on both the algorithm used and the amount of data utilised for training and testing. The best results are achieved by modern neural network architectures, such as EfficientDet and the latest YOLO versions, which may indicate the direction of future research in this field.
3. Summary
Artificial intelligence (AI) is playing an increasingly significant role in dental diagnostics, particularly in the analysis of digital panoramic radiographs. The use of advanced algorithms enables the detection of various conditions, including caries, periapical lesions, cysts, and tumours. Studies conducted at various universities worldwide have demonstrated AI’s high effectiveness in analysing radiological images.
The most commonly used AI models in the reviewed research include convolutional neural networks (CNNs), U-Net, the Swin Transformer, and other deep learning architectures. Results indicate that AI can achieve diagnostic accuracy comparable to, or even higher than, that of dentists. For instance, the Swin Transformer model achieved an AUC of 0.9223 in caries detection. Although some studies report high diagnostic accuracy for AI models, these results should be interpreted with caution, as they depend strongly on the model architecture, the quality and size of the training data, and the evaluation methods.
In caries diagnostics, neural networks analysed thousands of images, achieving high sensitivity and precision. The U-Net3+ model reached 95% accuracy, while the PyRadiomics + ANN model demonstrated an AUC of 0.9886 in predicting radiotherapy-related caries. These studies highlight the potential of AI to improve diagnostic processes.
The analysis of periapical lesions also demonstrated the high effectiveness of AI models. Research conducted at institutions such as Harvard University, Charité-Universitätsmedizin Berlin, and Seoul National University showed that AI could even outperform less experienced dentists in radiograph assessment. The AI Insights software achieved a classification accuracy of 99% and a detection efficiency of 95% for periapical lesions.
The most frequently used models in this category include U-Net, CNN, and Diagnocat AI software. The Decision Tree model (Shallow ML) achieved an F1-score of 0.90, confirming its high effectiveness in detecting periapical lesions. In some cases, AI outperformed experienced specialists, underscoring its growing importance in dental diagnostics.
Cyst and tumour diagnostics also benefit from AI, with deep learning models demonstrating high effectiveness. Studies were conducted at Yonsei University Dental Hospital, Hiroshima University Hospital, Clinical Hospital Dubrava, and other institutions. The YOLO v8 model achieved a precision of 95.2% and an mAP@50 of 97.5% in detecting lesions in the mandible.
Models such as DCNN, EfficientDet, and EfficientNet proved particularly effective. The DCNN model achieved an accuracy of 98.3% and an F-score of 0.966 in cyst detection. EfficientDet reached 99.8% accuracy in distinguishing pathological cysts from Stafne cysts, while EfficientNetB6 effectively classified mucous cysts (Table 13).
The study results indicate that AI has enormous potential in radiological diagnostics and can significantly assist doctors in identifying pathological changes in the oral cavity. The speed of analysis and the ability to process large datasets make AI a valuable tool in dentistry. AI can be especially beneficial in areas with limited access to specialists, enabling remote diagnostics and monitoring of patients’ oral health.
Despite AI’s high effectiveness, researchers emphasise that algorithms should be considered a support for doctors rather than a replacement. There is still a need for model optimisation, particularly in detecting more challenging cases such as periapical lesions. Notwithstanding the impressive results reported in the literature, several limitations must be acknowledged when considering the practical implementation of artificial intelligence in dental diagnostics. One of the primary challenges is overfitting: many models demonstrate high accuracy on training datasets, yet their ability to generalise to new, unseen clinical data remains uncertain. This concern is particularly relevant for studies conducted on relatively small datasets, which may not adequately represent the full spectrum of clinical variability.
Another critical limitation is the lack of clinical validation for most of the AI models presented. Many studies were performed in controlled laboratory settings, often without direct comparison to the decisions of experienced clinicians in real-world scenarios. The absence of standardised protocols for integrating AI systems into daily clinical workflows significantly hinders their broader adoption.
The “black-box” nature of many deep learning models also poses a substantial challenge. AI-generated diagnostic decisions are frequently not transparent or easily interpretable by clinicians, which may reduce trust in these tools and complicate the identification of system errors. As a result, there is a growing interest in explainable artificial intelligence (XAI), which aims to increase the transparency and interpretability of diagnostic models.
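As a concrete example of what XAI can offer, gradient-based saliency methods such as Grad-CAM highlight the image regions that drove a network's prediction. The sketch below is a generic PyTorch illustration using a stand-in ResNet-18 and random input, not a method taken from the reviewed studies:

```python
# Minimal Grad-CAM sketch: weight the last conv block's activations by the
# gradient of the top-class score, producing a heat map over the input.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()   # stand-in classifier, untrained
feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)         # stand-in for a radiograph crop
model(x)[0].max().backward()            # backpropagate the top-class logit

weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # channel importances
cam = F.relu((weights * feats["a"]).sum(dim=1))      # weighted activation map
cam = F.interpolate(cam[None], size=(224, 224), mode="bilinear")[0]
print(cam.shape)  # heat map to overlay on the input image
```

A heat map like this lets a clinician check whether the model attended to the suspected lesion or to irrelevant structures.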
Finally, the ethical and societal implications of increasing reliance on automated diagnostic systems cannot be overlooked. Questions arise regarding accountability for diagnostic errors—whether it lies with software developers, end-users, or implementing institutions. Moreover, there is a concern that over-reliance on AI might reduce critical thinking skills in younger clinicians, who may become overly dependent on algorithmic outputs.
Therefore, further research is needed not only to improve and validate AI technologies but also to assess their impact on clinical decision-making, patient safety, and compliance with ethical and legal standards in healthcare.