Developments and Performance of Artificial Intelligence Models Designed for Application in Endodontics: A Systematic Review

Technological advancements in health sciences have led to enormous developments in artificial intelligence (AI) models designed for application in health sectors. This article aimed to report on the application and performance of AI models that have been designed for application in endodontics. Renowned online databases, primarily PubMed, Scopus, Web of Science, Embase, and Cochrane and secondarily Google Scholar and the Saudi Digital Library, were accessed for articles relevant to the research question that were published from 1 January 2000 to 30 November 2022. In the last 5 years, there has been a significant increase in the number of articles reporting on AI models applied in endodontics. AI models have been developed for determining working length, vertical root fractures, root canal failures, root morphology, and thrust force and torque in canal preparation; detecting pulpal diseases; detecting and diagnosing periapical lesions; predicting postoperative pain, curative effect after treatment, and case difficulty; and segmenting pulp cavities. Most of the included studies (n = 21) were developed using convolutional neural networks. Among the included studies, the datasets used were mostly cone-beam computed tomography images, followed by periapical radiographs and panoramic radiographs. Thirty-seven original research articles that fulfilled the eligibility criteria were critically assessed in accordance with QUADAS-2 guidelines, which revealed a low risk of bias in the patient selection domain in most of the studies (risk of bias: 90%; applicability: 70%). The certainty of the evidence was assessed using the GRADE approach. These models can be used as supplementary tools in clinical practice in order to expedite the clinical decision-making process and enhance the treatment modality and clinical operation.


Introduction
The specialty of endodontics deals with the diseases and conditions that affect the root canal complex and are developed due to untreated or incompletely treated dental carious lesions [1,2]. Diseases related to the pulp and periapical tissues are most commonly managed by nonsurgical root canal treatment (RCT). The basis of endodontic diagnosis and treatment planning relies on an adequate and accurate understanding of the diseases related to the pulp and periapical tissues. Inaccurate diagnosis may result in unanticipated pain, which may have a negative impact on the therapeutic plan and eventually result in unpleasant experiences among patients [3]. Preoperative assessment of the tooth, before initiating RCT, is a very crucial step in determining the success of the endodontic treatment.
Intraoral periapical radiographs, orthopantomograms, and cone-beam computed tomography (CBCT) imaging are the most frequently adopted radiographic techniques for diagnosing diseases related to pulp and periapical areas [2]. Periapical and panoramic radiographs generate two-dimensional (2D) images of the maxillofacial structures, with lesser exposure than CBCT imaging [4,5]. CBCT imaging is widely used among dentists as it enables more radiological analysis. This technology provides three-dimensional images with more precision [6]. The accuracy in detecting periapical lesions is significantly higher with CBCT imaging in comparison to periapical radiography [7,8]. However, considering its high cost and radiation dose, the use of CBCT imaging is restricted to special clinical circumstances. In such cases, the benefits obtained from the imaging should outweigh any potential risks resulting from radiographic exposure associated with this technology.
The ongoing rapid technological advancements have resulted in enormous development in diagnostic models for medical imaging and diagnosis [9]. Advancements in computer-assisted diagnosis have resulted in the development of AI models designed for application in health sectors. AI technology, which is mainly based on mimicking the functioning of the human brain, is a breakthrough in the technological world. Machine learning algorithms were the first AI algorithms developed, the performance of which is dependent on the characteristics and number of datasets used for training. These algorithms learn the intrinsic statistical patterns and structures in the data and are later used to make predictions on unseen data [10]. Deep learning (DL) models such as convolutional neural networks (CNNs) are developed to mimic the functioning of the human brain; they process inputs by passing them through a series of convolutional filters and are trained on a large number of datasets [11]. These advanced neural networks are applied for processing large and complex images, where they have demonstrated superior achievements in recognizing objects, faces, and activity [12,13]. AI models have been widely applied in medical imaging for systemic diseases such as cardiovascular diseases and respiratory diseases and have displayed exceptional performances that are similar to those of experienced specialists [14][15][16]. Additionally, in dentistry, AI models are designed for diagnosing oral diseases such as dental caries, periodontal diseases, and oral cancer as well as treatment planning for orthognathic surgeries and predicting the treatment outcomes [17][18][19]. These models have demonstrated excellent performances, with a major advantage being improved diagnostic efficiency with a reduced image interpretation time [20].
Nagendrababu et al. [21] reported on AI models designed for application in endodontics for performing tasks such as studying root canal anatomy, detecting and diagnosing periapical lesions and root fractures, and determining the working length for planning root canal treatment. The authors concluded that these AI models can aid clinicians with precise diagnosis and treatment planning, ultimately resulting in better treatment outcomes. Umer et al. [22] also reported on AI models designed for application in endodontic diagnosis and treatment planning. The authors concluded that these models demonstrated an accuracy greater than 90% in performing the tasks. However, the authors also stated that the reporting of AI-related research is irregular. Hence, this systematic review aimed to report on the application and performance of AI models designed for application in endodontics.

Materials and Methods
Ethical clearance was obtained from King Abdullah International Medical Research Center (Institutional Review Board Approval No. 2439-22, 6 November 2002) before the literature search process was initiated for this systematic review.
The updated Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines were considered for preparing this systematic review [23]. A search of the literature was conducted systematically in various renowned electronic databases, primarily Scopus, Web of Science, Embase, PubMed, and Cochrane and secondarily the Saudi Digital Library and Google Scholar, for studies relevant to the research topic that were published from 1 January 2000 to 30 November 2022.

Search Strategy
The article search was performed based on the research question, which was developed in accordance with the PICO elements (P: problem/patient/population, I: intervention/indicator, C: comparison, and O: outcome).
Research Question: What are the developments, applications, and performances of AI models in endodontics?
Intervention: Artificial intelligence applications that were designed for the detection, diagnosis, and prediction of endodontic lesions.
Medical Subject Headings (MeSH) included artificial intelligence, automatic learning, supervised learning, unsupervised learning, deep learning, machine learning, neural networks, convolutional neural network, computer-assisted diagnosis, endodontic dentistry, root canal treatment, apical lesions, periapical lesions, periapical pathology, periapical diseases, deep caries detection, tooth segmentation, pulp cavity segmentation, root segmentation, root morphology, canal shape, cracked tooth, tooth fractures, root fractures, accuracy, prediction, and diagnosis. Boolean operators (AND, OR) were also used in the advanced stage of the search for combining these MeSH terms, with predetermined publication time range and language as filters (Table 1).

Table 1. Structured search strategy carried out in electronic databases.

Search/Filters: "English" Language
Topic and Terms:
"artificial intelligence" OR "automatic learning" OR "supervised learning" OR "unsupervised learning" OR "deep learning" OR "machine learning" OR "neural networks" OR "convolutional neural network" OR "computer assisted diagnosis"
"endodontic dentistry" OR "root canal treatment" OR "apical lesions" OR "periapical lesions" OR "periapical pathology" OR "periapical diseases" OR "deep caries detection" OR "tooth segmentation" OR "pulp cavity segmentation" OR "root segmentation" OR "root morphology" OR "canal shape" OR "cracked tooth" OR "tooth fractures" OR "root fractures" OR "accuracy" OR "prediction" OR "diagnosis" OR "expert systems" OR "fuzzy networks" OR "AI networks" OR "AI models"

Search/Filters: "English" Language
Topic and Terms:
"artificial intelligence" AND "automatic learning" AND "supervised learning" AND "unsupervised learning" AND "deep learning" AND "machine learning" AND "neural networks" AND "convolutional neural network" AND "computer assisted diagnosis"
"endodontic dentistry" AND "root canal treatment" AND "apical lesions" AND "periapical lesions" AND "periapical pathology" AND "periapical diseases" AND "deep caries detection" AND "tooth segmentation" AND "pulp cavity segmentation" AND "root segmentation" AND "root morphology" AND "canal shape" AND "cracked tooth" AND "tooth fractures" AND "root fractures" AND "accuracy" AND "prediction" AND "diagnosis" AND "expert systems" AND "fuzzy networks" AND "AI networks" AND "AI models"

Simultaneously, a manual search for articles was also performed by cross-referencing and screening the bibliography list of the selected articles.

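As an illustration of how search terms of this kind are typically combined, the following minimal Python sketch builds a Boolean query string by joining synonyms with OR inside each concept group and joining the groups with AND. The grouping, the shortened term lists, and the helper name `or_group` are assumptions made for illustration; this is not the exact query string submitted to any database in this review.

```python
def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


# Shortened, illustrative subsets of the terms listed in the search strategy.
ai_terms = ["artificial intelligence", "deep learning", "machine learning"]
endo_terms = ["root canal treatment", "periapical lesions", "root fractures"]

# Assumed pattern: OR within a concept group, AND between concept groups.
query = " AND ".join([or_group(ai_terms), or_group(endo_terms)])
print(query)
```

Joining synonyms with OR widens each concept, while AND between the groups restricts the results to records that mention both an AI term and an endodontic term.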
Study Selection
The article selection process was carried out in two phases. In the first phase, the articles that were related to the research question were selected based on their title and abstract. In this phase, two experienced authors (S.B.K. and A.O.J.) simultaneously carried out the search process and 264 articles were selected. After screening, 124 articles were eliminated due to duplication, and the remaining 140 articles were assessed against the eligibility criteria.

Eligibility Criteria
The inclusion criteria were the following: (a) original research articles with a clear statement on AI applications designed for endodontics; (b) articles published between 1 January 2000 and 30 November 2022 in a scholarly peer-reviewed journal; (c) articles with a clear mention of a type of study modality used for developing, training, validating, and testing an AI model; (d) articles with a clear mention of quantifiable outcome measures for assessing the performance of the AI model; (e) AI models applied for determining working length, vertical root fractures, or root morphology or for detecting and diagnosing pulpal diseases, periapical lesions, predicting prognosis, postoperative pain, or case difficulties. The study design was not limited and hence did not affect the articles' inclusion.
The determined exclusion criteria were as follows: (a) non-full-text articles with only abstracts; (b) non-peer-reviewed publications (such as conference papers and unpublished thesis projects); (c) review articles, letters to editors, and commentaries.

Data Extraction
After the preliminary evaluation of the selected papers based on the title and abstract and the elimination of the duplicates, the authors further analyzed the full text of these articles and assessed their eligibility, following which the total number of articles included in this systematic review decreased to 38. Following that, in the second phase, the identifiers of the journal and author details were removed, and the articles were distributed for critical evaluation by two independent authors who did not contribute to the initial search (M.A. and K.A.). The data from these included articles were further extracted and entered into a Microsoft Excel sheet. This data comprised details of the authors; year of publication; objective of the study; type of algorithm used for developing the AI model; data used for training, validating, and testing the model; results; conclusions; and suggestions.
The quality assessment of the articles was conducted utilizing the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) guidelines [24]. This tool was developed to assess the quality of studies that have reported on diagnostic tools. The assessment is based on four domains (patient selection, index test, reference standard, and flow and timing), each of which is evaluated for risk of bias and applicability. The inter-rater reliability between the two authors was assessed on a sample of articles, where Cohen's kappa showed 86% agreement. The authors disagreed regarding the inclusion of one article since the quantifiable outcome measures of performance were not clearly mentioned. This was resolved by obtaining a third opinion (A.F.), after which the article was excluded. Thirty-seven articles finally underwent qualitative synthesis (Figure 1).
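For readers unfamiliar with the statistic, Cohen's kappa corrects the observed agreement between two raters for the agreement expected by chance, so it is never higher than the raw percent agreement. The sketch below is a minimal pure-Python illustration; the ratings are hypothetical and are not the judgments made in this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical include/exclude decisions for ten articles.
a = ["include"] * 8 + ["exclude"] * 2
b = ["include"] * 7 + ["exclude"] * 3
print(round(cohens_kappa(a, b), 3))
```

In this hypothetical example the raters agree on 9 of 10 articles, yet kappa is lower than 0.9 because part of that agreement would be expected by chance.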

Results
The qualitative data synthesis was performed on the 37 articles that fulfilled the inclusion criteria. The research trend shows there has been a gradual increase in the number of research publications that have reported on the application of AI in endodontics.
The data from these included articles were extracted. However, due to the heterogeneity in the data extracted from these articles, performing a meta-analysis was not possible. The heterogeneity was mainly with respect to the different types of data samples applied for assessing the performance of AI models. Hence, in this systematic review, only the descriptive data of the included studies are presented (Table 2).

Study Characteristics
The characteristics extracted from the included studies comprised details of the authors; year of publication; objective of the study; type of algorithm used for developing the AI model; data used for training, validating, and testing the model; results; conclusions; and suggestions.

Outcome Measures
The outcome was measured in terms of task performance efficiency. The outcome measures were reported in terms of accuracy, sensitivity, specificity, receiver operating characteristic curve (ROC), area under the curve (AUC), area under the receiver operating characteristic curve (AUROC), intraclass correlation coefficient (ICC), intersection over union (IOU), precision-recall curve (PRC), statistical significance, F1 scores, volumetric Dice similarity coefficient (vDSC), surface Dice similarity coefficient (sDSC), positive predictive value (PPV), negative predictive value (NPV), mean decreased Gini (MDG) coefficient, mean decreased accuracy (MDA) coefficient, and Dice coefficient.
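Several of these measures are simple functions of a 2x2 confusion matrix or of the overlap between a predicted and a reference segmentation. The following minimal Python sketch shows the standard definitions; the counts and pixel sets are hypothetical and are not taken from any included study, and the functions assume non-zero denominators.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def dice_and_iou(predicted, reference):
    """Dice coefficient and intersection over union for two pixel sets."""
    overlap = len(predicted & reference)
    dice = 2 * overlap / (len(predicted) + len(reference))
    iou = overlap / len(predicted | reference)
    return dice, iou


# Hypothetical counts: 100 diseased and 100 healthy teeth.
metrics = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
# Hypothetical segmentations expressed as sets of pixel indices.
dice, iou = dice_and_iou({1, 2, 3, 4}, {3, 4, 5, 6})
print(metrics, dice, iou)
```

Note that the F1 score and the Dice coefficient share the same formula, applied to classification counts in one case and to segmented pixels in the other.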

Risk of Bias Assessment and Applicability Concerns
Assessment of the quality of the included studies through the risk of bias is essential in order to understand and report the selection of the samples, reference standards, and methods applied for validating and testing the models.
A low risk of bias was observed in the patient selection domain in most of the studies (risk of bias: 90%; applicability: 70%). However, cadaver samples (Saghiri et al. [25]), extracted teeth (Saghiri et al. [26], Kositbowornchai et al. [27], Johari et al. [29], Qiao et al. [38]), and bone samples (Guo et al. [46]) had been utilized in six studies. Therefore, the patient selection domain of the applicability arm of the tool for these above-mentioned studies was reported to have a high risk of bias. Index tests were regarded as low risk in both the arms of QUADAS-2 since all the studies had made use of a highly standardized system of AI for training purposes. There was no clear mention of the reference standard for interpreting index test results in four of the included studies, which raised concerns regarding bias related to patient selection, reference standard, flow, and timing of these studies in both arms. Overall, there was a low risk of bias in both arms, considering all the categories across the included studies. Details about the risk of bias assessment using QUADAS-2 are mentioned in the Supplementary Materials (Table S1) and Figure 2.

[Table 3, partially recoverable. Rows include: AI for predicting postoperative pain [42]; predicting case difficulty [34]; determining thrust force and torque in canal preparation [46]; segmenting pulp cavities [47]; and curative effect after treatment [53,59]. ⨁⨁⨁⨁ = high evidence; ⨁⨁⨁◯ = moderate evidence.]

Assessment of Strength of Evidence
The certainty of the evidence from the studies selected for this systematic review was assessed using the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach [62]. Risk of bias, inconsistency, indirectness, imprecision, and publication bias are the major domains under which the certainty of the evidence is rated and categorized as very low, low, moderate, or high. Overall, the studies included in this systematic review showed moderate evidence (Table 3).

Discussion
Technological advancements in health sciences have led to enormous developments in the AI models that have been designed for application in health sectors. In recent developments, CNN-based AI models have demonstrated excellent efficiency in diagnosing diseases in comparison with experienced specialists [63,64].
AI has been applied in endodontics for detecting pulpal diseases. Tumbelaka et al. [28] published details of an AI model for identifying pulpitis. This model was very efficient in precisely diagnosing reversible and irreversible pulpitis. However, the authors suggested using digital radiographs in order to achieve better validation. Zheng et al. [44] investigated a DL model designed for detecting deep caries and pulpitis, and the model demonstrated excellent performance. The ResNet18 model displayed outstanding performance when compared with reference models and experienced clinicians. However, this study focused only on teeth with single carious lesions and not on multiple carious lesions. Hence, further clinical validation is required before application in clinical practice.
Untreated dental caries progresses into periapical diseases, which are a result of the inflammatory lesions affecting the pulpal and periapical tissues, 90% of which are classified as apical granulomas, apical cysts, or abscesses [65]. The prevalence of apical periodontitis ranges between 34 and 61%, followed by periapical cysts and granulomas which range from 6 to 55% and from 46 to 94%, respectively [66][67][68]. Periapical pathosis can be detected radiographically as periapical radiolucencies, which are also termed apical lesions. Detecting apical lesions using radiographs is a daily task of clinicians; however, regardless of their discriminatory ability, radiographic examinations are influenced by inter- and intra-examiner reliability [69,70]. Ekert et al. [31] described the application of an AI model for detecting apical lesions; this model displayed satisfactory ability in detecting apical lesions, with an AUC of 0.85 and a sensitivity of 0.65. However, the sensitivity of the model was limited and needs to be improved by using a larger number of datasets to avoid the under-detection of the lesions before the model can be applied in clinics. Setzer et al. [35] described an AI model designed for segmenting CBCT images and detecting periapical lesions. The model displayed excellent accuracy and specificity. However, the limitation of this study was the comparison of the performance of the CNN model with clinicians' segmentation, which can be subject to human error. Another limitation was the lower Dice index ratios for segmentation of the label lesions, which need to be addressed by increasing the training size. Orhan et al. [36] described an AI model designed for detecting periapical pathosis; the model displayed outstanding reliability in correctly detecting periapical lesions, which was equivalent to the performance of human experts. However, the presence of endo-perio lesions and periodontal defects can alter the performance of the model.
In addition, anatomical structures such as the mental foramen and nasal fossa need segmentation, which can impact the analysis of the models' measurements, and therefore, further programming will be required to address these issues. Endres et al. [37] reported the performance of an AI model designed for detecting periapical disease which displayed an acceptable precision and F1 score. The model achieved a better performance than experienced specialists. However, the model was trained using datasets labeled by the surgeons, which can be subject to human bias and be reflected in a degradation of the model performance. Another limitation was with the data used for training and evaluating the model, which were from a single center. Hence, further tests may be required with data from multiple centers to demonstrate generalizability. Li et al. [38] studied the performance of a DL model designed for detecting apical lesions. The model demonstrated an excellent diagnostic accuracy of 92.5%. This model displayed a performance superior to that of a previous model [36]. However, the limitation of this model was with datasets that were obtained from a single hospital. Again, in order to demonstrate the generalizability of these results, further research is required with data from multiple sources [40].
Pauwels et al. [44] described the performance of a DL model designed for detecting periapical lesions. The results of this study were very promising, with a mean sensitivity of 0.87, specificity of 0.98, and ROC-AUC of 0.93. This model outperformed experienced oral radiologists. However, since this study used bovine ribs and simulated lesions, the model needs to be further trained and validated on larger samples and clinical radiographs before implementation in clinical scenarios. Ngoc et al. [49] detailed the performance of an AI model for diagnosing periapical lesions. This model displayed exceptional performance in comparison with endodontists' diagnoses. However, this model was developed with a limited number of datasets using periapical radiographs.
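The ROC-AUC values quoted for these models can be read as the probability that a randomly chosen diseased case receives a higher model score than a randomly chosen healthy case. The minimal Python sketch below computes the AUC directly from that pairwise definition; the scores are hypothetical and are not data from any of the cited studies.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical model scores for teeth with and without a periapical lesion.
lesion_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.7, 0.3, 0.2]
auc = roc_auc(lesion_scores, healthy_scores)
print(round(auc, 3))
```

Unlike sensitivity and specificity, which depend on a chosen decision threshold, the AUC summarizes ranking performance across all thresholds.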
Kirnbauer et al. [50] described the performance of an AI model for automatically detecting periapical lesions; the model displayed a sensitivity of 97.1% and a specificity of 88.0% for lesion detection. Bayrakdar et al. [52] reported on an AI model designed for segmenting apical lesions. This model was efficient in evaluating the periapical pathology and displayed a remarkable performance. However, there were a few limitations with the radiographic data used for this study, as they were obtained from a single piece of equipment and the number of samples used was very limited. Calazans et al. [55] reported on AI models for classifying periapical lesions and compared their performance with that of experienced oral and maxillofacial radiologists. The model displayed an accuracy of 70% and specificity of 92.39%, which were superior to those of the AI model VGG-16 and human experts.
Determining the working length is one of the crucial clinical steps that influence the outcome of root canal treatment. This will reduce the chances of insufficient cleaning of the canal and help in confining the root canal filling material into the canal and not invading the periapical tissues, ultimately resulting in a successful treatment outcome [70]. Saghiri et al. [25] described the performance of an AI-based model for locating the minor apical foramen. This model demonstrated good accuracy in detecting the apical foramen. Saghiri et al. [26] also described the performance of an AI model for determining the working length. The AI model demonstrated 96% accuracy in comparison with experienced endodontists. However, the quality of patient selection in these studies was low since the samples used were extracted teeth and cadavers. Qiao et al. [38] described the performance of an AI model designed for root canal length measurement. The accuracy of the model was exceptional and was better than the accuracy of the dual-frequency impedance ratio method, which demonstrated an accuracy of 85%. However, very limited samples were used, and increasing the sample size in future studies can further enhance the performance.
Vertical root fractures (VRFs) are complete or incomplete fractures of the root in the longitudinal plane and can be seen in teeth that are either endodontically treated or untreated [71,72]. These fractures often go unnoticed by clinicians and in most cases are only suspected when significant periapical changes occur, ultimately resulting in a delay in diagnosis and treatment [73]. To increase the diagnostic efficiency of clinicians, AI models have been applied for assisting clinicians in the early diagnosis of tooth cracks and fractures. Kositbowornchai et al. [27] described the performance of an AI model designed for detecting VRFs, and the model displayed an outstanding performance. However, the limitation of this study was with the samples, since they only used single-rooted premolar teeth; thus, these results cannot be generalized unless applied to different tooth types. Johari et al. [29] described the performance of an AI model for determining VRFs; the model displayed exceptional performance. However, in this study, only single-rooted premolar teeth were used. These findings were similar to the findings of the study conducted by Fukuda et al. [32] in which the AI model displayed a precision of 0.93 and an F measure of 0.83. However, the limitation of this study was with the datasets used, which were only from a single center, and only the radiographs with clear VRF lines were included [32]. Hu et al. [60] described the performance of AI models for diagnosing VRFs; the ResNet50 model presented the highest accuracy and sensitivity for diagnosing VRF teeth. Shah et al. [30] described the application of an AI model for automatically detecting cracks in teeth; this model displayed a mean ROC of 0.97 in detecting cracked teeth.
Assessing the shape of the roots and canals of a tooth can be very important in successfully treating a carious tooth. However, the variations in the root canal morphology pose a difficulty in canal preparation, irrigation, and obturation. C-shaped canals are the most difficult variation in the performance of a root canal treatment [74,75]. Hiraiwa et al. [32] described the application of an AI model designed for assessing the root morphology of the mandibular first molar; this model displayed an accuracy of 86.9%. Sherwood et al. [39] also described the performance of a DL model for classifying C-shaped canal anatomy in mandibular second molars. Both Xception U-Net and residual U-Net performed significantly better than the U-Net model. However, the limited sample used in this study and the focus on only C-shaped root canal anatomy were limitations of the study. Jeon et al. [45] reported on a DL model designed for predicting C-shaped canals in mandibular second molars; the model displayed outstanding performance in predicting C-shaped canals. Yang et al. [58] described the performance of a DL model for classifying C-shaped canals in mandibular second molars; the model displayed excellent performance in predicting C-shaped canals in both periapical and panoramic images. However, in this study, the number of samples used was insufficient, and the samples were from a single center.
AI has also been applied in predicting the prognosis of RCT. Herbst et al. [51] reported on an AI model for predicting factors associated with the failure of root canal treatments. This model was efficient in predicting tooth-level factors. Qu et al. [58] described the application of machine learning models for the prognosis prediction of endodontic microsurgery. The gradient boosting machine (GBM) model displayed excellent performance. These findings were similar to the finding of the study conducted by Li et al. [59] in which the model displayed an accuracy of 57.96-90.20%, an AUC of 95.63%, and a sensitivity of 91.39%. These automated models can be of great value to clinicians by assisting them in decision-making, providing quick and accurate results, overcoming the requirement of high-level clinical experience, and avoiding inter-observer variability.
The findings of this systematic review show that the majority of the AI models are designed for automated digital diagnosis and treatment planning. These findings are in accordance with the systematic reviews that have previously reported on various disciplines of dentistry. Mohammad-Rahimi et al. [76] reported on the performance of deep learning models in periodontology and oral implantology, where the authors concluded that the performance of the models is generally high. Albalawi et al. [77] reported on a wide range of AI models applied in orthodontics and concluded that these AI models are reliable and can automatically complete tasks with an enhanced speed and an efficiency equivalent to that of experienced specialists. Junaid et al. [78] reported on the application and performance of AI models designed for cephalometric landmark identification. The authors concluded that these models are of great benefit to orthodontists as they can perform tasks very efficiently. Carrillo-Perez et al. [79] reported on the application of AI models in dentistry; the authors concluded that the AI models display outstanding performance in performing the tasks. Thurzo et al. [80] reported on a wide range of AI models that have been designed for application in dentistry. The authors reported that there has been extraordinary growth in the development of AI models designed for application in dentistry. In the last few years, significant growth has been witnessed in the application of AI in dentistry.
This systematic review might have a few limitations. Even though we performed a comprehensive search for articles that have reported on the application of AI models in endodontics, we might have missed a few. Another limitation could be with the assessment of the risk of bias, which might vary between subjective judgments. However, considering the potential of AI applications in improving the diagnosis and treatment outcomes in endodontics, regulatory bodies should expedite the process of policy-making, approval, and marketing of these products for application in clinical scenarios.

Conclusions
In endodontics, AI models have been applied for determining working length, vertical root fractures, and root morphology; detecting and diagnosing pulpal diseases and periapical lesions; and predicting prognosis, postoperative pain, and case difficulties. Most of the included studies (n = 21) were developed using convolutional neural networks. Among the included studies, datasets that were used were mostly cone-beam computed tomography images, followed by periapical radiographs and panoramic radiographs. QUADAS-2, used to assess the quality of the included articles, revealed a low risk of bias in the patient selection domain in most of the studies (risk of bias: 90%; applicability: 70%). These models can be used as supplementary tools in clinical practice in order to expedite the clinical decision-making process and enhance the treatment modality and clinical operation. However, in most of the studies, the models were developed using a limited number of datasets for training and evaluation. The data samples collected were from a single clinic/center and from a single radiographic instrument. Hence, the results obtained from these studies cannot be generalized due to the lack of heterogeneity in the samples. In order to overcome these limitations, future studies should focus on considering a large number of datasets for training and testing the models. Samples need to be collected from multiple centers and from different radiographic instruments.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13030414/s1, Table S1: Assessment of risk of bias domains and applicability concerns.