Accuracy of Artificial Intelligence Models in Dental Implant Fixture Identification and Classification from Radiographs: A Systematic Review

Background and Objectives: The availability of multiple dental implant systems makes it difficult for the treating dentist to identify and classify the implant in case of inaccessibility or loss of previous records. Artificial intelligence (AI) is reported to have a high success rate in medical image classification and is effectively used in this area. Studies have reported improved implant classification and identification accuracy when AI is used with trained dental professionals. This systematic review aims to analyze various studies discussing the accuracy of AI tools in implant identification and classification. Methods: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed, and the study was registered with the International Prospective Register of Systematic Reviews (PROSPERO). The focused PICO question for the current study was “What is the accuracy (outcome) of artificial intelligence tools (Intervention) in detecting and/or classifying the type of dental implant (Participant/population) using X-ray images?” Web of Science, Scopus, MEDLINE-PubMed, and Cochrane were searched systematically to collect the relevant published literature. The search strings were based on the formulated PICO question. The article search was conducted in January 2024 using Boolean operators and truncation. The search was limited to articles published in English in the last 15 years (January 2008 to December 2023). The quality of all the selected articles was critically analyzed using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2). Results: Twenty-one articles were selected for qualitative analysis based on predetermined selection criteria. Study characteristics were tabulated in a self-designed table. Out of the 21 studies evaluated, 14 were found to be at risk of bias, with high or unclear risk in one or more domains. The remaining seven studies had a low risk of bias. The overall accuracy of AI models in implant detection and identification ranged from a low of 67% to as high as 98.5%. Most included studies reported mean accuracy levels above 90%. Conclusions: The articles in the present review provide considerable evidence to validate that AI tools have high accuracy in identifying and classifying dental implant systems using 2-dimensional X-ray images. These outcomes are vital for clinical diagnosis and treatment planning by trained dental professionals to enhance patient treatment outcomes.


Introduction
Advancements in science and technology have influenced people's lives in various fields, including dentistry. With the introduction of precise digital machines, dentists can provide high-quality treatment to their patients [1,2]. Various studies have shown that these computer-aided machines help dentists in many ways, from the fabrication of prostheses using CAD/CAM [2–5] to the use of robots in the treatment of patients [6–8]. The introduction of AI has taken dentistry to the next level. These tools act as supplementary aids that guide dentists' diagnosis and treatment planning. Artificial intelligence involves developing and training machines on a set of data so that they are capable of decision making and problem solving, mimicking the human brain [9–11]. Machine learning (ML), a segment of AI, involves using algorithms to perform tasks without human intervention. Deep learning (DL), e.g., the convolutional neural network (CNN), is an element of ML that creates a neural network capable of identifying patterns by itself, which enhances feature identification [11–13].
AI functions on two levels. The first level involves training, in which data are used to train the model and set its parameters. The second level is testing, in which the AI performs its designated task of problem solving or decision making based on the training data. The training data are generally drawn from the pool of collected data of interest [14–17]. AI is now widely used in dentistry, including caries detection [18,19], periapical lesion detection [20], oral cancer diagnosis [21,22], screening of osteoporosis [23], working length determination during endodontic treatment [24,25], determination of root morphology [26,27], forensic odontology [28], pediatric dentistry [29], and implant dentistry for identification [30–32], diagnosis, and treatment planning [33,34]. Studies have shown that, in general, AI helps dentists in diagnosis and treatment planning, as it provides logical reasons that aid in scientific assessment.
Dental implants are commonly used for replacing missing teeth. Studies have reported a high long-term success rate, with a ten-year survival rate above 95% [35–38]. With constantly increasing demand, dental implant manufacturers are developing different implant systems to increase the success rate [39]. With the increase in the use of dental implants, an increase in complications has also been reported. These complications may be related to prosthetic or fixture components or may be biological in nature [40–43]. To manage these complications, the treating dentist should know the type of implant system used so that he or she can provide the best possible treatment outcome [44]. The data related to the implant system can be retrieved easily from the patient's previous records. However, if previous records are inaccessible or lost for any reason, it becomes difficult for the dentist to identify and classify the implant system using the available X-rays and clinical observation [45]. Dentists with vast experience in implantology may also find this task challenging. AI is reported to have a high success rate in medical image classification and is effectively used in this area. AI has been used to manage the problem of implant system identification and classification [30–32,46–63]. The AI tool is trained using a database of implant images and is later used to identify and classify the implants. Studies have reported improved implant classification and identification accuracy when AI is used with trained dental professionals [51,53,60,62]. This systematic review aims to analyze various studies discussing the accuracy of AI tools in implant identification and classification.

Registration
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [64] were followed to systematize and compile this systematic review. The study was registered with the International Prospective Register of Systematic Reviews (PROSPERO registration No.: CRD42024500347).

Inclusion and Exclusion Criteria
The details of inclusion and exclusion criteria are given in Table 1.

Exposure and Outcome
In the current study, the exposure was the identification of the type and classification of an implant system using an artificial intelligence tool. The outcome was the accuracy of identification. The focused PICO (Population (P), Intervention (I), Comparison (C), and Outcome (O)) question for the current study was "What is the accuracy (outcome) of artificial intelligence tools (Intervention) in detecting and/or classifying the type of dental implant (Participant/population) using X-ray images?"

Table 1. Inclusion and exclusion criteria (only the rows recoverable from the source are listed).
Inclusion criteria: studies in which three or more implant models were identified.
Exclusion criteria: studies evaluating the accuracy of artificial intelligence tools in identification of other dental/oral structures; studies having only the abstract and not the full text; studies in which fewer than three implant models were identified; studies discussing artificial intelligence tools under trial.

Information Sources and Search Strategy
Four electronic databases (Web of Science, Scopus, MEDLINE-PubMed, and Cochrane) were searched systematically to collect the relevant published literature. The search strings were based on the formulated PICO question. The article search was conducted in January 2024 using Boolean operators and truncation. The search was limited to articles published in English in the last 15 years (January 2008 to December 2023). Studies performed on animals were not included. Details about the search strategy are given in Table 2. Minor changes were made to the search strings based on the requirements of each database. Grey literature was searched, and bibliographies of selected studies and other review articles were checked manually to ensure that no relevant articles were missed.
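
The combination of Boolean operators and truncation described above can be sketched in code. The snippet below is a minimal illustration and not the authors' actual search string: the helper names `or_group` and `build_query` are hypothetical, the implant and imaging terms are taken from the search-string fragment that appears in this text, and the AI-related terms are assumed for illustration only.

```python
def or_group(terms):
    """Quote each term and join the synonyms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*concept_groups):
    """Join one OR-group per PICO concept with AND."""
    return " AND ".join(or_group(group) for group in concept_groups)


# Terms seen in the search-string fragment of this review; '*' marks truncation.
implant_terms = ["dental implants", "dental implant*", "dental implant fixture"]
imaging_terms = ["dental radiography", "Panoramic image*"]
# Assumed for illustration only; the AI terms the authors used are not shown here.
ai_terms = ["artificial intelligence", "deep learning"]

query = build_query(implant_terms, imaging_terms, ai_terms)
print(query)
```

Synonyms for one concept are widened with OR, while the concepts themselves are narrowed with AND; each database may need small syntax adjustments, as the text notes.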

Screening, Selection of Studies, and Data Extraction
Two reviewers, M.S.A. and M.N.A., independently reviewed the titles and abstracts obtained by the electronic search. Duplicate titles were eliminated. The remaining titles were assessed based on the preset selection criteria and the PICO question. Full texts of the selected studies were reviewed independently by two reviewers, R.S.P. and W.I.I., and relevant articles were shortlisted. Any disagreements were resolved by discussion between them and with the third reviewer, M.N.A. Articles that did not meet the selection criteria were discarded, and the reason for exclusion was noted. The inter-examiner agreement was calculated using kappa statistics. W.I.I. created a data extraction chart and collected information related to the author, year of publication, country where the research was conducted, type and name of the algorithm network architecture, architecture depth, number of training epochs, learning rate, type of radiographic image, patient data collection duration, number of implant images evaluated, number and names of implant brands and models evaluated, comparator, test group, and training/validation number and ratio. Accuracy reported by the studies, author's suggestions, and conclusions were also extracted. These data were checked and verified by a second reviewer (M.S.A.).

Search string from Table 2 (truncated in the source): ("dental implants" OR "dental implantation" OR "dental implant*" OR "dental implant system*" OR "Dental Implant System Classification" OR "dental implant fixture" OR "dental implant fixture classification") AND ("dental diagnostic imaging" OR "dental digital radiography" OR "dental radiography" OR "oral digital radiography" OR "dental Digital radiograph" OR "Panoramic image*"

Quality Assessment of Included Studies
The quality of all the selected articles was critically analyzed using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) [65]. This tool is used for studies evaluating diagnostic accuracy (Table S1) and assesses both the risk of bias and applicability concerns. The risk-of-bias arm has four domains: patient selection, index test, reference standard, and flow and timing. The applicability-concern arm has three domains: patient selection, index test, and reference standard.

Identification and Screening
After an electronic search of the databases, 561 hits were displayed. A total of 36 articles were found to be duplicates and were removed, and the titles and abstracts of the remaining 525 articles were reviewed and checked for eligibility based on the inclusion and exclusion criteria. Twenty-eight articles were selected for full-text review. Of these twenty-eight articles, six were rejected because they discussed the use of AI in diagnosis and treatment planning of dental implants, and one was rejected because it discussed the diagnostic accuracy of AI in evaluating the misfit of abutment and implant. Eventually, twenty-one articles were included in the study. No relevant articles meeting the selection criteria were found during the manual search of the bibliographies of the selected studies and other review articles (Figure 1). During the full-text review phase, Cohen's kappa for the two reviewers (R.S.P. and W.I.I.) was 0.89, indicating excellent agreement.
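
The two quantitative steps in this paragraph, the record flow and the inter-reviewer agreement, can be checked with a short sketch. The flow counts below are those reported in the text. The reviewers' individual decisions are not reported, so the reported kappa of 0.89 cannot be reproduced here; the decision lists and the function name `cohen_kappa` are hypothetical and only show how the statistic is computed.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Record flow reported in the text.
hits, duplicates = 561, 36
screened = hits - duplicates        # titles and abstracts reviewed
full_text = 28
excluded_full_text = 6 + 1          # planning studies + abutment misfit study
included = full_text - excluded_full_text
print(screened, included)

# Hypothetical include/exclude decisions, for illustration only.
reviewer_1 = ["in", "in", "out", "out"]
reviewer_2 = ["in", "out", "out", "out"]
kappa_example = cohen_kappa(reviewer_1, reviewer_2)
print(kappa_example)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a simple percentage when most records fall into one category.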
sources and grant numbers, respectively, but the studies by Kong et al. [31,61] also shared a common research registration number. The number of algorithm networks evaluated for accuracy varied in the selected studies. Ten studies [46–49,51,53,57,58,60,62] evaluated the accuracy of one algorithm network; three evaluated two algorithm networks [32,59,61]; two tested three algorithm networks [31,54]; one tested four algorithm networks [56]; three tested five algorithm networks [50,52,55]; and one study each tested six [30] and ten [63] algorithm networks. All the included studies evaluated the accuracy of the tested AI tools in implant detection and classification, whereas four studies [51,53,60,62] also compared this to trained dental professionals. More than 431,000 implant images were used to train and test the implant detection and classification accuracy of the selected AI tools. Eight studies [30,31,47,50,52,56,60,61] used cropped panoramic X-ray images, and six studies [49,55,57–59,63] used cropped periapical X-ray images, whereas another six studies [46,48,51,53,54,62] used both periapical and panoramic implant images. In one study [32], artificially generated X-ray images were used to test AI accuracy. In most of the selected studies, the test group to training group ratio was 1:4. The learning rate of the AI algorithm ranged between 0.0001 and 0.02, the number of training epochs ranged from 50 to 2000, and the architecture depth varied from 3 to 150 layers. The number of implant brands and models identified and classified varied from N = 3 to N = 130.
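
A 1:4 test-to-training ratio means that one fifth of the images is held out for testing. The sketch below shows one simple way to produce such a split; it is an assumption-laden illustration, not the procedure of any included study. The function name `split_train_test` and the image identifiers are hypothetical.

```python
import random


def split_train_test(items, test_to_train=(1, 4), seed=0):
    """Shuffle items and split them by a test:train ratio (1:4 keeps 20% for testing)."""
    test_part, train_part = test_to_train
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_part / (test_part + train_part))
    return shuffled[n_test:], shuffled[:n_test]


image_ids = [f"implant_{i:04d}" for i in range(1000)]  # hypothetical image identifiers
train_ids, test_ids = split_train_test(image_ids)
print(len(train_ids), len(test_ids))
```

With 1,000 images this yields 800 for training and 200 for testing. When several radiographs come from the same patient, splitting by patient rather than by image is a common precaution against overly optimistic test accuracy.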

Quality Assessment of Included Studies
The QUADAS-2 tool was used to assess the risk of bias in diagnostic tests. Out of the 21 studies evaluated, 14 were found to be at risk of bias, with high or unclear risk in one or more domains. The remaining seven studies had a low risk of bias. All the included studies utilized photographic data as input to AI, resulting in a low risk of bias in the data selection domain across all studies. In the index test domain of the risk-of-bias arm, 80.95% of the studies had a low risk, 14.28% had an unclear risk, and 4.76% had a high risk. In contrast, in the reference standard domain, 47.62% of the studies had a low risk and 47.62% an unclear risk of bias, while 4.76% had a high risk of bias. Because the data fed to the AI tools are standardized, flow and timing are not expected to affect the final output; all studies were therefore rated as low risk (100%) in this domain. The applicability-concern arm of the QUADAS-2 tool generated similar results (Table S1 and Figure 4).
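
The percentages above follow from study counts out of 21. The sketch below back-calculates them for the index test domain. The counts 17, 3, and 1 are inferred from the reported percentages and are not stated in the text, so they should be read as an assumption.

```python
TOTAL_STUDIES = 21

# Counts inferred from the reported percentages (assumption, not stated in the text).
index_test_counts = {"low": 17, "unclear": 3, "high": 1}


def percent(count, total=TOTAL_STUDIES):
    """Share of studies as a percentage."""
    return 100 * count / total


for rating, count in index_test_counts.items():
    print(f"{rating}: {count}/{TOTAL_STUDIES} = {percent(count):.2f}%")
```

Note that 3/21 is 14.2857...%, which rounds to 14.29%; the reported 14.28% appears to be truncated rather than rounded. By the same arithmetic, 47.62% corresponds to 10 of 21 studies.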
Diagnostics 2024, 14, x FOR PEER REVIEW


Accuracy Assessment
The overall accuracy of deep learning algorithms (DLAs) in implant detection and identification ranged from a low of 67% [56] to as high as 98.5% [52]. Most included studies reported mean accuracy levels above 90% [30,46,50–55,58,59,63]. The accuracy of the latest finely tuned versions of DLAs was reported to be higher than that of basic DLAs. Six studies [46,48,51,53,54,62] used both periapical and panoramic implant images to test the DLA models. Four studies reported higher accuracy when periapical radiographs were used [51,53,54,62]. One study reported higher accuracy with panoramic radiographs [48], whereas one study did not provide these details [46]. Four studies compared the accuracy of DLAs with dental professionals [51,53,60,62]. All four reported higher accuracy for DLAs when compared to dental professionals. A study by Lee et al. [60] reported that board-certified periodontists assisted by a DLA achieved higher accuracy than the automated DL alone.
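
For reference, classification accuracy in its simplest form is the share of radiographs for which the predicted implant system matches the reference label. The sketch below illustrates that definition only; the included studies may have computed or averaged accuracy differently, and the labels and the function name `accuracy` are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of radiographs whose predicted implant system matches the reference."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical reference and predicted implant systems for four radiographs.
reference = ["system_A", "system_B", "system_C", "system_A"]
predicted = ["system_A", "system_B", "system_A", "system_A"]
print(f"accuracy = {accuracy(reference, predicted):.1%}")
```

Because the chance of a correct guess falls as the number of candidate implant systems grows, accuracy values from studies with different numbers of classes are not directly comparable.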

Discussion
The current systematic review involved all the recently published studies evaluating the accuracy of AI in implant detection and classification [30–32,46–63]. Overall, the outcome of this review revealed that the application of AI in implant detection and classification is a reliable and accurate method and can help dentists manage cases with no previous data related to the type of implant. With the advancements in AI, the accuracy levels may improve to a great extent.
However, the outcomes of this review should be interpreted with caution because the number of implant models evaluated for testing the accuracy varied considerably among the included studies, from as low as three [49,51,56,59] to as high as one hundred and thirty [61]. In general, the lower the number of implant models, the higher the accuracy of identification and classification. The sample size also varied widely in the selected studies, from 300 [57] to more than 150,000 [62].
The included studies varied in the annotation process. Periapical (PA) images were used for training and testing the AI tool in six studies [49,55,57–59,63] and panoramic images in eight studies [30,31,47,50,52,56,60,61], whereas both PA and panoramic images were used in six studies [46,48,51,53,54,62]. One study used simulated images generated artificially [32]. Among the studies in which both PA and panoramic images were used, four reported that the accuracy of identification and classification was higher with PA images than with panoramic images [51,53,54,62], whereas one study reported that the accuracy was higher with panoramic images [48].
The dental professionals involved in image selection, cropping, image standardization, training, and validation varied in areas of practice from periodontists and prosthodontists to oral and maxillofacial surgeons [30,48,51,53,54,63]. The other included studies did not report this information. Individual studies validated the collected data with the help of board-certified oral and maxillofacial radiologists [48] or periodontists [53]. To reduce heterogeneity and standardize the outcomes, the validation of the selected X-ray images should be performed by a trained radiologist. The number of training epochs varied from 50 to 2000, and the architecture depth varied from 3 to 150 layers. These parameters can affect the accuracy outcomes of the included studies. The accuracy of identification and classification also depends on the generation of DL architecture used, and the tested algorithms differed among the selected studies.
In their study, Sukegawa et al. [52] trained a CNN algorithm to analyze the implant brand and treatment stage simultaneously. The AI tool was annotated for both parameters. The classification accuracy of the implant treatment stage was reported as 0.996, with a large effect size of 0.818. The accuracy of single-task and multi-task AI tools was found to be comparable. Lee et al. [54] trained and tested the accuracy of AI tools to identify and classify fractured implants. They reported an implant classification accuracy varying from 0.804 to 0.829. They reported higher accuracy levels when the DCNN architecture used only PA images for identification.
All the included studies evaluated the accuracy of the tested AI tools in implant detection and classification, whereas four studies, by Lee et al. [51,53,60] and Park et al. [62], also compared the tested DL algorithm with trained dental professionals. All four studies reported that the accuracy of the DL algorithm was significantly superior to that of humans. The accuracy reported by Park et al. [62] was 82.3% for DL and varied for humans from 16.8% (dentists not specialized in implantology) to 43.3% (dentists specialized in implantology). Lee et al. [60] reported a mean accuracy of 80.56% for the automated DL algorithm, 63.13% for all participants without DL assistance, and 78.88% for all participants with DL assistance. They reported that the DL algorithm significantly helped in improving the classification accuracy of all dental professionals. Lee et al. [53], in another study, reported an accuracy of 95.4% for DL and between 50.1% and 96.8% for dentists. Another study by Lee et al. [51] reported similar accuracy rates, with DL at 97.1% and periodontists at 92.5%.
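
The effect of DL assistance in the study by Lee et al. [60] can be expressed in percentage points using the mean accuracies quoted above. The sketch below only performs that subtraction; the variable names are illustrative, and no statistical significance is implied by the arithmetic itself.

```python
# Mean accuracies (%) reported by Lee et al. [60], as quoted in the text.
dl_alone = 80.56
humans_unassisted = 63.13
humans_assisted = 78.88

# Differences in percentage points, rounded to the precision of the inputs.
assistance_gain = round(humans_assisted - humans_unassisted, 2)
gap_to_dl_alone = round(dl_alone - humans_assisted, 2)

print(f"gain with DL assistance: {assistance_gain} percentage points")
print(f"remaining gap to DL alone: {gap_to_dl_alone} percentage points")
```

On these figures, assistance raised the participants' mean accuracy by 15.75 percentage points, leaving it 1.68 points below the automated algorithm.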
Most of the currently reported AI models use two-dimensional X-rays (periapical or panoramic). In contrast, three-dimensional imaging such as cone-beam computed tomography, which is widely used in implantology, was not evaluated. The included studies were also limited in the types of implant systems evaluated. Thus, there is a need for more studies with a vast database that includes most of the commonly used implant systems and utilizes all forms of radiographic techniques.
The DL algorithm's identification and classification abilities in all the selected studies were limited to the implant models on which the authors trained it. There is a need to include more implant systems and models and to create a vast database that helps identify a wider variety of implant models and their characteristics. A comprehensive search strategy and a rigorous selection strategy are the strong points of this systematic review. All articles mentioning AI and dental implants were assessed based on pre-set selection criteria, thus ensuring that every relevant article was reviewed.

Inferences and Future Directions
The field of AI is growing exponentially. There is vast literature discussing the advancements of AI in the healthcare field. Most of these AI tools focus on identification, diagnosis, and treatment planning, and on ways to improve them, to help healthcare professionals provide the best possible treatment to their patients. All the included studies used two-dimensional images (periapical or panoramic) to identify and classify the implant systems. Three-dimensional imaging techniques such as cone-beam computed tomography (CBCT) are considered the gold standard in dental implant planning and treatment. Thus, there is a need to develop AI tools that can use these 3D images to identify and classify the implant systems. Additionally, with the availability of newer generations of AI tools, these tools need to be upgraded constantly to increase their accuracy.

Limitations
The current systematic review has a few limitations. This review included studies published only in English. The search period was limited to the last 15 years only (2008–2023). As AI is a recent and advancing field, the authors believed that searching before this period would mainly retrieve studies in which the technology was at an immature stage. Lastly, a meta-analysis was not feasible due to the heterogeneity among the selected studies.

Conclusions
To conclude, it can be stated that the articles in the present review provide considerable evidence to validate AI tools as having high accuracy in identifying and classifying dental implant systems using 2-dimensional X-ray images. These outcomes are vital for clinical diagnosis and treatment planning by trained dental professionals to enhance patient treatment outcomes.

Figure 1. Flow chart illustrating the search strategy.

Figure 2. Year-wise distribution of published studies.

Figure 3. Country-wise distribution of published studies.

Figure 4. Quality assessment summary of risk of bias and applicability concerns for the included studies according to the QUADAS-2 tool.

Table 2. Strategy and search terms for the electronic databases.

Table 3. Study characteristics and accuracy results of the included studies.