Effectiveness of Machine Learning in Assessing the Diagnostic Quality of Bitewing Radiographs

Barayan, Mohammed A.; Qawas, Arwa A.; Alghamdi, Asma S.; Alkhallagi, Turki S.; Al-Dabbagh, Raghad A.; Aldabbagh, Ghadah A.; Linjawi, Amal I.

doi:10.3390/app12199588


by Mohammed A. Barayan ¹, Arwa A. Qawas ², Asma S. Alghamdi ³, Turki S. Alkhallagi ⁴, Raghad A. Al-Dabbagh ⁴, Ghadah A. Aldabbagh ⁵,⁶ and Amal I. Linjawi ²,*

¹ Department of Diagnostic Oral Sciences, Faculty of Dentistry, King Abdulaziz University, Jeddah 22254, Saudi Arabia
² Department of Orthodontics, Faculty of Dentistry, King Abdulaziz University, Jeddah 21589, Saudi Arabia
³ Department of Restorative Dentistry, Faculty of Dentistry, King Abdulaziz University, Jeddah 21589, Saudi Arabia
⁴ Department of Oral and Maxillofacial Prosthodontics, Faculty of Dentistry, King Abdulaziz University, Jeddah 21589, Saudi Arabia
⁵ Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
⁶ Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(19), 9588; https://doi.org/10.3390/app12199588

Submission received: 13 August 2022 / Revised: 14 September 2022 / Accepted: 21 September 2022 / Published: 24 September 2022

(This article belongs to the Special Issue Digitalization, Technologies, New Approaches, and Telemedicine in Dentistry and Craniofacial/Temporomandibular Disorders)


Featured Application

Machine learning is becoming one of the major platforms for advances in all fields of dentistry, and its further exploration for automating the diagnostic process is of great importance to the field. The aim of this study is to assess the effectiveness of machine learning (ML) in assessing the diagnostic quality of bitewing (BW) radiographs at the contact areas between teeth, which can help oral radiologists obtain radiographs of better quality.

Abstract

Background: Identifying the diagnostic value of bitewing (BW) radiographs is highly dependent on the operator’s knowledge and experience. The aim of this study is to assess the effectiveness of machine learning (ML) in classifying BW radiographs according to their diagnostic quality. Methods: A total of 834 BW radiographs from the records of 100 patients who presented at King Abdulaziz University Dental Hospital, Jeddah, Saudi Arabia, were assessed. The radiographic errors in representing proximal contact areas (n = 1951) were categorized as diagnostic or non-diagnostic. The BW radiographs were labeled and prepared for training using Roboflow. The data were divided into training, validation, and testing sets to train the pre-trained EfficientDet-D0 model using TensorFlow. The model’s performance was assessed by calculating recall, precision, F1 score, and log loss. Results: The model excelled at detecting “overlap within enamel” and “overlap within restoration (clear margins)”, with F1 scores of 0.89 and 0.76, respectively. The overall error of the model, expressed as a log loss value of 0.15, indicated high accuracy. Conclusions: The model is a “proof of concept” for the effectiveness of ML in diagnosing the quality of BW radiographs based on the contact areas. More dataset specification and optimization are needed to overcome the class imbalance.

Keywords:

artificial intelligence; bitewing radiographs; contact areas; machine learning; radiographic errors

1. Introduction

Bitewing radiographs are a vital adjunct to clinical examination for the diagnosis and treatment planning of diseases of the teeth and supporting tissues [1]. A balance between the risks and benefits of acquiring radiographs should be achieved by following the standards of radiation protection: justification, optimization, and dose limitation [2]. Reducing the radiation dose can be partly achieved by acquiring diagnostically acceptable images on the first attempt. However, bitewing radiographs are sometimes retaken for diagnostic purposes. Indeed, Yeung et al., 2021, reported in a literature review that the rejection rate of bitewings because of limited diagnostic value was 9%. The most common reasons for rejection were positioning errors and cone cuts. It was further suggested that reject rates were higher with digital imaging, specifically when using intraoral sensors [3].

The decision-making process involved in identifying the diagnostic value of bitewing images is complex and highly dependent on the operator’s/technician’s knowledge and experience. In general, bitewing images should equally capture the maxillary and mandibular crowns and the crestal bone. Premolar bitewings should capture the distal surfaces of both the maxillary and mandibular canines, while molar bitewings should capture the distal surface of the second premolar and the distal surface of the last standing molar in the mouth. Additionally, there should be no overlap of proximal surfaces at the contact areas [4]. One adjunct that may improve this decision-making process is the use of artificial intelligence (AI) programs.

AI refers to any computer or technology capable of mimicking human thought processes such as problem-solving. It can also refer to the capacity of computers to learn from data and develop their own intelligence, so that problems can be solved from this learning [5]. This science was first developed by John McCarthy, who established the first AI laboratory in 1957 [6]. AI varies in type and classification depending on the complexity of its performance. Machine learning (ML) is a type of AI that enables software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Deep learning (DL), a subset of ML, is defined as “computational systems that learn over time based on experience” [7,8,9,10].

Both ML and DL are considered implementations of computational methods (i.e., algorithms) through which models can be generated from any given data, and those models can then be used to make predictions on new datasets with similar properties [11]. ML and DL can be distinguished by the complexity of the network used to extract features from the data, as well as by the complexity of the data itself. ML is usually appropriate for regression or classification of simple numerical data, whereas DL is recommended for analyzing large collections of images or other complex data [8,9,10,11].

Clinical decision-support systems (CDSS) are software-based AI programs that can be based on either ML or so-called rule-based expert systems [7,12]. CDSS are designed to provide health care providers with specialist assistance in their daily activities, helping with tasks that depend on exploiting data and information [7]. There are many types of CDSS, such as artificial neural networks (ANNs), fuzzy expert systems, evolutionary computation, and hybrid intelligent systems, which vary in how they handle data and in the complexity of their analysis [7,10].

Nowadays, AI is increasingly applied in multiple dental and medical fields, with promising results [7,10]. Radiographic images are an important source of data for developing AI models in the health professions; therefore, it is important to consider the image-quality factors that can affect the appearance of structures [7]. AI applications in dentistry have been assessed for three main purposes: (1) as a triage tool for optimizing the clinical workflow, (2) as a diagnostic tool for separating normal from abnormal conditions, and (3) as a treatment decision-making tool for detecting specific diseases [7].

Studies on the value of AI applied to bitewing radiographs in assisting dentists’ decision-making are plentiful. In general, these studies demonstrate the utility of AI algorithms in diagnosing dental caries and periodontal disease and in classifying restorations [13,14,15]. Most of them investigated the effectiveness of AI models in diagnosing dental caries from bitewings and demonstrated that deep learning-based algorithms increased the accuracy of diagnosing proximal caries [13,16,17,18]. A few studies investigated the utility of AI in diagnosing periodontal disease from bitewings; overall, AI models for diagnosing periodontitis are still developing but might be useful adjunctive tools [15].

In one study, Karatas et al., 2021, demonstrated that deep learning-based convolutional neural networks (CNNs) applied to periapical and bitewing radiographs were a potentially beneficial technique for detecting and differentiating restorations [14]. However, no studies in the literature have investigated the development or utility of AI algorithms to classify bitewings based on their diagnostic value. Accordingly, the aim of this study is to develop a machine learning-based AI algorithm that classifies bitewing radiographs according to their diagnostic quality and to assess its effectiveness.

2. Materials and Methods

2.1. Study Design

This is a laboratory study approved by the ethical committee at King Abdulaziz University, Faculty of Dentistry, Jeddah, Saudi Arabia (Ethical no.: 172-12-20). The study was conducted in collaboration with the Faculty of Computing and Information Technology at the same institute.

2.2. Data Collection

A research account was created for the investigators to access the patients’ records through the University Dental Hospital health record system (CareStream R4 Clinical+ software, Jeddah, Saudi Arabia) and to retrieve the required BW radiographs. The inclusion criteria for the BW radiographs were: (1) patients under treatment at the University Dental Hospital with fully erupted permanent dentition, and (2) BW radiographs taken in either the premolar or the molar view. The exclusion criteria were: (1) records with findings that might affect the machine learning process, such as severe periodontitis that cannot be assessed on a horizontal BW, removable partial denture components, fixed retainers, provisional crowns, and defective direct or indirect restorations that hinder contact assessment; and (2) acquisition errors that cannot be corrected, such as double exposure, patient movement, or extremely low or high kVp, resulting in indistinguishable enamel, dentin, or pulp. The R4 Clinical+ system was searched for records from 2017 to 2021 of patients who fulfilled the inclusion criteria.

2.3. Data Preprocessing and Entry

Two investigators, who were general dental practitioners, were trained and calibrated to perform labeling under the supervision of an experienced Oral and Maxillofacial Radiologist.

The rotation and contrast of the selected BW radiographs were reviewed and, if needed, manually adjusted to achieve acceptable differentiation between enamel, dentin, pulp, and air, to aid the model in the learning process. The images were then exported in TIFF format at 1920 × 1440 pixels.

For each patient, two categories of data were collected: demographic data and radiographic data. The demographic data included: patient’s file number, date of birth, nationality, ethnicity, and number of bitewings taken.

The radiographic data included the following fields:

Radiographic ID: each BW image was renamed with a ten-digit ID. The first three digits represented the BW radiograph number for the specific patient, and the last seven digits referred to the patient’s file number in the R4 Clinical+ software.
Acquisition date: the date when the BW radiograph was taken in MM/DD/YYYY format.
Machine type: the type of machine used to acquire the BW radiographs, either a sensor or digitized plates.
Radiograph view side: which side of the dentition was included in the BW radiograph (right or left).
Score: the scoring system for each proximal surface and contact area between two teeth was established based on tooth structure, amount of contact overlap, and the field of the taken radiograph. The scores ranged from 0 to 11.
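The ID scheme and record fields above can be expressed as a small data structure. The following is a minimal Python sketch, not the study’s software: the class `BitewingRecord`, the helper functions, and the exact strings allowed for machine type and view side are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from typing import Tuple

# Hypothetical allowed values; the study only says "sensor or digitized plates" and "right or left".
MACHINE_TYPES = {"sensor", "digitized plate"}
VIEW_SIDES = {"right", "left"}


@dataclass(frozen=True)
class BitewingRecord:
    """One BW radiograph and its per-contact scores (illustrative layout)."""
    radiograph_id: str        # 3-digit BW number followed by 7-digit patient file number
    acquisition_date: date    # parsed from an MM/DD/YYYY string
    machine_type: str         # one of MACHINE_TYPES
    view_side: str            # one of VIEW_SIDES
    scores: Tuple[int, ...]   # one score (0-11) per assessed contact area


def make_radiograph_id(bw_number: int, file_number: int) -> str:
    """Build the ten-digit ID: first three digits = BW number, last seven = file number."""
    if not (0 <= bw_number <= 999 and 0 <= file_number <= 9_999_999):
        raise ValueError("BW number must fit in 3 digits and file number in 7 digits")
    return f"{bw_number:03d}{file_number:07d}"


def parse_radiograph_id(radiograph_id: str) -> Tuple[int, int]:
    """Split a ten-digit ID back into (BW number, patient file number)."""
    if not re.fullmatch(r"[0-9]{10}", radiograph_id):
        raise ValueError("expected exactly ten ASCII digits")
    return int(radiograph_id[:3]), int(radiograph_id[3:])


def make_record(bw_number, file_number, date_text, machine_type, view_side, scores):
    """Validate raw field values and assemble a BitewingRecord."""
    acquired = datetime.strptime(date_text, "%m/%d/%Y").date()
    if machine_type not in MACHINE_TYPES:
        raise ValueError(f"unknown machine type: {machine_type!r}")
    if view_side not in VIEW_SIDES:
        raise ValueError(f"unknown view side: {view_side!r}")
    if any(not 0 <= s <= 11 for s in scores):
        raise ValueError("scores must lie between 0 and 11")
    return BitewingRecord(make_radiograph_id(bw_number, file_number), acquired,
                          machine_type, view_side, tuple(scores))
```

For example, `make_radiograph_id(12, 1234567)` returns `"0121234567"`, and `parse_radiograph_id` reverses it. Keeping the patient file number inside the ID makes it possible to group images by patient later without a separate lookup table.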

2.4. Dataset

After scoring all selected bitewing radiographs, feature selection was performed to identify prominent features leading to successful classification results. Accordingly, the labels that were included in the model training process were the following:

Overlap within enamel.
Overlap within a restoration (clear margin).
Overlap at DEJ.
Overlap within dentin.
Overlap within a restoration (unclear margin).

Because the machine learning task is object detection, the “out-of-field” label had to be excluded: in this case, the contacts are not present in the images and thus cannot be identified with bounding boxes. In addition, labels that did not represent errors, such as missing tooth/surface and open contacts, were also excluded to avoid misclassification.
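The label selection above amounts to filtering annotations before training. A hedged Python sketch follows; the annotation dictionary layout and the spellings of the excluded labels are assumptions. Numbering the classes from 1 follows the TensorFlow Object Detection API label-map convention, in which ID 0 is reserved for the background.

```python
# The five retained classes, as listed above.
TRAINING_LABELS = (
    "overlap within enamel",
    "overlap within restoration (clear margin)",
    "overlap at DEJ",
    "overlap within dentin",
    "overlap within restoration (unclear margin)",
)
# Class IDs start at 1; ID 0 is left for the background class.
LABEL_TO_ID = {name: i + 1 for i, name in enumerate(TRAINING_LABELS)}

# Hypothetical spellings of labels that were scored but not used for training.
EXCLUDED_LABELS = {"out-of-field", "missing tooth/surface", "open contact"}


def filter_annotations(annotations):
    """Keep boxes of the five training classes and attach their class IDs.

    Each annotation is assumed to be a dict with at least a "label" key.
    Excluded labels are dropped; any other label is treated as an error.
    """
    kept = []
    for ann in annotations:
        label = ann["label"]
        if label in LABEL_TO_ID:
            kept.append({**ann, "class_id": LABEL_TO_ID[label]})
        elif label not in EXCLUDED_LABELS:
            raise ValueError(f"unknown label: {label!r}")
    return kept
```

Raising on unknown labels, rather than silently dropping them, is a design choice that catches annotation typos before they silently shrink a class.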

Images were converted from TIFF to JPG format and then manually labeled using Roboflow, a commercial tool developed by the company of the same name that facilitates collaborative image annotation and allows for easy splitting, augmentation, and export. The resulting dataset was split into three sets: a training set (583 images), a validation set (167 images), and a test set (84 images).
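Roboflow performed the split in the study. Purely as an illustration, a reproducible random split with the same subset sizes can be written in plain Python; the function name and the seed are hypothetical. With 834 images, these sizes correspond to roughly 70%, 20%, and 10%.

```python
import random


def split_dataset(image_ids, n_train=583, n_valid=167, n_test=84, seed=0):
    """Shuffle image IDs reproducibly and cut them into train/validation/test subsets."""
    if len(image_ids) != n_train + n_valid + n_test:
        raise ValueError("subset sizes must add up to the number of images")
    shuffled = list(image_ids)
    random.Random(seed).shuffle(shuffled)  # local RNG so the global random state is untouched
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```

Because a single patient contributed up to 33 radiographs, a variant that shuffles patients (for example, grouped by the seven-digit file number in each ID) rather than individual images would keep all of a patient’s images in the same subset; the paper does not state which approach was used.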

Because the data were collected randomly, labels for the more common errors are over-represented in the dataset, as shown in Table 1. Hence, the model would probably detect these labels better than the others. Conversely, “overlap within restoration (unclear margin)”, “overlap within dentin”, and “overlap at DEJ” are all under-represented, so the model is expected to be less efficient at detecting these classes.

To address the imbalance, undersampling was applied to over-represented labels by removing randomly selected images, and oversampling was applied to under-represented labels through image augmentation: the number of images containing these labels was increased using shift, flip, brightness, and zoom augmentation.
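In an object detection dataset, geometric augmentation must move the bounding boxes together with the pixels. The Python sketch below shows only the annotation side of horizontal flipping and shifting, plus a simple rebalancing rule that drops some images containing only the majority label and adds flipped copies of images containing minority labels. The image dictionary layout, the keep fraction, and the function names are assumptions; the pixel operations (and brightness and zoom) would be handled by an image library or by Roboflow.

```python
import random
from collections import Counter

# Assumed layout: {"width": W, "height": H, "boxes": [(label, xmin, ymin, xmax, ymax), ...]}


def class_counts(dataset):
    """Number of boxes per label across the dataset."""
    return Counter(label for image in dataset for label, *_ in image["boxes"])


def hflip(image):
    """Mirror boxes horizontally: x coordinates are reflected about the image width."""
    w = image["width"]
    boxes = [(label, w - xmax, ymin, w - xmin, ymax)
             for label, xmin, ymin, xmax, ymax in image["boxes"]]
    return {**image, "boxes": boxes}


def shift(image, dx, dy):
    """Translate boxes, clip them to the image, and drop boxes pushed fully outside."""
    w, h = image["width"], image["height"]
    boxes = []
    for label, xmin, ymin, xmax, ymax in image["boxes"]:
        x0, x1 = max(0, xmin + dx), min(w, xmax + dx)
        y0, y1 = max(0, ymin + dy), min(h, ymax + dy)
        if x1 > x0 and y1 > y0:
            boxes.append((label, x0, y0, x1, y1))
    return {**image, "boxes": boxes}


def rebalance(dataset, majority_label, keep_fraction, minority_labels, seed=0):
    """Undersample majority-only images and oversample minority images by flipping."""
    rng = random.Random(seed)
    minority = set(minority_labels)
    result = []
    for image in dataset:
        labels = {box[0] for box in image["boxes"]}
        # Undersampling: randomly drop images whose only label is the majority class.
        if labels == {majority_label} and rng.random() > keep_fraction:
            continue
        result.append(image)
        # Oversampling: add a flipped copy of any image containing a minority label.
        if labels & minority:
            result.append(hflip(image))
    return result
```

Comparing `class_counts` before and after `rebalance` shows whether the chosen keep fraction and augmentations actually narrowed the gap between classes.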

2.5. Model Training

The model was created using the TensorFlow Object Detection API (application programming interface), an open-source framework built on the TensorFlow software library, which was developed by the Google Brain team. It allows the use and customization of many saved networks, termed pre-trained models. Because these models were previously trained on a large general dataset, a pre-trained model can serve as a learned generic model, which reduces the amount of data needed compared with training a model from scratch [19].

Several pre-trained models were evaluated: ssd_mobilenet_v2, RetinaNet, and EfficientDet-D0, all pre-trained on the coco_2018 dataset.

Among the three object detection architectures, EfficientDet performed best, as it required the fewest training epochs to reach similar accuracy. Accordingly, EfficientDet-D0, the smallest and lightest version of EfficientDet, was chosen for this study (Figure 1) [20].
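The selection rule described above (similar accuracy, fewer epochs) can be stated precisely. This Python sketch is hypothetical: the paper does not report per-model accuracies or epoch counts, so the function takes them as inputs, and the tolerance that defines “similar” is an assumed parameter.

```python
def pick_model(results, tolerance=0.01):
    """Choose the model that trains fastest among those with near-best accuracy.

    results maps a model name to (accuracy, epochs_needed). A model counts as
    "similar" if its accuracy is within `tolerance` of the best one; among
    those, the model needing the fewest epochs is returned.
    """
    if not results:
        raise ValueError("no candidate models")
    best_accuracy = max(acc for acc, _ in results.values())
    eligible = {name: epochs for name, (acc, epochs) in results.items()
                if best_accuracy - acc <= tolerance}
    return min(eligible, key=eligible.get)
```

The tolerance makes the trade-off explicit: with a tolerance of 0, the rule reduces to picking the most accurate model regardless of training cost.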

In this study, a fine-tuning approach was used to customize the pre-trained model: the last layers of the frozen model base were unfrozen and trained together with the newly added classifier layers. This allows the model’s feature representations to be adapted so that they are more relevant to the specific task [19]. The model was trained to identify errors on the training set in Google Colab, a hosted notebook service from Google Research, using a GPU (graphics processing unit) runtime. The initial learning rate was 0.001 with exponential decay, and training ran for 7000 steps with a batch size of 16 [21].
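The learning-rate schedule and training length can be sanity-checked with a few lines of Python. The sketch uses the standard exponential-decay formula, lr = initial_lr × decay_rate^(step / decay_steps), which is the form implemented by `tf.keras.optimizers.schedules.ExponentialDecay`; the paper gives only the initial rate of 0.001, so `decay_steps` and `decay_rate` are assumed values. The helper `approx_epochs` converts 7000 steps at a batch size of 16 into passes over the training set, assuming the 583 training images before any augmentation.

```python
def exponential_decay(step, initial_lr=0.001, decay_steps=1000, decay_rate=0.9, staircase=False):
    """Learning rate at a given step: initial_lr * decay_rate ** (step / decay_steps).

    decay_steps and decay_rate are assumed values; with staircase=True the
    exponent is rounded down, so the rate drops in discrete jumps.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


def approx_epochs(steps, batch_size, n_train_images):
    """Number of passes over the training set implied by steps and batch size."""
    return steps * batch_size / n_train_images


if __name__ == "__main__":
    for step in (0, 3500, 7000):
        print(step, exponential_decay(step))
    print(round(approx_epochs(7000, 16, 583), 1))  # about 192 epochs
```

Under these assumptions, 7000 × 16 = 112,000 image presentations correspond to roughly 192 passes over 583 images; if augmentation enlarged the training set, the epoch count would be proportionally lower.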

2.6. Model Assessment

To evaluate inference performance, the model’s detections on each radiograph of the testing set, which the system had not seen during the training phase, were compared with the previously provided ground-truth labels.
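The paper does not state how a predicted box was matched to a ground-truth box. A common approach, assumed here, is intersection over union (IoU) with a threshold of 0.5, matching the highest-scoring predictions first. The following Python sketch counts TP, FP, and FN for one class on one image under those assumptions; the tuple layouts are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(predictions, ground_truth, label, iou_threshold=0.5):
    """Greedily match one class's predictions to ground truth; return (TP, FP, FN).

    predictions: list of (label, score, box); ground_truth: list of (label, box).
    Each ground-truth box can be matched at most once.
    """
    preds = sorted((p for p in predictions if p[0] == label),
                   key=lambda p: p[1], reverse=True)
    gts = [box for gt_label, box in ground_truth if gt_label == label]
    matched = [False] * len(gts)
    tp = fp = 0
    for _, _, box in preds:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                overlap = iou(box, gt)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_threshold:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1  # no unmatched ground-truth box overlaps enough
    fn = matched.count(False)  # ground-truth boxes the model missed
    return tp, fp, fn
```

Summing these per-image counts over the test set yields the per-class TP, FP, and FN totals from which recall, precision, and F1 score are computed.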

2.7. Statistical Analysis

Cohen’s kappa coefficient was calculated to assess intra-rater and inter-rater reliability. The performance of the machine learning model was evaluated using the following metrics:

Log loss: summarizes the errors made by the system; lower values indicate better performance.
Recall: the proportion of all actual positives that the model correctly predicted as positive.
Precision: the proportion of the model’s positive predictions that are correct.
F1 score: the harmonic mean of precision and recall, combining both into a single score; this is the measure used to evaluate the model’s performance.
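For binary predicted probabilities, log loss is usually computed as the average negative log-likelihood. This gives confident correct predictions values near 0 and penalizes confident mistakes heavily. The Python sketch below assumes this standard form. The paper does not state the exact formula behind its reported value, and summing instead of averaging would differ only by a factor equal to the number of examples.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy: -mean(y*log(p) + (1 - y)*log(1 - p))."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("need equally long, non-empty label and probability lists")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) is never evaluated
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

For example, `log_loss([1, 0], [0.9, 0.1])` is about 0.105, whereas the same labels with the probabilities swapped, `log_loss([1, 0], [0.1, 0.9])`, give about 2.303.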

3. Results

3.1. Data Analysis

Cohen’s kappa coefficients for intra-rater and inter-rater reliability were 0.95 and 0.96, respectively, indicating excellent agreement.
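Cohen’s kappa corrects the raw agreement between two sets of ratings for the agreement expected by chance: κ = (p_o − p_e)/(1 − p_e). A minimal Python sketch follows. For intra-rater reliability, the two lists would be the same examiner’s scores from two sessions; for inter-rater reliability, they would be the scores of the two examiners. Returning 1.0 when chance agreement is already perfect is an assumed convention, since κ is undefined in that case.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """kappa = (p_o - p_e) / (1 - p_e) for two ratings of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both ratings must cover the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rating's marginal frequencies, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # assumed convention: kappa is undefined here
    return (p_o - p_e) / (1 - p_e)
```

With contact scores from 0 to 11 as categories, identical ratings give κ = 1, while agreement no better than chance gives values near 0.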

A total of 834 randomly chosen BW radiographs of 100 patients who presented to King Abdulaziz University Dental Hospital were collected, with a minimum of one and a maximum of 33 BW radiographs per patient (mean ± SD: 15.08 ± 9.553). The final dataset consisted of 583 images in the training set, 167 in the validation set, and 84 in the test set. The numbers of labeled contact areas in the assessed dataset and in the training set are presented in Table 1.

The results revealed class imbalance: some labels, such as “overlap within enamel”, were over-represented, whereas others, such as “overlap at DEJ” and “overlap within dentin”, were under-represented (Table 1, Figure 2).

3.2. Machine Learning Performance Assessment

To obtain the evaluation metrics, a confusion matrix was constructed for each error label (X):

True positives (TP) of “X” are all X instances that are classified as X;
True negatives (TN) of “X” are all non-X instances that are not classified as X;
False positives (FP) of “X” are all non-X instances that are classified as X;
False negatives (FN) of “X” are all X instances that are not classified as X.

True negatives are not meaningful in object detection: every image region that correctly contains no box would count as one, so the TN count is effectively unbounded. They were therefore ignored. The following metrics were calculated from the confusion matrix (Table 2): recall (sensitivity) = TP/(TP + FN), precision = TP/(TP + FP), and F1 score = (2 × Precision × Recall)/(Precision + Recall), where TP, FP, and FN represent true-positive, false-positive, and false-negative results, respectively.
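The formulas above translate directly into code. In this Python sketch, returning 0.0 when a denominator is zero is an assumed convention. The usage lines recompute the F1 score of “overlap within enamel” from its reported precision and recall.

```python
def precision(tp, fp):
    """TP / (TP + FP); 0.0 if the model made no positive predictions."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """TP / (TP + FN); 0.0 if there were no actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Precision and recall reported for "overlap within enamel" (Table 2).
    print(round(f1_score(0.893, 0.899), 3))  # 0.896
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a class with high precision but low recall still receives a modest F1 score.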

The results showed that the model excelled at detecting two classes: “overlap within enamel” (precision: 0.893, recall: 0.899, F1 score: 0.896) and “overlap within restoration (clear margins)” (precision: 0.893, recall: 0.619, F1 score: 0.764) (Table 2). For the other classes, however, the evaluation metrics were unreliable because those classes were under-represented (Table 2). The overall error of the model, represented by the log loss, was 0.15.

3.3. Web App

As an additional step, a preliminary web app was built to check compatibility with the system used at King Abdulaziz University Dental Hospital. The app used HTML and CSS for the front end, the Flask framework for the back end, and Google App Engine for hosting. App Engine was chosen because it allows custom deployment environments, which such applications require (34.122.82.96; the link is password-protected). The app was then tested on a number of images and showed promising results in detecting the classes at contact areas, as shown in Figure 3. However, this was not part of the main objective of the study and will need further assessment on a larger sample.
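The back end described above can be sketched as a minimal Flask application. Everything here is illustrative: the `/predict` route, the `radiograph` form field, and `detect_contact_errors` are hypothetical names. The placeholder detector returns no boxes rather than running the trained EfficientDet-D0 model, and the App Engine deployment configuration is not shown.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def detect_contact_errors(image_bytes):
    """Hypothetical placeholder for the trained detector.

    A real deployment would decode the image and run the model here, returning
    dicts such as {"label": ..., "score": ..., "box": [...]}.
    """
    return []


@app.route("/predict", methods=["POST"])
def predict():
    # The uploaded radiograph is expected in a multipart form field named "radiograph".
    upload = request.files.get("radiograph")
    if upload is None or upload.filename == "":
        return jsonify(error="no radiograph uploaded"), 400
    detections = detect_contact_errors(upload.read())
    return jsonify(filename=upload.filename, detections=detections)


if __name__ == "__main__":
    app.run(debug=True)  # local development server only
```

Keeping the model call behind a single function allows the web layer to be tested, as shown here with Flask’s test client, before the trained model is integrated.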

4. Discussion

The aim of the current study was to assess the effectiveness of ML in detecting the diagnostic quality of BW radiographs, so that it can be used as a decision-support tool to help technicians check the quality of acquired radiographs faster and with less effort [6,16]. The diagnostic quality of a BW depends on many factors and features; this study focused on the contact areas as an important feature of BW image quality. The detection of radiographic errors at the contact areas depends on the discriminatory skills of examiners, which vary significantly and are heavily related to examiner experience. Therefore, applying ML to detect radiographic errors could help reduce this variation considerably.

In this project, the scoring of proximal contact areas was developed in stages, from a comprehensive list to a more compact, encompassing one. The objective was to identify prominent features in bitewing radiographs for successfully classifying diagnostic and non-diagnostic images and to yield a less complex model for the AI program. Initially, we identified eleven possible proximal contact area presentations based on tooth structure, amount of proximal contact overlap, and the acquired radiographic field. Then, to facilitate cross-tabulation and coding of proximal contact areas, these scores were grouped by diagnostic value into four basic groups. Finally, images were labeled based on the most prominent proximal contact area features to aid in identifying diagnostic value. These included proximal contact overlap within tooth structure (enamel, DEJ, and dentin) or within a restoration (with or without clear margins).

In this study, the model was built using Python, a popular interpreted programming language, and Python-friendly environments. The descriptive analysis of the dataset revealed class imbalance: some labels were over-represented while others were under-represented. The class “overlap within enamel” was over-represented and the class “overlap within restoration (clear margin)” was adequately represented, while the classes “overlap within restoration (unclear margin)”, “overlap within dentin”, and “overlap at DEJ” were all under-represented. Thus, the model is expected to be more efficient at detecting the well-represented classes than the under-represented ones.

Additionally, most of the assessed proximal contact areas, if they had errors, were still of diagnostic value (overlap within enamel or within restoration with clear margin). This could partly be because all assessed radiographs in this study were taken with the holder technique. Indeed, it was reported that bitewing radiographs that were acquired by the holder technique (Rinn XCP film positioning devices) had fewer horizontal errors than those that were taken by the loop technique (conventional method) [22].

As for the ML performance analysis, recall, precision, and F1 score are all highly affected by class imbalance and thus are not suitable evaluation metrics for the current model. Log loss was therefore considered the best metric for indicating the model’s accuracy. The loss is calculated on the training and validation sets and reflects how well the model performs on these two sets. Unlike other accuracy metrics, loss is not a percentage; it aggregates the errors made on each example in the training or validation set. The lower the loss, the better the model: a log loss near 0 indicates high accuracy, whereas values far from 0 indicate low accuracy. The model built in the current study had a training loss of 0.15, which indicates high accuracy.

The current findings have several implications for dentistry. A machine learning model that assesses radiograph quality could reduce misreading errors by both unskilled technicians and dentists. It could also help the radiology department of a busy dental practice read large numbers of radiographs more efficiently and in less time.

This study has several limitations. The resulting model is a “proof of concept”; it is by no means ready for production, and many adaptations, tests, and experiments still need to be conducted. The model has yet to be deployed and integrated with the web app. Moreover, to enhance the model’s accuracy, class imbalance needs to be managed by collecting label-specific data instead of random data. Despite these limitations, the model has shown that ML could be used to help identify errors at contacts in BW radiographs, with the ability to be continuously reinforced with new data to increase its precision. The results of the current study add to other findings in the literature about the promising efficacy of AI and ML applications in dentistry [5,7,12,23,24,25,26,27,28].

5. Conclusions

The model is a “proof of concept” for the effectiveness of ML in diagnosing the quality of the BW radiographs based on the contact areas. More dataset specification, optimization, and reinforced learning are needed to overcome the class imbalance.

Author Contributions

Conceptualization, G.A.A., R.A.A.-D., A.I.L., T.S.A. and M.A.B.; methodology, G.A.A., R.A.A.-D., A.I.L., T.S.A. and M.A.B.; software, G.A.A., R.A.A.-D., A.I.L., T.S.A., M.A.B., A.A.Q. and A.S.A.; validation, G.A.A., R.A.A.-D., A.I.L., T.S.A., M.A.B., A.A.Q. and A.S.A.; formal analysis, G.A.A., R.A.A.-D., A.I.L., T.S.A., M.A.B., A.A.Q. and A.S.A.; investigation, G.A.A., R.A.A.-D., A.I.L., T.S.A., M.A.B., A.A.Q. and A.S.A.; resources, R.A.A.-D., A.I.L., T.S.A., M.A.B., A.A.Q. and A.S.A.; data curation, G.A.A., R.A.A.-D., A.I.L., T.S.A., M.A.B., A.A.Q. and A.S.A.; writing—original draft preparation, R.A.A.-D., A.I.L., T.S.A., M.A.B., A.A.Q. and A.S.A.; writing—review and editing, G.A.A., R.A.A.-D., A.I.L., T.S.A., M.A.B., A.A.Q. and A.S.A.; visualization, G.A.A., R.A.A.-D., A.I.L., T.S.A. and M.A.B.; supervision, G.A.A., R.A.A.-D., A.I.L., T.S.A. and M.A.B.; project administration, G.A.A., R.A.A.-D. and A.I.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

All data were evaluated retrospectively, and ethical approval was taken from the Ethical Committee at King Abdulaziz University, Faculty of Dentistry, Jeddah, Saudi Arabia [Ethical no.: 172-12-20]. Formal consent is not required for this type of study.

Informed Consent Statement

All data were evaluated retrospectively; thus, an informed consent statement is not applicable.

Data Availability Statement

Data supporting reported results will be available on request.

Acknowledgments

We would like to thank Maryam Omer and Hind Tayeb from the Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia for their great support and contribution in running the machine learning model and running the tests.

Conflicts of Interest

The authors declare no conflict of interest.

References

Boeddinghaus, R.; Whyte, A. Trends in maxillofacial imaging. Clin. Radiol. 2018, 73, 4–18. [Google Scholar] [CrossRef] [PubMed]
Jaju, P.P.; Jaju, S.P. Cone-beam computed tomography: Time to move from ALARA to ALADA. Imaging Sci. Dent. 2015, 45, 263–265. [Google Scholar] [CrossRef] [PubMed]
Yeung, A.W.; Wong, N.S. Reject Rates of Radiographic Images in Dentomaxillofacial Radiology: A Literature Review. Int. J. Environ. Res. Public Health 2021, 18, 8076. [Google Scholar] [CrossRef]
White, S.C.; Pharoah, M.J. Intraoral projections. In Oral Radiology–Principles and Interpretation, 7th ed.; Elsevier Mosby: Philadelphia, PA, USA, 2014; pp. 91–130. [Google Scholar]
Khanagar, S.B.; Al-Ehaideb, A.; Maganur, P.C.; Vishwanathaiah, S.; Patil, S.; Baeshen, H.A.; Sarode, S.C.; Bhandi, S. Developments, application, and performance of artificial intelligence in dentistry–A systematic review. J. Dent. Sci. 2021, 16, 508–522. [Google Scholar] [CrossRef] [PubMed]
Rajaraman, V. JohnMcCarthy—Father of artificial intelligence. Resonance 2014, 19, 198–207. [Google Scholar] [CrossRef]
Yaji, A.; Prasad, S.; Pai, A. Artificial intelligence in dento-maxillofacial radiology. Acta Sci. Dent. Sci. 2019, 3, 116–121. [Google Scholar]
Heidari, A.; Jafari Navimipour, N.; Unal, M.; Toumaj, S. Machine learning applications for COVID-19 outbreak management. Neural Comput. Appl. 2022, 10, 15313–15348. [Google Scholar] [CrossRef]
Heidari, A.; Toumaj, S.; Navimipour, N.J.; Unal, M. A privacy-aware method for COVID-19 detection in chest CT images using lightweight deep conventional neural network and blockchain. Comput. Biol. Med. 2022, 145, 105461. [Google Scholar] [CrossRef]
Heidari, A.; Navimipour, N.J.; Unal, M.; Toumaj, S. The COVID-19 epidemic analysis and diagnosis using deep learning: A systematic literature review and future directions. Comput. Biol. Med. 2021, 14, 105141. [Google Scholar] [CrossRef]
Pauwels, R. A brief introduction to concepts and applications of artificial intelligence in dental imaging. Oral Radiol. 2021, 37, 153–160. [Google Scholar] [CrossRef]
Ekert, T.; Krois, J.; Meinhold, L.; Elhennawy, K.; Emara, R.; Golla, T.; Schwendicke, F. Deep learning for the radiographic detection of apical lesions. J. Endod. 2019, 45, 917–922. [Google Scholar] [CrossRef] [PubMed]
Mohammad-Rahimi, H.; Motamedian, S.R.; Rohban, M.H.; Krois, J.; Uribe, S.; Nia, E.M.; Rokhshad, R.; Nadimi, M.; Schwendicke, F. Deep learning for caries detection: A systematic review: DL for Caries Detection. J. Dent. 2022, 30, 104115. [Google Scholar] [CrossRef]
Karatas, O.; Cakir, N.N.; Ozsariyildiz, S.S.; Kis, H.C.; Demirbuga, S.; Gurgan, C.A. A deep learning approach to dental restoration classification from bitewing and periapical radiographs. Quintessence Int. 2021, 52, 568–574. [Google Scholar] [PubMed]
Revilla-León, M.; Gómez-Polo, M.; Barmak, A.B.; Inam, W.; Kan, J.Y.; Kois, J.C.; Akal, O. Artificial intelligence models for diagnosing gingivitis and periodontal disease: A systematic review. J. Prosthet. Dent. 2022. online ahead of print. [Google Scholar] [CrossRef]
Bayrakdar, I.S.; Orhan, K.; Akarsu, S.; Çelik, Ö.; Atasoy, S.; Pekince, A.; Yasa, Y.; Bilgir, E.; Sağlam, H.; Aslan, A.F.; et al. Deep-learning approach for caries detection and segmentation on dental bitewing radiographs. Oral Radiol. 2021, 22, 468–479. [Google Scholar] [CrossRef] [PubMed]
Mertens, S.; Krois, J.; Cantu, A.G.; Arsiwala, L.T.; Schwendicke, F. Artificial intelligence for caries detection: Randomized trial. J. Dent. 2021, 115, 103849. [Google Scholar] [CrossRef] [PubMed]
Devlin, H.; Williams, T.; Graham, J.; Ashley, M. The ADEPT study: A comparative study of dentists’ ability to detect enamel-only proximal caries in bitewing radiographs with and without the use of AssistDent artificial intelligence software. Br. Dent. J. 2021, 231, 481–485. [Google Scholar] [CrossRef]
TensorFlow. Transfer Learning and Fine-Tuning|TensorFlow Core. 2021. Available online: https://www.tensorflow.org/ (accessed on 1 November 2021).
Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
Google Colab Notebook. Available online: https://colab.research.google.com (accessed on 1 November 2021).
Potter, B.J.; Shrout, M.K.; Harrell, J.C. Reproducibility of beam alignment using different bite-wing radiographic techniques. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. Endod. 1995, 79, 532–535.
Saghiri, M.A.; Garcia-Godoy, F.; Gutmann, J.L.; Lotfi, M.; Asgar, K. The reliability of artificial neural network in locating minor apical foramen: A cadaver study. J. Endod. 2012, 38, 1130–1134.
Devito, K.L.; de Souza Barbosa, F.; Felippe Filho, W.N. An artificial multilayer perceptron neural network for diagnosis of proximal dental caries. Oral Surg. Oral Med. Oral Pathol. Oral Radiol. 2008, 106, 879–884.
Johari, M.; Esmaeili, F.; Andalib, A.; Garjani, S.; Saberkari, H. Detection of vertical root fractures in intact and endodontically treated premolar teeth by designing a probabilistic neural network: An ex vivo study. Dentomaxillofac. Radiol. 2017, 46, 20160107.
Lee, J.H.; Kim, D.H.; Jeong, S.N.; Choi, S.H. Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm. J. Periodontal Implant Sci. 2018, 48, 114–123.
Aubreville, M.; Knipfer, C.; Oetter, N.; Jaremenko, C.; Rodner, E.; Denzler, J.; Bohr, C.; Neumann, H.; Stelzle, F.; Maier, A. Automatic classification of cancerous tissue in laser endomicroscopy images of the oral cavity using deep learning. Sci. Rep. 2017, 7, 11979.
Yasa, Y.; Çelik, Ö.; Bayrakdar, I.S.; Pekince, A.; Orhan, K.; Akarsu, S.; Atasoy, S.; Bilgir, E.; Odabaş, A.; Aslan, A.F. An artificial intelligence proposal to automatic teeth detection and numbering in dental bite-wing radiographs. Acta Odontol. Scand. 2021, 79, 275–281.

Figure 1. EfficientDet architecture, consisting of an EfficientNet backbone, a BiFPN feature network (neck), and an output head.
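Figure 1 names BiFPN as the feature network. In the EfficientDet paper (Tan et al.), each BiFPN node merges its input feature maps by "fast normalized fusion": O = Σ_i (w_i / (ε + Σ_j w_j)) · I_i, where each learnable weight w_i is kept non-negative by a ReLU and ε is a small constant for numerical stability. The Python sketch below illustrates only that fusion rule, assuming the inputs have already been resized to a common shape; the function name `fast_normalized_fusion` and the use of flat lists in place of tensors are hypothetical choices for illustration, not the authors' implementation.

```python
# Hypothetical sketch of BiFPN "fast normalized fusion" as described for EfficientDet.
# Feature maps are flattened to plain lists of floats for illustration only;
# in a real network they are tensors already resized to a common resolution.


def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """Fuse same-shaped feature maps as sum_i (w_i / (eps + sum_j w_j)) * I_i.

    In the real model `weights` are learnable scalars; a ReLU keeps them >= 0.
    """
    if len(inputs) != len(weights):
        raise ValueError("need one weight per input feature map")
    size = len(inputs[0])
    if any(len(feat) != size for feat in inputs):
        raise ValueError("input feature maps must have the same size")
    w = [max(0.0, wi) for wi in weights]  # ReLU on each fusion weight
    denom = eps + sum(w)                  # shared normalizer
    fused = [0.0] * size
    for wi, feat in zip(w, inputs):
        scale = wi / denom                # normalized weight for this input
        for k, value in enumerate(feat):
            fused[k] += scale * value
    return fused


# Example: fuse a same-level feature with an upsampled coarser feature.
p4_in = [1.0, 2.0, 3.0]
p5_up = [3.0, 2.0, 1.0]
print(fast_normalized_fusion([p4_in, p5_up], [1.0, 1.0]))
```

With equal weights the fused map is very nearly the element-wise mean of the inputs (ε shrinks it slightly), and a negative raw weight is clipped to zero so that input is ignored.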

Figure 2. Representative samples of the assessed contact-area overlap classes. Cyan box: within enamel; white box: at the dentinoenamel junction (DEJ); off-white box: within dentin; green box: within restoration (clear margins); blue box: within restoration (unclear margins).

Figure 3. The model’s interface. An unseen radiograph was uploaded through the interface, and the model detected three errors of different classes: within enamel, within dentin, and within restoration (clear margins).

Table 1. Contact areas label distribution.

Class	Contact Area Count (%) in the Whole Dataset	Contact Area Count (%) in the Training Set
Overlap within enamel	1124 (57.6)	555 (52.8)
Overlap within restoration (clear margins)	352 (18.0)	226 (21.5)
Overlap at DEJ	162 (8.3)	83 (7.9)
Overlap within dentin	109 (5.6)	64 (6.1)
Overlap within restoration (unclear margins)	204 (10.5)	123 (11.7)
Total	1951 (100)	1051 (100)
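Each percentage in Table 1 is a class count divided by its column total (1951 contact areas in the whole dataset, 1051 in the training set). As a quick consistency check, the Python sketch below recomputes both percentage columns from the tabulated counts; it assumes nothing beyond the numbers shown in the table, and the helper name `percentages` is hypothetical.

```python
# Recompute the percentage columns of Table 1 from the raw counts.
# Counts are copied from the table; each percentage is count / total * 100.

whole = {
    "Overlap within enamel": 1124,
    "Overlap within restoration (clear margins)": 352,
    "Overlap at DEJ": 162,
    "Overlap within dentin": 109,
    "Overlap within restoration (unclear margins)": 204,
}
training = {
    "Overlap within enamel": 555,
    "Overlap within restoration (clear margins)": 226,
    "Overlap at DEJ": 83,
    "Overlap within dentin": 64,
    "Overlap within restoration (unclear margins)": 123,
}


def percentages(counts):
    """Return the column total and each class's share of it, rounded to one decimal."""
    total = sum(counts.values())
    return total, {name: round(100 * n / total, 1) for name, n in counts.items()}


for label, counts in (("whole dataset", whole), ("training set", training)):
    total, pct = percentages(counts)
    print(f"{label}: total = {total}")
    for name, share in pct.items():
        print(f"  {name}: {counts[name]} ({share}%)")
```

Running it reproduces the tabulated percentage for every class and confirms that the counts sum to the stated totals.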

Table 2. Precision, recall, and F1 score per class.

Metric	Overlap within Enamel	Overlap within Restoration (Clear Margin)	Overlap at DEJ	Overlap within Dentin	Overlap within Restoration (Unclear Margin)
TP	133	13	7	3	2
FP	16	0	5	2	0
FN	15	8	5	11	4
Precision	0.893	1 (no FP)	0.583	0.600	1 (no FP)
Recall	0.899	0.619	0.583	0.214	0.333
F1 score	0.896	0.764	0.583	0.316	0.499
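Table 2 reports per-class precision = TP/(TP + FP), recall = TP/(TP + FN), and F1, the harmonic mean of the two, which simplifies to 2TP/(2TP + FP + FN). The Python sketch below recomputes these metrics from the tabulated counts; the helper name `prf1` and the choice to return 0.0 when a ratio is undefined are assumptions for illustration. For the two restoration classes FP = 0, so precision is exactly 1, which is what the table's "no FP" entries express.

```python
# Recompute Table 2 metrics from the per-class TP/FP/FN counts:
#   precision = TP / (TP + FP)
#   recall    = TP / (TP + FN)
#   F1        = 2 * precision * recall / (precision + recall) = 2TP / (2TP + FP + FN)

counts = {  # class: (TP, FP, FN), copied from Table 2
    "Overlap within enamel": (133, 16, 15),
    "Overlap within restoration (clear margin)": (13, 0, 8),
    "Overlap at DEJ": (7, 5, 5),
    "Overlap within dentin": (3, 2, 11),
    "Overlap within restoration (unclear margin)": (2, 0, 4),
}


def prf1(tp, fp, fn):
    """Return (precision, recall, F1); a metric with a zero denominator is reported as 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


for name, (tp, fp, fn) in counts.items():
    p, r, f = prf1(tp, fp, fn)
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

The recomputed values match Table 2 to three decimals except for the F1 scores of the two restoration classes, which come out as 0.765 (clear margin) and 0.500 (unclear margin) rather than 0.764 and 0.499; the tabulated values are consistent with truncating an F1 computed from the already-rounded recall.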

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Barayan, M.A.; Qawas, A.A.; Alghamdi, A.S.; Alkhallagi, T.S.; Al-Dabbagh, R.A.; Aldabbagh, G.A.; Linjawi, A.I. Effectiveness of Machine Learning in Assessing the Diagnostic Quality of Bitewing Radiographs. Appl. Sci. 2022, 12, 9588. https://doi.org/10.3390/app12199588


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
