Multi-Task Deep Learning Model for Classification of Dental Implant Brand and Treatment Stage Using Dental Panoramic Radiograph Images

It is necessary to accurately identify dental implant brands and the stage of treatment to ensure efficient care. Thus, the purpose of this study was to use multi-task deep learning to investigate a classifier that categorizes implant brands and treatment stages from dental panoramic radiographic images. For objective labeling, 9767 dental implant images of 12 implant brands and treatment stages were obtained from the digital panoramic radiographs of patients who underwent procedures at Kagawa Prefectural Central Hospital, Japan, between 2005 and 2020. Five deep convolutional neural network (CNN) models (ResNet18, 34, 50, 101 and 152) were evaluated. The accuracy, precision, recall, specificity, F1 score, and area under the curve score were calculated for each CNN. We also compared the multi-task and single-task accuracies of brand classification and implant treatment stage classification. Our analysis revealed that the larger the number of parameters and the deeper the network, the better the performance for both classifications. Multi-tasking significantly improved brand classification on all performance indicators, except recall, and significantly improved all metrics in treatment phase classification. Using CNNs conferred high validity in the classification of dental implant brands and treatment stages. Furthermore, multi-task learning facilitated analysis accuracy.


Introduction
Dental implants have been used for more than half a century and currently are a highly reliable treatment option for long-term replacement (10+ years) of missing teeth [1,2]. As a long-term prognosis-established treatment, incidences of mechanical complications, such as fractures on fixtures or abutment screws, and biological complications, such as peri-implantitis, will inevitably occur [3][4][5][6]. An accurate understanding of the efficacy of different implant brands and the identification of treatment status are important for continued implant maintenance management and for dealing with any complications that arise. Unfortunately, it has been reported that about 3% of implants must be removed without continuous prosthetic treatment or repair, solely due to the inability to identify the type of implant used [7]. Patients may not be able to visit the same dentist for several reasons, including poor general condition, treatment transfer, emigration, or closure of their dentist's office. Consequently, the necessary information regarding the implants is unavailable. Therefore, it is very important to be able to independently and accurately identify the type of implant used in a patient.
Each brand of dental implant has a relatively distinctive morphology that can be used for identification. Brand identification, along with accurate determination of the treatment stage, is crucial for efficient treatment and can be achieved through the use of medical images. Dental panoramic radiography is widely used in dental clinics and in dental or oral surgery. Because all of the teeth and the jaw can be imaged at the same time, this method captures a large amount of information in a single image, such as the condition of the teeth and jawbone, prosthesis details, and the implant shape [8].
By the early 2000s, it was estimated that over 2000 different types of dental implants were available in the market [9]. The development and verification of a wide variety of fixture structures and implants is ongoing. Unfortunately, it is difficult for dentists to accurately identify certain implants if they have no experience with them.
Deep learning is a machine learning technique in which computers learn human-like tasks using neural networks, which loosely mimic neural circuitry in the human brain. The approach most frequently used in this type of research is deep learning with convolutional neural networks (CNNs). Deep learning with CNNs is very useful for classification and diagnosis from medical images [10]. In recent years, deep learning models using panoramic radiographs have been reported for tooth detection and numbering [11], osteoporosis prescreening [12], cystic lesion detection [13,14], atherosclerotic carotid plaque detection [15], and maxillary sinusitis diagnosis [16]. Among the many machine learning methods, multi-task learning is based on the theory that learning interrelated concepts yields representations that generalize more widely and can ultimately improve performance compared to single-task learning [17].
In this study, we propose a novel approach for interpreting medical images while improving the generalization capabilities of multiple tasks. The purpose of this study was to build and evaluate a method that classifies implant brands and treatment stages from dental panoramic radiographic images using a multi-task deep learning approach.

Study Design
The purpose of this study was to classify implant brands and implant treatment stages from datasets segmented from dental panoramic radiographs using several Residual Neural Network (ResNet) CNNs (available online: https://github.com/qubvel/classification_models, accessed on 19 April 2021). Supervised learning was used as the deep learning method. We also compared the multi-task and single-task accuracy of brand classification and implant treatment stage classification.

Ethics Statement
We retrospectively used radiographic data from January 2005 to December 2020. This study protocol was approved by the institutional review committee of Kagawa Prefectural Central Hospital (approval number 1019, approved on 8th March 2021). The institutional review committee waived the need for individual informed consent. Therefore, written/verbal informed consent was not obtained from any participant because this study featured a noninterventional retrospective design, and all data were analyzed anonymously.

Data Acquisition and Preprocessing
Dental panoramic radiographs of each patient were acquired using AZ3000CMR and Hyper-G CMF (Asahiroentgen Ind. Co., Ltd., Kyoto, Japan). All images were output in Tagged Image File Format (TIFF) (2964 × 1464, 2694 × 1450, 2776 × 1450 or 2804 × 1450 pixels) from the Kagawa Prefectural Central Hospital Picture Archiving and Communication System (PACS) (Hope Dr Able-GX, Fujitsu Co., Tokyo, Japan). The radiographic images were classified and labeled based on electronic medical records and the dental implant usage ledger of our department. From a collection of 7079 selected digital panoramic dental radiographs, a dataset of 9767 manually cropped image segments, each focused on a dental implant, was assembled.
Each dental implant image was manually cropped from the corresponding dental panoramic radiograph. These dental implant data included implant fixtures, healing abutments, provisional settings, and final prostheses. For preparation before analysis, we used Photoshop Elements (Adobe Systems, Inc., San Jose, CA, USA) to crop the images so that each dental implant fixture fit entirely within the image (see Figures 1 and 2). The cropped images were saved in portable network graphics (PNG) format. The oral and maxillofacial surgeons who performed the cropping were blinded to the actual implant brand of each patient.
The data were digitally preprocessed. Each image captured in PNG format was resized to 128 × 128 pixels. The preprocessing did not change the orientation of the image.

Classification of Dental Implant Brand
The 12 systems mainly used at Kagawa Prefectural Central Hospital were selected as the dental implants targeted in this study. The types of dental implant systems and corresponding number of images are shown in Table 1. Among them, images containing the following 12 types of dental implant systems were selected for this work:

Classification of Dental Implant Treatment Stages
The treatment stages were classified into three categories: the implant fixture after the primary surgery; the implant fixture with the healing abutment attached, either after the secondary surgery or after one-stage implant placement; and the prosthetic set, including the final prosthesis and the provisional restoration. We included this task because understanding the stage of treatment is clinically important. Different implant brands use different drivers for cover screws and healing abutments, and the equipment must be prepared according to the stage of treatment. Knowing the treatment stage and the brand at the same time is therefore essential for smooth treatment. All classifications were based on clinical chart records.

CNN Model Architecture
In this study, the evaluation was performed using the standard CNN model ResNet, which was introduced by He et al. [18]. It is generally accepted that the accuracy of image discrimination improves as the network becomes deeper; however, if the network is too deep, the accuracy decreases. ResNet addresses this with a learning method called residual learning, which, in combination with batch normalization, alleviates the vanishing gradient problem and makes model degradation less likely [19]. Thus, ResNet can be extended to very deep networks of over 100 layers. The representative ResNet architectures with 18, 34, 50, 101 and 152 layers were selected as the CNN models in this study.
Fine-tuning makes model construction more efficient by using the weights of an existing model as initial values for additional learning; therefore, all CNNs were trained by transfer learning with fine-tuning, starting from weights pre-trained on the ImageNet database [20]. The deep learning classification was implemented in Python (version 3.7.10) with Keras (version 2.4.3; available online: https://github.com/keras-team/keras, accessed on 19 April 2021).

Model Training
Model training used k-fold cross-validation with k = 4 to avoid overfitting and bias and to minimize generalization error [21]. The data were divided into four folds, and the test data consisted of 1950 images. Within each fold, the dataset was partitioned into independent training and validation sets using an 80-20 percentage split. The validation set was completely independent of the training data and was used to monitor the training status during training. After completing one round of model training, we repeated the procedure with different test data, for four rounds in total.
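The splitting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `k_fold_indices`, the random seed, the toy sample count of 20, and the choice to carve the validation set out of the non-test data are all hypothetical.

```python
import random

def k_fold_indices(n_samples, k=4, val_fraction=0.2, seed=0):
    """Yield (train, val, test) index lists for each of k folds.

    Assumption: the validation set is taken from the data that is
    not held out as the test fold; the paper does not spell this out.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into k nearly equal, disjoint folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        remaining = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        n_val = int(len(remaining) * val_fraction)
        val, train = remaining[:n_val], remaining[n_val:]
        yield train, val, test

# Toy example with 20 samples (hypothetical size, for illustration only).
splits = list(k_fold_indices(20, k=4))
for train, val, test in splits:
    print(len(train), len(val), len(test))
```

Each sample appears in exactly one test fold across the four rounds, so every image contributes once to the reported test metrics.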

Deep Learning Procedure
All models were trained and evaluated on a 64-bit Ubuntu 16.04.5 LTS operating system with 8 GB memory and an NVIDIA GeForce GTX 1080 8 GB graphics processing unit. The optimizer, weight decay, and momentum were common to all the CNNs. The optimizer was stochastic gradient descent, and the weight decay and momentum were 0 and 0.9, respectively. A learning rate of 0.001 was used for every ResNet model. All models were trained for a maximum of 50 epochs with a minibatch size of 32. To prevent overfitting, we used early stopping, which terminated training if the validation error did not improve for 10 consecutive epochs.
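The stopping rule described above (at most 50 epochs, stop after 10 consecutive epochs without improvement in validation error) can be sketched in plain Python. The function name and the list of per-epoch validation losses are hypothetical stand-ins for a real training loop; the paper's Keras implementation is not shown in the text.

```python
def train_with_early_stopping(val_losses, max_epochs=50, patience=10):
    """Return the 1-based epoch at which training stops.

    val_losses is a hypothetical per-epoch validation loss sequence
    standing in for the output of a real training loop.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            # Validation error improved: record it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return min(len(val_losses), max_epochs)

# Loss improves for 5 epochs, then plateaus.
plateau = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.6] * 30
stopped_at = train_with_early_stopping(plateau)
```

In this toy run the best loss is reached at epoch 5, so the tenth consecutive non-improving epoch is epoch 15, where training stops.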

Multi-Task
As a novel approach for the implant brand and treatment stage classifier, a deep neural network with two independent outputs was implemented and evaluated. The proposed multi-task CNN can analyze the implant brand and the treatment stage simultaneously. The model reduces the number of trainable parameters that would otherwise be required when using two independent CNN models for implant brand and treatment stage classification. The proposed model has shared feature learning layers, including convolutional and pooling layers, followed by two separate branches with independent, fully connected layers used for classification. The multi-task CNNs were built on ResNet18, 34, 50, 101 and 152. The convolutional and pooling layers of each ResNet architecture, excluding the fully connected (FC) layer, were used for feature learning. For classification, two individual branches composed of dense layers were each connected to a single output layer with softmax activation, one for the implant brand and one for the treatment stage.
The scheme proposed for the implant assessment using the multi-task CNN models is shown in Figure 2. Table 2 shows the number of parameters for each multi-task and single-task ResNet model. The multi-task model learns the classification of implant brands and the classification of treatment stages jointly. For both tasks, the cross entropy in Equation (1) was used as the error function. The error function for the entire proposed multi-task model (L_total) was the sum of the error for the prediction of the implant brand (L_ib) and the error for the prediction of the treatment stage (L_ts), as in Equation (2).
$$L = -\sum_{i} t_i \log y_i \qquad (1)$$
(t_i: correct data, y_i: predicted probability of class i)
$$L_{\mathrm{total}} = L_{\mathrm{ib}} + L_{\mathrm{ts}} \qquad (2)$$
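As a worked illustration of the error function described above (cross entropy per task, summed over the two tasks), the following sketch computes the total loss for one sample. The function names, the small `eps` guard against log(0), and the toy three-class probability vectors are assumptions made for the example; the study itself used 12 brand classes and three stage classes.

```python
import math

def cross_entropy(t, y, eps=1e-12):
    """Cross entropy between a one-hot target t and predicted probabilities y."""
    # eps is an illustrative guard against log(0), not part of the paper.
    return -sum(ti * math.log(max(yi, eps)) for ti, yi in zip(t, y))

def multitask_loss(t_brand, y_brand, t_stage, y_stage):
    """Total loss: brand cross entropy plus treatment stage cross entropy."""
    return cross_entropy(t_brand, y_brand) + cross_entropy(t_stage, y_stage)

# Toy sample: true brand is class 1, true stage is class 0.
t_brand, y_brand = [0, 1, 0], [0.2, 0.7, 0.1]
t_stage, y_stage = [1, 0, 0], [0.5, 0.3, 0.2]

loss_ib = cross_entropy(t_brand, y_brand)   # equals -log(0.7)
loss_ts = cross_entropy(t_stage, y_stage)   # equals -log(0.5)
loss_total = multitask_loss(t_brand, y_brand, t_stage, y_stage)
```

Because the two terms are added with equal weight, a gradient step on the total loss updates the shared layers with signal from both tasks at once.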

Performance Metrics
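Our performance metrics were accuracy, precision, recall, specificity, and F1 score, defined in Equations (3)-(7), respectively, which account for the relations between the positive labels of the data and those given by the classifier. We also calculated the receiver operating characteristic (ROC) curve and measured the area under the curve (AUC), which relates to a classifier's ability to avoid false classification. In these equations, TP, FN, TN, and FP denote true positives, false negatives, true negatives, and false positives, respectively.
$$\mathrm{accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (3)$$
$$\mathrm{precision} = \frac{TP}{TP + FP} \qquad (4)$$
$$\mathrm{recall} = \frac{TP}{TP + FN} \qquad (5)$$
$$\mathrm{specificity} = \frac{TN}{TN + FP} \qquad (6)$$
$$\mathrm{F1\ score} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (7)$$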
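The metrics named above can be computed directly from confusion-matrix counts, as in the following sketch. The function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study. For a multi-class task such as brand classification, these quantities are typically computed per class and then averaged, which is an assumption here since the averaging scheme is not stated.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts for one class treated as "positive".
metrics = binary_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```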

Statistical Analysis
The differences between performance metrics were tested using chi-square analysis, using the JMP statistical software package version 14.2.0 for Macintosh (SAS Institute Inc., Cary, NC, USA). The significance level was set to p < 0.05. Non-parametric tests were performed based on the results of the Shapiro-Wilk test. The difference between the multi-task and the single-task model was calculated for each performance metric using the Wilcoxon test. Effect sizes were calculated for the non-parametric tests and classified as follows: 0.5 is a large effect, 0.3 is a medium effect, and 0.1 is a small effect [22].
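The effect size thresholds above can be expressed as a small lookup. This sketch assumes the thresholds are lower bounds of each band (0.5 and above is large, and so on) and introduces a hypothetical "negligible" label for values below 0.1; neither detail is stated in the text. The two example values are the accuracy effect sizes reported in the Results.

```python
def classify_effect_size(r):
    """Map an effect size to the bands large / medium / small.

    Assumption: each threshold is the lower bound of its band, and
    values below 0.1 get the illustrative label "negligible".
    """
    r = abs(r)
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "negligible"

brand_label = classify_effect_size(0.4484)   # implant brand accuracy
stage_label = classify_effect_size(0.8183)   # treatment stage accuracy
print(brand_label, stage_label)
```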

Visualization of Computer-Assisted Diagnostic System
CNN model visualization helps clarify the most relevant features used for classification. To identify potential correct classifications based on incorrect features, and to gain some intuition into the classification process, we identified the image pixels most relevant for classification using gradient-weighted class activation maps (Grad-CAM) [23]. Map visualizations are heatmaps of the gradients with the "hotter" colors representing the regions of more importance for classification. The heat map using Grad-CAM was reconstructed with the final convolutional layer in this study.
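For reference, the Grad-CAM heatmap can be written compactly as follows. The notation is that of the Grad-CAM method [23] rather than of this paper: y^c is the score for class c before the softmax, A^k is the k-th feature map of the final convolutional layer, and Z is the number of spatial positions in that map.

```latex
% Importance weight of feature map k for class c:
% the gradient of the class score, averaged over all spatial positions (i, j).
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k}

% Heatmap: weighted sum of the feature maps, keeping only positive evidence.
L_{\mathrm{Grad\text{-}CAM}}^c = \mathrm{ReLU}\!\left( \sum_{k} \alpha_k^c A^k \right)
```

The ReLU keeps only regions that increase the score of the class of interest, which is why the "hotter" areas can be read as evidence for the predicted class.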

Implant Brand Classification Performance
A comparison across the ResNet models showed that the larger the number of parameters and the deeper the network, the better the accuracy. Comparing multi-task and single-task models, multi-tasking was superior in all CNNs. The highest accuracy, obtained with ResNet152, was 0.9908 (Table 3). The ROC curves of the multi-task models are shown in Figure S1. The change ratio was defined as (multi-task performance metric)/(single-task performance metric) × 100.

Implant Treatment Stage Classification Performance
The treatment stage classification gave results similar to those of the implant brand classification. A comparison across the ResNet models showed that the larger the number of parameters and the deeper the network, the better the performance metrics. Comparing the multi-task and single-task models, multi-tasking was superior in all CNNs. The highest accuracy, obtained with ResNet152, was 0.9972 (Table 4). The ROC curves of the multi-task models are shown in Figure S1. The change ratio was defined as (multi-task performance metric)/(single-task performance metric) × 100.

Comparison of the Multi-Task and Single-Task Models in Classification Performance
We compared the multi-task and single-task models for each performance metric in ResNet50. Table 5 shows the results of 30 repetitions of 4-fold cross-validation. For implant brands, the classification ability was significantly improved in all performance metrics except recall. For treatment stages, the classification ability was significantly improved in all performance metrics. The effect size for accuracy was 0.4484 in the implant brand classification, which can be classified as a medium effect, and 0.8183 in the implant treatment stage classification, which can be classified as a large effect.
Figure 3 shows images of the 12 different dental implants and the treatment stages of dental implants, classified using each CNN model and visualized by Grad-CAM. The single-task and multi-task models both highlighted an identification area that could be used to identify similar images. The implant brand classification focused mainly on fixtures as the feature area, whereas the classification of implant treatment stages focused on healing abutments and superstructures.

Discussion
This study achieved very high performance in the classification of dental implant brands and treatment stages using CNNs. Furthermore, multi-task learning of implant brand and treatment stage classification enabled more accurate analysis with a very small number of parameters.
There have been several reports on dental implant brand classification using deep learning [24][25][26][27][28]. All of these studies were single-task, with a classification performance of 0.935-0.98 for accuracy, 0.907-0.98 for recall, and 0.918-0.971 for AUC. In our study, the single-task implant brand classification results were 0.9787-0.9851 for accuracy, 0.9726-0.9809 for recall, and 0.9996-0.9998 for AUC, similar to previous reports. Our multi-task analysis performed better still, with an accuracy of 0.9803-0.9908, a recall of 0.9727-0.9886, and an AUC of 0.9997-0.9999, which are extremely high compared to previous reports. Accurate image classification by CNNs can be more difficult with dental panoramic radiographs than with intraoral radiographs [24]. Furthermore, whereas several previous studies classified 3, 4 or 6 types of implants [24,25,27,28], our study classified 12 implant brands. The high classification accuracy under these conditions is very meaningful.
In our previous research on dental implant classification [26], VGG was used as a CNN. In this study, we used ResNet for the implant classification and treatment stages, and the classification accuracy improved. We consider that the large amount of data used in this study and the fact that ResNet was a useful CNN for implant classification contributed to this improvement of classification performance. Interestingly, this is also the first study to classify the treatment stages of dental implants using deep learning. In this study, we classified three stages: fixture, fixture with abutment, and prosthesis, and showed much higher classification accuracy than the implant brand. Understanding the treatment stage from dental panoramic radiographs is clinically very useful for facilitating patient implant maintenance and information sharing with the dental staff.
This study was able to simultaneously classify implant brands and treatment stages. When multi-task learning correlates several tasks, the synergistic effect of learning those tasks can improve the performance of each task [29]. Through multi-task transfer learning, CNNs can learn intermediate representations shared between tasks, resulting in more functional representations and better performance [17]. In fact, the difference in performance metrics between the two methods of single-task and multi-task performance was statistically significant in all but recall of the brand classification.
Another advantage is that the learning time and the total number of parameters can be reduced by solving multiple tasks with one learner [29]. In this study, the combined number of parameters for the two single-task models was about twice the number of parameters for the multi-task model. For example, for ResNet50, the number of parameters was 25,659,608 for the multi-task model, 25,656,533 for the brand classification model, and 25,647,308 for the treatment stage classification model, giving a combined total of 51,303,841 for the two single-task models. Despite the smaller number of parameters, the classification performance improved, so the multi-task model was of considerable benefit.
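The parameter comparison above can be checked with a few lines of arithmetic. The counts are the ResNet50 figures quoted in the text; the variable names are illustrative.

```python
# ResNet50 parameter counts quoted in the text.
multi_task = 25_659_608
single_brand = 25_656_533
single_stage = 25_647_308

# Two separate single-task models must both be stored and run.
combined_single = single_brand + single_stage
ratio = combined_single / multi_task
saved = combined_single - multi_task

print(f"combined single-task parameters: {combined_single:,}")
print(f"ratio to multi-task model: {ratio:.4f}")
print(f"parameters saved by sharing layers: {saved:,}")
```

The multi-task model is only slightly larger than either single-task model because the two tasks share the entire convolutional backbone and differ only in their small classification branches.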
In deep learning, sufficient data must be prepared to learn general-purpose parameters; when sufficient data cannot be prepared, the training data may be artificially expanded to improve recognition accuracy [30], which is called data augmentation. In multi-task learning, on the other hand, multi-class classification is performed using a common intermediate layer, so it is possible to learn from a larger data set than when targeting a single class. If there are features common to multiple tasks, multi-task learning can be regarded as performing data expansion because learning combines the data sets of each task [31]. In other words, an implicit data augmentation effect can be expected, and classification with high accuracy was possible. We consider that common features could be learned because the features used to judge both the implant brand and the treatment stage lie in the joint between the implant fixture and the abutment.
In this study, we were able to measure the effect size for multi-task learning. An effect size has been defined as "a quantitative reflection of a magnitude of some phenomenon that is used for the purpose of addressing a question of interest" [32]. Effect size is an index that expresses the effect of experimental manipulation and the strength of association between variables. In this study, the effect size for accuracy was 0.4484 in the implant brand classification and 0.8183 in the implant treatment stage classification; these can be classified as moderate and large effects, respectively. These results suggest that multi-task learning contributed more to the treatment stage classification, while also having a moderate impact on the implant brand classification. Our study is the first to report effect sizes in implant classification using deep learning, and we expect it to serve as a reference for future research. The effect sizes calculated from this experiment are useful in determining sample sizes for other studies, as there are few reports of effect sizes calculated from comparisons between deep learning models.
In this study, the accuracy of ResNet18, which has the smallest number of parameters, was 0.980 in the brand classification and 0.9497 in the treatment stage classification. ResNet152, which has the largest number of parameters, showed extremely high accuracy, but its practicality is a problem. To run all processing on the device through edge computing, the size of the deep learning model must be weighed against the available computational resources and the task. In the future, it will be necessary to build a more accurate network with a small number of parameters. It would then be desirable to export it to software available to clinicians so that dental implants can be recognized almost instantly.
This study has two limitations. First, there are many dental implant brands, and this study covers only some of them, although the number of brands classified here is the largest among currently published papers. Efficient classification of the major dental implant brands should be the basis for classifying the various types of implants used around the world, including the rare implants that will need to be identified in the future. We therefore believe that our research is useful, but we expect future research to increase the number of implant brands covered. Second, although this study showed very high accuracy with various CNNs, some classification errors occurred even with the most accurate CNNs, and this study did not analyze those errors. To further improve accuracy, we would like to conduct further research with other machine learning methods, such as reinforcement learning, which can improve accuracy by using error information.

Conclusions
We have demonstrated very high performance in the classification of dental implant brands and treatment stages using CNNs in our study. Furthermore, multi-task learning of implant brand and treatment stage classification enabled more accurate analysis with a very small number of parameters. These results may play an important role in the rapid identification of dental implants in the clinical setting of dentistry.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/biom11060815/s1, Figure S1: Mean ROC curves of each CNN model for the classification of the 12 types of dental implants and the treatment stages.

Informed Consent Statement:
The institutional review committee waived the need for individual informed consent. Therefore, written/verbal informed consent was not obtained from any participant because this study featured a non-interventional retrospective design, and all data were analyzed anonymously.

Conflicts of Interest:
The authors declare no conflict of interest.