1. Introduction
Jaw cysts are highly prevalent [1] yet frequently asymptomatic; thus, they often remain undiagnosed until their dimensions require radical surgery [2,3]. At such a late stage, the extent of the defect can present a risk to neighboring anatomical structures, including teeth, alveolar bone, and nerves [4]. Furthermore, the time required for complete postoperative osseous regeneration grows exponentially with the preoperative volume of the defect [5]. Consequently, timely diagnosis ensures a smaller osseous defect and a shorter regeneration time, and thus an overall better prognosis.
While jaw cysts are identifiable at an early stage on panoramic radiographs (i.e., orthopantomograms, OPGs) [1], in practice, they are usually incidental findings [6]. Supporting the radiological diagnosis of jaw cysts has thus been a focus of artificial intelligence research in oral medicine [7,8,9]. Specifically, previous work has applied both object detection [10] and classification [11] methods to oral cyst diagnostics using OPGs. Notwithstanding, the explainability of existing methods remains a concern [1,12,13], and no previous work has focused on a machine learning approach that purposefully emulates the explainable human thought process in a clinical setting.
Clinically, experience from past encounters, as well as contextual knowledge, including the spatial relation of the cystic lesion to neighboring anatomical structures (e.g., proximity of a tooth apex to a radicular cyst), is frequently used to establish a preliminary diagnosis until further imaging is performed [14,15] or a definitive histopathological diagnosis is made. This thought process is well described in medical research and education as clinical diagnostic reasoning [16,17,18].
We hypothesized that clinical diagnostic reasoning can be emulated by using machine learning to individually replicate each step: first detecting a cystic lesion, then recognizing neighboring anatomical structures and their proximity to or overlap with the lesion, and finally using these as contextual information to establish a preliminary classification. Thus, the aim of our study was to purposefully emulate the human clinical diagnostic reasoning process step by step through the implementation of a combined object detection and image segmentation approach for the detection and preliminary classification of cystic lesions on OPGs.
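The hypothesized step-by-step emulation of diagnostic reasoning can be illustrated with a minimal sketch. All names below (`box_mask_overlap`, `classify_lesion`, the `teeth` mask, the 0.2 threshold) are hypothetical placeholders for illustration only and do not reflect the models or thresholds used in this study:

```python
import numpy as np

def box_mask_overlap(box, mask):
    """Fraction of a lesion bounding box (x0, y0, x1, y1) covered by an anatomy mask."""
    x0, y0, x1, y1 = box
    region = mask[y0:y1, x0:x1]
    area = (x1 - x0) * (y1 - y0)
    return float(region.sum()) / area if area else 0.0

def classify_lesion(box, anatomy_masks, threshold=0.2):
    """Toy rule: a lesion strongly overlapping the tooth mask is labeled odontogenic."""
    overlaps = {name: box_mask_overlap(box, m) for name, m in anatomy_masks.items()}
    label = "odontogenic" if overlaps.get("teeth", 0.0) >= threshold else "non-odontogenic"
    return label, overlaps

# Synthetic 100x100 "radiograph": a square tooth mask and a lesion box near it
teeth = np.zeros((100, 100), dtype=bool)
teeth[40:60, 40:60] = True
label, overlaps = classify_lesion((45, 45, 65, 65), {"teeth": teeth})
```

The sketch mirrors the three steps of the hypothesis: a detected lesion (the bounding box), segmented neighboring anatomy (the binary mask), and a contextual classification derived from their spatial overlap.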
4. Discussion
In this study, we aimed to apply a combined object detection and image segmentation approach to emulate clinical diagnostic reasoning in the detection and classification of cystic lesions on OPGs. Our object detection model achieved an average precision of 0.42 (IoU: 0.50, maximal detections: 100) and an average recall of 0.394 (IoU: 0.50–0.95, maximal detections: 100). Our classification model achieved a sensitivity of 0.84 for odontogenic cysts and 0.56 for non-odontogenic cysts, as well as a specificity of 0.59 for odontogenic cysts and 0.84 for non-odontogenic cysts (IoU: 0.30). For comparison, our international human control group of ten dental professionals achieved a sensitivity of 0.70 for odontogenic cysts, 0.44 for non-odontogenic cysts, and 0.56 for OPGs without cysts, as well as a specificity of 0.62 for odontogenic cysts, 0.95 for non-odontogenic cysts, and 0.76 for OPGs without cysts. Notwithstanding the variability within the human control group, these results are largely comparable to those of our classification model. Taken together, the results support the plausibility of our approach to emulating clinical diagnostic reasoning in detecting and classifying jaw cysts.
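For readers less familiar with the reported metrics, sensitivity and specificity follow the standard per-class definitions. A minimal computation, using hypothetical confusion-matrix counts rather than the study data, looks like:

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical counts for one class (illustrative only, not the study data)
tp, fn, tn, fp = 30, 10, 80, 20
sens = sensitivity(tp, fn)  # 0.75
spec = specificity(tn, fp)  # 0.80
```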
The novelty of our study lies in its aim as well as in its use of both multicenter datasets and international human controls. Rather than developing models with the highest detection accuracy, we specifically aimed to replicate the multi-step thought process of human clinical reasoning in the radiographic diagnosis of jaw cysts. While the simultaneous detection and classification of jaw cysts and tumors has previously been published with good results [27], to our knowledge, our combined object detection and image segmentation approach is the first that is deliberately analogous to the way a clinician makes a preliminary diagnosis. Furthermore, to mitigate location bias, we used a multicenter dataset to train our models and compared them to an international human control group consisting of dental professionals from seven different countries with different dental backgrounds and education levels. The size of our datasets is largely comparable to the most recent work in this field [10,11]. Notably, one recent study used more than eleven times as many negatives as positives for pretraining, resulting in a massive overall dataset from a single center [1]; nonetheless, its number of positives is comparable to that of our multicenter dataset. In contrast to our work, this previous study also applied segmentation masks to the lesions themselves. While segmentation masks can be more accurate than bounding boxes, cystic lesions do not always present as sharply defined radiolucencies on OPGs, hence our decision to use bounding boxes.
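Bounding-box predictions such as ours are matched to ground truth via an intersection-over-union (IoU) threshold, as in the 0.30 and 0.50 cutoffs reported above. A minimal box-IoU implementation on synthetic boxes (illustrative only):

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction shifted by half its width relative to the ground truth:
# intersection 50 px, union 150 px, IoU = 1/3, which still counts as a
# match at an IoU threshold of 0.30 but not at 0.50
iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
```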
Regarding the individual models in our study, it is apparent that the classification model performs better than the object detection model. In fact, the difference in F1-score between the classification models based on predicted and on ground truth bounding boxes was only 0.01 for odontogenic cysts and 0.08 for non-odontogenic cysts (IoU: 0.30). This implies good differentiability between odontogenic and non-odontogenic cysts based on their spatial relations to neighboring anatomical structures. Importantly, this differentiability does not seem to suffer from imperfect detection of the lesions themselves. A further performance difference can be observed between the classification of odontogenic and non-odontogenic cysts. A plausible explanation is the lower heterogeneity with which odontogenic cysts appear on OPGs. Odontogenic cysts present as periapical radiolucencies; thus, any detected odontogenic cyst shows a high overlap with the segmentation masks of individual teeth. Such strict anatomical requirements do not apply to non-odontogenic cysts, which might explain the lower classification performance of our models. It should also be noted that our maxillary segmentation model reached a lower Dice score (0.465) compared with our other segmentation models (0.644–0.978). One possible explanation is the extensive overlap of the maxilla with other anatomical structures in OPGs. The upper dentition, particularly in the posterior region, almost completely overlaps the maxilla itself, rendering recognition potentially difficult. Furthermore, the maxilla does not feature any large, distinct, overlap-free area. In comparison, the mandibular rami are substantial, easily recognizable areas with almost no overlap at all, except at the mandibular foramen.
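The Dice score used to compare the segmentation models above is defined as 2|A∩B| / (|A| + |B|) for a predicted and a ground truth mask. A minimal implementation on two synthetic binary masks (not the study data):

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two binary masks."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 2.0 * float(inter) / float(total) if total else 1.0

# Two 16-pixel square masks sharing a 4-pixel overlap: 2*4 / (16+16) = 0.25
a = np.zeros((10, 10), dtype=bool)
a[2:6, 2:6] = True
b = np.zeros((10, 10), dtype=bool)
b[4:8, 4:8] = True
score = dice(a, b)
```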
Our study is limited by the low number of study centers, which potentially compromises the generalizability of the results. We further did not differentiate between histopathological diagnoses other than odontogenic and non-odontogenic cysts. While this was a deliberate decision to lower class imbalance and increase the number of OPGs per diagnosis, a more granular differentiation would have allowed the development of a classifier with higher clinical applicability. To do so, a larger sample size would have been needed, especially for histopathological diagnoses with lower prevalence (e.g., ameloblastoma). This would also enable screening for lesions with aggressive growth and potential for malignant transformation (e.g., keratocyst) [28]. A limitation with regard to the human control group is that while its members were trained on the definitions of the diagnoses as well as the usage of the online platform before conducting their analysis, no further calibration was performed. This, along with their varying experience levels, represents a potential source of bias in the results of the human control group.
The clinical implications of our results are twofold. First, our models could be utilized to aid diagnostics as well as the surgical decision-making process. Several deep learning tools are already applied in clinical practice [29,30], yet to our knowledge, this is the first study augmenting the object detection of cystic lesions with the image segmentation of neighboring anatomical structures to predict the pathogenesis of a lesion. This methodology mimics the human clinical decision-making process in everyday practice. In combination with a clinical examination, our classifier can be used to determine whether a dental pathology is involved, which in turn influences the treatment. The necessity of a thorough clinical examination should be emphasized, as our models do not provide information on the vitality, mobility, or discoloration of affected or neighboring teeth. Thus, in clinical practice, our models should only be used in combination with a clinical examination. Second, our approach could serve as a baseline for further research into machine learning methods for the diagnosis of cysts and tumors of the jaw. Recent work has already employed radiomics to identify features characteristic of certain lesions. This could be combined with our analysis of the spatial relations of lesions to neighboring anatomical structures to potentially increase diagnostic accuracy further.