1. Introduction
Panoramic radiography (orthopantomography) remains a cornerstone of dentomaxillofacial imaging because of its relatively low radiation dose, broad availability, and ability to depict the maxillofacial complex in a single acquisition [1,2]. Owing to these advantages, panoramic images are frequently utilized not only for routine clinical assessment but also as a primary data source for computer-assisted diagnostic applications. In recent years, advances in deep learning (DL) have substantially improved the automated analysis of panoramic radiographs, with promising results reported for tasks such as tooth detection, caries identification, and mandibular canal localization [3,4].
However, extending these applications to complex midfacial landmarks presents significant challenges. The pterygomaxillary fissure (PMF) represents a particularly challenging target owing to its composite radiographic appearance, marked morphological variability, and sensitivity to projection-dependent distortion [5].
Anatomically, the PMF is a vertical cleft located between the posterior wall of the maxilla and the pterygoid process of the sphenoid bone, forming a communication between the infratemporal and pterygopalatine fossae. This region transmits important neurovascular structures, including terminal branches of the maxillary artery and the posterior superior alveolar nerve, thereby holding substantial relevance for maxillofacial surgery, regional anesthesia, and skull base-adjacent procedures [6]. On panoramic radiographs, the PMF is commonly perceived as a radiolucent “inverted teardrop” configuration. However, this appearance should not be interpreted as the depiction of a precisely delineated or anatomically isolated structure. Rather, the radiographic image of the PMF represents a projection-based composite appearance created by overlapping adjacent osseous structures. Its apparent borders are not true anatomical boundaries but are instead influenced by projection geometry, patient positioning, image magnification, and overlapping surrounding structures. The radiographic appearance is generated primarily by the posterior wall of the maxillary sinus anteriorly and the anterior surface of the pterygoid process posteriorly, particularly the lateral pterygoid plate [7]. Consequently, superimposition patterns and cortical definition, influenced by adjacent structures such as the maxillary tuberosity and pterygoid plates, substantially affect its radiographic clarity [2].
The radiographic integrity of the PMF also carries diagnostic implications. Cortical interruption or loss of definition may indicate destructive processes involving the posterior maxillary wall, including aggressive sinus pathology [2]. For AI-based systems, distinguishing true anatomical alteration from apparent invisibility related to acquisition geometry introduces additional complexity [8]. Since panoramic imaging operates within a defined focal trough, structures located outside this image layer are susceptible to distortion or blurring, while positioning errors, particularly horizontal misalignment and midsagittal rotation, may simulate asymmetry or structural irregularity [2,9].
Beyond its diagnostic relevance, accurate evaluation of the pterygomaxillary junction (PMJ) and adjacent PMF is also of considerable importance in pre-surgical anatomical assessment of the posterior maxilla. Anatomical and morphological variations in this region have been associated with differences in surgical accessibility and complication patterns, emphasizing the need for careful preoperative assessment. In this context, cone-beam computed tomography (CBCT) provides detailed three-dimensional morphometric information, including PMJ thickness, height, and orientation, thereby facilitating safer and more predictable osteotomy planning [10]. Although the integration of CBCT into preoperative evaluation protocols has been increasingly advocated, routine CBCT use for every patient remains limited due to higher radiation exposure and cost, making panoramic radiographs a commonly used modality for initial screening [2].
Although convolutional neural networks derived from architectures such as AlexNet and ResNet have advanced craniofacial landmark localization and interpretability through techniques including Grad-CAM [4], most AI research in dental radiology remains focused on dentition and mandibular structures.
Consequently, anatomically complex regions of the midface, including the pterygomaxillary region, have received comparatively limited attention despite their clinical relevance. Although definitive surgical planning requires CBCT-based three-dimensional evaluation, identification of the pterygomaxillary region on panoramic radiographs may still provide useful preliminary anatomical orientation during the initial assessment stage [11,12,13,14].
Given the anatomical complexity, projection sensitivity, and clinical significance of the PMF, a structured anatomical and radiographic framework may be beneficial for dataset development and model design. Effective DL-based detection requires consideration of morphological variability, bilateral asymmetry, and the distinction between physiological and pathological appearances.
Beyond radiographic analysis, artificial intelligence is increasingly being explored as a clinical support tool in dental medicine, including pharmacological risk assessment and drug–drug interaction detection. These non-imaging applications may complement image-based AI systems in supporting radiographic interpretation and patient safety [15,16].
Recent studies have demonstrated that AI-assisted approaches may provide rapid, standardized, and reproducible support within dental radiographic assessment workflows. Nevertheless, despite their promising performance, these systems should not be regarded as replacements for clinician expertise. The potential for false-positive findings, reduced interpretability, and variability in model performance across different imaging conditions continue to represent important limitations that necessitate cautious clinical integration [17,18,19].
Our primary objective in this study is to evaluate the feasibility and diagnostic performance of a deep learning-based artificial intelligence model, specifically utilizing the U2-Net architecture, for the automated segmentation of the pterygomaxillary fissure on dental panoramic radiographs. Clinically, we investigate whether AI-supported panoramic analysis can provide more standardized anatomical landmark identification during preliminary anatomical assessment. This work is not intended to substitute three-dimensional cone-beam computed tomography (CBCT) or expert clinician decision-making, but rather to function as a preliminary technical and pre-clinical validation of automated landmark detection in anatomically complex craniofacial regions.
2. Materials and Methods
All procedures in this study were carried out in accordance with the principles outlined in the Declaration of Helsinki, and ethical clearance was granted by the Cyprus International University Ethics Committee (Approval No. EKK25-26/15/09). The study aimed to segment the pterygomaxillary fissure using a U2-Net-based deep learning approach. The dataset was randomly partitioned into three separate subsets: 70% for model training, 20% for validation, and 10% for testing.
Panoramic radiographs were retrospectively retrieved from institutional archives. Images exhibiting poor quality or noticeable artifacts were excluded from analysis. Following screening, 270 anonymized panoramic radiographs satisfied the eligibility criteria and were retained for the study. Image acquisition was performed using a Newtom GO 3D/2D panoramic imaging system (Quantitative Radiology s.r.l., Verona, Italy) under standardized settings of 80 kVp, 8 mA, and a 14.2 s exposure duration. The radiographs were subsequently converted into PNG format and imported into the Computer Vision Annotation Tool (CVAT v1.7.0) to facilitate manual labeling by the evaluators. The pterygomaxillary fissure was identified as a complex, projection-dependent radiolucent anatomical landmark on the panoramic radiograph. Two examiners independently assessed the radiographs and labeled the pterygomaxillary fissure on the images using CVAT v1.7.0. To evaluate interobserver reliability, the annotations of the two examiners were compared using Cohen’s kappa coefficient, yielding a value of 0.86. To assess intraobserver reliability, one examiner repeated the annotation procedure for all 270 radiographs after a one-week interval under the same conditions. The intraobserver agreement was evaluated using Cohen’s kappa coefficient (κ = 0.88). Subsequently, any remaining annotation discrepancies between the two examiners were resolved by a third senior specialist to establish the final ground truth.
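The agreement statistic described above can be illustrated with a minimal sketch of Cohen's kappa. The study does not state the unit on which the two annotations were compared, so the sketch assumes flattened binary per-pixel labels; the function name `cohens_kappa` and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items that received the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement computed from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[k] * freq_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Toy example (assumed labels): 1 = fissure pixel, 0 = background.
examiner_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
examiner_2 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
kappa = cohens_kappa(examiner_1, examiner_2)
```

In this toy case the observed agreement is 0.90 and the chance agreement is 0.54, so kappa is 0.36/0.46, roughly 0.78. Kappa corrects raw agreement for the agreement expected by chance, which matters when background pixels dominate.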
This investigation focused on developing a machine learning-based segmentation model for identifying the pterygomaxillary fissure in two-dimensional panoramic radiographs. The framework employed the U2-Net (U-square Net) architecture, a deep learning model widely recognized for its strong performance in semantic segmentation applications.
The workflow implemented in this study consisted of several stages. Initially, image preprocessing was performed. Subsequently, the trained model classified each image pixel as either belonging to the pterygomaxillary fissure or the background. The final output was the segmentation of the pterygomaxillary fissure.
The study included 270 adult patients (132 males and 138 females; mean age: 52.3 ± 14.7 years; range: 18–70 years) who underwent panoramic radiography at the Cyprus International University Faculty of Dentistry. The mean age was 51.8 ± 15.2 years for male patients and 52.7 ± 14.3 years for female patients (p > 0.05) (Table 1). Patients were retrospectively selected from the clinical archive, and demographic data were obtained from electronic medical records associated with each anonymized radiograph.
Inclusion Criteria: (1) Digital panoramic radiographs obtained from adult patients aged 18–70 years, and (2) images displaying clear and geometrically acceptable bilateral visibility of the pterygomaxillary fissure (PMF) region.
Exclusion Criteria: Radiographs were strictly excluded if they presented (1) significant patient positioning errors or severe motion artifacts, (2) extensive maxillofacial deformities or pathologies distorting the posterior maxillary anatomy, or (3) localized contrast degradation, overexposure, or beam hardening that rendered the posterosuperior maxillary structures uninterpretable.
The dataset comprised 270 two-dimensional panoramic radiographs used for pterygomaxillary fissure segmentation. It was randomly partitioned into training, validation, and testing subsets of 189 (70%), 54 (20%), and 27 (10%) images, respectively. Pixel intensities were normalized using a fixed-window min–max scaling approach, which was applied consistently across the entire dataset to ensure training stability and uniform pixel-intensity scaling for the U2-Net architecture. To enhance data diversity and improve model robustness, horizontal flipping augmentation was applied with a probability of 0.5.
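The preprocessing and augmentation steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the study's implementation: the window bounds (0 to 255, the 8-bit PNG range) are an assumption because the study does not report them, and the helper names `minmax_fixed_window` and `random_hflip` are hypothetical.

```python
import random

def minmax_fixed_window(image, lo=0.0, hi=255.0):
    """Scale pixel values to [0, 1] using a fixed intensity window [lo, hi].

    Values outside the window are clipped first. The default bounds are an
    assumption (8-bit PNG range), not settings reported by the study.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    span = hi - lo
    return [[(min(max(p, lo), hi) - lo) / span for p in row] for row in image]

def random_hflip(image, mask, rng, p=0.5):
    """Mirror image and mask left-to-right together with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image], [row[::-1] for row in mask]
    return image, mask

rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[0, 64, 128], [255, 300, -5]]   # toy 2 x 3 "radiograph"
mask = [[0, 1, 1], [0, 0, 1]]            # toy fissure mask
scaled = minmax_fixed_window(image)
aug_image, aug_mask = random_hflip(scaled, mask, rng)
```

Because the window is fixed rather than computed per image, every radiograph is mapped with the same intensity transfer function. The flip is applied to the image and its mask jointly so that the annotation stays aligned with the anatomy.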
Semantic segmentation refers to the process of categorizing every individual pixel within an image into predefined classes. In this study, pixels were classified as either representing the pterygomaxillary fissure or background regions (Figure 1) using the U2-Net framework, which was developed as an enhanced version of the conventional U-Net architecture. U2-Net follows an encoder–decoder structure, where the encoder progressively extracts contextual and semantic features through downsampling operations, while the decoder reconstructs spatial information through upsampling layers to generate detailed segmentation outputs. This dual-path design enables the network to retain fine anatomical details while simultaneously learning complex semantic representations required for precise pixel-level classification.
Accurate segmentation of panoramic radiographs remains challenging because anatomical landmarks, including the pterygomaxillary fissure, are often surrounded by overlapping and structurally complex regions. Although U-Net has demonstrated strong performance in medical image segmentation, its conventional single-scale architecture may have limitations in capturing subtle anatomical variations. To overcome this issue, U2-Net incorporates nested U-shaped residual modules operating across multiple scales, enabling richer feature extraction and enhanced supervision depth without substantially increasing computational complexity. This multiscale design makes the architecture particularly advantageous for panoramic radiographic analysis, where slight grayscale variations can significantly influence segmentation performance.
The segmentation framework employed the U2-Net architecture with an input dimension of 512 × 1024 × 1. To improve feature learning across different resolution levels, deep supervision was incorporated throughout the network structure. A batch size of 2 was selected to ensure efficient utilization of GPU memory resources during training. To reduce the risk of overfitting and promote stable convergence, optimization was performed using the AdamW algorithm with a learning rate of 0.0002.
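For reference, the reported settings can be collected in a small configuration object. This is only a sketch: the class name `TrainConfig` is hypothetical, reading 512 × 1024 as height × width is an assumption, and the steps-per-epoch figure assumes that the final partial batch is kept, which the study does not state.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    input_height: int = 512       # reported input dimension 512 x 1024 x 1
    input_width: int = 1024       # height/width order is an assumption
    input_channels: int = 1       # single-channel (grayscale) radiograph
    batch_size: int = 2
    optimizer: str = "adamw"
    learning_rate: float = 2e-4   # 0.0002 as reported
    n_train_images: int = 189     # training subset size reported in the study

    def steps_per_epoch(self) -> int:
        # Assumes the last, smaller batch is kept rather than dropped.
        return math.ceil(self.n_train_images / self.batch_size)

cfg = TrainConfig()
steps = cfg.steps_per_epoch()
```

With 189 training images and a batch size of 2, this gives 95 optimizer steps per epoch under the stated assumption.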
To address the imbalance between the pterygomaxillary fissure and background categories, a weighted Dice loss function was adopted. The network generated predictions for two output classes corresponding to the pterygomaxillary fissure and the background region. The implementation was developed using a Python (v3.10; Python Software Foundation, Wilmington, DE, USA) and JAX (v0.6.0; Google Research, Mountain View, CA, USA) version of the U2-Net framework. Model training and experimental procedures were executed using an NVIDIA® GeForce® RTX 3090 graphics processing unit (NVIDIA Corporation, Santa Clara, CA, USA). The training process was conducted over 500 epochs.
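A weighted Dice loss of the kind mentioned above can be sketched as follows. The study does not report the exact formulation or the class weights, so the per-class soft Dice with a weighted average shown here is one common variant, and the weights, the smoothing term `eps`, and the function name `weighted_dice_loss` are all assumptions. Plain Python is used for readability rather than the JAX implementation described in the study.

```python
def weighted_dice_loss(probs, targets, class_weights, eps=1e-6):
    """Weighted soft Dice loss.

    probs, targets: dicts mapping class name -> flat list of per-pixel values
    (predicted probabilities and 0/1 ground truth, respectively).
    class_weights: dict of non-negative weights per class.
    Returns the weighted mean of (1 - Dice) over classes.
    """
    total_weight = sum(class_weights.values())
    loss = 0.0
    for cls, weight in class_weights.items():
        p, t = probs[cls], targets[cls]
        intersection = sum(pi * ti for pi, ti in zip(p, t))
        denominator = sum(p) + sum(t)
        dice = (2.0 * intersection + eps) / (denominator + eps)
        loss += weight * (1.0 - dice)
    return loss / total_weight

# Toy 4-pixel example with hypothetical weights favouring the small class.
fissure_prob = [0.9, 0.8, 0.1, 0.0]
fissure_true = [1, 1, 0, 0]
probs = {"fissure": fissure_prob, "background": [1 - p for p in fissure_prob]}
targets = {"fissure": fissure_true, "background": [1 - t for t in fissure_true]}
weights = {"fissure": 0.8, "background": 0.2}
loss = weighted_dice_loss(probs, targets, weights)
```

Giving the fissure class a larger weight keeps the loss from being dominated by the much more numerous background pixels, which is the imbalance the text refers to.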
3. Results
The dataset comprised 2D panoramic radiographs, with a total of 270 images utilized for pterygomaxillary fissure segmentation. The dataset was partitioned on a random basis, with approximately 70% (n = 189) allocated for training, 20% (n = 54) for validation, and 10% (n = 27) for testing.
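The random partition can be reproduced in outline with a seeded shuffle. The sketch below is illustrative only: the seed, the file identifiers, and the function name `split_dataset` are hypothetical, and the study does not describe its exact randomization procedure.

```python
import random

def split_dataset(items, fractions=(0.7, 0.2, 0.1), seed=42):
    """Shuffle items and split them into train/validation/test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder, so every item is used once
    return train, val, test

image_ids = [f"pano_{i:03d}" for i in range(270)]  # hypothetical identifiers
train, val, test = split_dataset(image_ids)
```

For 270 images this yields 189, 54, and 27 items. Since each radiograph in the study belonged to a different patient, splitting by image also keeps patients from appearing in more than one subset.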
Prior to presenting the quantitative findings, the evaluation measures applied in this study are outlined. Intersection over Union (IoU) was employed to assess the degree of overlap between the predicted segmentation output and the reference annotation. It is computed as the ratio between the overlapping area and the combined area of both segmentations, producing values from 0 to 1, where a value closer to 1 indicates greater agreement.
The Dice coefficient was used to evaluate segmentation performance and is expressed as 2|A∩B|/(|A| + |B|), where A denotes the predicted segmentation and B represents the ground truth. Similarly to IoU, Dice values range from 0 to 1, with higher values indicating improved segmentation accuracy. Additionally, precision was used to measure the proportion of correctly identified positive predictions, whereas recall was used to evaluate the model’s ability to detect the target anatomical structures. For the independent test dataset, 95% confidence intervals (95% CIs) were calculated for the Dice coefficient, IoU, Precision, Recall, and F1-score to provide an estimate of the statistical reliability of the segmentation performance metrics.
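The overlap measures defined above can be computed from pixel-wise true positives, false positives, and false negatives. The following is a minimal sketch for a single pair of flattened binary masks; the function name `segmentation_metrics` and the convention of returning 1.0 for an empty denominator are assumptions.

```python
def segmentation_metrics(pred, truth):
    """Dice, IoU, precision, and recall for two flat binary masks (0/1 lists)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)

    def ratio(num, den):
        # Assumed convention: an empty denominator scores 1.0.
        return num / den if den else 1.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),   # 2|A∩B| / (|A| + |B|)
        "iou": ratio(tp, tp + fp + fn),            # |A∩B| / |A∪B|
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }

pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
m = segmentation_metrics(pred, truth)
```

Here TP = 3, FP = 1, and FN = 1, giving Dice = 0.75, IoU = 0.60, and precision = recall = 0.75. For a single mask pair, Dice equals the pixel-level F1-score and IoU = Dice/(2 − Dice); when metrics are averaged over images, these identities need not hold exactly for the averaged values.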
Upon evaluation with the test dataset, the pterygomaxillary fissure segmentation model achieved a Dice coefficient of 0.904 (95% CI: 0.876–0.930) and an Intersection over Union (IoU) of 0.846 (95% CI: 0.810–0.879). Precision and recall were computed to be 0.921 and 0.902, respectively, yielding an F1-score of 0.911. During training, the model reached a maximum validation Dice coefficient of 0.910, with a validation IoU of 0.844 and a validation accuracy of 0.998. The training set performance yielded a Dice coefficient of 0.982, an IoU of 0.971, and an accuracy of 0.999, with a training loss of 0.031. These metrics indicate the model’s strong capability to accurately segment the pterygomaxillary fissure in panoramic radiographs (Table 2).
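The study does not specify how the 95% confidence intervals were obtained. One common option for a small test set is a percentile bootstrap over per-image scores, sketched below under that assumption. The function name `bootstrap_ci`, the number of resamples, the seed, and the 27 synthetic scores are all hypothetical and are not the study's results.

```python
import random

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of per-image scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = []
    for _ in range(n_resamples):
        # Resample n images with replacement and record the mean score.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Synthetic per-image Dice scores for 27 test images (illustrative only).
dice_scores = [0.85 + 0.004 * i for i in range(27)]
ci_low, ci_high = bootstrap_ci(dice_scores)
```

With only 27 test images, intervals obtained this way are sensitive to a few difficult cases, which is one reason a larger independent test set would give more stable estimates.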
4. Qualitative Evaluation of Segmentation Performance
To evaluate the model’s behavior across clinically variable situations, a qualitative analysis was conducted on representative cases from the test dataset. As illustrated in Figure 1, the proposed U2-Net model successfully delineated the boundaries of the pterygomaxillary fissure in radiographs presenting standard anatomical contrast and positioning. In contrast, borderline or more challenging cases, characterized by substantial anatomical superimposition, horizontal misalignment, or low grayscale contrast, are presented in Figure 2 to outline the model’s operational boundaries. In these sub-optimal imaging conditions, such as arches with extensive metallic restorations causing localized beam hardening and contrast degradation (Figure 2A) or cases presenting severe projection-dependent structural asymmetry (Figure 2B), the model occasionally showed slight over- or under-segmentation. Showcasing these qualitative variations underscores the necessity of clinician verification under challenging clinical scenarios.
Figure 3 and Figure 4 show representative examples of pterygomaxillary fissure (PMF) segmentation mapping comparisons from the independent test dataset. Figure 3A and Figure 4A present the ground truth annotations manually delineated by the senior specialist, while Figure 3B and Figure 4B demonstrate the corresponding automated U2-Net model predictions. Figure 3 illustrates a representative segmentation result in which the model prediction closely matches the expert reference annotation. Figure 4 depicts a challenging, borderline clinical scenario where adjacent osseous superimposition caused the model to restrict its boundary to the primary radiolucent body of the fissure.
5. Discussion
This study developed a U2-Net-based artificial intelligence model for the automatic segmentation of the pterygomaxillary fissure (PMF) on panoramic radiographs. The proposed model achieved a Dice coefficient of 0.904 and an IoU of 0.846 on the test set, indicating promising segmentation performance despite the anatomical complexity and projection-dependent appearance of the region. Imperfect agreement is to be expected, because the PMF is not a discrete osseous structure but a composite radiographic projection influenced by superimposition, patient positioning, and adjacent anatomical components.
When interpreting these findings, it is important to consider the inherent difficulty of the segmentation target. Previous deep learning studies in dental radiology have frequently reported Dice coefficients exceeding 0.90 for relatively well-defined anatomical structures such as teeth, mandibular canals, and maxillary sinuses [20,21].
These structures typically exhibit relatively high radiographic contrast, well-demarcated boundaries, and limited morphological variability, thereby facilitating more consistent segmentation performance. Specifically, previous studies utilizing advanced architectures such as DoubleU-Net for dental segmentation have reported Dice coefficients as high as 0.928, outperforming standard U-Net models [22]. The Dice score of 0.904 achieved for the PMF in the present study is comparable to previously reported segmentation performances for anatomically less complex structures. These findings suggest that the proposed U2-Net architecture is capable of capturing subtle, low-contrast anatomical features despite the projection-dependent and structurally complex nature of the pterygomaxillary region.
In contrast, the pterygomaxillary fissure represents a substantially more challenging target because of its composite projection-derived radiographic appearance, low contrast, anatomical variability, and frequent superimposition with adjacent osseous structures [9,10].
Compared with well-demarcated dental structures, PMF segmentation remains inherently more challenging because of anatomical superimposition, projection dependency, and low radiographic contrast on panoramic images. This difficulty is reflected in the variability of segmentation performance reported across anatomically complex structures. For instance, a recent U2-Net-based study on pulp and pulp stone segmentation reported Dice coefficients of 0.840 and 0.759, respectively [23]. The model achieved a Dice score of 0.904 despite the anatomically complex and projection-dependent nature of the PMF, suggesting that the proposed annotation and training strategy was capable of capturing regions with variable morphology and frequent superimposition.
Recent advances in deep learning (DL) have demonstrated considerable potential in dental and medical image analysis, particularly in tasks involving anatomical landmark detection and radiographic interpretation. Convolutional neural network-based systems have shown successful performance in applications such as tooth detection, mandibular canal localization, and implant-related image analysis. However, most existing studies have focused on relatively well-defined dentoalveolar structures, whereas anatomically variable midfacial regions have received comparatively limited attention. These findings support the feasibility of AI-assisted PMF segmentation despite the anatomical complexity and radiographic variability of the region.
Artificial intelligence is increasingly being applied in oral and maxillofacial imaging and related clinical disciplines, particularly in areas such as radiographic interpretation, anatomical landmark detection, treatment support, and outcome prediction [24,25].
AI-based systems have been explored to support image interpretation and anatomical landmark identification in dental imaging [26]. Considering the existing literature on AI applications in dental imaging, we observed a lack of research specifically focusing on the automated identification of the pterygomaxillary fissure on panoramic radiographs.
The posterosuperior maxillary region is anatomically complex because it is deep-seated, difficult to visualize, and located near important neurovascular structures. Accurate and reproducible identification of anatomical landmarks in this region may therefore improve anatomical orientation and enhance the consistency of landmark recognition during preliminary anatomical assessment and pre-surgical screening [27,28,29].
The pterygomaxillary fissure serves as a key anatomical communication between the infratemporal and pterygopalatine fossae and is closely related to critical neurovascular structures. Within the pterygopalatine fossa, vascular elements are located anteriorly, while neural structures lie posteriorly. Lateral to the fissure, the pterygoid venous plexus is present, and the internal maxillary artery traverses the fissure toward the sphenopalatine foramen. Because of these anatomical relationships, the pterygomaxillary fissure represents an important landmark within the posterosuperior maxillary region. Accurate recognition of the pterygomaxillary region on panoramic radiographs may provide useful supportive information during the initial evaluation stage and contribute to more standardized landmark identification. However, because panoramic radiography is inherently a two-dimensional imaging modality, definitive surgical planning and detailed risk assessment require three-dimensional evaluation using cone-beam computed tomography (CBCT) [5,30].
Consistent with current oral and maxillofacial surgery guidelines, the use of multiple imaging modalities for the evaluation of complex maxillofacial structures has been recommended, including panoramic radiography, cephalometric imaging, CT, MRI, and CBCT [31]. Among these modalities, panoramic radiography offers several practical advantages, including lower radiation exposure, wider anatomical coverage, greater patient comfort, and ease of acquisition. However, its inherently complex anatomical superimposition and relatively lower image resolution may complicate image interpretation and reduce diagnostic reliability, particularly in anatomically intricate regions [32].
The interpretation of PMF segmentation results should consider the inherent limitations of panoramic radiography. Because the PMF represents a projection-derived radiographic appearance rather than a sharply delineated anatomical entity, anatomical superimposition and image acquisition factors may influence its visibility and reproducibility. This characteristic likely contributed to the segmentation difficulties observed in some cases within the present study. The discrepancy between the exceptionally high training performance (Dice: 0.982, IoU: 0.971) and the lower metrics on the independent test set (Dice: 0.904, IoU: 0.846) indicates a mild overfitting tendency, a well-documented challenge when optimization is performed on relatively compact medical imaging datasets. Furthermore, because panoramic radiography is inherently a two-dimensional projection, the anatomical boundaries of the PMF lack an absolute, distinct osseous margin, unlike three-dimensional CBCT assessments. This absence of a 3D gold standard means that the ground truth annotations represent a consensus on a composite projection rather than an isolated anatomical entity, which inherently challenges the model’s generalizability across diverse clinical imaging conditions.
Although CBCT provides a more accurate three-dimensional assessment of the pterygomaxillary region and remains essential when detailed anatomical evaluation is clinically required, its routine use may not always be feasible because of higher radiation exposure, increased cost, and limited accessibility in some clinical settings. In daily practice, panoramic radiography continues to be widely used as an initial imaging modality for the assessment of the posterior maxillary region [5]. However, the difficulty in consistently identifying this structure in panoramic images highlights the need for more objective and reproducible detection methods, such as artificial intelligence-based approaches. Therefore, AI-assisted detection systems may serve as supportive tools for preliminary anatomical landmark identification on panoramic radiographs, particularly when CBCT is not routinely indicated or immediately available. In this context, the proposed model may contribute to improved reproducibility, reduced operator-dependent variability, and more standardized radiographic landmark identification, while functioning as an adjunctive chairside support tool, particularly for less experienced clinicians.
To the best of our knowledge, this is among the first studies to investigate the automated detection of the pterygomaxillary fissure on panoramic radiographs using artificial intelligence.
To provide a clinically meaningful reference, the performance of the proposed U2-Net model was evaluated against expert-defined annotations. The ground truth labels were established through the independent annotations of two experienced dental examiners, with remaining discrepancies resolved by a third senior specialist. Consequently, the test dataset metrics, such as the Dice coefficient of 0.904 and precision of 0.921, reflect the level of agreement between the AI model and expert-defined reference annotations rather than a direct comparison with clinician performance. While these preliminary results suggest that the AI model can approach expert-defined annotations, prospective multicenter studies evaluating clinical workflows, clinical outcomes, and AI-assisted versus unassisted clinician performance are still required before clinical implementation.
The findings of the present study should be interpreted within the context of a preliminary technical and pre-clinical investigation. The aim was to explore whether AI-assisted analysis of panoramic radiographs could support anatomical landmark identification and improve the reproducibility of radiographic interpretation in the posterior maxillary region. Although the obtained segmentation performance demonstrates promising technical potential, the absence of a prospective evaluation comparing AI-assisted versus unassisted clinician performance in a real-world clinical workflow represents an important limitation of the present study. Therefore, future clinical studies, external validation, and multicenter datasets are required to fully evaluate its clinical utility and practical impact before widespread implementation.
6. Limitations
This study has several limitations that should be considered when interpreting the findings. First, the relatively small dataset size may have limited the anatomical variations captured during training. The small size of the test dataset in particular (27 images) limits our ability to fully guarantee the robustness and broad generalizability of the current results. This highly restricted test cohort compromises statistical robustness and may introduce an overestimation of the model’s true performance. Furthermore, although each panoramic radiograph belonged to a unique individual patient, evaluating bilateral structures from the same patient could still theoretically introduce subtle feature correlations akin to data leakage. Although the proposed model demonstrated favorable segmentation performance, larger datasets and more extensive independent test sets may further improve model robustness and generalizability.
Second, all panoramic radiographs were retrospectively collected from a single institution using the same panoramic imaging device and standardized acquisition parameters. While this contributed to image consistency, it may limit the applicability of the proposed model to images obtained using different imaging systems, acquisition protocols, and patient populations.
In addition, the present study evaluated only the segmentation performance of the AI model on panoramic radiographs and did not investigate its potential effects on clinical decision-making, diagnostic performance, surgical outcomes, or complication rates. Therefore, the proposed system should be considered a supportive tool for preliminary anatomical landmark identification rather than a substitute for comprehensive radiological and clinical evaluation.
Finally, external validation using larger multicenter datasets obtained from different panoramic imaging devices will be necessary before potential clinical implementation of the proposed model. Future studies should also investigate the performance of the proposed system under heterogeneous clinical conditions and evaluate its potential contribution to pre-surgical assessment workflows.
7. Conclusions
The U2-Net-based deep learning model developed in this study demonstrates promising segmentation performance in identifying the pterygomaxillary fissure on dental panoramic radiographs within the context of a preliminary technical and pre-clinical investigation. While this automated approach may provide reliable support for preliminary anatomical landmark identification and pre-surgical screening, it should be used strictly as an adjunctive chairside tool. Future prospective studies are required to evaluate its impact on diagnostic performance, landmark identification consistency, and workflow efficiency in clinical practice.