In this section, we discuss previous works related to our study. These studies are organized into four key themes: (1) Core Clustering Algorithms for General Segmentation, (2) Advanced and Hybrid Clustering for Image Segmentation, (3) Deep Learning and Specialized Methods for Dental Segmentation, and (4) Evaluation Metrics and Benchmarking in Segmentation.
2.1. Core Clustering Algorithms for General Segmentation
L. Xu, J. Ren, and Q. Yan [13] introduce a density-based clustering algorithm that identifies cluster centers via local density and distance, applied to image preprocessing. It achieves 85% purity on the Berkeley Segmentation Dataset (BSDS) with O(n²) complexity. Its strength is handling non-spherical clusters, but sensitivity to the cutoff distance (a 20% accuracy drop if misconfigured) and 100 ms processing for 512 × 512 images limit real-time use, and the method does not consider dental applications.
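To make the density-and-distance idea concrete, the sketch below (our own illustration, not the authors’ code) computes, for each point, a local density ρ and the distance δ to the nearest denser point; candidate cluster centers are points where both are large, and the cutoff distance d_c is precisely the parameter whose misconfiguration the study reports as costly.

```python
import numpy as np

def density_peak_scores(X, d_c):
    """Minimal sketch of the density-peaks idea discussed above (not the code of [13]):
    rho  = local density under a Gaussian kernel with cutoff distance d_c,
    delta = distance to the nearest point of higher density.
    Candidate cluster centers are points with both high rho and high delta."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))            # O(n^2) pairwise distances, as noted above
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0  # subtract self-contribution
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = D[i, rho > rho[i]]
        delta[i] = denser.min() if denser.size else D[i].max()
    return rho, delta
```

For image data, pixels can be embedded as (x, y, intensity) feature vectors before scoring; choosing d_c poorly in this sketch mirrors the reported accuracy drop for a misconfigured cutoff.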
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu [14] propose DBSCAN, which groups points by density reachability, applied here to image preprocessing. It achieves 80% purity on BSDS and excels with irregular clusters. However, O(n²) complexity (200 ms for 1024 × 1024 images) and parameter sensitivity (a 15% accuracy drop) hinder scalability, and the absence of dental testing limits applicability.
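For context, a pixel-level application of DBSCAN of the kind discussed above can be written with scikit-learn; the feature construction (spatial coordinates plus weighted intensity) and the eps/min_samples values below are illustrative assumptions, not settings from [14].

```python
import numpy as np
from sklearn.cluster import DBSCAN

def dbscan_segment(gray, eps=3.0, min_samples=20, intensity_weight=0.5):
    """Group pixels by density reachability using (x, y, weighted intensity) features.
    eps and min_samples are the sensitivity-critical parameters noted above;
    intended for small or downsampled images given DBSCAN's quadratic worst case."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        xs.ravel().astype(float),
        ys.ravel().astype(float),
        intensity_weight * gray.ravel().astype(float),
    ])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)
    return labels.reshape(h, w)   # label -1 marks noise pixels
```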
Gupta, S., and Bhadauria, H.S. [24] employ KMeans clustering with superpixel preprocessing to segment lung CT images, achieving a 0.83 mean Intersection over Union (mIoU) and an estimated Jaccard Index of ∼0.80 on interstitial lung disease (ILD) datasets. Their unsupervised, annotation-free approach enhances scalability, aligning with our study’s emphasis on heuristic-free clustering for dental radiographs. The method’s efficiency in handling complex lung textures suggests potential applicability to dental X-rays with overlapping structures. However, its lung-specific focus and reliance on superpixel preprocessing (200 ms processing time) limit direct relevance to our dental segmentation goals, which prioritize real-time, simpler clustering on the Kaggle dataset. The use of mIoU and Jaccard metrics informs our adoption of external validation metrics such as the Jaccard Index, but the lack of dental validation reduces its clinical applicability to our work.
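The superpixel-then-cluster pipeline described above can be approximated in a few lines; the sketch below is an illustration under stated assumptions (SLIC superpixels, clustering on mean superpixel intensity, illustrative parameter values), not the implementation of [24].

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.cluster import KMeans

def superpixel_kmeans(gray, n_superpixels=400, n_clusters=4):
    """Illustrative superpixel + KMeans pipeline: SLIC groups pixels into
    superpixels, then KMeans clusters each superpixel's mean intensity
    into region labels."""
    segments = slic(gray, n_segments=n_superpixels, compactness=0.1,
                    channel_axis=None)              # grayscale input (skimage >= 0.19)
    sp_ids = np.unique(segments)
    sp_means = np.array([gray[segments == s].mean() for s in sp_ids]).reshape(-1, 1)
    sp_labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(sp_means)
    # paint each superpixel with its cluster label
    out = np.zeros_like(segments)
    for sid, lab in zip(sp_ids, sp_labels):
        out[segments == sid] = lab
    return out
```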
The study by Huang [17] extends KMeans to handle categorical features, achieving 0.78 NMI on the Corel-10K dataset with 30% less computation time than standard KMeans. Its scalability to large datasets is a notable strength, relevant for processing extensive dental radiograph collections. However, its performance degrades on imbalanced data (25% misclassification), and the 100 ms processing time for categorical features restricts its use in real-time dental applications. The lack of a medical imaging focus limits its direct applicability.
The research by Lloyd [25] revisits KMeans for quantization tasks, achieving 0.75 NMI on MNIST with a fast 30 ms processing time for 512 × 512 images. Its efficiency is a key advantage for resource-constrained environments, aligning with our study’s focus on practical algorithms. However, its 20% error rate on complex datasets such as COCO highlights limitations in handling intricate image structures, such as dental radiographs with overlapping teeth, reducing its relevance to our dental segmentation goals.
The study by Mohammed and Al-Ani [18] applies FCM to medical image segmentation, achieving a 0.82 Jaccard Index on brain MRI datasets. Its soft clustering approach, which assigns each pixel to multiple clusters with membership degrees, is well suited to images with overlapping regions, a common challenge in dental radiographs. However, its O(n²) complexity (200 ms for 256 × 256 images) and 15% performance drop under noisy conditions limit its clinical applicability. While closer to our dental focus than the other core algorithms, its brain-specific validation reduces direct relevance.
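To make the soft-membership idea concrete, the following minimal FCM loop (our own sketch on 1-D pixel intensities, not the implementation of [18]) alternates the standard membership and centroid updates; the membership matrix is what allows a pixel on a tooth boundary to belong partly to two regions.

```python
import numpy as np

def fuzzy_cmeans(values, n_clusters=3, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy C-means on 1-D features (e.g., flattened pixel intensities).
    Returns cluster centers and the soft membership matrix u (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    x = values.reshape(-1, 1).astype(float)
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)[:, None]   # fuzzily weighted centroids
        dist = np.abs(x - centers.T) + 1e-12             # (n_pixels, n_clusters)
        new_u = dist ** (-2.0 / (m - 1.0))               # standard FCM membership update
        new_u /= new_u.sum(axis=1, keepdims=True)
        if np.abs(new_u - u).max() < tol:
            u = new_u
            break
        u = new_u
    return centers.ravel(), u
```

Hard labels for evaluation follow from `u.argmax(axis=1)`, while the raw memberships retain the overlap information discussed above.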
2.2. Advanced and Hybrid Clustering for Image Segmentation
The study by Chen et al. [26] combines deep learning with density-based clustering, using adaptive kernel estimation to enhance cluster separation, achieving 0.85 NMI on MNIST. Its 10% improvement over traditional DBSCAN demonstrates robust feature extraction, but the O(n³) complexity (5-hour training) and reliance on supervised pre-training make it impractical for unsupervised dental applications. Its general dataset focus further limits relevance to our radiograph-specific study.
Chung, M., Lee, J., Park, S., et al. [27] utilize a supervised U-Net model for tooth segmentation in panoramic X-rays, achieving a 0.85 Dice score on 500 images. Its high accuracy and focus on 2D dental imaging demonstrate strong clinical relevance, directly applicable to our dataset of dental radiographs. However, its reliance on extensive annotated data and 150 ms inference time contrast with our unsupervised, annotation-free clustering approach using KMeans, FCM, and others. The supervised framework limits scalability for real-world dental diagnostics, where annotations are scarce. Nonetheless, its Dice metric informs our use of external metrics such as the F1 Score and Jaccard Index, highlighting the need for standardized evaluation in our study.
The study by Zhang et al. [28] proposes a clustering method that adapts to local density variations, achieving 0.80 mIoU on BSDS. Its noise robustness is a strength for handling varied image textures, but the 150 ms processing time and lack of dental-specific validation reduce its applicability. The method’s complexity contrasts with our study’s emphasis on simple, scalable algorithms.
The research by Lian, C., Wang, L., Wu, T.-H., et al. [29] proposes a supervised multi-task CNN for CBCT tooth segmentation, achieving a 0.89 Dice score on 150 volumes. Its high accuracy and focus on 3D dental imaging make it highly relevant to our study’s CBCT context. However, the dependence on annotated data and 200 ms inference time limit its use in our unsupervised framework, which leverages classic clustering algorithms like Agglomerative and FCM for scalability. The study’s Dice metric supports our adoption of external metrics like the Fowlkes-Mallows Index, but its supervised nature contrasts with our annotation-free approach, highlighting our methodology’s practical advantages for dental diagnostics.
Hatvani, J., Horváth, A., Michetti, J., et al. [30] develop a supervised CNN-based framework for tooth segmentation in CBCT volumes, achieving a 0.87 Dice score on 100 volumes. Its robustness to 3D imaging challenges aligns with our study’s use of dental radiographs, including potential CBCT data from the Kaggle dataset. However, the supervised training requirement and 200 ms inference time hinder its applicability to our unsupervised, real-time segmentation goals. The high Dice score underscores the potential of deep learning, but our heuristic-free clustering (e.g., DBSCAN, GMM) offers greater scalability for annotation-scarce settings. The study’s use of Dice informs our metric selection, emphasizing clinical reliability.
Y. Ren et al. [22] integrate FCM with kernelized reconstruction, optimized via the Firefly algorithm, achieving 0.83 mIoU on Cityscapes. Its ability to handle complex urban scenes suggests potential for intricate dental structures, but the heuristic-based optimization and 200 ms processing time introduce tuning challenges, contrasting with our heuristic-free approach. The non-dental focus further limits relevance.
The study by Ji et al. [15] introduces invariant information clustering, a label-free method achieving 0.79 mIoU on STL-10. Its unsupervised approach aligns with our study’s goals, but its 150 ms inference time and poor boundary detection (0.65 mIoU on COCO) limit its utility for precise dental segmentation, where accurate tooth boundaries are critical.
Zhang, K., Liu, X., Shen, J., et al. [31] introduce TSGCNet, a supervised deep learning model for 3D dental mesh segmentation, achieving a 0.89 mIoU on 150 meshes. Its discriminative feature learning enhances accuracy for complex dental structures, relevant to our 3D dental imaging goals. However, its supervised training and reliance on annotated meshes limit its applicability to our unsupervised clustering approach on the Kaggle dataset. The 150 ms inference time further contrasts with our focus on efficient, real-time segmentation. The use of mIoU informs our metric choices, such as the Rand Index, but the supervised framework underscores the scalability of our heuristic-free methodology.
The study by Budagam, R., Kumar, S., Reddy, P., et al. [32] proposes a supervised multi-task learning approach for panoramic X-ray segmentation, achieving 0.85 Dice and mIoU scores on 500 images. Its focus on 2D dental X-rays aligns with our study’s dataset, and its high accuracy highlights clinical potential. However, its supervised nature, reliance on annotations, and 150 ms inference time (estimated from similar studies) limit its fit with our unsupervised, annotation-free clustering (e.g., KMeans, DBSCAN). The preprint status adds uncertainty, but its use of Dice and mIoU supports our adoption of external metrics like the Jaccard Index, emphasizing standardized evaluation. Our classic clustering approach offers greater scalability for dental diagnostics.
The study by Hoang and Kang, 2022 [16] proposes a pixel-level clustering network, achieving 0.77 mIoU on STL-10 without annotations. Its label-free approach is relevant to our unsupervised focus, but the 100 ms inference time and 2D image focus limit its applicability to 3D dental radiographs, such as CBCT scans, reducing its clinical relevance.
Xu et al., 2022 [23] combine contrastive learning and graph convolutional networks for clustering, achieving 0.83 NMI on MNIST. Their robust feature extraction is notable, but the 10-hour training time and 150 ms inference time make the method impractical for clinical dental settings. The general dataset focus further reduces its relevance to our study.
The study by Gupta and Bhadauria, 2022 [19] combines superpixel processing with KMeans for lung disease segmentation using ILD datasets. Its multi-level approach enhances segmentation accuracy, but the lung-specific focus and 200 ms processing time limit its relevance to dental radiographs. Its evaluation insights, however, inform our study’s metric considerations.
Chen and Zhao, 2024 [20] introduce a nonparametric KMeans variant for unsupervised color image segmentation, achieving a 0.80 mIoU on BSDS. Their method dynamically determines the number of clusters by estimating local density and color distributions, eliminating the need for a predefined k, and employs a heuristic initialization to enhance convergence stability, reducing misclassification by 8 percent compared to standard KMeans. The heuristic initialization and added density estimation, however, limit the method’s direct applicability to our unsupervised dental segmentation framework, which prioritizes heuristic-free clustering for efficiency and scalability.
The study by Wen, 2020 [33] introduces neutrosophic fuzzy clustering for handling uncertainty in image segmentation, achieving 0.80 mIoU on BSDS. Its innovative approach to ambiguity is promising, but the 250 ms processing time and 2D image focus limit its utility for 3D dental imaging, such as CBCT scans.
2.3. Deep Learning and Specialized Methods for Dental Segmentation
Chung et al., 2021 [4] employ a deep learning model for tooth segmentation in panoramic X-rays, achieving a 0.85 Dice score on 500 images. Its high accuracy and clinical relevance are strengths, directly applicable to dental diagnostics. However, its reliance on supervised training with extensive annotations and 150 ms inference time limit its use in unsupervised settings, contrasting with our study’s annotation-free approach.
The study by Hatvani et al., 2020 [5] develops a deep learning framework for CBCT tooth segmentation, achieving 0.87 Dice on 100 volumes. Its robustness to 3D imaging challenges is notable, but the supervised training requirement and 200 ms inference time hinder its scalability for unsupervised dental applications. The CBCT focus aligns with our dataset but highlights the annotation gap our study addresses.
The research by Lian et al., 2020 [10] proposes a deep learning model for CBCT segmentation, achieving 0.89 Dice on 150 volumes. Its high accuracy is a strength, but the supervised training requirement and 200 ms inference time restrict its use in unsupervised, real-time dental applications. The CBCT focus aligns with our dataset but underscores the need for unsupervised methods.
The studies by Chung et al., 2021 [4], and Budagam et al., 2024 [11], propose deep learning models for panoramic X-ray segmentation, achieving Dice and mIoU scores of 0.85 on 500 images by integrating tooth identification and instance segmentation. Chung’s supervised U-Net and Budagam’s multi-task learning approach leverage annotated X-rays to capture tooth boundaries, improving segmentation accuracy by 12 percent over baseline methods. Their focus on panoramic X-rays and high clinical relevance are strengths, aligning with our study’s dataset. However, their dependence on supervised training and extensive annotations, coupled with 150 ms inference times (Chung) and preprint status (Budagam), limits their applicability to unsupervised dental segmentation. The reliance on internal metrics like Dice and mIoU, without external metrics like the Fowlkes-Mallows Index, further reduces relevance to our study. Nevertheless, their use of overlap metrics informs our adoption of external metrics like the Jaccard Index, underscoring the need for standardized evaluation. This contrast with our annotation-free, classic clustering approach emphasizes the scalability of our methodology for dental diagnostics.
The study by Zhang et al., 2021 [34], proposes TSGCNet for 3D dental model segmentation, achieving 0.89 mIoU on 150 meshes with discriminative feature learning, improving mIoU by 10 percent. Its supervised training limits applicability to our unsupervised framework. Its use of mIoU informs our adoption of external metrics, contrasting with our methodology.
2.4. Evaluation Metrics and Benchmarking in Segmentation
The studies by Kim et al., 2020 [21], and Saraswat et al., 2013 [35], propose unsupervised clustering methods with a focus on evaluation, achieving 0.75–0.80 mIoU/accuracy on BSDS and tissue images through differentiable clustering (Kim) and differential evolution (Saraswat). Kim’s approach optimizes clustering via gradient-based methods, while Saraswat’s uses heuristic optimization for leukocyte segmentation, both improving segmentation quality by 10 percent over baseline clustering. Their unsupervised frameworks are relevant to our study’s annotation-free goals, but Kim’s 150 ms inference time and weak boundary performance (0.60 mIoU on COCO), alongside Saraswat’s reliance on internal metrics and tissue-specific focus, limit their applicability to dental radiographs. The absence of external metrics like the Rand Index in both studies highlights a gap our study addresses. Nonetheless, their use of mIoU and accuracy metrics informs our adoption of external metrics like the Fowlkes-Mallows Index, emphasizing the need for clinically reliable evaluation. This contrast with our classic, heuristic-free clustering approach highlights the simplicity and reproducibility of our methodology for dental diagnostics.
Gupta and Bhadauria, 2022 [19] evaluate KMeans with superpixel processing for lung disease segmentation, achieving 0.83 mIoU using internal metrics like the silhouette score. Its multi-level evaluation approach is insightful, but the reliance on internal metrics and lung-specific focus limit its applicability to dental radiographs. Its KMeans evaluation informs our study’s metric considerations.
As summarised in Table 1, many previous works focus on generic or single-domain datasets, limiting their applicability to dental imaging, where anatomical complexity requires specialized evaluation [4,13,14,19,22]. They often rely on internal metrics (e.g., mIoU, NMI) or qualitative discussion, neglecting the external validation metrics critical for clinical reliability, and many employ optimization heuristics (e.g., PSO, Firefly), introducing complexity and tuning burdens [22,23]. Moreover, dental-specific studies have mostly depended on annotated data, which restricts scalability [4,5]. Our study evaluates five classic clustering algorithms (KMeans, FCM, GMM, DBSCAN, Agglomerative) on paediatric and adult dental radiographs, using the Kaggle dataset [8] with expert-annotated ground truths. By employing six external validation metrics (Rand Index, F1 Score, Precision, Recall, Fowlkes-Mallows Index, Jaccard Index) and avoiding heuristics, our unsupervised, scalable approach addresses annotation scarcity and ensures reproducible, clinically relevant segmentation. We report these metrics on real-world X-ray images, in contrast to prior works’ annotation-dependent, computationally intensive methods.
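For completeness, all six external validation metrics can be computed with scikit-learn once a predicted segmentation has been mapped to the binary convention of the ground-truth mask; the sketch below is a minimal illustration under that assumption rather than our full evaluation pipeline.

```python
import numpy as np
from sklearn.metrics import (rand_score, f1_score, precision_score,
                             recall_score, fowlkes_mallows_score, jaccard_score)

def external_validation(gt_mask, pred_mask):
    """Six external validation metrics on flattened binary masks
    (tooth = 1, background = 0); rand_score requires scikit-learn >= 0.24."""
    y_true = np.asarray(gt_mask).ravel().astype(int)
    y_pred = np.asarray(pred_mask).ravel().astype(int)
    return {
        "Rand Index": rand_score(y_true, y_pred),
        "F1 Score": f1_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "Fowlkes-Mallows Index": fowlkes_mallows_score(y_true, y_pred),
        "Jaccard Index": jaccard_score(y_true, y_pred),
    }
```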