Article

A Multifocal RSSeg Approach for Skeletal Age Estimation in an Indian Medicolegal Perspective

by Priyanka Manchegowda 1, Manohar Nageshmurthy 1,*, Suresha Raju 1 and Dayananda Rudrappa 2
1 Department of Computer Science, School of Computing, Mysuru Campus, Amrita Vishwa Vidyapeetham, Mysuru 570026, India
2 Department of Forensic Medicine, Mysore Medical College and Research Institute, Mysuru 570001, India
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(12), 765; https://doi.org/10.3390/a18120765
Submission received: 24 October 2025 / Revised: 22 November 2025 / Accepted: 28 November 2025 / Published: 4 December 2025
(This article belongs to the Special Issue Machine Learning in Medical Signal and Image Processing (4th Edition))

Abstract

Estimating bone age is essential for accurate diagnoses, appropriate care based on biological age, and fairness in legal matters. In the Indian medicolegal context, determining age through a clinical approach involves analyzing multiple joints; however, the traditional method can be tedious and subjective, relying heavily on human expertise, which may lead to biased decisions in age-related legal disputes. Moreover, commonly used radiographs often exhibit pixel-level variations due to heterogeneous contrast, which complicate segmentation tasks and lead to inconsistencies and reduced model performance. This study presents a multifocal region-based symbolic segmentation technique that automatically retains the soft-tissue region harboring the growth pattern of an ossification center. Experimental results demonstrate an 84.5% Jaccard similarity, an 81.4% Dice coefficient, an 88.3% precision, a 90.0% recall, and a 91.5% pixel accuracy on a novel multifocal dataset of the Indian population. The proposed segmentation technique outperforms U-Net, Attention U-Net, TransU-Net, DeepLabV3+, Adaptive Otsu, and Watershed segmentation in terms of accuracy, indicating strong generalizability across joints and improved reliability. Integrating the proposed segmentation with VGG16 classification increases the overall accuracy from 86.4% without segmentation to 93.8%, demonstrating that target-focused-region processing reduces unnecessary computations and improves feature discrimination without sacrificing accuracy.

1. Introduction

Skeletal maturity assessment is essential for supporting medicolegal investigations, assessing growth patterns, and diagnosing and treating endocrinological issues. From infancy to adulthood, a person’s bones undergo continuous changes in ossification centers, epiphyseal growth, and fusion, and tracking these morphological patterns provides a reliable biological indicator of age. In the Indian medicolegal context, age in the range of 1–21 years is typically inferred by examining the appearance and fusion patterns of various ossification centers in multiple joints—particularly the wrist, elbow, shoulder, and pelvis [1,2]. These four joints collectively offer substantial diagnostic value, since they cover a wide range of maturation: early ossification changes are prominent in the wrist, transitional stages are noticeable in the elbow and shoulder, and the pelvis fuses last. Although significant, existing automated systems predominantly focus on wrist radiographs, limiting their generalizability and applicability in forensic scenarios involving the Indian population.
Despite the rise of deep learning and vision-based techniques for skeletal age determination, their performance remains limited by two main factors: poor representation of the relevant anatomical region of interest (ROI) and noisy, heterogeneous data that hinder feature extraction. Precise segmentation is accordingly essential, as it isolates the ossification center and discards overwhelming background layouts, encouraging the model to learn discriminative skeletal features more effectively. Without segmentation, automated age estimation systems tend to underperform, especially when input radiographs present challenges such as non-standard posture; plaster casts and implants used to treat fractured regions; varying contrast and illumination; radiological noise and artifacts; and intra- and inter-class variations in background structures.
The literature reveals that only a few publicly available datasets exist, including those from the Radiological Society of North America (RSNA), the Digital Hand Atlas (DHA), and MURA, all originating from Western populations and predominantly containing wrist images. However, skeletal development factors, such as race, diet, sex [3], geographical location, nutritional status, physical activity [4,5], hormone imbalances, and metabolic abnormalities, significantly influence bone growth; investigators may therefore detect differences in the appearance of ossification centers and the fusion process [1]. Studies show that populations of Asian and African descent generally exhibit more extended skeletal development than European groups, resulting in prominent ethnic variation in growth rates. Genetic and environmental factors influence young children’s bone health, producing racial differences in skeletal maturation [3,4,5,6,7,8]. Hence, Western reference data and models cannot be directly applied to Indian medicolegal evaluations, creating a critical gap and a pressing need to curate multi-joint datasets representing the Indian population, especially for forensic age estimation, where legal consequences demand high reliability.
For sophisticated medical procedures and computer-aided diagnosis, segmenting medical images is a crucial preprocessing step, enabling efficient feature extraction by locating key structures that improve diagnostic effectiveness. Extracting anatomical cues from particular ossification centers rather than whole X-ray images greatly increases interpretability and accuracy for age estimation. Nevertheless, as Figure 1 illustrates, medical segmentation in this field remains challenging. These factors highlight the need for a robust segmentation pipeline that can differentiate between soft- and hard-tissue boundaries with anatomical consistency, enabling the model to learn age-dependent skeletal features more effectively. To address these gaps, the proposed work introduces a comprehensive, medicolegally relevant solution for automated age estimation from an Indian perspective. The significant contributions are as follows:
  • A novel Indian dataset of wrist, elbow, shoulder, and pelvis X-rays highlights the population’s specific skeletal features, curated in accordance with ethical guidelines;
  • A region-based symbolic segmentation (RSSeg) hierarchical patch-processing segmentation preserves soft-tissue cues to accurately retain ossification centers across multiple joints at an early stage;
  • A classification model focuses on extracting anatomical cues from the segmented region, enhancing robustness, reliability, and generalizability for suitable medicolegal age estimation.
Figure 1. Image samples of challenging conditions encountered during data acquisition: (a) non-standard pose; (b) plaster of Paris on the ROI; (c) fracture on the ossification joint; (d) implants on the ROI; (e) heterogeneous contrast; and (f) radiological artifacts.
The paper’s organization begins with a brief introduction and a thorough review of relevant research on skeletal age determination in Section 2. Section 3 presents the proposed multifocal RSSeg model architecture and its detailed description. Further, the experiment and results are presented in Section 4, followed by a discussion. Finally, Section 5 concludes by summarizing the paper and offering closing remarks.

2. Related Work

The literature focuses on developing a generalized segmentation model, which is essential for precise bone age estimation, ROI capture, and the generation of critical clinical information for assessing skeletal maturity. Differences between chronological and bone ages indicate that bone growth is not progressing properly, which can help identify endocrine disorders and other metabolic or genetic problems [2,8]. Clinicians determine bone age by examining the ossification centers of the carpal bones and evaluating the emergence and fusion of epiphyses in both short and long tubular bones [8]. In recent years, many datasets have been made available for bone-based age estimation; however, they often contain few samples and can be difficult for researchers to access when building a robust system.
Studies employ imaging techniques across retrospective datasets: wrist X-rays from RSNA [8], DHA [9], and PACS [10]; teeth [11]; elbow radiographs [12]; shoulder X-rays (MURA) [13,14]; shoulder 3.0 T MRI [15]; pelvis radiographs [16,17]; and pelvis CT [18]. These datasets present several challenges, including uneven illumination, non-standard poses, blurriness, varying noise, and heterogeneous contrast, all of which complicate the normalization process [8,9,10,18,19]. To overcome these challenges, this study highlights several preprocessing techniques that enhance image quality, including histogram processing [16], gamma transformation [19], Contrast-Limited Adaptive Histogram Equalization (CLAHE) [8], and generative adversarial networks [20]. Multiple 2D transformation techniques—including angle rotation, zooming, and shearing—restructure diverse hand poses. To balance training data in specific learning models, up-sampling techniques such as cropping, translation, flipping, and rotation address class imbalance [21]. Bilinear interpolation and the Gaussian pyramid are used to scale images, thereby avoiding information loss [22]. Grey-scale and min-max normalization regularize pixel values in X-rays [23]. The median filter, anisotropic diffusion filter, and wavelet packet decomposition effectively suppress background noise [10].
The machine learning-based segmentation approaches, including threshold, region, edge, and cluster-based methods, focus on extracting the hand region. The watershed segmentation extracts the hand mask by first applying a Gaussian pyramid, a Sobel filter, and a median filter, then selecting the largest connected object [22]. Improved Adaptive Otsu thresholding segmentation, introduced by an optimization algorithm used on RSNA and manual datasets, yields better performance [24]. Texture-based Adaptive Crossed-reconstruction k-means clustering yields better homogeneity by varying k-values in the k-means cluster [25]. Template-matching-based Particle Swarm Optimization automates bone image segmentation by extracting edges [26]. Constrained Local Models integrate local models with global constraints to segment the hand region by examining intensity patterns at several landmark locations, defining the hand’s boundary, and predicting similar intensities using template matching [27]. Active shape model joint segmentation automatically crops the hand contour by grouping similar pixels using k-means clustering [28]. EMROI and CROI segmentation suppress the radiological marks, and hand rotation improves the segmentation performance [29].
Deep learning-based segmentation automatically identifies hand regions, achieving remarkable performance in separating the background from the foreground. An optimized U-Net deals with multi-objective segmentation using the Whale-based Class Topper Optimization activation function [30]. Another approach replaces the U-Net’s encoder with a VGG16 encoder pre-trained on ImageNet and then intentionally weakens the entire network to optimize mask generation [20]. Mask R-CNN includes a parallel branch to predict hand masks, which dramatically improves accuracy by utilizing a RoIAlign layer to address misalignment [31]. Modified k-means clustering segments the epiphysis bones by removing background noise via histogram equalization. The Hotelling T2 model and ADF suppress background noise. WPD extracts edges from the auto-cropped EROI and CROI regions of the H-image, obtained via texture and cluster analysis, performing better than Iterative Thresholding and Adaptive K-Means Clustering [32]. To accomplish hand-semantic segmentation, Fully Connected DenseNet employs three transition-up and three transition-down blocks [33]. A Faster R-CNN identifies bone locations, and the Global-Local Fusion Segmentation Network integrates global and local context for overlapping bones and extracts the ROI in a dataset of elbow radiographs [12]. Attention U-Net, Swin U-Net, and U-Net underwent five-fold cross-validation training using histogram-equalized pelvis X-rays; among them, Swin U-Net achieved a specificity of 98.50%, indicating exceptional performance [18].
The comprehensive review emphasizes the lack of studies on vision-based automation in the Indian medicolegal context, owing to the limited availability of datasets for key growth regions, including the wrist, elbow, shoulder, and pelvis. Identifying ossification centers in these joints remains a complex process due to intra- and inter-pixel variations that obscure their appearance and to fusion conditions that overlap with the background, affecting the accuracy of age estimation. Therefore, developing a generalized segmentation model is crucial for identifying specific growth features while preserving the relevant foreground data across multiple joints for the medicolegal framework. In addition, this review not only underscores the importance of multi-joint datasets but also highlights the need for robust, reliable, and generalizable segmentation in skeletal age estimation solutions tailored to diverse anatomical structures and population-specific variations.

3. Proposed Methodology

This section details the development of a multifocal RSSeg approach for skeletal age estimation based on region-specific pixel variations in ossified joints and soft tissues, thereby capturing growth patterns and automating the process through enhanced techniques. Figure 2 shows the proposed multifocal RSSeg approach for skeletal age estimation.

3.1. Preprocessing

The collected radiological image dataset faces several challenges, including noise and illumination. The methodology uses Gaussian filters to mitigate noise-induced image distortion [10], as shown in Equation (1):
$$G(x, y) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}$$
where $G(x, y)$ denotes the Gaussian function value at the point $(x, y)$, $x$ and $y$ represent the coordinates of the filter, and $\sigma$ is the standard deviation of the Gaussian distribution.
The CLAHE method enhances image contrast [8] by locally adjusting pixel intensities within small regions, called tiles. It clips the histogram to prevent noise amplification, redistributes overused pixels evenly and performs histogram equalization to provide clearer visibility of bone structures. Subsequently, bilinear interpolation is applied to combine the processed tiles, resulting in enhanced contrast of bone features without introducing artifacts, which significantly improved the model’s performance.
The preprocessing technique involves comparing the original sample image with the enhanced image, as shown in Figure 3. Additionally, Equations (2) and (3) represent the mathematical formula for CLAHE.
$$g(x, y) = \frac{CDF\big(f(x, y)\big) - CDF_{min}}{(T_x \times T_y) - CDF_{min}} \times (L - 1)$$
$$H'(i) = \min\big(H(i), C\big) + \frac{E}{L}$$
where the Cumulative Distribution Function $CDF$ is computed from the clipped histogram $H'(i)$, $H(i)$ is the histogram, $C$ is the clip limit, $i$ represents the pixel intensity, the excess pixels $E$ are redistributed evenly across all bins, $L$ is the total number of intensity levels, $T_x \times T_y$ is the local tile size, and $CDF_{min}$ is the minimum $CDF$ value in the tile.
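As a concrete illustration, the following minimal Python sketch applies the two preprocessing steps with OpenCV, using the parameter values reported later in Section 4.3 (sigma = 2.5, clip limit = 2, 25 × 25-pixel tiles on a 250 × 250 image, i.e., a 10 × 10 tile grid); the file name and the tile-grid mapping are illustrative assumptions rather than the exact implementation.

```python
import cv2

def preprocess_radiograph(img_gray):
    """Gaussian denoising (Equation (1)) followed by CLAHE (Equations (2) and (3))."""
    # ksize=(0, 0) lets OpenCV derive the kernel size from sigma (2.5, per Section 4.3)
    smoothed = cv2.GaussianBlur(img_gray, ksize=(0, 0), sigmaX=2.5)
    # Clip limit 2; 25 x 25-pixel tiles on a 250 x 250 image imply a 10 x 10 tile grid
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(10, 10))
    return clahe.apply(smoothed)

# Hypothetical usage on a grayscale wrist radiograph
enhanced = preprocess_radiograph(cv2.imread("wrist_xray.png", cv2.IMREAD_GRAYSCALE))
```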

3.2. Region-Based Symbolic Segmentation

Segmentation plays a vital role in identifying the specific age group for the Indian medicolegal process. The region-based symbolic segmentation process involves separating an image into two distinct regions: the foreground (the bone region) and the background [9,34]. The challenges, such as illumination and homogeneous contrast among bone, tissues, and artifacts, lead to intra- and inter-class similarity between bone and other background information, making the segmentation task difficult [8]. In medicolegal cases, the appearance and fusion of ossification centers are crucial in determining approximate age groups. The conventional segmentation approach overlooks vital details of various joint ossification centers due to uncertainty and the soft boundary between bone and background, resulting in the loss of the essential bone pattern.
To preserve the appearance and fusion of ossification centers, the RSSeg technique efficiently handles ambiguous boundaries by capturing smooth variations within and between pixels. It uses the mean and standard deviation of local patches to represent uncertainty as interval data while maintaining ossification-center patterns, thereby improving segmentation efficiency.
The RSSeg model efficiently retains the soft-tissue region, which exhibits the growth pattern of the ossification center, aiding age estimation. Initially, the enhanced image is divided into a 10 × 10 grid, yielding 100 patches (numbered 1 to 100), and each patch is further divided into 5 × 5 sub-patches, yielding 25 sub-patches (numbered 1 to 25), to capture local interval variations within the regions, as illustrated in Figure 4. To measure the impurity of each patch ($P_c$) efficiently, the entropy $E$ of each sub-patch ($S_p$) is computed from the probability of each of its 26 pixel bins, using Equation (4). A sub-patch is rejected if its total entropy is less than a joint-specific threshold $E$; if its mean and standard deviation fall within the intensity thresholds $T_1$ and $T_2$, the algorithm retains the region as foreground ($F_g$); otherwise, it discards the region as background ($B_g$), with Equation (5) providing the mathematical representation.
$$E(P_c, S_p) = -\sum_{P_c=1}^{100} \sum_{S_p=1}^{25} \sum_{x=1}^{26} P(x) \log_2 P(x)$$
where $P(x)$ is the probability of the pixel bin $x$, varying between 1 and 26.
$$PatchSeg = \begin{cases} F_g, & \text{if } E(P_c, S_p) \geq E \text{ and } \mu(S_p) \in T_1 \text{ and } \sigma(S_p) \in T_2 \\ B_g, & \text{otherwise} \end{cases}$$
The distribution and density of local muscles and other soft tissues vary in pixel intensity, which plays a crucial role in shaping and maintaining the specific bone viscosity observed at the wrist, elbow, and shoulder (mainly involved in movement and fine motor control) and the pelvis (engaged in stable weight-bearing). Optimal empirical thresholds are analyzed individually to account for each joint’s anatomical variation. The entropy threshold $E$ filters out low-texture patches, while the mean-intensity threshold $T_1$ and standard deviation threshold $T_2$ remove background. For the wrist, Shannon entropy $E_w$ is calculated based on empirical observations; $E_w$ values less than 0.1 are considered background information and discarded to avoid additional computational burden. To retain the growth pattern of the wrist, an empirical analysis of the mean $T_{w1}$ and standard deviation $T_{w2}$ intensities of sub-patches is performed. Because pixel values vary across image sub-patches, the model dynamically estimates the local mean and standard deviation; $T_{w1}$ varies from 55 to 150 and $T_{w2}$ from 30 to 70, based on the sub-patch foreground pixels. Similarly, the elbow entropy threshold $E_e$ is 0.13, with $T_{e1}$ (elbow mean) varying from 60 to 153 and $T_{e2}$ (elbow standard deviation) from 25 to 60; these slightly higher thresholds avoid soft-tissue blurring in mid-texture regions. The shoulder entropy threshold $E_s$ is 0.125, with $T_{s1}$ (shoulder mean) varying from 62 to 153 and $T_{s2}$ (shoulder standard deviation) from 28 to 64; these moderate thresholds retain edges while filtering other spaces. For the pelvis, the entropy threshold $E_p$ is higher at 0.14, with $T_{p1}$ (pelvis mean) varying from 52 to 160 and $T_{p2}$ (pelvis standard deviation) from 26 to 170; the higher entropy threshold reduces noise in pelvis regions with high soft-tissue overlap. These thresholds ensure that only patches containing meaningful bone and soft-tissue structures, which retain the ossification center, are preserved for symbolic segmentation, improving both efficiency and accuracy. A minimal sketch of this hierarchical filtering is shown below.
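The following Python sketch illustrates the hierarchical patch/sub-patch entropy filtering under stated assumptions: 8-bit intensities, a 250 × 250 input, and the wrist thresholds quoted above. The exact retention rule is reconstructed from Equations (4) and (5) and the surrounding text, so treat it as illustrative rather than the authors’ implementation.

```python
import numpy as np

def subpatch_entropy(sub, bins=26):
    """Shannon entropy of a sub-patch over `bins` intensity bins (Equation (4))."""
    hist, _ = np.histogram(sub, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins so log2 is defined
    return float(-np.sum(p * np.log2(p)))

def keep_subpatch(sub, e_min, t1, t2):
    """Retain a sub-patch as foreground when its entropy exceeds the joint-specific
    threshold and its mean/std fall inside the empirical ranges (Equation (5))."""
    return (subpatch_entropy(sub) >= e_min
            and t1[0] <= sub.mean() <= t1[1]
            and t2[0] <= sub.std() <= t2[1])

def rsseg_mask(img, e_min=0.1, t1=(55, 150), t2=(30, 70)):
    """Hierarchical 10x10 patch / 5x5 sub-patch filtering; wrist thresholds shown."""
    h, w = img.shape
    mask = np.zeros_like(img, dtype=bool)
    ph, pw = h // 10, w // 10          # patch size (25 x 25 for a 250 x 250 image)
    sh, sw = ph // 5, pw // 5          # sub-patch size (5 x 5)
    for r in range(0, 10 * ph, ph):
        for c in range(0, 10 * pw, pw):
            for sr in range(r, r + 5 * sh, sh):
                for sc in range(c, c + 5 * sw, sw):
                    sub = img[sr:sr + sh, sc:sc + sw]
                    if keep_subpatch(sub, e_min, t1, t2):
                        mask[sr:sr + sh, sc:sc + sw] = True
    return mask
```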
The symbolic approach uses the mean-standard deviation interval representation to retain the intra-class pixel variations observed in X-ray samples [35]. This approach helps maintain the internal heterogeneity within samples from each region. Equations (6) and (7) show the mean and standard deviation for the sample region, respectively:
$$\mu = \frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} I(i, j)$$
$$\sigma = \sqrt{\frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n} \big(I(i, j) - \mu\big)^{2}}$$
where $i$ and $j$ index the rows and columns, respectively, and $\mu$ and $\sigma$ denote the mean and standard deviation computed over the foreground and background regions of each X-ray image, respectively.
To create an interval data representation for each sub-patch, $\mu$ and $\sigma$ are calculated for the relevant areas. An interval’s lower bound is the difference $\mu - \sigma$, while its upper bound is the sum $\mu + \sigma$; one such interval is found for each class. Equations (8) and (9) formulate the class representative $CR$:
$$CR_i = \Big( \big[\mu_i^f - \sigma_i^f,\ \mu_i^f + \sigma_i^f\big],\ \big[\mu_i^b - \sigma_i^b,\ \mu_i^b + \sigma_i^b\big] \Big)$$
$$CR_i = \Big( \big[S_f^-, S_f^+\big],\ \big[S_b^-, S_b^+\big] \Big)$$
where
$$S_f^- = \mu_i^f - \sigma_i^f, \quad S_f^+ = \mu_i^f + \sigma_i^f, \quad S_b^- = \mu_i^b - \sigma_i^b, \quad \text{and} \quad S_b^+ = \mu_i^b + \sigma_i^b$$
The method computes the similarity between a pixel in a specific region and $[S_f^-, S_f^+]$ to assess its foreground belongingness, and similarly calculates the background similarity using $[S_b^-, S_b^+]$. The region is classified according to the highest similarity between the foreground and background $CR$s.
If the crisp pixel value falls within the lower ($S_f^-$) and upper ($S_f^+$) bounds of the interval, the similarity value is one; otherwise, it is zero, and this value is used to compute the reference interval region. The process repeats for the remaining pixels.
For a test sample ( s q ) corresponding to two classes, the acceptance count A C is evaluated based on the summation of similarity between the test sample and class representatives given in Equation (10):
$$AC = \sum_{i=1}^{2} Sim\big(s_q, CR_i\big)$$
where
$$Sim\big(s_q, CR_i\big) = \begin{cases} 1, & \text{if } s_q \geq S_i^- \text{ and } s_q \leq S_i^+ \\ 0, & \text{otherwise} \end{cases}$$
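To make the symbolic classification step concrete, here is a brief NumPy sketch of interval class representatives and the acceptance count, assuming per-pixel similarity is summed over a region and the class with the higher count wins; the function names and the aggregation over pixels are assumptions made for illustration.

```python
import numpy as np

def class_representative(region):
    """Interval [mu - sigma, mu + sigma] for a sample region (Equations (6)-(9))."""
    mu, sigma = float(region.mean()), float(region.std())
    return (mu - sigma, mu + sigma)

def acceptance_count(pixels, interval):
    """Sum of per-pixel similarities against one class interval (Equation (10)):
    similarity is 1 when a crisp pixel value lies inside [S-, S+], else 0."""
    lo, hi = interval
    return int(np.sum((pixels >= lo) & (pixels <= hi)))

def classify_region(pixels, fg_interval, bg_interval):
    """Assign the region to whichever class representative accepts more pixels."""
    fg = acceptance_count(pixels, fg_interval)
    bg = acceptance_count(pixels, bg_interval)
    return "foreground" if fg >= bg else "background"

# Hypothetical usage with toy foreground/background reference intensities
rng = np.random.default_rng(0)
fg_ref, bg_ref = rng.normal(120, 20, 400), rng.normal(40, 10, 400)
test_region = rng.normal(115, 18, 25)
print(classify_region(test_region,
                      class_representative(fg_ref),
                      class_representative(bg_ref)))
```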
Figure 5 presents the segmentation results for four ossification bones, comparing the proposed RSSeg model with popular segmentation models, including U-Net, Attention U-Net, TransU-Net, DeepLabV3+, Adaptive Otsu, and Watershed, and testing against the ground truth.

4. Experimentation and Result

This section provides a detailed description of the dataset, evaluation metrics, experimental setup, and results of the RSSeg model. The proposed model was compared with state-of-the-art models on both individual and multiple joints to ensure the effectiveness of the study. Additionally, the evaluation compares the model against existing datasets to ensure consistent performance and efficiency. Finally, visual analyses effectively illustrate the results.

4.1. Dataset

The study focuses on collecting X-ray data of two categories: retrospective and prospective data on various ossifications of bones, including the wrist, elbow, shoulder, and pelvis, from a diverse range of participants aged 0–21 years and above. The retrospective data were obtained from esteemed healthcare organizations with ethical approval from Mysore Medical College and Research Institute, Mysore, including Krishna Rajendra and Apollo BGS Hospitals, for training the model. Additionally, prospective data were collected from volunteers under 18 years old, with their parents’ or guardians’ approval, in radiological laboratories to assess the model’s effectiveness. In the Indian medicolegal context, age is categorized into seven distinct groups [1,2], as outlined in Table 1, which presents the class intervals in years, and Figure 6 illustrates samples of multiple joints corresponding to these class intervals.
Figure 7 clarifies the anatomy of the multiple joints, as validated by the forensic expert to ensure forensic accuracy, especially regarding ossification centers and joint margins relevant to skeletal age estimation. Specifically, class 1 represents individuals from infancy to toddlerhood (ages 0.1 to 4 years), referred to as Indian Medicolegal Infant-Toddler (IMIT). Class 2 represents individuals from 5 to 7 years, labeled Indian Medicolegal Child (IMC); class 3 represents individuals from 8 to 12 years, labeled Indian Medicolegal Pre-Adolescent (IMPA); class 4 represents individuals from 13 to 14 years, labeled Indian Medicolegal Pre-Teen (IMPT); class 5 represents individuals from 15 to 18 years, labeled Indian Medicolegal Teen (IMT); class 6 represents individuals from 19 to 21 years, labeled Indian Medicolegal Young Adult (IMYA); and class 7 encompasses individuals aged 21 years and older, labeled Indian Medicolegal Adult (IMA) [1].
Initially, the Institutional Ethical Committee of Mysore Medical College and Research Institute, along with the affiliated hospital, granted ethical approval for both prospective and retrospective data collection. A total of 5107 X-rays of various joints were collected, comprising 4959 samples from retrospective data and 148 from prospective data. Some challenges observed in the dataset include class imbalance, heterogeneous contrast, Gaussian noise, non-uniform sizes and resolutions, and different image formats, such as DICOM, JPEG, and PNG. The dataset, comprising 1356 wrist, 1228 elbow, 1506 shoulder, and 980 pelvis X-ray samples, is tabulated in Table 2.
Following the expert’s guidance, all annotations and ground truth were created and subsequently verified by the forensic expert to ensure anatomical and forensic accuracy, especially regarding ossification centers and joint margins relevant to skeletal age estimation. The ground truth was manually created using Photoshop and validated by the expert, serving as the baseline for comparing the proposed segmentation method. During the experiment, the performance of the proposed model was evaluated by comparing its outcomes with the ground truth and computing the percentage of matching area. The results were also cross-verified under the supervision of a domain expert, and Figure 5 displays samples of the ground-truth images.

4.2. Evaluation Metric

Evaluation metrics quantify the performance of the proposed study by examining the RSSeg model using well-known segmentation metrics and by evaluating various classification model behaviors with and without segmented data using classification metrics. Segmentation accuracy was measured using metrics such as Jaccard similarity, Dice coefficient, precision, recall, and pixel accuracy [11]. Equations (11)–(15) represent the segmentation evaluation metrics.
Jaccard similarity J S i m determines the similarity between the intersected regions’ intensity of the ground truth G T and the predicted region P :
$$JSim(S, T) = \frac{|S \cap T|}{|S| + |T| - |S \cap T|}$$
Dice coefficient D i c e determines the degree of relationship between the ground truth and the predicted region:
$$Dice = \frac{2 \times |S \cap T|}{|S| + |T|}$$
Precision P r e c is the ratio of the intersection to the entire number of predicted pixels:
$$Prec = \frac{\sum_{ij} T_{ij} \times S_{ij}}{\sum_{ij} T_{ij}}$$
Recall R e c is the ratio of the intersection to the entire number of ground truth pixels:
$$Rec = \frac{\sum_{ij} T_{ij} \times S_{ij}}{\sum_{ij} S_{ij}}$$
where $S$ denotes the ground-truth ($GT$) area pixels and $T$ denotes the predicted ($P$) area pixels.
Pixel accuracy P A c c is the proportion of intensities that are accurately classified in the image. P A c c is an intuitive metric:
$$PAcc = \frac{\sum_{i=1}^{k} n_{ii}}{\sum_{i=1}^{k} t_i}$$
where n i i is the total number of intersected intensities of both P and G T of the i t h class and t i is the total number of intensities in the i t h class of G T .
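For reference, a small NumPy sketch computing these metrics from boolean masks might look as follows; reducing pixel accuracy to overall pixel agreement for a single foreground class is an assumption.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Metrics from Equations (11)-(15) for boolean prediction/ground-truth masks."""
    inter = np.logical_and(pred, gt).sum()        # |S intersect T|
    s, t = gt.sum(), pred.sum()                   # |S| (ground truth), |T| (prediction)
    return {
        "jaccard":   inter / (s + t - inter),
        "dice":      2 * inter / (s + t),
        "precision": inter / t,
        "recall":    inter / s,
        # With one foreground class, pixel accuracy is the fraction of agreeing pixels
        "pixel_acc": float((pred == gt).mean()),
    }

# Hypothetical usage with two toy 4 x 4 masks
pred = np.array([[1, 1, 0, 0]] * 4, dtype=bool)
gt   = np.array([[1, 0, 0, 0]] * 4, dtype=bool)
print(segmentation_metrics(pred, gt))
```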

4.3. Experimental Setup

Python 3.12.4, part of the Anaconda distribution, was used for the experiment conducted in Spyder IDE. The model was constructed and tested by using TensorFlow version 2.17.0 with integrated Keras. For GPU acceleration, the setup used CUDA and cuDNN on a machine equipped with an Intel(R) Core(TM) i7-10750H CPU, 16 GB of RAM, and a 6 GB NVIDIA GeForce RTX 3060 GPU. To maintain consistency across the experiments, all samples were resized to 250 × 250 × 3 pixels. A Gaussian filter was applied empirically with σ values of 1, 1.5, 2, 2.5, 3, and 3.5, among others. Notably, a σ value of 2.5 resulted in the best enhancement, particularly due to the presence of Gaussian noise. CLAHE was used to enhance contrast by setting the clip factor to 2 and using standard tile sizes of 25 × 25 pixels, which helped reduce local contrast artifacts. Bilinear interpolation was also frequently used to smooth transitions between tiles.
In the dataset, the wrist consists of 1356 images, the elbow consists of 1228 total samples, the shoulder consists of 1506 total samples, and the pelvis consists of 980 total samples. To analyze the experimental results with U-Net [8,19], Attention U-Net [18], TransU-Net [11], and DeepLabV3+ [21] models, a patient-level stratified five-fold cross-validation was applied to preserve seven age-group proportions and prevent subject overlap between folds. The images were resized to 256 × 256 pixels, normalized and augmented with rotations and flips.
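A patient-level stratified split of this kind can be reproduced with scikit-learn’s StratifiedGroupKFold, as in the sketch below; the array names and toy sizes are illustrative stand-ins for the actual dataset.

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

# Toy stand-ins: 100 radiographs, 7 age classes, 40 patients (illustrative only)
rng = np.random.default_rng(42)
images = np.zeros((100, 250, 250))           # placeholder image tensor
age_labels = rng.integers(0, 7, size=100)    # seven medicolegal age groups
patient_ids = rng.integers(0, 40, size=100)  # subject identifiers

# Stratify on age group, group on patient: preserves class proportions per fold
# and prevents the same subject from appearing in both train and validation.
cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (tr, va) in enumerate(cv.split(images, age_labels, groups=patient_ids)):
    print(f"fold {fold}: {len(tr)} train / {len(va)} validation")
```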
The RSSeg segmentation model divides the enhanced image into a 10 × 10 grid of patches, resulting in 100 patches, and further divides each patch into 5 × 5 sub-patches, yielding 25 sub-patches to capture local interval variations within regions. To efficiently measure the impurity of each patch, the entropy of each sub-patch is computed from the probability of each of its 26 pixel bins. A sub-patch is discarded if its total entropy is less than the joint-specific threshold. If the mean and standard deviation fall within the thresholds $T_1$ and $T_2$ for the sub-patch, the algorithm retains the region as foreground; otherwise, it discards it as background. These thresholds, $T_1$ and $T_2$, depend entirely on the local pattern variation within the sub-patch of a specific joint; such dynamic, region-specific thresholds are essential for preserving the symbolic representation of an ossification center. The proposed RSSeg approach retains these region-specific local patterns to improve segmentation efficiency. To ensure model fairness, experiments compared the predicted regions with ground-truth data, yielding accuracies of 92.1% for the wrist, 93.7% for the elbow, 89.9% for the shoulder, and 90.3% for the pelvis.
Similarly, Table 3 and Figure 8 present the evaluation results, comparing the proposed RSSeg model with U-Net [8,19], Attention U-Net [18], TransU-Net [11], DeepLabV3+ [21], Adaptive Otsu [24], and Watershed [22]. The following parameters were used to conduct the experiment and evaluate the model’s performance. The experiment used a dataset containing four joints and their corresponding ground-truth masks. Each model was evaluated on a single joint at a time, using stratified five-fold cross-validation. Each fold represents the distribution of the seven age categories more fairly and mitigates imbalance in deep segmentation models such as U-Net, Attention U-Net, TransU-Net, and DeepLabV3+.
The U-Net model with a ResNet34 encoder was trained on each joint separately using the Adam optimizer with a learning rate of 1 × 10−4 and the Dice loss function for 50 epochs. The experimental results show pixel accuracies of 88.5 ± 0.50% for the wrist, 85.5 ± 0.51% for the elbow, 88.5 ± 0.12% for the shoulder, and 80.3 ± 0.27% for the pelvis region, outperforming Adaptive Otsu and Watershed segmentation.
The Attention U-Net model was trained using the AdamW optimizer with a learning rate of 1 × 10−4, binary cross-entropy loss, polynomial learning rate decay over 50 epochs, a batch size of 16, a fixed seed value of 42 on the multi-focal dataset, and a dropout of 0.3 (decoder), with attention gates enhancing feature refinement in the skip connections. A cosine annealing scheduler was used for stable convergence. The model achieves segmentation performances of 89.4 ± 0.89% for the wrist, 88.5 ± 0.54% for the elbow, 89.9 ± 0.10% for the shoulder, and 84.9 ± 0.25% for the pelvis.
The TransU-Net model was trained using the AdamW optimizer with a learning rate of 1 × 10−4 (with a 5 × 10−5 warm-up), binary cross-entropy loss, polynomial learning rate decay over 50 epochs, a batch size of 16, a fixed seed value of 42 on the multi-focal dataset, a dropout of 0.1 (Transformer blocks), and transformer-enhanced encoding for long-range context modeling. The TransU-Net model demonstrates segmentation performances of 91.3 ± 0.90% for the wrist, 89.2 ± 0.51% for the elbow, 90.9 ± 0.11% for the shoulder, and 85.9 ± 0.56% for the pelvis. Compared with U-Net, the Attention U-Net, TransU-Net, and DeepLabV3+ models show a slight improvement in overall pixel accuracy and better performance on larger structures, but require more computational resources.
The fine-tuning of DeepLabV3+ with a ResNet101 backbone pretrained on ImageNet uses the AdamW optimizer with a learning rate of 1 × 10−4, applies cross-entropy loss and incorporates polynomial learning rate decay over 50 epochs, along with Atrous rates to capture multi-scale features. The model achieves a significant improvement in pixel accuracy, reaching segmentation performances of 91.9 ± 0.06% for the wrist, 89.6 ± 0.48% for the elbow, 91.3 ± 0.15% for the shoulder, and 86.9 ± 0.35% for the pelvis.
The Adaptive Otsu method divides each image into 10 × 10 non-overlapping patches and applies local Otsu thresholding within each block to account for regional intensity variations. Post-processing included morphological closing to smooth bone boundaries and remove minor artifacts. The model achieves an overall pixel accuracy of 83.3%, with specific accuracies of 86.5% for the wrist, 77.8% for the elbow, 91.6% for the shoulder, and 77.4% for the pelvis, which are lower than those of other models.
The watershed segmentation technique prevented over-segmentation in low-contrast areas by utilizing Otsu’s thresholding, distance transformations, and a marker-controlled watershed, with background markers generated from morphological erosion and foreground markers derived from regional maxima. The model achieves an overall pixel accuracy of 84.0%, with specific accuracies of 91.1% for the wrist, 76.5% for the elbow, 76.3% for the pelvis, and 92.1% for the shoulder, slightly outperforming the Adaptive Otsu method but trailing the deep learning models. Table 3 and Figure 8 present the performance of individual joints, measured in terms of pixel accuracy.
In comparison, various segmentation models were tested across multiple joints to measure the performance with the proposed RSSeg. The RSSeg shows better generalization, achieving an 84.5% Jaccard similarity, an 81.4% Dice coefficient, an 88.3% precision, a 90.0% recall, and a 91.5% pixel accuracy, making it suitable for real-time applications. A summary of the performance is provided in Table 4 and visualized in Figure 9.
A performance comparison of DeepLabV3+ and the proposed RSSeg model, using a paired statistical analysis across several segmentation metrics, is depicted in Table 5. With mean differences of 1.1% in Jaccard similarity, 0.5% in Dice coefficient, 1.3% in precision, 0.8% in recall, and 1.6% in pixel accuracy, the results show that RSSeg consistently outperforms DeepLabV3+. The higher segmentation performance demonstrates that the proposed RSSeg surpasses DeepLabV3+ on the multifocal dataset, as indicated by a paired t-test (p < 0.05) across all evaluated parameters.
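The paired comparison described above can be reproduced with a standard paired t-test, as in the short SciPy sketch below; the per-fold scores are illustrative placeholders, not the paper’s measured values.

```python
from scipy.stats import ttest_rel

# Illustrative per-fold pixel accuracies (placeholders, not the reported values)
rsseg_scores   = [91.8, 91.2, 92.0, 91.1, 91.4]
deeplab_scores = [90.1, 89.7, 90.3, 89.5, 89.9]

# Paired t-test across folds: the same folds are scored by both models
t_stat, p_value = ttest_rel(rsseg_scores, deeplab_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # p < 0.05 indicates significance
```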
Table 6 and Figure 10 present the effectiveness of the proposed RSSeg model on several existing datasets. Despite noise and varying acquisition protocols, the proposed model achieves pixel accuracies of 92.1% on DHA and 96.7% on RSNA. The elbow’s 95.2% pixel-level accuracy demonstrates the model’s generalizability, and the shoulder’s higher recall of 91.5% further supports this. The model also demonstrates improved efficiency across different noise and illumination conditions, and was tested on a pelvis dataset, achieving a precision of 88.3%.
This study assessed the importance of segmentation by conducting experiments on segmented and non-segmented images and analyzing the impact of segmentation on deep-learning classification models, including VGG16 [19], ResNet50 [36], and InceptionV3 [37]. The study implemented all classification models using TensorFlow, initialized them with ImageNet-pretrained weights and trained them using various splits for training, validation, and testing data, as shown in Table 7, which details the sample splits. Images were resized to 250 × 250 pixels and normalized using channel-wise means and standard deviations. The models were trained with a batch size of 16, using the Adam optimizer (learning rate = 0.0001, decay = 1 × 10−4), and employed data augmentation, including ±15° rotations and horizontal flips.
The VGG16 model was trained separately for 50, 53, 50, and 60 epochs on the wrist, elbow, shoulder, and pelvis regions, respectively, using early stopping with patience of 5 and cross-entropy loss. The implementation replaced the final layer with a two-layer dense network and a Softmax-activated classification layer. The ResNet50 model was trained for 40 epochs with early stopping (patience = 3) and binary cross-entropy loss. The architecture modified the final layers by implementing a global average pooling layer, followed by two ReLU-activated dense layers and a softmax classification layer. The InceptionV3 model was trained for 36 epochs with early stopping (patience = 5) and categorical cross-entropy loss. The top-layer replacement process used a global average pooling layer, followed by two dense layers with ReLU activations, and an output layer with a softmax activation function.
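As an illustration of the classifier setup, the following Keras sketch rebuilds the described VGG16 head (ImageNet weights, a two-layer dense network, and a softmax classification layer for the seven age classes); the dense-layer widths and the frozen-backbone choice are assumptions, not values reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# VGG16 backbone with ImageNet weights; 250 x 250 RGB inputs as in Section 4.3
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(250, 250, 3))
base.trainable = False                      # assumption: fine-tune only the new head

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # two-layer dense network (widths assumed)
    layers.Dense(128, activation="relu"),
    layers.Dense(7, activation="softmax"),  # softmax over the seven age classes
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])
```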
The results of the age estimation models (VGG16, ResNet50, and InceptionV3) trained with varying train−test ratios (60–40, 70–30, and 80–20) were examined. The 80–20 split performs better on VGG16, ResNet50, and InceptionV3 than the 60–40 and 70–30 splits, which fail to capture more diverse growth patterns with limited training samples. The evaluation applies the training configuration mentioned above, with VGG16 demonstrating the best performance with segmentation, achieving an overall accuracy of 93.8%, a precision of 92.6%, a recall of 92.9%, and an F1-score of 92.8%. Compared to its performance without segmentation, where the overall accuracy was 86.4%, this result represents a significant improvement. Interestingly, VGG16 delivers substantial improvements in the wrist and elbow regions, achieving an F1-score of nearly 94.0%, an accuracy above 94.0%, and a precision and recall between 92.0% and 95.0%. These enhancements demonstrate that VGG16 effectively leverages segmented inputs to identify localized features crucial for age estimation. For the wrist, RSSeg improved accuracy by 8.1% with VGG16 (87.8–95.9%), 4.7% with ResNet50 (89.5–94.2%), and 8.0% with InceptionV3 (84.5–92.5%). For the elbow, accuracy improved by 8.8% with VGG16 (85.8–94.6%), 4.6% with ResNet50 (88.5–93.1%), and 8.1% with InceptionV3 (85.6–93.7%). The carpal bones of the wrist and the CRITOE ossification centers of the elbow are visible at an early stage, and their appearance and fusion states are evident; VGG16 achieves better performance than the others because of its ability to retain fine-grained local features. For the shoulder, the accuracy improvements were 7.4% with VGG16 (86.5–93.9%), 5.6% with ResNet50 (86.9–92.5%), and 8.1% with InceptionV3 (83.8–91.9%). For the pelvis, the improvements were 5.3% with VGG16 (85.5–90.8%), 6.4% with ResNet50 (84.6–91.0%), and 10.5% with InceptionV3 (80.1–90.6%). The humeral head and tubercles of the shoulder, as well as the femoral head, trochanters, and iliac crest of the pelvis, are complex, multi-scale anatomical features that were captured effectively by InceptionV3 but not retained by ResNet50 and VGG16.
Additionally, ResNet50 performs better with segmentation: its overall accuracy rises from 87.4% without segmentation to 92.7% when segmented, while its precision, recall, and F1-score improve from 85.1% to 92.0%, from 87.3% to 92.6%, and from 86.2% to 92.2%, respectively. For InceptionV3, the overall precision improves from 82.9% to 90.0%, the recall from 82.7% to 90.9%, and the F1-score from 82.8% to 90.5%, while its accuracy increases from 83.5% to 92.2%. These results are tabulated in Table 8 and visualized in Figure 11, which shows that the proposed RSSeg improves the models’ performance.
The five-fold cross-validation results show that the proposed RSSeg model performs consistently across multiple joints, with the highest accuracy of 92.0 ± 3.30% at the wrist and 88.5 ± 2.31% at the pelvis, which is slightly lower. Overall, the precision of 88.7 ± 3.68% and the recall of 89.5 ± 3.62% are similarly high, indicating reliable detection of relevant pixels, while the overall F1-score of 89.0 ± 3.58% confirms balanced, robust segmentation performance. Table 9 and Figure 12 show slightly higher variability in specific regions that reflects anatomical complexity, but overall, the model demonstrates accurate and consistent results across folds.

5. Discussion

The RSSeg model’s hierarchical grid process retains the foreground growth patterns, which are essential for bone age estimation, by computing joint- and grid-specific entropies to avoid computational burden while effectively preserving growth patterns through symbolic segmentation. Figure 13a shows that the proposed method minimizes overhead while maintaining the soft-tissue areas essential for ossification pattern analysis. Figure 13b shows the segmentation result using DeepLabV3+, which yields low performance, especially in thin ossification regions, cluttered backgrounds, and small datasets. Table 8 shows the significant impact of the RSSeg model on age estimation.
The proposed RSSeg model captures the dynamic interval values to understand bone and ossification patterns by empirically calculating joint- and grid-specific entropies, means, and standard deviations. The model, which uses a 10 × 10 grid and neglects higher-level background regions with lower entropy and lower average pixel values, reduces unnecessary computation, as shown in Figure 4. The retained grids are further divided into 5 × 5 sub-grids, and the same retention and discarding criteria are applied to each sub-grid. Then, region-specific symbolic segmentation is applied to the retained growth pattern, capturing local interval variations within each region and enabling the efficient handling of intra-class variations in both foreground and background. Applying a dynamic region-specific interval representation effectively handles regions with varied illumination, noise, insubstantial ossification regions, and acquisition-related challenges in X-rays, thereby generalizing the segmentation task. Furthermore, the RSSeg model performs well on benchmark datasets such as RSNA and DHA, demonstrating its generality and robustness across diverse ethnic backgrounds.
Even though the RSSeg model produces efficient results, the pelvis poses additional challenges, including overlapping bones, high-density anatomical features across joints, and regions that are overly noisy and difficult to distinguish from growth patterns. In addition, these challenges make both contextual and non-contextual segmentation difficult.
In the future, integrating Transformer models such as MiDaS and ZoeDepth with an RSSeg model approach could help retain the depth anatomical features of high-density overlapping bone across joints to enhance effective segmentation.
Further, to assess the impact of the RSSeg model, the segmented results were analyzed using the VGG16, ResNet50, and InceptionV3 models. Table 8 shows that the proposed RSSeg improved overall accuracy and F1-score by 7.4% and 6.9% with VGG16, 5.3% and 6.0% with ResNet50, and 8.7% and 7.7% with InceptionV3, respectively. These results show that effective segmentation is vital for retaining growth patterns, which is essential for accurate age estimation. They also show that RSSeg is most beneficial when discriminative cues are small and local; when cues are large and global, segmentation becomes comparatively less critical.
To examine the results of a deep learning-based age-estimation, a model trained with varying train−test ratios was used. The models require a large number of training samples to capture contextual information. Due to limited dataset availability per class, the 80-20 split shows better performance on VGG16, ResNet50, and InceptionV3 than other splits, such as 60-40 and 70-30, which fail to capture more diverse growth patterns.
To analyze the effectiveness of deep learning-based age estimation, the models were trained on various joints with different hyperparameters. VGG16 achieves the best overall performance, with a 93.8% accuracy, a 92.6% precision, a 92.9% recall, and a 92.8% F1-score, demonstrating the model’s ability to retain fine-grained local features and surpassing the other models. ResNet50 and InceptionV3 yield slightly lower performance, reflecting their greater focus on global shape and multi-scale context; these deeper pyramidal architectures only partially capture growth patterns through depth-wise convolution and pooling.
For the wrist, elbow, and shoulder, accuracies with VGG16 (95.9%, 94.6%, and 93.9%) were higher than with ResNet50 (94.2%, 93.1%, and 92.5%) or InceptionV3 (92.5%, 93.7%, and 91.9%). The carpal bones of the wrist, the CRITOE bones of the elbow, and the humeral head and tubercles of the shoulder ossification centers are visible at an early stage, and their fusion states vary gradually; fusion of the shoulder joint is complete by the adolescent stage. Capturing minute variation among the dominant regions is essential without losing contextual information in the deeper pooling process. For the pelvis, in contrast, ResNet50 shows a slight improvement in accuracy, reaching 91.0% compared to 90.8% with VGG16; the femoral head, trochanters, and iliac crest of the pelvis are high-density, varied anatomical features that ResNet50 captures effectively. Despite these improvements, the pelvis remains the most challenging anatomical structure, resulting in lower segmentation performance and lower classification efficiency compared to the other joints.

6. Conclusions

The RSSeg model offers a robust and automated solution for skeletal age estimation in the Indian medicolegal context, addressing the key limitations of existing methods. The study utilizes a novel dataset comprising both retrospective and prospective X-ray data that capture multiple ossification centers, including the wrist, elbow, shoulder, and pelvis, across a diverse age range from 0 to 21 years and above. The distinguishing characteristic of the proposed model is its retention of soft growth regions, which are crucial for extracting prominent growth features using small, multi-patch analysis. By leveraging entropy-based filtering, it reduces the computational burden and efficiently eliminates high-purity background regions. The symbolic segmentation applied to the foreground patches effectively retains the growth pattern. The RSSeg model, compared with popular existing segmentation models (U-Net, Attention U-Net, TransU-Net, DeepLabV3+, Adaptive Otsu, and Watershed), achieves a notable pixel accuracy of 91.5% and generalizability across multiple joints, while maintaining lower computational overhead through targeted patch analysis. Furthermore, integrating RSSeg with the VGG16 classifier significantly improves overall accuracy, increasing it from 86.4% without segmentation to 93.8%, clearly indicating that focused-region processing minimizes redundant computations and enhances feature discrimination without compromising accuracy. This process has the potential to improve transparency and ensure fairness in legal matters and medical practice.
Future work aims to scale the model by incorporating additional ossification centers and expanding the dataset to refine intra-class variability and improve class-imbalance handling, thereby achieving even greater accuracy. Another direction is integrating transformer models with the RSSeg approach to retain both contextual and non-contextual variation, along with depth anatomical features across joints, for effective segmentation. The multiple joints collectively offer substantial diagnostic value, as they span a wide range of maturation; to exploit this effectively, a transformer-based weighted multifocal age estimation is required.

Author Contributions

Conceptualization, P.M., M.N., and D.R.; methodology, P.M., M.N., and S.R.; software, P.M., M.N., and S.R.; validation, M.N. and D.R.; formal analysis, P.M., M.N., and D.R.; investigation, P.M.; resources, P.M. and M.N.; data curation, P.M. and D.R.; writing—original draft preparation, P.M. and M.N.; writing—review and editing, P.M., M.N., S.R., and D.R.; visualization, P.M. and M.N.; supervision, M.N. and D.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ref. MMC EC 42/22. Ethical approval was waived by the Institutional Ethical Committee at Mysore Medical College and Research Institute and the associated hospital in Mysore, given the study’s retrospective and prospective nature and the fact that all procedures performed were part of routine care.

Data Availability Statement

The raw/processed data required to reproduce the above findings are made available upon request to the corresponding author, subject to institutional ethical guidelines and privacy regulations.

Acknowledgments

The authors thank Krishna Rajendra and Apollo BGS Hospitals, Mysuru, Karnataka, India, for their support in collecting the retrospective data. We thank Dayananda R and Unnimaya S, Post Graduate, Dept. of Forensic Medicine, Mysore Medical College and Research Institute, and Athira Satheesh, Apollo BGS Hospitals, for their guidance, labelling, and validation of the dataset and verification of the experimental outcomes.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RSSeg: region-based symbolic segmentation
ROI: region of interest
RSNA: Radiological Society of North America
DHA: Digital Hand Atlas
CLAHE: Contrast-Limited Adaptive Histogram Equalization
CDF: Cumulative Distribution Function
$P_c$: patch
$E$: entropy
$S_p$: sub-patch
$B_g$: background
$F_g$: foreground
$T_1$: mean-intensity threshold
$T_2$: standard deviation
$CR$: class representative
$AC$: acceptance count
IMIT: Indian Medicolegal Infant-Toddler
IMC: Indian Medicolegal Child
IMPA: Indian Medicolegal Pre-Adolescent
IMPT: Indian Medicolegal Pre-Teen
IMT: Indian Medicolegal Teen
IMYA: Indian Medicolegal Young Adult
IMA: Indian Medicolegal Adult
CRITOE: capitellum, radius, internal epicondyle, trochlea, olecranon, external epicondyle

References

  1. Bhardwaj, V.; Kumar, I.; Aggarwal, P.; Singh, P.K.; Shukla, R.C.; Verma, A. Demystifying the Radiography of Age Estimation in Criminal Jurisprudence: A Pictorial Review. Indian J. Radiol. Imaging 2024, 34, 496–510. [Google Scholar] [CrossRef]
  2. Reddy, K.N. The Essentials of Forensic Medicine and Toxicology, 34th ed.; Jaypee Brothers Medical Publisher (P), Ltd.: New Delhi, India, 2017; p. 501. [Google Scholar]
  3. Walker, M.D.; Novotny, R.; Bilezikian, J.P.; Weaver, C.M. Race and Diet Interactions in the Acquisition, Maintenance, and Loss of Bone. J. Nutr. 2008, 138, 1256S–1260S. [Google Scholar] [CrossRef]
  4. Zengin, A.; Prentice, A.; Ward, K.A. Ethnic differences in bone health. Front. Endocrinol. 2015, 6, 24. [Google Scholar] [CrossRef]
  5. Grgic, O.; Shevroja, E.; Dhamo, B.; Uitterlinden, A.G.; Wolvius, E.B.; Rivadeneira, F.; Medina-Gomez, C. Skeletal maturation in relation to ethnic background in children of school age: The Generation R Study. Bone 2020, 132, 115180. [Google Scholar] [CrossRef]
  6. Martin, D.D.; Wit, J.M.; Hochberg, Z.E.; Sävendahl, L.; Van Rijn, R.R.; Fricke, O.; Cameron, N.; Caliebe, J.; Hertel, T.; Kiepe, D.; et al. The use of bone age in clinical practice—Part 1. Horm. Res. Paediatr. 2011, 76, 1–9. [Google Scholar] [CrossRef]
  7. Kim, S.Y.; Oh, Y.J.; Shin, J.Y.; Rhie, Y.J.; Lee, K.H. Comparison of the Greulich-Pyle and Tanner Whitehouse (TW3) Methods in Bone Age Assessment. J. Korean Soc. Pediatr. Endocrinol. 2008, 13, 50–55. [Google Scholar]
  8. Pan, X.; Zhao, Y.; Chen, H.; Wei, D.; Zhao, C.; Wei, Z. Fully Automated Bone Age Assessment on Large-Scale Hand X-Ray Dataset. Int. J. Biomed. Imaging 2020, 2020, 8460493. [Google Scholar] [CrossRef]
  9. Wibisono, A.; Mursanto, P. Multi region-based feature connected layer (RB-FCL) of deep learning models for bone age assessment. J. Big Data 2020, 7, 67. [Google Scholar] [CrossRef]
  10. Chen, X.; Li, J.; Zhang, Y.; Lu, Y.; Liu, S. Automatic feature extraction in X-ray image based on deep learning approach for determination of bone age. Future Gener. Comput. Syst. 2020, 110, 795–801. [Google Scholar] [CrossRef]
  11. Nagaraju, Y.; Darshan, D.; Sahanashree, K.J.; Nagamani, P.N.; Satish, B.B. BoneSegNet: Enhanced 2D-TransUnet Model for Multiclass Semantic Segmentation of X-Ray Images of Human Hand Bone. In Proceedings of the 2024 IEEE North Karnataka Subsection Flagship International Conference (NKCon), Bagalkote, India, 21–22 September 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–8. [Google Scholar] [CrossRef]
  12. Wei, D.; Wu, Q.; Wang, X.; Tian, M.; Li, B. Accurate instance segmentation in Pediatric elbow radiographs. Sensors 2021, 21, 7966. [Google Scholar] [CrossRef]
  13. Alzubaidi, L.; Salhi, A.; Fadhel, M.A.; Bai, J.; Hollman, F.; Italia, K.; Pareyon, R.; Albahri, A.S.; Ouyang, C.; Santamaría, J.; et al. Trustworthy deep learning framework for the detection of abnormalities in X-ray shoulder images. PLoS ONE 2024, 19, e0299545. [Google Scholar] [CrossRef]
  14. Turk, S.; Bingol, O.; Coskuncay, A.; Aydin, T. The impact of implementing backbone architectures on fracture segmentation in X-ray images. Eng. Sci. Technol. Int. J. 2024, 59, 101883. [Google Scholar] [CrossRef]
  15. Altinsoy, H.B.; Gurses, M.S.; Bogan, M.; Unlu, N.E. Applicability of 3.0 T MRI images in the estimation of full age based on shoulder joint ossification: Single-centre study. Leg. Med. 2020, 47, 101767. [Google Scholar] [CrossRef]
  16. Ma, Y.G.; Cao, Y.J.; Zhao, Y.H.; Zhou, X.J.; Huang, B.; Zhang, G.C.; Huang, P.; Wang, Y.H.; Ma, K.J.; Chen, F.; et al. Sex Estimation of Medial Aspect of the Ischiopubic Ramus in Adults Based on Deep Learning. Fa Yi Xue Za Zhi 2023, 39, 129–136. [Google Scholar] [CrossRef]
  17. Li, Y.; Huang, Z.; Dong, X.; Liang, W.; Xue, H.; Zhang, L.; Zhang, Y.; Deng, Z. Forensic age estimation for pelvic X-ray images using deep learning. Eur. Radiol. 2019, 29, 2322–2329. [Google Scholar] [CrossRef]
  18. Lee, J.M.; Park, J.Y.; Kim, Y.J.; Kim, K.G. Deep-learning-based pelvic automatic segmentation in pelvic fractures. Sci. Rep. 2024, 14, 12258. [Google Scholar] [CrossRef]
  19. Gao, Y.; Zhu, T.; Xu, X. Bone age assessment based on deep convolution neural network incorporated with segmentation. Int. J. Comput. Assist. Radiol. Surg. 2020, 15, 1951–1962. [Google Scholar] [CrossRef]
  20. Liu, B.; Zhang, Y.; Chu, M.; Bai, X.; Zhou, F. Bone age assessment based on rank-monotonicity enhanced ranking CNN. IEEE Access 2019, 7, 120976–120983. [Google Scholar] [CrossRef]
  21. Zulkifley, M.A.; Abdani, S.R.; Zulkifley, N.H. Automated bone age assessment with image registration using hand X-ray images. Appl. Sci. 2020, 10, 7233. [Google Scholar] [CrossRef]
  22. Han, Y.; Wang, G. Skeletal bone age prediction based on a deep residual network with spatial transformer. Comput. Methods Programs Biomed. 2020, 197, 105754. [Google Scholar] [CrossRef]
  23. Chen, X.; Zhang, C.; Liu, Y. Bone age assessment with X-ray images based on contourlet motivated deep convolutional networks. In Proceedings of the 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), Vancouver, BC, Canada, 29–31 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar] [CrossRef]
  24. Deshmukh, S.; Khaparde, A. Faster region-convolutional neural network oriented feature learning with optimal trained recurrent neural network for bone age assessment for pediatrics. Biomed. Signal Process. Control. 2022, 71, 103016. [Google Scholar] [CrossRef]
  25. Chai, H.Y.; Wee, L.K.; Swee, T.T.; Salleh, S.H. Adaptive crossed reconstructed (acr) k-mean clustering segmentation for computer-aided bone age assessment system. Int. J. Math. Models Methods Appl. Sci. 2011, 5, 628–635. [Google Scholar]
  26. Liu, J.; Qi, J.; Liu, Z.; Ning, Q.; Luo, X. Automatic bone age assessment based on intelligent algorithms and comparison with TW3 method. Comput. Med. Imaging Graph. 2008, 32, 678–684. [Google Scholar] [CrossRef]
  27. Adeshina, S.A.; Lindner, C.; Cootes, T.F. Automatic segmentation of carpal area bones with random forest regression voting for estimating skeletal maturity in infants. In Proceedings of the 2014 11th International Conference on Electronics, Computer and Computation (ICECCO), Abuja, Nigeria, 29 September 2014–1 October 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1–4. [Google Scholar] [CrossRef]
  28. Cunha, P.; Moura, D.C.; Guevara López, M.A.; Guerra, C.; Pinto, D.; Ramos, I. Impact of ensemble learning in the assessment of skeletal maturity. J. Med. Syst. 2014, 38, 87. [Google Scholar] [CrossRef]
  29. Giordano, D.; Spampinato, C.; Scarciofalo, G.; Leonardi, R. An automatic system for skeletal bone age measurement by robust processing of carpal and epiphysial/metaphysial bones. IEEE Trans. Instrum. Meas. 2010, 59, 2539–2553. [Google Scholar] [CrossRef]
  30. Deshmukh, S.; Khaparde, A. Multi-objective segmentation approach for bone age assessment using parameter tuning-based U-net architecture. Multimed. Tools Appl. 2022, 81, 6755–6800. [Google Scholar] [CrossRef]
  31. Salim, I.; Hamza, A.B. Ridge regression neural network for pediatric bone age assessment. Multimed. Tools Appl. 2021, 80, 30461–30478. [Google Scholar] [CrossRef]
  32. Rajitha, B.; Agarwal, S. Segmentation of Epiphysis Region-of-Interest (EROI) using texture analysis and clustering method for hand bone age assessment. Multimed. Tools Appl. 2022, 81, 1029–1054. [Google Scholar] [CrossRef]
  33. Zhao, C.; Han, J.; Jia, Y.; Fan, L.; Gou, F. Versatile framework for medical image processing and analysis with application to automatic bone age assessment. J. Electr. Comput. Eng. 2018, 2018, 2187247. [Google Scholar] [CrossRef]
  34. Mao, J.; Men, P.; Guo, H.; An, J. Region-based two-stage MRI bone tissue segmentation of the knee joint. IET Image Process. 2022, 16, 3458–3470. [Google Scholar] [CrossRef]
  35. Guru, D.S.; Vinay Kumar, N. Symbolic representation and classification of logos. In Proceedings of the International Conference on Computer Vision and Image Processing: CVIP 2016, Roorkee, India, 26–28 February 2016; Springer: Singapore, 2016; Volume 1, pp. 555–569. [Google Scholar] [CrossRef]
  36. Priyanka, M.; Sreekumar, S.; Arsh, S. Detection of COVID-19 from the chest X-ray images: A comparison study between CNN and ResNet-50. In Proceedings of the 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon), Mysuru, India, 16–17 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–7. [Google Scholar] [CrossRef]
  37. Ozdemir, C.; Gedik, M.A.; Kaya, Y. Age Estimation from Left-Hand Radiographs with Deep Learning Methods. Trait. Du Signal 2021, 38, 1565–1574. [Google Scholar] [CrossRef]
Figure 2. The proposed multifocal region-based symbolic segmentation approach for skeletal age estimation from an Indian medicolegal perspective.
Figure 3. Sample input images of multiple joints: (a) before enhancement; (b) after enhancement via CLAHE.
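As a reproducibility aid for the enhancement step in Figure 3, the following minimal Python sketch applies CLAHE with OpenCV; the clip limit and tile grid size shown are illustrative assumptions, not the study's configured settings.

```python
import cv2

def enhance_radiograph(path: str):
    """Apply CLAHE to a grayscale joint radiograph, as in Figure 3b."""
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Illustrative parameters; the study's exact values may differ.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(image)
```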
Figure 4. The proposed RSSeg model hierarchically separates foreground from background in the enhanced image while preserving region-specific growth features. A 10 × 10 patch of the image is processed first; if it satisfies the minimum retention criteria, it is retained. Each retained patch is then subdivided into 5 × 5 sub-patches, and the same criteria are evaluated for each sub-patch to decide whether it is kept or discarded.
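The two-level retention procedure in Figure 4 can be sketched as follows. This is a simplified illustration only: the actual RSSeg criterion is symbolic and region-based, so the mean-intensity threshold in `meets_criteria` is a hypothetical stand-in.

```python
import numpy as np

def meets_criteria(patch: np.ndarray, threshold: float = 40.0) -> bool:
    # Hypothetical stand-in for the symbolic retention criterion:
    # keep patches whose mean intensity suggests bone/soft tissue.
    return float(patch.mean()) > threshold

def rsseg_like_mask(image: np.ndarray) -> np.ndarray:
    """Two-level patch filtering in the spirit of Figure 4."""
    mask = np.zeros_like(image, dtype=np.uint8)
    h, w = image.shape
    for y in range(0, h - h % 10, 10):          # coarse 10 x 10 patches
        for x in range(0, w - w % 10, 10):
            patch = image[y:y + 10, x:x + 10]
            if not meets_criteria(patch):
                continue                         # discard background patch
            for dy in (0, 5):                    # refine into four 5 x 5 sub-patches
                for dx in (0, 5):
                    sub = patch[dy:dy + 5, dx:dx + 5]
                    if meets_criteria(sub):
                        mask[y + dy:y + dy + 5, x + dx:x + dx + 5] = 1
    return mask
```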
Figure 5. Segmentation results: (a) original image; (b) ground truth; (c) U-Net; (d) Attention U-Net; (e) TransU-Net; (f) DeepLabV3+; (g) Adaptive Otsu; (h) Watershed; and (i) proposed RSSeg model.
Figure 6. A sample image of the appearance and fusion of various ossification centers across multiple joints in the Indian population.
Figure 7. The appearance and fusion state of various joint anatomies: (a) Wrist (IMIT: capitate, hamate, triquetral, lunate, radius base, 1st MC head and base, 3rd MC head, 5th MC head, 1st PP, 3rd PP, 5th PP, 3rd MP, 5th MP, 1st DP, and 3rd DP (appearance); IMC: trapezoid, trapezium, scaphoid, ulna base, and 5th DP (appearance); IMPA: pisiform (appearance), MCs, PPs, MPs, and DPs (non-union with separate epiphysis); IMPT: MCs, PPs, MPs, and DPs (partial union); IMT: MCs, PPs, MPs, and DPs (complete union); IMYA: nil; IMA (above): nil); (b) Elbow (IMIT: capitellum (appearance); IMC: medial/internal epicondyle and radius head (appearance); IMPA: trochlea, lateral/external epicondyle, and ulna head (appearance); IMPT: capitellum, trochlea, and lateral epicondyle (conjoint epiphysis); olecranon (appearance); IMT: medial epicondyle, radius head, and ulna head (conjoint epiphysis); olecranon (fuses with shaft of ulna, seen in lateral view); IMYA: nil; IMA: nil); (c) Shoulder (IMIT: humerus head and greater tubercle (appearance); IMC: lesser tubercle (appearance); IMPA: humerus head, greater tubercle, and lesser tubercle (conjoint epiphysis); IMPT: acromion process (appearance); IMT: acromion process (appearance and conjoint epiphysis with the scapula); IMYA: humerus head, greater tubercle, and lesser tubercle (conjoint epiphysis fuses with shaft of humerus); IMA: nil); (d) Pelvis (IMIT: femur head and greater trochanter (appearance); IMC: ischiopubic ramus (fusion); IMPA: lesser trochanter (appearance); IMPT: lesser trochanter and iliac crest (appearance); IMT: ischial tuberosity (appearance), femur head, greater trochanter, and lesser trochanter (fusion); IMYA: iliac crest and ischial tuberosity (fusion); IMA: nil) [1].
Figure 8. Five-fold cross-validation comparison of the proposed RSSeg model with state-of-the-art models for individual joints with respect to pixel accuracy.
Figure 9. Five-fold cross-validation analysis of the proposed RSSeg model against state-of-the-art models considering multiple joints.
Figure 10. Performance of the proposed RSSeg model on existing datasets.
Figure 11. Performance analysis of state-of-the-art classification models on multiple joints, with RSSeg and without segmentation, for an 80:20 data split.
Figure 12. Performance of the proposed RSSeg model under five-fold cross-validation on the multifocal dataset.
Figure 13. (a) The proposed RSSeg model reduces computational complexity in training and testing while preserving the ossification centers highlighted by the yellow bounding box; (b) the DeepLabV3+ model loses the ossification centers within the red bounding box, regions that are vital for age estimation.
Table 1. Indian medicolegal class interval based on forensic expert intuition.

| Number of Classes | Class Label | Class Interval (in Years) |
|---|---|---|
| 1 | IMIT | >0.1 to ≤4 |
| 2 | IMC | >4 to ≤7 |
| 3 | IMPA | >7 to ≤12 |
| 4 | IMPT | >12 to ≤14 |
| 5 | IMT | >14 to ≤18 |
| 6 | IMYA | >18 to ≤21 |
| 7 | IMA | >21 |
Table 2. Numbers of samples of multiple joints of the Indian population.

| Ossification Joints | IMIT | IMC | IMPA | IMPT | IMT | IMYA | IMA | Total |
|---|---|---|---|---|---|---|---|---|
| Wrist | 171 | 135 | 290 | 180 | 228 | 202 | 150 | 1356 |
| Elbow | 168 | 150 | 230 | 169 | 205 | 158 | 148 | 1228 |
| Shoulder | 284 | 241 | 198 | 256 | 193 | 169 | 165 | 1506 |
| Pelvis | 98 | 104 | 161 | 212 | 76 | 146 | 183 | 980 |
Table 3. Five-fold cross-validation comparison of the proposed RSSeg model with state-of-the-art models for individual joints with respect to pixel accuracy.

| Segmentation Models | Wrist (%) | Elbow (%) | Shoulder (%) | Pelvis (%) | Overall (%) |
|---|---|---|---|---|---|
| U-Net | 88.5 ± 0.50 | 85.5 ± 0.51 | 88.5 ± 0.12 | 80.3 ± 0.27 | 85.7 ± 0.35 |
| Attention U-Net | 89.4 ± 0.89 | 88.5 ± 0.54 | 89.9 ± 0.10 | 84.9 ± 0.25 | 88.2 ± 0.45 |
| TransU-Net | 91.3 ± 0.90 | 89.2 ± 0.51 | 90.9 ± 0.11 | 85.9 ± 0.56 | 89.3 ± 0.53 |
| DeepLabV3+ | 91.9 ± 0.06 | 89.6 ± 0.48 | 91.3 ± 0.15 | 86.9 ± 0.35 | 89.9 ± 0.26 |
| Adaptive Otsu | 86.5 | 77.8 | 91.6 | 77.4 | 83.3 |
| Watershed | 91.1 | 76.5 | 92.1 | 76.3 | 84.0 |
| Proposed RSSeg | 92.1 | 93.7 | 89.9 | 90.3 | 91.5 |
Table 4. Five-fold cross-validation analysis of the proposed RSSeg model against state-of-the-art models considering multiple joints.

| Segmentation Method | Jaccard Similarity (%) | Dice Coefficient (%) | Precision (%) | Recall (%) | Pixel Accuracy (%) |
|---|---|---|---|---|---|
| U-Net | 80.1 ± 0.42 | 79.2 ± 0.38 | 86.5 ± 0.35 | 86.0 ± 0.40 | 85.7 ± 0.35 |
| Attention U-Net | 81.2 ± 0.12 | 79.9 ± 0.53 | 85.9 ± 0.22 | 87.1 ± 0.17 | 88.2 ± 0.45 |
| TransU-Net | 83.1 ± 0.13 | 78.2 ± 0.14 | 81.5 ± 0.27 | 88.8 ± 0.09 | 89.3 ± 0.53 |
| DeepLabV3+ | 83.4 ± 0.30 | 80.9 ± 0.28 | 87.0 ± 0.21 | 89.2 ± 0.40 | 89.9 ± 0.26 |
| Adaptive Otsu | 82.3 | 77.3 | 85.0 | 84.2 | 83.3 |
| Watershed | 81.9 | 79.5 | 83.2 | 86.0 | 84.0 |
| Proposed RSSeg | 84.5 | 81.4 | 88.3 | 90.0 | 91.5 |
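The metrics in Tables 3–6 follow their standard pixel-wise definitions; for clarity, a minimal NumPy sketch is given below, assuming binary prediction and ground-truth masks of equal shape with values in {0, 1}.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Pixel-wise metrics for binary segmentation masks."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    eps = 1e-8                                 # guards against empty masks
    return {
        "jaccard": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
        "pixel_accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```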
Table 5. Performance comparison of the paired statistical test on the proposed RSSeg over DeepLabV3+.

| Metric | DeepLabV3+ (%) | Proposed RSSeg (%) | Mean Difference (%) | Paired Test Significance |
|---|---|---|---|---|
| Jaccard Similarity | 83.4 ± 0.30 | 84.5 | 1.1 | p < 0.05 |
| Dice Coefficient | 80.9 ± 0.28 | 81.4 | 0.5 | p < 0.05 |
| Precision | 87.0 ± 0.21 | 88.3 | 1.3 | p < 0.05 |
| Recall | 89.2 ± 0.40 | 90.0 | 0.8 | p < 0.05 |
| Pixel Accuracy | 89.9 ± 0.26 | 91.5 | 1.6 | p < 0.05 |
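A minimal sketch of how such a paired comparison could be computed is shown below, assuming per-fold scores are available for both models; the fold values are illustrative placeholders, not the study's raw measurements, and SciPy's paired t-test is one plausible choice (a Wilcoxon signed-rank test would be an alternative).

```python
from scipy import stats

# Illustrative per-fold pixel-accuracy scores (placeholders only).
rsseg_folds = [91.1, 91.8, 91.4, 91.6, 91.6]
deeplab_folds = [89.6, 90.1, 89.8, 90.2, 89.8]

t_stat, p_value = stats.ttest_rel(rsseg_folds, deeplab_folds)
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")  # significant if p < 0.05
```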
Table 6. Performance of the proposed RSSeg model on existing datasets.

| Dataset | Jaccard Similarity (%) | Dice Coefficient (%) | Precision (%) | Recall (%) | Pixel Accuracy (%) |
|---|---|---|---|---|---|
| RSNA [8] | 94.1 | 94.8 | 96.6 | 95.5 | 96.7 |
| DHA [9] | 90.5 | 90.4 | 91.1 | 90.8 | 92.1 |
| Elbow [12] | 94.3 | 95.1 | 92.9 | 94.1 | 95.2 |
| Shoulder [10] | 88.5 | 89.3 | 90.0 | 91.5 | 90.3 |
| Pelvis [15] | 85.3 | 84.5 | 88.3 | 87.1 | 88.2 |
Table 7. Sample distribution of the multiple joints' dataset for a varied training–validation–testing data split for age estimation.

| Ossification Joints | Total Samples | Training (60%) | Training (70%) | Training (80%) | Validation (20%) | Validation (15%) | Validation (10%) | Testing (20%) | Testing (15%) | Testing (10%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Wrist | 1356 | 814 | 949 | 1084 | 271 | 203 | 136 | 271 | 204 | 136 |
| Elbow | 1228 | 737 | 860 | 982 | 245 | 184 | 123 | 246 | 184 | 123 |
| Shoulder | 1506 | 904 | 1054 | 1204 | 301 | 226 | 151 | 301 | 226 | 151 |
| Pelvis | 980 | 588 | 686 | 784 | 196 | 147 | 98 | 196 | 147 | 98 |
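The splits in Table 7 correspond to 60:20:20, 70:15:15, and 80:10:10 partitions. A minimal scikit-learn sketch for the 80:10:10 case is given below; the placeholder array shapes, stratification by age class, and random seed are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for pelvis radiographs and their
# seven medicolegal class labels (Table 1); shapes are assumptions.
images = np.random.rand(980, 224, 224)
labels = np.random.randint(0, 7, size=980)

# 80:10:10 split: hold out 20%, then halve it into validation and test.
X_train, X_hold, y_train, y_hold = train_test_split(
    images, labels, test_size=0.20, stratify=labels, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.50, stratify=y_hold, random_state=42)
```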
Table 8. Performance analysis of state-of-the-art classification models on multiple joints, with RSSeg and without segmentation, for various data split ratios.

| Data Split | Evaluation Metric | Joint | VGG16 (RSSeg) | ResNet50 (RSSeg) | InceptionV3 (RSSeg) | VGG16 (no seg.) | ResNet50 (no seg.) | InceptionV3 (no seg.) |
|---|---|---|---|---|---|---|---|---|
| 60:40 | Accuracy (%) | Wrist | 94.1 | 92.4 | 89.8 | 85.6 | 89.3 | 82.8 |
| 60:40 | Accuracy (%) | Elbow | 92.1 | 93.2 | 92.4 | 85.6 | 87.3 | 84.3 |
| 60:40 | Accuracy (%) | Shoulder | 93.7 | 91.2 | 91.8 | 84.2 | 84.4 | 83.1 |
| 60:40 | Accuracy (%) | Pelvis | 88.9 | 90.4 | 88.9 | 85.5 | 83.3 | 79.7 |
| 60:40 | Accuracy (%) | **Overall** | **92.2** | **91.8** | **90.7** | **85.2** | **86.1** | **82.5** |
| 60:40 | Precision (%) | Wrist | 93.1 | 92.9 | 89.2 | 85.1 | 87.6 | 83.2 |
| 60:40 | Precision (%) | Elbow | 92.1 | 92.4 | 89.5 | 86.2 | 88.2 | 86.3 |
| 60:40 | Precision (%) | Shoulder | 92.1 | 88.8 | 90.4 | 83.6 | 81.4 | 79.4 |
| 60:40 | Precision (%) | Pelvis | 86.9 | 87.9 | 86.7 | 80.6 | 81.2 | 78.6 |
| 60:40 | Precision (%) | **Overall** | **91.1** | **90.5** | **89.0** | **83.9** | **84.6** | **81.9** |
| 60:40 | Recall (%) | Wrist | 92.6 | 94.5 | 89.8 | 84.9 | 88.5 | 83.2 |
| 60:40 | Recall (%) | Elbow | 92.6 | 93.2 | 91.1 | 86.8 | 89.8 | 82.6 |
| 60:40 | Recall (%) | Shoulder | 89.6 | 90.1 | 90.7 | 83.6 | 86.5 | 81.4 |
| 60:40 | Recall (%) | Pelvis | 88.8 | 88.3 | 86.9 | 83.4 | 82.1 | 80.7 |
| 60:40 | Recall (%) | **Overall** | **90.9** | **91.5** | **89.6** | **84.7** | **86.7** | **82.0** |
| 60:40 | F1-score (%) | Wrist | 92.8 | 93.7 | 89.5 | 85.00 | 88.05 | 83.2 |
| 60:40 | F1-score (%) | Elbow | 92.3 | 92.8 | 90.3 | 86.5 | 88.99 | 84.4 |
| 60:40 | F1-score (%) | Shoulder | 90.8 | 89.4 | 90.5 | 83.6 | 83.9 | 80.4 |
| 60:40 | F1-score (%) | Pelvis | 87.8 | 88.1 | 86.8 | 81.98 | 81.6 | 79.6 |
| 60:40 | F1-score (%) | **Overall** | **90.9** | **91.0** | **89.3** | **84.3** | **85.6** | **81.9** |
| 70:30 | Accuracy (%) | Wrist | 94.9 | 93.9 | 91.1 | 86.1 | 87.7 | 84.3 |
| 70:30 | Accuracy (%) | Elbow | 93.7 | 90.9 | 91.8 | 85.4 | 86.3 | 84.2 |
| 70:30 | Accuracy (%) | Shoulder | 92.1 | 91.7 | 91.5 | 86.4 | 85.3 | 82.6 |
| 70:30 | Accuracy (%) | Pelvis | 89.9 | 89.8 | 88.5 | 84.5 | 84.0 | 78.6 |
| 70:30 | Accuracy (%) | **Overall** | **92.7** | **91.6** | **90.7** | **85.6** | **85.8** | **82.4** |
| 70:30 | Precision (%) | Wrist | 93.9 | 92.4 | 88.7 | 86.4 | 86.8 | 83.1 |
| 70:30 | Precision (%) | Elbow | 91.7 | 91.8 | 89.3 | 84.7 | 86.5 | 85.3 |
| 70:30 | Precision (%) | Shoulder | 91.8 | 89.2 | 87.9 | 85.1 | 80.4 | 79.3 |
| 70:30 | Precision (%) | Pelvis | 87.5 | 86.9 | 86.4 | 80.2 | 79.8 | 78.7 |
| 70:30 | Precision (%) | **Overall** | **91.2** | **90.1** | **88.1** | **84.1** | **83.4** | **81.6** |
| 70:30 | Recall (%) | Wrist | 94.4 | 95.1 | 89.8 | 85.6 | 87.7 | 83.2 |
| 70:30 | Recall (%) | Elbow | 90.2 | 91.5 | 88.5 | 84.0 | 84.5 | 80.5 |
| 70:30 | Recall (%) | Shoulder | 91.7 | 91.8 | 87.7 | 85.7 | 86.4 | 82.2 |
| 70:30 | Recall (%) | Pelvis | 89.7 | 88.1 | 86.2 | 83.7 | 82.6 | 79.2 |
| 70:30 | Recall (%) | **Overall** | **91.5** | **91.63** | **88.1** | **84.8** | **85.3** | **81.3** |
| 70:30 | F1-score (%) | Wrist | 94.1 | 93.7 | 89.2 | 86.0 | 87.2 | 83.1 |
| 70:30 | F1-score (%) | Elbow | 90.9 | 91.6 | 88.9 | 84.3 | 85.5 | 82.8 |
| 70:30 | F1-score (%) | Shoulder | 91.7 | 90.5 | 87.8 | 85.4 | 83.3 | 80.7 |
| 70:30 | F1-score (%) | Pelvis | 88.6 | 87.5 | 86.3 | 81.9 | 81.2 | 78.9 |
| 70:30 | F1-score (%) | **Overall** | **91.4** | **90.8** | **88.1** | **84.4** | **84.3** | **81.4** |
| 80:20 | Accuracy (%) | Wrist | 95.9 | 94.2 | 92.5 | 87.8 | 89.5 | 84.5 |
| 80:20 | Accuracy (%) | Elbow | 94.6 | 93.1 | 93.7 | 85.8 | 88.5 | 85.6 |
| 80:20 | Accuracy (%) | Shoulder | 93.9 | 92.5 | 91.9 | 86.5 | 86.9 | 83.8 |
| 80:20 | Accuracy (%) | Pelvis | 90.8 | 91.0 | 90.6 | 85.5 | 84.6 | 80.1 |
| 80:20 | Accuracy (%) | **Overall** | **93.8** | **92.7** | **92.2** | **86.4** | **87.4** | **83.5** |
| 80:20 | Precision (%) | Wrist | 94.8 | 94.2 | 90.1 | 86.9 | 87.9 | 84.8 |
| 80:20 | Precision (%) | Elbow | 93.6 | 93.8 | 90.4 | 86.1 | 88.2 | 86.7 |
| 80:20 | Precision (%) | Shoulder | 92.9 | 91.2 | 90.8 | 85.8 | 82.5 | 80.7 |
| 80:20 | Precision (%) | Pelvis | 89.2 | 88.6 | 88.7 | 82.4 | 81.7 | 79.4 |
| 80:20 | Precision (%) | **Overall** | **92.6** | **92.0** | **90.0** | **85.3** | **85.1** | **82.9** |
| 80:20 | Recall (%) | Wrist | 95.0 | 95.1 | 91.3 | 87.4 | 89.0 | 83.5 |
| 80:20 | Recall (%) | Elbow | 94.2 | 93.5 | 92.8 | 88.8 | 90.1 | 84.1 |
| 80:20 | Recall (%) | Shoulder | 91.7 | 91.8 | 90.7 | 85.7 | 86.4 | 82.2 |
| 80:20 | Recall (%) | Pelvis | 90.7 | 89.8 | 88.9 | 83.9 | 83.7 | 81.1 |
| 80:20 | Recall (%) | **Overall** | **92.9** | **92.6** | **90.9** | **86.5** | **87.3** | **82.7** |
| 80:20 | F1-score (%) | Wrist | 94.9 | 94.6 | 90.7 | 87.1 | 88.4 | 84.1 |
| 80:20 | F1-score (%) | Elbow | 93.9 | 93.6 | 91.6 | 87.4 | 89.1 | 85.4 |
| 80:20 | F1-score (%) | Shoulder | 92.3 | 91.5 | 90.7 | 85.7 | 84.4 | 81.4 |
| 80:20 | F1-score (%) | Pelvis | 89.9 | 89.2 | 88.8 | 83.1 | 82.7 | 80.2 |
| 80:20 | F1-score (%) | **Overall** | **92.8** | **92.2** | **90.5** | **85.9** | **86.2** | **82.8** |

Note: Bold values indicate overall model performance across training–testing splits.
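A minimal sketch of a VGG16 transfer-learning classifier of the kind evaluated in Table 8 is given below, using Keras; the frozen backbone, head layer sizes, and optimizer settings are assumptions for illustration, not the paper's exact training configuration.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False                        # freeze the convolutional backbone

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),     # head size is an assumption
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),    # seven classes from Table 1
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

In this setup, the RSSeg-segmented radiographs would simply replace the raw images as model inputs, which is how the with- and without-segmentation columns of Table 8 differ.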
Table 9. Performance of the proposed RSSeg model under five-fold cross-validation on the multifocal dataset.

| Metric | Wrist (%) | Elbow (%) | Shoulder (%) | Pelvis (%) | Overall (%) |
|---|---|---|---|---|---|
| Accuracy | 92.0 ± 3.30 | 91.5 ± 3.22 | 90.9 ± 3.10 | 88.5 ± 2.31 | 90.7 ± 2.98 |
| Precision | 90.8 ± 3.71 | 90.5 ± 3.29 | 87.7 ± 4.03 | 86.0 ± 3.68 | 88.7 ± 3.68 |
| Recall | 90.5 ± 4.50 | 91.1 ± 3.13 | 88.5 ± 3.61 | 87.9 ± 3.24 | 89.5 ± 3.62 |
| F1-score | 90.6 ± 3.98 | 90.7 ± 3.15 | 88.0 ± 3.73 | 86.9 ± 3.44 | 89.0 ± 3.58 |
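The five-fold protocol behind Table 9 can be sketched as follows; `evaluate_fold` is a hypothetical helper that trains the segmentation-plus-classification pipeline on one fold and returns its metric on the held-out fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(images, labels, evaluate_fold, n_splits=5):
    """Return mean and std of a fold-level metric, as reported in Table 9."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = []
    for train_idx, test_idx in skf.split(images, labels):
        # evaluate_fold (hypothetical): train on the training fold,
        # return the chosen metric on the held-out fold.
        scores.append(evaluate_fold(images[train_idx], labels[train_idx],
                                    images[test_idx], labels[test_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```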
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
