Review Reports
- Priyanka Manchegowda¹
- Manohar Nageshmurthy¹,*
- Suresha Raju¹
- et al.
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Round 1
Reviewer 1 Report (Previous Reviewer 1)
Comments and Suggestions for Authors
The manuscript entitled “A Multifocal RSSeg Approach for Skeletal Age Estimation in an Indian Medicolegal Perspective” introduces an innovative methodology for automatic skeletal age estimation in the specific medicolegal context of the Indian population. The authors rightly identify a critical gap resulting from the inadequacy of Western reference models (such as RSNA or DHA) for assessing Asian populations, which exhibit ethnic differences in skeletal growth. The proposed method, Multifocal Region-based Symbolic Segmentation (RSSeg), achieves excellent performance in segmenting X-ray images of four key joints (wrist, elbow, shoulder, pelvis).
In light of the strong novelty and experimental results, but given the need to clarify methodological and organizational details, I recommend acceptance of the manuscript after Major Revisions:
- The manuscript describes the creation of a new, key multifocal dataset. The availability of this Indian population-specific dataset is fundamental for future research and for enabling reproducibility by other scientists. Currently, the manuscript lacks a Data Availability Statement. Required Action: The authors must add a clear Data Availability Statement specifying whether and how the data (5,107 X-ray samples) can be made publicly available (e.g., in a repository) or upon justified request, taking into account any ethical and privacy constraints.
- The pelvic region remains the most problematic, which is attributed to overlapping bones and dense anatomical structures. Although RSSeg achieved 90.3% pixel accuracy for the pelvis, this is the lowest result among the four joints. The improvement in VGG16 classification accuracy in this region (from 85.5% to 90.8%) is also less significant than for the wrist (increase from 87.8% to 95.9%). Required Action: In the Discussion section, expand the analysis of the reasons for weaker performance in the pelvic region and clearly formulate a plan for future actions (e.g., “developing more advanced segmentation methods or region-specific model improvements”) to enhance effectiveness in these anatomically challenging locations.
- The RSSeg methodology relies on empirical thresholds for entropy (E), mean (T1), and standard deviation (T2), which are analyzed individually for each joint to account for its anatomical variance. Required Action: Please provide a more detailed justification for the selection of these thresholds (e.g., $E_{wrist} < 0.1$, $E_{pelvis} < 0.14$). Additionally, discuss in the Discussion section how this empirical nature may affect the model’s generalizability when applied to datasets from other medical facilities with significantly different X-ray image quality.
- Table 8 presents classification results for VGG16, ResNet50, and InceptionV3 for different data splits. Although VGG16 is highlighted in the discussion as the best model (93.8% overall accuracy at 80:20 split), there is no deeper analysis of how RSSeg impacted ResNet50 and InceptionV3 for each of the four joints individually (Wrist, Elbow, Shoulder, Pelvis). Required Action: Please expand the Discussion section to include these details.
- In Table 3 (Overall Pixel Accuracy) for DeepLabV3+, the value is reported as $89.9 \pm 0.26$, whereas in Table 5 it is $89.93 \pm 0.26$. Although the difference is minimal and does not affect conclusions, Required Action: Please ensure consistent rounding throughout the manuscript to maintain full numerical coherence.
- The authors plan to expand the dataset size in future work to refine handling of intra-class variability. Required Action: Please emphasize this commitment in the Conclusion section, as it will strengthen credibility and indicate the direction of future research.
Author Response
We sincerely thank Reviewer 1 for the thorough evaluation of our manuscript and for providing constructive, insightful, and technically valuable comments. The reviewer’s detailed feedback has significantly strengthened the quality, clarity, and scientific rigor of the manuscript. We have carefully addressed each of the raised concerns and made the required revisions accordingly.
Comment 1: The manuscript describes the creation of a new, key multifocal dataset. The availability of this Indian population-specific dataset is fundamental for future research and for enabling reproducibility by other scientists. Currently, the manuscript lacks a Data Availability Statement. Required Action: The authors must add a clear Data Availability Statement specifying whether and how the data (5,107 X-ray samples) can be made publicly available (e.g., in a repository) or upon justified request, taking into account any ethical and privacy constraints.
Response: Thank you for highlighting the importance of a data availability statement. We have now added the data availability statement to the manuscript. The revised statement explains that the dataset of 5,107 anonymized X-ray samples is currently part of an ongoing study and is being prepared for public release. Upon completion, the dataset will be deposited in a secure public repository to support transparency, reproducibility, and broader scientific use. Until the repository is finalized, the data can be made available upon reasonable request, subject to institutional ethical guidelines and privacy regulations.
Comment 2: The pelvic region remains the most problematic, which is attributed to overlapping bones and dense anatomical structures. Although RSSeg achieved 90.3% pixel accuracy for the pelvis, this is the lowest result among the four joints. The improvement in VGG16 classification accuracy in this region (from 85.5% to 90.8%) is also less significant than for the wrist (increase from 87.8% to 95.9%). Required Action: In the Discussion section, expand the analysis of the reasons for weaker performance in the pelvic region and clearly formulate a plan for future actions (e.g., “developing more advanced segmentation methods or region-specific model improvements”) to enhance effectiveness in these anatomically challenging locations.
Response: We thank the reviewer for this valuable suggestion. We have now expanded the Discussion section (page 21) to provide a more detailed analysis of the factors contributing to the weaker performance in the pelvic region. Specifically, we discuss the inherent anatomical complexity of the pelvis, the overlapping ossification centres, variability in pelvic maturation patterns within the population, and challenges arising from limited contrast and inconsistent radiographic quality. These factors collectively make feature extraction and age-related pattern discrimination more difficult than for wrist or elbow radiographs.
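The comparison in Comment 2 can be made concrete with a short calculation. The sketch below uses only the VGG16 accuracies quoted in that comment and reports two views of the same numbers: the absolute gain in percentage points and the relative reduction in error rate, taking error as 100 minus accuracy. The relative-reduction view is an illustrative framing added here, not a metric reported in the manuscript.

```python
# VGG16 accuracies (%) without and with RSSeg, as quoted in Comment 2.
ACCURACY = {
    "wrist": (87.8, 95.9),
    "pelvis": (85.5, 90.8),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative error reduction in %)."""
    absolute = after - before
    error_before = 100.0 - before  # error rate = 100 - accuracy
    error_after = 100.0 - after
    relative = 100.0 * (error_before - error_after) / error_before
    return round(absolute, 1), round(relative, 1)


for joint, (before, after) in ACCURACY.items():
    abs_gain, rel_reduction = gains(before, after)
    print(f"{joint}: +{abs_gain} pp, {rel_reduction}% of errors removed")
```

Running it prints a gain of 8.1 points for the wrist (about 66.4% of its errors removed) against 5.3 points for the pelvis (about 36.6% of its errors removed), which quantifies the gap the reviewer asks the authors to discuss.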
Comment 3: The RSSeg methodology relies on empirical thresholds for entropy (E), mean (T1), and standard deviation (T2), which are analyzed individually for each joint to account for its anatomical variance. Required Action: Please provide a more detailed justification for the selection of these thresholds (e.g., $E_{wrist} < 0.1$, $E_{pelvis} < 0.14$). Additionally, discuss in the Discussion section how this empirical nature may affect the model’s generalizability when applied to datasets from other medical facilities with significantly different X-ray image quality.
Response: We thank the reviewer for this insightful comment. The manuscript has now been revised to include a detailed justification of the empirical thresholds used in the RSSeg methodology (page 7), and the discussion section has been updated to address the impact of using empirically determined thresholds (page 21).
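To make the role of these thresholds easier to follow, the sketch below shows one plausible form of joint-specific patch filtering. It follows the naming in Comment 3 (entropy E, mean T1, standard deviation T2) and uses the two entropy cut-offs quoted there; everything else is an assumption: that entropy is the Shannon entropy of an 8-bit intensity histogram normalized to [0, 1], that patches falling below a threshold are discarded as background, and that the T1 and T2 values are placeholders. It is not the authors' RSSeg implementation.

```python
import math
from collections import Counter

# Entropy cut-offs quoted in Comment 3; T1 and T2 below are hypothetical placeholders.
ENTROPY_THRESHOLD = {"wrist": 0.10, "pelvis": 0.14}
MEAN_THRESHOLD = 0.05  # hypothetical T1 on normalized intensity
STD_THRESHOLD = 0.02  # hypothetical T2 on normalized intensity


def patch_stats(pixels: list[int], levels: int = 256) -> tuple[float, float, float]:
    """Return (normalized Shannon entropy, mean, std) of an 8-bit patch, each in [0, 1]."""
    if not pixels:
        raise ValueError("empty patch")
    n = len(pixels)
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    values = [p / (levels - 1) for p in pixels]
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return entropy / math.log2(levels), mean, std


def keep_patch(pixels: list[int], joint: str) -> bool:
    """Hypothetical rule: discard low-entropy, dark, or flat patches as background."""
    entropy, mean, std = patch_stats(pixels)
    if entropy < ENTROPY_THRESHOLD[joint]:
        return False
    return mean >= MEAN_THRESHOLD and std >= STD_THRESHOLD


# Two intensity levels in equal proportion: normalized entropy = 1 bit / 8 bits = 0.125.
two_level = [100] * 32 + [140] * 32
print(keep_patch(two_level, "wrist"), keep_patch(two_level, "pelvis"))
```

The same patch passes the wrist cut-off (0.10) but fails the pelvic one (0.14). This illustrates why thresholds tuned on one joint, or on one facility's image statistics, may not transfer unchanged to data acquired elsewhere.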
Comment 4: Table 8 presents classification results for VGG16, ResNet50, and InceptionV3 for different data splits. Although VGG16 is highlighted in the discussion as the best model (93.8% overall accuracy at 80:20 split), there is no deeper analysis of how RSSeg impacted ResNet50 and InceptionV3 for each of the four joints individually (Wrist, Elbow, Shoulder, Pelvis). Required Action: Please expand the Discussion section to include these details.
Response: We thank the reviewer for this valuable comment. We have expanded the analysis of how varying the training-testing ratio affects model performance, both in the Experimental Setup (Section 4.3, pages 17 and 18) and in the Discussion section (page 21). These additions provide a more in-depth analysis of the performance of VGG16, ResNet50, and InceptionV3 across the four joints and of how the proposed RSSeg method influenced their classification results.
Comment 5: In Table 3 (Overall Pixel Accuracy) for DeepLabV3+, the value is reported as $89.9 \pm 0.26$, whereas in Table 5 it is $89.93 \pm 0.26$. Although the difference is minimal and does not affect conclusions, Required Action: Please ensure consistent rounding throughout the manuscript to maintain full numerical coherence.
Response: Thank you for pointing this out. We have carefully reviewed all numerical values across the manuscript and ensured consistent rounding. Specifically, all reported metrics are now given to one decimal place and all error values to two decimal places (e.g., 89.9 ± 0.26%). This revision ensures numerical coherence throughout the manuscript.
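Applying a single formatting helper to every table is one way to keep this convention from drifting. The sketch below is a hypothetical helper, not part of the manuscript's tooling. It uses Python's `decimal` module with `ROUND_HALF_UP`, on the assumption that the tables are meant to follow conventional half-up rounding.

```python
from decimal import ROUND_HALF_UP, Decimal


def half_up(value: float, places: int) -> str:
    """Round to `places` decimals with conventional half-up rounding."""
    quantum = Decimal(10) ** -places  # e.g. Decimal("0.01") for places=2
    # Converting via str() rounds the decimal text a reader sees, not the binary float.
    return str(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


def format_metric(mean: float, error: float, unit: str = "%") -> str:
    """Hypothetical helper: one decimal for the metric, two for the error term."""
    return f"{half_up(mean, 1)} ± {half_up(error, 2)}{unit}"


print(format_metric(89.93, 0.26))  # 89.9 ± 0.26%
```

The choice of `decimal` matters only at exact ties. Python's float formatting rounds ties to even, so `f"{0.125:.2f}"` gives `0.12`, whereas `half_up(0.125, 2)` gives `0.13`.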
Comment 6: The authors plan to expand the dataset size in future work to refine handling of intra-class variability. Required Action: Please emphasize this commitment in the Conclusion section, as it will strengthen credibility and indicate the direction of future research.
Response: We are committed to substantially increasing the size and diversity of the dataset and to anonymizing it for public release, in order to facilitate transparency and reproducibility. An expanded dataset that captures intra-class variability across age groups and anatomical regions will reduce bias from underrepresented age ranges and ossification stages, enabling a better understanding of developmental patterns in subsequent work. Future research will build on this planned data expansion and public distribution to create computational models that are more reliable, broadly applicable, and region-specific.
Reviewer 2 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
The authors have addressed most of the major concerns from the first round. The paper is now technically sound and significantly improved.
Author Response
Comment 1: The authors have addressed most of the major concerns from the first round. The paper is now technically sound and significantly improved.
Response: We sincerely thank the reviewer for the positive and encouraging feedback. We are pleased to hear that the revisions have addressed the major concerns and that the manuscript is now considered technically sound and significantly improved. We appreciate Reviewer 2's time and effort throughout the review process, and we believe the manuscript has benefited greatly from the constructive comments provided.
Round 2
Reviewer 1 Report (Previous Reviewer 1)
Comments and Suggestions for Authors
Thank you for your answers.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Review of the paper titled “A Multifocal RSSeg Approach for Skeletal Age Estimation in an Indian Medicolegal Perspective”:
The paper presents a valuable contribution to the field of automated skeletal age estimation. The authors introduce a Multifocal Region-based Symbolic Segmentation (RSSeg) approach, which stands out for its ability to segment multiple joints while preserving soft tissue regions essential for analyzing growth patterns of ossification centers.
The RSSeg model achieved a promising pixel accuracy of 91.5% and demonstrated superior generalizability across multiple joints compared to competing models. Its integration with the VGG16 classifier further highlighted the importance of segmentation, improving overall accuracy from 86.4% to 93.8%.
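For reference, pixel accuracy and the other overlap metrics named in these reports (Jaccard, Dice, precision, recall) all derive from the same per-pixel confusion counts. The sketch below computes them for binary foreground/background masks using the standard definitions. It assumes binary masks and returns 0.0 when a denominator is zero, which is a convention choice. It is not the authors' evaluation code, which may average per class or per image.

```python
def confusion_counts(pred: list[list[int]], truth: list[list[int]]) -> tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) over two binary masks of equal shape (1 = foreground)."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred: list[list[int]], truth: list[list[int]]) -> dict[str, float]:
    """Pixel accuracy, Jaccard, Dice, precision and recall from confusion counts."""
    tp, fp, fn, tn = confusion_counts(pred, truth)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # convention: empty denominator -> 0.0

    return {
        "pixel_accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "jaccard": ratio(tp, tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Toy 3x3 masks for illustration only.
truth = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
pred = [[0, 1, 1], [0, 1, 0], [1, 0, 0]]
print(segmentation_metrics(pred, truth))
```

On this toy example, pixel accuracy (about 0.78) exceeds Jaccard (0.6) because correctly classified background pixels count toward accuracy but not toward the overlap metrics. The same effect is worth keeping in mind when comparing reported pixel accuracies with Dice or Jaccard values.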
A notable strength of the paper is the introduction of a novel dataset. The study focuses on the Indian medicolegal context, utilizing an innovative dataset (5107 samples) covering four key joints (wrist, elbow, shoulder, pelvis), categorized into seven age groups aligned with Indian legal requirements (e.g., IMIT, IMC, IMPA, IMYA).
Below are several recommendations aimed at enhancing the scientific rigor, statistical reliability, and clarity of the paper:
- Data Splitting Strategy: The current experiments use a fixed data split of 80% (training), 10% (validation), and 10% (testing) for each joint. While consistent for comparisons, this single random split may be prone to bias, especially given the noted class imbalance issues. Recommendation: To ensure more robust and generalizable evaluation of RSSeg and its integration, the authors are encouraged to implement stratified k-fold cross-validation (e.g., 5-fold), as practiced in competing studies cited in the paper.
- Statistical Significance of Performance Improvements: The authors frequently claim "significant improvement" in RSSeg’s performance. However, no formal statistical tests are provided to confirm that observed differences in performance metrics (e.g., between RSSeg and DeepLabV3+) are statistically significant rather than due to chance. Recommendation: Perform significance testing (e.g., Student’s t-test or non-parametric tests) for performance comparisons to scientifically substantiate the superiority of the proposed model.
- Pelvic Region Segmentation Challenges: The pelvic region remains the most challenging due to overlapping bones, and the improvement in accuracy post-segmentation is the lowest among all joints. Recommendation: Include visual examples of pelvic region segmentation compared to other models (similar to Figure 4), allowing readers to assess how RSSeg handles complex anatomy. Section 4.4 (Discussion) should also offer concrete suggestions for future RSSeg modifications to better capture the unique features of the pelvic region.
- Threshold Values in Patch Filtering: Thresholds $T1$ (entropy) and $T2$ (mean) are critical for patch filtering and computational load reduction. Recommendation: To ensure full reproducibility, the authors should provide the empirically determined values of these thresholds in Section 3.2.
- Presentation of Evaluation Metrics: Definitions of standard segmentation metrics (Jaccard Similarity, Dice-coefficient, Precision, Recall, Pixel Accuracy) are currently presented as separate numbered subsections (4.2.1–4.2.5). Recommendation: For widely known and standard metrics, separate subsections are unnecessary. Instead, definitions and formulas (Equations 11–15) should be integrated into a single, cohesive Section 4.2 titled “Evaluation Metric” to improve readability and structure.
- Formatting of Figure 5: Figure 5 (“Sample image of various ossification joints of Indian population”) appears disproportionately stretched or distorted, hindering visual assessment of input data quality. Recommendation: Adjust the formatting of Figure 5 to ensure illustrations are clear, proportionally accurate, and properly represent the visual aspect of the dataset.
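The stratified k-fold evaluation recommended in the first point can be sketched without extra dependencies. The version below is a minimal sketch, not the authors' protocol. It shuffles the indices of each class and deals them round-robin into k folds so that every test fold keeps roughly the overall class proportions; scikit-learn's `StratifiedKFold` provides equivalent behaviour. Stratifying on a single label per image (for example the age group, or a combined joint and age-group label) is an assumption. If several radiographs come from the same individual, a grouped split would also be needed to avoid leakage.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels: list[str], k: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) pairs whose test folds preserve class proportions."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped,
        # so class remainders do not pile up in the first folds.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Synthetic labels (not the study's data): three classes of 50, 20 and 5 images.
labels = ["a"] * 50 + ["b"] * 20 + ["c"] * 5
for fold, (train, test) in enumerate(stratified_k_fold(labels)):
    print(fold, len(train), dict(Counter(labels[i] for i in test)))
```

With these synthetic labels, each of the five test folds receives 10, 4 and 1 images from the three classes, mirroring the 50:20:5 proportions of the whole set.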
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript has serious issues that should be addressed before resubmitting elsewhere:
- The introduction is long and descriptive but lacks focus. It provides background on skeletal age estimation and medicolegal context, but the research gap is not clearly highlighted. The novelty of RSSeg over existing segmentation methods is not convincingly justified.
- The dataset is collected from Indian hospitals but is relatively small and imbalanced across age groups and joints. The authors do not explain how sampling bias or class imbalance might affect generalizability of the model.
- The ground truth was created using Photoshop without mention of radiologist/expert verification. This raises questions about reliability of the reference standard for segmentation.
- Many mathematical formulas for segmentation (entropy, patch thresholds, symbolic representation) are presented, but there is little explanation of why these choices were made and how hyperparameters (e.g., thresholds T1, T2) were determined.
- Only a few models (U-Net, DeepLabV3+, Otsu, Watershed) were compared. Other modern architectures (Swin U-Net, Attention U-Net, transformer-based methods) were omitted, weakening the claim of superiority.
- Performance is reported on pixel accuracy, Jaccard, Dice, precision, and recall, but no statistical significance testing (confidence intervals, p-values) is provided. The improvements over benchmarks are small and may not be meaningful.
- Although RSNA, DHA, and some joint datasets were tested, it is unclear how images were preprocessed to match the Indian dataset. Domain shift and reproducibility issues are not addressed.
- The paper emphasizes medicolegal importance but does not explain how this model would be practically used in court cases or clinical workflows. Without explainability, the model may not be trusted in forensic settings.
- The segmentation method is complex, but no visualization of what features drive age classification is provided. This is critical for medical and legal adoption.
- Much of the discussion re-states numerical results instead of critically analyzing limitations, sources of error, or potential biases in the dataset.
- The paper frames the work as important for “justice” (SDG 16), but it does not sufficiently discuss the ethical risks of misclassification in medicolegal cases, nor safeguards needed before deployment.
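Both first-round reports ask for formal significance testing and confidence intervals. One dependency-free way to provide them, sketched below, is to score both models on the same test images and analyse the paired per-image differences: a percentile bootstrap gives a confidence interval for the mean difference, and a sign-flip permutation test gives a two-sided p-value. These are non-parametric alternatives to the paired Student's t-test (available as `scipy.stats.ttest_rel`). The choice of test, the number of resamples and the synthetic Dice scores in the example are all assumptions, not results from the manuscript.

```python
import random
import statistics


def _paired_diffs(a: list[float], b: list[float]) -> list[float]:
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    return [x - y for x, y in zip(a, b)]


def paired_bootstrap_ci(a, b, n_resamples=10_000, alpha=0.05, seed=0):
    """Return (mean difference, lower, upper) percentile bootstrap CI for mean(a - b)."""
    diffs = _paired_diffs(a, b)
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = means[round(alpha / 2 * n_resamples)]
    upper = means[round((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lower, upper


def paired_permutation_p(a, b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test of H0: mean(a - b) == 0."""
    diffs = _paired_diffs(a, b)
    observed = abs(statistics.fmean(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_resamples + 1)  # add-one keeps p strictly positive


# Synthetic per-image Dice scores for illustration only (not results from the paper).
rsseg = [0.91, 0.93, 0.90, 0.92, 0.94, 0.89, 0.95, 0.92, 0.91, 0.93]
baseline = [0.89, 0.90, 0.89, 0.90, 0.91, 0.88, 0.92, 0.90, 0.90, 0.91]
print(paired_bootstrap_ci(rsseg, baseline))
print(paired_permutation_p(rsseg, baseline))
```

In this synthetic example, the interval excludes zero and the p-value falls well below 0.05. With real per-image scores, an interval that straddles zero would indicate that the observed gain may be due to chance.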
Reviewer 3 Report
Comments and Suggestions for Authors
The authors propose a method for segmenting X-ray images that separates the regions of interest (bones, joints) from the background. The method is aimed at improving the performance of classification models trained for biological age estimation. The proposed method uses a hierarchical patch-processing strategy to preserve soft-tissue areas. Results indicate an improvement over traditional segmentation approaches when paired with classical classification architectures. Here is a list of comments about the manuscript.
- Please do a revision of the document to remove small typos and issues with citations.
- The authors describe a series of methods for segmenting joints in X-ray images and indicate that automatic ROI segmentation is underrepresented and underperforms for accurate age estimation. Did the authors review segmentation models developed for other soft tissues (e.g., lungs, heart, breast tissue) and assess whether their capabilities translate to joint segmentation?
- The authors compared their method against classic semantic segmentation methods. Could the authors provide a performance comparison with at least one more recent segmentation architecture for medical tasks, such as the following (not necessarily one of these):
https://www.nature.com/articles/s41598-024-71025-x
https://link.springer.com/chapter/10.1007/978-3-319-10404-1_100
https://www.sciencedirect.com/science/article/abs/pii/S1361841514000048?via%3Dihub
- The authors indicate that the dataset consisted of 5,107 X-rays, of which 37 were used for testing. The imbalance between the training and evaluation sets is too large; please evaluate on a larger, representative test set, which would give a more accurate picture of performance across the whole dataset. Usual splits are 70/10/20 or 80/10/10 (train/val/test).
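The scale of this imbalance is easy to quantify. Assuming the 37 test images are drawn from the full pool of 5,107 radiographs (the comment does not say whether the split is per joint), the sketch below computes the test fraction and the image counts implied by the two suggested splits. Assigning rounding remainders to the training set is an arbitrary choice.

```python
def split_sizes(total: int, ratios: tuple[float, float, float]) -> tuple[int, int, int]:
    """Return (train, val, test) image counts; rounding remainders go to training."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    val = round(total * ratios[1])
    test = round(total * ratios[2])
    return total - val - test, val, test


TOTAL_IMAGES = 5107  # dataset size reported in the manuscript
print(f"37 test images = {100 * 37 / TOTAL_IMAGES:.2f}% of the data")
for ratios in [(0.7, 0.1, 0.2), (0.8, 0.1, 0.1)]:
    print(ratios, split_sizes(TOTAL_IMAGES, ratios))
```

The calculation shows that 37 images correspond to about 0.72% of the data, whereas the suggested splits would reserve 1,021 (70/10/20) or 511 (80/10/10) images for testing.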