Article
Peer-Review Record

Deep Learning-Based Segmentation for Digital Epidermal Microscopic Images: A Comparative Study of Overall Performance

Electronics 2025, 14(19), 3871; https://doi.org/10.3390/electronics14193871
by Yeshun Yue 1,2, Qihang He 1,2 and Yaobin Zou 2,3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 23 August 2025 / Revised: 28 September 2025 / Accepted: 28 September 2025 / Published: 29 September 2025
(This article belongs to the Special Issue AI-Driven Medical Image/Video Processing)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript presents a comparative study of eight deep learning models (FCN-8s, SegNet, UNet, ResUNet, NestedUNet, DeepLabV3+, TransUNet and AttentionUNet) for the segmentation of Digital Epidermal Microscopic (DEM) images. The authors introduce a manually labeled dataset, apply extensive data augmentation and evaluate models across accuracy (DSC, IoU, Recall, Precision) and efficiency (Params, FLOPs, inference time). The paper’s main contribution lies in its systematic benchmark of CNN- and Transformer-based architectures for DEM image segmentation, highlighting trade-offs between accuracy and computational efficiency. The manuscript is relevant, well structured and of clear interest to the field of medical image analysis.

Comments

  1. Introduction: The context is well established. However, it would help to highlight the practical implications of DEM segmentation for clinical or cosmetic applications in more detail.

  2. Methods: Clarify the exact split of the data into training/validation/test sets (percentages are mentioned, but reproducibility would benefit from stating whether the split was random or subject-based; a subject-level split sketch is given after these comments).

  3. Methods: The choice of hyperparameters (e.g., learning rate, batch size) is standard but not justified. A short justification for why these were selected (or whether tuning was attempted) would add transparency.

  4. Results: The results are comprehensive. However, adding error bars or variance across runs would improve the robustness of the reported performance. Were all results based on a single training run or averaged over multiple runs?

  5. Discussion: The discussion is strong, but it could further connect findings to broader implications: e.g., which models are more suitable for clinical real-time use vs. research contexts.

  6. Conclusion: Clear and concise. However, the limitations section could explicitly mention that the dataset is limited in demographic diversity. This might restrict generalization.

  7. Data Availability: The dataset is newly built but not openly shared, only available “upon reasonable request.” This limits reproducibility for the wider community. Consider clarifying under what conditions the data can be accessed.

  8. Limitations: The study includes only 46 volunteers and 261 original images, which is relatively small. Although augmentation expands the dataset, generalization may still be limited. This limitation should be emphasized more clearly in the discussion.
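To illustrate the subject-based split raised in comment 2, the following is a minimal sketch using scikit-learn's GroupShuffleSplit, assuming each image can be mapped to its volunteer; the image-to-volunteer assignment and the split ratios below are synthetic placeholders, not the paper's actual protocol.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_images, n_subjects = 261, 46                            # counts taken from the paper
subject_ids = rng.integers(0, n_subjects, size=n_images)  # hypothetical image-to-volunteer map
indices = np.arange(n_images)

# Hold out roughly 20% of volunteers for testing, then split the rest into train/val.
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
trainval_idx, test_idx = next(outer.split(indices, groups=subject_ids))

inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr, va = next(inner.split(trainval_idx, groups=subject_ids[trainval_idx]))
train_idx, val_idx = trainval_idx[tr], trainval_idx[va]

# No volunteer appears in more than one partition.
assert not set(subject_ids[train_idx]) & set(subject_ids[test_idx])
assert not set(subject_ids[val_idx]) & set(subject_ids[test_idx])
print(len(train_idx), len(val_idx), len(test_idx))
```

Reporting the split at the volunteer level (rather than the crop level) would make the evaluation protocol unambiguous and prevent images from the same person appearing in both training and test sets.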

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

• Add a brief introduction between sections and subsections
• The dataset is constructed from 46 volunteers and only 261 local images before data augmentation. This is very small for a deep learning study with eight different architectures. The heavy reliance on artificial augmentation can limit external validity and promote overfitting to artificially created patterns.
• Annotations are performed in Photoshop by "experienced researchers," but neither the number of annotators nor the inter-annotator agreement measure is reported. Without this information, reliability cannot be assessed.
• The models are trained on small image crops (256×256) but evaluated on larger images. This is presented as a way to assess generalization. However, it also introduces a bias: the models were optimized on a different domain from the test domain (artificial crops vs. whole images). This may explain part of the performance differences between architectures, rather than reflecting the intrinsic capabilities of the models.
• Although the training conditions are stated to be unified (200 epochs, same learning rate), each model has different architectural requirements (depth, number of parameters, optimization). Using the same hyperparameters may have disadvantaged specific models.
• It is not reported whether there was a systematic search for hyperparameters for each model, which undermines the robustness of the comparisons.
• Ad hoc transformations (max-scaling and reciprocal) are applied to combine heterogeneous metrics into a radar-chart area. This merges accuracy and efficiency into a single score, but the relative weighting of the metrics is not clearly justified.
• The differences between models in DSC/IoU are minimal: FCN-8s (0.8690) and AttentionUNet (0.8696) are practically identical in performance. However, AttentionUNet is excessively emphasized as "superior." Please justify.
• Some models with more modern architectures (TransUNet, DeepLabV3+) show inferior performance. This could be due to the small dataset and not necessarily to inadequate architectures. The text does not discuss this aspect.
• FLOPs and parameters are reported, but how they were calculated is not explained; different implementations may vary in operation count (a counting sketch is given after this list).
• The final ranking, which places FCN-8s as the best overall model, is methodologically weak. Accuracy and efficiency metrics are mixed without a rigorous multi-criteria analysis.
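As an illustration of how the FLOPs and parameter figures could be made reproducible, the sketch below counts them with thop on a torchvision stand-in model; the actual architectures, input resolution, and counting tool used in the paper are assumptions here.

```python
import torch
from torchvision.models.segmentation import fcn_resnet50  # stand-in, not the paper's FCN-8s
from thop import profile

model = fcn_resnet50(weights=None, weights_backbone=None, num_classes=2).eval()
dummy = torch.zeros(1, 3, 256, 256)        # assumed to match the 256x256 training crops

with torch.no_grad():
    macs, params = profile(model, inputs=(dummy,), verbose=False)

# thop counts multiply-accumulates; one MAC is conventionally reported as two FLOPs.
print(f"Params: {params / 1e6:.2f} M")
print(f"MACs:   {macs / 1e9:.2f} G  (~{2 * macs / 1e9:.2f} GFLOPs)")
```

Stating the tool, its version, and the input size alongside the reported numbers would remove the ambiguity noted above.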

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Introduction: 

-Most of the sentences are devoted to DEM images, their challenges, and deep learning models, but previous studies and their results are not specifically reviewed.

-The references ([1] to [17]) serve mainly to explain the characteristics of DEM images or to introduce the models, rather than to systematically critique or compare previous work.

A complete scientific introduction should show:

What previous work has been done?

What are its limitations?

How does the current study fill these gaps?

 Materials and Methods:

Demographic and sampling location information is accurate, but it would be better to state whether the sample size is sufficient for statistical significance or what the limitations are.
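One simple way to address the question of statistical sufficiency is to report a bootstrap confidence interval over per-image scores; the sketch below uses synthetic DSC values purely for illustration, not results from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
dsc_scores = rng.normal(0.87, 0.03, size=40).clip(0, 1)   # hypothetical per-image DSC values

# Resample the per-image scores with replacement to estimate the spread of the mean.
boot_means = np.array([
    rng.choice(dsc_scores, size=dsc_scores.size, replace=True).mean()
    for _ in range(10_000)
])
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"Mean DSC: {dsc_scores.mean():.4f}  (95% bootstrap CI: {low:.4f}-{high:.4f})")
```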

The model descriptions are comprehensive, but sometimes the sentences are long and repetitive. For example, in the descriptions of UNet and ResUNet, the same concepts are repeated several times.

It would be better to summarize each model in one or two key sentences and to condense shared details (such as the encoder/decoder structure, which is common to all models) into a single sentence.

Results: The term "Avg CUDA Time" should be changed to "average GPU inference time" to make it more understandable to the scientific reader.
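For reference, the sketch below shows how an average GPU inference time is commonly measured with CUDA events, including warm-up and synchronization; the model, input size, and run counts are placeholders, and a CUDA-capable GPU is assumed.

```python
import torch
from torchvision.models.segmentation import fcn_resnet50  # stand-in model

device = torch.device("cuda")
model = fcn_resnet50(weights=None, weights_backbone=None, num_classes=2).eval().to(device)
dummy = torch.zeros(1, 3, 256, 256, device=device)

with torch.no_grad():
    for _ in range(10):                      # warm-up iterations, not timed
        model(dummy)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    n_runs = 100
    start.record()
    for _ in range(n_runs):
        model(dummy)
    end.record()
    torch.cuda.synchronize()                 # wait for all kernels before reading the timer

print(f"Average GPU inference time: {start.elapsed_time(end) / n_runs:.2f} ms/image")
```

Reporting the measurement procedure (warm-up, batch size, number of runs) alongside the renamed metric would make the efficiency comparison easier to interpret.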

Conclusions:

The Conclusions section should be short, focused, and act as a summary. Some of the sentences about future research are very long and could be presented in a more concise form.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

After reviewing the comments and the authors' responses, I consider that the authors have addressed most of the observations satisfactorily. Introductory sections have been added, the limitations have been clarified (dataset size, lack of exhaustive optimization, training bias due to image cropping, limitations of the classification system), and the text and figures have been improved. However, the response to the criticism regarding annotation (lack of information on the number of annotators and the level of inter-annotator agreement) is insufficient: they simply justify the use of a single expert annotator without providing any mechanisms for quality validation, leaving this methodological aspect unresolved. Nevertheless, given that they acknowledge the key limitations in the discussion and have enriched the context with comparisons and clarifications, the manuscript is acceptable in this second review.
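As an illustration of the kind of quality check the unresolved annotation point calls for, the sketch below quantifies agreement between two annotators' binary masks on a re-annotated subset via Dice overlap and pixel-wise Cohen's kappa; the masks here are synthetic placeholders.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
mask_a = rng.random((256, 256)) > 0.5          # annotator A (placeholder binary mask)
mask_b = mask_a.copy()
flip = rng.random((256, 256)) > 0.95           # annotator B disagrees on roughly 5% of pixels
mask_b[flip] = ~mask_b[flip]

intersection = np.logical_and(mask_a, mask_b).sum()
dice = 2 * intersection / (mask_a.sum() + mask_b.sum())
kappa = cohen_kappa_score(mask_a.ravel(), mask_b.ravel())

print(f"Inter-annotator Dice:      {dice:.3f}")
print(f"Pixel-wise Cohen's kappa:  {kappa:.3f}")
```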

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
